Firefox now only has one HTML parser (blog.mozilla.org)
200 points by AndrewDucker on Aug 13, 2013 | hide | past | favorite | 57 comments


Does anyone know why about:blank is so magical that it can't use the new HTML parser? Can it not be parsed like other pages?


Here's Henri Sivonen (the author of the HTML5 parser that is now Firefox's sole parser) on the subject: http://hsivonen.iki.fi/about-blank/.

From a quick reading, it seems to be a mix of synchronous vs. asynchronous parsing, historical accidents, and other quirks. He identifies eight different behaviors.


The Joy of about:blank: http://hsivonen.iki.fi/about-blank/


Thanks for the link, unwind/paulrouget2.

I think it went a little over my head. What role does about:blank actually play? I'm assuming that the wild behaviour and parsing difficulty is the result of it performing some special function (beyond just returning a blank page).


A browsing context doesn't start its life empty. Instead, when a browsing context is created, if a JS program looks at what's in there, there's an about:blank doc in there. Since you can create a browsing context synchronously (e.g. document.body.appendChild(document.createElement("iframe"))), there has to be a way for the initial about:blank document to materialize synchronously. The HTML parser is always async. (Edit: The HTML parser is always async when loading a URL. Then there are innerHTML, createContextualFragment and DOMParser, which are synchronous.)

Add various events (readystatechange, DOMContentLoaded, load) for added fun. And the fact that browsing contexts that are top-level from the Web perspective are iframe-like from the XUL perspective, and the code for dealing with this duality is a mess.


Really this seems incredibly strange.


Typical Mozilla nonsense


Anyone know what the status of multi-threading is? It looks like there are two potential solutions to the problem:

1. Servo--http://www.webmonkey.com/2013/04/mozillas-servo/

2. or, Electrolysis--http://www.internetnews.com/blog/skerner/mozilla-set-to-revi...

Also see https://bugzilla.mozilla.org/show_bug.cgi?id=392073


Servo is a research project, not a project to bring multiprocess support to Gecko. (Presumably you mean multiprocess support, not multithreading—Gecko is already highly multithreaded.)

In any case, Gecko is multiprocess already, and it's fully deployed on Firefox OS (and, recently, sandboxed). Firefox is not multiprocess because (generally) the front end has not yet been ported, and the project to do that is called Electrolysis.


Both are in development. Servo is nowhere near ready yet, as would be expected for a totally new engine (although it is making progress [1]). Electrolysis is already in Firefox Mobile / Firefox OS and is in progress for desktop [2].

[1] https://twitter.com/metajack/status/364571230331875331

[2] https://wiki.mozilla.org/Electrolysis



On the topic of the HTML parser, you may be interested in https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_...



That's single-threaded compositing on its own thread.


Which is a step in the right direction. After that they can make it multithreaded. Right now it's all too tied together.


In case anyone else wants to have a look at that (partially) generated C++ code, here's the online source: http://mxr.mozilla.org/mozilla-central/source/parser/html/


This should serve as an example to those who believe "the code is the comments" (regarding the very first point in this article).

Comments are important in non-trivial applications. Please stop thinking they are not.


I believe that good code is the comments. From the sounds of it, this code isn't good in that way. Plus it had comments.


I agree that good code self-documents what the code is doing. Good comments document what the code should be doing.

It's important to have both pieces of information in the same place, to minimize the overhead of fixing some subtlety someone might incidentally notice as a side effect of glancing at the code while working on related code.

I can't count the number of times I've run into a complex bit of poorly commented code that looked like it mishandled a subtle corner case, politely emailed the author(s) asking what the intended logic is before claiming I found a bug, gotten the "read the code, dude" response, come back to them with "is the intention really to <insert description of corner case behavior>", and gotten back "my bad, broseph".

There have also been a few times that I've incidentally noticed something looked like it didn't handle a corner case properly in poorly commented code (but not so wrong to obviously be a bug), but failed to follow up with the author due to time pressure on the things I was supposed to be working on, and later had that corner case behavior bite us.


Good code documents the what, but it does not document the rationale.

"Why this way and not that way" is often important, and a comment to that effect can save future maintainers a lot of going down blind alleys.


"HTML5 parser [...] automatic translation from Java to C++"

Do you reuse code from Rhino? (Mozilla's Java-based JavaScript engine)

Why not convert the code from Java to C++ once, and then maintain the C++ code?


Rhino code is not involved.

The portable core of the HTML parser is maintained as Java. However, the translation is not done during the Firefox build process. Instead, the translation is triggered manually when the Java code changes and the output of the translation is committed to the Firefox source repository.

(The Java code is committed to the Firefox source repository, too, to make license compliance easier for downstream projects that opt to distribute the whole app under the GPL. The Java code is the preferred form for making modifications for GPL purposes.)

Edit: As for why not maintain C++ separately, that would mean doing maintenance twice: once for Java and once for C++. The parsing algorithm still changes from time to time. Support for the template element was added. Spec bugs related to how nested HTML inside SVG and MathML work were fixed. I expect a subtle spec change in the future to how correctly nested but deeply nested phrase-level formatting tags are handled, since the part of the spec that handles misnesting breaks correct nesting. Oops. (But it's great that browsers are now so spec-compliant that you can see that it's a spec bug, because Firefox, Chrome and IE10 all show the same weirdness.)


I already checked the code, and I know from my CS courses that it's easier to code and unit-test a language lexer/parser in Java than in C++ or C.

Was that the main reason to code it in Java?


The reason why the parser was written in Java in the first place was that it was written for the Validator.nu HTML validator. The validator was written in Java, because Java had the best set of Unicode-correct libraries for the purpose and a common API (SAX) for connecting the libraries.

Even though testing wasn't the original motivation for writing in Java, it's much nicer to unit test the parser as its isolated Java manifestation than as its Gecko-integrated C++ manifestation.
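For anyone who hasn't used it, SAX is a streaming, callback-based parsing API rather than a DOM builder: the parser pushes events (start tag, characters, end tag) at a handler you supply. Here is a minimal illustrative sketch using Python's stdlib xml.sax, which exposes the same callback interface as Java's org.xml.sax — this is not Validator.nu's actual code, just the shape of the API being described:

```python
# Illustrative only -- the same event-driven SAX interface Java's
# org.xml.sax defines, via Python's stdlib xml.sax module.
import xml.sax
from io import BytesIO

class TagCounter(xml.sax.ContentHandler):
    """Counts start tags as the parser streams events to the handler."""
    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per opening tag; no tree is ever built.
        self.counts[name] = self.counts.get(name, 0) + 1

def count_tags(markup: str) -> dict:
    # SAX proper parses XML; an HTML5 parser feeds a similar event
    # stream to its tree builder, which is what makes a common
    # streaming API a convenient seam between parser and consumer.
    handler = TagCounter()
    xml.sax.parse(BytesIO(markup.encode("utf-8")), handler)
    return handler.counts

if __name__ == "__main__":
    print(count_tags("<html><body><p>a</p><p>b</p></body></html>"))
    # {'html': 1, 'body': 1, 'p': 2}
```

Because the handler is just an object receiving events, you can unit test it in isolation by feeding it canned markup, which is the testing convenience the comment above alludes to.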


Memory Improvement? Performance Improvement? Code Base Reduction? Binary Size Reduction?

Or will it be so small that it doesn't matter?


There won't be much of a runtime effect as most of the ripped-out code was unused anyway. In a stripped binary (32-bit ARM), I saw 40-50kb worth of codesize reduction, which isn't nothing (especially on mobile) but not a game-changing win.

The biggest advantage was really just getting rid of a bunch of unused and unmaintained code.


40-50kb is pretty much nothing, even for mobile


What will be the effects of having one parser? Any performance improvements?


IMHO, Firefox has more and more trouble renewing itself.

This browser was great when it came out: fast, reliable and virus-free. I remember the bad old days of IE, when you had to wonder forever whether or not to click on a suspicious link. However, I have the feeling that since Chrome came out, they don't innovate anymore. Extensions are slow and unreliable, a lot of stuff feels like copy/paste from Opera or Chrome, Firebug, once the greatest, seems outdated, they took forever to support the Retina display on the Mac, their engine crashes or freezes in JS-heavy environments... and they removed the blink tag.

My 5 cents.


Oh no, not the BLINK tag!!! Fiends, criminals, collaborators with the devil! /sarcasm

While I share some sentiments in regards to their UI, Mozilla has done nothing but good moves so far - supporting PDF.js, Shumway, Servo, Rust, Mozilla Persona (login), Jetpack, etc. Firefox has performed admirably and I use it as a main browser.


With improvements to the garbage collector [1], the developer tools and the recent emphasis on performance [2], I think Firefox has really benefitted from the competition.

[1]: https://blog.mozilla.org/javascript/2013/07/18/clawing-our-w...

[2]: http://arewefastyet.com/


Your conventional wisdom is about two years out of date.


I reacted because yesterday we tried the better_errors gem on our Rails app. Safari and Chrome were fine, but Firefox froze and crashed while executing the JS.

Maybe I am wrong, but I keep using it side by side with Chrome and Safari (nerds have a lot of tabs and windows open!) and that's my feeling.


You really ought to log a bug about that.


I recently tried switching back to FF from Chrome and I find FF to lag horribly. Downloads are frequently broken and stall. The <filename>, <filename.part> thing is ridiculous. It feels like there is a molasses powered queue between input events and response, FF feels like a drunken master. I am in Chrome right now, I wish I was in FF.


> The <filename>, <filename.part> thing is ridiculous.

Yep. Luckily it’s being fixed as we speak: https://bugzilla.mozilla.org/show_bug.cgi?id=420355


It's actually quite nice to be able to rename the file and wget -c it.

I don't know if .crdownload has a header, but it doesn't seem to work the same way.
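The rename-and-`wget -c` trick works because HTTP supports range requests: the client checks how many bytes of the partial file are already on disk and asks the server only for the rest. A hedged sketch of that mechanic (file names here are hypothetical; a real resume also needs the server to answer 206 Partial Content, and ideally validate the partial copy with ETag/If-Range):

```python
# Sketch of what "wget -c" does with a partial download: measure the
# bytes already on disk and request the remainder with a Range header.
# File names are hypothetical, for illustration only.
import os

def resume_headers(part_path: str) -> dict:
    """Build the request headers needed to resume a partial download."""
    already = os.path.getsize(part_path)
    # "bytes=N-" means "everything from byte offset N to the end".
    return {"Range": f"bytes={already}-"}

if __name__ == "__main__":
    # Simulate a stalled download that got 1024 bytes before dying.
    with open("example.part", "wb") as f:
        f.write(b"\x00" * 1024)
    print(resume_headers("example.part"))  # {'Range': 'bytes=1024-'}
    os.remove("example.part")
```

Any on-disk format that keeps the partial body as plain bytes (like the .part file) can be resumed this way; a format that prepends its own metadata header to the file would break the simple offset arithmetic.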


Broken / stalled downloads don't sound like a Firefox problem to me. If you can reliably get a DL to fail in Firefox (that works fine in another browser) and reproduce it, file a bug :-)


I can't get things to reliably fail; admittedly, the connections I am on are flaky. I have been traveling around the world for the last 8 months. From direct experience I know that Chrome handles flaky connections much better than FF. I routinely get downloads that are completely stuck in FF, showing 20KB/s as a download speed even though the file hasn't been touched in 5 minutes, nothing gets written to disk, and the activity monitor shows 0KB/s of network traffic. I would open a bug, but I would have to write a server that exposes the problem; without a decent repro case it wouldn't get fixed.

----

edit: Many times I have to pause a download and then resume it to get the DL unstuck.


I refuse to install Chrome, but do install the pure open-source Chromium. I routinely get that behavior with Chrom[e|ium], which leaves files with a crx extension littered throughout my Downloads directory.

But to be fair from both our ends, neither is that useful: individual anecdote(s) != empiricism.


Yeah, I have been toying with how to write a server to automate test cases that repro these bugs. Almost all the issues I have with browsers are about how they behave on slow connections. On the very fast and reliable connections we have in the States, many of the issues are not present.


That's strange; the only browser I have problems downloading files with is Chrome. Sometimes it takes a long time between clicking the download and seeing any visual clue from Chrome that the download has started.


The main problem is not any of these things. The main problem is that they still have not implemented one-process-per-tab, which makes the browser severely inferior to Chrome.


I do not consider Firefox inferior to Chrome because of this, instead I consider Firefox superior to Chrome for this very reason. With 6 tabs open, Chromium is using 1.6GB of RAM, while I can browse forever in Firefox until it reaches that kind of memory consumption.

Am I missing something? Why is the one-process-per-tab model considered "severely" superior?


> Am I missing something? Why is the one-process-per-tab model considered "severely" superior?

I'm not sure process-per-tab is "severely" better. However, how are you measuring that 1.6 GB? If you're not careful, you're N-counting the copy-on-write pages that the processes are sharing. Even chrome://memory-redirect has a note that Chrome itself has a bug with N-counting its RAM usage across tabs. (Issue 25454) The RAM overhead of process-per-tab is more than the fanbois will tell you, but it's much less than you see by adding up the sizes in top/taskmgr, and is less than what chrome://memory-redirect shows.

Sandboxing for security and fault tolerance is a big deal; it's certainly worth the difficulty in figuring out how much RAM is actually being used by N tabs, if that's your main complaint.

Edit: I'm notorious for browsing with 30+ tabs open. I currently have 48 tabs open, and I'm not noticing any ill effects from high RAM usage. I really wish there were a keyboard shortcut for pushing the address of a link onto a temporary bookmarks stack, and another shortcut for popping a bookmark and opening a tab.


> Sandboxing for security and fault tolerance is a big deal; it's certainly worth the difficulty in figuring out how much RAM is actually being used by N tabs, if that's your main complaint.

I use Firefox with tab groups and lazy tab loading and have probably over a thousand tabs in all the tab groups I use. This is very useful for me doing research and I haven't yet found a better workflow. I don't believe I would be able to achieve this with Chrome with any reasonable RAM usage.

> I really wish there were a keyboard shortcut for pushing the address of a link onto a temporary bookmarks stack, and another shortcut for popping a bookmark and opening a tab.

I use the Firefox addon "Save-to-Read" for this purpose and it works great.


One tab can freeze (because of the js, for example) without killing the rest of the browser.


I don't remember when this has last happened to me. If this is the major reason, I, most definitely, prefer the current Firefox approach: much less memory utilization and the chance that I have to restart the browser once in a blue moon (not a problem for me in real use) vs very high memory usage but I'm protected in the off chance that one tab messes up my browser (and I always have to live with the high memory usage).

So FF devs, if you're listening, this is a vote against Electrolysis (if it means higher memory usage than current FF).


I seem to run into that pretty frequently on Firefox 19 at work, but I don't think I ever notice it when I'm running 23 at home.


I've been on the Aurora channel for some time now and it's been a pretty smooth ride for me. You should give it a try to see how it works out for you.


I still can't understand how a freezing JS script can freeze the whole browser. Can't they handle that stuff on a different thread than the browser's main UI thread?


With plugins already in a different process, these situations are far less likely.


But you only get 15 tabs.


I'll be the first to admit that Chrome does have a pretty big memory footprint. I have not compared it to Firefox recently, but I assume Firefox is better. However, as others have said, using Chrome with over 40 tabs open still works well.

My MacBook has 8 GB of RAM and my desktop has 12 GB, so in my case the memory footprint is far less important than having tabs separated into processes.


If Firefox implemented one process per tab tomorrow, would you switch from Chrome to Firefox? I suspect this particular implementation detail is not a "main problem" for the majority of users comparing Chrome and Firefox.


Actually, I probably would consider it. I like and trust Mozilla as a company more than Google. Also, Firefox has some extensions I would like to use, such as NoScript. I would probably still use Chrome for development though.


Good job their extension system allows you to easily restore the blink tag to your browsing, should you wish ;)

https://addons.mozilla.org/en-US/firefox/addon/restore-blink...

Disclaimer: I created the above extension



