I love that this project keeps showing how possible it is for a small group to make something amazing. This would be very hard to do in a company with stakeholders.
The project is cool, but this post makes me wonder whether this particular approach - starting with something that "does an okay job with well-formed web content" and then trying to work backwards to fix spec compliance, de facto browser behaviour, and potential security issues - can actually result in a production browser. Which is fine; one can always go back and redo things, especially in a hobby project, but it's hard to escape the vague feeling that some of this stuff might need to be architected in from the get-go.
I don't know. It kind of feels like they are replicating real user (developer) behavior by producing lots and lots of weird, low-quality, not-to-spec code that a parser will likely have to deal with. By doing so, they are simply exposing bugs that real users (bad developers) would have triggered anyway. Seems like a totally legit way to test a complex product. No assumptions - just lots of randomized nonsense that reflects reality.
As a developer I would love to have a browser that strictly follows the specs and doesn't deal with any historical compatibility quirks. I would focus on making sure my web app works best there, which _should_ give the best compatibility across a wide range of browsers.
I kind of don't buy that argument. The web is not fundamentally different from other programming environments, say Python or Java. It might sometimes be practical to have a Python interpreter accept syntactically invalid input because it kinda knows what you mean anyway, but most programming languages don't work that way, because it makes things harder in the long run and the benefits are pretty minuscule.
The problem is that this kind of philosophy is fundamentally incompatible with HTML5.
There was an attempt at a "strict-mode" HTML - XHTML, i.e. HTML as XML - but it failed (on the web) for various reasons (including IE). HTML5 specifies the exact behavior every browser must follow upon encountering tag soup, which is useful because real-world HTML has been tag soup for a very long time.
I guess the strictest thing you could do is die upon encountering "validation errors", but I don't think this would do much to simplify your job. (Maybe you could drop the adoption agency?) But now your parser chokes on a lot of websites - most likely on hand-written HTML, which has a greater potential for validation errors but also typically simpler layout.
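To make the "die on validation errors" idea concrete, here's a toy sketch (my own illustrative code, nothing from a real engine) of a strict checker that bails on the first misnested end tag - exactly the kind of input for which the HTML5 spec instead mandates recovery (e.g. via the adoption agency algorithm) and a precisely defined resulting DOM:

    // Toy strict parser: track open elements on a stack and refuse the
    // document on the first misnested end tag. Real HTML5 parsing never
    // refuses; it recovers deterministically.
    #include <iostream>
    #include <regex>
    #include <string>
    #include <vector>

    bool strictly_well_nested(const std::string& html)
    {
        static const std::regex tag_re(R"(<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>)");
        std::vector<std::string> open_elements;
        for (auto it = std::sregex_iterator(html.begin(), html.end(), tag_re);
             it != std::sregex_iterator(); ++it) {
            bool is_end_tag = (*it)[1].length() > 0;
            std::string name = (*it)[2];
            if (!is_end_tag) {
                open_elements.push_back(name);
            } else if (open_elements.empty() || open_elements.back() != name) {
                std::cerr << "validation error: unexpected </" << name << ">\n";
                return false; // a strict browser would give up on the page here
            } else {
                open_elements.pop_back();
            }
        }
        return open_elements.empty();
    }

    int main()
    {
        // Classic tag soup: every conforming browser builds the same DOM for this.
        std::cout << strictly_well_nested("<p><b>bold <i>both</b> italic</i></p>") << "\n"; // 0
        std::cout << strictly_well_nested("<p><b>fine</b></p>") << "\n";                    // 1
    }

Even this toy glosses over void elements like <br>, case-insensitivity, comments, and scripts, which is roughly why "just be strict" doesn't buy as much simplification as it sounds like it should.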
And HTML parsing is still the easy part of writing a browser! Layout is much harder to do, partly because layout is hard, but also because it's under-specified. Implement "undefined behavior" in a way that other browsers don't, and your browser won't work on a lot of pages.
(There have been improvements, but HTML is still miles ahead. e.g. CSS 2 has no automatic table layout algorithm, and AFAICT the CSS 3 version is still "not yet ready for implementation".)
Why would you want a web browser which can't open Facebook, X, or half of the other top websites?
And why would they bother to "fix" their websites when they work fine in Chrome, Edge and Firefox, but not in your very unpopular but super-strict browser?
> The web is not fundamentally different from other programming environments, say Python or Java
To me what makes the web completely different from any programming environment is the very blurry line separating code from data. The very same web page can produce totally different code two hours later just because of a few new articles with links, graphics, media and advertising. The web is that place where data is also code and code is also data; this must come at a price.
I think of the web like I think about Windows. Decades of backwards compatibility. Dubious choices that get dragged along because it is useful for people who can't or won't let go of stuff that works for them. It's a for better or for worse situation.
I'm not talking about the fuzzing but the design approach. As in, can you make a real browser by starting with a kind of 'happy path' implementation and then retrofitting it to be a real browser? That part I'm somewhat skeptical of. It's a totally sensible way to learn to make a real browser, no doubt.
"real browser" is doing a lot of work in your comment.
It's not doing nearly as much work as real browsers do!
After all what is a browser other than something that browses? What other characteristics make it "real"?
A real browser is a browser that aspires to be a web browser that can reasonably be used by a (let's say even fairly technical) user to browse the real web. That means handling outright adversarial inputs, and my point is that this is so central to a real browser that it seems it might be hard to retrofit in later.
I gave one example with the null thing; another would be the section on how the JS API can break the assumptions made by the DOM parser - it similarly sounds like a bug that's really a bug class, one that a real browser would need a systemic/architectural fix for.
You might as well be describing Safari, Chrome, or Firefox. All are heaping piles of complexity that are tortured into becoming usable somehow. Such is the nature of software. We shoot lightning into rocks and somehow it does useful stuff for us. There's nothing inherently "right" or "wrong" about how we do it. We just do whatever works.
I would say that a "real browser" — which I think is being used here to mean a "production-quality" browser, in contrast to a "toy" browser — would be a robust and efficient browser with a maintainable codebase.
We're well past absurdity on this line of argument.
Given:
A = a goal of implementing just the latest and most important specs
B = shipping something they want people to use
There is no browser team, Ladybird or otherwise, that is A and not B, or A and B.
For clarity's sake: Ladybird doesn't claim A.
Let's pretend they do, as I think it'll be hard for the people arguing in this thread to accept that they don't.
Then we know they most certainly aren't claiming B. The landing page says it's too unstable to provide builds for. Outside of that, it's well understood that it's not "shipping" or intended to be seen as such.
What a weird comment on their progress and transparency. Better to have a demo working and iterate on it, right? By your logic, how would anyone ever finish anything?
The spec is so complex at this point that I'm not sure you can go the other way. It would also force you to implement weird things nobody will ever use before letting people work with a basic page.
I'd love someone to prove me wrong, but I feel like you'd end up with "you can't display a paragraph of basic text, because we're not even done implementing JS interface to conic gradients in HSL space in a fully compliant way".
> it's hard to escape the vague feeling some of this stuff might need to be architected in from the get go.
When I'm developing something, work or otherwise, I find that I often write my worst code when I'm writing something bottom-up, i.e. designed up front in isolation, because it usually turns out that the user of that particular code has completely different needs, and the point of integration becomes a point of refactoring. I think the top-down approach applied at the project level is much nicer because it allows you to _start from somewhere_ and then iteratively improve things.
That is not to say you shouldn't take precautions. In Ladybird, stuff like image decoding and webpage rendering/JS execution is isolated to its own processes, with OpenBSD-style pledge/unveil sandboxing. It isn't perfect, of course, but it allows for the kind of development Ladybird does without much worry about those aspects.
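For anyone who hasn't met those primitives, here's a rough sketch of what an isolated helper process can do on OpenBSD before it ever touches untrusted data (purely illustrative - not Ladybird's actual code, and the specific promises are just an example):

    // Hypothetical image-decoder process: give up filesystem and network access
    // up front, then only exchange bytes over an already-open IPC socket.
    #include <cstdio>
    #include <cstdlib>
    #include <unistd.h>

    static void drop_privileges()
    {
        // Expose only /dev/null, then lock further unveil calls; the rest of
        // the filesystem becomes invisible to this process.
        if (unveil("/dev/null", "rw") == -1 || unveil(nullptr, nullptr) == -1) {
            perror("unveil");
            exit(1);
        }
        // Only stdio and receiving file descriptors remain permitted; open(),
        // connect(), execve() and friends now kill the process instead.
        if (pledge("stdio recvfd", nullptr) == -1) {
            perror("pledge");
            exit(1);
        }
    }

    int main()
    {
        drop_privileges();
        // ... receive image bytes over the inherited socket, decode, send pixels back ...
    }

The nice property is that a bug in the decoder is then only exploitable inside a process that can't open files, make network connections, or exec anything, which is what lets development elsewhere proceed with less worry.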
I'm not really suggesting Ladybird is doing something "wrong" or should do something else. Reading something like:
> The fix is to make Document::window() return a nullable value, and then handle null in a bajillion places.
makes me think you're going to find something like this and do this kind of fix maybe once, twice, five times, and then probably decide you need a more fundamental fix of some sort. Another way of thinking about it is 'What would, say, the Google Chrome team wish they could do if they were starting from scratch?', i.e. aiming for the state of the art rather than trying to catch up to it later, which may turn out to be overwhelming.
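To make the shape of that concrete, here's a hypothetical sketch of the "nullable plus a bajillion null checks" pattern being quoted (only Document::window() comes from the post; the other names are made up for illustration):

    class Window { /* ... */ };

    class Document {
    public:
        // The point fix: the getter becomes nullable, because a detached
        // document may no longer have an associated window.
        Window* window() const { return m_window; }

    private:
        Window* m_window { nullptr };
    };

    // Every call site that used to assume a window now grows its own null check,
    // and each one decides independently what "no window" should mean.
    void scroll_to_the_fragment(Document const& document)
    {
        auto* window = document.window();
        if (!window)
            return; // one of the "bajillion places"
        // ... use *window ...
    }

Whether that invariant gets re-checked at every call site or ruled out once at a structural level (lifetimes, types, process boundaries) is pretty much the design-level question I'm getting at.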
I think you're misunderstanding my point; it's not about implementation or spec bugs but design. Forget Ladybird for a moment and think of Firefox. Its core design was something along the lines of 'cross-platform toolkit for making enterprise groupware apps', where one of the apps was a web browser. Kind of neat for 1998; by 2008 it was clear that was no longer a good fit for making a browser. Despite heroic efforts and many advances, Firefox has never really been able to close the gap with more recent browsers. And (statistically) nobody makes new browsers based on Firefox; it's effectively a design dead end.
It can be hard to retrofit a 'complicated but decent parser with a JS runtime attached' into something like a 'safe parser of arbitrarily adversarial inputs connected to an open RCE' (i.e. something akin to a modern browser) if the latter wasn't a fundamental design goal from the start.