A large crash spike affecting Firefox users on Linux

Tade0 · on June 24, 2023

> It is interesting though that we find ourselves working around a bug we did not introduce triggered by code we do not control.

I used to be part of a team developing a popular browser WYSIWYG editor. Every release of any of the supported browsers was a coin toss regarding introducing new bugs.

From this perspective, developing for IE8 (which was still supported back then) was easier, because there was no chance of it ever changing.

hulitu · on June 24, 2023

Yes. But. We. Need. New. Features. Every. F...ing. Week.

wkat4242 · on June 24, 2023

Well... I was damn happy with 114.0 when it FINALLY introduced Webauthn support on Linux and Mac. That was a really long wait.

Some features really are worth it.

The problem is more that they prioritise the glossy fluff ones. Like time-limited "inspirational" colour schemes.

webstrand · on June 24, 2023

I'm confused. I have been using webauthn with a USB authenticator for at least a year now with Firefox. How is this a new feature?

wkat4242 · on June 24, 2023

What's new is full CTAP2 support in FIDO2. Passwordless with PIN code. This never worked at all.

2FA (FIDO1) has worked for a while yes. But it still requires a username and password and the token is only used for 2FA. But this is not what webauthn is. It's only a small subset that existed under the FIDO name before webauthn was designed and was basically grandfathered in. But it's not really what webauthn is about.

In full passwordless mode you insert the token, enter its pincode and touch it to login. No username nor password needed. It's a bit like a bank card.

Not many sites support this method; for example, PayPal only supports normal old FIDO1 2FA (and only one token, which is ridiculous). But this support is also needed to finally enable full passkeys in the future (I believe 1 or 2 things for that are still coming in a near-future version).

anonymouskimmer · on June 24, 2023

What operating system? The 114.0 (June 6th 2023) release notes say:

https://www.mozilla.org/en-US/firefox/114.0/releasenotes/

> Users on macOS, Linux, and Windows 7 can now use FIDO2 / WebAuthn authenticators over USB. Some advanced features, such as fully passwordless logins, require a PIN to be set on the authenticator.

dathinab · on June 24, 2023

Linux,

WebAuthn was supported for quite a while, _but only a subset of it_.

anonymouskimmer · on June 24, 2023

Parent said they've been using it with USB for a year now. While Firefox says it's just now available to use with a USB for macOS, Linux, and Windows 7. Knowing nothing else I'm assuming an operating system other than these three (such as Windows 8+).

wkat4242 · on June 24, 2023

The problem wasn't the connection method. It was the type of authentication.

With CTAP you can insert the stick, enter its pincode, touch it and you're logged in. It replaces even the username. This never worked.

What did work was using the token as a 2FA token only (FIDO1 method). But that doesn't replace passwords.

dathinab · on June 25, 2023

Yes I have been using it with USB for well over a year, too.

But I also only have been using a subset of the Webauthn standard, specifically the subset generally used when using it for the 2nd factor in 2FA.

But the standard provides other usage methods, too, e.g. using it as the main factor plus a PIN. And these methods had not been supported in the past.

vanous · on June 24, 2023

Me too. Here is my write up about it...

https://codeberg.org/vanous/YubiKey_On_Linux

remram · on June 24, 2023

It was when it came out, which is the event GP recalls.

derkades · on June 24, 2023

Firefox 114 was just released

semiquaver · on June 24, 2023

~64 KiB of stack ought to be enough for anybody.

https://github.com/torvalds/linux/blob/84df9525b0c27f3ebc2eb...

mike_hock · on June 24, 2023

Yes, it ought to be. I can hardly see that as a "bug" in Linux. Clearly, allocating 20k stack variables can't be right and the bug is on Google's side.

The stack isn't for bulk storage, I thought that was common knowledge.

veave · on June 24, 2023

If a javascript program crashes the interpreter that is a bug in the interpreter. I don't understand how this is even a discussion. In the era of IE6 there were lots of one-liners that could crash the browser and no one would blame whoever wrote them.

mike_hock · on June 24, 2023

In the interpreter, yes, not the kernel that enforces a reasonable stack limit.

realitythreek · on June 24, 2023

It had already been fixed in the kernel. Linus tends to agree with it being a bug.

AshamedCaptain · on June 24, 2023

It's even worse, since they are not increasing the stack pointer and thus Firefox is violating some redzone ABI somewhere.

asveikau · on June 24, 2023

I seem to recall long ago there was an exploitable bug due to excessive stack use.

Basically the way a kernel detects stack growth is via a guard page that causes a fault on memory access. If the allocation is bigger than the offset to the guard page, and the start of the allocation is accessed before the end, you get this. An attacker might even be able to generate a non-faulting memory access pointing at somebody else's buffer.

I think it's on the language implementation to understand what that offset might be and generate benign memory access on large stack allocations, allowing the kernel to fault and intervene. Consider it part of the ABI where you're running.
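To make the guard-page argument concrete, here is a toy model in Python. It is only a sketch under assumptions: a downward-growing stack, one 4 KiB guard page directly below it, and made-up addresses. The helper names (`classify`, `touched`, `first_non_stack`) are hypothetical and nothing here is real kernel or compiler code.

```python
PAGE = 4096  # assumed page size


def classify(addr, stack_low, stack_top):
    """Say where an access lands in this toy address space."""
    guard_low = stack_low - PAGE  # one guard page just below the stack
    if stack_low <= addr < stack_top:
        return "stack"
    if guard_low <= addr < stack_low:
        return "guard-fault"  # the kernel sees a fault and can intervene
    return "outside"  # no guard hit; may be somebody else's buffer


def touched(sp, frame_bytes, probe):
    """Addresses accessed, in order, when a frame is allocated.

    Without probing, only the lowest address of the new frame is touched.
    With probing, one benign access is made per page on the way down.
    """
    new_sp = sp - frame_bytes
    addrs = list(range(sp - PAGE, new_sp, -PAGE)) if probe else []
    addrs.append(new_sp)
    return addrs


def first_non_stack(sp, frame_bytes, stack_low, stack_top, probe):
    """Return the first access that leaves the mapped stack, if any."""
    for addr in touched(sp, frame_bytes, probe):
        kind = classify(addr, stack_low, stack_top)
        if kind != "stack":
            return kind
    return "stack"
```

With the stack pointer near the bottom of the stack and a three-page frame, the unprobed allocation jumps straight past the guard page, while the probed one hits the guard page on its first touch.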

leodag · on June 24, 2023

Wasn't that because of Variable-Length Arrays? I remember a while back there was a movement to remove all of them from the kernel.

asveikau · on June 24, 2023

Perhaps. I think there are other reasons to avoid VLAs. I've heard the code generated for them kinda sucks.

But IMO a compiler should generate extra benign reads on a large stack allocation. You mention avoiding VLAs in the kernel, but even user mode code has this problem.

Certainly a browser JITing random JS from the internet should be able to work around such a problem.

remram · on June 24, 2023

Or maybe don't allocate 20kB objects on the stack.

bibanez · on June 24, 2023

The article mentions this was fixed in Kernel 4.20

kramerger · on June 24, 2023

Kind of off topic:

How can we get Mastodon links like this to open in the Mastodon app?

On android this is decided by the url (apps can request certain urls to be forwarded to them) but since there are many different servers in fediverse that becomes impractical

shrx · on June 24, 2023

I'm using Link Eye [1] which lets you open any url in the application of your choice. It's fantastic, also enabling you to open youtube links in NewPipe.

[1] https://f-droid.org/packages/kuesji.link_eye.fdroid/

Aachen · on June 24, 2023

Youtube links open in newpipe also without that app

shrx · on June 24, 2023

Hmm, it could be dependent on which app you're opening the links from. I remember having issues before Link Eye.

tadfisher · on June 24, 2023

Verified Links make it so if the official YouTube app is installed and enabled, other apps won't be launched for the same URL patterns with a simple ACTION_VIEW intent.

Aachen · on June 25, 2023

Didn't know of that one yet. And so the lock-in continues https://android.stackexchange.com/questions/246819/how-to-di...

Thanks for the heads up. I wonder at what android version I'll decide it's no longer worth the breakage to upgrade. I was still happiest with the options of Cyanogenmod 4.4, the only new thing I can remember being happy about since then is "allow permission while using" (and even that is not working well)

jeroenhd · on June 24, 2023

There are ways to do this, i.e. through custom protocol handlers. They could also be used to deal with following accounts on other servers without having to copy/paste all the time.

Mastodon had a custom URI for interacting with it, but it wasn't an opt-in feature and the constant prompt to register the handler annoyed people. That's why the feature and the protocol were removed and they haven't been added back.

I think it's rather silly to first ignore the calls to opt into the protocol popup, then remove it completely because there were too many complaints about the protocol popup, and now refuse to add it back because of "bad UX".

2Gkashmiri · on June 24, 2023

https://ibb.co/v4pw6rw This URL actually resolves in my Android Fedilab app, so I don't know if it has hardcoded instances or uses a regex. It shouldn't be difficult to build a regex; I know of an app that did this for PeerTube.

fdgdd · on June 24, 2023

Open With, although it's no longer maintained: https://addons.mozilla.org/en-GB/firefox/addon/open-with/

atoav · on June 24, 2023

I think I had one crash with Firefox on Linux during the past 6 years and that was a weird edge case where I tried to do something during an update process.

bmicraft · on June 24, 2023

The last two times Firefox crashed on me, both of the computers had bad memory

0134340 · on June 24, 2023

Same here. Firefox has always been really stable for me but after moving to Linux I had some crashing which coincided with a new bios update. Turns out my memory timings were a bit too high, it couldn't handle the XMP profile with the new bios.

edflsafoiewq · on June 24, 2023

Same for me until 114, which has been crashing constantly. I can't open deepl.com for example. 114.0.2 fixes several crashes but not mine.

abwizz · on June 24, 2023

same here, rock solid.

not a beta tester thou

tgv · on June 24, 2023

20000 variables in a function? Even for machine generated code that sounds like an exaggeration. Anybody here who knows the reason?

viraptor · on June 24, 2023

It's not exactly variables. The Bugzilla entry is more specific: "... entails copying all the values that are currently on the interpreter's stack (arguments, local variables, intermediate results) from the heap onto the native stack." https://bugzilla.mozilla.org/show_bug.cgi?id=1839139#c8

So it may be a large function with lots of temporary values visible in scope. If I understand correctly, local lambdas would also count and they have their own captured context. I'm sure it's possible to find a pathological-but-not-unreasonable way to reproduce it.

IainIreland · on June 24, 2023

I wrote that comment. One other possibility: if you use spread syntax to call a function with a large array (eg `foo(...Array(20000))`), then all those arguments will be pushed on the stack.

(I didn't dig into the specifics of the Google code because, as weird as it is to have 20000 stack values, we really should be able to handle it. This was, at the end of the day, a bug in our stack probing code.)

idle_zealot · on June 24, 2023

I don't know for sure, but it sounds like the result of a JS obfuscater. I know Google Docs ships obfuscated code so I wouldn't be surprised if Google Image Search does too.

hulitu · on June 24, 2023

When you have 16GB of RAM, why bother ? /s

msla · on June 24, 2023

I use the Firefox from the Ubuntu Mozilla Team's PPA and I haven't seen any crashes at all.

https://launchpad.net/~mozillateam/+archive/ubuntu/ppa

LordShredda · on June 24, 2023

If I make a function with 200 000 stack values and compile it with gcc, people call me an idiot. But when google does it it's fine?

gigel82 · on June 24, 2023

Interestingly, Mozilla openly talks about their telemetry telling them not only about the spike in crashes but apparently also about the specific website and activity users are doing at the time of the crash (Google Image Search).

That's pretty wild, I need to look deeper into how to disable telemetry reporting in Mozilla. I'm pretty sure even Microsoft is sanitizing their crash reports to exclude as much information as possible that could identify the user.

thyrsus · on June 24, 2023

The NYTimes animation of the Canadian fires air quality effect froze Firefox, mouse, and keyboard on CentOS 7 for me. I could ssh in from another system, kill -9 Firefox and it recovered.

XorNot · on June 24, 2023

That's on X windows honestly. It shouldn't be possible for a badly behaved program to bring down the window manager and/or display server.

wkat4242 · on June 24, 2023

If you can do that you should also be able to just open up a virtual terminal with Control-Alt-F1, just saying :)

thyrsus · on June 25, 2023

Nope: I tried that. No response to any common keyboard combo. As suggested in a sibling comment, X11 shares the blame.

arun-mani-j · on June 24, 2023

I use Firefox downloaded from the official website as well as the Flatpak version on my Debian laptop. It crashes frequently, as in every 2 hours or so. I don't do anything heavy on the browser except using it to read documentation. This has been an issue in the last two months and I don't know what's wrong...

Also, does anybody know why Firefox still depends on the deprecated libdbus-glib-1-2 [1] in Debian and Debian-based distros?

For example, try to uninstall the package. Then download the latest Firefox from their website [2]. Extract the archive, launch the executable inside it from a terminal. You will see an error message that it is unable to load the DBus library.

1 - https://packages.debian.org/bookworm/libdbus-glib-1-2

2 - https://www.mozilla.org/en-US/firefox/new/

autoexec · on June 24, 2023

I've got Firefox on a Windows 10 box for work, and ever since whichever ESR update made it just difficult enough to tell which tab is active to be annoying, Firefox crashes for me all the time, and it never takes out just a single tab either.

I really blame myself though. It only happens after I've had the browser open for weeks, with at least 3 windows open at once, and literally hundreds of open tabs with hundreds more come and gone since Firefox was restarted. I have about:memory bookmarked, and hitting everything in the Free Memory box seems to delay the inevitable a bit while allowing Javascript seems to make it worse. I doubt many others have the problem and I'm impressed Firefox holds up as well as it does!

dTP90pN · on June 24, 2023

Have you looked at the crash reports via `about:crashes`? This should show you if there's any open bug report associated with the crash(es) you encounter.

Regarding libdbus-glib-1-2, you may want to open a bug. It looks like [1] it's mainly used by the ~12 year old UPowerClient, and more recently, for wake/sleep and timezone change notifications (nsAppShell).

[1] https://searchfox.org/mozilla-central/search?q=DBusG.*&path=...

arun-mani-j · on June 24, 2023

Thanks, I will check it out!

gsatic · on June 24, 2023

Switched to Brave on Linux and Android. It's not bad and even find myself using their Search more than Google quite often.

slig · on June 24, 2023

Brave Search feels like the good, old Google search from mid-00s.

em-bee · on June 24, 2023

and it even works from any browser. as a staunch firefox user i switched to brave search but not brave. very happy with the results. i only wish they would allow linking image and video search to duckduckgo and other search engines, or at least proxy the search instead of linking directly to google and bing

isaacremuant · on June 24, 2023

Same. It's increasingly my default browser in all my devices.

I went from Yahoo to Google and really valued their product, but they became so poor over time. So focused on spiking their results with ads and then, much worse to me, hiding news and other content based on US-centric biased narratives that apparently we all need to follow.

It was very apparent and annoying.

rvz · on June 24, 2023

Great choice. Brave seems much better and does more than what Firefox can do.

Roark66 · on June 24, 2023

Come on now! "a peculiar interpreted code" crashes your interpreter and you blame those that (allegedly) auto generated it? I realise there is a bit of tongue in cheek in there and Firefox is an amazing product, but although it's certainly not normal for a function to declare 20k variables it is not outside the realm of the language. Furthermore, 20k is not that much if you take the amounts of RAM current devices have.

Don't get me wrong. I don't think assigning blame is the most important thing to do when troubleshooting. I'd rather not, but when that process starts it should be factual.

So, sorry, it's not Google's fault... Then we also throw Linux under the bus. "it's not our code, it's Linus" here is the code. But that Linux kernel code, which kills a process if it accesses memory too far from its stack pointer, has the following comment:

"Accessing the stack below %sp is always a bug (...)"

I haven't got time to look at the history how it was changed and why in the Linux kernel and it became "not a bug". If someone knows more, please do explain.

So is it a "bug in Firefox" Or "bug in old Linux"? I can't say with absolute certainty without researching how exactly the stack allocation in old Linux kernel works, how is it documented etc.

So if anything I'd thank Google for exposing the bug ;-)

On a side note, I've recently experienced a similar JS/Firefox/web site bug. There is this open-source ecommerce software called Shopware. They use Symfony (yes, PHP, I know...) and the most recent major version simply freezes Firefox when one goes to the admin interface and loses connectivity. Not just freezing one tab, no, it freezes the entire Firefox, multiple open windows. This is on up-to-date Arch with a new Linux kernel so it's definitely not this issue, but it does happen in Firefox and not in Chrome.

JavaScript bugs like this are hard to find. I think AI may be one tool that will help us find them faster (intentionally or not).

outwit · on June 24, 2023

It does seem to me that Google is intentionally refusing to test their websites in anything other than Chrome. Even without AI... It can afford to pay a handful of people to test in Firefox and Safari. But will it?

db48x · on June 24, 2023

To be fair this crash is only happening with older kernels. I wouldn’t expect them to test their website in every version of Firefox running on every version of the Linux kernel. Not even Mozilla does that.

jacquesm · on June 24, 2023

No, it's Google's fault. You test for compatibility if you serve 100's of millions of users with all major browsers, including those of your competitors.

Roark66 · on June 24, 2023

If a tab crashed I would be sympathetic to this point of view, but the entire browser is killed. So no, it's not google's fault in my book.

As for testing. Really, you expect them to test against Linux Kernel 4? If so, how about 2 as well?

Just in case you didn't realise. Kernel 4.20 (the one that changes this behaviour so it wouldn't show the Firefox bug) was released in December 2018. That's 4.5 years ago.

rvz · on June 24, 2023

Exactly. That's even worse since the entire browser is crashing. It's not Google's fault if Firefox was unable to prevent a crash or display the page when Safari, Chrome, etc. can load and run it without crashing.

Once the user sees this crash constantly, switches to Chrome and it doesn't crash, then they blame the crashing browser which is Firefox. Rightly so.

Really says a lot about these Firefox developers and users immediately blaming Google for their JS code when Firefox was supposed to protect or handle against cases like this without crashing.

db48x · on June 25, 2023

Mozilla isn’t blaming Google here, or they wouldn’t have bothered to fix the bug. When you drill down to the root cause of something, you have to carefully note down all the proximate causes that you found along the way, otherwise you won’t have a complete understanding of the problem. In particular, noticing that Google’s machine–generated javascript is doing something allowed but out of the ordinary is important for explaining why you only just now found out about this bug in code that has been working just fine for years.

db48x · on June 25, 2023

Notice that this only crashes on an out–of–date kernel. No matter how much money Google has, you cannot expect them to test their websites in every combination of browser and kernel version. Not even on a handful of carefully–selected kernel versions. Maybe Mozilla should test Firefox on more kernel versions (an argument could perhaps be made for testing the older kernels from various distros with long–term support, for instance), but really the whole reason Firefox has crash reporting is to catch the really weird combinations that happen in real life but are hard to continuously test for.

barrkel · on June 24, 2023

Compilers generating stack probes for large stack allocations has been a thing for decades. It was required in Windows 95 for 32-bit applications, and you'd do that on a page (4k) granularity.

I am still in the dark as to what the bug was here. Did Firefox stop doing probes for JIT code? Not do them at all, because most JS stack frames are small?

db48x · on June 24, 2023

Firefox was probing ahead in increments of 2048 bytes in order to ensure that the stack was allocated, but it left the stack pointer at the end of the stack for that whole time. This usually worked ok, but certain versions of Linux will bail if the stack probe is greater than 64kb+256b away from the stack pointer. The new code moves the stack pointer incrementally so that the probe is never more than a single page from the stack pointer.

https://phabricator.services.mozilla.com/rMOZILLACENTRAL304d...
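A rough way to see why moving the stack pointer matters is to simulate the check. The Python below is a toy model, not Firefox or kernel code: `ToyStack`, `probe_fixed_sp`, and `probe_moving_sp` are hypothetical names, the 8 pages of initially mapped stack are an arbitrary assumption, and the only numbers taken from the description above are the 2048-byte probe step and the 64 KiB + 256 byte allowance.

```python
PAGE = 4096
PROBE_STEP = 2048        # probe increment described above
CUSHION = 65536 + 256    # the "64kb+256b" allowance on older kernels


class StackFault(Exception):
    """Stands in for the kernel killing the process."""


class ToyStack:
    def __init__(self, top, mapped_bytes):
        self.sp = top                         # stack grows downward
        self.mapped_low = top - mapped_bytes  # lowest mapped stack address

    def touch(self, addr):
        if addr >= self.mapped_low:
            return  # already mapped, no page fault
        # Page fault below the mapped stack: model the old-kernel rule that
        # an access too far below the stack pointer is treated as a bug.
        if addr + CUSHION < self.sp:
            raise StackFault(f"access {self.sp - addr} bytes below sp")
        self.mapped_low = addr - (addr % PAGE)  # grow the stack mapping


def probe_fixed_sp(stack, frame_bytes):
    """Old behaviour as described: probe downward, sp stays where it was."""
    target = stack.sp - frame_bytes
    addr = stack.sp
    while addr > target:
        addr = max(addr - PROBE_STEP, target)
        stack.touch(addr)
    stack.sp = target


def probe_moving_sp(stack, frame_bytes):
    """Fixed behaviour as described: move sp along with each probe."""
    target = stack.sp - frame_bytes
    while stack.sp > target:
        stack.sp = max(stack.sp - PROBE_STEP, target)
        stack.touch(stack.sp)
```

In this model, a frame of 20000 eight-byte values (160000 bytes) makes `probe_fixed_sp` raise `StackFault` once a probe lands more than `CUSHION` bytes below the unmoved stack pointer, while `probe_moving_sp` completes because each probe is at the stack pointer itself. Frames smaller than the cushion work either way, which fits the bug going unnoticed for years.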

BenjiWiebe · on June 24, 2023

Oddly enough, I've had a lot of issues with Firefox on Windows crashing with OOM issues in spite of having GBs of free memory. Multiple different computers, multiple different versions of Windows.

im3w1l · on June 24, 2023

This sounds like blameshifting to me. Javascript is untrusted code, so it's on Firefox to make sure to handle any craziness gracefully.

Stack probing is kind of a weird thing though. I'm kind of surprised C++ compiler isn't doing that properly. Are they using inline assembly for speed?

db48x · on June 24, 2023

It would only be shifting the blame if they weren’t going to work around the problem in Firefox itself.

The C++ compiler may or may not implement stack probing correctly, but this bug is in the Javascript JIT compiler. The fix is to make the JIT compiler update the stack pointer register each time through the loop so that if a page fault happens the kernel won’t see a huge amount of stack get allocated all in one go. Instead it’ll see several page faults each asking for a smaller amount which it is happy to grant.

See https://phabricator.services.mozilla.com/rMOZILLACENTRAL304d...

fathyb · on June 24, 2023

Agreed that the problem here is Firefox, untrusted JS code should not cause a crash.

For the probing, according to the code:

    // Can't push large frames blindly on windows, so we must touch frame memory
    // incrementally, with no more than 4096 - 1 bytes between touches.
    //
    // This is used across all platforms for simplicity.

https://searchfox.org/mozilla-central/rev/c936f47f3a629ae49a...

usr1106 · on June 24, 2023

Technically of course Firefox failed to live with an old kernel and insane JS code.

Morally it's Google. Google is like Bitcoin: a huge natural resource hog for a questionable benefit. Here the benefit is tracking users to make advertising billions. For that goal a billion smartphones need several extra GB of memory and significantly larger batteries. What is the ecological footprint of that?

Typing on a seven-year-old smartphone with 2 GB (SailfishOS, so yes, it is maintained. Maybe not perfectly, but better than many Androids half as old). It works quite well on reasonable pages even without ad blocking. Of course super heavy pages won't work, and for Google search I haven't even consented. Occasionally I get reminded of that when some site embeds it. Well, a good reminder for me not to use them.

fathyb · on June 24, 2023

This does not appear that crazy to me. In fact, today it's almost a recommended practice for large web applications and it's called "tree shaking". ECMAScript modules are inlined into the same scope. It makes the JIT work easier because it now works with symbols instead of `require(foo).bar` calls to speculate on. It makes most web apps run better, both in bandwidth and compute.

It's very likely to have affected users on other websites, but Google is a common denominator for debugging.

My uneducated guess: they're using the Google Closure Compiler to make smaller JavaScript bundles. It saves some bandwidth and allows for better optimizations. It seems like a reasonable engineering decision to ensure product decisions don't affect the user experience too negatively, something a lot of us are familiar with.

Izkata · on June 24, 2023

> In fact, today it's almost a recommended practice for large web applications and it's called "tree shaking". ECMAScript modules are inlined into the same scope. It makes the JIT work easier because it now works with symbols instead of `require(foo).bar` calls to speculate on. It makes most web apps run better, both in bandwidth and compute.

That's bundling, not tree shaking. Tree shaking is an optional additional process during bundling where unreachable code is automatically removed.
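A minimal sketch of that distinction, in Python for brevity and with a made-up dependency graph (the names `bundle`, `tree_shake`, and the graph entries are hypothetical, and real bundlers work on parsed modules rather than a plain dict):

```python
def reachable(graph, entry):
    """Collect every symbol reachable from the entry point."""
    seen = set()
    pending = [entry]
    while pending:
        name = pending.pop()
        if name in seen:
            continue
        seen.add(name)
        pending.extend(graph.get(name, ()))
    return seen


def bundle(graph):
    """Bundling: everything ends up in the output, used or not."""
    return sorted(graph)


def tree_shake(graph, entry):
    """Tree shaking: keep only what the entry point can reach."""
    return sorted(reachable(graph, entry))


# Each symbol maps to the symbols it references.
graph = {
    "main": ["render"],
    "render": ["escape"],
    "escape": [],
    "legacyFormat": ["escape"],  # nothing references this one
}
```

Here `bundle(graph)` keeps all four symbols, while `tree_shake(graph, "main")` drops `legacyFormat` because nothing reachable from `main` refers to it.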

fathyb · on June 24, 2023

Bundling does not inline all symbols into a single scope, tree shaking does. I was one of the people implementing tree shaking in the first Parcel bundler. https://github.com/parcel-bundler/parcel/pull/1135

uhryks · on June 24, 2023

Scope hoisting is the technique used there (it's in the title of your link) to simplify tree shaking a.k.a dead code elimination. Also yeah it's unlikely to be what's happening in that bug due to Google using Closure compiler.

usr1106 · on June 24, 2023

Right, optimization is good.

But first building insanely heavy functionality and then optimizing it is not a sustainable approach.

The background is of course that increasingly heavy functionality is moved into the client. Technically and especially economically this makes sense for Google. But the environmental footprint is outsourced to client devices. Greenwashing in a way. Avoidance of computing is a better solution than all types of optimizations. Not everything that is technically feasible is good for the planet and mankind.

veave · on June 24, 2023

Nice rant. Absolutely unrelated to the technical article at hand, but nice rant.

im3w1l · on June 24, 2023

Yeah I got that far but then I wasn't sure what exactly that was doing. Normal C++ doesn't directly move stack pointers around and do manual probing, so I was idly wondering whether those calls hid some inline assembly or how they worked.

Edit: Or perhaps that is not the stack of the C++ program but rather the stack of the Javascript program.

IainIreland · on June 24, 2023

In general, there's no difference between the stack of the C++ program and the stack of the JS program. When SpiderMonkey just-in-time compiles a JS function, the result is native code that creates a stack frame on the same stack as the C++ code that implements SpiderMonkey. JS code and C++ code can be semi-arbitrarily interleaved on the stack: JS calls C++ calls JS calls C++...

The one exception about sharing the same stack is that the first few iterations of a function run in an interpreter implemented in C++, and that interpreter has its own stack that we heap-allocate. This particular bug occurred during the transition from the C++ interpreter to JIT code.

fathyb · on June 24, 2023

It does hide some assembly. Those calls are calling an assembler to generate native code at runtime for JIT compilation. The C++ compiler compiles an assembler, but this assembler runs at runtime. `MacroAssembler` itself is architecture independent, and calls into functions implemented in back-ends such as `MacroAssemblerARM` and `MacroAssemblerX64`.

So the code in this function is not performing the stack-probing, it generates code to perform it instead.

jandrese · on June 24, 2023

I'm curious what Firefox could have done in this case beyond just killing off the tab. The issue was a check in old kernels that was being triggered by some insane code.

This example had me remembering the discussion from earlier today about why modern code is so slow even though the machines are so fast. I doubt there were many Win32 programs that attempted to pass in 20,000 parameters to a function.

pb82 · on June 24, 2023

Using Google Maps causes Firefox (114.0.2) and the whole OS to completely freeze within minutes on my Laptop. This started happening after I upgraded to Fedora 38. Not sure if this is caused by the same bug. Could be Gnome, Wayland, who knows. Has anyone had a similar experience?

db48x · on June 24, 2023

The bug reported here is a crash, not a freeze, so it’s not the same bug.

superkuh · on June 24, 2023

Probably a different issue. Similarly I haven't been able to use Firefox since version 97 without it freezing when using complex applications like Google Maps. Older versions still work but all new (post 97) versions freeze up at the drop of a hat.

classified · on June 24, 2023

They're probably rolling out a new ad engine. Or a bitcoin miner.