> Why do large sites like Facebook, Amazon, Twitter and Instagram all essentially look the same after 10 years but some of them now have 10x the amount of engineers? I think they have so much data and so many dependencies between parts of the system that any fundamental change is extremely hard to pull off. They even cut back on features like API access. But I am pretty sure that most of them have rewritten the whole thing at least 3 times.
I used to work at a unicorn a few years ago, and this hits close to home. From 2016 to 2020 the pages didn't change a single pixel, yet we added 400 more engineers to the codebase and went through three stack iterations: full-stack PHP, PHP backend + React SSR frontend, and Java backend + [redacted] SSR frontend (redacted because only two popular companies use this framework). All were rewrites, and each was justified because the previous one was never stable; the site was constantly going offline. Yet each rewrite just added more bloat and more failure points. At one point all three were running in tandem: PHP for legacy customers, one as the main stack, and one in an A/B test. (Yeah, it was a dysfunctional environment, and I obviously quit.)
I think common sense and less bullshit rationalisation would have been enough.
They had a billion dollars in cash to burn, so they hired more than they needed. They should have hired as needed, not as requested by Masayoshi Son.
They shouldn't have been so dogmatic. Some teams were overworked while most were underworked (which means over-engineering ensues), but no mobility was allowed because "ideally teams have N people".
They shouldn't have been so dogmatic, pt. 2. Services were one-per-team instead of one-per-subject. So yeah, our internal tool for putting balloons and clowns into images lived alongside the authentication microservice, because it was the same team.
Rewriting everything twice without analysis was wrong. The rewrites happened because previous versions were "too complex" and too custom-made, yet the newer ones had even more complex architectures. But "this time it's right, software sometimes needs complexity".
Acknowledging that some things were terrible would have gone a long way. The main node.js server took 10 to 20 minutes to start locally, while something of the same complexity would typically take 2 or 3 seconds. Of course it would blow up in production! Maybe try fixing that instead of ordering another rewrite.
They were good people, I miss the company and still use the product, but it didn't need to be like this.
It comes from a dogmatic reaction against microservices. Microservices were problematic in certain ways, but instead of analysing what went wrong and why, they just went in the opposite direction and started doing "big services only". It was a misguided approach, plain and simple.
Interestingly, due to internal bureaucracy and understaffing in some teams, there was also a lot of "multiple teams per service", which, yeah, is another issue in itself.