While that is technically correct, they haven't been running vanilla PHP for quite some time - it's jitted into x86 and run natively on the machine. It's also pretty extensively optimized. And the VM that does the jitting is all C++. Their MySQL deployment is also an internal fork that's been heavily modified for scalability (storage/compute separation, sharding, RocksDB based storage engine instead of InnoDB, Raft for synchronous replication, etc.). Lots of really fantastic engineering work has gone into making their backend infra scale.
Facebook scaled to ~1b daily actives on MySQL + InnoDB. There was lots of engineering work, like schema sharding (denormalization), automation, plenty of bug fixes and patches for MySQL (most or all contributed back to upstream, from what I remember), and of course a massive caching layer; plus throwing crazy hardware at the problem. Nonetheless the underlying engine was something any MySQL user or admin would have recognized. And we backed it all up, every day, in < 24 hours, using an unmodified mysqldump. (FB MySQL team 2009-2012)
(a) make each individual database small, but have a lot of them
(b) There are lots of transactions in flight, but there is a well-ordered sequence of mutations (the binlog) that defines what has and has not been committed. So restoring means loading the full backup + replaying the binlogs.
(c) testing can be done by just bringing up a slave from the backup and then comparing consistency with normal replicas.
To expand on this question, I'm wondering how useful daily backups even are for a site like Facebook. I mean, of _course_ you need them, but also, something about reverting all of FB to a state 24 hours ago seems disastrous even if it works. I can't imagine that it's an acceptable thing in anything but an absolute emergency. Imagine every single facebook user got rewound in time to the previous day, every message sent over the past day was lost, etc.
It's been a lifetime since I last did DB administration (Postgres in my case), but having the write-ahead logs replicated out independently was extremely important for point-in-time recovery: you could always take the latest backup, zip forward through the WAL, and recover to any arbitrary point in time, so long as the WALs were available. I wonder how much of something like this was done at FB scale.
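Roughly, the recovery side looks like this (sketched in modern PG 12+ terms rather than whatever I ran back then; the paths and the helper are made up for illustration):

```python
# Rough shape of WAL-based point-in-time recovery (modern PG 12+ layout;
# the paths and helper are made up for illustration).
import pathlib

DATA_DIR = pathlib.Path("/var/lib/postgresql/data")  # restored base backup
WAL_ARCHIVE = "/backup/wal"                          # archived WAL segments

def configure_pitr(target_time: str) -> None:
    # restore_command tells Postgres how to fetch archived WAL segments;
    # recovery_target_time stops replay at the chosen moment.
    with (DATA_DIR / "postgresql.auto.conf").open("a") as conf:
        conf.write(f"restore_command = 'cp {WAL_ARCHIVE}/%f %p'\n")
        conf.write(f"recovery_target_time = '{target_time}'\n")
    # An empty recovery.signal file starts the server in recovery mode.
    (DATA_DIR / "recovery.signal").touch()
```

Start the server after that and it replays WAL until it hits the target, which is exactly the "latest backup + zip forward" trick.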
What yuliyp wrote is basically it. Although the individual shards weren't really small, even by modern standards.
> Considering all the transactions in flight, and everything?
If I remember, we used --flush-logs --master-data=2 --single-transaction, which gave a consistent point-in-time dump of the schemas with a recorded starting point for binlog replays, enabling point-in-time and up-to-the-minute restores. Nowadays you have GTIDs, so those flags are obsolete (except --single-transaction).
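The flow was basically "consistent dump now, roll the binlogs forward later." A toy sketch of it (the paths, wrapper functions, and bare mysql client invocations are illustrative assumptions, not our actual tooling):

```python
# Toy sketch of the dump + binlog-replay flow. Paths, the wrapper
# functions, and the bare "mysql" client invocations are illustrative
# assumptions, not the actual tooling.
import subprocess

def take_backup(dump_path: str) -> None:
    # --single-transaction: consistent InnoDB snapshot without locking tables.
    # --master-data=2: record the binlog coordinates as a comment in the dump.
    # --flush-logs: rotate to a fresh binlog at the moment of the dump.
    with open(dump_path, "w") as out:
        subprocess.run(
            ["mysqldump", "--single-transaction", "--master-data=2",
             "--flush-logs", "--all-databases"],
            stdout=out, check=True,
        )

def restore_to_point_in_time(dump_path: str, binlogs: list[str],
                             stop_datetime: str) -> None:
    # 1) Load the consistent snapshot.
    with open(dump_path) as dump:
        subprocess.run(["mysql"], stdin=dump, check=True)
    # 2) Roll forward through the binlogs up to the desired moment.
    replay = subprocess.Popen(
        ["mysqlbinlog", f"--stop-datetime={stop_datetime}", *binlogs],
        stdout=subprocess.PIPE,
    )
    subprocess.run(["mysql"], stdin=replay.stdout, check=True)
    replay.wait()
```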
--single-transaction does put extra load on the database—I think it was undo logs? it's been a minute—which caused some hair-pulling, and I believe they eventually moved to xtrabackup, before RocksDB. But binary backups were too space-hungry when I was there, so we made it work.
Another unexpected advantage of mysqldump vs. xtrabackup, besides size, was when a disk error caused silent data corruption on the underlying file system. Mysqldump often read from InnoDB's buffer pool, which still had the good data in it. Or if the bad block was paged back in, it wouldn't have a valid structure and mysqld would panic, so we knew we had to restore.
> And did you ever test disaster recovery with that setup?
It wasn't the best code (sorry Divij)—the main thing I'm proud of was the recursive acronym and the silly Warcraft theme. But it did the job.
Two things I remember about developing ORC:
1) The first version was an utter disaster. I was just learning Python, and I hit the global interpreter lock super hard, type-error crashes everywhere, etc. I ended up abandoning the project and restarting it a few months later, which became ORC. In the interim I did a few other Python projects and got somewhat better.
2) Another blocker the first version had was that the clients updated their status by doing SELECT...FOR UPDATE to a central table, en masse, which turns out to be really bad practice with MySQL. The database got lock-jammed, and I remember Domas Mituzas walking over to my desk demanding to know what I was doing. Hey, I never said I was a DBA! Anyway, that's why ORC ended up having the Warchief/Peon model—the Peons would push their status to the Warchief (or be polled, I forgot), so there was only a single writer to that table, requiring no hare-brained locking.
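In pseudo-Python, the shape of it was roughly this (the in-process queue and the print() stand-in for the status write are illustrative, not the actual ORC code; the real Peons reported over the network and the Warchief did the MySQL writes):

```python
# Illustrative single-writer ("Warchief/Peon") sketch, not the actual ORC code.
import queue
import threading
import time

status_queue = queue.Queue()
STOP = object()  # sentinel to shut the writer down

def peon(host):
    """A Peon only reports its status; it never touches the database."""
    for state in ("dumping", "compressing", "done"):
        status_queue.put((host, state))
        time.sleep(0.1)

def warchief():
    """The Warchief is the sole writer to the central status table,
    so no SELECT ... FOR UPDATE row locking is needed."""
    while True:
        item = status_queue.get()
        if item is STOP:
            break
        host, state = item
        # Stand-in for a single-threaded UPDATE against the status table.
        print(f"UPDATE backup_status SET state = '{state}' WHERE host = '{host}'")

writer = threading.Thread(target=warchief)
writer.start()
peons = [threading.Thread(target=peon, args=(f"db{i}",)) for i in range(3)]
for p in peons:
    p.start()
for p in peons:
    p.join()
status_queue.put(STOP)
writer.join()
```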
More impressive was how Facebook managed so many MySQL instances with such a small DBA team. The average regional bank probably has more Oracle DBAs managing a handful of databases.
It sounds like you were involved in this. Since you were working there so long ago, would you be willing to write up a technical account of the things you did? I'd be interested in learning more about it. I figure the tech from 10 years ago is outdated enough that it wouldn't cause any issues if you made it public.
Appreciate the interest. Honestly, most of the cool stuff was getting to play with toys that all the other talented engineers developed; I had a relatively narrow piece of the pie. I did write up a bit in a sibling reply.
Curiously, VKontakte also started from PHP+MySQL but went another way. PHP is compiled ahead of time with something called KittenPHP. It's open-source. For databases they switched some parts to their own bespoke solutions in C that talk with PHP over the memcached protocol. These are called KittenDB (or "engines") for the simple single-purpose ones, and there was also a more generic MySQL replacement in development when I left in 2016, MeowDB.
I’m not sure why someone at Facebook’s scale would want JIT instead of AOT. That’s a lot of servers all JITing the same thing that could be done once at build time.
JIT compilation has the opportunity to do profile-guided optimization at runtime. JIT compilation is also simpler when distributing an application to non-identical servers, as it can optimize for the exact hardware it is running on.
HHVM's predecessor at FB was an AOT PHP to C++ transpiler called HPHPc. There are a lot more optimization opportunities with JIT compilation when dealing with dynamically typed languages like PHP.
It terrifies me that they probably ran their numbers very carefully and realized that this feat of engineering was still cheaper than rewriting their platform using a more manageable tech stack.
The decision is often driven by two things:
A) how near impossible it is to migrate the sheer amount of services and code without breaking something.
B) that the needed performance doesn't have to arrive in big jumps, so small but constant improvements are the way to go.
While you're technically correct that they aren't running vanilla PHP, the main takeaway here is that if you ultimately want to scale to billions of users then you should probably use vanilla PHP, or maybe Python.
My point is that they started with vanilla PHP and then had the most successful scaling story of all time -- so if someone else wants to follow in their footsteps then they should go with what is tried and true.
Well that might be mistaking the chicken for the egg or vice versa. They made sure it works because they were heavily invested. So if anything, that is a story of a tenacious tech team and nothing much more beyond that.
Arguably, picking almost any other tech would have worked just as well, because they would have doubled down on it the same way and made sure that all the pieces they needed were in place and working.
Thanks for elaborating, though I definitely don't share your conclusion.
Some worthwhile references:
[1] https://engineering.fb.com/2016/08/31/core-data/myrocks-a-sp...
[2] https://research.facebook.com/file/529018501538081/hhvm-jit-...
[3] https://research.facebook.com/file/700800348487709/HHVM_ICPE...