At first glance this would be much closer to Neon, with its separated storage and compute.
Crunchy Postgres for Kubernetes is great if you're running Postgres inside Kubernetes, but it's more standard Postgres than something serverless. Citus isn't really serverless at all either; Citus is more focused on performance scaling where things are very co-located and you're outgrowing the bounds of a single node.
It's not DataFusion; it's much more custom, with a number of extensions underlying the pieces. We've got a number of other extensions in the works. To the user it's still a seamless experience, but we've seen that smaller extensions that know how to work together are easier to maintain. For example, we're working on a map type extension that knows how to understand the map types in Parquet files from within Postgres. In time we may open source some of these pieces, but we don't have a time frame for that, and it's a case-by-case decision for each extension.
It's a custom extension, and actually a number of custom extensions, with quite a few more planned to further enhance the product experience. All the extensions work together as a single unit to compose Crunchy Bridge for Analytics, but under the covers there are lots of building blocks working together.
Marco and team were the architects behind the Citus extension for Postgres and have quite a bit of experience building advanced Postgres extensions. Marco gave a talk at PGConf EU on all the mistakes you can make when building extensions and the best practices to follow, so in short, quite a bit has gone into the quality of this vs. a quick one-off. Even in the standup with the team today it was remarked "we haven't even been able to make it segfault yet, which we could pull off quite quickly and commonly with Citus".
How do AWS credentials get loaded? Is it a static set populated in the CREATE SERVER or can it pull from the usual suspects of AWS credential sources like instance profiles?
The credentials are currently managed via the platform, so you enter them in the dashboard. We wanted to avoid specifying credentials via a SQL interface, because they can easily leak into logs and such. We'll add more authentication options over time.
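To illustrate the concern, here's a hypothetical sketch of the pattern we avoid, using standard FDW-style DDL (the wrapper and option names are made up for illustration, not our actual interface):

    -- Hypothetical sketch; wrapper and option names are illustrative only.
    CREATE SERVER analytics_lake
      FOREIGN DATA WRAPPER hypothetical_lake_fdw
      OPTIONS (region 'us-east-1');

    -- Secrets passed as plain SQL options like this can end up in psql
    -- history, server logs, and pg_dump output:
    CREATE USER MAPPING FOR CURRENT_USER SERVER analytics_lake
      OPTIONS (access_key_id 'AKIA...', secret_access_key '...');

Entering them through the dashboard keeps them out of those SQL-visible paths.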
There is coordination from the Crunchy Bridge control plane to the data plane, which the extension is then aware of.
At this time it's not FOSS. We are going to consider opening up some of the building blocks in time, but at the moment they have pretty tight coupling both to the other extensions and to how Crunchy Bridge operates.
Crunchy Bridge is a managed PostgreSQL service by Crunchy Data available on AWS, Azure, and GCP.
Bridge for Analytics is a special instance/cluster type in Crunchy Bridge with additional extensions and infrastructure for querying data lakes. Currently AWS only.
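As a rough sketch of what that looks like from SQL (hedged: the server name and exact syntax here are assumptions, check the docs for the real interface), you create a foreign table over files in S3 and query it like any other table:

    -- Sketch only; server name and exact syntax may differ from the docs.
    -- An empty column list lets the schema be inferred from the files.
    CREATE FOREIGN TABLE events ()
      SERVER crunchy_lake_analytics
      OPTIONS (path 's3://my-bucket/events/*.parquet');

    SELECT count(*) FROM events;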
Fully agree with this sentiment; it's very much our focus and goal at Crunchy Data, with one big thing I'd add: great support.
I recall seeing them crop up in the early days of building and running Heroku Postgres; they were a very, very early managed service provider. To my knowledge they never seemed to grow to massive scale, but they were a steady business (though I don't know any of the details for sure). That they were still around for over a decade is a testament in itself.
Love the callout to Dataclips. It was easily my favorite least-used feature among Heroku customers. Blazer and PgHero both got a bunch of inspiration from some of the early things we built at Heroku, and it's amazing having Andrew crank out so many high-quality projects to make some of that tooling more broadly available.
Marco (author) is probably asleep at this point and could give a deeper perspective. He sort of hits on this when talking about disk latency... Depending on your setup, and just from some personal experience, I know it's not crazy for Postgres queries to run at 1 ms per query. From there you can start to do some math on how many cores, how many queries per second, etc.
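Back-of-the-envelope version of that math (rough numbers, assuming simple point queries): at 1 ms per query, one busy backend can do ~1,000 queries/sec; with roughly one busy backend per core, a 32-core box pencils out to ~32,000 queries/sec before contention, network round trips, and connection overhead start eating into it.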
Single node Postgres (with a beefy machine) can definitely manage on the order of 100k transactions per second. When you're pushing the high 100ks into the millions, read replicas are a common approach.
When we're talking transactions, there's the question of whether it's simply basic queries or bigger aggregations, and whether it's writes or reads. For writes, if you can manage any form of multi-row insert or batching with COPY, you can push basic Postgres really far... From some benchmarks, Citus as mentioned can hit millions of records per second safely with those approaches, and even without Citus you can get pretty high write throughput.
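A minimal sketch of those two batching approaches in plain Postgres (the table and columns are made up for illustration):

    -- many rows per statement: one round trip, one transaction
    INSERT INTO events (id, payload)
    VALUES (1, '{"a":1}'), (2, '{"b":2}'), (3, '{"c":3}');

    -- or bulk-load with COPY, which streams rows with much less per-row overhead
    COPY events (id, payload) FROM STDIN WITH (FORMAT csv);

Both cut down on the per-statement parse, plan, and commit overhead you pay with single-row inserts.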
The "disappointing" benchmark mentioned in the article is a shame for GigaOm who published it and for Microsoft who paid for it. They compare Citus with no HA to CockroachDB and YugabyteDB with replication factor 3 Multi-AZ, resilient to data center failure. And they run Citus on 16 cores (=32 vCPU) and the others on 16 vCPU.
But your point about a "beefy machine" shows the real advantage of Distributed SQL. PostgreSQL and Citus need downtime to save costs if you don't need that beefy machine all day, every day of the year. Scaling up and down means downtime, as do upgrades. Distributed SQL offers elasticity (no downtime to resize the cluster) and high availability (no downtime on failure or maintenance).
RE: "Distributed SQL offers elasticity (no downtime resize"). I'm not sure this is as much of an advantage of distributed databases vs single host databases anymore. Some of the tech to move virtual machines between machines quickly (without dropping TCP connections) is pretty neat. Neon has a blog post about it here[1]. Aurora Serverless V2 does the same thing (but I can't find a detailed technical blog post talking about how it works). Your still limited by "one big host" but its no longer as big of a deal to scale your compute up/down within that limit.
I don't think that's correct at all. Heroku Postgres has a central control plane that monitors availability and orchestrates things. There are continual health checks that report back to Heroku. In the event of unavailability, a page goes out to the on-call engineer to investigate if systems haven't restored availability.
My understanding of Fly Postgres is that they put a lot into the tools to orchestrate it, but there is no centralized monitoring, and in the event of a failure it is up to you to notice and remediate.
Disclaimer: I was part of the team that built Heroku Postgres, and I know the Fly team pretty well, but I don't personally use Fly Postgres, so this is my understanding from the team. We've had a number of customers use Crunchy Bridge (built by a lot of the original Heroku Postgres team) for their managed Postgres, connected to fly.io via Tailscale.
I think you're talking mostly about Heroku being a managed service while Fly Postgres is unmanaged. It sounds like the new managed Postgres in partnership with Supabase is managed in a similar way where Supabase would handle health checks and all that?
Management is a huge difference of course, but I was mostly asking about the database from the point of view of a user of the database. It doesn't sound like Fly Postgres is doing anything like running your database globally; you still have a single instance of the database.
Apologies if I'm missing some details. I intentionally try to stay out of the technical devops type stuff. I'm the kind of person who just pays Heroku for a Postgres and doesn't think much about it after that.
Yeah! One way to think about it that is almost (not perfectly) correct is that you could build and run all of Fly Postgres yourself; it's almost just a Fly App configuration.
I love the "this guy" piece... Meanwhile he's been the most public person in academia talking about databases over at least the last 5 years, maybe the last 10. He's not only done an awesome job of talking about foundational pieces of databases, but also examining new databases that have come up over the last 10 years or so. He's course is quite open as well, so it's not just the student base that gets to take advantage - https://15445.courses.cs.cmu.edu/fall2023/
With the shirts, he's generally helped promote and publicize those companies so that he has things to hand out to his students, TAs, and graduate students.
Back in grad school he got put on probation a second time after hiring a magician off craigslist for admitted students day. The magician showed up drunk and lost his dove in the building.