
This is very much why we built the Postgres playground, which has Postgres embedded in your browser with guided tutorials - https://www.crunchydata.com/developers/tutorials


At very first glance this would be much closer to Neon, with its separated storage and compute.

Crunchy Postgres for Kubernetes is great if you're running Postgres inside Kubernetes, but it's more standard Postgres than something serverless. Citus isn't really serverless at all either; it's focused on performance scaling where things are very co-located and you're outgrowing the bounds of a single node.


It's not DataFusion; it's much more custom, with a number of extensions that underlie different pieces. We've got a number of other extensions in the works too. To the user it's still a seamless experience, but we've seen that smaller extensions that know how to work together are easier to maintain. For example, we're working on one that understands the map types within Parquet files from Postgres. In time we may open source some of these pieces, but we don't have a time frame for that, and it's case by case for each of the extensions.


It's a custom extension, or rather a number of custom extensions, with quite a few more planned to further enhance the product experience. The extensions work together as a single unit to compose Crunchy Bridge for Analytics, but under the covers there are lots of building blocks working together.

Marco and team were the architects behind the Citus extension for Postgres and have quite a bit of experience building advanced Postgres extensions. Marco gave a talk at PGConf EU on all the mistakes you can make when building extensions and the best practices to follow, so in short, quite a bit has gone into the quality of this vs. a quick one-off. Even in the standup with the team today it was remarked "we haven't even been able to make it segfault yet, which we could pull off quite quickly and commonly with Citus".


Do you have a link to the slides or the video of that presentation? I found this, but no links: https://www.postgresql.eu/events/pgconfeu2023/schedule/sessi...


Hmm, let me see if we have a link to slides. There was no video recording unfortunately but we can definitely get slides posted.


How do AWS credentials get loaded? Is it a static set populated in the CREATE SERVER or can it pull from the usual suspects of AWS credential sources like instance profiles?

Is the code for the extension itself available?


The credentials are currently managed via the platform, so you enter them in the dashboard. We wanted to avoid specifying credentials via a SQL interface, because they can easily leak into logs and such. We'll add more authentication options over time.
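
As a rough illustration of the pattern being avoided (standard foreign-data-wrapper SQL, but the wrapper and option names below are placeholders, not Crunchy Bridge's actual interface): anything passed through OPTIONS is stored in catalogs such as pg_user_mappings and can show up in statement logs.

    -- Hypothetical sketch only; wrapper and option names are made up.
    CREATE SERVER my_s3_lake
      FOREIGN DATA WRAPPER some_s3_fdw             -- placeholder wrapper name
      OPTIONS (region 'us-east-1');

    CREATE USER MAPPING FOR CURRENT_USER
      SERVER my_s3_lake
      OPTIONS (access_key_id     'AKIA...',        -- these values land in catalogs
               secret_access_key '...');           -- and can leak into statement logs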


How does the extension get access to them? Is there some other “master” token for the Crunchy PG server itself that is used to fetch the real token?

The extension is not FOSS right?


There is coordination from the Crunchy Bridge control plane to the data plane, which the extension is then aware of.

At this time it's not FOSS. We are going to consider opening some of the building blocks in time, but at the moment they have a pretty tight coupling to both the other extensions and to how Crunchy Bridge operates.


> Crunchy Bridge for Analytics

What exactly is "Crunchy Bridge for Analytics"? Is it some hosted cloud infra? I can't install it locally as an extension?


Crunchy Bridge is a managed PostgreSQL service by Crunchy Data available on AWS, Azure, and GCP.

Bridge for Analytics is a special instance/cluster type in Crunchy Bridge with additional extensions and infrastructure for querying data lakes. Currently AWS only.
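
As a hypothetical sketch of what "querying data lakes" from Postgres can look like, here is a foreign table over Parquet in S3; the server and option names are assumptions for illustration, not the documented Bridge for Analytics syntax.

    -- Hypothetical only; server and option names are placeholders.
    CREATE FOREIGN TABLE lake_events (
      event_id   bigint,
      created_at timestamptz
    )
      SERVER analytics_lake                                  -- placeholder server
      OPTIONS (path 's3://my-bucket/events/*.parquet');      -- placeholder option

    -- Once mapped, it's queried like any other Postgres table:
    SELECT date_trunc('day', created_at) AS day, count(*)
    FROM lake_events
    GROUP BY 1
    ORDER BY 1;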


Fully agree with this sentiment; it's very much our focus and goal at Crunchy Data, with one big thing I'd add: great support.

I recall seeing them crop up in the early days of building and running Heroku Postgres; they were a very, very early managed service provider. To my knowledge they never grew to massive scale but were a steady business (though I don't know any of the details for sure). That they were still around for over a decade is a testament in itself.


Love the callout to Dataclips. It was easily my favorite feature, and the least used by Heroku customers. Blazer and PgHero both drew a bunch of inspiration from some of the early things we built at Heroku, and it's amazing having Andrew crank out so many high-quality projects to make some of that tooling more broadly available.


Shameless plug, but we aim to get pretty close to this on Crunchy Bridge (our hobby-0 with 2 vcores starts at $10 a month) - https://www.crunchydata.com/pricing/calculator.


Marco (the author) is probably asleep at this point and could give a deeper perspective. He sort of hits on this when talking about disk latency... Depending on your setup, and just from some personal experience, I know it's not crazy for Postgres queries to come in at around 1 ms per query. From there you can start to do some math on how many cores, how many queries per second, etc. (At 1 ms per query, a single busy connection is roughly 1,000 queries per second, and you can multiply out from there.)

Single-node Postgres (on a beefy machine) can definitely manage on the order of 100k transactions per second. When you're pushing the high 100ks into the millions, read replicas are a common approach.

When we're talking transactions, the question is whether it's simple basic queries or bigger aggregations, and whether it's writes or reads. For writes, if you can manage any form of multi-row insert or batching with COPY, you can push basic Postgres really far (quick sketch below)... From some benchmarks, Citus as mentioned can safely hit millions of records per second with those approaches, and even without Citus you can get pretty high write throughput.
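
A minimal sketch of the batching idea, with a made-up table just for illustration: per-statement overhead dominates single-row inserts, so multi-row INSERT and COPY amortize it across many rows.

    -- Hypothetical table for illustration.
    CREATE TABLE events (id bigint, payload text);

    -- One row per statement: each INSERT pays full parse/commit/round-trip overhead.
    INSERT INTO events VALUES (1, 'a');

    -- Multi-row insert: many rows amortize that overhead in a single statement.
    INSERT INTO events VALUES (2, 'b'), (3, 'c'), (4, 'd');

    -- COPY streams rows in bulk and is typically the fastest path for big loads.
    COPY events FROM STDIN WITH (FORMAT csv);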


The "disappointing" benchmark mentioned in the article is a shame for GigaOm who published it and for Microsoft who paid for it. They compare Citus with no HA to CockroachDB and YugabyteDB with replication factor 3 Multi-AZ, resilient to data center failure. And they run Citus on 16 cores (=32 vCPU) and the others on 16 vCPU. But your point about "beefy machine" shows the real advantages of Distributed SQL. PostgreSQL and Citus needs downtime to save cost if you don't need that beefy machine all days all year. Scale up and down is downtime, as well as upgrades. Distributed SQL offers elasticity (no downtime to resize the cluster) and high availability (no downtime on failure or maintenance)


RE: "Distributed SQL offers elasticity (no downtime resize"). I'm not sure this is as much of an advantage of distributed databases vs single host databases anymore. Some of the tech to move virtual machines between machines quickly (without dropping TCP connections) is pretty neat. Neon has a blog post about it here[1]. Aurora Serverless V2 does the same thing (but I can't find a detailed technical blog post talking about how it works). Your still limited by "one big host" but its no longer as big of a deal to scale your compute up/down within that limit.

[1] https://neon.tech/blog/scaling-serverless-postgres


Second yes to that - warm PostgreSQL with plenty of RAM can do some fancy things and return an answer sub-millisecond too.

cache is King


But a large cache is expensive in the cloud, and you cannot scale up/down without downtime.


4 TB of RAM is only $71 per hour on AWS RDS. If you're at planetary scale, that's not bad.


I don't think that's correct at all. Heroku Postgres has a central control plane that monitors availability and orchestrates things. There are continual health checks that go back to Heroku. In the event of unavailability, if systems haven't restored availability on their own, it pages the on-call engineer to investigate.

My understanding of Fly Postgres is that they put a lot into the tools to orchestrate, but there is no centralized monitoring, and in the event of a failure it is up to you to notice and remediate.

Disclaimer: I was part of the team that built Heroku Postgres, and I know the Fly team pretty well but don't personally use Fly Postgres, so this is my understanding from the team. We've had a number of customers use Crunchy Bridge (built by a lot of the original Heroku Postgres team) as their managed Postgres, connected to fly.io via Tailscale.


I think you're talking mostly about Heroku being a managed service while Fly Postgres is unmanaged. It sounds like the new managed Postgres in partnership with Supabase is managed in a similar way where Supabase would handle health checks and all that?

Management is a huge difference of course, but I was mostly asking about the database from the point of view of a user of the database. It doesn't sound like Fly Postgres is doing anything like running your database globally - you still have a single instance of the database.

Apologies if I'm missing some details. I intentionally try to stay out of the technical devops type stuff. I'm the kind of person who just pays Heroku for a Postgres and doesn't think much about it after that.


For what it's worth, Fly Postgres isn't single-instance or single-location. (But it's also not managed, which is a big deal).


How does that work? Does Fly just give you the logins for all of the Postgres servers you provision and you manage it yourself?


See the blog post linked upthread.


I think there's a bit of confusion between Fly Postgres and the Supabase offering. The former was unmanaged, on Fly infra.

I'm not sure of the full details on the Supabase one, as it's more recent.

This is a pretty good breakdown of various database providers and in particular a lot paired with Fly - https://dancroak.com/webstack/


Yeah! One way to think about it that is almost (not perfectly) correct is that you could build and run all of Fly Postgres yourself; it's almost just a Fly App configuration.


I love the "this guy" piece... Meanwhile he's been the most public person in academia talking about databases over at least the last 5 years, maybe the last 10. He's not only done an awesome job of talking about foundational pieces of databases, but also examining new databases that have come up over the last 10 years or so. He's course is quite open as well, so it's not just the student base that gets to take advantage - https://15445.courses.cs.cmu.edu/fall2023/

As for the shirts, he's generally helped promote and publicize those companies so he has things to hand out to his students, TAs, and graduate students.


Back in grad school he got put on probation a second time after hiring a magician off Craigslist for admitted students day. The magician showed up drunk and lost his dove in the building.


Wow I like Andy even more now



Yes, the glory of social media. One of the most brilliant minds in databases, and he's "this guy."


not talking about the shirts

