At first glance this would be much closer to Neon, with its separated storage and compute.
Crunchy Postgres for Kubernetes is great if you're running Postgres inside Kubernetes, but it's more standard Postgres than something serverless. Citus isn't really serverless at all either; Citus is more focused on performance scaling where things are very co-located and you're outgrowing the bounds of a single node.
It's not DataFusion; it's much more custom, with a number of extensions underlying the pieces. We've got a number of other extensions in the works. To the user it's still a seamless experience, but we've seen that smaller extensions that know how to work together are easier to maintain. For example, we're working on a map type extension that knows how to understand the map types in Parquet files from within Postgres. In time we may open source some of these pieces, but we don't have a time frame for that, and it's a case-by-case decision for each extension.
It's a custom extension, and actually a number of custom extensions, with quite a few more planned to further enhance the product experience. All the extensions work together as a single unit to compose Crunchy Bridge for Analytics, but under the covers there are lots of building blocks working together.
Marco and team were the architects behind the Citus extension for Postgres and have quite a bit of experience building advanced Postgres extensions. Marco gave a talk at PGConf EU on all the mistakes you can make when building extensions and the best practices to follow, so in short, quite a bit has gone into the quality of this vs. a quick one-off. Even in the standup with the team today it was remarked "we haven't even been able to make it segfault yet, which we could pull off quite quickly and commonly with Citus".
How do AWS credentials get loaded? Is it a static set populated in the CREATE SERVER or can it pull from the usual suspects of AWS credential sources like instance profiles?
The credentials are currently managed via the platform, so you enter them in the dashboard. We wanted to avoid specifying credentials via a SQL interface, because they can easily leak into logs and such. We'll add more authentication options over time.
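To illustrate the concern, here's a hypothetical sketch of the pattern we avoid, using standard FDW-style DDL (the wrapper and option names are made up for illustration, not our actual interface):

    -- Hypothetical sketch; wrapper and option names are illustrative only.
    CREATE SERVER analytics_lake
      FOREIGN DATA WRAPPER hypothetical_lake_fdw
      OPTIONS (region 'us-east-1');

    -- Secrets passed as plain SQL options like this can end up in psql
    -- history, server logs, and pg_dump output:
    CREATE USER MAPPING FOR CURRENT_USER SERVER analytics_lake
      OPTIONS (access_key_id 'AKIA...', secret_access_key '...');

Entering them through the dashboard keeps them out of those SQL-visible paths.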
There is coordination from the Crunchy Bridge control plane to the data plane, which the extension is then aware of.
At this time it's not FOSS. We are going to consider opening up some of the building blocks in time, but at the moment they have pretty tight coupling both to the other extensions and to how Crunchy Bridge operates.
Crunchy Bridge is a managed PostgreSQL service by Crunchy Data available on AWS, Azure, and GCP.
Bridge for Analytics is a special instance/cluster type in Crunchy Bridge with additional extensions and infrastructure for querying data lakes. Currently AWS only.
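As a rough sketch of what that looks like from SQL (hedged: the server name and exact syntax here are assumptions, check the docs for the real interface), you create a foreign table over files in S3 and query it like any other table:

    -- Sketch only; server name and exact syntax may differ from the docs.
    -- An empty column list lets the schema be inferred from the files.
    CREATE FOREIGN TABLE events ()
      SERVER crunchy_lake_analytics
      OPTIONS (path 's3://my-bucket/events/*.parquet');

    SELECT count(*) FROM events;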
Fully agree with this sentiment; it's very much our focus and goal at Crunchy Data, with one big thing I'd add: great support.
I recall seeing them crop up in the early days of building and running Heroku Postgres; they were a very, very early managed service provider. To my knowledge they never seemed to grow to massive scale, but they were a steady business (though I don't know any of the details for sure). That they were still around for over a decade is a testament in itself.
Love the callout to Dataclips. It was easily my favorite least-used feature among Heroku customers. Blazer and PgHero both got a bunch of inspiration from some of the early things we built at Heroku, and it's amazing having Andrew crank out so many high-quality projects to make some of that tooling more broadly available.
Marco (author) is probably asleep at this point and could give a deeper perspective. He sort of hits on this when talking about disk latency... Depending on your setup, and just from some personal experience, I know it's not crazy for Postgres queries to run at 1 ms per query. From there you can start to do some math on how many cores, how many queries per second, etc.
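Back-of-the-envelope version of that math (rough numbers, assuming simple point queries): at 1 ms per query, one busy backend can do ~1,000 queries/sec; with roughly one busy backend per core, a 32-core box pencils out to ~32,000 queries/sec before contention, network round trips, and connection overhead start eating into it.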
Single node Postgres (with a beefy machine) can definitely manage on the order of 100k transactions per second. When you're pushing the high 100ks into the millions, read replicas are a common approach.
When we're talking transactions, there's the question of whether it's simply basic queries or bigger aggregations, and whether it's writes or reads. For writes, if you can manage any form of multi-row insert or batching with COPY, you can push basic Postgres really far... From some benchmarks, Citus as mentioned can hit millions of records per second safely with those approaches, and even without Citus you can get pretty high write throughput.
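A minimal sketch of those two batching approaches in plain Postgres (the table and columns are made up for illustration):

    -- many rows per statement: one round trip, one transaction
    INSERT INTO events (id, payload)
    VALUES (1, '{"a":1}'), (2, '{"b":2}'), (3, '{"c":3}');

    -- or bulk-load with COPY, which streams rows with much less per-row overhead
    COPY events (id, payload) FROM STDIN WITH (FORMAT csv);

Both cut down on the per-statement parse, plan, and commit overhead you pay with single-row inserts.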
The "disappointing" benchmark mentioned in the article is a shame for GigaOm who published it and for Microsoft who paid for it. They compare Citus with no HA to CockroachDB and YugabyteDB with replication factor 3 Multi-AZ, resilient to data center failure. And they run Citus on 16 cores (=32 vCPU) and the others on 16 vCPU.
But your point about a "beefy machine" shows the real advantage of Distributed SQL. PostgreSQL and Citus need downtime to save costs if you don't need that beefy machine all day, every day of the year. Scaling up and down means downtime, as do upgrades. Distributed SQL offers elasticity (no downtime to resize the cluster) and high availability (no downtime on failure or maintenance).
RE: "Distributed SQL offers elasticity (no downtime resize"). I'm not sure this is as much of an advantage of distributed databases vs single host databases anymore. Some of the tech to move virtual machines between machines quickly (without dropping TCP connections) is pretty neat. Neon has a blog post about it here[1]. Aurora Serverless V2 does the same thing (but I can't find a detailed technical blog post talking about how it works). Your still limited by "one big host" but its no longer as big of a deal to scale your compute up/down within that limit.
I don't think that's correct at all. Heroku Postgres has a central control plane that monitors availability and orchestrates things. There are continual health checks that report back to Heroku. In the event of unavailability, a page goes out to the on-call engineer to investigate if systems haven't restored availability.
My understanding of Fly Postgres is that they put a lot into the tools to orchestrate it, but there is no centralized monitoring, and in the event of a failure it is up to you to notice and remediate.
Disclaimer: I was part of the team that built Heroku Postgres, and I know the Fly team pretty well, but I don't personally use Fly Postgres, so this is my understanding from the team. We've had a number of customers use Crunchy Bridge (built by a lot of the original Heroku Postgres team) for their managed Postgres, connected to fly.io via Tailscale.
I think you're talking mostly about Heroku being a managed service while Fly Postgres is unmanaged. It sounds like the new managed Postgres in partnership with Supabase is managed in a similar way where Supabase would handle health checks and all that?
Management is a huge difference of course, but I was mostly asking about the database from the point of view of a user of the database. It doesn't sound like Fly Postgres is doing anything like running your database globally; you still have a single instance of the database.
Apologies if I'm missing some details. I intentionally try to stay out of the technical devops type stuff. I'm the kind of person who just pays Heroku for a Postgres and doesn't think much about it after that.
Yeah! One way to think about it that is almost (not perfectly) correct is that you could build and run all of Fly Postgres yourself; it's almost just a Fly App configuration.
I love the "this guy" piece... Meanwhile he's been the most public person in academia talking about databases over at least the last 5 years, maybe the last 10. He's not only done an awesome job of talking about foundational pieces of databases, but also examining new databases that have come up over the last 10 years or so. He's course is quite open as well, so it's not just the student base that gets to take advantage - https://15445.courses.cs.cmu.edu/fall2023/
With the shirts, he's generally helped promote and publicize those companies so that he has things to hand out to his students, TAs, and graduate students.
Back in grad school he got put on probation a second time after hiring a magician off craigslist for admitted students day. The magician showed up drunk and lost his dove in the building.