This is specifically what they said about sharding
> The primary rationale is that sharding existing application workloads would be highly complex and time-consuming, requiring changes to hundreds of application endpoints and potentially taking months or even years
On one hand OAI sells coding agents and constantly hypes how easily they'll replace developers, claiming most code is now written by agents; on the other hand they claim it would take years to refactor their own application.
Genuinely sounds like the kind of challenge that could be solved with a swarm of Codex coding agents. I'm surprised they aren't treating this as an ideal use-case to show off their stack!
Getting the sharding in place, yes, but maintaining it operationally would still be a headache: things like schema migrations across shards, resharding, and even observability.
Sharding can be made mostly transparent, but it's not purely a DB-level concern in practice. Once data is split across nodes, join patterns, cross-shard transactions, global uniqueness, hot keys that take disproportionate traffic, etc. all matter a lot. Even if partitioning handles routing, the application's query patterns and its consistency/latency requirements can still force application-level changes.
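To make that concrete, here's a minimal sketch (all names hypothetical, assuming hash sharding on user_id) of what happens to a query that used to be a single statement with a global ORDER BY:

```python
# Hypothetical sketch: once `orders` is sharded on user_id, one query
# becomes a per-shard scatter plus an in-application merge.

def shard_for(user_id: int, num_shards: int) -> int:
    # Simplistic hash routing; real systems use consistent hashing
    # or a directory service.
    return user_id % num_shards

def recent_orders(user_ids, conns):
    """conns: mapping of shard id -> DB-API connection (assumed helper)."""
    by_shard = {}
    for uid in user_ids:
        by_shard.setdefault(shard_for(uid, len(conns)), []).append(uid)

    rows = []
    for shard, uids in by_shard.items():
        cur = conns[shard].cursor()
        # This ORDER BY is only correct *within* one shard...
        cur.execute(
            "SELECT user_id, id, total, created_at FROM orders"
            " WHERE user_id = ANY(%s) ORDER BY created_at DESC LIMIT 50",
            (uids,),
        )
        rows.extend(cur.fetchall())

    # ...so the global ordering (and any join against tables living on
    # other shards) has to be redone in application code.
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows[:50]
```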
> Once data is split across nodes, join patterns, cross-shard transactions, global uniqueness, hot keys that take disproportionate traffic
If you're having trouble there, then a proxy "layer" between your application and the sharded database makes sense, meaning your application still keeps its naive understanding of the data (as it should) and the proxy/database access layer handles that messiness... shirley
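Something like this toy access layer (all names illustrative), where the app keeps calling per-user methods and the routing messiness stays in one place:

```python
import psycopg2  # assuming a Postgres driver; any DB-API connection works

# Illustrative shard DSNs.
SHARD_DSNS = [
    "dbname=app_shard0",
    "dbname=app_shard1",
    "dbname=app_shard2",
    "dbname=app_shard3",
]

class UserStore:
    """The application talks to this; it never sees shard topology."""

    def __init__(self):
        self._conns = [psycopg2.connect(dsn) for dsn in SHARD_DSNS]

    def _conn(self, user_id: int):
        return self._conns[user_id % len(self._conns)]

    def get_user(self, user_id: int):
        with self._conn(user_id).cursor() as cur:
            cur.execute("SELECT id, email FROM users WHERE id = %s",
                        (user_id,))
            return cur.fetchone()

    def set_email(self, user_id: int, email: str):
        conn = self._conn(user_id)
        with conn.cursor() as cur:
            cur.execute("UPDATE users SET email = %s WHERE id = %s",
                        (email, user_id))
        conn.commit()
```

This holds up as long as every call carries the shard key; the moment a query doesn't (find-by-email, cross-user joins), you're back to the parent comment's problems.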
When sharded, anything crossing a shard boundary becomes non-transactional.
I.e. if you shard by userId, then a "share" feature that lets a user share data with another user via a "SharedDocuments" table cannot be consistent.
That in turn means you're probably going to have to rewrite the application to handle cases like a shared document having one or the other user attached to it disappear or reappear. There are loads of bugs that can happen with weak consistency like this, and at scale, every very rare bug is going to happen and need dealing with.
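Concretely, a minimal sketch (hypothetical schema, plain DB-API connections keyed by shard) of why the "share" write can't be atomic:

```python
# Sharded by user_id: the document row lives on the owner's shard and the
# "shared with" row on the recipient's shard. All names are illustrative.

def share_document(conns, owner_id, doc_id, recipient_id):
    owner = conns[owner_id % len(conns)]
    recipient = conns[recipient_id % len(conns)]

    with owner.cursor() as cur:
        cur.execute(
            "UPDATE documents SET shared = true"
            " WHERE id = %s AND owner_id = %s",
            (doc_id, owner_id),
        )
    owner.commit()
    # <- crash here and the recipient never gets their row: the document
    #    is marked shared but invisible. There is no rollback spanning
    #    both shards.
    with recipient.cursor() as cur:
        cur.execute(
            "INSERT INTO shared_documents (user_id, doc_id)"
            " VALUES (%s, %s)",
            (recipient_id, doc_id),
        )
    recipient.commit()
    # Deleting the owner's account later is the mirror image: shard-local
    # deletes leave shared_documents rows pointing at nothing, which is
    # exactly the "disappear or reappear" class of bug above.
```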
Two-phase commit only provides an eventual consistency guarantee...
Other clients (readers) have to be able to deal with inconsistencies in the meantime.
Also, 2PC in Postgres is incompatible with temporary tables, which rules out use with long-running batch analysis jobs that might use temporary tables for intermediate work and then save results. E.g. "We want to send this marketing campaign to the top 10% of users" doesn't work with the naive approach.
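For illustration, roughly what that looks like with psycopg2's DB-API two-phase-commit calls (DSNs made up; Postgres needs max_prepared_transactions > 0). The window between the two commits is the inconsistency readers must tolerate, and PREPARE is the step that rejects transactions that touched temporary tables:

```python
import psycopg2

# Illustrative connections to two shards.
a = psycopg2.connect("dbname=shard_a")
b = psycopg2.connect("dbname=shard_b")

xid_a = a.xid(0, "share-doc-42", "shard-a")
xid_b = b.xid(0, "share-doc-42", "shard-b")

a.tpc_begin(xid_a)
b.tpc_begin(xid_b)
with a.cursor() as cur:
    cur.execute("UPDATE documents SET shared = true WHERE id = 42")
with b.cursor() as cur:
    cur.execute(
        "INSERT INTO shared_documents (user_id, doc_id) VALUES (7, 42)")

# Phase 1: both sides issue PREPARE TRANSACTION. This is where Postgres
# errors out if either transaction has operated on temporary objects.
a.tpc_prepare()
b.tpc_prepare()

# Phase 2: commit each prepared transaction. A reader querying both shards
# between these two lines sees shard_a committed and shard_b not -- the
# "eventual consistency only" window described above. (A crash here also
# needs a recovery process to finish the prepared transactions later.)
a.tpc_commit()
b.tpc_commit()
```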
Sorry, what am I missing here? This complaint is true of all architectures, because readers are always going to be out of sync with the state in the database until they do another read.
The nanosecond a system has readers and writers as different processes/people/whatever, it has multiple copies: the one held by the database, and the copies held by the readers from when they last read.
It does not matter whether there is a single DB lock or a multi-shard distributed lock.
These are limitations in the current PostgreSQL implementation. It's quite possible to have consistent commits and snapshots across sharded databases. Hopefully some day in PostgreSQL too.