I don't know how I feel about this implementation. On the one hand, I love its simplicity and the fact that you still have Serializable transactions. On the other hand, detecting conflicts at the page level means that conflicts may show up in surprising places, because it's based on this fuzzy notion of values being "close together". What counts as close together changes with your page size configuration and, as the article points out, is not necessarily correlated with a natural key in ROWID tables. I would be very interested to see this system in use in a few real applications to get a sense of how much this really matters.
I agree that it's not ideal, but knowing that you can concurrently insert into separate tables would create some very welcome optimisation opportunities now that SQLite is increasingly used on servers.
The issues you describe only arise with concurrent inserts/updates into the same table. Essentially, what we would get if this were merged into the main branch is table-level locking.
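To make the separate-tables case concrete, here's a minimal sketch. It assumes a SQLite build from the begin-concurrent branch (stock builds reject BEGIN CONCURRENT) and uses Python's sqlite3 in autocommit mode so it doesn't issue its own BEGIN; the file and table names are made up:

    import sqlite3

    def connect(path):
        # isolation_level=None => autocommit; we issue BEGIN/COMMIT ourselves
        conn = sqlite3.connect(path, isolation_level=None)
        conn.execute("PRAGMA journal_mode=WAL")  # BEGIN CONCURRENT needs WAL mode
        return conn

    writer_a = connect("app.db")
    writer_b = connect("app.db")

    writer_a.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")
    writer_a.execute("CREATE TABLE IF NOT EXISTS audit_log (id INTEGER PRIMARY KEY, body TEXT)")

    # Both writers open concurrent write transactions against the same database...
    writer_a.execute("BEGIN CONCURRENT")
    writer_b.execute("BEGIN CONCURRENT")

    # ...but touch disjoint tables, so (schema and freelist pages aside)
    # their read/write page sets shouldn't overlap.
    writer_a.execute("INSERT INTO events (body) VALUES ('ping')")
    writer_b.execute("INSERT INTO audit_log (body) VALUES ('pong')")

    # COMMITs are still serialized internally, but neither should fail with
    # SQLITE_BUSY_SNAPSHOT, because neither read a page the other modified.
    writer_a.execute("COMMIT")
    writer_b.execute("COMMIT")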
Just to clarify, these are not serializable transactions; they're snapshot isolation. The serialization this document talks about refers to the fact that BEGIN CONCURRENT allows multiple writing transactions to coexist, but COMMITs do not happen in parallel; at any given point in time only one COMMIT can be in progress. Hence, serialized.
Yeah, I agree. It seems like the implementation's simplicity comes at the cost of making it unreliable, in the sense that depending on your schema design and data patterns you could end up with much worse performance instead of much better performance. Also, if you do achieve better performance by really spreading out which pages you hit, you're decreasing your SSD's lifespan through write amplification. [1]
On the bright side, the interface is simple enough, and maybe they could just swap out the implementation underneath for something more sophisticated later.
[1] Or on an HDD, where you really end up with lots of seeks for reading as well as writing. But that's probably unfair to consider: if you're using an HDD, more write concurrency probably isn't your biggest opportunity for improvement.
It's optimistic concurrency. If there's a conflict, the client gets SQLITE_BUSY_SNAPSHOT and has to roll back and try again. There's no guarantee the second try will succeed either. They might need to back off (as in, sleep between attempts) and/or entirely stop using "BEGIN CONCURRENT" after n tries.
edit:
originally also wrote above: In fact, without some way of ensuring transactions write to pages in a consistent order, I don't think there's any guarantee any transaction will make progress. It could degenerate to livelock.
...but I think I was wrong about this part. I think SQLITE_BUSY_SNAPSHOT means another transaction actually committed touching these pages; so progress was made in the system as a whole.
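For what it's worth, a rough sketch of the retry-and-back-off loop described above, in Python on a connection opened with isolation_level=None against a begin-concurrent build. The constant, the back-off numbers, and the plain-transaction fallback are made up for illustration, and real code would inspect the extended error code rather than catching every OperationalError:

    import random, sqlite3, time

    MAX_CONCURRENT_TRIES = 5

    def run_write(conn, work):
        # Optimistic path first: BEGIN CONCURRENT, retry on conflict.
        for attempt in range(MAX_CONCURRENT_TRIES):
            try:
                conn.execute("BEGIN CONCURRENT")
                work(conn)
                conn.execute("COMMIT")
                return
            except sqlite3.OperationalError:
                # SQLITE_BUSY_SNAPSHOT surfaces as an OperationalError here;
                # real code should check the error code instead of catching all.
                if conn.in_transaction:
                    conn.execute("ROLLBACK")
                time.sleep(random.uniform(0, 0.005 * 2 ** attempt))  # jittered back-off
        # Give up on the optimistic path and take an ordinary write transaction.
        conn.execute("BEGIN IMMEDIATE")
        work(conn)
        conn.execute("COMMIT")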
Practically, rollbacks due to transactions accessing an overlapping set of pages shouldn't be unexpected: the same client programmers who asked for BEGIN CONCURRENT are responsible for knowing what their queries do and dealing with their own conflicts by retrying the transaction or by avoiding or organizing concurrent access.
Moreover, high-performance applications that want to saturate disk write rates can usually afford to organize their architecture around making massively parallel modifications interference-free (for example, supporting the nonconsecutive key assignment the article suggests has vast ramifications).
This won't be the first database that has page-level artifacts in its transactions. I can't recall which one, but I do recall this coming up as a potential scenario when learning about the different transaction isolation levels.
It's quite dangerous IMO, since you run the risk of deadlock in cases you wouldn't expect. There are times this makes sense, but most applications won't need it. Those cases are mainly where you have some long-lived transactions that need to do a bunch of writes and you can guarantee they won't conflict because they touch different tables.
My reading is that no deadlocks should be possible, since there is only one lock (pages are "locked" optimistically, meaning that the tx is aborted if the page has changed). Livelocks are possible, though, where two repeatedly reissued transactions cause each other to abort forever.
Above, scottlamb came to the conclusion that livelock isn't possible after all, because one transaction will always have made progress. Intuitively, that makes sense, but intuition alone is always dangerous with concurrency. Which is it?
After reading the post a bit more carefully, I think livelock is impossible, since transactions are only aborted if one or more of the pages they read have been modified since they were read. That implies forward progress must have been made by another transaction in order for a transaction to be aborted.
However, unless they do some very careful accounting, this sounds equivalent to snapshot isolation, which permits anomalies (write skew, for example) not found under serializable isolation, though the chance of hitting one is reduced by the page-level conflict detection.
I'd guess using UUIDs for the keys instead of ROWIDs might help. I'd also guess that UNIQUE constraints trigger conflicts as well, and that "close together" applies to index b-trees too. If that's true, then a UUID primary key is defeated by the very indices you'd add to work around the problem.
A UUID key would crush performance vs. rowid, though, especially for ordered scans, and you'd still have conflicts if parallel read/write accesses are temporally correlated (you wrote X and Y at roughly the same time, and tend to read them at roughly the same time).
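For illustration only, the kind of schema being discussed might look like the sketch below (table and column names are made up, and this runs against stock SQLite). A random 16-byte key spreads inserts across leaf pages instead of piling them onto the rightmost one, but the secondary index you'd want for ordered scans clusters recently written rows right back together in its b-tree, which is the "defeated by the indices" problem:

    import sqlite3

    db = sqlite3.connect("app.db")
    db.executescript("""
        CREATE TABLE IF NOT EXISTS docs (
            id         BLOB PRIMARY KEY,   -- random 16-byte key, uuid-ish
            created_at INTEGER NOT NULL,
            body       TEXT
        ) WITHOUT ROWID;
        -- The index needed for ordered/range scans puts recent rows next to
        -- each other again, so concurrent inserts still hit the same index pages.
        CREATE INDEX IF NOT EXISTS docs_by_created ON docs(created_at);
    """)
    db.execute(
        "INSERT INTO docs (id, created_at, body) "
        "VALUES (randomblob(16), strftime('%s','now'), ?)",
        ("hello",),
    )
    db.commit()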