These sorts of heat-balancing sharding schemes are very difficult to implement and very expensive. As you see hot keys, you need to split the hash space and rebalance within it by reshuffling the shard data.
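To make the cost concrete, here is a minimal sketch of that kind of range split, assuming a hypothetical in-memory shard map; the `Shard` type and `split_hot_shard` function are illustrative, not any particular system's API:

```python
from dataclasses import dataclass

@dataclass
class Shard:
    lo: int    # inclusive start of the hash range
    hi: int    # exclusive end of the hash range
    node: str  # node currently serving this range

def split_hot_shard(shards: list[Shard], hot: Shard, new_node: str) -> list[Shard]:
    """Split the hot shard's hash range in half and assign the upper half
    to a new node. Note the metadata change is the cheap part: the data in
    [mid, hi) still has to be physically copied/reshuffled before the new
    shard can serve traffic."""
    mid = (hot.lo + hot.hi) // 2
    halves = [Shard(hot.lo, mid, hot.node), Shard(mid, hot.hi, new_node)]
    return [s for s in shards if s is not hot] + halves
```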
I’d also note that Google doesn’t bother keeping a perfect index because perfect fidelity isn’t necessary, unlike in a lot of analytic or similar systems where replication of ground truth is important. It’s much more important for Google to maintain high fidelity at the less frequent end of the token distribution and very low fidelity at the high-frequency end. Logs can’t do that.
It’s actually quite hard. It starts with being able to detect a hot key at all. It’s also not the case that heat is symmetric with size, in fact in an inverted index single entries can be very hot. Then it’s not about simply shuffling data (which isn’t as simple as you outline: you need to salt the keys so they shuffle randomly, otherwise you don’t get uniformity); then you need to create eventually consistent write replicas to balance write load while still answering queries online in a strongly consistent way. Add to this that any dynamic change to the index like this requires consistent online behavior (i.e., ingestion and queries don’t stop because you need to rebalance), and the hot keys are necessarily “large” in volume, so back pressure can be enormous and queue draining itself can be expensive. On top of that, you need stateful elastic infrastructure.
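For the salting point specifically, here is a hedged sketch of the idea: writes to a hot key are spread uniformly over a fixed number of salted variants, and reads then have to fan out to every variant, which is part of why this is expensive. The salt count and function names are illustrative assumptions, not from any real system:

```python
import hashlib
import random

NUM_SALTS = 16  # assumption: how many sub-shards a hot key is spread across

def salted_write_shard(key: str, num_shards: int) -> int:
    """Pick a shard for a write: salting makes repeated writes to the same
    hot key land uniformly across NUM_SALTS different hash slots."""
    salt = random.randrange(NUM_SALTS)
    digest = hashlib.sha256(f"{key}#{salt}".encode()).hexdigest()
    return int(digest, 16) % num_shards

def read_shards(key: str, num_shards: int) -> set[int]:
    """Reads for a salted key must query every possible salt variant
    and merge the results."""
    return {
        int(hashlib.sha256(f"{key}#{s}".encode()).hexdigest(), 16) % num_shards
        for s in range(NUM_SALTS)
    }
```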
There are definitely products that offer these characteristics. S3 and DynamoDB both do, even if you can’t see it. But it took many years of very intensive engineering to get it to work, and they have total control over the infrastructure and runtime behind an opaque API. Elasticsearch and Splunk are general-purpose software packages that are installed by customers, and their data models are much more complex than objects or tables.
> It’s also not the case that heat is symmetric with size, in fact in an inverted index single entries can be very hot.
I think you mixed two orthogonal topics: you first talked about frequent tokens, and now you've switched to hot keys (tokens which are frequently queried).
As for frequent tokens, I think I described the algorithm well, and it looks simple, and I don't see any issues there, as long as your metadata store (where you keep info about shards and replicas) allows some kind of transactions (e.g. CockroachDB or similar).
For hot keys/shards, as you pointed out, the solution is to increase the replication factor, but I think if the shard is relatively small (10M IDs as in my example), adding another replica online is also fast, can be done in a single transaction, and may not require all the movement you described.
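A minimal sketch of what that single metadata transaction could look like, assuming a hypothetical SQL schema (`shards` and `replicas` tables) on a transactional metadata store like CockroachDB, and a DB-API style connection (e.g. psycopg2) where `with conn:` commits on success; all table, column and function names are illustrative:

```python
def add_replica(conn, shard_id: int, new_node: str) -> None:
    """Register an extra replica for a hot shard atomically. The actual data
    copy to new_node still happens out of band; queries should only start
    hitting the new replica once its state is flipped from 'copying' to
    'active' in a later transaction."""
    with conn:  # one transaction: both statements commit or neither does
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO replicas (shard_id, node, state) VALUES (%s, %s, 'copying')",
            (shard_id, new_node),
        )
        cur.execute(
            "UPDATE shards SET replication_factor = replication_factor + 1 WHERE id = %s",
            (shard_id,),
        )
```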