- "And something like zobrist hashing for looking up a given position"
Which gets particularly good if you combine it with some hack like a Bloom filter. There's a trick, I don't remember if it has a name, where you can effect a time/space tradeoff by partitioning a search set and building a separate Bloom filter for each partition. A lookup is a constant-time query of each Bloom filter, plus a linear scan of each partition that signaled a match.
(This is a reasonable thing to do, because (0) the size of chess databases can get extremely large and (1) the queries are human UX interactions that need to be responsive).
I mean, chess engines use transposition tables to store encountered positions. In this case you can use the Zobrist hash as the index into a hash table, giving an O(1) lookup for a given position.
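To make this concrete, here's a minimal sketch of Zobrist hashing feeding a dict-based transposition table. The piece/square encoding and the stored entry fields are illustrative assumptions, and the side-to-move and castling keys a real engine would XOR in are omitted for brevity.

```python
import random

random.seed(0)

# Zobrist table: one random 64-bit key per (piece, square) pair.
# Assumed encoding: 12 piece types (6 per side) x 64 squares.
ZOBRIST = [[random.getrandbits(64) for _ in range(64)] for _ in range(12)]

def zobrist_hash(position):
    """position: iterable of (piece_index, square_index) pairs."""
    h = 0
    for piece, square in position:
        h ^= ZOBRIST[piece][square]
    return h

# Transposition table: a plain dict keyed by the hash -> O(1) average lookup.
table = {}
pos = [(0, 8), (6, 48)]  # hypothetical: white pawn on a2, black pawn on a7
table[zobrist_hash(pos)] = {"eval": 0.0, "depth": 4}
```

Because the hash is a XOR of per-square keys, it's order-independent (transposed move orders reaching the same position hash identically), and it can be updated incrementally per move by XORing the affected keys in and out.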
Typically that's going to be the best way to index into full games, since you can reach an entry via a game-tree search (e.g., each node of the tree is a position whose hash indexes into a big transposition table).
You can probably use a Bloom filter to check whether a given position has ever been encountered (e.g., as a layer on top of the transposition table) though. For something like storing 10 billion positions with a 10^-6 false-positive probability (with a reasonable number of hash functions), you're looking at a 30+ GiB Bloom filter.
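That size estimate follows from the standard optimal-sizing formula for a Bloom filter, m = -n * ln(p) / (ln 2)^2 bits, with the optimal hash count k = (m/n) * ln 2:

```python
import math

n = 10_000_000_000  # positions stored
p = 1e-6            # target false-positive probability

# Optimal number of bits and hash functions for a Bloom filter.
m_bits = -n * math.log(p) / math.log(2) ** 2
k = round(m_bits / n * math.log(2))

gib = m_bits / 8 / 2**30
print(f"{gib:.1f} GiB, k = {k} hash functions")
```

This works out to roughly 33.5 GiB at about 20 hash functions, which matches the 30+ GiB figure above.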
Right! The hash table you're describing is O(1) lookup, but its memory size is a constant multiple larger than the Bloom filter. A Bloom filter doesn't tell you anything about where the needle is, only whether it exists; so, if used for lookup, it reduces to O(n) brute-force search. In between the two, you can create an array of M partitions of the haystack, each with its own Bloom filter: this structure has something like O(M + n/M) lookup, but with the smaller memory size of the Bloom filter.
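A minimal sketch of that in-between structure, with an assumed toy Bloom filter (SHA-256 slices as the k hash functions) and arbitrary string keys standing in for positions:

```python
import hashlib

class TinyBloom:
    """Minimal Bloom filter: k hash functions over an m-bit array."""
    def __init__(self, m_bits=1 << 16, k=4):
        self.m, self.k, self.bits = m_bits, k, bytearray(m_bits // 8)

    def _hashes(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, item):
        for h in self._hashes(item):
            self.bits[h // 8] |= 1 << (h % 8)

    def __contains__(self, item):
        return all(self.bits[h // 8] & (1 << (h % 8)) for h in self._hashes(item))

class PartitionedIndex:
    """M partitions, each with its own Bloom filter. A lookup is M
    constant-time filter queries, plus a linear scan of only the
    partitions whose filter signaled a possible match."""
    def __init__(self, m_partitions=16):
        self.partitions = [[] for _ in range(m_partitions)]
        self.filters = [TinyBloom() for _ in range(m_partitions)]

    def add(self, item):
        i = hash(item) % len(self.partitions)  # any stable assignment works
        self.partitions[i].append(item)
        self.filters[i].add(item)

    def lookup(self, item):
        for part, filt in zip(self.partitions, self.filters):
            if item in filt and item in part:  # scan only on a filter hit
                return item
        return None

idx = PartitionedIndex()
for key in ("pos-A", "pos-B", "pos-C"):  # hypothetical position keys
    idx.add(key)
```

A filter false positive only costs a wasted partition scan, never a wrong answer, since the scan itself is exact; M tunes the time/space tradeoff between the two extremes.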
When I implemented this, I deliberately avoided constructing the hash table you're describing, because (to my recollection) it was larger than my remaining disk space! :)