If you really want speed, then I would recommend using a good implementation of Dancing Links for solving constraint satisfaction problems. Don Knuth proposes a doubly linked list structure to speed up recursive state space exploration: www-cs-faculty.stanford.edu/~uno/papers/dancing-color.ps.gz.
It's not hyper-fast (for speed I actually implemented it as a c-extention to ruby, but it's a long time ago and I don't think I have the code around by now) but being ruby it's fairly easy to read.
Can really recommend reading the paper on Dancing Links and playing around with the algorithm, its such a great feeling once you start visualizing how the linked list trick works :)
Dancing Links is definitely a fun algorithm to implement. I wrote one in Python a while back and included an option to generate graphs as it went through the recursion tree. I put up a quick page at http://cavernum.net/dlsudoku/ demonstrating them.