
Reading comments in this thread, it feels like people still can't believe that in some cases a neural network can be faster on a CPU than on a GPU.

In fact, there is already a real-world use for neural networks optimized only for the CPU: chess engines. More specifically, chess engines that use NNUE (Efficiently Updatable Neural Networks) [1], like Stockfish 12 [2]. They run much faster while consuming fewer watts than a GPU, can run on an average CPU, and have managed to beat GPU-based neural networks! [3]

This model (NNUE) has existed for far longer than the model discussed in this thread, yet there is almost no discussion about it on HN or Reddit's r/MachineLearning.
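
For anyone curious what "efficiently updatable" means in practice: the expensive first layer of an NNUE is kept as a running accumulator that gets patched when a piece moves, instead of being recomputed from scratch at every node. A rough C++ sketch of just that idea (the sizes, names, and int16 arithmetic are illustrative assumptions, not Stockfish's actual layout):

    #include <array>
    #include <cstdint>

    // Illustrative sizes only; real NNUE nets use larger, engine-specific dimensions.
    constexpr int kFeatures = 768;   // e.g. one-hot (piece, color, square) features
    constexpr int kHidden   = 256;   // first-layer width

    std::array<std::array<int16_t, kHidden>, kFeatures> W;  // first-layer weights
    std::array<int16_t, kHidden> accumulator;                // running first-layer output

    // Full refresh: only needed occasionally (e.g. after a king move in real NNUE).
    void refresh(const bool (&active)[kFeatures]) {
        accumulator.fill(0);
        for (int f = 0; f < kFeatures; ++f)
            if (active[f])
                for (int h = 0; h < kHidden; ++h)
                    accumulator[h] += W[f][h];
    }

    // Efficient update: a quiet move toggles only a couple of features, so the
    // first layer costs O(kHidden) per move instead of O(kFeatures * kHidden).
    void apply_move(int removed_feature, int added_feature) {
        for (int h = 0; h < kHidden; ++h)
            accumulator[h] += W[added_feature][h] - W[removed_feature][h];
    }

The remaining layers are small integer dense layers, so a full evaluation fits comfortably in cache and vectorises well on a single core.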

[1] https://www.chessprogramming.org/NNUE

[2] https://www.chessprogramming.org/Stockfish_NNUE

[3] https://www.chess.com/blog/the_real_greco/evolution-of-a-che...



NNUE is weird. No one outside the chess/shogi community talks about it because they seem to have a very strong case of not-invented-here syndrome. It's hand-optimised CPU code that doesn't (yet) run on the GPU (which is why it is "faster").

To be fair, they do want to embed it into a consumer-friendly application, and the integration work for embedding TF, or something else that can run PyTorch models on a GPU without Python, is non-trivial.

If someone ported it to a GPU, it almost certainly would be faster. See http://www.talkchess.com/forum3/viewtopic.php?f=7&t=76986

There is a PyTorch port available, but no benchmarks unfortunately. It does seem to be fairly widely used for training though, which is indicative of the speed gains available.


GPUs have high latency. For a chess engine like Stockfish, which is designed to search as many positions as possible, the latency of a GPU is a big problem.

Engines like LC0 that do use the GPU work by searching fewer positions but with a heavier eval function. This makes the latency less relevant because it is a smaller percentage of the GPU time.
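
A purely illustrative back-of-envelope (the latency and cost figures below are assumptions, not measurements) shows why an unbatched per-position GPU call is a non-starter for an alpha-beta engine:

    #include <cstdio>

    int main() {
        // Assumed, illustrative figures -- not benchmarks.
        const double gpu_round_trip_us = 10.0;  // kernel launch + transfer per single eval
        const double cpu_nnue_eval_us  = 0.02;  // small integer net on one core

        // Upper bound on unbatched evaluations per second in each case.
        std::printf("GPU, one position per call: ~%.0f evals/s\n", 1e6 / gpu_round_trip_us);
        std::printf("CPU NNUE, inline in search: ~%.0f evals/s\n", 1e6 / cpu_nnue_eval_us);
        return 0;
    }

Batching hundreds of positions per GPU call (as LC0 does) hides that latency, but that fits MCTS-style search far better than a deep alpha-beta search.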


3D computer games are much more latency-sensitive than chess, and work on GPUs very successfully.

This seems like a solvable problem.


Board game AI works by searching through the state space, evaluating each state with the neural network, and then picking the move with the best expectation.

So it needs to load the comparatively tiny game state (a chess board) into the GPU for each evaluation. The more game states it can evaluate per move, the better it plays, and that can be on the order of millions.
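
To make that concrete, here is a minimal negamax sketch (the types and helpers are hypothetical stubs, not any engine's real API); the point is that evaluate() runs once per leaf, on a state that differs from its parent by a single move:

    #include <algorithm>
    #include <vector>

    // Hypothetical, minimal stand-ins -- not any real engine's data structures.
    struct Position { /* board state would go here */ };
    std::vector<int> legal_moves(const Position&) { return {0, 1, 2}; }  // stub
    void make_move(Position&, int)   {}                                  // stub
    void unmake_move(Position&, int) {}                                  // stub
    int  evaluate(const Position&)   { return 0; }  // the NN / NNUE eval runs here

    // Plain negamax: at useful depths this calls evaluate() millions of times
    // per root move, each time on a position one move away from its parent.
    int negamax(Position& pos, int depth) {
        if (depth == 0)
            return evaluate(pos);
        int best = -1000000;
        for (int move : legal_moves(pos)) {
            make_move(pos, move);
            best = std::max(best, -negamax(pos, depth - 1));
            unmake_move(pos, move);
        }
        return best;
    }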


Is NNUE used by any Go AI? I've only heard of it being used for shogi and chess.


Yes you are correct. I wrote "go" when I meant "shogi" - fixed.


Was the NNUE trained on the CPU though? It's intuitive that, given the small size of a chessboard and the ability to incrementally update evaluations, it will be faster to do neural network evaluation on a CPU. Training, on the other hand, is typically highly parallelizable, so it would be a big development if we were able to do it faster on a CPU.
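
One way to see the asymmetry (an illustrative sketch with assumed shapes, not the actual training code): search-time evaluation is a single small matrix-vector product per node, while training pushes a whole batch through the same layer as a matrix-matrix product, which is exactly what GPUs are good at:

    #include <vector>

    // y = W * x for one dense layer; W is (out x in), X holds `batch` input columns.
    // During search, batch == 1 (a small GEMV that lives in CPU cache); during
    // training, batch is in the hundreds or thousands (a GEMM a GPU can saturate).
    void layer_forward(const std::vector<float>& W, int out, int in,
                       const std::vector<float>& X, int batch,
                       std::vector<float>& Y) {
        for (int b = 0; b < batch; ++b)
            for (int o = 0; o < out; ++o) {
                float acc = 0.0f;
                for (int i = 0; i < in; ++i)
                    acc += W[o * in + i] * X[b * in + i];
                Y[b * out + o] = acc;
            }
    }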



World of difference between "can be run" and "is trained on"


You will find most developers don't actually understand how fast a single x86 core can be when used appropriately. Most are too busy hiding from real hardware concerns to think about such things.



