
I didn’t think of that, but I suspect the conclusion about the model’s performance would have remained the same.


Reading over your article again, it seems like you made the same “mistake” as I did. Specifically, the evals you’re seeing in the Lichess PGN aren’t the raw outputs of the Stockfish evaluation function; they’re the output of the search, which is a highly nonlinear function of millions of evals. If I recall from their docs, it’s something like depth 18, so your chances of distilling it with 20k parameters are essentially zero.

(I put “mistake” in quotes here because DeepMind showed that, with a server farm of TPUs, it is possible to distill the search as well. So it’s not impossible per se.)
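For anyone who wants to check what those PGN evals actually are, here’s a quick sketch using python-chess (the filename is just a placeholder) that pulls out the annotated scores:

    import chess.pgn

    # Lichess annotates moves with [%eval ...] tags; node.eval() parses
    # them into a PovScore. These are *search* scores, not static evals.
    with open("lichess_games.pgn") as f:  # placeholder filename
        while (game := chess.pgn.read_game(f)) is not None:
            for node in game.mainline():
                score = node.eval()  # None for unannotated moves
                if score is not None:
                    print(node.board().fen(), score.white())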

But that’s ok! You’ve created a super fast evaluation function. Instead of trying to distill the depth-18 output, it will be much more realistic to distill the depth-zero output: the static NNUE evaluation. If you rerun those FENs through Stockfish, you can pretty quickly create an easier dataset.
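Something along these lines would do it (just a sketch, assuming a Stockfish binary on your PATH; the output of its non-UCI “eval” debug command varies a bit between versions, so the parsing may need adjusting):

    import re
    import subprocess

    def static_evals(fens, binary="stockfish"):  # assumed path to binary
        sf = subprocess.Popen([binary], stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE, text=True, bufsize=1)
        def send(cmd):
            sf.stdin.write(cmd + "\n")
            sf.stdin.flush()
        scores = {}
        for fen in fens:
            send(f"position fen {fen}")
            send("eval")     # debug command: prints the static evaluation
            send("isready")  # "readyok" marks the end of the eval output
            for line in sf.stdout:
                m = re.search(r"Final evaluation:?\s+([+-]?\d+\.\d+)", line)
                if m:
                    scores[fen] = float(m.group(1))  # pawns, white's view
                if line.startswith("readyok"):
                    break
        send("quit")
        sf.wait()
        return scores

(Positions where the side to move is in check print “none” for the static eval, so they just get skipped here; you’d probably want to drop those from the training set anyway.)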



