
Reminder that AlphaGo and its successors have not solved Go and that reinforcement learning still sucks when encountering out-of-distribution strategies:

https://arstechnica.com/information-technology/2023/02/man-b...




I wouldn't say it sucks. You just need to keep training it for as long as needed. You can use adversarial techniques to generate new paths, and you can feed the winning human strategies back in to improve it further. Hopefully we'll find better approaches, but this is extremely successful and far from sucking.

Sure, Go is not solved yet. But RL is just fine continuing toward that asymptote for as long as we want.
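
To make "adversarial techniques" concrete, here's a toy Python sketch (the payoff matrix and names like `exploit` are made up, not from any Go system): an adversary repeatedly best-responds to the current policy, and the policy trains against those probes, which nudges it toward a strategy with no exploitable weak spot.

    # Toy sketch of adversarial self-play on a random zero-sum matrix
    # game; PAYOFF and exploit() are invented for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    PAYOFF = rng.uniform(-1, 1, size=(5, 5))  # row player's payoff matrix

    policy = np.ones(5) / 5                   # agent's mixed strategy over rows

    def exploit(policy):
        """Adversary: best-respond with the column worst for the agent."""
        return int(np.argmin(policy @ PAYOFF))

    for _ in range(500):
        col = exploit(policy)                  # adversary probes the weak spot
        gains = PAYOFF[:, col]                 # each row's payoff vs that probe
        logits = np.log(policy) + 0.1 * gains  # multiplicative-weights update
        policy = np.exp(logits - logits.max())
        policy /= policy.sum()

    # The policy drifts toward a maximin strategy: its worst-case value
    # rises as the adversary keeps finding (and closing) weaknesses.
    print(policy.round(3), float((policy @ PAYOFF).min()))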

The funny part is that this applies to people too. Masters don't like to play low-ranked players because they're unpredictable, and the Elo hit is not worth the risk. (Which does raise questions about how we really rank people.)
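
The asymmetry falls right out of the standard Elo expected-score formula (K-factor of 16 is an assumption here):

    # Standard Elo expected score; K = 16 and the ratings are assumptions.
    def expected(r_a, r_b):
        return 1 / (1 + 10 ** ((r_b - r_a) / 400))

    K = 16
    master, amateur = 2400, 1400
    e = expected(master, amateur)                           # ~0.997
    print(f"master's gain on a win:  {K * (1 - e):+.2f}")   # ~ +0.05
    print(f"master's loss on a loss: {K * (0 - e):+.2f}")   # ~ -15.95

So the master stands to gain about 0.05 points for winning and to lose about 16 for an upset: the bet is nearly all downside.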


> I wouldn't say it sucks. You just need to keep training it for as long as needed.

As that timeline can approach infinity, just adding extra training may not be a workable answer.


Well, as Yann LeCun said:

"Adversarial training, RLHF, and input-space contrastive methods have limited performance. Why? Because input spaces are BIG. There are just too many ways to be wrong" [1]

One way to attack the problem is to project onto a latent space and then discriminate/predict the best action down there. There's much less feature correlation in latent space than in your observation space. [2]
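
Rough PyTorch sketch of that idea (dimensions and module names are illustrative, not taken from [2]); real systems also add anti-collapse machinery like EMA target encoders or variance regularization, omitted here:

    # Encode observations into a small latent space and compute the
    # prediction loss there instead of in the big input space.
    import torch
    import torch.nn as nn

    obs_dim, latent_dim, act_dim = 128, 16, 4

    encoder = nn.Sequential(
        nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
    predictor = nn.Sequential(
        nn.Linear(latent_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
    opt = torch.optim.Adam(
        [*encoder.parameters(), *predictor.parameters()], lr=1e-3)

    # Stand-in batch of transitions (obs, action, next obs).
    obs, next_obs = torch.randn(32, obs_dim), torch.randn(32, obs_dim)
    act = torch.randn(32, act_dim)

    z = encoder(obs)                              # project observation down
    z_next = encoder(next_obs).detach()           # target latent (stop-gradient)
    z_pred = predictor(torch.cat([z, act], dim=-1))

    loss = ((z_pred - z_next) ** 2).mean()        # loss lives in the small
    opt.zero_grad(); loss.backward(); opt.step()  # latent space, not input space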

[1]: https://x.com/ylecun/status/1803696298068971992
[2]: https://openreview.net/pdf?id=BZ5a1r-kVsf



