
You are right. Still, I think people playing a bandit machine will keep playing longer when they are winning a lot of money than when they are losing it, so when people are involved in games there is a hidden state: the mental state of the player. But if you decide the number of steps up front and don't change your strategy depending on your mood, then this formal algorithm works as stated.



Contextual bandits, on the other hand, let you put your mood in as context (features), so your strategy depends on those features. You still have that simple maximization of expected reward (instead of a brutally hard-to-optimize loss), yet much more flexibility.
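To make that concrete, here is a rough sketch of what I mean, not a reference implementation. It assumes the mood is a small discrete label and uses plain epsilon-greedy with one running reward average per (mood, arm) pair. The class name, the mood labels, and the payout probabilities are all made up for illustration.

```python
import random
from collections import defaultdict


class ContextualEpsilonGreedy:
    """Epsilon-greedy bandit with one reward estimate per (context, arm)."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Per-context pull counts and running mean rewards, created lazily.
        self.counts = defaultdict(lambda: [0] * n_arms)
        self.means = defaultdict(lambda: [0.0] * n_arms)

    def choose(self, context):
        # Explore with probability epsilon, otherwise exploit the
        # best-looking arm for this particular context.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        means = self.means[context]
        return max(range(self.n_arms), key=lambda a: means[a])

    def update(self, context, arm, reward):
        # Incremental update of the running mean for (context, arm).
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        self.means[context][arm] += (reward - self.means[context][arm]) / n


def simulate(steps=5000, seed=0):
    # Hypothetical payout probabilities: the better arm flips with the mood.
    payout = {"calm": [0.8, 0.2], "tilted": [0.2, 0.8]}
    env = random.Random(seed + 1)
    agent = ContextualEpsilonGreedy(n_arms=2, epsilon=0.1, seed=seed)
    for _ in range(steps):
        mood = env.choice(["calm", "tilted"])  # observed context
        arm = agent.choose(mood)
        reward = 1.0 if env.random() < payout[mood][arm] else 0.0
        agent.update(mood, arm, reward)
    return agent


if __name__ == "__main__":
    agent = simulate()
    for mood in ("calm", "tilted"):
        print(mood, [round(m, 2) for m in agent.means[mood]])
```

The point is only that the learner still maximizes an estimated expected reward, but does so separately for each context, so the preferred arm can differ between moods. With real-valued features you would replace the lookup table with a model per arm.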



