Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

One of the coolest programs I ever made was a tic-tac-toe with an AI that learned the optimal strategy by playing itself over and over. I wrote that in the process of learning Lisp, which made writing the program easy, and I got to exercise a lot of the language features.

What was so cool about it was watching it learn, and realizing that unless I programmed it to make random mistakes, it would settle for sub-optimal strategies.



An aside, but in case you weren't aware of it, I believe the AI strategy you describe is known as "simulated annealing".


It's simulated annealing specifically when you start with more tolerance for unfavorable random mistakes, then gradually move toward only accepting random variation that improves.

If you haven't already, check out _Essentials of Metaheuristics_ (http://cs.gmu.edu/~sean/book/metaheuristics/), a free textbook / set of lecture notes. :)


>I believe the AI strategy you describe is known as "simulated annealing".

Not necessarily. You could get much the same effect (the program learning by playing itself over and over) with many Reinforcement Learning algorithms (like TD learning, say).

There are major differences between Value Function (loosely, perceived_state -> perceived_long_term_reward map) based RL algorithms and algorithms that work only in the policy (loosely, the perceived-state -> chosen_action map) space like Simulated Annealing (or Genetic Algorithms). Barto and Sutton (somewhat loosely) use the term "evolutionary algorithms" to distinguish Value Function based algorithms from those that only manipulate policy




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: