One of the coolest programs I ever made was a tic-tac-toe with an AI that learne...

brown9-2 · on Sept 24, 2010

An aside, but in case you weren't aware of it, I believe the AI strategy you describe is known as "simulated annealing".

silentbicycle · on Sept 25, 2010

It's simulated annealing specifically when you start with more tolerance for unfavorable random changes, then gradually move toward accepting only random variation that improves the result.
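To make that acceptance rule concrete, here's a rough sketch, not anyone's actual tic-tac-toe code. The names (`simulated_annealing`, `cost`, `neighbor`) and the schedule parameters are made up for illustration. The geometric cooling and the `exp(-delta / t)` acceptance test are just one common choice, assumed here rather than taken from the thread.

```python
import math
import random


def simulated_annealing(cost, neighbor, start, t_start=1.0, t_end=1e-3,
                        cooling=0.95, steps_per_temp=50, rng=None):
    """Minimize cost(x) by random local moves (illustrative sketch only)."""
    rng = rng or random.Random(0)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    t = t_start  # high temperature: worse moves are accepted fairly often
    while t > t_end:
        for _ in range(steps_per_temp):
            candidate = neighbor(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Improvements are always accepted. A worse move is accepted with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling  # assumed geometric cooling schedule
    return best, best_cost


# Toy usage: find the minimum of (x - 3)^2 starting far away from it.
def toy_cost(x):
    return (x - 3.0) ** 2


def toy_neighbor(x, rng):
    return x + rng.uniform(-1.0, 1.0)


best_x, best_c = simulated_annealing(toy_cost, toy_neighbor, start=-10.0)
```

If you drop the `exp(-delta / t)` branch entirely, you're left with plain hill climbing, which is the "only accept improvements" end state the annealing schedule drifts toward.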

If you haven't already, check out _Essentials of Metaheuristics_ (http://cs.gmu.edu/~sean/book/metaheuristics/), a free textbook / set of lecture notes. :)

plinkplonk · on Sept 25, 2010

>I believe the AI strategy you describe is known as "simulated annealing".

Not necessarily. You could get much the same effect (the program learning by playing itself over and over) with many Reinforcement Learning algorithms (like TD learning, say).

There are major differences between Value Function based RL algorithms (the value function is, loosely, a perceived_state -> perceived_long_term_reward map) and algorithms that work only in policy space (the policy is, loosely, a perceived_state -> chosen_action map), like Simulated Annealing or Genetic Algorithms. Barto and Sutton (somewhat loosely) use the term "evolutionary algorithms" for the latter, to distinguish them from Value Function based algorithms.
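For a feel of what "value function based" means in practice, here's a minimal sketch of tabular TD(0) on a toy problem. It assumes a 5-state random walk rather than tic-tac-toe, and the function name, step size, and episode count are illustrative choices of mine. The point is that the learner keeps a state -> estimated long-term reward table and nudges each entry toward the reward plus the next state's estimate. A policy-space method would keep no such table and would perturb a state -> action map directly.

```python
import random


def td0_random_walk(episodes=20000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate state values for a 5-state random walk with tabular TD(0).

    States 0 and 6 are terminal. Reaching state 6 pays reward 1, everything
    else pays 0. Each step moves left or right with equal probability.
    """
    rng = random.Random(seed)
    # Value table: terminal entries stay at 0, the rest start at a guess of 0.5.
    values = [0.0] + [0.5] * 5 + [0.0]
    for _ in range(episodes):
        state = 3  # every episode starts in the middle
        while state not in (0, 6):
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == 6 else 0.0
            # TD(0) update: move V(state) toward reward + gamma * V(next_state).
            td_target = reward + gamma * values[next_state]
            values[state] += alpha * (td_target - values[state])
            state = next_state
    return values


values = td0_random_walk()
```

With a constant step size the estimates stay a bit noisy, but states closer to the rewarding end should come out with higher values than states closer to the other end.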