
These MIT Tech Review articles, alas, emphasize hype: "No one had ever demonstrated software that could learn to master such a complex task from scratch," and "But until DeepMind’s Atari demo, no one had built a system capable of learning anything nearly as complex as how to play a computer game, says Hassabis."

I think the article must have overlooked significant activity in training learning systems to play games well. The glaring omission for me was Neurogammon (1987), later TD-Gammon (1992), developed by Gerry Tesauro and colleagues (http://en.wikipedia.org/wiki/TD-Gammon).

Neurogammon was, at the time, a sensation at the same conference the article coyly refers to as "a leading research conference on machine learning." The paper has almost 1000 citations. A curious omission.



Aren't these quite different tasks, though? There's a big difference in 'learning to play a specific game well' vs. 'learning to play arbitrary games'; such a big difference that I think they're entirely different disciplines. Correct me if I'm wrong, but the software in the research you reference was given the ruleset to the game, right? And DeepMind's software is not given that information, I think. I doubt they intentionally omitted that work, I think it's more likely they didn't consider it relevant enough.


Thanks for the correction. The "arbitrary" qualifier is not in TFA, but (as, indeed, you said) that's the point of the demo, e.g.: https://www.youtube.com/watch?v=EfGD2qveGdQ Note that they're using just the video signal from the game as input.

It's really a sad comment on the state of reporting at MIT Tech Review that you learn more about the tech from a YouTube video than from the article.

(My complaint is not with the DeepMind people, it's with the article, which should put the work in context.)


> It's really a sad comment on the state of reporting at MIT Tech Review that..

I feel compelled to point out that the only connection between the "MIT" Tech Review and MIT is that the magazine licenses the name from the alumni association. It's how the alumni association funds itself, and every MIT grad gets a lifetime subscription to a version of the magazine with the alumni notes bound into the back. I doubt many of us read it. I don't know how many people other than MIT grads read it, but I would imagine vanishingly few.

A friend of mine calls it "the magazine of things that will never happen," which I think is dead on. It's a shame because the editor, Jason Pontin, is actually a good guy, so it's surprising that the magazine continued to suck after he took it over.

There are many reasons to criticize MIT (don't I know it!) but you can't judge the institute by this magazine.


I'm going to disagree a bit here. Tech Review does tend to focus on the possibilities of technology and to highlight potentially exciting research. Almost by definition, a lot of this stuff is never going to amount to anything commercially interesting. I suppose that TR could insert more implicit or explicit disclaimers to that effect but I find it a good source for insights into what's going on in the labs.

Personally, I think that Jason has brought a lot of positive changes to a magazine that, for a long time, tended toward a technology policy wonkish orientation.

So I think it's fair that a lot of what's written about "will never happen." But I'm not sure that's really avoidable if you cover cutting-edge research.


I like your comments and I am all about holding Tech Review to a high standard, but I think I am going to side with them on this. The key part is "from scratch". I'd venture that there are lots of AI projects that are similarly relevant precursors to DeepMind's work, all of which (including your backgammon examples) do not actually accomplish the same from-scratch abilities described.


The thing about Tesauro's backgammon work that excited the community is that the system trained by playing itself (http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node108.htm... -- "To apply the learning rule we need a source of backgammon games. Tesauro obtained an unending sequence of games by playing his learning backgammon player against itself.").

Also, it didn't use an elaborate set of features and heuristics adapted for backgammon, just a simple representation of the state of the board (a list of 0/1 variables encoding how many pieces of each color are on each position).

This is pretty close to "from scratch", and I think the article would have done well to point out what is actually new here.
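To make the "simple representation" concrete, here's a rough sketch of a truncated-unary board encoding in the spirit of what Tesauro used. The unit counts and thresholds below are illustrative, not the exact original scheme: each point on the board gets a few 0/1 units per color saying "at least 1, at least 2, at least 3 checkers," plus a graded overflow unit.

```python
def encode_point(n_checkers):
    """Encode one color's checker count on one point as 4 input units."""
    units = [1.0 if n_checkers >= k else 0.0 for k in (1, 2, 3)]
    units.append(max(n_checkers - 3, 0) / 2.0)  # graded unit for overflow
    return units

def encode_board(white_counts, black_counts):
    """Flatten per-point counts for both colors into one input vector."""
    vec = []
    for n in white_counts:
        vec.extend(encode_point(n))
    for n in black_counts:
        vec.extend(encode_point(n))
    return vec

# 24 points per color, 4 units each -> 192 inputs in this sketch
board = encode_board([0] * 24, [0] * 24)
```

The point is that nothing here knows anything about backgammon strategy; it's just raw occupancy, and the network had to figure out the rest.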


They are different tasks, but I'm not seeing any really clear descriptions of what DeepMind can and cannot do. It's possible that their software is only good at a very specific kind of thing that happened to be in line with what Google wants. And for all we know, they could be at a complete loss as to how to progress.

I mean, to what extent did they restrict what it means to be an "arbitrary game"? I highly doubt their software can play Pictionary, for instance, but I haven't found anything that really explains their limitations.

Because of this, I am leaning towards the cynical view and assuming it's just hype, and not actually that incredible.


It's basically the same algorithm, or at least very similar. The main difference is they use huge neural networks running on GPUs, and they feed it raw video data, rather than the game board state directly.

It's not any less impressive though, to my knowledge no one had done anything like that before. That is, beating video games with raw video data and reinforcement learning.
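The shared core is temporal-difference learning. TD-Gammon used TD(lambda) with a small network and DeepMind's agent uses a deep convolutional network over pixels, but the update both are built on looks roughly like this minimal tabular sketch (all names here are illustrative):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One temporal-difference update toward r + gamma * max_a' Q(s', a').

    Q is a dict mapping (state, action) pairs to estimated values.
    """
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    target = r + gamma * best_next          # bootstrapped value estimate
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (target - old)  # move part way toward target
    return Q[(s, a)]
```

Swap the dict for a neural network approximating Q and you're most of the way to describing both systems at this level of abstraction.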


Did they hard-code the rules of backgammon into the software, or only the board state? I think there's a sort of conceptual ladder: visual input --> game state --> game rules --> game strategy, and it's very important to specify which rungs the software started on.


Just the position of the pieces on the board. They did give it some other features to help it. I forget exactly what they were, but it was just simple stuff calculated directly from the board state.


From what I've read, DeepMind's approach is to just feed in the raw pixel data and the score. No rules, or anything like that.


Here's the original research paper if you're interested.

http://arxiv.org/abs/1312.5602

I'll just quote their introduction instead of trying to summarize the paper:

"Our goal is to create a single neural network agent that is able to successfully learn to play as many of the games as possible. The network was not provided with any game-specific information or hand-designed visual features, and was not privy to the internal state of the emulator; it learned from nothing but the video input, the reward and terminal signals, and the set of possible actions—just as a human player would. Furthermore the network architecture and all hyperparameters used for training were kept constant across the games. So far the network has outperformed all previous RL algorithms on six of the seven games we have attempted and surpassed an expert human player on three of them."
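One mechanism the paper leans on heavily is experience replay: transitions (state, action, reward, next state, terminal) are stored in a buffer and the network trains on random minibatches drawn from it, which breaks the correlation between consecutive frames. A minimal sketch (capacity and batch size here are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Store transitions and sample decorrelated minibatches for training."""

    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off

    def add(self, state, action, reward, next_state, terminal):
        self.buffer.append((state, action, reward, next_state, terminal))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```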


My gripe is with the post, not the paper. But you're right, the best way to figure out what's new is to go to the source.

The paper does a good job going over related work (section 3), beginning with the example I gave.


Indeed. I am also surprised that no one mentioned Tom Murphy's Sigbovik paper from April 1st 2013 - "The First Level of Super Mario Bros. is Easy with Lexicographic Orderings and Time Travel ... after that it gets a little tricky" http://www.cs.cmu.edu/~tom7/mario/mario.pdf

Murphy created an agent that can play arbitrary games by inspecting the RAM and attempting to maximize the score.

See also this writeup on Ars Technica - http://arstechnica.com/gaming/2013/04/this-ai-solves-super-m...
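The "lexicographic ordering" trick is roughly: watch RAM during a human playthrough, keep byte locations whose values never decrease (score counters, level numbers, screen position), and judge a game state as "better" by comparing those bytes lexicographically. This is a simplification of the actual learnfun algorithm, but the gist looks like:

```python
def nondecreasing_locations(snapshots):
    """Byte indices whose value never decreases across the playthrough."""
    n = len(snapshots[0])
    keep = []
    for i in range(n):
        values = [snap[i] for snap in snapshots]
        if all(a <= b for a, b in zip(values, values[1:])):
            keep.append(i)
    return keep

def better(state_a, state_b, locations):
    """True if state_b improves on state_a under the lexicographic order."""
    return [state_b[i] for i in locations] > [state_a[i] for i in locations]
```

The playing half (playfun) then searches for input sequences that make the memory "better" under this learned ordering.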


While interesting, he uses a brute-force approach (try every possible combination of moves a few seconds into the future and see which one is the best).



