Hacker News
Playing Atari with Deep Reinforcement Learning [pdf] (toronto.edu)
39 points by plurby on Dec 12, 2014 | hide | past | favorite | 7 comments



Here's the video of DeepMind's artificial intelligence @ FDOT14: https://www.youtube.com/watch?v=EfGD2qveGdQ


Is there any example code available for this, or for a similar task being "solved" using deep reinforcement learning?



Thanks!


Academic papers use a lot of formulas and equations that can be easily described in English, but instead are described mathematically.

I often find that when I read them, this really slows me down.

"We refer to a neural network function approximator with weights θ as a Q-network. A Q-network can be trained by minimising a sequence of loss functions L_i(θ_i) that changes at each iteration i, L_i(θ_i) = E_{s,a∼ρ(·)}[(y_i − Q(s, a; θ_i))^2]"

I get to a place like this and I really have to stop and let it process. Ok, theta is the set of weights for the network function approximator. L_i is the loss function at iteration i. Wait, what is rho? Something about expected return. Wait, what is y? Oh, ok, it's just E_{s'∼E}[r + γ max_a' Q(s', a'; θ_{i−1}) | s, a], so that's, I guess, the equation before this one, using the weights from the previous iteration. So I guess we take that, minus the function approximator's output under the current iteration's weights, and square it? Oh, wait, it says ρ(s, a) is the behaviour distribution, i.e. the probability distribution over s and a.

So I guess I kind of get the point of the loss function? I think it measures the difference between the current iteration's approximated value and the target value built from the previous iteration's approximation.
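For what it's worth, the loss is easier (for me at least) to read as a few lines of code than as notation. Here's a minimal sketch of my reading of it. This is not the paper's code: the linear "Q-network", the variable names, and the single sampled transition are all made up for illustration.

```python
import numpy as np

# Toy stand-in for a Q-network: Q(s, a; theta) = theta[a] . s,
# one weight vector per action. The real paper uses a convnet.
rng = np.random.default_rng(0)
n_actions, state_dim, gamma = 2, 4, 0.99

theta = rng.normal(size=(n_actions, state_dim))       # current weights theta_i
theta_prev = rng.normal(size=(n_actions, state_dim))  # previous weights theta_{i-1}

def q_values(weights, s):
    """Q(s, a) for every action a under the given weights."""
    return weights @ s

# One transition (s, a, r, s') sampled from the behaviour distribution rho.
s = rng.normal(size=state_dim)
a, r = 1, 0.5
s_next = rng.normal(size=state_dim)

# Target y_i = r + gamma * max_a' Q(s', a'; theta_{i-1})
y = r + gamma * q_values(theta_prev, s_next).max()

# Squared error (y_i - Q(s, a; theta_i))^2 -- the inside of L_i(theta_i).
# The expectation over rho is approximated by averaging this over many
# sampled transitions; gradient descent on theta then minimises it.
loss = (y - q_values(theta, s)[a]) ** 2
```

So the "sequence of loss functions" is just: freeze the old weights, build a one-step bootstrapped target with them, and regress the current network toward that target.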

I guess I wonder if there are other people, even people who study in the same domain who can read a paper like that and just understand the formulas more easily than if it were explained a bit more in English.

I mean, I don't typically work with machine learning, and it's been a long while since I've been in academia, but even when I was an undergrad, the formulas I included were mostly there because I felt that good papers include confusing formulas. They were always valid and important, but presenting as little explanation as necessary almost seemed like a challenge to other people, a way to show I was smarter than them. It also meant that when I read papers by peers, even though I knew what all the parts meant, I still had to sit down and think through the formulas.

I don't mean that this is the intention of this paper. But since I'm not very familiar with the conventions used when talking about artificial neural networks and machine learning, it's especially apparent how challenging it is to understand; once I do sort through the jargon and process the intent, though, I start to understand it.

I guess I just wonder whether someone reasonably familiar with the subject matter can read those things without issue (in which case, that's great, because that's the intended audience) or whether authors could communicate their point more clearly in English, and use the formulas more as reference or proof. Or whether it was kind of like me, and they like to leave it to the reader as a sort of "if it takes you a while to get it, that's your problem, we're smarter."

My guess is that it could be communicated more clearly, and that it's more a result of academic style than anything else. But I do wonder if people who are engrossed in the subject also stumble when it comes to those things.


I work in A.I., in a different area (search). This paper made no sense to me either. I have a sneaking feeling the authors might be trying to fancy up something very simple (an accusation based on nothing more than having read many academic papers and found this one particularly unreadable, when I feel it should lend itself to at least a fairly easy outline of their technique).



