I think the point he makes is that RL doesn't work the way the world believes it does. RL is seen as the holy grail for AGI; he is saying that isn't the case, at least not yet, since RL algorithms don't generalize. And for most other tasks where RL could be used, existing alternatives fare much better.
That's a little unfair to RL. Reinforcement Learning is great even outside of autonomous control. [It's become quite important in NLP, for example.] It should be seen as a valuable set of approaches in a toolbox, not a silver bullet.
> [It's become quite important in NLP, for example.]
[citation needed]
Maybe it's the specific NLP tasks I've been paying attention to (goal-oriented dialog), but most of the RL-for-NLP work I've seen has not been super impressive.
There's plenty of work applying RL techniques; it's just not actually useful for building real systems.
It's basically in the same state as the rest of this blog post: it's so horribly sample-inefficient that if you're paying annotators you're usually better off with supervised learning. Or you're doing REINFORCE against some proxy metric or simulator, which you're probably over-fitting to rather than actually improving your system.
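To make the "REINFORCE for a proxy metric" pattern concrete, here's a minimal sketch (my own illustration, not from the post or the thread): a toy PyTorch policy samples a token sequence, a cheap unigram-overlap score stands in for BLEU as the reward, and the loss is the policy-gradient objective -(reward - baseline) * log p(sequence). The model, vocabulary size, reference tokens, and metric are all made-up assumptions.

```python
# Illustrative sketch of REINFORCE against a proxy metric in NLP.
# Everything here (TinyDecoder, proxy_reward, the constants) is hypothetical.
import torch
import torch.nn as nn

VOCAB_SIZE, HIDDEN, MAX_LEN = 100, 32, 10

class TinyDecoder(nn.Module):
    """Unconditional GRU language model acting as the 'policy'."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.gru = nn.GRUCell(HIDDEN, HIDDEN)
        self.out = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, max_len=MAX_LEN):
        h = torch.zeros(1, HIDDEN)
        tok = torch.zeros(1, dtype=torch.long)  # token id 0 as <bos>
        log_probs, tokens = [], []
        for _ in range(max_len):
            h = self.gru(self.embed(tok), h)
            dist = torch.distributions.Categorical(logits=self.out(h))
            tok = dist.sample()                 # sample next token
            log_probs.append(dist.log_prob(tok))
            tokens.append(tok.item())
        return tokens, torch.stack(log_probs).sum()

def proxy_reward(tokens, reference):
    """Stand-in for BLEU: fraction of reference unigrams the sample covers."""
    return len(set(tokens) & set(reference)) / len(set(reference))

policy = TinyDecoder()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
reference = [5, 7, 9, 11, 13]                   # made-up "gold" tokens

baseline = 0.0
for step in range(200):
    tokens, log_prob = policy()
    reward = proxy_reward(tokens, reference)
    baseline = 0.9 * baseline + 0.1 * reward    # running-mean baseline
    loss = -(reward - baseline) * log_prob      # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point of the sketch is how little the reward signal says per sample: one scalar per whole generated sequence, which is exactly why this needs so many samples and why over-fitting to the proxy metric is easy.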
The Alexa Prize is basically the only situation where you're getting anywhere near enough reward signal to be meaningful; everywhere else the feedback is too sparse to help, given the sample inefficiency.
Which is why I disagree with the characterization that it's important. There's a lot of it, and it can squeeze out a little extra performance on whatever dataset you're looking at, but it's had nowhere near the impact of, say, word vectors or (Bi)LSTMs.