
Here it is: https://arxiv.org/abs/1611.06188

The RNN outputs a "confidence" bit which can guide the computation to perform more steps and gain more confidence in the result. Essentially, the RNN asks to "let me think about that some more".

But a separate ablation study found that if you just drop the confidence bit altogether and let the RNN compute some more every time (e.g., always perform 4 computations on a single input for 1 output), you get the same or better results without the extra training complexity.
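To make the two variants concrete, here is a toy sketch of both policies: a confidence-gated loop that keeps stepping until the bit fires, and the ablation that always runs a fixed number of steps. All names, shapes, and the threshold are my own illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy RNN cell: a hidden-state update plus a scalar "confidence" output in [0, 1].
H = 8  # hidden size (made up for the sketch)
W_h = rng.normal(scale=0.1, size=(H, H))
W_x = rng.normal(scale=0.1, size=(H, H))
w_conf = rng.normal(scale=0.1, size=H)

def step(h, x):
    h_new = np.tanh(h @ W_h + x @ W_x)
    confidence = 1.0 / (1.0 + np.exp(-(h_new @ w_conf)))  # sigmoid of a learned projection
    return h_new, confidence

def adaptive_steps(h, x, threshold=0.9, max_steps=8):
    """Confidence-gated: keep computing on the same input until the bit fires."""
    for n in range(1, max_steps + 1):
        h, conf = step(h, x)
        if conf > threshold:
            break
    return h, n

def fixed_steps(h, x, n_steps=4):
    """The ablation: always run a fixed number of steps per input."""
    for _ in range(n_steps):
        h, _ = step(h, x)
    return h

x = rng.normal(size=H)
h0 = np.zeros(H)
h_adaptive, steps_used = adaptive_steps(h0, x)
h_fixed = fixed_steps(h0, x)
```

The ablation trades the learned halting signal for a constant unroll depth, which is why there is no extra training machinery.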

There is also a Microsoft Research paper I can't find right now about variable computation for image classification, where there is a "confidence" bit at some of the later layers: if a lower layer is confident enough, its output is used for classification directly; otherwise that layer's output is passed through further transformations in the upper layers.
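The early-exit idea can be sketched like this: each block gets its own classifier head, and if a head's top softmax probability clears a threshold, classification stops at that depth. This is just my illustration of the mechanism described above; the dimensions, threshold, and structure are assumptions, not taken from the Microsoft paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative early-exit classifier: 3 blocks, each with its own head.
D, C = 16, 3  # feature dim and class count (made up for the sketch)
blocks = [rng.normal(scale=0.3, size=(D, D)) for _ in range(3)]
heads = [rng.normal(scale=0.3, size=(D, C)) for _ in range(3)]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_early_exit(x, threshold=0.8):
    """Return (predicted class, depth used); exit at the first confident head."""
    for depth, (W, head) in enumerate(zip(blocks, heads), start=1):
        x = np.tanh(x @ W)
        probs = softmax(x @ head)
        if probs.max() > threshold:          # confident enough: use this layer's output
            return int(probs.argmax()), depth
    return int(probs.argmax()), depth        # fall through to the deepest head

label, depth_used = classify_early_exit(rng.normal(size=D))
```

Easy inputs exit at shallow depths and skip the upper layers entirely, which is where the compute savings come from.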



> But a separate ablation study found that if you just drop the confidence bit altogether and let the RNN compute some more every time (e.g., always perform 4 computations on a single input for 1 output), you get the same or better results without the extra training complexity.

Did they say what happens if you do both? Perhaps the "benefit from more computation per cycle" phenomenon and the "benefit from signalling relative computation-resource allocation" phenomenon are distinct.

I guess I’ll have to try and read the paper, but I’m new to the literature and am clueless about the current state of research.



