
Here it is: https://arxiv.org/abs/1611.06188

The RNN outputs a "confidence" bit which can guide the computation to perform more steps and gain more confidence in the result. Essentially, the RNN asks to "let me think about that some more".

But a separate ablation study found that if you just drop the confidence bit altogether and let the RNN compute some more every time (e.g., always perform 4 computations on a single input for 1 output), you get the same or better results without the extra training complexity.
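To make the two variants concrete, here is a toy sketch of both policies: a confidence-gated loop that keeps stepping until the bit fires, and the ablation that always runs a fixed number of steps. All names, shapes, and the threshold are my own illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy RNN cell: a hidden-state update plus a scalar "confidence" output in [0, 1].
H = 8  # hidden size (made up for the sketch)
W_h = rng.normal(scale=0.1, size=(H, H))
W_x = rng.normal(scale=0.1, size=(H, H))
w_conf = rng.normal(scale=0.1, size=H)

def step(h, x):
    h_new = np.tanh(h @ W_h + x @ W_x)
    confidence = 1.0 / (1.0 + np.exp(-(h_new @ w_conf)))  # sigmoid of a learned projection
    return h_new, confidence

def adaptive_steps(h, x, threshold=0.9, max_steps=8):
    """Confidence-gated: keep computing on the same input until the bit fires."""
    for n in range(1, max_steps + 1):
        h, conf = step(h, x)
        if conf > threshold:
            break
    return h, n

def fixed_steps(h, x, n_steps=4):
    """The ablation: always run a fixed number of steps per input."""
    for _ in range(n_steps):
        h, _ = step(h, x)
    return h

x = rng.normal(size=H)
h0 = np.zeros(H)
h_adaptive, steps_used = adaptive_steps(h0, x)
h_fixed = fixed_steps(h0, x)
```

The ablation trades the learned halting signal for a constant unroll depth, which is why there is no extra training machinery.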

There is also a Microsoft Research paper I can't find right now about variable computation for image classification, where there is a "confidence" bit at some of the later layers: if a lower layer is confident enough, its output is used for classification directly; otherwise that layer's output is passed through further transformations in the upper layers.
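The early-exit idea can be sketched like this: each block gets its own classifier head, and if a head's top softmax probability clears a threshold, classification stops at that depth. This is just my illustration of the mechanism described above; the dimensions, threshold, and structure are assumptions, not taken from the Microsoft paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative early-exit classifier: 3 blocks, each with its own head.
D, C = 16, 3  # feature dim and class count (made up for the sketch)
blocks = [rng.normal(scale=0.3, size=(D, D)) for _ in range(3)]
heads = [rng.normal(scale=0.3, size=(D, C)) for _ in range(3)]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_early_exit(x, threshold=0.8):
    """Return (predicted class, depth used); exit at the first confident head."""
    for depth, (W, head) in enumerate(zip(blocks, heads), start=1):
        x = np.tanh(x @ W)
        probs = softmax(x @ head)
        if probs.max() > threshold:          # confident enough: use this layer's output
            return int(probs.argmax()), depth
    return int(probs.argmax()), depth        # fall through to the deepest head

label, depth_used = classify_early_exit(rng.normal(size=D))
```

Easy inputs exit at shallow depths and skip the upper layers entirely, which is where the compute savings come from.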



> But a separate ablation study found that if you just drop the confidence bit altogether and let the RNN compute some more every time (e.g., always perform 4 computations on a single input for 1 output), you get the same or better results without the extra training complexity.

Did they say what happens if you do both? Perhaps the "benefit from more computation per cycle" phenomenon and the "benefit from signalling relative computation-resource allocation" phenomenon are distinct.

I guess I’ll have to try and read the paper, but I’m new to the literature and am clueless about the current state of research.



