While I agree with the beginning of your post, you lost me here:
> The bitter lesson [1] is going to eventually come for all of these. Eventually we'll figure out how to machine-learn the heuristic rather than hard code it.
Finding the patterns that get inefficiently re-learned over and over, and encoding them explicitly as smart inductive biases for better sample efficiency, is what ML research is.
The "bitter lesson" doesn't mean "throw the towel and your brain and just buy more GPUs". It means that the inductive biases / modeling strategies that win will always be the ones that are more hardware-friendly.
I agree with you that learning certain things is wasteful.
For instance, one could imagine an RNN that learned to do some approximation of tree search for playing games like Chess and Go. But we have very good reason to think that tree search is basically exactly what you want, so even systems like AlphaGo implement the tree search outside the neural net, while still using a learned model to heuristically guide that search.
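For concreteness, here's a minimal sketch (not AlphaGo's actual code) of what "a learned model guiding the search" looks like: a PUCT-style selection rule where a policy network's prior steers which branch of the hand-coded tree search to expand next. The `Node` class, the `c_puct` value, and the `+ 1` smoothing are illustrative assumptions.

```python
import math

class Node:
    """One state in the search tree."""
    def __init__(self, prior):
        self.prior = prior        # probability the policy net assigns to the move leading here
        self.visit_count = 0
        self.value_sum = 0.0
        self.children = {}        # move -> Node

    def value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node, c_puct=1.5):
    """PUCT-style selection: prefer moves with a high running value estimate,
    but explore moves the learned policy likes that have few visits so far."""
    total = sum(child.visit_count for child in node.children.values())
    def score(child):
        exploration = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visit_count)
        return child.value() + exploration
    return max(node.children.items(), key=lambda kv: score(kv[1]))
```

The learned parts are only the prior (and, in full systems, the leaf value estimate); the search procedure itself stays hand-coded, which is exactly the division of labor being described.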
The reference to the bitter lesson here is that feature engineering has, thus far, typically lost out to more general end-to-end methods in the long run.
This paper tries to do feature engineering by hand-coding an exponentially decaying mechanism, where tokens further in the past are assumed to be less important.
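The paper isn't named here, so purely as an illustration of what such a hand-coded decay might look like: causal attention where a token at distance d in the past has its weight multiplied by gamma**d. The value of gamma and the multiplicative form are my assumptions, not necessarily what the paper does.

```python
import numpy as np

def decayed_causal_attention(q, k, v, gamma=0.95):
    """Single-head causal attention with a hand-coded exponential decay:
    a token d positions in the past gets its attention weight scaled by gamma**d.
    q, k, v have shape (seq_len, dim). gamma is an illustrative choice."""
    seq_len, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)                                   # (seq_len, seq_len) logits
    dist = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]  # dist[i, j] = i - j
    causal = dist >= 0
    # adding d * log(gamma) to a logit multiplies the softmax weight by gamma**d
    scores = np.where(causal, scores + dist * np.log(gamma), -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Contrast this with letting the model learn end-to-end how much to attend to distant tokens; the claim in this thread is that the learned version tends to win out as scale grows.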
My comment is that this type of hand-engineering will lose out to methods that are more end-to-end learned. These methods do not necessarily need to be hugely computationally intensive ("buy more GPUs").
That said, I could see it being the case that in the short term we do just buy more GPUs and learn a general end-to-end algorithm, and only later figure out how to re-implement that learned algorithm in code significantly more efficiently.
By and large, we don't really know what inductive biases we ought to be shoving into models. Sometimes we think we do, but we're wrong more often than not. So methods with the fewest inductive biases work better.