
My guess is that a tiny LSTM is less computationally expensive than a tiny transformer. Since the paper proposes replacing every weight with this tiny shared net, you want it to be as tiny and performant as possible.
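For a ballpark sense of the sizes involved, here's a rough PyTorch sketch (the hidden size of 8 is an arbitrary "tiny" choice for illustration, not from the paper). Note the transformer block also pays an attention cost over the whole sequence on every call, while the LSTM cell is a fixed-size recurrent update:

    import torch.nn as nn

    def n_params(m):
        return sum(p.numel() for p in m.parameters())

    # a "tiny" recurrent cell vs. a comparably tiny transformer block
    tiny_lstm = nn.LSTMCell(input_size=8, hidden_size=8)
    tiny_transformer = nn.TransformerEncoderLayer(
        d_model=8, nhead=1, dim_feedforward=16, batch_first=True)

    print(n_params(tiny_lstm))         # 576 parameters
    print(n_params(tiny_transformer))  # 600 parameters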


Transformers are more complicated than RNNs and require more fine-tuning. I'm guessing the RNNs were used to simplify the problem. I'm not even sure transformers would work here, given where they're dropping them into the process.


If you want it tiny and performant and choose an LSTM over transformers, why not skip the LSTM and use GRUs?
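Back-of-the-envelope (again a PyTorch sketch with an illustrative hidden size of 8): a GRU cell has three gates to the LSTM's four, so at the same hidden size it carries roughly 25% fewer parameters:

    import torch.nn as nn

    def n_params(m):
        return sum(p.numel() for p in m.parameters())

    print(n_params(nn.LSTMCell(8, 8)))  # 576
    print(n_params(nn.GRUCell(8, 8)))   # 432, ~25% smaller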



