The current versions of RWKV slowly go insane on sequences that are too long: the state gradually diverges once you run past the context length the model was trained on. They are experimenting with ways to avoid this, though: https://github.com/Blealtan/RWKV-LM-LoRA/tree/dev-infctx
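Roughly, here's a toy sketch of the failure mode (a made-up scalar linear recurrence standing in for RWKV's per-channel state update, with hypothetical decay values and context length, not the actual WKV kernel): a decay below 1 makes the state saturate, but an effective gain even slightly above 1 looks harmless at training-length sequences and blows up far past them.

```python
# Toy illustration only -- not RWKV's real state update.
# s_{t+1} = decay * s_t + x is a stand-in for a per-channel recurrent state.
def run_state(decay, steps, x=1.0):
    """Iterate the recurrence for `steps` steps and return |state|."""
    s = 0.0
    for _ in range(steps):
        s = decay * s + x
    return abs(s)

train_ctx = 1024  # hypothetical training context length
for decay in (0.99, 0.999, 1.0005):
    short = run_state(decay, train_ctx)
    long_ = run_state(decay, 16 * train_ctx)
    print(f"decay={decay}: |state| at {train_ctx} steps = {short:,.0f}, "
          f"at {16 * train_ctx} steps = {long_:,.0f}")
```

Running this, the two stable decays level off at roughly the same magnitude at 1k and 16k steps, while decay=1.0005 is only ~1,300 at training length but in the millions at 16x that, which is the "fine in training, insane at inference" pattern in miniature.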