> RWKV

The current versions of RWKV slowly go insane when exposed to sequences that are too long: the recurrent state gradually diverges once the sequence runs past the context length the model was trained on. They are experimenting with ways to avoid this, though: https://github.com/Blealtan/RWKV-LM-LoRA/tree/dev-infctx
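
To make the mechanism concrete, here is a rough toy sketch of an RWKV-style wkv recurrence in plain numpy. The names (wkv_recurrence, k, v, w, u) are mine, and the real kernel has per-layer structure and numerical-stability tricks that are omitted here; the point is just that the state is only decayed and accumulated, never reset, so at positions far beyond the training length it can wander into ranges the rest of the network never saw:

    import numpy as np

    def wkv_recurrence(k, v, w, u):
        # Toy sketch, not the real RWKV kernel.
        # k, v: (T, C) key/value sequences; w, u: (C,) decay and current-token bonus.
        T, C = k.shape
        a = np.zeros(C)          # running weighted sum of values
        b = np.zeros(C)          # running normalizer
        out = np.empty((T, C))
        for t in range(T):
            coef = np.exp(u + k[t])
            # output mixes the accumulated state with the current token
            out[t] = (a + coef * v[t]) / (b + coef)
            # state update: exponential decay plus the new contribution;
            # nothing ever resets a or b, they just keep evolving step after step
            a = np.exp(-w) * a + np.exp(k[t]) * v[t]
            b = np.exp(-w) * b + np.exp(k[t])
        return out

    # toy usage: run the recurrence far past a notional training context (say 1024 tokens)
    T_long, C = 8192, 8
    rng = np.random.default_rng(0)
    k, v = rng.normal(size=(T_long, C)), rng.normal(size=(T_long, C))
    w, u = np.abs(rng.normal(size=C)), rng.normal(size=C)
    out = wkv_recurrence(k, v, w, u)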



Can you share more details about the divergence?



