
ChatGPT keeps track of context in a "discussion", but I found that Alpaca 7B has no memory of what was said by the user or the model before, so each question gets answered with a blank slate. Is that true, and if so, is this something that could be taught to the Alpaca model as well?


> ChatGPT keeps track of context in a "discussion", but I found that Alpaca 7B has no memory of what was said by the user or the model before

Neither does GPT. The whole conversation is fed back to it every time. It's a UI/UX trick that gives the impression of a multi-step "conversation" with a stateful system. You can see this when you use the API, where you have to feed the conversation back yourself. This can be replicated in ChatLLaMA.
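
For what it's worth, here's a minimal sketch of that pattern against the OpenAI Python SDK (v1+); the model name is just illustrative. The point is that `history` is the only "memory" there is, and the whole list gets resent on every call:

    # The "conversation" is just a list we grow and resend on every request.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    history = [{"role": "system", "content": "You are a helpful assistant."}]

    def ask(user_message: str) -> str:
        history.append({"role": "user", "content": user_message})
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",   # illustrative model name
            messages=history,        # the entire log goes back each time
        )
        reply = response.choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        return reply

Drop the two append calls and you get exactly the blank-slate behavior the parent describes.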


More than that, I thought it would be a matter of feeding such context back instead of only the new chat/response. Since it's open source on GitHub, you could probably open a PR for that if it's something you would like to work on. Sounds like a good improvement to give "memory" to the sessions.


To add state (memory), you can either:

* inject the running chat log into the prompt

* inject a summary of the chat into the prompt

Both approaches are shown in the sketch below.
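
Something like this, just building the prompt string yourself before handing it to whatever local inference call you have (llama.cpp bindings, HF transformers, etc.); nothing here is tied to a real library:

    # Two ways to smuggle state into a stateless text-completion model.

    def prompt_with_log(chat_log: list[str], question: str) -> str:
        # Option 1: inject the running chat log verbatim.
        return "\n".join(chat_log) + f"\nUser: {question}\nAssistant:"

    def prompt_with_summary(summary: str, question: str) -> str:
        # Option 2: inject a summary of the chat so far.
        return (f"Summary of the conversation so far: {summary}\n"
                f"User: {question}\nAssistant:")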


Or perhaps a progressive summary, where the most recent messages are full fidelity, and older messages get “compressed” into a summary.
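
A rough sketch of that progressive-summary idea, with a placeholder generate() standing in for the actual model call (everything here is hypothetical, not a real API):

    KEEP_VERBATIM = 4  # how many recent turns stay word-for-word

    def generate(prompt: str) -> str:
        # Placeholder: wire this to your actual model (local or API).
        raise NotImplementedError

    def build_prompt(messages: list[str], summary: str, question: str):
        recent = messages[-KEEP_VERBATIM:]
        older = messages[:-KEEP_VERBATIM]
        if older:
            # Fold anything older than the window into the running summary.
            summary = generate(
                "Update this summary with the following older messages:\n"
                f"Summary: {summary}\n" + "\n".join(older)
            )
        prompt = (f"Conversation summary: {summary}\n"
                  + "\n".join(recent)
                  + f"\nUser: {question}\nAssistant:")
        return prompt, summary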

You can also fine-tune the model to incorporate larger amounts of data, but that may be more expensive (and slower).

This kind of sounds like human short-term and long-term memory. Maybe “fine-tuning” is analogous to what happens to our memory when we sleep.


You just explained how human memory works, and I thought about implementing that in a future model that allows for more max input tokens: the further back the text, the more it goes through a "summarize this text: ..." prompt. GPT-4 has a 32k token limit (in its larger variant), so it has the brain of a cat, maybe, but future models will have more max tokens and might be able to have a human-like memory that gets worse the older the memory is.

Alternatives are maybe architectures using LangChain or Toolformer to retrieve "memories" from a database via fuzzy/semantic search. But that's worse, because reasoning would only be done on the retrieved context instead of on all the memories it ever had.
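
A bare-bones version of that retrieval idea, assuming some embed() function you'd plug in (it's a placeholder for any sentence-embedding model); the ranking is plain cosine similarity:

    import numpy as np

    memories: list[tuple[np.ndarray, str]] = []

    def embed(text: str) -> np.ndarray:
        # Placeholder: plug in a real embedding model here.
        raise NotImplementedError

    def remember(text: str) -> None:
        memories.append((embed(text), text))

    def recall(query: str, k: int = 3) -> list[str]:
        # Return the k stored memories most similar to the query.
        q = embed(query)
        def cosine(v: np.ndarray) -> float:
            return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        ranked = sorted(memories, key=lambda m: cosine(m[0]), reverse=True)
        return [text for _, text in ranked[:k]]

And the trade-off you mention is real: the model only reasons over whatever recall() happens to surface.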



