
While you don't strictly "need" a vector db to do RAG, as others have pointed out, vector databases excel when you're dealing with natural language - which is ambiguous.

This will be the case when you're exposing an interface that end users can submit arbitrary queries to - such as "how do I turn off reverse braking".

By converting the user's query to a vector before sending it to your vector store, you're capturing the user's actual intent behind their words - which helps you retrieve more relevant context to feed to your LLM when asking it to perform a chat completion, for example.
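A minimal sketch of that flow, assuming the pre-1.0 OpenAI Python SDK and the classic Pinecone client (the index name, keys, and model are placeholders; adapt to whatever embedding model and store you actually use):

    import openai
    import pinecone

    openai.api_key = "YOUR_OPENAI_KEY"            # placeholder
    pinecone.init(api_key="YOUR_PINECONE_KEY",    # placeholder
                  environment="us-west1-gcp")
    index = pinecone.Index("docs")                # hypothetical index of pre-embedded docs

    def retrieve(query, k=5):
        # Embed the raw user query so we compare meaning, not keywords.
        emb = openai.Embedding.create(
            model="text-embedding-ada-002",
            input=query,
        )["data"][0]["embedding"]
        # Nearest-neighbor search over the stored document embeddings.
        res = index.query(vector=emb, top_k=k, include_metadata=True)
        return [m.metadata["text"] for m in res.matches]

    context = retrieve("how do I turn off reverse braking")
    # `context` then gets stuffed into the chat completion prompt.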

This is also important if you're dealing with proprietary or non-public data that a search engine can't see. Context-specific natural language queries are well suited to vector databases.

We wrote up a guide with examples here: https://www.pinecone.io/learn/retrieval-augmented-generation...

And we've got several example notebooks you can run end to end using our free-tier here: https://docs.pinecone.io/page/examples



Ehhh, I don't think you're telling the whole story here. Vectors aren't a complete solution on their own either. Consider a use case like ours where we need to support extremely vague inputs (since users give us extremely vague inputs): https://twitter.com/_cartermp/status/1700586154599559464/

Cosine similarity across vectors isn't enough on its own, but combined with an LLM we get the right behavior. As you mention, without the vector store narrowing down the data we pass to the LLM, hallucinations happen more often. It's a balancing act.

The other nasty one to consider is when people write "how do I not turn off reverse braking". A similarity comparison will score that as very close to the original query, but the intent is the opposite. If implementers aren't careful to account for that, they've got a subtle bug on their hands.
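To make that failure mode concrete, here's a toy check using an off-the-shelf sentence-embedding model (sentence-transformers is just an example; any embedding model shows the same effect):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    q1 = model.encode("how do I turn off reverse braking")
    q2 = model.encode("how do I not turn off reverse braking")

    # Likely prints a very high score (often > 0.9) even though the intent is opposite.
    print(cosine(q1, q2))

A common mitigation is to have the LLM re-read the retrieved passages against the original question rather than trusting the similarity score alone.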


A neat way of dealing with sparse input is to take the entire chat history (if any) into account and ask the LLM to expand the query so that the semantic search has more to work with. More generally, using the LLM to enrich the user query based on context and previous conversation - or having it produce a hypothetical document altogether from the sparse query - can noticeably improve the vectors you use in the similarity search. The main concern with this strategy is latency, since you add another generation hop before you can query the vector db.
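A rough sketch of that expansion hop, assuming the pre-1.0 OpenAI chat API; the prompt wording and the example history are purely illustrative:

    import openai

    history = [
        {"role": "user", "content": "My car keeps engaging reverse braking in the garage."},
        {"role": "assistant", "content": "That is usually controlled from the driver-assist settings..."},
    ]

    def expand_query(chat_history, user_query):
        # Ask the LLM to rewrite the sparse query using the conversation so far,
        # so the embedding has more signal to match against.
        messages = chat_history + [{
            "role": "user",
            "content": "Rewrite the following question as a detailed, self-contained "
                       "search query, using the conversation above for context:\n" + user_query,
        }]
        resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
        return resp["choices"][0]["message"]["content"]

    expanded = expand_query(history, "how do I turn that off?")
    # Embed `expanded` (not the raw query) before hitting the vector store.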


Interesting. Do you have specific examples or a link to a post detailing this?


The approach is based on hypothetical document embeddings (HyDE). Here is a good description of it in the context of langchain: https://python.langchain.com/docs/use_cases/question_answeri...

The original paper proposing this technique can be found here: https://arxiv.org/pdf/2212.10496.pdf
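For reference, a minimal HyDE setup as it looked in the LangChain docs around that time (the API has moved between versions, so treat this as a sketch):

    from langchain.chains import HypotheticalDocumentEmbedder
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.llms import OpenAI

    base_embeddings = OpenAIEmbeddings()
    llm = OpenAI()

    # The LLM writes a hypothetical answer document for the query,
    # and that document (not the raw query) is what gets embedded.
    embeddings = HypotheticalDocumentEmbedder.from_llm(llm, base_embeddings, "web_search")

    vector = embeddings.embed_query("how do I turn off reverse braking")
    # `vector` is then used for the similarity search against your doc store.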



