
While you don't strictly "need" a vector db to do RAG, as others have pointed out, vector databases excel when you're dealing with natural language - which is ambiguous.

This will be the case when you're exposing an interface that end users can submit arbitrary queries to - such as "how do I turn off reverse braking".

By converting the user's query to a vector before sending it to your vector store, you're capturing the user's actual intent behind their words - which helps you retrieve more relevant context to feed to your LLM when asking it to perform a chat completion, for example.
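A minimal sketch of that flow, assuming the pre-1.0 OpenAI Python SDK and the classic Pinecone client (the index name, keys, and model are placeholders; adapt to whatever embedding model and store you actually use):

    import openai
    import pinecone

    openai.api_key = "YOUR_OPENAI_KEY"            # placeholder
    pinecone.init(api_key="YOUR_PINECONE_KEY",    # placeholder
                  environment="us-west1-gcp")
    index = pinecone.Index("docs")                # hypothetical index of pre-embedded docs

    def retrieve(query, k=5):
        # Embed the raw user query so we compare meaning, not keywords.
        emb = openai.Embedding.create(
            model="text-embedding-ada-002",
            input=query,
        )["data"][0]["embedding"]
        # Nearest-neighbor search over the stored document embeddings.
        res = index.query(vector=emb, top_k=k, include_metadata=True)
        return [m.metadata["text"] for m in res.matches]

    context = retrieve("how do I turn off reverse braking")
    # `context` then gets stuffed into the chat completion prompt.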

This is also important if you're dealing with proprietary or non-public data that a search engine can't see. Context-specific natural language queries are well suited to vector databases.

We wrote up a guide with examples here: https://www.pinecone.io/learn/retrieval-augmented-generation...

And we've got several example notebooks you can run end to end using our free-tier here: https://docs.pinecone.io/page/examples



Ehhh, I don't think you're telling the whole story here. Vectors aren't a complete solution on their own either. Consider a use case like ours where we need to support extremely vague inputs (since users give us extremely vague inputs): https://twitter.com/_cartermp/status/1700586154599559464/

Cosine similarity across vectors isn't enough on its own, but combined with an LLM we get the right behavior. As you mention, without the vector store narrowing down the data we pass to the LLM, hallucinations happen more often. It's a balancing act.

The other nasty one to consider is when people write "how do I not turn off reverse braking". A similarity comparison will score that as very close to the original query, but the intent is the opposite. If implementers aren't careful to account for that, they've got a subtle bug on their hands.
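To make that failure mode concrete, here's a toy check using an off-the-shelf sentence-embedding model (sentence-transformers is just an example; any embedding model shows the same effect):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    q1 = model.encode("how do I turn off reverse braking")
    q2 = model.encode("how do I not turn off reverse braking")

    # Likely prints a very high score (often > 0.9) even though the intent is opposite.
    print(cosine(q1, q2))

A common mitigation is to have the LLM re-read the retrieved passages against the original question rather than trusting the similarity score alone.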


A neat way of dealing with sparse input is to take the entire chat history (if any) into account and ask the LLM to expand the query so that the semantic search has more to work with. More generally, using the LLM to enrich the user query based on context and previous conversation - or having it produce a hypothetical document altogether from the sparse query - can noticeably improve the vectors you use in the similarity search. The main concern with this strategy is latency, since you add another generation hop before you can query the vector db.
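A rough sketch of that expansion hop, assuming the pre-1.0 OpenAI chat API; the prompt wording and the example history are purely illustrative:

    import openai

    history = [
        {"role": "user", "content": "My car keeps engaging reverse braking in the garage."},
        {"role": "assistant", "content": "That is usually controlled from the driver-assist settings..."},
    ]

    def expand_query(chat_history, user_query):
        # Ask the LLM to rewrite the sparse query using the conversation so far,
        # so the embedding has more signal to match against.
        messages = chat_history + [{
            "role": "user",
            "content": "Rewrite the following question as a detailed, self-contained "
                       "search query, using the conversation above for context:\n" + user_query,
        }]
        resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
        return resp["choices"][0]["message"]["content"]

    expanded = expand_query(history, "how do I turn that off?")
    # Embed `expanded` (not the raw query) before hitting the vector store.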


Interesting. Do you have specific examples or a link to a post detailing this?


The approach is based on hypothetical document embeddings (HyDE). Here is a good description of it in the context of langchain: https://python.langchain.com/docs/use_cases/question_answeri...

The original paper proposing this technique can be found here: https://arxiv.org/pdf/2212.10496.pdf
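For reference, a minimal HyDE setup as it looked in the LangChain docs around that time (the API has moved between versions, so treat this as a sketch):

    from langchain.chains import HypotheticalDocumentEmbedder
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.llms import OpenAI

    base_embeddings = OpenAIEmbeddings()
    llm = OpenAI()

    # The LLM writes a hypothetical answer document for the query,
    # and that document (not the raw query) is what gets embedded.
    embeddings = HypotheticalDocumentEmbedder.from_llm(llm, base_embeddings, "web_search")

    vector = embeddings.embed_query("how do I turn off reverse braking")
    # `vector` is then used for the similarity search against your doc store.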



