This approach also has limitations. Namely, your ability to retrieve information is bounded by how well embedding search can surface the relevant snippets.
This could be addressed by using different search methodologies, by making multiple GPT requests to summarize the available information, or by using a structured knowledge framework to prepare prompts (instead of just raw text).
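For anyone unfamiliar with the baseline being discussed, here is a minimal sketch of embedding-based snippet retrieval. The embedding vectors themselves are assumed to come from some external model (e.g. an embeddings API); everything here just ranks precomputed vectors by cosine similarity.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k_snippets(query_vec, snippet_vecs, snippets, k=3):
    # Rank snippets by similarity of their embedding to the query embedding
    # and return the k best matches as prompt context.
    scored = sorted(
        zip(snippets, snippet_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]
```

The retrieved snippets are then pasted into the prompt as context, which is exactly where the limitation above bites: anything the similarity ranking misses never reaches the model.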
There's so much scope for creativity and improvement here. That's one of the things that excites me about this technique: it's full of opportunities for exploring new ways of using language models.
In my experience, semantic search is great for finding implicit relationships (bad guy => villain) but sometimes fails in unpredictable ways on more elementary matches (friends => friend). That's why it can be good to combine semantic search with something like BM25, which is what I use in my blog search [1]. N-gram term-frequency algorithms like TF-IDF and BM25 are also lightning fast compared to semantic search.
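For reference, BM25 is simple enough to sketch in a few lines. This is a bare-bones version of the standard Okapi BM25 scoring formula over pre-tokenized documents (a real setup would also handle stemming, stopwords, and an inverted index):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    # docs: list of token lists. Returns one BM25 score per document.
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many docs does each term appear?
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            # Smoothed inverse document frequency.
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            # Term frequency saturation (k1) and length normalization (b).
            denom = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

A simple hybrid is to normalize the BM25 scores and the cosine similarities to [0, 1] and take a weighted sum, so exact lexical matches like "friend" can't be lost by the embedding side.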
gpt_index does that. It builds a tree whose leaves are document chunks and whose parent nodes are increasingly condensed GPT-generated summaries of their children.
The tree is then traversed to find the most relevant chunk, with GPT asked at each level to compare entries by relevance to the question. The traversal ends at an original document chunk, which is given as context in a final prompt asking to answer the query.
This is great and powerful, but not very cost-effective: a traversal takes O(log n) requests to the completion API for n documents.
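The traversal described above can be sketched as follows. This is not gpt_index's actual code; `choose_child` stands in for the real GPT call that picks the most relevant child summary, and the call counter makes the O(log n) cost visible.

```python
class Node:
    def __init__(self, summary, children=None, chunk=None):
        self.summary = summary          # summary text (GPT-generated at build time)
        self.children = children or []  # empty for leaves
        self.chunk = chunk              # original document text, only set on leaves

def retrieve_chunk(root, question, choose_child):
    # Walk from the root to a leaf; at each level, one LLM call picks the
    # child whose summary is most relevant to the question. For a balanced
    # tree over n chunks that is O(log n) completion requests.
    node = root
    calls = 0
    while node.children:
        node = choose_child(question, node.children)
        calls += 1
    return node.chunk, calls
```

The returned chunk is then used as context in one final completion request that answers the query, so the total cost per question is the traversal depth plus one.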
The embedding search is probably necessary for bigger datasets.
Any other ideas that I'm missing?