What is the most difficult part of building a Retrieval Augmented Generation (RAG) system?
HelloRAG
Building a Retrieval Augmented Generation (RAG) system poses several challenges, but perhaps the most daunting are text segmentation and vectorization. Well-organized knowledge bases such as FAQs are relatively straightforward to handle and yield good results. Real-world documents such as regulations, methodologies, and plans are another matter: they tend to be lengthy, with relevant knowledge points scattered throughout. When building knowledge bases from such documents, segmenting them and extracting their content is a real headache, and the retrieval matching results are mediocre at best. How have others approached this?
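One common way to approach the segmentation step for long documents is paragraph-aware chunking with overlap. Chunks are built from whole paragraphs up to a size limit, and the last paragraph(s) of each chunk are repeated at the start of the next, so text near a boundary keeps some of its surrounding context. The sketch below is a minimal illustration under stated assumptions, not the poster's method or any library's API: `chunk_document`, `max_chars`, and `overlap` are hypothetical names, paragraphs are assumed to be separated by blank lines, and size is measured in characters rather than model tokens.

```python
import re
from typing import List


def chunk_document(text: str, max_chars: int = 800, overlap: int = 1) -> List[str]:
    """Hypothetical helper: group whole paragraphs into chunks of at most
    max_chars characters (a single oversized paragraph becomes its own chunk).
    The last `overlap` paragraphs of each chunk are repeated at the start of
    the next chunk, unless doing so would exceed max_chars."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    for para in paragraphs:
        candidate = current + [para]
        if current and len("\n\n".join(candidate)) > max_chars:
            # Close the current chunk and start a new one.
            chunks.append("\n\n".join(current))
            # Guard overlap > 0: current[-0:] would copy the whole list.
            tail = current[-overlap:] if overlap > 0 else []
            current = tail + [para]
            # Drop the carried-over context if it would make the chunk too long.
            if len("\n\n".join(current)) > max_chars:
                current = [para]
        else:
            current = candidate
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    doc = "A" * 300 + "\n\n" + "B" * 300 + "\n\n" + "C" * 300
    parts = chunk_document(doc, max_chars=700, overlap=1)
    print(len(parts))  # → 2 (the "B" paragraph appears in both chunks)
```

Each resulting chunk would then be passed to an embedding model for vectorization (not shown). The values of `max_chars` and `overlap` are placeholders and would need tuning for a given corpus and embedding model.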
Replies
Gurkaran Singh @thestarkster
Segmentation and vectorization can indeed be tricky when building a Retrieval Augmented Generation (RAG) system. Dealing with real-world documents like regulations and plans can feel like finding a needle in a haystack! Handling FAQs may be a walk in the park compared to these intricate challenges. It's no surprise that text segmentation gives you a headache, but hey, the struggle is real for all of us in the RAG game.