Understanding RAG Pipelines
RAG Pipelines in Vectorize
Vectorize RAG Pipelines enable you to quickly and easily build vector search indexes from unstructured data sources like documents, PDFs, and knowledge bases. This feature integrates directly with your own database, giving you complete ownership and control over your data. By converting unstructured data into vector embeddings and storing them in your vector database, the RAG Pipeline enables fast, real-time retrieval of relevant information.
Fast and Easy Setup
With the RAG Pipeline feature, your vector indexes can be fully populated within minutes, allowing you to quickly transform your unstructured data into a searchable format. The setup process is streamlined and efficient, so you can begin querying your data almost immediately, without the need for complex configurations or long delays.
High Observability and Visibility
The RAG Pipeline feature provides real-time visibility into how your data is processed and indexed. As your unstructured data sources change, the pipeline propagates those updates to the vector indexes so that the two stay synchronized. Because the indexes track the latest state of your data, you can be confident in the accuracy and relevance of the information retrieved.
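This page does not describe how Vectorize detects source changes internally. As a rough illustration of the general idea only, one common approach is to fingerprint each source file and compare it against the fingerprint recorded at the last indexing run. The sketch below assumes that approach; the names fingerprint and plan_sync are hypothetical and are not part of any Vectorize API.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a stable content hash for one source file."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(source_files, indexed_fingerprints):
    """Decide which files need re-indexing and which should be removed.

    source_files: dict mapping path -> current file bytes (hypothetical input).
    indexed_fingerprints: dict mapping path -> hash recorded at the last run.
    Returns (paths to upsert, paths to delete, new fingerprint map).
    """
    current = {path: fingerprint(data) for path, data in source_files.items()}
    # New or modified files: the hash is missing or differs from the last run.
    to_upsert = sorted(
        path for path, fp in current.items()
        if indexed_fingerprints.get(path) != fp
    )
    # Files that were indexed before but no longer exist in the source.
    to_delete = sorted(
        path for path in indexed_fingerprints if path not in current
    )
    return to_upsert, to_delete, current
```

Only the files returned in the first list would need to be re-chunked and re-embedded, which keeps a synchronization pass cheap when most of the source is unchanged.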
Key Components of the RAG Pipeline Feature
Data Ingestion and Extraction: The RAG Pipeline can ingest unstructured data from a variety of sources, such as Amazon S3, Google Cloud, or local file systems. The data is then processed to extract meaningful text.
Chunking and Embedding: Once the text is extracted, it is split into chunks, and vector embeddings are generated using models such as OpenAI’s text-embedding-3-small or text-embedding-3-large. These embeddings are stored in your vector database, allowing you to retain full control over your data.
Vector Indexing: The generated embeddings are indexed in your own vector database (e.g., Pinecone, DataStax Astra), enabling fast, efficient real-time retrieval when querying for relevant information.
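The three components above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not Vectorize's implementation: chunk_text uses fixed-size character windows with overlap, toy_embed is a hashed bag-of-words stand-in for a real embedding model such as text-embedding-3-small, and InMemoryIndex stands in for a vector database such as Pinecone or DataStax Astra. All of these names are hypothetical.

```python
import hashlib
import math


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (one simple strategy)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        piece = text[start:start + chunk_size]
        if piece.strip():
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def toy_embed(text, dim=64):
    """Stand-in embedding: hash each token into a bucket, then L2-normalize.

    A real pipeline would call an embedding model here instead.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryIndex:
    """Stand-in for a vector database: stores vectors, ranks by dot product."""

    def __init__(self):
        self._rows = {}

    def upsert(self, row_id, vector, text):
        self._rows[row_id] = (vector, text)

    def query(self, vector, top_k=3):
        # Vectors are unit length, so the dot product is cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(vector, stored)), row_id, text)
            for row_id, (stored, text) in self._rows.items()
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:top_k]


def ingest(doc_id, text, index, chunk_size=200, overlap=50):
    """Chunk one document, embed each chunk, and store it in the index."""
    chunks = chunk_text(text, chunk_size, overlap)
    for n, chunk in enumerate(chunks):
        index.upsert(f"{doc_id}#{n}", toy_embed(chunk), chunk)
    return len(chunks)
```

To retrieve, embed the query with the same function used at ingestion time and pass the result to index.query. Swapping toy_embed for a real model and InMemoryIndex for a real database client would change those two pieces but not the overall flow.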
Benefits
Full Data Ownership: The RAG Pipeline uses your own vector database, so you retain complete control and ownership over your data.
Quick Setup: Populate your vector indexes within minutes, transforming your unstructured data into a searchable format almost immediately.
Real-time Updates: Ensure all changes to your unstructured data sources are reflected in the vector indexes, providing up-to-date information with full visibility.