Vectorize Architecture
Vectorize provides a structured architecture tailored to facilitate the development and deployment of Retrieval-Augmented Generation (RAG) systems at scale. By integrating the processes of data ingestion, vectorization, automatic updates, and real-time retrieval, the platform helps teams work with unstructured data sources efficiently and generate accurate, contextually aware AI responses. This architecture is designed to simplify the complexities involved in building reliable RAG pipelines and deploying them in production environments.
The architecture of Vectorize is built around several core components designed to solve the challenges of managing unstructured data for use in RAG pipelines:
Your data is unique and benchmarks are often non-representative of real-world scenarios or performance. For these reasons, Vectorize offers a data-driven approach to RAG evaluation.
RAG Evaluation starts with you providing a sample of the documents you want to use in your RAG application and configuring the embedding models and chunking strategies you want to test as part of your evaluation.
Vectorize then uses your sample documents to create and populate a vector index for each of these configurations.
In parallel, the RAG Evaluation process also generates a set of synthetic questions from your sample documents. This helps you understand which questions your documents are best suited to answer, so you can decide whether that matches the types of questions you expect your users to ask about these documents.
These questions are then clustered using a k-means clustering technique so you can also quickly assess if the general question topics are in line with your expectations.
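For intuition, here is a minimal, stand-alone sketch of this kind of topic clustering using k-means from scikit-learn. It illustrates the technique only; the TF-IDF features and cluster count are assumptions for the toy example, not Vectorize's internal implementation.

```python
# Illustrative only: cluster synthetic questions into topic groups with k-means.
# TF-IDF features and n_clusters=2 are assumptions for this toy example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

questions = [
    "How do I rotate my API keys?",
    "What file formats can be ingested?",
    "How is my data encrypted at rest?",
    "Which embedding models are supported?",
]

features = TfidfVectorizer().fit_transform(questions)   # vectorize the question text
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

for question, cluster in zip(questions, kmeans.labels_):
    print(f"cluster {cluster}: {question}")             # inspect the topic groupings
```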
Once the question generation pipeline completes and the vector indexes have all been populated, the RAG Evaluation process loops through the questions, performs a retrieval against each vector index, then uses a re-ranking model to assess the relevance and ordering of the returned results.
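Conceptually, the evaluation loop looks something like the sketch below. The `search_index` and `rerank` callables are hypothetical stand-ins for the retrieval and re-ranking steps, not Vectorize APIs.

```python
# Hedged sketch of the per-index evaluation loop. The search_index and rerank
# callables are placeholders: search_index(name, question, top_k) should return
# retrieved chunks, and rerank(question, chunks) should return chunks annotated
# with a "relevance" score.
def evaluate(index_names, questions, search_index, rerank, top_k=5):
    scores = {}
    for index_name in index_names:
        total = 0.0
        for question in questions:
            chunks = search_index(index_name, question, top_k)   # vector retrieval
            reranked = rerank(question, chunks)                   # re-ranking model
            total += sum(c["relevance"] for c in reranked) / len(reranked)
        scores[index_name] = total / len(questions)               # aggregate per index
    return scores
```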
The scores are shown for each of the vector indexes that were included in your RAG Evaluation configuration.
Key benefits:
Extremely Fast: RAG Evaluations in Vectorize typically complete in about a minute, often less. This replaces hours or days of writing custom scripts and doing ad-hoc data analysis.
Data-Driven: You can test different embedding models and chunking strategies to determine which configurations yield the highest retrieval accuracy for your unique data.
Comparative Metrics: The platform provides detailed metrics on performance, including relevance and accuracy scores, to ensure the selected configuration is aligned with the specific use case.
De-risk Your Application: The RAG Evaluation process prevents the costly mistake of building a production pipeline only to discover that it underperforms or introduces hallucinations. Instead, the system is optimized from the start to ensure reliable results.
Once the evaluation is complete, Vectorize provides tools to construct production-ready RAG pipelines. These pipelines are designed to work with unstructured data sources and vector databases, enabling seamless integration with language models for accurate, real-time information retrieval.
Pipeline Components:
Data Ingestion and Extraction: The system can ingest data from a wide variety of sources, including cloud storage (e.g., Amazon S3, Google Cloud), file systems, and APIs. Unstructured data such as PDFs, documents, and knowledge bases are processed and extracted to retrieve meaningful text.
Chunking and Embedding: The extracted text is chunked and vectorized using embedding models. The chunking strategy is critical because it determines how well the system can retrieve relevant information at runtime. Embeddings are generated using models such as OpenAI’s text-embedding-3-large and text-embedding-3-small, ensuring the data is represented semantically for effective vector search (a simplified sketch of this step follows the component list).
Vector Search and Indexing: The resulting embeddings are stored in a vector database (e.g., Pinecone, Elastic, etc.). These databases enable semantic search, where queries are matched to the most contextually relevant data rather than relying on keyword-based searches.
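As a rough illustration of the chunk-then-embed step, here is a simplified Python sketch using a fixed-size word chunker and the OpenAI embeddings API. The chunk size, overlap, and file path are placeholder assumptions; Vectorize's configurable chunking strategies go well beyond this.

```python
# Simplified sketch of chunking a document and generating embeddings.
# Requires the `openai` package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Naive fixed-size word chunker with overlap (illustrative only)."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[start:start + chunk_size]) for start in range(0, len(words), step)]

client = OpenAI()
chunks = chunk_words(open("document.txt").read())  # placeholder input file
response = client.embeddings.create(model="text-embedding-3-small", input=chunks)
vectors = [item.embedding for item in response.data]  # one vector per chunk, ready to index
```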
Unlike traditional batch processing systems, Vectorize operates on an event-streaming architecture using Apache Pulsar. This allows for real-time updates to vector indexes and immediate ingestion of new data as it becomes available. Event streaming ensures that the system can scale dynamically and handle bursts of data efficiently, while also providing better control over write operations to vector databases.
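To make the event-streaming model concrete, the sketch below publishes a change event onto a Pulsar topic using the Apache Pulsar Python client. The broker URL, topic name, and payload shape are assumptions for illustration; Vectorize manages its Pulsar infrastructure for you.

```python
# Illustrative producer: emit a change event onto a Pulsar topic as soon as a
# source document changes. All names and the payload shape are placeholders.
import json
import pulsar

client = pulsar.Client("pulsar://localhost:6650")        # assumed local broker
producer = client.create_producer("change-events")       # assumed topic name

event = {"source": "example-document.pdf", "op": "upsert"}
producer.send(json.dumps(event).encode("utf-8"))          # buffered durably until processed
client.close()
```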
Vectorize also provides a number of built in architectural capabilities to ensure reliable, accurate processing of data into your vector search indexes.
Shock Absorber
Vectorize acts as a buffer between your unstructured data sources and your vector database. This has an important consequence: Vectorize can act as a rate limiter when a sudden surge of changes floods into the system.
This provides a sort of "shock absorber" effect so that your vector database doesn't get overwhelmed in the event of a spike of changes in your unstructured data sources.
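The sketch below illustrates the idea with a simple rate-limited drain loop. The write budget and the `write_batch` callable are illustrative assumptions, not Vectorize's actual implementation.

```python
# Illustrative "shock absorber": drain buffered change events into the vector
# database without exceeding a fixed write budget per second.
import time
from collections import deque

MAX_WRITES_PER_SECOND = 50  # assumed budget for this example

def drain(buffer: deque, write_batch) -> None:
    """Drain buffered events, spreading a spike out over time."""
    while buffer:
        window_start = time.monotonic()
        batch = [buffer.popleft() for _ in range(min(MAX_WRITES_PER_SECOND, len(buffer)))]
        write_batch(batch)                      # placeholder: upsert into the vector database
        elapsed = time.monotonic() - window_start
        if elapsed < 1.0:
            time.sleep(1.0 - elapsed)           # wait out the rest of the one-second window
```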
Guaranteed Delivery and Change Processing
Vectorize uses an architecture that separates the compute and storage of your RAG pipeline. This matters because when Vectorize is acting as a buffer in your system, you don't want that buffer to overflow; when overflows occur, data is lost. In the case of a RAG pipeline, this would mean your vector indexes no longer reliably represent the state of the unstructured data sources they are populated from.
Vectorize uses a distributed ledger to temporarily write multiple copies of change events that occur in your unstructured data until receiving confirmation that the message has been safely processed in your vector search index.
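The following sketch shows the acknowledge-only-after-success pattern this implies, using the Apache Pulsar Python client. The topic, subscription name, and upsert stub are placeholders for illustration.

```python
# Illustrative consumer: only acknowledge a change event after it has been
# safely written to the vector index; otherwise request redelivery.
import pulsar

def upsert_to_vector_db(payload: bytes) -> None:
    """Placeholder: write the change into your vector search index."""
    ...

client = pulsar.Client("pulsar://localhost:6650")
consumer = client.subscribe("change-events", subscription_name="rag-pipeline")

while True:
    msg = consumer.receive()
    try:
        upsert_to_vector_db(msg.data())
        consumer.acknowledge(msg)            # safe to release the ledger copy
    except Exception:
        consumer.negative_acknowledge(msg)   # keep the event for redelivery
```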
This architecture has a number of key advantages in terms of reliable processing in your RAG Pipeline:
It becomes very difficult to accumulate so many change events that the pipeline overflows; the practical limit is the available storage on the public cloud provider.
Every change event is written to multiple ledger nodes so even in the event of a physical node failure on the storage layer, change events won't be lost.
When events are processed, they are deleted from the ledger shortly afterward. Vectorize does not retain or use your data for anything other than processing your RAG Pipeline.
If your vector database is healthy and a spike of change events comes in, the RAG Pipeline compute layer is dynamically scaled to avoid processing delays.
If your vector database becomes unavailable, your RAG Pipeline can store change events until service is restored, ensuring eventual consistency between your unstructured data stores and your vector search indexes.
Each component in the RAG Pipeline is independently scalable and deployed to operate reliably even when failure conditions occur.
Error Handling and Retries
Every distributed system will invariably encounter error conditions. Vector databases will become unavailable, API calls to generate embeddings will fail, networks will experience interruptions. These are just the realities of operating software. While there is no way to guarantee these failures never occur, Vectorize can minimize their impact and gracefully handle these conditions so you don't have to.
When errors occur, the logical thing to do is retry processing. This allows you to determine if the failure was a temporary glitch that self-corrects. However, when a component in your architecture becomes strained, continuous retries can exacerbate the problem, creating a cascading failure scenario where upstream components back up and the overall system stability suffers.
For this reason, Vectorize also leverages dead-letter topics for unrecoverable errors. After reaching the retry threshold in a RAG Pipeline, errors are dead-lettered and the failure condition is surfaced in the form of pipeline notifications. This allows you to investigate and manually reprocess later, once a root cause has been identified and corrected.
Exponential Backoffs
Vectorize handles retries intelligently when failure conditions occur. A retry is attempted immediately after a failure. However, if the failure repeats, the Vectorize RAG Pipeline introduces progressively delayed re-processing of the change event to avoid overwhelming the failing dependency.
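The sketch below shows the general retry-with-backoff and dead-lettering pattern. The retry limit, delays, and helper callables are illustrative assumptions rather than Vectorize's exact behavior.

```python
# Hedged sketch: retry immediately once, then back off exponentially; after the
# retry threshold, hand the event to a dead-letter destination for manual review.
import time

def process_with_backoff(event, process, send_to_dead_letter, max_retries=5):
    delay = 0.0                                  # first retry happens immediately
    for attempt in range(max_retries + 1):
        try:
            process(event)
            return True
        except Exception:
            if attempt == max_retries:
                break                            # retries exhausted
            time.sleep(delay)                    # delayed re-processing on repeat failures
            delay = 1.0 if delay == 0.0 else min(delay * 2, 60.0)
    send_to_dead_letter(event)                   # surface via pipeline notifications
    return False
```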
Scalability: Vectorize’s event-driven architecture dynamically scales ingestion and vectorization processes based on workload. By leveraging Apache Pulsar, the system can handle high data volumes and bursts of change events without overwhelming infrastructure, ensuring consistent performance under varying loads.
Fault Tolerance: The separation of compute and storage layers ensures that the system remains operational even in the face of failures. By replicating change events across multiple ledger nodes, Vectorize guarantees no data is lost, even if a node goes down. Additionally, components in the RAG pipeline are independently scalable and can continue functioning despite failures in other parts of the system.
Resilience to Spikes: Vectorize acts as a rate limiter by buffering incoming data, preventing the vector database from being overwhelmed during spikes in data ingestion. This "shock absorber" effect ensures that changes to your unstructured data sources are processed smoothly without creating bottlenecks.
Guaranteed Delivery: Vectorize uses a distributed ledger to store multiple copies of each change event, ensuring that events are delivered and processed reliably. This guarantees eventual consistency between your unstructured data sources and your vector search indexes, even during system outages or vector database unavailability.
Error Handling and Retries: Vectorize implements intelligent error handling through exponential backoff and retry mechanisms. If a failure occurs, retries are attempted with incremental delays to avoid overwhelming strained components. Dead-letter queues ensure that events that persistently fail can be isolated and reviewed, preventing them from blocking the entire system.
One of the challenges of maintaining a production RAG system is keeping vector indexes synchronized with evolving data sources. The Vectorize architecture includes automated processes for refreshing vector indexes, ensuring that the retrieved data used to answer queries remains accurate and up to date.
Synchronization Process
Change Detection: When changes occur in the data source (e.g., a knowledge base update or new customer records), the system triggers an event that processes the new data and updates the vector database.
Depending on the source system, Vectorize will either poll for updates while your pipeline is running or accept incoming change event notifications (e.g., webhook callbacks).
When a change is detected, Vectorize performs an incremental update of just the portion of the vector search index that is impacted.
When upstream documents and data are deleted, RAG Pipelines in Vectorize can also surgically remove just the entries affected by the deletion (a sketch of this change handling follows below).
Scheduled or Real-Time: Vectorize offers price/performance tunability based on your requirements. For systems that require immediate updates around the clock, pipelines can be configured to run continuously and process change events as soon as they occur. For less sensitive applications, you can configure your RAG Pipeline to update daily, weekly, or on a manual trigger when an update is needed.
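As a rough sketch of the change handling described above, the following function maps an incoming change notification onto either an incremental upsert or a targeted delete. The event shape and the handler callables are illustrative assumptions, not Vectorize's actual internals or API.

```python
# Hedged sketch of incremental change handling. reembed_document, upsert_chunks,
# and delete_chunks are placeholders for the real extraction/embedding/index steps.
def handle_change_event(event: dict, reembed_document, upsert_chunks, delete_chunks) -> None:
    if event["type"] == "deleted":
        # Surgically remove only the entries derived from the deleted document.
        delete_chunks(source_id=event["document_id"])
    else:
        # Re-extract, re-chunk, and re-embed just the changed document, then
        # upsert only the affected portion of the index.
        chunks = reembed_document(event["document_id"])
        upsert_chunks(source_id=event["document_id"], chunks=chunks)
```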
Vectorize populates your vector database. You own the data, you control the data, and you can of course query your vector database directly in your RAG application. However, Vectorize offers a Retrieval Endpoint for each RAG Pipeline that can simplify your application architecture and allow you to achieve better performance without adding complexity to your application.
The retrieval endpoint provides a convenient API that:
Automatically vectorizes the input query and performs a k-ANN search against your vector database
Applies any metadata filtering requested
Provides an abstraction layer between you and your vector database provider, creating a looser coupling should you ever want to change your vector database implementation down the road.
Provides built-in integration with re-ranking models
Includes relevancy scores for the returned chunks and filtering of low relevancy chunks
Is being actively improved with features such as query re-writing, chunk templating and more.
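Putting this together, a call to the retrieval endpoint might look roughly like the sketch below. The URL, header, and field names are placeholders, so consult the Vectorize API reference for the exact request and response shapes for your pipeline.

```python
# Hedged example of calling a pipeline's retrieval endpoint over HTTP.
# The endpoint path, auth header, and payload/response fields are assumptions.
import requests

response = requests.post(
    "https://api.vectorize.example/pipelines/<pipeline-id>/retrieval",  # placeholder URL
    headers={"Authorization": "Bearer <access-token>"},
    json={
        "question": "How do I rotate my API keys?",
        "num_results": 5,                    # assumed parameter names
        "metadata_filters": {"source": "docs"},
    },
    timeout=30,
)
response.raise_for_status()
for chunk in response.json().get("documents", []):   # assumed response shape
    print(chunk.get("relevancy"), chunk.get("text"))
```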
Vectorize's architecture is designed to address the specific challenges of building and scaling RAG systems in production:
RAG Evaluation & Optimization: Vectorize makes it painless and fast to identify the best strategy to vectorize your data.
Handling Unstructured Data: RAG Pipelines are built to work with a wide variety of unstructured data formats, enabling seamless integration of documents, knowledge bases, and customer data, with new connectors being added frequently.
Preventing Stale Vector Indexes: Automated processes ensure that your vector search indexes are kept in sync with data sources, preventing stale or outdated data from being used in AI responses.
Better Retrieval Performance: Built-in capabilities simplify your application architecture while allowing you to achieve high performance in your RAG applications.
Reliable, Resilient Operations: RAG Pipelines are built on a fault-tolerant data architecture that reflects data engineering best practices for operating distributed data pipelines at scale.