RAG Pipeline Management
In this guide, we will walk through the steps to manage an existing RAG pipeline in Vectorize. This includes viewing pipeline details, monitoring its health, and updating its settings as needed.
Getting to the RAG Pipeline Status
Navigate to the RAG Pipelines section in the dashboard, where you will see a list of all configured pipelines. Click the name of the pipeline you want to manage. In this example, we'll select friends-scripts.
RAG Pipeline Overview
Once inside the pipeline view, you will land on the Overview tab. This page gives you a detailed look at the pipeline's performance and configuration.
Pipeline Stats and Header
At the top of the page, you will see key metrics and details about the pipeline:
Total Documents: The number of documents ingested into the pipeline from the source system (e.g., Amazon S3).
Total Vectors: The number of vector embeddings currently stored in the vector database.
Change Today: The number of vectors that were added, updated, or deleted today.
RAG Pipeline Status: The current status of the pipeline (e.g., Listening, Deployed, Backfilling).
These metrics provide an overview of the pipeline's activity and its current operational state.
Monitoring Activity in the Index
The activity graph shows indexing activity over time: how many vectors were added, updated, or deleted on each day. Each type of change has its own bar color, so you can see at a glance how your data is changing.
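Vectorize computes these numbers for you. The sketch below only illustrates how the graph and the Change Today stat relate: per-day counts are broken down by change type, and Change Today is the sum of today's counts. The `VectorChange` record and both helper functions are hypothetical and are not part of any Vectorize API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VectorChange:
    """Hypothetical change record: the day it happened and the kind of change."""
    day: date
    kind: str  # "added", "updated", or "deleted"


def daily_activity(changes):
    """Group change records into per-day counts per change type,
    mirroring the colored bars of the activity graph."""
    per_day = defaultdict(lambda: {"added": 0, "updated": 0, "deleted": 0})
    for change in changes:
        per_day[change.day][change.kind] += 1
    # Return days in chronological order, like the graph's x-axis.
    return dict(sorted(per_day.items()))


def change_today(activity, today):
    """Total vectors added, updated, or deleted on `today` (the Change Today stat)."""
    return sum(activity.get(today, {}).values())


if __name__ == "__main__":
    changes = [
        VectorChange(date(2024, 5, 1), "added"),
        VectorChange(date(2024, 5, 1), "added"),
        VectorChange(date(2024, 5, 2), "updated"),
        VectorChange(date(2024, 5, 2), "deleted"),
        VectorChange(date(2024, 5, 2), "added"),
    ]
    activity = daily_activity(changes)
    for day, counts in activity.items():
        print(day, counts)
    print("Change today:", change_today(activity, date(2024, 5, 2)))
```

A day with no recorded changes contributes zero to Change Today. This matches an empty day on the graph.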
Monitoring Pipeline Status
On the right-hand side, you will see the Integration Health section. This section shows the connectivity and status of the systems that make up your pipeline, including:
Vector Databases: The health of the vector database (e.g., DataStax Astra).
Source Connectors: The status of data sources (e.g., Amazon S3).
AI Integrations: Whether the AI platform (e.g., OpenAI) is functioning properly, including checks for API key validity.
A green indicator means the integration is healthy. Any connectivity problems are flagged in this section.
Managing Pipeline Schedule
The Current Schedule section shows how often the pipeline is set to run (e.g., daily or weekly) and how long each run takes. It also shows an estimate of the total hours used per month and any free hours still available. To change the timing or frequency of runs, click Edit Schedule.
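The monthly-hours estimate follows from simple arithmetic: runs per month multiplied by hours per run, compared against the free-hour allowance. The sketch below makes that arithmetic explicit. It assumes a 30-day month. The frequency table, the `estimate_schedule` helper, and the free-hour figure passed in are all hypothetical illustrations, not Vectorize's actual billing rules.

```python
from dataclasses import dataclass

# Assumption: a 30-day month. Real billing periods may differ.
DAYS_PER_MONTH = 30
RUNS_PER_MONTH = {
    "daily": DAYS_PER_MONTH,
    "weekly": DAYS_PER_MONTH / 7,
}


@dataclass
class ScheduleEstimate:
    monthly_hours: float    # total run hours per month
    free_hours_left: float  # free allowance not yet used (never negative)
    hours_over_free: float  # hours beyond the free allowance (never negative)


def estimate_schedule(frequency, hours_per_run, free_hours):
    """Estimate monthly run hours for a schedule and compare them with a
    hypothetical free-hour allowance."""
    if frequency not in RUNS_PER_MONTH:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    monthly = RUNS_PER_MONTH[frequency] * hours_per_run
    return ScheduleEstimate(
        monthly_hours=monthly,
        free_hours_left=max(free_hours - monthly, 0.0),
        hours_over_free=max(monthly - free_hours, 0.0),
    )


if __name__ == "__main__":
    weekly = estimate_schedule("weekly", hours_per_run=2, free_hours=10)
    print(f"weekly: {weekly.monthly_hours:.1f} h/month, {weekly.free_hours_left:.1f} free h left")
    daily = estimate_schedule("daily", hours_per_run=0.5, free_hours=10)
    print(f"daily: {daily.monthly_hours:.1f} h/month, {daily.hours_over_free:.1f} h over free")
```

Switching a 2-hour weekly run to a 30-minute daily run raises monthly usage from about 8.6 to 15 hours in this sketch. It is worth running a check like this before you click Edit Schedule.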
Viewing Event Logs
Scroll down to the Event Logs section to track the actions the pipeline has taken. This section records important pipeline events, such as:
Status changes (e.g., from Deployed to Listening)
Pipeline shutdowns or start-ups
The number of chunks written to the vector database
Source system updates
Each log is timestamped, allowing you to review and trace the pipeline's activity over time.
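Because every log entry carries a timestamp, tracing the pipeline's activity comes down to selecting the entries in a time window and reading them oldest first. The sketch below shows that idea on exported or copied log entries. The `LogEvent` fields and category labels are hypothetical; the Event Logs section's real format may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical log entry; field names and categories are illustrative only."""
    timestamp: datetime
    category: str  # e.g. "status", "chunks_written", "source_update"
    message: str


def trace(events, start, end, category=None):
    """Return events with start <= timestamp <= end, oldest first,
    optionally limited to a single category."""
    selected = [
        e for e in events
        if start <= e.timestamp <= end
        and (category is None or e.category == category)
    ]
    return sorted(selected, key=lambda e: e.timestamp)


if __name__ == "__main__":
    log = [
        LogEvent(datetime(2024, 5, 2, 9, 30), "chunks_written", "Wrote chunks to the vector database"),
        LogEvent(datetime(2024, 5, 2, 9, 0), "status", "Deployed -> Listening"),
        LogEvent(datetime(2024, 5, 3, 8, 0), "source_update", "Source system updated"),
    ]
    for event in trace(log, datetime(2024, 5, 2), datetime(2024, 5, 2, 23, 59)):
        print(event.timestamp.isoformat(), event.category, event.message)
```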
Conclusion
Managing a RAG pipeline in Vectorize is simple and intuitive. The dashboard provides all the necessary information to monitor, update, and troubleshoot your pipeline. By following the steps outlined above, you can ensure that your pipeline remains healthy and up to date, providing reliable data for your applications.