RAG Pipeline Management

This guide walks through managing an existing RAG pipeline in Vectorize: viewing pipeline details, monitoring its health, and updating its settings as needed.

Getting to the RAG Pipeline Status

  1. Navigate to the RAG Pipelines section in the dashboard. Here you will see a list of all configured pipelines. Click on the name of the pipeline you want to manage. In this example, we'll select friends-scripts.

RAG Pipeline Overview

  1. Once inside the pipeline view, you will land on the Overview tab. This page gives you a detailed look at the pipeline's performance and configuration.

Pipeline Stats and Header

  1. At the top of the page, you will see key metrics and details about the pipeline:

    • Total Documents: The number of documents ingested into the pipeline from the source system (e.g., Amazon S3).

    • Total Vectors: The number of vector embeddings currently stored in the vector database.

    • Change Today: The number of vectors that were added, updated, or deleted today.

    • RAG Pipeline Status: The current status of the pipeline (e.g., Listening, Deployed, Backfilling).

    These metrics provide an overview of the pipeline's activity and its current operational state.

Monitoring Activity in the Index

  1. The activity graph shows indexing activity over time: how many vectors were added, updated, or deleted on each day. The differently colored bars let you track changes in your data at a glance.
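
    To make the graph's aggregation concrete, here is a minimal Python sketch that groups change records by day and counts each change type. The record format (a date paired with "added", "updated", or "deleted") and the function name daily_change_counts are assumptions made for illustration; the guide does not describe how Vectorize stores or exposes this data.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_change_counts(changes):
    """Group (day, change_type) records by day and count each change type."""
    per_day = defaultdict(Counter)
    for day, change_type in changes:
        per_day[day][change_type] += 1
    # Return plain dicts, ordered by day, one entry per bar group in the graph.
    return {day: dict(counts) for day, counts in sorted(per_day.items())}


# Hypothetical sample data, not taken from a real pipeline.
changes = [
    (date(2024, 1, 1), "added"),
    (date(2024, 1, 1), "added"),
    (date(2024, 1, 1), "deleted"),
    (date(2024, 1, 2), "updated"),
]

for day, counts in daily_change_counts(changes).items():
    print(day.isoformat(), counts)
```

    Summing the counts for the current day gives a figure comparable to the Change Today metric in the header.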

Monitoring Pipeline Status

  1. On the right-hand side, you will see the Integration Health section. This section shows the connectivity and status of the systems that make up your pipeline, including:

    • Vector Databases: The health of the vector database (e.g., DataStax Astra).

    • Source Connectors: The status of data sources (e.g., Amazon S3).

    • AI Integrations: Whether the AI platform (e.g., OpenAI) is functioning properly, including checks for API key validity.

    A green indicator means the integration is working as expected; any connectivity issue is flagged in this section.
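
    The same check can be expressed as a short Python sketch: given a snapshot of the three integration categories, list everything that is not healthy. The snapshot structure, the status strings ("healthy", "invalid_api_key"), and the function name unhealthy_integrations are hypothetical; they model the panel for illustration and are not a Vectorize API.

```python
# Hypothetical snapshot of the Integration Health panel.
# Category names follow the guide; status values are assumptions.
health = {
    "Vector Databases": {"DataStax Astra": "healthy"},
    "Source Connectors": {"Amazon S3": "healthy"},
    "AI Integrations": {"OpenAI": "invalid_api_key"},
}


def unhealthy_integrations(snapshot, ok_status="healthy"):
    """Return (category, name, status) for every integration that is not healthy."""
    return [
        (category, name, status)
        for category, integrations in snapshot.items()
        for name, status in integrations.items()
        if status != ok_status
    ]


for category, name, status in unhealthy_integrations(health):
    print(f"{category}: {name} is flagged ({status})")
```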

Managing Pipeline Schedule

  1. The Current Schedule section provides information about how often the pipeline is set to run (e.g., weekly, daily) and the duration of each run. You can see an estimate of the total monthly hours used and any free hours available. To adjust the schedule, click Edit Schedule to update the timing or frequency of the pipeline runs.
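
    As a rough illustration of how a monthly estimate follows from a schedule, the Python sketch below multiplies an average number of runs per month by the duration of each run, then subtracts any free hours. The runs-per-month averages and both function names are assumptions; the guide does not state the formula the dashboard uses, so treat this as an approximation.

```python
# Average runs per month for each frequency (assumed values, for illustration).
RUNS_PER_MONTH = {
    "daily": 365 / 12,
    "weekly": 52 / 12,
}


def estimated_monthly_hours(frequency, hours_per_run):
    """Approximate total pipeline hours per month for a given schedule."""
    return RUNS_PER_MONTH[frequency] * hours_per_run


def hours_beyond_free(estimated_hours, free_hours):
    """Hours remaining after free hours are applied; never negative."""
    return max(0.0, estimated_hours - free_hours)


weekly_estimate = estimated_monthly_hours("weekly", hours_per_run=3)
print(f"Weekly, 3 h per run: about {weekly_estimate:.1f} h per month")
```

    Comparing the estimate for two candidate schedules before clicking Edit Schedule shows how a change in frequency or duration affects monthly usage.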

Viewing Event Logs

  1. Scroll down to the Event Logs section to track the actions taken by the pipeline. This section records important events, such as:

    • Status changes (e.g., from Deployed to Listening)

    • Pipeline shutdowns or start-ups

    • The number of chunks written to the vector database

    • Source system updates

    Each log is timestamped, allowing you to review and trace the pipeline's activity over time.
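
    Because each entry carries a timestamp, tracing activity comes down to filtering and ordering entries by time. The Python sketch below shows the idea on a few hand-written entries. The entry format (an ISO 8601 timestamp plus a message) and the function name events_between are assumptions for illustration; the guide does not specify how logs can be exported.

```python
from datetime import datetime

# Hypothetical log entries modeled on the event types listed above.
events = [
    {"timestamp": "2024-01-02T08:30:00", "message": "Pipeline started"},
    {"timestamp": "2024-01-01T09:00:00", "message": "Status changed from Deployed to Listening"},
    {"timestamp": "2024-01-03T12:15:00", "message": "Chunks written to the vector database"},
]


def events_between(entries, start, end):
    """Return entries whose timestamp falls within [start, end], oldest first."""
    selected = [
        entry
        for entry in entries
        if start <= datetime.fromisoformat(entry["timestamp"]) <= end
    ]
    return sorted(selected, key=lambda entry: datetime.fromisoformat(entry["timestamp"]))


window = events_between(events, datetime(2024, 1, 1), datetime(2024, 1, 2, 23, 59, 59))
for entry in window:
    print(entry["timestamp"], entry["message"])
```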

Conclusion

The Vectorize dashboard brings together the information you need to monitor, update, and troubleshoot a RAG pipeline: header metrics, the activity graph, Integration Health, the current schedule, and event logs. By following the steps outlined above, you can keep your pipeline healthy and up to date so that it provides reliable data for your applications.
