Vector Databases

Vector databases are a core component of RAG (Retrieval-Augmented Generation) pipelines in Vectorize. They store and index vector embeddings so that retrieval operations can run efficient similarity searches.

What are Vector Databases?

Vector databases are specialized storage systems designed to handle high-dimensional vector data. In the context of RAG pipelines, they store the vector representations of your documents, allowing for quick and efficient similarity searches when retrieving relevant information.
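To make "similarity search" concrete, here is a minimal sketch of a brute-force cosine-similarity lookup over a tiny in-memory index. It is an illustration only: the function names, the toy three-dimensional vectors, and the document IDs are assumptions, and a real vector database would use approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical document embeddings (real embeddings have hundreds of dimensions).
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 4))
```

The documents whose vectors point in nearly the same direction as the query rank first, which is the behavior a RAG pipeline relies on when it retrieves context.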

Available Vector Database Integrations

Vectorize supports several vector database integrations to suit different needs and preferences:

  • Astra DB: DataStax's cloud-native Cassandra-based vector database.

  • Couchbase Capella: Couchbase's fully managed NoSQL database with vector search capabilities.

  • Elastic Cloud: Elastic's managed Elasticsearch offering with vector search functionality.

  • Pinecone: Purpose-built vector database for machine learning and AI applications.

Configuring Vector Database Integrations

You can configure vector database integrations in two ways:

1. From the Vector Databases Section

  1. Navigate to the Vectorize dashboard.

  2. In the left sidebar, under "Integrations," click on "Vector Databases."

  3. You'll see a list of currently configured vector database integrations in your workspace.

  4. To add a new integration, click the "New Vector Database" button.

  5. Choose from the list of available vector database options.

  6. Follow the prompts to configure the selected vector database integration.

2. While Creating a RAG Pipeline

  1. During the RAG Pipeline creation process, you'll reach a step to configure the vector database.

  2. You can either select an existing vector database integration or create a new one.

  3. If creating a new integration, choose from the available options and follow the configuration steps.

Note: Vector database integrations configured as part of a RAG Pipeline will automatically appear in the Vector Databases list for your organization and can be reused in future RAG pipelines.

Configuration Details

When setting up a vector database integration, you'll typically need to provide:

  • Connection details (e.g., endpoint URL, region)

  • Authentication credentials (e.g., API keys, access tokens)

  • Index or collection name

  • Any specific configuration options for the chosen database
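As a rough sketch of how the items above fit together, the following groups them into one structure and checks that the required ones are present before use. The `VectorDBConfig` class, its field names, and the example values are hypothetical and are not part of Vectorize or of any vector database SDK; in practice you enter these values through the Vectorize configuration prompts.

```python
from dataclasses import dataclass, field


@dataclass
class VectorDBConfig:
    """Hypothetical container for the settings listed above."""
    endpoint: str      # connection detail, e.g. an endpoint URL
    api_key: str       # authentication credential
    index_name: str    # index or collection name
    options: dict = field(default_factory=dict)  # database-specific extras

    def missing_fields(self):
        """Return the names of required settings that are empty."""
        required = {
            "endpoint": self.endpoint,
            "api_key": self.api_key,
            "index_name": self.index_name,
        }
        return [name for name, value in required.items() if not value.strip()]


# Placeholder values only; the empty API key is deliberate to show the check.
config = VectorDBConfig(
    endpoint="https://example.invalid",
    api_key="",
    index_name="docs",
    options={"region": "us-east-1"},
)
problems = config.missing_fields()
print(problems)
```

Checking for empty values up front gives a clearer error than a failed connection attempt later in pipeline setup.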

Some integrations may also allow you to configure metadata filtering options to refine your vector searches.
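Metadata filtering can be pictured as narrowing the candidate set before ranking by similarity. The sketch below assumes exact-match filters on a flat metadata dictionary and uses a plain dot product as the score; the record layout, field names, and `filtered_search` helper are illustrative assumptions, and each database defines its own filter syntax.

```python
def matches(metadata, filters):
    """True when every filter key is present with exactly the requested value."""
    return all(metadata.get(key) == value for key, value in filters.items())


def dot(a, b):
    """Dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


def filtered_search(query, records, filters, k=1):
    """Drop records that fail the metadata filter, then rank the rest."""
    candidates = [r for r in records if matches(r["metadata"], filters)]
    candidates.sort(key=lambda r: dot(query, r["vector"]), reverse=True)
    return [r["id"] for r in candidates[:k]]


# Hypothetical records: an ID, a toy embedding, and metadata stored alongside it.
records = [
    {"id": "doc-a", "vector": [1.0, 0.0], "metadata": {"source": "wiki", "lang": "en"}},
    {"id": "doc-b", "vector": [0.9, 0.1], "metadata": {"source": "blog", "lang": "en"}},
    {"id": "doc-c", "vector": [0.8, 0.2], "metadata": {"source": "wiki", "lang": "de"}},
]

hits = filtered_search([1.0, 0.0], records, {"source": "wiki"}, k=2)
print(hits)
```

Here "doc-b" scores higher than "doc-c" on similarity alone, but it is excluded because its metadata does not satisfy the filter.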

For detailed information on configuring specific vector database integrations, please refer to their individual documentation pages linked above.
