
Azure Blob Storage

The Azure Blob Storage Source Connector lets you use Azure Blob Storage as a data source for your pipelines. This guide explains the configuration options available when setting up an Azure Blob Storage connector.

Before you begin

Before starting, you'll need:

  • Your Azure storage account name.
  • Your Azure storage account key.
  • The Azure container's name.
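
Before entering these values in the form, it can help to catch obvious typos locally. The following is a minimal sketch, not part of Vectorize: `check_prerequisites` is a hypothetical helper that applies Azure's published naming rules (account names are 3-24 lowercase letters and digits; container names are 3-63 lowercase letters, digits, and single hyphens) and checks only that the key is valid base64. It does not contact Azure and does not cover special containers such as `$root`.

```python
import base64
import binascii
import re
from typing import List

# Storage account names: 3-24 characters, lowercase letters and digits only.
ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")
# Container names: lowercase letters, digits, and hyphens; every hyphen must
# sit between two letters or digits. Length (3-63) is checked separately.
CONTAINER_NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def check_prerequisites(account_name: str, account_key: str, container: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    if not ACCOUNT_NAME_RE.fullmatch(account_name):
        problems.append("storage account name must be 3-24 lowercase letters or digits")
    if not (3 <= len(container) <= 63 and CONTAINER_NAME_RE.fullmatch(container)):
        problems.append("container name must be 3-63 lowercase letters, digits, or single hyphens")
    try:
        # Account keys are base64 text; this only checks that the key decodes.
        decoded = base64.b64decode(account_key, validate=True)
    except (binascii.Error, ValueError):
        decoded = b""
    if not decoded:
        problems.append("storage account key is empty or not valid base64")
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only.
    fake_key = base64.b64encode(b"x" * 64).decode()
    print(check_prerequisites("mystorageacct", fake_key, "source-docs"))
```

A passing check only means the values are well formed; the connector's own connectivity test is still what confirms that they are correct.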

If you don't have Azure Blob Storage set up already, check out our guide How to Configure Azure Blob Storage.

Configure the Connector

  1. Click Source Connectors from the main menu.

  2. Click New Source Connector from the Source Connectors page.

  3. Select the Azure Blob Storage card.

  4. Enter connection parameters in the form using the Azure Blob Parameters table below as a guide.

Azure Blob Storage Configuration

  5. Click Create Azure Blob Integration to test connector connectivity and save your configuration.

Azure Blob Parameters

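| Field                | Notes                                                            | Required |
|----------------------|------------------------------------------------------------------|----------|
| Name                 | A descriptive name to identify the connector within Vectorize.   | Yes      |
| Container            | Your source data files must be inside a container.               | Yes      |
| Storage Account Name | Your Azure Blob Storage account name.                            | Yes      |
| Storage Account Key  | Your Azure Blob Storage instance key.                            | Yes      |
| Endpoint             | Your Azure Blob Storage endpoint.                                | No       |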
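
To make the relationship between these fields concrete, here is a minimal sketch that mirrors the table above as a Python dataclass. `AzureBlobConnectorParams` and its methods are hypothetical names used for illustration, not a Vectorize API. The fallback to `https://<account>.blob.core.windows.net` is Azure's standard public-cloud blob endpoint; whether Vectorize applies the same default when Endpoint is left blank is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AzureBlobConnectorParams:
    """Hypothetical container for the form fields listed in the table above."""

    name: str
    container: str
    storage_account_name: str
    storage_account_key: str
    endpoint: Optional[str] = None  # the only optional field

    def missing_required(self) -> List[str]:
        """Return the labels of required fields that are blank."""
        required = {
            "Name": self.name,
            "Container": self.container,
            "Storage Account Name": self.storage_account_name,
            "Storage Account Key": self.storage_account_key,
        }
        return [label for label, value in required.items() if not value.strip()]

    def resolved_endpoint(self) -> str:
        """Use the explicit endpoint if given, else Azure's public-cloud default (assumption)."""
        if self.endpoint:
            return self.endpoint.rstrip("/")
        return f"https://{self.storage_account_name}.blob.core.windows.net"

    def container_url(self) -> str:
        """URL of the container that holds the source files."""
        return f"{self.resolved_endpoint()}/{self.container}"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    params = AzureBlobConnectorParams("docs-source", "source-docs", "mystorageacct", "<account key>")
    print(params.missing_required())
    print(params.container_url())
```

An explicit Endpoint is typically only needed for non-default environments, such as a sovereign cloud or a local storage emulator.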

When you specify your configured Azure Blob Storage source in your pipeline configuration, Vectorize ingests all compatible files at the specified endpoint.

Configuring the Azure Blob Storage Connector in a RAG Pipeline

When you configure an Azure Blob Storage connector in a Retrieval-Augmented Generation (RAG) pipeline, you provide two parts of configuration. The first part is the general configuration that is common to all connectors, such as the name of the connector.

The second part is the configuration that is specific to your RAG Pipeline, such as which files and directories should be processed.

Configuring Azure Blob Storage for RAG Pipeline

The following table outlines the fields available when configuring an Azure Blob Storage source for use within a Retrieval-Augmented Generation (RAG) pipeline.

| Field                  | Description                                                                                                 | Required |
|------------------------|-------------------------------------------------------------------------------------------------------------|----------|
| File Extensions        | Specifies the types of files to be included (e.g., PDF, DOCX, HTML, Markdown, Text).                        | Yes      |
| Polling Interval       | Interval (in seconds) at which the connector will check Azure Blob Storage for updates.                     | No       |
| Path Prefix            | A prefix path to filter the files in the Blob (optional).                                                   | No       |
| Path Metadata Regex    | A regex pattern used to extract metadata from the file paths (optional).                                    | No       |
| Path Regex Group Names | Group names for the regex pattern (used in the Path Metadata Regex) to label extracted metadata (optional). | No       |
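
The filtering and metadata fields are easier to reason about with a concrete example. The sketch below is an illustration only: `select_blobs` and `extract_path_metadata` are hypothetical helpers, and the blob paths are made up. It assumes that the path prefix is a plain string prefix, that extensions are compared case-insensitively, and that the group names are applied to the regex capture groups in order. Vectorize's actual matching rules may differ, so test your pattern against the connector itself.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def select_blobs(blob_names: Iterable[str], extensions: Iterable[str], path_prefix: str = "") -> List[str]:
    """Keep blobs whose path starts with path_prefix and whose extension is wanted."""
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    return [
        name
        for name in blob_names
        if name.startswith(path_prefix)
        and PurePosixPath(name).suffix.lower().lstrip(".") in wanted
    ]


def extract_path_metadata(blob_name: str, pattern: str, group_names: List[str]) -> Dict[str, str]:
    """Label the regex capture groups, in order, with the given group names."""
    match = re.search(pattern, blob_name)
    if match is None:
        return {}
    return dict(zip(group_names, match.groups()))


if __name__ == "__main__":
    # Made-up blob paths for illustration.
    blobs = [
        "reports/2024/q1/summary.pdf",
        "reports/2023/q4/notes.docx",
        "reports/2024/q2/raw.csv",
        "images/logo.png",
    ]
    for blob in select_blobs(blobs, ["pdf", "docx"], path_prefix="reports/"):
        metadata = extract_path_metadata(blob, r"^reports/(\d{4})/(q[1-4])/", ["year", "quarter"])
        print(blob, metadata)
```

Under these assumptions, a Path Prefix of `reports/` combined with the pattern above would label each ingested report with a `year` and a `quarter` taken from its folder structure.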

What's next?

  • If you haven't yet built a connector to your vector database, go to Configuring Vector Database Connectors and select the platform you prefer to use for storing output vectors.

    OR

  • If you're ready to start producing vector embeddings from your input data, head to Pipeline Basics. Select your new connector as the data source to use it in your pipeline.
