RAG Pipeline Quickstart

Approximate time to complete: 5-10 minutes, excluding prerequisites

This quickstart will walk you through creating and scheduling a pipeline that uses a web crawler to ingest data from the Vectorize documentation, creates vector embeddings using an OpenAI embedding model, and writes the vectors to a Pinecone vector database.

Step 1: Sign Up for Pinecone

Go to the Pinecone homepage (https://www.pinecone.io).
Click the Sign Up button in the top-right corner of the page.

Choose your preferred signup method:
- Continue with Google
- Continue with GitHub
- Continue with Microsoft

Alternatively, enter your email address to create an account manually.

Step 2: Set Up Pinecone

Access Your Pinecone Dashboard

After successfully signing up, log in to your Pinecone dashboard.
In the dashboard, click API Keys in the sidebar, under the Manage section.

Generate and Copy API Key

Click the Create API Key button if you don't have an existing key.
After your key is generated, click the copy icon to copy it. You will need it in the upcoming steps.

Store your API key securely; you need it to interact with Pinecone from your application.
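
Vectorize only needs the key pasted into its integration form. If you also call Pinecone from your own code, one common way to keep the key out of source files is to read it from an environment variable. The sketch below uses only the Python standard library; the variable name `PINECONE_API_KEY` is an assumed convention, and `mask` is a hypothetical helper for log output.

```python
import os


def load_api_key(var_name: str = "PINECONE_API_KEY") -> str:
    """Return the API key stored in environment variable `var_name`.

    Keeping the key in the environment (or a secrets manager) keeps it out of
    source control. The default variable name is an assumed convention.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


def mask(key: str, visible: int = 4) -> str:
    """Hide all but the first few characters, e.g. before writing to a log."""
    return key[:visible] + "*" * max(len(key) - visible, 0)
```

Failing loudly when the variable is missing is deliberate: an empty key would otherwise surface later as a confusing authentication error.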

Step 3: Create a RAG Pipeline on Vectorize

Create a New RAG Pipeline

After logging in to the platform at https://platform.vectorize.io, navigate to RAG Pipelines in the left-hand sidebar.
Click New RAG Pipeline.

On the next screen, name your pipeline (for example, "quickstart-pipeline").

Configure the Pipeline

Under Select Vector Database, click on New Vector DB.

From the list of vector databases, select Pinecone.

Name your Pinecone integration (e.g., "quickstart-pinecone") and paste the Pinecone API key you copied in Step 2.
Click Create Pinecone Integration.

Provide an Index Name (e.g., "vectorize-quickstart"). If the index does not exist, it will be created automatically.
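
Vectorize creates the index for you when it is missing. That "create only if it does not exist" rule can be sketched as an idempotent helper. The in-memory `indexes` registry and the `dimension` check below are illustrative stand-ins, not Vectorize or Pinecone APIs.

```python
def ensure_index(indexes: dict[str, int], name: str, dimension: int) -> bool:
    """Create `name` in the hypothetical `indexes` registry if it is missing.

    `indexes` maps index name -> vector dimension. Returns True if the index
    was created, False if it already existed with the same dimension.
    """
    existing = indexes.get(name)
    if existing is None:
        indexes[name] = dimension  # first call: create it
        return True
    if existing != dimension:
        # Vectors of a different length cannot be stored in this index.
        raise ValueError(
            f"Index {name!r} has dimension {existing}, expected {dimension}"
        )
    return False  # already present: nothing to do
```

Calling it repeatedly with the same arguments is harmless, which is what makes "create if missing" safe to repeat. The dimension check reflects a real constraint: a vector index's dimension has to match the length of the embeddings written to it.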

Select AI Platform

Click on New AI Platform to select the platform for generating text embeddings.

Select OpenAI from the AI platform options.

In the OpenAI configuration screen:
- Enter a descriptive name for your OpenAI integration.
- Enter your OpenAI API Key.
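
Once the integration has your key, Vectorize makes the embedding calls for you. To show what that key authorizes, here is a sketch that builds (but does not send) a request to OpenAI's public embeddings endpoint using only the standard library. The model name `text-embedding-3-small` is an example; this quickstart does not say which model Vectorize uses by default.

```python
import json
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embedding_request(
    api_key: str, texts: list[str], model: str = "text-embedding-3-small"
) -> urllib.request.Request:
    """Build a POST request asking OpenAI to embed each string in `texts`.

    The model name is an example choice, not necessarily Vectorize's default.
    """
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it needs network access and a valid key:
#   with urllib.request.urlopen(req) as resp:
#       vectors = [item["embedding"] for item in json.load(resp)["data"]]
```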

Accept the default values for the embedding model, chunking strategy, and chunk size.
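
Chunking splits each crawled page into pieces small enough to embed. The quickstart does not say what Vectorize's default strategy or chunk size is, so this sketch uses a simple fixed-size character window with overlap; `chunk_size=800` and `overlap=100` are illustrative values, not Vectorize defaults.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split `text` into fixed-size character windows that overlap.

    Each window starts `chunk_size - overlap` characters after the previous
    one, so any passage no longer than `overlap` characters lands whole in
    at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=1)` yields `["abcd", "defg", "ghij"]`: each chunk repeats the last character of the one before it.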

Add Source Connectors

Click on Add Source Connector.

Choose the type of source connector you'd like to use. In this example, select Web Crawler.

Configure Web Crawler Integration

Name your web crawler source connector (e.g., "vectorize-docs").
Set Seed URL(s) to https://docs.vectorize.io.

Click Create Web Crawler Integration to proceed.

Configure Web Crawler Pipeline

Accept all the default values for the web crawler pipeline configuration:
- Throttle Wait Between Requests: 500 ms
- Maximum Error Count: 5
- Maximum URLs: 1000
- Maximum Depth: 50
- Reindex Interval: 3600 seconds
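
To make these settings concrete, here is a breadth-first crawl sketch that honors each limit. It runs against a caller-supplied `fetch_links` function (hypothetical; it stands in for downloading a page and extracting its links). How each setting is interpreted, such as counting the seed URL as depth 0 and stopping once the error count is reached, is an assumption rather than documented Vectorize behavior.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CrawlerSettings:
    # Defaults mirror the values listed above. How Vectorize applies each one
    # internally is an assumption in this sketch.
    throttle_ms: int = 500            # wait between consecutive requests
    max_error_count: int = 5          # stop once this many fetches have failed
    max_urls: int = 1000              # stop after this many pages are fetched
    max_depth: int = 50               # seed URLs count as depth 0
    reindex_interval_s: int = 3600    # governs later re-crawls, unused in one pass


def crawl(
    seeds: list[str],
    fetch_links: Callable[[str], list[str]],
    settings: Optional[CrawlerSettings] = None,
) -> list[str]:
    """Breadth-first crawl from `seeds`; returns URLs fetched successfully, in order.

    `fetch_links` is a hypothetical callback that downloads a page and returns
    the links found on it, raising an exception if the fetch fails.
    """
    settings = settings or CrawlerSettings()
    queue = deque((url, 0) for url in seeds)
    seen = set(seeds)
    visited: list[str] = []
    errors = 0
    attempts = 0
    while queue and len(visited) < settings.max_urls:
        url, depth = queue.popleft()
        if attempts:
            time.sleep(settings.throttle_ms / 1000)  # throttle between requests
        attempts += 1
        try:
            links = fetch_links(url)
        except Exception:
            errors += 1
            if errors >= settings.max_error_count:
                break  # too many failures: abandon this crawl
            continue
        visited.append(url)
        if depth < settings.max_depth:  # only follow links above the depth cap
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return visited
```

Passing `CrawlerSettings(throttle_ms=0)` and a dictionary-backed `fetch_links` lets you exercise the limits without network access.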

Click Save Configuration.

Verify Source Connector and Schedule Pipeline

Verify that your web crawler connector is visible under Source Connectors.
Click Next: Schedule RAG Pipeline to continue.

Schedule RAG Pipeline

Accept the default schedule configuration.
Click Create RAG Pipeline.

Step 4: Monitor and Test Your Pipeline

Monitor Pipeline Creation and Backfilling

The system now creates, deploys, and backfills the pipeline.
You can monitor its status as it changes from Creating Pipeline to Deploying Pipeline and then to Starting Backfilling Process.
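
If you would rather script the wait than watch the page, the status sequence above maps naturally onto a polling loop. This sketch assumes a hypothetical `get_status` callable that returns the status text shown in the UI; the quickstart does not describe an API for reading it, and the timeout and poll interval are illustrative.

```python
import time
from typing import Callable


def wait_until_listening(
    get_status: Callable[[], str],
    timeout_s: float = 900.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> list[str]:
    """Poll `get_status` until it reports "Listening"; return each new status seen.

    `get_status` is a hypothetical callback, not a Vectorize API. `sleep` and
    `clock` are injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout_s
    history: list[str] = []
    while True:
        status = get_status()
        if not history or history[-1] != status:
            history.append(status)  # record transitions, not repeats
        if status == "Listening":
            return history
        if clock() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout_s} s")
        sleep(poll_interval_s)
```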

Once deployment is complete, the backfilling process begins: the RAG pipeline crawls the Vectorize docs and writes vectors to your Pinecone index.
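
Each chunk ends up in Pinecone as a record with an ID, the embedding values, and optional metadata. Vectorize writes these for you, and the quickstart does not document its ID or metadata scheme, so the hashed-URL IDs and the `url`/`chunk`/`text` metadata keys below are hypothetical. Only the `id`/`values`/`metadata` layout reflects the record shape Pinecone's upsert operation accepts.

```python
import hashlib


def build_records(url: str, chunks: list[str], vectors: list[list[float]]) -> list[dict]:
    """Pair each chunk with its embedding as a Pinecone-style record.

    The id / values / metadata layout matches Pinecone upserts; the ID scheme
    and metadata keys are illustrative assumptions.
    """
    if len(chunks) != len(vectors):
        raise ValueError("need exactly one vector per chunk")
    # Deterministic prefix: the same URL always yields the same record IDs.
    url_key = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return [
        {
            "id": f"{url_key}-{i}",
            "values": vector,
            "metadata": {"url": url, "chunk": i, "text": chunk},
        }
        for i, (chunk, vector) in enumerate(zip(chunks, vectors))
    ]
```

Deriving IDs deterministically from the URL means that re-crawling a page and upserting again overwrites the earlier records with the same IDs instead of adding duplicates.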

View RAG Pipeline Status

Once the website crawl is complete, your RAG pipeline switches to the Listening state and remains there until more updates are available.
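
The Reindex Interval setting (3600 seconds by default) suggests that pages are revisited periodically while the pipeline is listening. The exact rule Vectorize applies is not described here; this sketch assumes a page becomes due once the interval has elapsed since its last crawl, and `due_for_reindex` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REINDEX_INTERVAL = timedelta(seconds=3600)  # default Reindex Interval setting


def due_for_reindex(
    last_crawled: dict[str, datetime],
    now: Optional[datetime] = None,
    interval: timedelta = REINDEX_INTERVAL,
) -> list[str]:
    """Return URLs whose last crawl is at least `interval` old, oldest first.

    `last_crawled` maps URL -> timezone-aware time of the last successful crawl.
    """
    now = now or datetime.now(timezone.utc)
    stale = [url for url, ts in last_crawled.items() if now - ts >= interval]
    return sorted(stale, key=lambda url: last_crawled[url])
```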

Your vector index is now populated, and you can try it out in the RAG Sandbox. To do so, click RAG Pipelines in the left-hand menu.

Test Your Pipeline in the RAG Sandbox

After your pipeline is running, open the RAG Sandbox for the pipeline by clicking the magnifying glass icon on the RAG Pipelines page.

In the RAG Sandbox, you can ask questions about the data ingested by the web crawler.
Type a question into the input field (e.g., "What are the key features of Vectorize?"), and click Submit.

The system will return the most relevant chunks of information from your indexed data, along with an LLM response.
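
A query of this kind generally embeds the question, ranks stored chunks by vector similarity, and hands the best matches to an LLM as context. The sketch below illustrates only the ranking and prompt-assembly steps, using cosine similarity over in-memory records. The record shape (`values` plus a hypothetical `text` metadata key), `k=3`, and the prompt wording are assumptions, not a description of how the RAG Sandbox is implemented.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(
    query_vec: list[float], records: list[dict], k: int = 3
) -> list[tuple[float, str]]:
    """Rank records (dicts with `values` and metadata `text`) by similarity."""
    scored = [(cosine(query_vec, r["values"]), r["metadata"]["text"]) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return scored[:k]


def build_prompt(question: str, hits: list[tuple[float, str]]) -> str:
    """Combine the retrieved chunks and the question into a single LLM prompt."""
    context = "\n\n".join(text for _, text in hits)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
```

Restricting the LLM to the retrieved context is what ties its answer to your indexed data rather than to whatever the model already knows.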

This completes the RAG pipeline quickstart. Your RAG pipeline is now set up and ready for use with Pinecone and Vectorize.
