
SingleStore Quickstart

Approximate time to complete: 5-10 minutes, excluding prerequisites

This quickstart will walk you through creating a pipeline that prepares your data for AI agents. You'll set up a pipeline that transforms content from the Vectorize documentation into structured, searchable context in SingleStore, giving agents the foundation they need to reason over your data, not just retrieve it.

Before you begin

Before starting, ensure you have access to the credentials, connection parameters, and API keys as appropriate for the following:

  • A Vectorize account
  • A SingleStore account
  • An OpenAI API key

Step 1: Create a SingleStore Deployment

SingleStore offers two types of workspaces. Starter Workspaces are best for small-scale or experimental projects, while Standard Workspaces are designed for applications that need higher resources, scalability, and support for production environments.

When you create your SingleStore account, a Starter Workspace and a database will be created and deployed for you. You can use these for this quickstart, or you can create a Standard Workspace to work with instead.

If you're using a Starter Workspace:

  1. The Starter Workspace and database are automatically created when you create your SingleStore account.

    Starter Workspace and Database

  2. Select the workspace, then click Access.

    Access

  3. Save the username, then click the 3 dots and select Reset Password. Copy and securely save the password.

    Save Username and Password

  4. Go to the Overview tab, then click Connect and select Your App.

    Get Connection Info

  5. Save the host and the port from the connection string. You'll use these when you create your data pipeline in Vectorize.

    Save Host and Port

If you're using a Standard Workspace:

  1. Navigate to the SingleStore Cloud Portal and click + New Deployment.

    New Deployment

  2. Name your Workspace Group, select your cloud provider and region, and click Next.

    New Workspace Group

  3. Name the Workspace, optionally adjust the size and settings, then click Create Workspace.

    Create Workspace

  4. If you're on SingleStore's free trial, a database containing MarTech data will be automatically added to the workspace you just created. You can ignore this for the purpose of the Vectorize quickstart. If you'd like to remove it, click on the 3 dots, then click Drop Database.

    Drop Database

  5. Click + Create Database.

    Create Database

  6. Name your database, make sure it is attached to the correct workspace, then click + Create Database.

    Create Database

  7. Go to your workspace, then click Access.

    Access

  8. Copy and save the username. Click Reset Password to set the password, then securely save the password.

    Save Username and Password

  9. Go to Workspaces, click the 3 dots, then click Connect Directly.

    Connect Directly

  10. Save the host and the port from the connection string. You'll use these when you create your data pipeline in Vectorize.

    Save Host and Port
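Both paths end with saving the host and port. If you copied a whole connection string instead of the individual values, a few lines of Python can split it apart. This is a minimal sketch that assumes a URL-style string of the form `scheme://user:password@host:port/database`; the format shown in the portal depends on which client you select, and the example string below is a placeholder, not a real endpoint.

```python
from urllib.parse import urlsplit


def host_and_port(connection_string: str) -> tuple[str, int]:
    """Return (host, port) from a URL-style connection string.

    Assumes the form scheme://user:password@host:port/database.
    Passwords containing reserved characters such as '@' or '/' must be
    percent-encoded for this parsing to work.
    """
    parts = urlsplit(connection_string)
    if not parts.hostname:
        raise ValueError("no host found in connection string")
    if parts.port is None:
        raise ValueError("no port found in connection string")
    return parts.hostname, parts.port


# Placeholder values; substitute the connection string from your workspace.
example = "singlestoredb://admin:secret@svc-example.singlestore.com:3333/quickstart_db"
print(host_and_port(example))  # ('svc-example.singlestore.com', 3333)
```

The function raises instead of guessing a default port, since the quickstart asks you to use the exact port from your workspace.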

Step 2: Create a data pipeline on Vectorize

Create a New Data Pipeline

  1. Open the Vectorize Application Console.

  2. From the dashboard, click on + New RAG Pipeline under the "RAG Pipelines" section.

    New RAG Pipeline

  3. Enter a name for your pipeline. For example, you can name it quickstart-pipeline.

  4. Click on + New Vector DB to create a new vector database.

    Name Pipeline

  5. Select SingleStore from the list of vector databases.

    SingleStore Card

  6. In the SingleStore configuration screen, enter the parameters in the form using the SingleStore Parameters table below as a guide, then click Create SingleStore Integration.

    SingleStore Card

    SingleStore Parameters

    | Field    | Description                                                      | Required |
    | -------- | ---------------------------------------------------------------- | -------- |
    | Name     | A descriptive name to identify the integration within Vectorize. | Yes      |
    | Host     | The host URL from your workspace's connection string.            | Yes      |
    | Port     | The port from your workspace's connection string.                | Yes      |
    | Database | The name of your database.                                       | Yes      |
    | Username | The username to access your database.                            | Yes      |
    | Password | The password you'll use to access this database.                 | Yes      |
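All six fields in the table are required, so it can help to collect the values in one place and check that none are blank before filling in the form. The sketch below assumes you keep them locally in Python; `SingleStoreIntegration` is a hypothetical helper class, not a Vectorize API object, and the form performs its own validation regardless.

```python
from dataclasses import dataclass, fields


@dataclass
class SingleStoreIntegration:
    """Hypothetical local record of the values entered in the Vectorize form.

    The fields mirror the SingleStore Parameters table; this is not a
    Vectorize API object.
    """

    name: str
    host: str
    port: int
    database: str
    username: str
    password: str

    def missing_fields(self) -> list[str]:
        """Names of required fields that were left empty."""
        return [f.name for f in fields(self) if getattr(self, f.name) in ("", None)]

    def validate(self) -> None:
        """Raise ValueError if a required field is empty or the port is invalid."""
        missing = self.missing_fields()
        if missing:
            raise ValueError("missing required fields: " + ", ".join(missing))
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")


cfg = SingleStoreIntegration(
    name="quickstart-singlestore",
    host="svc-example.singlestore.com",
    port=3333,
    database="quickstart_db",
    username="admin",
    password="",  # left blank on purpose to show the check
)
print(cfg.missing_fields())  # ['password']
```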

Configure AI Platform

  1. Click on + New AI Platform.

    New AI Platform

  2. Select OpenAI from the AI platform options.

    Select OpenAI

  3. In the OpenAI configuration screen:

    • Enter a descriptive name for your OpenAI integration.
    • Enter your OpenAI API Key.

    Configure OpenAI

  4. Leave the default values for embedding model, chunk size, and chunk overlap for the quickstart.

    Set Embedding Model
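Chunk size and chunk overlap control how crawled text is split before embedding: each chunk holds at most chunk-size units of text, and consecutive chunks share chunk-overlap units, so content near a boundary lands in both neighbouring chunks. The sketch below measures in characters for readability. Vectorize's default values and its actual splitting strategy (which may count tokens or respect sentence boundaries) are not described here, so treat `chunk_text` as a hypothetical illustration.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size windows that share `overlap` characters.

    Character-based for simplicity; real pipelines often count tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

A larger overlap preserves more context across boundaries at the cost of storing and embedding more duplicated text.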

Add Source Connectors

  1. Click on Add Source Connector.

Web Crawler Source

  1. Choose the type of source connector you'd like to use. In this example, select Web Crawler.

Choose Web Crawler

Configure Web Crawler Integration

  1. Name your web crawler source connector, e.g., vectorize-docs.
  2. Set Seed URL(s) to https://docs.vectorize.io.

Configure Web Crawler

  3. Click Create Web Crawler Integration to proceed.

Configure Web Crawler Pipeline

  1. Accept all the default values for the web crawler pipeline configuration:
    • Throttle Wait Between Requests: 500 ms
    • Maximum Error Count: 5
    • Maximum URLs: 1000
    • Maximum Depth: 50
    • Reindex Interval: 3600 seconds

Web Crawler Pipeline Configuration

  2. Click Save Configuration.
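The crawl limits accepted above can be pictured with the breadth-first crawler sketched below. It is not Vectorize's crawler: it assumes breadth-first order, depth counted in link hops from the seed URL, and a crawl that stops once the error count reaches Maximum Error Count. `crawl` and `fetch_links` are hypothetical names, and the demo walks an in-memory site map instead of fetching https://docs.vectorize.io. Reindex Interval is left out on the assumption that it governs when a crawl is repeated rather than how a single crawl proceeds.

```python
import time
from collections import deque
from typing import Callable


def crawl(
    seed: str,
    fetch_links: Callable[[str], list[str]],
    *,
    throttle_ms: int = 500,
    max_errors: int = 5,
    max_urls: int = 1000,
    max_depth: int = 50,
) -> list[str]:
    """Breadth-first crawl from `seed`; returns successfully fetched URLs in order.

    `fetch_links(url)` is a hypothetical hook that returns the links found
    on a page, or raises an exception if the page cannot be fetched.
    """
    visited: list[str] = []
    seen = {seed}               # URLs already queued, to avoid repeats
    queue = deque([(seed, 0)])  # (url, depth in link hops from the seed)
    errors = 0
    requests_made = 0
    while queue and len(visited) < max_urls:
        url, depth = queue.popleft()
        if requests_made:
            time.sleep(throttle_ms / 1000)  # throttle between requests
        requests_made += 1
        try:
            links = fetch_links(url)
        except Exception:
            errors += 1
            if errors >= max_errors:
                break  # stop the crawl once the error budget is used up
            continue
        visited.append(url)
        if depth < max_depth:  # only follow links while under the depth limit
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return visited


# In-memory stand-in for a website, so the sketch runs without network access.
site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/a"], "/c": ["/d"], "/d": []}
print(crawl("/", site.__getitem__, throttle_ms=0, max_depth=1))  # ['/', '/a', '/b']
```

With the defaults, a documentation site is usually bounded by Maximum URLs long before Maximum Depth matters, since 50 link hops is deep for most sites.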

Verify Source Connector and Schedule Pipeline

  1. Verify that your web crawler connector is visible under Source Connectors.
  2. Click Next: Schedule RAG Pipeline to continue.

Verify Source Connector

Schedule Data Pipeline

  1. Accept the default schedule configuration.
  2. Click Create RAG Pipeline.

Schedule RAG Pipeline

Step 3: Monitor and Test Your Pipeline

Monitor Pipeline Creation and Backfilling

  1. The system will now create, deploy, and backfill the pipeline.
  2. You can monitor the status as it changes from Creating Pipeline to Deploying Pipeline to Starting Backfilling Process.

Pipeline Creation

  3. During backfilling, the data pipeline crawls the Vectorize docs and writes vectors to your SingleStore database.

Pipeline Backfilling

View Data Pipeline Status

  1. Once the website crawling is complete, your data pipeline will switch to the Listening state, where it will stay until more updates are available.

Pipeline Listening State
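While Listening, the pipeline waits for new content. Assuming the web crawler's Reindex Interval (3600 seconds by default in the configuration above) is what schedules the next crawl of the seed URL, the timing works out as in this sketch; `next_reindex` and `reindex_due` are hypothetical helpers, not part of Vectorize.

```python
from datetime import datetime, timedelta, timezone


def next_reindex(last_crawl: datetime, interval_seconds: int = 3600) -> datetime:
    """Time at which the next crawl would be due, given the last finished crawl."""
    return last_crawl + timedelta(seconds=interval_seconds)


def reindex_due(last_crawl: datetime, now: datetime, interval_seconds: int = 3600) -> bool:
    """True once at least `interval_seconds` have passed since `last_crawl`."""
    return now >= next_reindex(last_crawl, interval_seconds)


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(next_reindex(last))  # 2024-01-01 13:00:00+00:00
print(reindex_due(last, last + timedelta(minutes=30)))  # False
```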

Step 4: Test Your Pipeline in the RAG Sandbox

Access the RAG Sandbox

  1. From the main pipeline overview, click on the RAG Pipelines menu item to view your active pipelines.

Open RAG Pipeline Menu

  2. Find your pipeline in the list of pipelines.
  3. Click on the magnifying glass icon under the RAG Sandbox column to open the sandbox for your selected pipeline.

Open RAG Sandbox

Query Your Data

  1. In the sandbox, you can ask questions about the data you've ingested.
  2. Type a question related to your dataset in the Question field. For example, "What is Vectorize?" since you're working with the Vectorize documentation.
  3. Click Submit to send the question.

Ask a Question

Review Results

  1. After submitting your question, the sandbox will retrieve relevant chunks from your vector database and display them in the Retrieved Context section.
  2. The response from the language model (LLM) will be displayed in the LLM Response section.
    • The Retrieved Context section shows the chunks that were matched with your question.
    • The LLM Response section provides the final output based on the retrieved chunks.

Retrieved Chunks and LLM Response

  3. You can continue to ask different questions or refine your queries to explore your dataset further.
  4. The sandbox lets you query the data stored in your vector database interactively.
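Behind the Retrieved Context and LLM Response sections sits the general retrieval-augmented pattern: embed the question, rank stored chunks by similarity to it, and pass the best matches to the LLM along with the question. The sketch below shows that pattern with cosine similarity over hand-made vectors. The sandbox's actual similarity metric, number of retrieved chunks, and prompt wording are not documented here, so `top_k`, `build_prompt`, and the prompt text are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Texts of the k chunks most similar to the query vector, best first."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine retrieved chunks and the question into a single LLM prompt."""
    joined = "\n\n".join(context)
    return f"Answer using only the context below.\n\n{joined}\n\nQuestion: {question}"


# Toy 3-dimensional vectors standing in for real embeddings.
chunks = [
    ("Vectorize builds RAG pipelines.", [0.9, 0.1, 0.0]),
    ("SingleStore is a database.", [0.1, 0.9, 0.0]),
    ("Unrelated text.", [0.0, 0.0, 1.0]),
]
context = top_k([1.0, 0.2, 0.0], chunks, k=2)
print(build_prompt("What is Vectorize?", context))
```

The retrieved list corresponds to what the Retrieved Context section displays, and the prompt built from it is what the LLM answers from.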

That's it! You've successfully created a data pipeline that transforms your content into structured context, ready for AI agents to reason over and make intelligent decisions.
