
OpenAI-Compatible Chat Completions Endpoint (Beta)

📢 Note: The OpenAI-compatible chat completions endpoint is currently in Beta.

The OpenAI-compatible chat completions endpoint allows you to use Vectorize with applications that support OpenAI's Chat Completions API. This endpoint accepts the same payload as the OpenAI chat completions endpoint (not the completions endpoint), making it easy to integrate Vectorize with a wide range of applications that are built to work with OpenAI.

By default, the endpoint will search your Vectorize pipeline for the top 5 chunks that match the last user query, automatically incorporating the context of the retrieved chunks into the answer. This enhances your LLM responses with relevant context from your data.

Accessing the OpenAI-Compatible Endpoint

  1. Navigate to the RAG Pipelines section in the Vectorize dashboard.

  2. Click on the name of your desired pipeline.

  3. In the pipeline details view, click the Connect tab.

  4. You will see details about the retrieval endpoint, including the URL and options to manage access tokens.


Using the Endpoint

The OpenAI-compatible endpoint accepts the same payload structure as OpenAI's chat completions API, including all OpenAI parameters. This makes it easy to switch from using OpenAI directly to using Vectorize with your existing code and applications.

Endpoint URL

To use the OpenAI-compatible endpoint, you need to:

  1. Take the retrieval endpoint URL from the Connect tab
  2. Remove the /retrieval portion from the end
  3. Use this as your base URL in any library or tool that supports OpenAI

For example, if your retrieval endpoint is:

https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID/retrieval

Your OpenAI-compatible base URL would be:

https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID

This URL replaces the default OpenAI base URL (https://api.openai.com/v1) in your API calls.
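The derivation above is simple enough to script. As a sanity check, a minimal Python sketch (using the placeholder ORG-ID/PIPELINE-ID URL from this page):

```python
# Derive the OpenAI-compatible base URL from a Vectorize retrieval endpoint URL.
retrieval_url = (
    "https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID/retrieval"
)

# Strip the trailing "/retrieval" segment to get the base URL.
base_url = retrieval_url.removesuffix("/retrieval")

print(base_url)
# https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID
```

The resulting string is what you would pass wherever a library or tool expects the OpenAI base URL, for example the `base_url` argument of the official OpenAI Python client.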

Example Request

curl -L \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{
    "model": "vectorize-chat",
    "messages": [
      {
        "role": "user",
        "content": "What are the key features of Vectorize?"
      }
    ]
  }' \
  "https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID/chat/completions"
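The same request can be assembled with Python's standard library. This sketch only constructs the request object (the token and IDs are placeholders; uncomment the last line to actually send it):

```python
import json
import urllib.request

BASE_URL = "https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID"

payload = {
    "model": "vectorize-chat",  # ignored by the endpoint; any value works
    "messages": [
        {"role": "user", "content": "What are the key features of Vectorize?"}
    ],
}

req = urllib.request.Request(
    url=f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer your-token",
    },
    method="POST",
)

# response = urllib.request.urlopen(req)  # sends the request; needs a real token
```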

Important Notes

  • The endpoint does not currently support streaming responses.
  • By default, it automatically retrieves the top 5 chunks that match the last user query.
  • You must use a valid access token. See Managing Retrieval Endpoint Tokens for details.
  • The model parameter is not configurable and is ignored; you can pass any placeholder value, such as "vectorize-chat".

Customizing Retrieval Behavior

You can customize the retrieval behavior by passing a tool of type "function" with the name "vectorize" in your request:

curl -L \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{
    "model": "vectorize-chat",
    "messages": [
      {
        "role": "user",
        "content": "What are the key features of Vectorize?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "vectorize",
          "parameters": {
            "type": "object",
            "properties": {
              "numResults": {
                "type": "integer",
                "description": "Number of results to retrieve from the pipeline (default: 5)"
              },
              "customPrompt": {
                "type": "string",
                "description": "Custom prompt to use for generation. Must include the pattern {documents} which will be replaced with retrieved documents."
              },
              "metadata-filters": {
                "type": "array",
                "description": "Metadata filters to narrow down search results",
                "default": [{"origin": "web-crawler"}]
              }
            }
          }
        }
      }
    ]
  }' \
  "https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID/chat/completions"

Available Customization Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| numResults | integer | 5 | Number of results to retrieve from the pipeline |
| customPrompt | string | - | Custom prompt template that must include the {documents} pattern, which will be replaced with the retrieved documents |
| metadata-filters | array | - | Metadata filters to narrow down search results, e.g. [{"origin": "web-crawler"}] |
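A small helper can assemble this tools entry and guard against the most common mistake, a customPrompt missing the {documents} pattern. The tool name and overall shape follow the curl example above; the helper itself is a sketch, not part of any Vectorize SDK, and it assumes (as the metadata-filters example above does) that parameter values are carried in the "default" field:

```python
def vectorize_tool(num_results=None, custom_prompt=None, metadata_filters=None):
    """Build the "vectorize" tool entry that customizes retrieval behavior."""
    if custom_prompt is not None and "{documents}" not in custom_prompt:
        raise ValueError('customPrompt must contain the "{documents}" pattern')

    properties = {}
    if num_results is not None:
        properties["numResults"] = {"type": "integer", "default": num_results}
    if custom_prompt is not None:
        properties["customPrompt"] = {"type": "string", "default": custom_prompt}
    if metadata_filters is not None:
        properties["metadata-filters"] = {"type": "array", "default": metadata_filters}

    return {
        "type": "function",
        "function": {
            "name": "vectorize",
            "parameters": {"type": "object", "properties": properties},
        },
    }


# Example: retrieve 3 results, filtered to documents from the web crawler.
tool = vectorize_tool(num_results=3, metadata_filters=[{"origin": "web-crawler"}])
```

The returned dict goes into the "tools" array of the chat completions payload alongside "model" and "messages".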

Integration with OpenWebUI

You can use Vectorize's OpenAI-compatible chat completions endpoint with OpenWebUI to enhance your chat experience with context from your organization's data.

Setup Steps

  1. Create a RAG pipeline in Vectorize that processes the documents/sources you want to use for RAG.

  2. Once your pipeline is created and has processed all your documents, click on the pipeline details and then the Connect tab.

  3. Generate a token for accessing the retrieval endpoint.

  4. In OpenWebUI, go to the Admin settings and add an OpenAI connection:

    • For the URL, use the base URL from the Vectorize Connect tab (do not include the /retrieval part)
    • Enter your generated token
  5. Test the connection using the circular arrows button. Once confirmed working, click Save.

  6. Since the API doesn't currently support streaming responses, you need to disable them for the model:

    • Go to the Models settings in OpenWebUI
    • Select Advanced Params
    • Toggle "Stream Chat Responses" to Off
    • Save the changes
  7. When you select the "vectorize-chat" model in a chat conversation, it will search the documents in your RAG pipeline for relevant information and automatically incorporate that context when answering questions.

