
OpenAI-Compatible Chat Completions Endpoint (Beta)

📢 Note: The OpenAI-compatible chat completions endpoint is currently in Beta.

The OpenAI-compatible chat completions endpoint allows you to use Vectorize with applications that support OpenAI's Chat Completions API. This endpoint accepts the same payload as the OpenAI chat completions endpoint (not the completions endpoint), making it easy to integrate Vectorize with a wide range of applications that are built to work with OpenAI.

By default, the endpoint will search your Vectorize pipeline for the top 5 chunks that match the last user query, automatically incorporating the context of the retrieved chunks into the answer. This enhances your LLM responses with relevant context from your data.

Accessing the OpenAI-Compatible Endpoint

  1. Navigate to the RAG Pipelines section in the Vectorize dashboard.

  2. Click on the name of your desired pipeline.

  3. In the pipeline details view, click the Connect tab.

  4. You will see details about the retrieval endpoint, including the URL and options to manage access tokens.


Using the Endpoint

The OpenAI-compatible endpoint accepts the same payload structure as OpenAI's chat completions API, including all OpenAI parameters. This makes it easy to switch from using OpenAI directly to using Vectorize with your existing code and applications.

Endpoint URL

To use the OpenAI-compatible endpoint, you need to:

  1. Take the retrieval endpoint URL from the Connect tab
  2. Remove the /retrieval portion from the end
  3. Use this as your base URL in any library or tool that supports OpenAI

For example, if your retrieval endpoint is:

https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID/retrieval

Your OpenAI-compatible base URL would be:

https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID

This URL replaces the default OpenAI base URL (https://api.openai.com/v1) in your API calls.
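The derivation above is simple enough to script. As a sanity check, a minimal Python sketch (using the placeholder ORG-ID/PIPELINE-ID URL from this page):

```python
# Derive the OpenAI-compatible base URL from a Vectorize retrieval endpoint URL.
retrieval_url = (
    "https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID/retrieval"
)

# Strip the trailing "/retrieval" segment to get the base URL.
base_url = retrieval_url.removesuffix("/retrieval")

print(base_url)
# https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID
```

The resulting string is what you would pass wherever a library or tool expects the OpenAI base URL, for example the `base_url` argument of the official OpenAI Python client.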

Example Request

curl -L \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{
    "model": "vectorize-chat",
    "messages": [
      {
        "role": "user",
        "content": "What are the key features of Vectorize?"
      }
    ]
  }' \
  "https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID/chat/completions"
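The same request can be assembled with Python's standard library. This sketch only constructs the request object (the token and IDs are placeholders; uncomment the last line to actually send it):

```python
import json
import urllib.request

BASE_URL = "https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID"

payload = {
    "model": "vectorize-chat",  # ignored by the endpoint; any value works
    "messages": [
        {"role": "user", "content": "What are the key features of Vectorize?"}
    ],
}

req = urllib.request.Request(
    url=f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer your-token",
    },
    method="POST",
)

# response = urllib.request.urlopen(req)  # sends the request; needs a real token
```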

Important Notes

  • The endpoint does not currently support streaming responses.
  • By default, it automatically retrieves the top 5 chunks that match the last user query.
  • You must use a valid access token. See Managing Retrieval Endpoint Tokens for details.
  • The model parameter is not configurable and is ignored; you can pass any placeholder value, such as "vectorize-chat".

Customizing Retrieval Behavior

You can customize the retrieval behavior by passing a tool of type "function" with the name "vectorize" in your request:

curl -L \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-token" \
  -d '{
    "model": "vectorize-chat",
    "messages": [
      {
        "role": "user",
        "content": "What are the key features of Vectorize?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "vectorize",
          "parameters": {
            "type": "object",
            "properties": {
              "numResults": {
                "type": "integer",
                "description": "Number of results to retrieve from the pipeline (default: 5)"
              },
              "customPrompt": {
                "type": "string",
                "description": "Custom prompt to use for generation. Must include the pattern {documents} which will be replaced with retrieved documents."
              },
              "metadata-filters": {
                "type": "array",
                "description": "Metadata filters to narrow down search results",
                "default": [{"origin": "web-crawler"}]
              }
            }
          }
        }
      }
    ]
  }' \
  "https://client.app.vectorize.io/v1/org/ORG-ID/pipelines/PIPELINE-ID/chat/completions"

Available Customization Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| numResults | integer | 5 | Number of results to retrieve from the pipeline |
| customPrompt | string | - | Custom prompt template that must include the {documents} pattern, which will be replaced with the retrieved documents |
| metadata-filters | array | - | Metadata filters to narrow down search results, e.g. [{"origin": "web-crawler"}] |
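A small helper can assemble this tools entry and guard against the most common mistake, a customPrompt missing the {documents} pattern. The tool name and overall shape follow the curl example above; the helper itself is a sketch, not part of any Vectorize SDK, and it assumes (as the metadata-filters example above does) that parameter values are carried in the "default" field:

```python
def vectorize_tool(num_results=None, custom_prompt=None, metadata_filters=None):
    """Build the "vectorize" tool entry that customizes retrieval behavior."""
    if custom_prompt is not None and "{documents}" not in custom_prompt:
        raise ValueError('customPrompt must contain the "{documents}" pattern')

    properties = {}
    if num_results is not None:
        properties["numResults"] = {"type": "integer", "default": num_results}
    if custom_prompt is not None:
        properties["customPrompt"] = {"type": "string", "default": custom_prompt}
    if metadata_filters is not None:
        properties["metadata-filters"] = {"type": "array", "default": metadata_filters}

    return {
        "type": "function",
        "function": {
            "name": "vectorize",
            "parameters": {"type": "object", "properties": properties},
        },
    }


# Example: retrieve 3 results, filtered to documents from the web crawler.
tool = vectorize_tool(num_results=3, metadata_filters=[{"origin": "web-crawler"}])
```

The returned dict goes into the "tools" array of the chat completions payload alongside "model" and "messages".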

Integration with OpenWebUI

You can use Vectorize's OpenAI-compatible chat completions endpoint with OpenWebUI to enhance your chat experience with context from your organization's data.

Setup Steps

  1. Create a RAG pipeline in Vectorize that processes the documents/sources you want to use for RAG.

  2. Once your pipeline is created and has processed all your documents, click on the pipeline details and then the Connect tab.

  3. Generate a token for accessing the retrieval endpoint.

  4. In OpenWebUI, go to the Admin settings and add an OpenAI connection:

    • For the URL, use the base URL from the Vectorize Connect tab (do not include the /retrieval part)
    • Enter your generated token
  5. Test the connection using the circular arrows button. Once confirmed working, click Save.

  6. Since the API doesn't currently support streaming responses, you need to disable them for the model:

    • Go to the Models settings in OpenWebUI
    • Select Advanced Params
    • Toggle "Stream Chat Responses" to Off
    • Save the changes
  7. When you select the "vectorize-chat" model in a chat conversation, it will search the documents in your RAG pipeline for relevant information and automatically incorporate that context when answering questions.

