
Pipelines

📢 Note: The API is currently in Beta.

In Vectorize, you can create pipelines to ingest data from multiple sources into a Vector Database. In this guide, we will deploy a pipeline that ingests a local file.

Source: Create a File Upload connector

Make sure to include the code and imports from the Getting Started page.
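For reference, the setup from that page looks roughly like the following. This is a minimal sketch: the host URL, the environment variable names, and the organization ID are placeholders to replace with your own values, and the pipelines client created at the end is the one used later in this guide.

import os
import vectorize_client as v

# Placeholder credentials - replace with your own organization ID and API token
org = os.environ["VECTORIZE_ORG_ID"]
token = os.environ["VECTORIZE_API_TOKEN"]

api = v.ApiClient(v.Configuration(
    host="https://api.vectorize.io/v1",
    access_token=token,
))
pipelines = v.PipelinesApi(api)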

First, we create a File Upload connector that will hold our file.

connectors_api = v.ConnectorsApi(api)
response = connectors_api.create_source_connector(
    org,
    [v.CreateSourceConnector(
        name="From api",
        type=v.SourceConnectorType.FILE_UPLOAD,
    )],
)
source_connector_id = response.connectors[0].id

Then, we can upload the file:

import urllib3, json, os

file_path = "path/to/file.pdf"

http = urllib3.PoolManager()
uploads_api = v.UploadsApi(api)
metadata = {"created-from-api": True}
upload_response = uploads_api.start_file_upload_to_connector(
    org, source_connector_id, v.StartFileUploadToConnectorRequest(
        name=file_path.split("/")[-1],
        content_type="application/pdf",
        # additional metadata stored along with each chunk in the vector database
        metadata=json.dumps(metadata),
    ),
)

with open(file_path, "rb") as f:
    response = http.request(
        "PUT", upload_response.upload_url, body=f,
        headers={"Content-Type": "application/pdf",
                 "Content-Length": str(os.path.getsize(file_path))})

if response.status != 200:
    print("Upload failed:", response.data)
else:
    print("Upload successful")

AI Platform and Vector Database

We will use the Built-In AI Platform and Vector Database. Since they already exist in the platform, we only need to retrieve their IDs.

ai_platforms = connectors_api.get_ai_platform_connectors(org)
builtin_ai_platform = [c.id for c in ai_platforms.ai_platform_connectors if c.type == v.AIPlatformType.VECTORIZE][0]

vector_databases = connectors_api.get_destination_connectors(org)
builtin_vector_db = [c.id for c in vector_databases.destination_connectors if c.type == v.DestinationConnectorType.VECTORIZE][0]

Configure and deploy the pipeline

Now we need to configure the pipeline and deploy it.

response = pipelines.create_pipeline(org, v.PipelineConfigurationSchema(
    source_connectors=[v.SourceConnectorSchema(
        id=source_connector_id, type=v.SourceConnectorType.FILE_UPLOAD, config={})],
    destination_connector=v.DestinationConnectorSchema(
        id=builtin_vector_db, type=v.DestinationConnectorType.VECTORIZE, config={}),
    ai_platform=v.AIPlatformSchema(
        id=builtin_ai_platform, type=v.AIPlatformType.VECTORIZE, config={}),
    pipeline_name="My Pipeline From API",
    schedule=v.ScheduleSchema(type="manual"),
))
pipeline_id = response.data.id
print(pipeline_id)

The pipeline will be deployed and our file will be ingested into the Vector Database.
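Deployment and the first ingestion run can take a little while. If you want to check on it programmatically, a status lookup along these lines may help; note that the get_pipeline call and the response fields shown here are assumptions about the Python client, so verify them against the API reference for your client version.

# Hypothetical status check (get_pipeline and the fields below are assumptions)
pipeline = pipelines.get_pipeline(org, pipeline_id)
print("Pipeline status:", pipeline.data.status)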

Next steps

Now you can either perform a Vector Search or generate a Private Deep Research.
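For example, a retrieval query against the new pipeline could look roughly like the sketch below. The retrieve_documents method and the request and response fields shown are assumptions about the Python client, so double-check them against the Retrieval section of the API reference.

# Hypothetical retrieval call - method name and request fields are assumptions
retrieval = pipelines.retrieve_documents(
    org, pipeline_id,
    v.RetrieveDocumentsRequest(question="What is this file about?", num_results=5),
)
for doc in retrieval.documents:
    print(doc.text)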
