Generate a Private Deep Research on a pipeline
Beta Feature
The API and Deep Research are currently in beta. Features and configuration may change.
Prerequisites
Before you begin, you'll need:
- A Vectorize account
- An API access token (how to create one)
- Your organization ID (see below)
- A pipeline ID (see below)
Finding your Organization ID
Your organization ID is in the Vectorize platform URL:
https://platform.vectorize.io/organization/[YOUR-ORG-ID]
For example, if your URL is:
https://platform.vectorize.io/organization/ecf3fa1d-30d0-4df1-8af6-f4852bc851cb
Your organization ID is: ecf3fa1d-30d0-4df1-8af6-f4852bc851cb
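If you copy the platform URL often, the lookup described above can be scripted. The following is a minimal sketch, not part of the Vectorize client: `extract_organization_id` is a hypothetical helper name, and the UUID check assumes organization IDs always look like the one in the example URL.

```python
import uuid
from urllib.parse import urlparse


def extract_organization_id(url: str) -> str:
    """Return the path segment that follows 'organization' in a platform URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        org_id = segments[segments.index("organization") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"No organization ID found in URL: {url}") from None
    # Assumption: IDs are UUIDs, as in the example URL. Raises ValueError otherwise.
    uuid.UUID(org_id)
    return org_id


url = "https://platform.vectorize.io/organization/ecf3fa1d-30d0-4df1-8af6-f4852bc851cb"
print(extract_organization_id(url))
```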
Finding your Pipeline ID
Navigate to your pipeline in the Vectorize platform. The pipeline ID is shown in:
- The URL:
https://platform.vectorize.io/organization/[org-id]/pipeline/[PIPELINE-ID]
- The pipeline details page
- The "Connect" tab of your pipeline
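The URL pattern above can also be parsed in code. This sketch assumes only the path layout shown in this guide (`/organization/<id>/pipeline/<id>`); `extract_ids` is a hypothetical helper, and the IDs in the example URL are placeholders.

```python
from urllib.parse import urlparse


def extract_ids(url: str) -> dict:
    """Map 'organization' and 'pipeline' to the path segment that follows each."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    ids = {}
    for key in ("organization", "pipeline"):
        if key in segments and segments.index(key) + 1 < len(segments):
            ids[key] = segments[segments.index(key) + 1]
    return ids


# Placeholder IDs for illustration only.
example = "https://platform.vectorize.io/organization/my-org-id/pipeline/my-pipeline-id"
print(extract_ids(example))
```

A URL without a pipeline segment simply yields a dictionary with no `pipeline` key, so callers can check for it before continuing.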
Generate the Deep Research
With your pipeline ID ready, you can now start a deep research task. The task analyzes your pipeline's data, and optionally web search results, to answer your query.
Python
import vectorize_client as v

# Assumes apiClient is an initialized v.ApiClient and that organization_id
# and pipeline_id hold your IDs (see the complete example below).

# Create API interface
pipelines_api = v.PipelinesApi(apiClient)

# Start deep research
response = pipelines_api.start_deep_research(
    organization_id,
    pipeline_id,
    v.StartDeepResearchRequest(
        query="What is the meaning of life?",
        web_search=True  # Enable web search for comprehensive results
    )
)
research_id = response.research_id
print(f"Research started with ID: {research_id}")
Node.js
// This snippet uses async operations and should be run in an async context.
// Assumes apiClient is a vectorize.Configuration and pipelineId holds your
// pipeline ID (see the complete example below).
(async () => {
  const vectorize = require('@vectorize-io/vectorize-client');
  const { PipelinesApi } = vectorize;

  // Create API interface
  const pipelinesApi = new PipelinesApi(apiClient);

  // Start deep research
  const response = await pipelinesApi.startDeepResearch({
    organizationId: "your-org-id",
    pipelineId: pipelineId,
    startDeepResearchRequest: {
      query: "What is the meaning of life?",
      webSearch: true // Enable web search for comprehensive results
    }
  });
  const researchId = response.researchId;
  console.log(`Research started with ID: ${researchId}`);
})();
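The snippets above assume that the API client, organization ID, and pipeline ID already exist. One way to supply them is to read them from the environment and fail early when something is missing. This is a sketch under stated assumptions: `VectorizeSettings` and `load_settings` are illustrative names, and `VECTORIZE_PIPELINE_ID` is a hypothetical variable name (only `VECTORIZE_API_KEY` and `VECTORIZE_ORGANIZATION_ID` appear in this guide).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorizeSettings:
    api_key: str
    organization_id: str
    pipeline_id: str


def load_settings(env=os.environ) -> VectorizeSettings:
    """Read required values from the environment and fail early if any are missing."""
    names = {
        "api_key": "VECTORIZE_API_KEY",
        "organization_id": "VECTORIZE_ORGANIZATION_ID",
        "pipeline_id": "VECTORIZE_PIPELINE_ID",  # hypothetical variable name
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return VectorizeSettings(**{field: env[var] for field, var in names.items()})


# Usage, following the client setup shown in the complete example:
# settings = load_settings()
# configuration = v.Configuration(
#     host="https://api.vectorize.io/v1",
#     access_token=settings.api_key,
# )
# with v.ApiClient(configuration) as apiClient:
#     pipelines_api = v.PipelinesApi(apiClient)
```

Passing a plain dictionary as `env` makes the loader easy to exercise without touching the real environment.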
Get the Deep Research result
Deep Research tasks run asynchronously. Use the research ID returned from the previous step to check the status and retrieve your results.
Python
import vectorize_client as v
import time

# Assumes apiClient, organization_id, pipeline_id, and research_id are
# already defined (see the previous step).

# Create API interface
pipelines_api = v.PipelinesApi(apiClient)

# Check research status and get results
max_attempts = 60  # Maximum 5 minutes (60 * 5 seconds)
attempt = 0
while attempt < max_attempts:
    try:
        response = pipelines_api.get_deep_research_result(
            organization_id,
            pipeline_id,
            research_id
        )
        if response.ready:
            if response.data.success:
                print("Research completed successfully!")
                print(response.data.markdown)
            else:
                print("Research failed:", response.data.error)
            break
        print("Research in progress...")
        time.sleep(5)  # Wait 5 seconds before checking again
        attempt += 1
    except Exception as e:
        print(f"Error checking research status: {e}")
        raise
if attempt >= max_attempts:
    print("Research timed out after 5 minutes")
Node.js
// This snippet uses async operations and should be run in an async context.
// Assumes apiClient, pipelineId, and researchId are already defined
// (see the previous step).
(async () => {
  const vectorize = require('@vectorize-io/vectorize-client');
  const { PipelinesApi } = vectorize;

  // Create API interface
  const pipelinesApi = new PipelinesApi(apiClient);

  // Check research status and get results
  const maxAttempts = 60; // Maximum 5 minutes (60 * 5 seconds)
  let attempt = 0;
  let response;
  while (attempt < maxAttempts) {
    try {
      response = await pipelinesApi.getDeepResearchResult({
        organizationId: "your-org-id",
        pipelineId: pipelineId,
        researchId: researchId
      });
      if (response.ready) {
        if (response.data.success) {
          console.log("Research completed successfully!");
          console.log(response.data.markdown);
        } else {
          console.log("Research failed:", response.data.error);
        }
        break;
      }
      console.log("Research in progress...");
      await new Promise(resolve => setTimeout(resolve, 5000)); // Wait 5 seconds
      attempt += 1;
    } catch (error) {
      console.log(`Error checking research status: ${error.message}`);
      throw error;
    }
  }
  if (attempt >= maxAttempts) {
    console.log("Research timed out after 5 minutes");
  }
})();
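The polling loops above use a fixed 5-second interval and an attempt counter. As an alternative, the same idea can be written as a reusable helper with a wall-clock deadline and exponential backoff. This is only a sketch: `wait_for_research` is a hypothetical name, it is not part of the Vectorize client, and it relies only on the response fields used in this guide (`ready`, `data.success`, `data.markdown`, `data.error`).

```python
import time
from typing import Any, Callable, Optional


def wait_for_research(
    fetch: Callable[[], Any],
    timeout: float = 300.0,
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll fetch() until the result is ready; return markdown, or None on timeout.

    Raises RuntimeError if the research finishes unsuccessfully.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        response = fetch()
        if response.ready:
            if response.data.success:
                return response.data.markdown
            raise RuntimeError(f"Research failed: {response.data.error}")
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)  # Back off: 2s, 4s, 8s, ... capped at max_delay


# Usage with the client call from this guide:
# markdown = wait_for_research(
#     lambda: pipelines_api.get_deep_research_result(
#         organization_id, pipeline_id, research_id
#     )
# )
```

Taking `fetch`, `sleep`, and `clock` as parameters keeps the helper independent of the client library and lets you test it with fake responses instead of real waits.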
Complete Example
Here's all the code from this guide combined into a complete, runnable example:
Python
Required Environment Variables:
• `VECTORIZE_API_KEY`
• `VECTORIZE_ORGANIZATION_ID`
• `VECTORIZE_AI_PLATFORM_CONNECTOR_ID`
• `VECTORIZE_DESTINATION_CONNECTOR_ID`
Additional Requirements:
• The example creates its own pipeline for deep research and deletes it when finished
#!/usr/bin/env python3
"""
Complete example for deep research queries.
This is a hand-written example that corresponds to the test file:
api-clients/python/tests/pipelines/deep_research.py
IMPORTANT: Keep this file in sync with the test file's snippets!
"""
import os
import sys
import time
import vectorize_client as v


def get_api_config():
    """Get API configuration from environment variables."""
    organization_id = os.environ.get("VECTORIZE_ORGANIZATION_ID")
    api_key = os.environ.get("VECTORIZE_API_KEY")
    if not organization_id or not api_key:
        print("🔑 Setup required:")
        print("1. Get your API key from: https://app.vectorize.io/settings")
        print("2. Set environment variables:")
        print(" export VECTORIZE_ORGANIZATION_ID='your-org-id'")
        print(" export VECTORIZE_API_KEY='your-api-key'")
        sys.exit(1)
    # Always use production API
    configuration = v.Configuration(
        host="https://api.vectorize.io/v1",
        access_token=api_key
    )
    return configuration, organization_id


def create_pipeline_for_research(api_client, organization_id):
    """Create a pipeline for deep research."""
    print("🚀 Creating pipeline for deep research...")
    # Get required connector IDs from environment
    ai_platform_connector_id = os.environ.get("VECTORIZE_AI_PLATFORM_CONNECTOR_ID")
    destination_connector_id = os.environ.get("VECTORIZE_DESTINATION_CONNECTOR_ID")
    if not ai_platform_connector_id or not destination_connector_id:
        print("❌ Missing required connector IDs")
        print(" Please set:")
        print(" - VECTORIZE_AI_PLATFORM_CONNECTOR_ID")
        print(" - VECTORIZE_DESTINATION_CONNECTOR_ID")
        print("\n💡 Run get_vectorize_connectors.py to find your VECTORIZE connector IDs")
        sys.exit(1)

    # First, create a source connector
    connectors_api = v.SourceConnectorsApi(api_client)
    try:
        # Create file upload connector
        file_upload = v.FileUpload(
            name="deep-research-source",
            type="FILE_UPLOAD",
            config={}
        )
        request = v.CreateSourceConnectorRequest(file_upload)
        source_response = connectors_api.create_source_connector(
            organization_id,
            request
        )
        source_connector_id = source_response.connector.id
        print(f"✅ Created source connector: {source_connector_id}")
    except Exception as e:
        print(f"❌ Error creating source connector: {e}")
        raise

    # Create the pipeline
    pipelines_api = v.PipelinesApi(api_client)
    pipeline_configuration = v.PipelineConfigurationSchema(
        pipeline_name="Deep Research Pipeline",
        source_connectors=[
            v.PipelineSourceConnectorSchema(
                id=source_connector_id,
                type="FILE_UPLOAD",
                config={}
            )
        ],
        ai_platform_connector=v.PipelineAIPlatformConnectorSchema(
            id=ai_platform_connector_id,
            type="VECTORIZE",
            config={}
        ),
        destination_connector=v.PipelineDestinationConnectorSchema(
            id=destination_connector_id,
            type="VECTORIZE",
            config={}
        ),
        schedule=v.ScheduleSchema(type="manual")
    )
    try:
        response = pipelines_api.create_pipeline(
            organization_id,
            pipeline_configuration
        )
        pipeline_id = response.data.id
        print(f"✅ Created pipeline: {pipeline_id}")
        print(" Name: Deep Research Pipeline")
        # Wait for pipeline to be ready
        print("⏳ Waiting for pipeline to be ready...")
        max_wait = 60  # 60 seconds
        for i in range(max_wait):
            pipeline = pipelines_api.get_pipeline(organization_id, pipeline_id)
            status = pipeline.data.status
            if status in ["LISTENING", "IDLE"]:
                print(f"✅ Pipeline is ready! Status: {status}\n")
                break
            elif status in ["ERROR_DEPLOYING", "SHUTDOWN"]:
                print(f"❌ Pipeline failed to deploy: {status}")
                sys.exit(1)
            if i % 10 == 0:
                print(f" Current status: {status}")
            time.sleep(1)
        return pipeline_id, source_connector_id
    except Exception as e:
        print(f"❌ Error creating pipeline: {e}")
        # Clean up source connector if pipeline creation failed
        try:
            connectors_api.delete_source_connector(organization_id, source_connector_id)
        except Exception:
            # Best-effort cleanup; the original error is re-raised below
            pass
        raise


def start_deep_research(api_client, organization_id, pipeline_id):
    """Start a deep research query."""
    # Create API interface
    pipelines_api = v.PipelinesApi(api_client)
    # Start deep research
    response = pipelines_api.start_deep_research(
        organization_id,
        pipeline_id,
        v.StartDeepResearchRequest(
            query="What is the meaning of life?",
            web_search=True  # Enable web search for comprehensive results
        )
    )
    research_id = response.research_id
    print(f"Research started with ID: {research_id}")
    return research_id


def get_deep_research_result(api_client, organization_id, pipeline_id, research_id):
    """Get deep research results."""
    # Create API interface
    pipelines_api = v.PipelinesApi(api_client)
    # Check research status and get results
    max_attempts = 60  # Maximum 5 minutes (60 * 5 seconds)
    attempt = 0
    while attempt < max_attempts:
        try:
            response = pipelines_api.get_deep_research_result(
                organization_id,
                pipeline_id,
                research_id
            )
            if response.ready:
                if response.data.success:
                    print("Research completed successfully!")
                    print(response.data.markdown)
                    return response.data.markdown
                else:
                    print("Research failed:", response.data.error)
                    return None
            print("Research in progress...")
            time.sleep(5)  # Wait 5 seconds before checking again
            attempt += 1
        except Exception as e:
            print(f"Error checking research status: {e}")
            raise
    if attempt >= max_attempts:
        print("Research timed out after 5 minutes")
    return None


def start_research_without_web_search(api_client, organization_id, pipeline_id):
    """Demonstrate research without web search."""
    pipelines_api = v.PipelinesApi(api_client)
    try:
        response = pipelines_api.start_deep_research(
            organization_id,
            pipeline_id,
            v.StartDeepResearchRequest(
                query="Explain quantum computing in simple terms",
                web_search=False  # Only use your pipeline's data
            )
        )
        print(f"✅ Research started without web search: {response.research_id}")
        return response.research_id
    except Exception as e:
        print(f"❌ Error starting research without web search: {e}")
        return None


def main():
    """Main function demonstrating deep research functionality."""
    print("=== Deep Research Example ===\n")
    try:
        # Get configuration
        configuration, organization_id = get_api_config()
        print("⚙️ Configuration:")
        print(f" Organization ID: {organization_id}")
        print(f" Host: {configuration.host}\n")

        # Initialize API client
        with v.ApiClient(configuration) as api_client:
            # Create a pipeline for deep research
            pipeline_id, source_connector_id = create_pipeline_for_research(api_client, organization_id)

            # Example 1: Deep research with web search
            print("🧠 Starting Deep Research with Web Search")
            print(" Query: 'What is the meaning of life?'")
            print(" Web Search: Enabled\n")
            research_id = start_deep_research(api_client, organization_id, pipeline_id)
            if research_id:
                print("\n📊 Getting Research Results")
                result = get_deep_research_result(api_client, organization_id, pipeline_id, research_id)
                if result:
                    print("\n📄 Research Results Summary:")
                    print("=" * 60)
                    # Show first few lines of the result
                    result_lines = result.split('\n')[:10]
                    for line in result_lines:
                        if line.strip():
                            print(line)
                    if len(result.split('\n')) > 10:
                        print("... (truncated for display)")
                    print("=" * 60)
                    print(f"✅ Complete research result: {len(result)} characters")
                else:
                    print("❌ Failed to get research results")

            # Example 2: Research without web search
            print("\n🔬 Deep Research without Web Search")
            print(" Query: 'Explain quantum computing in simple terms'")
            print(" Web Search: Disabled (uses only your pipeline's data)\n")
            research_id_no_web = start_research_without_web_search(api_client, organization_id, pipeline_id)
            if research_id_no_web:
                print("⏳ Getting results (this may take a moment)...")
                # You could also get the results for this query, but for brevity we'll just start it
                print("✅ Research started successfully")
                # Optional: Wait for and display these results too
                # result_no_web = get_deep_research_result(api_client, organization_id, pipeline_id, research_id_no_web)

            print("\n🎉 Deep research examples completed!")
            print(" ✅ Demonstrated research with web search enhancement")
            print(" ✅ Demonstrated research using only pipeline data")
            print(" 💡 Deep research combines your data with AI reasoning for comprehensive insights")

            # Clean up resources
            print("\n🧹 Cleaning up resources...")
            try:
                pipelines_api = v.PipelinesApi(api_client)
                pipelines_api.delete_pipeline(organization_id, pipeline_id)
                print(" ✅ Pipeline deleted")
                connectors_api = v.SourceConnectorsApi(api_client)
                connectors_api.delete_source_connector(organization_id, source_connector_id)
                print(" ✅ Source connector deleted")
            except Exception as e:
                print(f" ⚠️ Cleanup warning: {e}")
    except ValueError as e:
        print(f"❌ Configuration Error: {e}")
        print("\n💡 Make sure to set the required environment variables:")
        print(" export VECTORIZE_ORGANIZATION_ID='your-org-id'")
        print(" export VECTORIZE_API_KEY='your-api-key'")
    except Exception as e:
        print(f"❌ Error: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
Node.js
Required Environment Variables:
• `VECTORIZE_API_KEY`
• `VECTORIZE_ORGANIZATION_ID`
• `VECTORIZE_AI_PLATFORM_CONNECTOR_ID`
• `VECTORIZE_DESTINATION_CONNECTOR_ID`
Additional Requirements:
• The example creates its own pipeline for deep research and deletes it when finished
#!/usr/bin/env node
/**
* Complete example for deep research operations.
* This is a hand-written example that corresponds to the test file:
* api-clients/javascript/tests/pipelines/deep_research.js
*
* IMPORTANT: Keep this file in sync with the test file's snippets!
*/
const vectorize = require('@vectorize-io/vectorize-client');
const fs = require('fs');
const path = require('path');
// For test environment, use test configuration
function getApiConfig() {
  // Check if we're in test environment
  if (process.env.VECTORIZE_TEST_MODE === 'true') {
    const testConfigPath = path.join(__dirname, '../common/test_config.js');
    if (fs.existsSync(testConfigPath)) {
      const { getApiClient } = require(testConfigPath);
      const { apiConfig, config } = getApiClient();
      return { apiClient: apiConfig, organizationId: config.organization_id };
    }
  }
  // Fall back to environment variables
  const organizationId = process.env.VECTORIZE_ORGANIZATION_ID;
  const apiKey = process.env.VECTORIZE_API_KEY;
  if (!organizationId || !apiKey) {
    throw new Error("Please set VECTORIZE_ORGANIZATION_ID and VECTORIZE_API_KEY environment variables");
  }
  const configuration = new vectorize.Configuration({
    basePath: 'https://api.vectorize.io/v1',
    accessToken: apiKey
  });
  return { apiClient: configuration, organizationId };
}

async function main() {
  // Initialize the API client
  const { apiClient: apiConfig, organizationId } = getApiConfig();
  const { PipelinesApi, SourceConnectorsApi } = vectorize;

  // Create a pipeline for deep research
  console.log('🚀 Creating pipeline for deep research...');
  // Get required connector IDs from environment
  const aiPlatformConnectorId = process.env.VECTORIZE_AI_PLATFORM_CONNECTOR_ID;
  const destinationConnectorId = process.env.VECTORIZE_DESTINATION_CONNECTOR_ID;
  if (!aiPlatformConnectorId || !destinationConnectorId) {
    console.log('❌ Missing required connector IDs');
    console.log(' Please set:');
    console.log(' - VECTORIZE_AI_PLATFORM_CONNECTOR_ID');
    console.log(' - VECTORIZE_DESTINATION_CONNECTOR_ID');
    console.log('\n💡 Run get_vectorize_connectors.py to find your VECTORIZE connector IDs');
    process.exit(1);
  }

  // Create source connector
  const sourceConnectorsApi = new SourceConnectorsApi(apiConfig);
  const sourceResponse = await sourceConnectorsApi.createSourceConnector({
    organizationId: organizationId,
    createSourceConnectorRequest: {
      name: 'deep-research-source',
      type: 'FILE_UPLOAD',
      config: {}
    }
  });
  const sourceConnectorId = sourceResponse.connector.id;
  console.log(`✅ Created source connector: ${sourceConnectorId}`);

  // Create pipeline
  const pipelinesApi = new PipelinesApi(apiConfig);
  const pipelineResponse = await pipelinesApi.createPipeline({
    organizationId: organizationId,
    pipelineConfigurationSchema: {
      pipelineName: 'Deep Research Pipeline',
      sourceConnectors: [{
        id: sourceConnectorId,
        type: 'FILE_UPLOAD',
        config: {}
      }],
      aiPlatformConnector: {
        id: aiPlatformConnectorId,
        type: 'VECTORIZE',
        config: {}
      },
      destinationConnector: {
        id: destinationConnectorId,
        type: 'VECTORIZE',
        config: {}
      },
      schedule: { type: 'manual' }
    }
  });
  const pipelineId = pipelineResponse.data.id;
  console.log(`✅ Created pipeline: ${pipelineId}`);
  console.log(` Name: Deep Research Pipeline`);
  console.log(` Status: ${pipelineResponse.data.status}\n`);

  // Wait for pipeline to be ready
  console.log('⏳ Waiting for pipeline to be ready...');
  const maxWait = 60;
  for (let i = 0; i < maxWait; i++) {
    const pipeline = await pipelinesApi.getPipeline({
      organizationId: organizationId,
      pipelineId: pipelineId
    });
    const status = pipeline.data.status;
    if (["LISTENING", "IDLE"].includes(status)) {
      console.log(`✅ Pipeline is ready! Status: ${status}\n`);
      break;
    } else if (["ERROR_DEPLOYING", "SHUTDOWN"].includes(status)) {
      console.log(`❌ Pipeline failed to deploy: ${status}`);
      process.exit(1);
    }
    if (i % 10 === 0) {
      console.log(` Current status: ${status}`);
    }
    await new Promise(resolve => setTimeout(resolve, 1000));
  }

  // ============================================================================
  // SNIPPET: start_deep_research
  // Start a deep research query with web search enabled
  // ============================================================================
  console.log('🧠 Starting deep research...');
  let researchId;
  {
    // Start deep research
    const response = await pipelinesApi.startDeepResearch({
      organizationId: organizationId,
      pipelineId: pipelineId,
      startDeepResearchRequest: {
        query: "What is the meaning of life?",
        webSearch: true // Enable web search for comprehensive results
      }
    });
    researchId = response.researchId;
    console.log(`Research started with ID: ${researchId}`);
  }
  // Give the research a moment to start processing
  await new Promise(resolve => setTimeout(resolve, 2000));

  // ============================================================================
  // SNIPPET: get_deep_research_result
  // Poll for research results and retrieve the final output
  // ============================================================================
  console.log('\n📊 Checking research status...');
  {
    // Check research status and get results
    const maxAttempts = 60; // Maximum 5 minutes (60 * 5 seconds)
    let attempt = 0;
    let response;
    while (attempt < maxAttempts) {
      try {
        response = await pipelinesApi.getDeepResearchResult({
          organizationId: organizationId,
          pipelineId: pipelineId,
          researchId: researchId
        });
        if (response.ready) {
          if (response.data.success) {
            console.log("Research completed successfully!");
            console.log("\n📝 Research Results:\n");
            console.log(response.data.markdown);
          } else {
            console.log("Research failed:", response.data.error);
          }
          break;
        }
        console.log("Research in progress...");
        await new Promise(resolve => setTimeout(resolve, 5000)); // Wait 5 seconds
        attempt += 1;
      } catch (error) {
        console.log(`Error checking research status: ${error.message}`);
        throw error;
      }
    }
    if (attempt >= maxAttempts) {
      console.log("Research timed out after 5 minutes");
    }
  }

  // Demonstrate research without web search
  console.log('\n\n🔬 Starting research without web search...');
  {
    const response = await pipelinesApi.startDeepResearch({
      organizationId: organizationId,
      pipelineId: pipelineId,
      startDeepResearchRequest: {
        query: "Explain quantum computing in simple terms",
        webSearch: false // Disable web search - uses only pipeline data
      }
    });
    console.log(`Research started (no web search): ${response.researchId}`);
    console.log('Note: This will search only within your pipeline\'s data sources');
  }
  console.log('\n✅ Deep research example completed!');

  // Clean up resources
  console.log('\n🧹 Cleaning up resources...');
  try {
    await pipelinesApi.deletePipeline({
      organizationId: organizationId,
      pipelineId: pipelineId
    });
    console.log(' ✅ Pipeline deleted');
    await sourceConnectorsApi.deleteSourceConnector({
      organizationId: organizationId,
      connectorId: sourceConnectorId
    });
    console.log(' ✅ Source connector deleted');
  } catch (error) {
    console.log(` ⚠️ Cleanup warning: ${error.message}`);
  }
}

// Run the example
if (require.main === module) {
  main().catch(error => {
    console.error('❌ Error:', error);
    process.exit(1);
  });
}
module.exports = { main };
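In both complete examples, the cleanup step runs only if every step before it succeeds, so a failed research call can leave the temporary pipeline and source connector behind. One possible way to guarantee cleanup in Python is `contextlib.ExitStack` from the standard library. The sketch below is an assumption-laden illustration, not the guide's method: `run_research` and its five callable parameters are hypothetical wrappers that you would implement around the client calls shown earlier.

```python
from contextlib import ExitStack


def run_research(create_source, create_pipeline, research, delete_pipeline, delete_source):
    """Create resources, run research, and always delete what was created."""
    with ExitStack() as cleanup:
        source_id = create_source()
        cleanup.callback(delete_source, source_id)      # Registered first, runs last
        pipeline_id = create_pipeline(source_id)
        cleanup.callback(delete_pipeline, pipeline_id)  # Registered last, runs first
        return research(pipeline_id)
```

Callbacks run in reverse order of registration when the `with` block exits, whether normally or through an exception, so the pipeline is deleted before the source connector it depends on, matching the order used in the examples above.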