Real-Time Pipelines
Real-time pipelines provide continuous data synchronization and processing, enabling your AI applications to work with the most current information without waiting for scheduled sync intervals.
Overview
Real-time pipelines continuously monitor your data sources for changes and process new or updated content as soon as it is detected. Unlike scheduled pipelines, which run at predefined intervals, real-time pipelines keep your vector database current, typically within minutes of a change.
Key Benefits
- Continuous Data Sync: Process changes as soon as they're detected
- Always-On Processing: Continuous monitoring without manual intervention
- Reduced Latency: Minimize wait times between data updates and availability
- Event-Driven: Respond immediately to data changes in your sources
Real-Time vs Scheduled Pipelines
| Feature | Scheduled Pipeline | Real-Time Pipeline |
|---|---|---|
| Data Freshness | Updated on schedule (hourly, daily, etc.) | Updated within minutes |
| Processing Model | Batch processing at intervals | Continuous stream processing |
| Resource Usage | Bursts during scheduled runs | Steady, continuous processing |
| Best For | Static datasets, periodic reports | Live data, time-sensitive applications |
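
To make the data freshness difference concrete, the sketch below computes how long a change waits before a scheduled pipeline's next run picks it up. It is an illustration only: `wait_until_next_run` is a hypothetical helper, not part of any Vectorize API, and it assumes runs start at fixed intervals from a known first run and ignores processing time.

```python
from datetime import datetime, timedelta


def wait_until_next_run(change_time: datetime, first_run: datetime,
                        interval: timedelta) -> timedelta:
    """Return how long a change waits for the next scheduled run.

    Assumption: a change that lands exactly at a run's start time is
    missed by that run and waits a full interval.
    """
    elapsed = change_time - first_run
    runs_elapsed = elapsed // interval          # timedelta // timedelta gives an int
    next_run = first_run + (runs_elapsed + 1) * interval
    return next_run - change_time


first_run = datetime(2024, 1, 1, 0, 0)
hourly = timedelta(hours=1)

# A document edited five minutes after an hourly run waits 55 minutes.
wait = wait_until_next_run(datetime(2024, 1, 1, 0, 5), first_run, hourly)
print(wait)  # 0:55:00
```

With an hourly schedule the wait can approach a full hour, and with a daily schedule a full day, which is the gap real-time mode is meant to close.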
Creating a Real-Time Pipeline
Prerequisites
- Organization on a premium plan
- Available real-time pipeline quota (purchased separately)
Step 1: Purchase Real-Time Pipeline Quota
Real-time pipelines are available as an add-on to your subscription:
- Navigate to Organization Settings → Billing
- Find the Real-Time Pipelines section
- Click Manage Real-Time Pipelines
- Select the number of real-time pipelines you need
- Complete the purchase
For current pricing, see Vectorize Pricing.
Step 2: Create Pipeline and Enable Real-Time Mode
- Create a new pipeline with your desired source and destination connectors
- After the pipeline is created, navigate to the pipeline details page
- Go to the Schedule tab
- Enable real-time mode:
  - If you have purchased real-time pipelines: Select Real-time instead of Scheduled
  - If you haven't purchased real-time pipelines: You'll see an "Unlock Real-time Processing" message with a "Purchase Real-time Pipelines" button
- Save the changes
The pipeline will begin processing changes continuously once real-time mode is enabled.
Step 3: Monitor Real-Time Processing
Real-time pipelines are marked in the pipeline list and report their activity as it happens:
- ⚡ Real-time badge shows the pipeline is processing continuously
- Pipeline metrics update in real time to show processing activity
- Event logs capture all change detection and processing events
Converting Between Pipeline Modes
Scheduled to Real-Time
Convert an existing scheduled pipeline to real-time mode:
- Open the pipeline details page
- Navigate to the Schedule tab
- Toggle from Scheduled to Real-time
- Confirm the conversion
Ensure you have available real-time pipeline quota before converting. The system will prevent conversion if you've reached your limit.
Real-Time to Scheduled
Convert a real-time pipeline back to scheduled:
- Open the pipeline details page
- Navigate to the Schedule tab
- Toggle from Real-time to Scheduled
- Configure your desired schedule (daily, hourly, etc.)
- Save changes
This frees up real-time pipeline quota for use with other pipelines.
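The conversion rules above can be modeled in a few lines. This is a minimal sketch, not the platform's implementation: `RealTimeQuota` and its methods are hypothetical names, and it assumes each real-time pipeline consumes exactly one unit of purchased quota.

```python
from dataclasses import dataclass, field


@dataclass
class RealTimeQuota:
    """Hypothetical model of an organization's real-time pipeline quota."""

    purchased: int                                   # real-time pipelines purchased
    realtime_ids: set = field(default_factory=set)   # pipelines in real-time mode

    @property
    def in_use(self) -> int:
        return len(self.realtime_ids)

    def convert_to_realtime(self, pipeline_id: str) -> bool:
        """Return True if the pipeline is in real-time mode afterwards."""
        if pipeline_id in self.realtime_ids:
            return True                  # already real-time, nothing to do
        if self.in_use >= self.purchased:
            return False                 # limit reached, conversion is prevented
        self.realtime_ids.add(pipeline_id)
        return True

    def convert_to_scheduled(self, pipeline_id: str) -> None:
        """Switch back to scheduled mode, freeing one unit of quota."""
        self.realtime_ids.discard(pipeline_id)


quota = RealTimeQuota(purchased=1)
first = quota.convert_to_realtime("support-kb")   # takes the only slot
second = quota.convert_to_realtime("news-feed")   # blocked by the limit
quota.convert_to_scheduled("support-kb")          # frees the slot
third = quota.convert_to_realtime("news-feed")    # now succeeds
print(first, second, third)  # True False True
```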
Data Source Support
All data sources are supported with real-time pipelines. The pipeline continuously monitors for changes and processes them as they're detected.
Processing time depends on several factors, including the source connector's change detection method, document size, and content complexity.
Managing Real-Time Pipeline Quota
Viewing Usage
Check your current real-time pipeline usage:
- Go to Organization Settings → Billing
- Find the Real-Time Pipelines section
- Check the usage summary, shown as "X of Y pipelines in use"
Adjusting Quota
To increase your real-time pipeline quota:
- Click Manage Real-Time Pipelines
- Increase the count to the desired number
- Confirm billing changes
To decrease quota:
- First convert any active real-time pipelines to scheduled mode
- Then reduce your quota in billing settings
- Billing adjusts on the next cycle
You cannot reduce your real-time pipeline quota below the number of active real-time pipelines. Convert pipelines to scheduled mode first.
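As an illustration of this rule, the sketch below checks a requested quota against the number of active real-time pipelines and reports how many must be converted first. `validate_quota_change` is a hypothetical helper written for this example, not a Vectorize API call, and the messages are invented.

```python
def validate_quota_change(new_quota: int, active_realtime: int) -> str:
    """Return "ok", or the reason the requested quota would be rejected."""
    if new_quota < 0:
        return "quota cannot be negative"
    if new_quota < active_realtime:
        # The quota may never drop below the pipelines currently in real-time mode.
        to_convert = active_realtime - new_quota
        return f"convert {to_convert} real-time pipeline(s) to scheduled mode first"
    return "ok"


# Three pipelines are in real-time mode; the organization wants to pay for two.
print(validate_quota_change(new_quota=2, active_realtime=3))
# convert 1 real-time pipeline(s) to scheduled mode first

# Increases, and decreases that still cover active pipelines, pass the check.
print(validate_quota_change(new_quota=5, active_realtime=3))  # ok
```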
Best Practices
When to Use Real-Time Pipelines
Ideal Use Cases:
- Customer support knowledge bases that need instant updates
- Financial data feeds requiring immediate processing
- Collaborative documents with frequent changes
- News and content aggregation systems
- Compliance and regulatory document tracking
Consider Scheduled Pipelines When:
- Data changes infrequently (weekly/monthly)
- Batch processing is more efficient
- Resource optimization is a priority
Resource Management
- Start with scheduled pipelines and upgrade to real-time for critical data
- Group related data sources into single pipelines when possible
- Monitor usage patterns to identify pipelines that don't need real-time mode
- Use scheduled pipelines for historical data imports
What's next?
- Learn about managing and monitoring your pipelines in Pipeline Management.
- Configure scheduling options for non-real-time pipelines in Scheduling Pipelines.