Scheduling RAG Pipelines
The pipeline scheduler is the key to keeping your vector indexes up to date with fresh data. When your pipeline is running, Vectorize will immediately process any changes it finds. Depending on the source connector(s) you are using, Vectorize will either poll the source system, or listen for change notifications from the source system.
There are two top-level modes you can use for the scheduler.
Pipelines must be configured using either a "Scheduled" or "Real-time" execution mode. This is not a permanent decision: you can start with a scheduled pipeline and later switch it to real-time, or vice versa.
A pipeline that is configured in real-time mode is referred to as a "Real-Time Pipeline". These pipelines run continuously until you explicitly stop them. As soon as changes are detected, your vector indexes are updated in near real time (usually within a few seconds).
If you are on the free plan, your real-time pipelines will run until you stop them or until you run out of free hours for the month.
A pipeline that is configured in scheduled mode is referred to as a "Scheduled Pipeline". These pipelines will start running automatically based on the configuration that you select.
Regardless of the schedule settings, Vectorize will immediately run your pipeline for one hour to backfill your vector indexes, provided you have pipeline hours available on the free plan or you are on a paid plan. After that, your pipeline will run as scheduled.
When configuring a scheduled pipeline, you must specify how often, and for how long, you want your pipeline to run. You start this process by selecting the Schedule Type.
Schedule Type options include:
Manual: Run the pipeline on-demand
Weekly: Run once a week on a specific day. Vectorize decides the time the pipeline will run.
Daily: Run every day, including weekends, starting at a set time and ending at a set time
Weekdays: Run Monday through Friday, starting at a set time and ending at a set time
Custom: Similar behavior to Daily and Weekdays, but you can pick specific days of the week when you want the schedule to run.
Set Days (for Weekly or Custom types)
For Weekly: Choose one day of the week
For Custom: Select multiple days as needed
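The relationship between schedule types and run days can be sketched in a few lines of Python. This is an illustrative model only, not the Vectorize API: the `ScheduleType` enum, the `run_days` function, and the day abbreviations are hypothetical names, and the rule that a Custom schedule needs at least one day is an assumption.

```python
from enum import Enum


class ScheduleType(Enum):
    # Hypothetical names mirroring the Schedule Type options listed above.
    MANUAL = "manual"
    WEEKLY = "weekly"
    DAILY = "daily"
    WEEKDAYS = "weekdays"
    CUSTOM = "custom"


DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def run_days(schedule_type, selected_days=()):
    """Return the days of the week on which a schedule would run."""
    unknown = set(selected_days) - set(DAYS)
    if unknown:
        raise ValueError(f"unknown day(s): {sorted(unknown)}")
    # Keep the selected days in calendar order.
    selected = [day for day in DAYS if day in selected_days]

    if schedule_type is ScheduleType.MANUAL:
        return []  # runs on demand only
    if schedule_type is ScheduleType.DAILY:
        return list(DAYS)
    if schedule_type is ScheduleType.WEEKDAYS:
        return list(DAYS[:5])
    if schedule_type is ScheduleType.WEEKLY:
        if len(selected) != 1:
            raise ValueError("a Weekly schedule needs exactly one day")
        return selected
    # CUSTOM: assumed here to require at least one selected day.
    if not selected:
        raise ValueError("a Custom schedule needs at least one day")
    return selected
```

For example, `run_days(ScheduleType.CUSTOM, {"Sat", "Mon"})` returns `["Mon", "Sat"]`, while Daily and Weekdays ignore any selected days because their days are fixed.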
Set Time Range (except for Manual and Weekly types)
Start Time: When the pipeline should begin running
Start times can be configured in 15-minute increments
End Time: When the pipeline should stop running
Time ranges must be at least one hour long
Pipeline schedules are always in increments of whole hours
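The three time-range rules above (15-minute start increments, a one-hour minimum, whole-hour durations) can be checked with a small helper. This is a sketch under stated assumptions rather than how the scheduler validates input: `validate_time_range` is a hypothetical name, times are compared to the minute, and the window is assumed not to cross midnight, which the rules above do not address.

```python
from datetime import time


def validate_time_range(start: time, end: time) -> int:
    """Return the window length in whole hours, or raise ValueError.

    Assumption: the window starts and ends on the same calendar day.
    Seconds are ignored; times are compared to the minute.
    """
    if start.minute % 15 != 0:
        raise ValueError("start time must fall on a 15-minute increment")

    start_minutes = start.hour * 60 + start.minute
    end_minutes = end.hour * 60 + end.minute
    duration = end_minutes - start_minutes

    if duration < 60:
        raise ValueError("time range must be at least one hour long")
    if duration % 60 != 0:
        raise ValueError("time range must be a whole number of hours")
    return duration // 60
```

Under these rules, a window from 09:15 to 12:15 is valid and lasts three hours, whereas 09:00 to 10:30 is rejected because it is not a whole number of hours.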
Choose Timezone
Select your preferred timezone to ensure accurate scheduling
Review Current Schedule
The scheduler will display a summary of your selected options
It will also show the estimated runtime and any associated costs
Note: When calculating costs, the scheduler summary accounts for free hours on the free plan and the hourly rate on paid plans. While these estimates are reasonably accurate, factors such as the number of days in a month will cause minor cost fluctuations from month to month.
If you are using the Vectorize free plan and you exceed your free hours for the month, you will need to manually restart your pipeline at the start of the next month, when your hours refresh.
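To see why the estimate shifts with the calendar, the arithmetic can be sketched as follows. The function names, the hourly rate, and the free-hour allowance are placeholders chosen for illustration, not actual Vectorize pricing or the scheduler's own calculation.

```python
import calendar


def monthly_run_hours(year, month, weekdays, hours_per_run):
    """Total scheduled hours in a month.

    weekdays: set of ints, 0 = Monday through 6 = Sunday.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    runs = sum(
        1
        for day in range(1, days_in_month + 1)
        if calendar.weekday(year, month, day) in weekdays
    )
    return runs * hours_per_run


def estimate_cost(total_hours, free_hours, hourly_rate):
    """Bill only the hours beyond the free allowance (placeholder pricing)."""
    billable_hours = max(0, total_hours - free_hours)
    return billable_hours * hourly_rate


# A Weekdays schedule that runs 2 hours per day.
weekday_set = {0, 1, 2, 3, 4}
february = monthly_run_hours(2025, 2, weekday_set, 2)  # 20 weekdays -> 40 hours
march = monthly_run_hours(2025, 3, weekday_set, 2)     # 21 weekdays -> 42 hours
```

With a hypothetical rate of 0.5 per hour and no free hours, the same schedule would come to 20.0 in February 2025 and 21.0 in March 2025, which is the kind of minor month-to-month fluctuation the note describes.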