Google Drive
The Google Drive Source Connector allows you to integrate Google Drive as a data source for your pipelines. This guide explains the configuration options available when setting up a Google Drive connector.
Before starting, you'll need:
The JSON key for your GCP service account.
Your GCP service account's email address.
If you don't have a GCP service account created yet, check out our guide How to Create a GCP Service Account for use with Google Drive.
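Both prerequisites normally come from the same key file, since a GCP service account JSON key includes a `client_email` field. The following is a minimal sketch of reading the email address out of a key, not part of the connector itself. The function name and the sample key values are made up for illustration.

```python
import json


def service_account_email(key_json: str) -> str:
    """Return the client_email stored in a service account JSON key."""
    key = json.loads(key_json)
    # Service account keys carry "type": "service_account".
    if key.get("type") != "service_account":
        raise ValueError("this does not look like a service account key")
    email = key.get("client_email")
    if not email:
        raise ValueError("the key has no client_email field")
    return email


# Made-up key for illustration; a real key also contains private_key and other fields.
example_key = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "client_email": "rag-reader@example-project.iam.gserviceaccount.com",
})
print(service_account_email(example_key))
```

In practice you would read the key from the downloaded file instead of building it inline.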
For your RAG Pipeline to ingest content from Google Drive, share everything you want to make available to the pipeline with your service account's email address. You can share individual files and folders. Sharing a parent folder also shares everything inside it, including content in any sub-folders. Note that shared drives are not supported.
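Sharing is usually done from the Google Drive UI, but if you have many folders to share it can also be scripted against the Drive API v3 `permissions.create` endpoint. The sketch below only builds the request URL and body and does not send anything. It assumes read-only access (`role: "reader"`) is sufficient, which this guide does not state, and `build_share_request` is a hypothetical helper name. Sending the request would additionally require an OAuth access token for the account that owns the content.

```python
import json
from urllib.parse import quote

DRIVE_API = "https://www.googleapis.com/drive/v3"


def build_share_request(file_id: str, email: str) -> tuple:
    """Build the URL and JSON body for a Drive permissions.create call.

    file_id may be the ID of a file or of a folder.
    """
    url = f"{DRIVE_API}/files/{quote(file_id, safe='')}/permissions"
    body = {
        "type": "user",         # grant access to a single account
        "role": "reader",       # assumption: read access is enough for ingestion
        "emailAddress": email,  # the service account's email address
    }
    return url, json.dumps(body)


# Made-up folder ID and email for illustration.
url, body = build_share_request(
    "1AbC_def-123", "rag-reader@example-project.iam.gserviceaccount.com"
)
print("POST", url)
print(body)
```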
To configure a connector to your Google Drive instance:
Click Source Connectors from the main menu.
Click New Source Connector from the Source Connectors page.
Select the Google Drive card.
Enter a name for the integration and your service account's JSON key, then click Create Google Drive Integration.
The Google Drive connector has two parts. The first is authorization with your service account. This part is reusable across pipelines: it lets you connect to the same service account in different pipelines without providing the credentials each time.
The second part is the configuration that's specific to your RAG Pipeline, such as which files and directories should be processed.
The following table outlines the fields available when configuring a Google Drive source for use within a Retrieval-Augmented Generation (RAG) pipeline.
| Field | Description | Required |
| --- | --- | --- |
| File Extensions | Specifies the types of files to be included (e.g., PDF, HTML, Markdown, Text, DOCX). | Yes |
| Root Folder IDs | Specifies the root folder ID(s) to pull data from. These folders must be shared with the service account. | No |
| Polling Interval | Interval (in seconds) at which the connector will check Google Drive for updates. | No |
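The three fields above can be modeled as a small validated structure. This is an illustrative sketch only: the class and attribute names are hypothetical and are not the connector's actual configuration schema. It also assumes that a folder ID is the last path segment of a folder's Drive URL (`https://drive.google.com/drive/folders/<ID>`), so that pasted URLs can be reduced to IDs.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Matches the ID segment of a Drive folder URL.
FOLDER_URL = re.compile(r"/folders/([A-Za-z0-9_-]+)")


def folder_id_from_url(value: str) -> str:
    """Return the folder ID from a Drive folder URL, or the value itself if it is already an ID."""
    match = FOLDER_URL.search(value)
    return match.group(1) if match else value.strip()


@dataclass
class DriveSourceConfig:
    """Hypothetical model of the fields in the table above."""
    file_extensions: List[str]                              # required
    root_folder_ids: List[str] = field(default_factory=list)  # optional
    polling_interval_seconds: Optional[int] = None          # optional, in seconds

    def __post_init__(self) -> None:
        if not self.file_extensions:
            raise ValueError("at least one file extension is required")
        # Accept either raw IDs or full folder URLs.
        self.root_folder_ids = [folder_id_from_url(v) for v in self.root_folder_ids]
        if self.polling_interval_seconds is not None and self.polling_interval_seconds <= 0:
            raise ValueError("polling interval must be a positive number of seconds")


# Made-up values for illustration.
config = DriveSourceConfig(
    file_extensions=["pdf", "docx"],
    root_folder_ids=["https://drive.google.com/drive/folders/1AbC_def-123?usp=sharing"],
    polling_interval_seconds=3600,
)
print(config.root_folder_ids)
```

Normalizing URLs to IDs up front keeps the stored configuration uniform regardless of how the value was pasted.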
If you haven't yet built a connector to your vector database, go to Configuring Vector Database Connectors and select the platform you prefer to use for storing output vectors.
OR
If you're ready to start producing vector embeddings from your input data, head to Pipeline Basics. Select your new connector as the data source to use it in your pipeline.