Gmail Source Connector

The Gmail Source Connector lets you use Gmail emails as a data source for your pipelines. It retrieves emails from your Gmail account based on filters such as sender, recipient, subject, date range, and label, which makes it well suited to processing customer support emails, newsletters, or other email-based content for your RAG applications.

Configure the connector

To configure a connector to your Gmail account:

  1. Click Source Connectors from the main menu.
  2. Click New Source Connector from the Source Connectors page.
  3. Select the Gmail card.

  4. Enter a name for your integration.
  5. Click Authorize to grant Vectorize access to your Gmail account through OAuth2.

The connector uses OAuth2 authentication with refresh tokens to securely access your Gmail data. You'll be redirected to Google's consent screen to grant the necessary permissions.

Configuring the Gmail Connector in a RAG Pipeline

When configuring the Gmail connector in a pipeline, you can specify the following options:

Email Filtering Options

Address Filters

  • From Address Filter: Filter emails by sender addresses. You can specify multiple email addresses and choose whether to use AND or OR logic.
  • To Address Filter: Filter emails by recipient addresses with AND/OR logic support.
  • CC Address Filter: Filter emails by CC recipients with AND/OR logic support.
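To make the AND/OR choice concrete, here is a minimal Python sketch of how one address filter could be written in Gmail's search syntax. The function name address_clause and its parameters are hypothetical, and this page does not describe how the connector builds its queries internally, so treat it as an illustration of the semantics rather than the connector's implementation.

```python
from typing import List


def address_clause(field: str, addresses: List[str], logic: str = "OR") -> str:
    """Render one address filter as a Gmail search clause.

    field is a Gmail search operator such as "from", "to", or "cc".
    logic is "AND" or "OR" (a hypothetical parameter mirroring the UI choice).
    """
    if logic not in ("AND", "OR"):
        raise ValueError("logic must be 'AND' or 'OR'")
    terms = [f"{field}:{addr}" for addr in addresses]
    if len(terms) <= 1:
        # Zero or one address: no combining logic is needed.
        return "".join(terms)
    if logic == "OR":
        # In Gmail search, braces group terms with OR semantics.
        return "{" + " ".join(terms) + "}"
    # In Gmail search, space-separated terms must all match (AND).
    return " ".join(terms)


print(address_clause("from", ["support@company.com", "help@company.com"], "OR"))
```

With OR logic a message matches if any listed address appears in the field; with AND logic every listed address must appear, which is mostly useful for the To and CC fields, where a message can have several recipients.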

Content Filters

  • Subject Filter: Filter emails by keywords in the subject line. Supports AND/OR logic for multiple keywords.
  • Include Attachments: Choose whether to include email attachments in the processed content (default: false).

Date Range Filters

  • Start Date: Only include emails sent after this date (exclusive). Format: YYYY-MM-DD (e.g., 2024-01-01).
  • End Date: Only include emails sent before this date (exclusive). Format: YYYY-MM-DD (e.g., 2024-01-31).
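Because both bounds are exclusive, an email sent exactly on the Start Date or End Date is not included. The following Python sketch illustrates that rule; the helper name in_date_range is hypothetical and the check is a local illustration, not the connector's code.

```python
from datetime import date
from typing import Optional


def in_date_range(sent: date, start: Optional[str] = None, end: Optional[str] = None) -> bool:
    """Return True if sent falls strictly between start and end (YYYY-MM-DD)."""
    if start is not None and sent <= date.fromisoformat(start):
        return False  # on or before the Start Date: excluded
    if end is not None and sent >= date.fromisoformat(end):
        return False  # on or after the End Date: excluded
    return True
```

For example, with a Start Date of 2024-01-01 and an End Date of 2024-01-31, an email sent on 2024-01-02 is included, while emails sent on 2024-01-01 or 2024-01-31 are not.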

Result Limits

  • Maximum Results: Specify the maximum number of email threads to retrieve. Enter -1 for all available emails, or specify a limit (default: -1).
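The -1 convention can be pictured with a short Python sketch. The function name limit_results is hypothetical, and rejecting other negative values is an assumption made for this illustration; this page does not say how the connector treats them.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def limit_results(threads: Iterable[T], max_results: int = -1) -> Iterator[T]:
    """Yield at most max_results items, or everything when max_results is -1."""
    if max_results == -1:
        return iter(threads)  # no limit
    if max_results < 0:
        # Assumption for this sketch: only -1 is a valid negative value.
        raise ValueError("max_results must be -1 or a non-negative integer")
    return islice(threads, max_results)
```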

Messages to Fetch

Select which categories of messages to include in the import:

  • All Mail: Covers everything in your Gmail account
  • Inbox: Messages in your inbox
  • Sent: Messages you've sent
  • Archived: Archived messages (see Archived Behavior below)
  • Spam / Trash: Messages in Spam or Trash folders
  • Unread Only: Only unread messages
  • Starred: Starred messages
  • Important: Messages marked as important

Archived Behavior

The "Archived" option does not select only archived emails. It pulls from the "All Mail" section using the query in:all -in:inbox, so enabling it includes every email that is not currently in the inbox.
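A small Python sketch of this behavior, using Gmail's INBOX label to stand in for the in:inbox part of the query. The function name and the sample messages are hypothetical and exist only to illustrate which messages the option would select.

```python
from typing import Set


def selected_by_archived(labels: Set[str]) -> bool:
    """Mirror the query in:all -in:inbox: select anything without the INBOX label."""
    return "INBOX" not in labels


# Hypothetical sample messages, keyed by a short description.
messages = {
    "newsletter still in inbox": {"INBOX", "UNREAD"},
    "receipt removed from inbox": {"IMPORTANT"},
    "reply you sent": {"SENT"},
}

selected = [name for name, labels in messages.items() if selected_by_archived(labels)]
```

Here the sent reply is selected along with the receipt, even though only the receipt is what most people would call archived.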

Gmail Label Filtering

You can filter emails by Gmail labels using the Label Filters option. This supports both built-in Gmail labels and custom labels you've created.

Built-in Gmail Label Examples

Gmail provides several built-in labels that you can use for filtering:

Category Labels:

  • CATEGORY_PROMOTIONS - Promotional emails (the "Promotions" tab)
  • CATEGORY_SOCIAL - Social network notifications
  • CATEGORY_UPDATES - Updates and notifications
  • CATEGORY_FORUMS - Forum and discussion emails
  • CATEGORY_PERSONAL - Personal emails

Standard Labels:

  • INBOX - Messages in inbox
  • IMPORTANT - Important messages
  • STARRED - Starred messages
  • UNREAD - Unread messages
  • SENT - Sent messages
  • DRAFT - Draft messages

Example Usage:

INBOX, IMPORTANT, CATEGORY_SOCIAL

You can combine multiple labels and choose whether to use AND or OR logic for the filtering. For example, specifying only CATEGORY_PROMOTIONS includes just promotional emails, while INBOX, IMPORTANT with AND logic includes only emails that are both in the inbox and marked as important. With OR logic, the same two labels would include emails that are in the inbox, marked as important, or both.
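The difference between AND and OR logic for labels comes down to two set operations, sketched below in Python. The function name matches_labels is hypothetical; this illustrates the semantics described above, not the connector's implementation.

```python
from typing import Iterable


def matches_labels(message_labels: Iterable[str], wanted: Iterable[str], logic: str = "OR") -> bool:
    """Check a message's labels against a label filter."""
    have = set(message_labels)
    want = set(wanted)
    if logic == "AND":
        # Every wanted label must be present on the message.
        return want <= have
    if logic == "OR":
        # At least one wanted label must be present on the message.
        return not want.isdisjoint(have)
    raise ValueError("logic must be 'AND' or 'OR'")
```

For a message labeled INBOX and UNREAD, the filter INBOX, IMPORTANT matches under OR logic but not under AND logic, because the message is not marked as important.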

Configuration Example

When setting up a Gmail connector for customer support emails, you would typically:

  1. Set the From Address Filter to specific support email addresses (e.g., support@company.com, help@company.com) with OR logic
  2. Add Subject Filter keywords like "urgent" or "priority" to focus on important emails
  3. Under Messages to Fetch, select categories such as "Inbox" and "Important" to capture relevant emails
  4. Configure Label Filters to include emails with specific Gmail labels
  5. Set a Start Date and End Date to limit the time period of emails to process
  6. Enable Include Attachments if you need to process attached documents

This approach captures the customer support communications you care about while filtering out unrelated email.
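The steps above can be summarized as a settings dictionary. This is only a sketch in Python: the key names are hypothetical and are not the connector's actual configuration schema, and the label and date values are placeholders taken from the format examples earlier on this page.

```python
from datetime import date

# Hypothetical key names; the real options are set in the Vectorize UI.
gmail_source_config = {
    "from_addresses": {
        "values": ["support@company.com", "help@company.com"],
        "logic": "OR",
    },
    "subject_keywords": {"values": ["urgent", "priority"], "logic": "OR"},
    "messages_to_fetch": ["Inbox", "Important"],
    "label_filters": {"values": ["IMPORTANT"], "logic": "OR"},  # placeholder label
    "start_date": "2024-01-01",  # placeholder, exclusive
    "end_date": "2024-01-31",    # placeholder, exclusive
    "include_attachments": True,
    "max_results": -1,  # -1 means no limit
}

# Sanity check: both dates use YYYY-MM-DD and the start precedes the end.
start = date.fromisoformat(gmail_source_config["start_date"])
end = date.fromisoformat(gmail_source_config["end_date"])
dates_are_ordered = start < end
```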

What's next?

  • If you haven't yet built a connector to your vector database, go to Configuring Vector Database Connectors and select the platform you prefer to use for storing output vectors.

    OR

  • If you're ready to start producing vector embeddings from your input data, head to Pipeline Basics. Select your new connector as the data source to use it in your pipeline.