
GitHub Source Connector

The GitHub Source Connector lets you use GitHub issues and pull requests as a data source for your pipelines. The connector does not ingest the code in your repositories; it retrieves only repository metadata, issues, and pull requests. This guide explains the configuration options available when setting up a GitHub connector.

Configure the connector

To configure a connector to your GitHub account:

  1. Click Source Connectors from the main menu.

  2. Click New Source Connector from the Source Connectors page.

  3. Select the GitHub card.

  4. Enter a name for your integration, then enter your GitHub Personal Access Token. This token will be used to authenticate with GitHub and access the specified repositories.
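Before pasting a token into the form, it can help to confirm that GitHub accepts it. The sketch below is one way to do that with the public GitHub REST API and the Python standard library. It is not part of the connector: the helper names build_auth_headers and whoami are hypothetical and chosen for illustration only.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_auth_headers(token: str) -> dict[str, str]:
    """Build the headers GitHub expects for a token-authenticated REST call."""
    token = token.strip()
    if not token:
        raise ValueError("token must not be empty")
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }


def whoami(token: str) -> str:
    """Return the login of the account that owns the token.

    Performs a network call; an invalid token raises urllib.error.HTTPError.
    """
    request = urllib.request.Request(
        f"{GITHUB_API}/user", headers=build_auth_headers(token)
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["login"]
```

Calling whoami with your token should return your GitHub login if the token is valid. A 401 error usually means the token is wrong, expired, or revoked.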

Configuring the GitHub Connector in a RAG Pipeline

When configuring the GitHub connector in a pipeline, you can specify the following options:

Basic Configuration

  • Repositories: Specify the repositories to include in the format owner/repo (e.g., vectorize-io/docs). This is required.
  • Max Items: The maximum number of items to fetch from GitHub (default: 1000).
  • Created After: Optionally filter to only include items created after a specific date (format: YYYY-MM-DD).
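If you prepare these values in a script before entering them, a small validation step can catch formatting mistakes early. The following is a minimal sketch, not the connector's own schema: the class BasicConfig, its field names, and the rule that Max Items must be positive are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Assumed shape of an owner/repo string: two non-empty segments, one slash.
REPO_PATTERN = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_date(value: str) -> date:
    """Parse a strict YYYY-MM-DD string, rejecting other spellings."""
    if not DATE_PATTERN.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return datetime.strptime(value, "%Y-%m-%d").date()


@dataclass
class BasicConfig:
    """Hypothetical container mirroring the Basic Configuration options."""

    repositories: list[str]
    max_items: int = 1000  # default stated in this guide
    created_after: Optional[str] = None

    def validate(self) -> None:
        if not self.repositories:
            raise ValueError("at least one repository is required")
        for repo in self.repositories:
            if not REPO_PATTERN.fullmatch(repo):
                raise ValueError(f"expected owner/repo, got {repo!r}")
        if self.max_items < 1:  # assumption: zero or negative is not useful
            raise ValueError("max_items must be positive")
        if self.created_after is not None:
            parse_date(self.created_after)
```

For example, BasicConfig(["vectorize-io/docs"], created_after="2024-01-15").validate() passes silently, while a repository written as just "docs" raises a ValueError.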

Pull Request Configuration

  • Include Pull Requests: Whether to include pull requests from the repositories (enabled by default).
  • Pull Request Status: Filter pull requests by status: all, open, closed, or merged (default: all).
  • Pull Request Labels: Optionally filter pull requests by specific labels (e.g., "feature", "bug").

Issue Configuration

  • Include Issues: Whether to include issues from the repositories (enabled by default).
  • Issue Status: Filter issues by status: all, open, or closed (default: all).
  • Issue Labels: Optionally filter issues by specific labels (e.g., "enhancement", "documentation").
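To make the status and label filters concrete, here is a rough sketch of how they might combine for a single item. It rests on assumptions this guide does not confirm: that an item passes the label filter when it carries at least one of the listed labels, and that "closed" also matches merged pull requests. The Item class and the matches function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Item:
    """Hypothetical, simplified view of a GitHub issue or pull request."""

    kind: str  # "issue" or "pull_request"
    state: str  # "open" or "closed"
    merged: bool = False  # only meaningful for pull requests
    labels: set[str] = field(default_factory=set)


def matches(item: Item, status: str = "all",
            labels: Optional[Iterable[str]] = None) -> bool:
    """Return True if the item passes both the status and the label filter."""
    if status == "all":
        status_ok = True
    elif status == "merged":
        # Only pull requests can be merged.
        status_ok = item.kind == "pull_request" and item.merged
    elif status in ("open", "closed"):
        # Assumption: a merged pull request also counts as closed.
        status_ok = item.state == status
    else:
        raise ValueError(f"unknown status: {status!r}")

    wanted = set(labels or ())
    # Assumption: no labels means no label filtering; otherwise any overlap passes.
    labels_ok = not wanted or bool(item.labels & wanted)
    return status_ok and labels_ok
```

Under these assumptions, a merged pull request labeled "bug" passes matches(item, status="merged", labels=["bug", "feature"]), while an open issue never passes a "merged" filter.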

What's next?

  • If you haven't yet built a connector to your vector database, go to Configuring Vector Database Connectors and select the platform you prefer to use for storing output vectors.

    OR

  • If you're ready to start producing vector embeddings from your input data, head to Pipeline Basics. Select your new connector as the data source to use it in your pipeline.
