
Configuring Neo4j Graph Database Connector

The Neo4j Graph Database Connector (Beta) lets you use Neo4j as a destination for your pipeline data. The connector stores your vectorized content in Neo4j, where it can be queried through the standard retrieval endpoint or through graph-enhanced retrieval.

Prerequisites

Before configuring Neo4j in Vectorize, you'll need:

  1. Neo4j Instance: Either Neo4j Aura (cloud) or a self-hosted Neo4j database
  2. Connection Details:
    • Connection URI (e.g., neo4j+s://xxxxxxxx.databases.neo4j.io for Aura or neo4j://localhost:7687 for a local instance)
    • Username (typically neo4j for new instances)
    • Password

Configuring the Integration

  1. From the main menu, click on Vector Databases
  2. Click New Vector Database Integration
  3. Select the Neo4j card (Beta)
  4. Enter your configuration details

Authentication Configuration

Provide the following connection details:

  1. Integration Name: Enter a descriptive name for this Neo4j integration
  2. Connection URI: Your Neo4j instance connection string
    • For Neo4j Aura: neo4j+s://xxxxxxxx.databases.neo4j.io
    • For local development: neo4j://localhost:7687
    • For self-hosted: neo4j://your-server:7687
  3. Username: Your Neo4j username (default is usually neo4j)
  4. Password: Your Neo4j instance password
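Before pasting a connection URI into the form, it can help to check its scheme and port locally. The following is a minimal sketch, not part of Vectorize: the helper name describe_neo4j_uri is hypothetical, the scheme list is the set accepted by the official Neo4j drivers, and whether Vectorize accepts every one of them is an assumption.

```python
from urllib.parse import urlsplit

# URI schemes accepted by the official Neo4j drivers.
ENCRYPTED_SCHEMES = {"neo4j+s", "neo4j+ssc", "bolt+s", "bolt+ssc"}
PLAIN_SCHEMES = {"neo4j", "bolt"}
DEFAULT_BOLT_PORT = 7687


def describe_neo4j_uri(uri: str) -> dict:
    """Split a Neo4j connection URI into scheme, host, port, and TLS flag.

    Hypothetical helper for local checks; it does not contact the server.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in ENCRYPTED_SCHEMES | PLAIN_SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("connection URI has no host")
    return {
        "scheme": scheme,
        "host": parts.hostname,
        # Fall back to the default Bolt port when the URI omits one.
        "port": parts.port or DEFAULT_BOLT_PORT,
        "encrypted": scheme in ENCRYPTED_SCHEMES,
    }
```

For the Aura example above, the helper reports an encrypted connection on port 7687; for neo4j://localhost:7687 it reports an unencrypted one.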

Pipeline Configuration

When using Neo4j in a pipeline, you'll configure:

Label Name: A unique label for this pipeline's data nodes

  • This label organizes your data in the graph
  • Each pipeline should use a distinct label to separate its data
  • If the label doesn't exist, it will be created automatically
  • Examples: DocumentChunks, ProductDocs, CustomerSupport
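To confirm that a pipeline has written nodes under its label, you can run a count query in Neo4j Browser or through a driver. The sketch below only builds the Cypher text; the function name count_nodes_query is hypothetical, and it assumes nothing about the node properties the connector writes.

```python
def count_nodes_query(label: str) -> str:
    """Build a Cypher query that counts the nodes carrying `label`."""
    if not label:
        raise ValueError("label must not be empty")
    # Ordinary Cypher parameters cannot stand in for a label, so the label is
    # written into the query text, quoted with backticks. A backtick inside
    # the name is escaped by doubling it.
    escaped = label.replace("`", "``")
    return f"MATCH (n:`{escaped}`) RETURN count(n) AS nodes"


print(count_nodes_query("DocumentChunks"))
```

A count of zero after a pipeline run suggests the label in the query does not match the one configured for the pipeline.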

Using Neo4j with Vectorize

Neo4j functions as a destination connector where:

  • Your pipeline data is stored in the Neo4j graph database
  • Documents are organized using the label you specify
  • The connector supports the standard Vectorize retrieval endpoint

To retrieve data from your Neo4j-backed pipeline, use the standard retrieval endpoint:

curl --location 'https://client.app.vectorize.io/api/gateways/service/{org-id}/{pipeline-id}/retrieve' \
--header 'Content-Type: application/json' \
--header 'Authorization: <token>' \
--data '{
"question": "What are the key features?",
"numResults": 5,
"rerank": true
}'
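The same request can be prepared from Python using only the standard library. This is a sketch that mirrors the curl call above; the function name build_retrieval_request is hypothetical, and the response format is not assumed here, so sending the request is left as a comment.

```python
import json
import urllib.request

BASE_URL = "https://client.app.vectorize.io/api/gateways/service"


def build_retrieval_request(org_id, pipeline_id, token, question,
                            num_results=5, rerank=True):
    """Prepare (but do not send) a POST to the retrieval endpoint."""
    payload = {
        "question": question,
        "numResults": num_results,
        "rerank": rerank,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{org_id}/{pipeline_id}/retrieve",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Same header value as in the curl example.
            "Authorization": token,
        },
        method="POST",
    )


# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```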

Graph-Enhanced Retrieval

Neo4j's graph database capabilities enable advanced retrieval features that go beyond traditional vector search. By leveraging graph relationships between your documents, you can retrieve more contextually relevant information.

To use graph-enhanced retrieval, add the graph-search parameter to your retrieval request. Setting it to true enables graph search with default settings:

curl --location 'https://client.app.vectorize.io/api/gateways/service/{org-id}/{pipeline-id}/retrieve' \
--header 'Content-Type: application/json' \
--header 'Authorization: <token>' \
--data '{
"question": "What are the key features?",
"numResults": 5,
"graph-search": true
}'

Advanced Graph Search Configuration

Customize graph traversal behavior with additional parameters:

curl --location 'https://client.app.vectorize.io/api/gateways/service/{org-id}/{pipeline-id}/retrieve' \
--header 'Content-Type: application/json' \
--header 'Authorization: <token>' \
--data '{
"question": "What are the key features?",
"numResults": 5,
"graph-search": {
"max_hops": 2,
"graph_limit": 15
}
}'

Graph Search Parameters

Parameter    | Type           | Default | Description
graph-search | boolean/object | false   | Enable graph-enhanced retrieval. Use true for defaults or an object to customize
max_hops     | integer        | 1       | Maximum number of relationship hops to traverse from initial vector results
graph_limit  | integer        | 10      | Maximum number of related documents to retrieve through graph traversal
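The two forms of graph-search (true, or an object with overrides) can be produced by one small helper. This is a sketch with a hypothetical function name; the check that both values are positive integers is an assumption, since the valid ranges are not stated here.

```python
from typing import Optional


def build_graph_search_body(question: str, num_results: int = 5,
                            max_hops: Optional[int] = None,
                            graph_limit: Optional[int] = None) -> dict:
    """Return a retrieval request body with graph search enabled."""
    options = {}
    for name, value in (("max_hops", max_hops), ("graph_limit", graph_limit)):
        if value is None:
            continue  # leave the default in place
        # Assumption: only positive integers make sense for these settings.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer")
        options[name] = value
    # With no overrides, send `true` to request the default settings;
    # otherwise send the object holding the overrides.
    return {
        "question": question,
        "numResults": num_results,
        "graph-search": options or True,
    }
```

Calling the helper with no overrides yields "graph-search": true; passing max_hops=2 and graph_limit=15 reproduces the customized body shown earlier.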

How Graph Search Works

  1. Initial Vector Search: The system first performs a standard vector similarity search to find the most semantically relevant documents based on your query
  2. Graph Traversal: Starting from the initial results, the system traverses relationships in the graph database (up to max_hops levels deep) to discover connected documents
  3. Enhanced Results: Both the initial vector search results and the graph-connected documents are combined, providing a richer set of contextually related content
  4. Ranking: Results maintain relevance scoring based on both vector similarity and graph proximity
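The steps above can be illustrated with a toy in-memory model. This is not the Vectorize implementation: the document ids and the adjacency mapping are invented for illustration, the vector search is represented by a ready-made list of hits, and ranking is reduced to "fewer hops first".

```python
from collections import deque


def graph_enhanced_search(seeds, edges, max_hops=1, graph_limit=10):
    """Expand vector-search hits through graph relationships.

    seeds: document ids returned by the vector search, best first.
    edges: adjacency mapping {doc_id: [related_doc_id, ...]}.
    Returns (doc_id, hops) pairs; hops is 0 for the vector hits.
    """
    results = [(doc, 0) for doc in seeds]
    seen = set(seeds)
    queue = deque(results)
    related = 0  # documents added through traversal so far
    while queue and related < graph_limit:
        doc, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not traverse beyond the hop budget
        for neighbor in edges.get(doc, []):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            results.append((neighbor, hops + 1))
            queue.append((neighbor, hops + 1))
            related += 1
            if related == graph_limit:
                break
    return results


# Hypothetical relationships between documents.
edges = {
    "auth-config": ["security-best-practices"],
    "security-best-practices": ["setup-guide", "api-reference"],
}
print(graph_enhanced_search(["auth-config"], edges, max_hops=2, graph_limit=20))
```

Because the traversal is breadth-first, documents closer to the initial hits are added first, so a small graph_limit cuts off the most distant documents rather than the nearest ones.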

Graph-enhanced retrieval is particularly powerful when:

  • Documents Have Relationships: Your content contains explicit connections like citations, references, or hierarchical structures
  • Context Is Critical: Understanding requires knowledge of related documents (e.g., product features that depend on other features)
  • Entity Relationships Matter: Queries involve entities that are connected (e.g., people working on projects, components in a system)
  • Multi-Hop Reasoning Needed: Answers require connecting information across multiple related documents

Example: Technical Documentation

Consider a technical documentation knowledge base where:

  • API documentation references related endpoints
  • Features link to their dependencies
  • Code examples cite prerequisite concepts
  • Troubleshooting guides reference configuration settings

With graph search:

{
"question": "How do I configure authentication?",
"numResults": 5,
"graph-search": {
"max_hops": 2,
"graph_limit": 20
}
}

The retrieval process will:

  1. Find authentication configuration documents via vector search
  2. Traverse relationships to discover related security best practices documents (1 hop away)
  3. Include linked prerequisite setup guides and related API documentation (2 hops away)
  4. Return a comprehensive result set covering the full context needed to configure authentication

Important Limitations

  • Advanced Query Filters: The advanced-query parameter is not supported when using Neo4j. Use metadata-filters for filtering instead.
  • Performance Considerations: Graph traversal can be resource-intensive, especially with higher max_hops values. Start with lower values and increase as needed.
  • Graph Structure Required: Graph search effectiveness depends on having meaningful relationships between your documents in the graph database.
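One way to follow the "start low, increase as needed" advice is to raise max_hops one step at a time and stop once enough documents come back. The sketch below is an assumption-heavy illustration: the function name is hypothetical, retrieve is a caller-supplied function that sends a request body and returns a list of documents, and the response format of the real endpoint is not assumed.

```python
def retrieve_with_escalating_hops(retrieve, question, min_results,
                                  num_results=5, hop_ceiling=3):
    """Raise max_hops step by step until enough documents are returned.

    `retrieve` is supplied by the caller: it takes a request body (dict)
    and returns a list of documents.
    Returns (max_hops_used, documents).
    """
    if hop_ceiling < 1:
        raise ValueError("hop_ceiling must be at least 1")
    documents = []
    for hops in range(1, hop_ceiling + 1):
        body = {
            "question": question,
            "numResults": num_results,
            "graph-search": {"max_hops": hops},
        }
        documents = retrieve(body)
        if len(documents) >= min_results:
            return hops, documents
    # Ceiling reached: return whatever the widest traversal produced.
    return hop_ceiling, documents
```

Each step issues a full retrieval request, so a low hop_ceiling keeps the worst case cheap.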

Best Practices

Label Naming

  • Use descriptive, domain-specific labels
  • Follow Neo4j naming conventions (PascalCase)
  • Separate different data types with different labels
  • Examples: TechnicalDocs, CustomerTickets, ProductSpecs
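A label can be checked against the PascalCase convention before it is entered in the pipeline configuration. This is a small sketch with hypothetical helper names; the pattern (an uppercase first letter followed by letters and digits only) is a simple approximation of PascalCase, not a rule enforced by Neo4j or Vectorize.

```python
import re

# Approximation of PascalCase: uppercase first letter, then letters/digits.
PASCAL_CASE = re.compile(r"[A-Z][A-Za-z0-9]*")


def is_pascal_case_label(label: str) -> bool:
    """Return True if `label` looks like a PascalCase Neo4j label."""
    return PASCAL_CASE.fullmatch(label) is not None


def to_pascal_case_label(text: str) -> str:
    """Turn free text such as 'customer tickets' into 'CustomerTickets'."""
    words = [word for word in re.split(r"[^A-Za-z0-9]+", text) if word]
    return "".join(word.capitalize() for word in words)
```

For example, to_pascal_case_label("product-specs") gives ProductSpecs, which then passes is_pascal_case_label.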

Data Organization

  • Use separate pipelines (and labels) for different data domains
  • Each pipeline's data will be isolated under its own label
  • Consider your future graph structure needs when organizing data

Troubleshooting

Connection Issues

  • Connection refused: Verify your Neo4j instance is running and accessible
  • Authentication failed: Check the username and password, and ensure the user has appropriate permissions
  • SSL/TLS errors: For Neo4j Aura, use the neo4j+s:// scheme; for a local instance without TLS, use neo4j://
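A quick way to narrow down a "connection refused" error is to test whether the Bolt port accepts TCP connections at all. The sketch below uses only the standard library; the function name is hypothetical. It checks reachability from the machine where it runs, which is not the same as reachability from Vectorize, and it does not test credentials or TLS.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_BOLT_PORT = 7687


def bolt_port_reachable(uri: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URI's host and port succeeds."""
    parts = urlsplit(uri)
    if not parts.hostname:
        raise ValueError("connection URI has no host")
    address = (parts.hostname, parts.port or DEFAULT_BOLT_PORT)
    try:
        # Open and immediately close a plain TCP connection.
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

If this returns False for a self-hosted instance, check that the database is running and that firewalls allow inbound traffic on the Bolt port.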

Connector Status

  • Check the Vector Databases page in the Vectorize dashboard to verify your connector status
  • If the connector shows an error, review your connection settings
  • Ensure your Neo4j instance allows connections from Vectorize IP addresses

What's Next?

For Neo4j-specific documentation and the Cypher query language, visit the Neo4j Documentation.
