Configuring Neo4j Graph Database Connector

The Neo4j Graph Database Connector (Beta) lets you use Neo4j as a destination for your pipeline data. The connector stores your vectorized content in Neo4j, where relationships between documents can also be used for graph-enhanced retrieval.

Prerequisites

Before configuring Neo4j in Vectorize, you'll need:

Neo4j Instance: Either Neo4j Aura (cloud) or a self-hosted Neo4j database
Connection Details:
- Connection URI (e.g., neo4j+s://xxxxxxxx.databases.neo4j.io for Neo4j Aura, or neo4j://localhost:7687 for a local instance)
- Username (typically neo4j for new instances)
- Password

Configuring the Integration

From the main menu, click Vector Databases
Click New Vector Database Integration
Select the Neo4j (Beta) card
Enter your configuration details

Authentication Configuration

Provide the following connection details:

Integration Name: Enter a descriptive name for this Neo4j integration
Connection URI: Your Neo4j instance connection string
- For Neo4j Aura: neo4j+s://xxxxxxxx.databases.neo4j.io
- For local development: neo4j://localhost:7687
- For self-hosted: neo4j://your-server:7687
Username: Your Neo4j username (default is usually neo4j)
Password: Your Neo4j instance password
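
Before saving the integration, it can help to sanity-check the connection URI. The sketch below uses only the Python standard library; the helper name describe_connection_uri is hypothetical and is not part of Vectorize or the Neo4j driver. It assumes the standard Neo4j driver URI schemes and the default Bolt port 7687 when no port is given.

```python
from urllib.parse import urlparse

# URI schemes accepted by Neo4j drivers; the "+s" and "+ssc" variants use TLS.
NEO4J_SCHEMES = {"neo4j", "neo4j+s", "neo4j+ssc", "bolt", "bolt+s", "bolt+ssc"}
DEFAULT_BOLT_PORT = 7687


def describe_connection_uri(uri: str) -> dict:
    """Split a Neo4j connection URI into the parts worth double-checking."""
    parsed = urlparse(uri)
    if parsed.scheme not in NEO4J_SCHEMES:
        raise ValueError(f"unsupported scheme {parsed.scheme!r} in {uri!r}")
    if not parsed.hostname:
        raise ValueError(f"no host found in {uri!r}")
    return {
        "scheme": parsed.scheme,
        "host": parsed.hostname,
        # Fall back to the default Bolt port when the URI omits one.
        "port": parsed.port or DEFAULT_BOLT_PORT,
        "encrypted": parsed.scheme.endswith(("+s", "+ssc")),
    }
```

For a Neo4j Aura URI the "encrypted" field should come back as True; if it is False for an Aura instance, the URI is probably missing the +s suffix.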

Pipeline Configuration

When using Neo4j in a pipeline, you'll configure:

Label Name: A unique label for this pipeline's data nodes

This label organizes your data in the graph
Each pipeline should use a distinct label to separate its data
If the label doesn't exist, it will be created automatically
Example: DocumentChunks, ProductDocs, CustomerSupport

Using Neo4j with Vectorize

Neo4j functions as a destination connector where:

Your pipeline data is stored in the Neo4j graph database
Documents are organized using the label you specify
The connector supports the standard Vectorize retrieval endpoint

To retrieve data from your Neo4j-backed pipeline, use the standard retrieval endpoint:

curl --location 'https://client.app.vectorize.io/api/gateways/service/{org-id}/{pipeline-id}/retrieve' \
--header 'Content-Type: application/json' \
--header 'Authorization: <token>' \
--data '{
   "question": "What are the key features?",
   "numResults": 5,
   "rerank": true
}'
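
The same request can be built from Python. This is a minimal sketch using only the standard library; the function names build_retrieve_request and send are hypothetical, and the sketch assumes the endpoint responds with a JSON body. The URL, headers, and body fields mirror the curl command above.

```python
import json
import urllib.request

BASE_URL = "https://client.app.vectorize.io/api/gateways/service"


def build_retrieve_request(org_id, pipeline_id, token, question,
                           num_results=5, rerank=True):
    """Build (but do not send) the POST request for the retrieval endpoint."""
    body = {"question": question, "numResults": num_results, "rerank": rerank}
    return urllib.request.Request(
        url=f"{BASE_URL}/{org_id}/{pipeline_id}/retrieve",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": token},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and parse the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

To run a query, pass your own organization ID, pipeline ID, and token to build_retrieve_request and hand the result to send.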

Graph-Enhanced Retrieval

Neo4j's graph database capabilities enable advanced retrieval features that go beyond traditional vector search. By leveraging graph relationships between your documents, you can retrieve more contextually relevant information.

Enabling Graph Search

To use graph-enhanced retrieval, add the graph-search parameter to your retrieval request. Set it to true to use the default settings, or to an object to customize traversal.

Basic Graph Search

Enable graph search with default settings:

curl --location 'https://client.app.vectorize.io/api/gateways/service/{org-id}/{pipeline-id}/retrieve' \
--header 'Content-Type: application/json' \
--header 'Authorization: <token>' \
--data '{
   "question": "What are the key features?",
   "numResults": 5,
   "graph-search": true
}'

Advanced Graph Search Configuration

Customize graph traversal behavior with additional parameters:

curl --location 'https://client.app.vectorize.io/api/gateways/service/{org-id}/{pipeline-id}/retrieve' \
--header 'Content-Type: application/json' \
--header 'Authorization: <token>' \
--data '{
   "question": "What are the key features?",
   "numResults": 5,
   "graph-search": {
      "max_hops": 2,
      "graph_limit": 15
   }
}'

Graph Search Parameters

Parameter	Type	Default	Description
`graph-search`	boolean/object	false	Enables graph-enhanced retrieval. Use `true` for the default settings, or an object containing `max_hops` and/or `graph_limit` to customize
`max_hops`	integer	1	Maximum number of relationship hops to traverse from the initial vector results. Set inside the `graph-search` object
`graph_limit`	integer	10	Maximum number of related documents to retrieve through graph traversal. Set inside the `graph-search` object
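
The two forms of graph-search (boolean or object) can be produced by one small helper. This is an illustrative sketch: graph_search_option is a hypothetical name, the positive-integer check is an assumption rather than a documented server rule, and sending only one of the two settings is assumed (not confirmed) to be accepted.

```python
def graph_search_option(max_hops=None, graph_limit=None):
    """Return the value for the "graph-search" key of a retrieval request.

    With no arguments this returns True (use the default settings); otherwise
    it returns an object containing only the settings that were given.
    """
    options = {}
    for name, value in (("max_hops", max_hops), ("graph_limit", graph_limit)):
        if value is None:
            continue
        # Assumption: both settings must be positive whole numbers.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
        options[name] = value
    return options or True


# Same body as the advanced curl example.
payload = {
    "question": "What are the key features?",
    "numResults": 5,
    "graph-search": graph_search_option(max_hops=2, graph_limit=15),
}
```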

How Graph Search Works

Initial Vector Search: The system first performs a standard vector similarity search to find the most semantically relevant documents based on your query
Graph Traversal: Starting from the initial results, the system traverses relationships in the graph database (up to max_hops levels deep) to discover connected documents
Enhanced Results: Both the initial vector search results and the graph-connected documents are combined, providing a richer set of contextually related content
Ranking: Results maintain relevance scoring based on both vector similarity and graph proximity
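
The steps above can be illustrated with a toy, in-memory version. This is not Vectorize's implementation: it is a plain breadth-first traversal over a dictionary of relationships, with hypothetical document IDs, that shows how max_hops and graph_limit bound the expansion. It orders results by hop distance only and does not model relevance scoring.

```python
from collections import deque


def graph_enhanced_search(vector_hits, neighbors, max_hops=1, graph_limit=10):
    """Combine vector hits with documents reachable within max_hops.

    vector_hits: document IDs from the similarity search, best first.
    neighbors:   dict mapping a document ID to the IDs it is related to.
    Returns (doc_id, hops) pairs; hops is 0 for direct vector hits.
    """
    results = [(doc_id, 0) for doc_id in vector_hits]
    seen = set(vector_hits)
    queue = deque(results)
    related = 0  # documents added through traversal so far
    while queue and related < graph_limit:
        doc_id, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop budget
        for other in neighbors.get(doc_id, []):
            if other in seen:
                continue
            seen.add(other)
            results.append((other, hops + 1))
            queue.append((other, hops + 1))
            related += 1
            if related == graph_limit:
                break
    return results
```

Raising max_hops widens the set of documents that can be reached, while graph_limit caps how many of them are added to the initial vector results.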

Use Cases for Graph Search

Graph-enhanced retrieval is particularly powerful when:

Documents Have Relationships: Your content contains explicit connections like citations, references, or hierarchical structures
Context Is Critical: Understanding requires knowledge of related documents (e.g., product features that depend on other features)
Entity Relationships Matter: Queries involve entities that are connected (e.g., people working on projects, components in a system)
Multi-Hop Reasoning Needed: Answers require connecting information across multiple related documents

Example: Technical Documentation

Consider a technical documentation knowledge base where:

API documentation references related endpoints
Features link to their dependencies
Code examples cite prerequisite concepts
Troubleshooting guides reference configuration settings

With graph search:

{
  "question": "How do I configure authentication?",
  "numResults": 5,
  "graph-search": {
    "max_hops": 2,
    "graph_limit": 20
  }
}

The retrieval process will:

Find authentication configuration documents via vector search
Traverse relationships to discover related security best practices documents (1 hop away)
Include linked prerequisite setup guides and related API documentation (2 hops away)
Return a comprehensive result set covering the full context needed to configure authentication

Important Limitations

Advanced Query Filters: The advanced-query parameter is not supported when using Neo4j. Use metadata-filters for filtering instead.
Performance Considerations: Graph traversal can be resource-intensive, especially with higher max_hops values. Start with lower values and increase as needed.
Graph Structure Required: Graph search effectiveness depends on having meaningful relationships between your documents in the graph database.
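
A client-side check can catch the first two limitations before a request is sent. This is a hypothetical helper, not part of any Vectorize SDK, and the warning threshold of 3 hops is an arbitrary assumption chosen for illustration.

```python
def check_neo4j_payload(payload, hop_warning_threshold=3):
    """Return a list of warnings; raise if the payload uses advanced-query."""
    if "advanced-query" in payload:
        raise ValueError(
            "advanced-query is not supported with Neo4j; "
            "use metadata-filters instead"
        )
    warnings = []
    graph_search = payload.get("graph-search")
    if isinstance(graph_search, dict):
        # max_hops defaults to 1 when it is not set explicitly.
        hops = graph_search.get("max_hops", 1)
        if hops >= hop_warning_threshold:
            warnings.append(f"max_hops={hops} may be slow; try a lower value first")
    return warnings
```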

Best Practices

Label Naming

Use descriptive, domain-specific labels
Follow the Neo4j naming convention for node labels (PascalCase)
Separate different data types with different labels
Examples: TechnicalDocs, CustomerTickets, ProductSpecs
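
If label names are generated from free-form text (for example, a team or project name), a small helper can keep them in PascalCase. The sketch below is illustrative only: both function names are hypothetical, and it assumes labels are restricted to ASCII letters and digits, which is stricter than what Neo4j itself allows.

```python
import re

# Assumption: a label is a single PascalCase word of ASCII letters and digits.
LABEL_PATTERN = re.compile(r"[A-Z][A-Za-z0-9]*")


def is_pascal_case_label(label: str) -> bool:
    """Return True if the label starts with an uppercase letter and has no separators."""
    return LABEL_PATTERN.fullmatch(label) is not None


def to_pascal_case_label(text: str) -> str:
    """Turn free-form text such as "customer support" into "CustomerSupport"."""
    words = [word for word in re.split(r"[^A-Za-z0-9]+", text) if word]
    return "".join(word[0].upper() + word[1:] for word in words)
```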

Data Organization

Use separate pipelines (and labels) for different data domains
Each pipeline's data will be isolated under its own label
Consider your future graph structure needs when organizing data

Troubleshooting

Connection Issues

Connection refused: Verify your Neo4j instance is running and accessible
Authentication failed: Check the username and password, and ensure the user has appropriate permissions
SSL/TLS errors: For Neo4j Aura, use the neo4j+s:// scheme; for a local instance without TLS, use neo4j://
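
For a "connection refused" error, a quick TCP check can confirm whether the Neo4j port is reachable at all. This is a minimal sketch with a hypothetical helper name; it assumes the default Bolt port 7687 used in the examples above and only tests that a TCP connection can be opened, not authentication or TLS. Run it from a machine on the same network path as the client you are debugging.

```python
import socket


def can_reach(host, port=7687, timeout=3.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened.

    The connect argument exists so the check can be exercised without a
    network; by default it is socket.create_connection.
    """
    try:
        connection = connect((host, port), timeout)
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
    connection.close()
    return True
```

A False result points to networking (instance stopped, firewall, wrong host or port) rather than credentials.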

Connector Status

Check the Vector Databases page in the Vectorize dashboard to verify your connector status
If the connector shows an error, review your connection settings
Ensure your Neo4j instance allows connections from Vectorize IP addresses

What's Next?

Using the Retrieval Endpoint - Learn about standard retrieval features
Advanced Retrieval - Explore query rewriting and reranking
Understanding Metadata - Learn how metadata becomes graph properties

For Neo4j-specific documentation and Cypher query language, visit the Neo4j Documentation.
