Configuring Neo4j Graph Database Connector
The Neo4j Graph Database Connector (Beta) enables you to integrate Neo4j as a destination for your pipeline data. This connector stores your vectorized content in Neo4j and provides the basis for the graph-enhanced retrieval capabilities covered in the Graph-Enhanced Retrieval section.
Prerequisites
Before configuring Neo4j in Vectorize, you'll need:
- Neo4j Instance: Either Neo4j Aura (cloud) or self-hosted Neo4j database
- Connection Details:
  - Connection URI (e.g., neo4j://your-instance.neo4j.io or neo4j://localhost:7687)
  - Username (typically neo4j for new instances)
  - Password
Configuring the Integration
- From the main menu, click on Vector Databases
- Click New Vector Database Integration
- Select the Neo4j card (Beta)
- Enter your configuration details
Authentication Configuration
Provide the following connection details:
- Integration Name: Enter a descriptive name for this Neo4j integration
- Connection URI: Your Neo4j instance connection string
  - For Neo4j Aura: neo4j+s://xxxxxxxx.databases.neo4j.io
  - For local development: neo4j://localhost:7687
  - For self-hosted: neo4j://your-server:7687
- Username: Your Neo4j username (default is usually neo4j)
- Password: Your Neo4j instance password
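Before pasting a connection URI into the form, it can help to check that it uses a scheme the Neo4j drivers accept. The following is a minimal sketch using only the Python standard library; the function name describe_neo4j_uri is hypothetical, and the check is a local sanity test, not something Vectorize itself is known to run.

```python
from urllib.parse import urlsplit

# URI schemes accepted by the official Neo4j drivers.
# "+s" means TLS with full certificate verification;
# "+ssc" means TLS that also accepts self-signed certificates.
ENCRYPTED_SCHEMES = {"neo4j+s", "neo4j+ssc", "bolt+s", "bolt+ssc"}
PLAIN_SCHEMES = {"neo4j", "bolt"}
DEFAULT_PORT = 7687  # default Bolt port


def describe_neo4j_uri(uri: str) -> dict:
    """Split a Neo4j connection URI into scheme, host, port, and encryption flag."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in ENCRYPTED_SCHEMES | PLAIN_SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("connection URI has no host")
    return {
        "scheme": scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORT,
        "encrypted": scheme in ENCRYPTED_SCHEMES,
    }
```

For example, describe_neo4j_uri("neo4j+s://xxxxxxxx.databases.neo4j.io") reports an encrypted connection on port 7687, while an http:// address (such as the Neo4j Browser URL) raises ValueError.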
Pipeline Configuration
When using Neo4j in a pipeline, you'll configure:
Label Name: A unique label for this pipeline's data nodes
- This label organizes your data in the graph
- Each pipeline should use a distinct label to separate its data
- If the label doesn't exist, it will be created automatically
- Examples: DocumentChunks, ProductDocs, CustomerSupport
Using Neo4j with Vectorize
Neo4j functions as a destination connector where:
- Your pipeline data is stored in the Neo4j graph database
- Documents are organized using the label you specify
- The connector supports the standard Vectorize retrieval endpoint
To retrieve data from your Neo4j-backed pipeline, use the standard retrieval endpoint:
curl --location 'https://client.app.vectorize.io/api/gateways/service/{org-id}/{pipeline-id}/retrieve' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: <token>' \
  --data '{
    "question": "What are the key features?",
    "numResults": 5,
    "rerank": true
  }'
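The same request can be issued from Python. This is a minimal sketch using only the standard library; the URL, headers, and body fields mirror the curl command above, while the function names and the assumption that the response body is JSON are illustrative. The response schema is not described in this guide, so the sketch returns the decoded body as-is.

```python
import json
import urllib.request

BASE_URL = "https://client.app.vectorize.io/api/gateways/service"


def build_retrieval_request(org_id, pipeline_id, token, question,
                            num_results=5, rerank=True):
    """Build (but do not send) a POST request for the retrieval endpoint."""
    payload = {"question": question, "numResults": num_results, "rerank": rerank}
    return urllib.request.Request(
        url=f"{BASE_URL}/{org_id}/{pipeline_id}/retrieve",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": token},
        method="POST",
    )


def retrieve(request, timeout=30):
    """Send the request and decode the response body (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Typical usage would be retrieve(build_retrieval_request(org_id, pipeline_id, token, "What are the key features?")), with your own organization ID, pipeline ID, and access token.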
Graph-Enhanced Retrieval
Neo4j's graph database capabilities enable advanced retrieval features that go beyond traditional vector search. By leveraging graph relationships between your documents, you can retrieve more contextually relevant information.
Enabling Graph Search
To use graph-enhanced retrieval, add the graph-search parameter to your retrieval request, either as a boolean or as an object with custom settings.
Basic Graph Search
Enable graph search with default settings:
curl --location 'https://client.app.vectorize.io/api/gateways/service/{org-id}/{pipeline-id}/retrieve' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: <token>' \
  --data '{
    "question": "What are the key features?",
    "numResults": 5,
    "graph-search": true
  }'
Advanced Graph Search Configuration
Customize graph traversal behavior with additional parameters:
curl --location 'https://client.app.vectorize.io/api/gateways/service/{org-id}/{pipeline-id}/retrieve' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: <token>' \
  --data '{
    "question": "What are the key features?",
    "numResults": 5,
    "graph-search": {
      "max_hops": 2,
      "graph_limit": 15
    }
  }'
Graph Search Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| graph-search | boolean/object | false | Enable graph-enhanced retrieval. Use true for defaults or an object to customize |
| max_hops | integer | 1 | Maximum number of relationship hops to traverse from initial vector results |
| graph_limit | integer | 10 | Maximum number of related documents to retrieve through graph traversal |
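If you build request bodies in code, a small helper can produce either form of the graph-search value from the table above. This is a sketch: the helper names are hypothetical, and the rule that both settings must be positive integers is an assumption made here, since this guide does not state the allowed ranges.

```python
def graph_search_option(max_hops=None, graph_limit=None):
    """Return the value for the "graph-search" field.

    With no arguments this returns True (service defaults); otherwise it
    returns an object containing only the settings that were provided.
    """
    options = {}
    for name, value in (("max_hops", max_hops), ("graph_limit", graph_limit)):
        if value is None:
            continue
        # Assumption: only positive integers make sense for these settings.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
        options[name] = value
    return options or True


def build_graph_payload(question, num_results=5, **graph_options):
    """Assemble a retrieval request body with graph search enabled."""
    return {
        "question": question,
        "numResults": num_results,
        "graph-search": graph_search_option(**graph_options),
    }
```

Calling build_graph_payload("What are the key features?") yields the basic form with "graph-search": true, and passing max_hops=2, graph_limit=15 yields the advanced form.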
How Graph Search Works
- Initial Vector Search: The system first performs a standard vector similarity search to find the most semantically relevant documents based on your query
- Graph Traversal: Starting from the initial results, the system traverses relationships in the graph database (up to max_hops levels deep) to discover connected documents
- Enhanced Results: Both the initial vector search results and the graph-connected documents are combined, providing a richer set of contextually related content
- Ranking: Results maintain relevance scoring based on both vector similarity and graph proximity
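The steps above can be illustrated with a toy, in-memory model. This sketch is not Vectorize's implementation: it assumes the vector search has already produced a list of seed document IDs, represents relationships as a plain dictionary followed in one direction, and reports hop counts instead of computing a combined relevance score.

```python
from collections import deque


def graph_enhanced_results(seeds, edges, max_hops=1, graph_limit=10):
    """Expand vector-search hits with neighbours found by breadth-first traversal.

    seeds: unique document IDs returned by the vector search, best match first.
    edges: dict mapping a document ID to the IDs it is related to.
    Returns (doc_id, hops) pairs: seeds at 0 hops, then related documents.
    """
    hops = {doc_id: 0 for doc_id in seeds}
    queue = deque(seeds)
    related = []
    while queue and len(related) < graph_limit:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not traverse past the hop budget
        for neighbour in edges.get(current, ()):
            if neighbour in hops:
                continue  # already a seed or already discovered
            hops[neighbour] = hops[current] + 1
            related.append(neighbour)
            queue.append(neighbour)
            if len(related) == graph_limit:
                break
    return [(d, 0) for d in seeds] + [(d, hops[d]) for d in related]
```

Because the traversal is breadth-first, documents one hop away are always collected before documents two hops away, so a small graph_limit favors the closest neighbours.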
Use Cases for Graph Search
Graph-enhanced retrieval is particularly powerful when:
- Documents Have Relationships: Your content contains explicit connections like citations, references, or hierarchical structures
- Context Is Critical: Understanding requires knowledge of related documents (e.g., product features that depend on other features)
- Entity Relationships Matter: Queries involve entities that are connected (e.g., people working on projects, components in a system)
- Multi-Hop Reasoning Needed: Answers require connecting information across multiple related documents
Example: Technical Documentation
Consider a technical documentation knowledge base where:
- API documentation references related endpoints
- Features link to their dependencies
- Code examples cite prerequisite concepts
- Troubleshooting guides reference configuration settings
With graph search:
{
  "question": "How do I configure authentication?",
  "numResults": 5,
  "graph-search": {
    "max_hops": 2,
    "graph_limit": 20
  }
}
The retrieval process will:
- Find authentication configuration documents via vector search
- Traverse relationships to discover related security best practices documents (1 hop away)
- Include linked prerequisite setup guides and related API documentation (2 hops away)
- Return a comprehensive result set covering the full context needed to configure authentication
Important Limitations
- Advanced Query Filters: The advanced-query parameter is not supported when using Neo4j. Use metadata-filters for filtering instead.
- Performance Considerations: Graph traversal can be resource-intensive, especially with higher max_hops values. Start with lower values and increase as needed.
- Graph Structure Required: Graph search effectiveness depends on having meaningful relationships between your documents in the graph database.
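One way to follow the "start low, increase as needed" advice is to raise max_hops only when a cheaper request falls short. The sketch below is illustrative: retrieve and good_enough are caller-supplied functions (hypothetical here), and the assumption that retrieve returns a list of results is not taken from this guide.

```python
def retrieve_with_escalating_hops(retrieve, question, good_enough,
                                  hop_ceiling=3, graph_limit=10, num_results=5):
    """Try max_hops = 1, 2, ... and stop at the first acceptable result set.

    retrieve: function taking a request payload (dict) and returning results.
    good_enough: function taking those results and returning True or False.
    Returns (max_hops_used, results) from the last attempt made.
    """
    if hop_ceiling < 1:
        raise ValueError("hop_ceiling must be at least 1")
    for max_hops in range(1, hop_ceiling + 1):
        payload = {
            "question": question,
            "numResults": num_results,
            "graph-search": {"max_hops": max_hops, "graph_limit": graph_limit},
        }
        results = retrieve(payload)
        if good_enough(results):
            break  # stop before paying for a deeper traversal
    return max_hops, results
```

The hop_ceiling argument bounds the cost: if no attempt satisfies good_enough, the function returns the results of the deepest traversal it was allowed to make.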
Best Practices
Label Naming
- Use descriptive, domain-specific labels
- Follow Neo4j naming conventions (PascalCase)
- Separate different data types with different labels
- Examples: TechnicalDocs, CustomerTickets, ProductSpecs
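If label names are generated from pipeline or dataset names, a small helper can keep them in PascalCase. This is a sketch with hypothetical function names; it applies a loose check (a leading uppercase letter followed by letters or digits) rather than Neo4j's full identifier rules.

```python
import re

PASCAL_CASE = re.compile(r"[A-Z][A-Za-z0-9]*")


def is_pascal_case(label: str) -> bool:
    """Loose check: starts with an uppercase letter, then letters or digits only."""
    return PASCAL_CASE.fullmatch(label) is not None


def to_pascal_case(name: str) -> str:
    """Convert free-form text such as "customer tickets" to "CustomerTickets"."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    label = "".join(word[0].upper() + word[1:] for word in words)
    if not label or not label[0].isalpha():
        raise ValueError(f"cannot derive a label from {name!r}")
    return label
```

Existing PascalCase names pass through unchanged, so the helper is safe to apply to labels that already follow the convention.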
Data Organization
- Use separate pipelines (and labels) for different data domains
- Each pipeline's data will be isolated under its own label
- Consider your future graph structure needs when organizing data
Troubleshooting
Connection Issues
- Connection refused: Verify your Neo4j instance is running and accessible
- Authentication failed: Check the username and password, and ensure the user has appropriate permissions
- SSL/TLS errors: For Neo4j Aura, use the neo4j+s:// scheme; for a local instance, use neo4j://
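To tell these cases apart outside of Vectorize, you can test the same credentials from your own machine. The sketch below assumes the official neo4j Python driver is installed (pip install neo4j); the function name and the returned messages are illustrative. Note that a successful check from your machine does not prove that Vectorize can reach the instance, since network rules may differ.

```python
def check_connection(uri, username, password):
    """Return a short diagnosis string for the given Neo4j credentials."""
    try:
        # Imported lazily so the function can report a missing driver.
        from neo4j import GraphDatabase
        from neo4j.exceptions import AuthError, ServiceUnavailable
    except ImportError:
        return "neo4j driver not installed"
    try:
        with GraphDatabase.driver(uri, auth=(username, password)) as driver:
            driver.verify_connectivity()
    except AuthError:
        return "authentication failed: check the username and password"
    except ServiceUnavailable:
        return "unreachable: check that the instance is running and the URI scheme is correct"
    return "ok"
```

For example, check_connection("neo4j://localhost:7687", "neo4j", "<password>") returns "ok" when a local instance accepts the credentials.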
Connector Status
- Check the Vector Databases page in the Vectorize dashboard to verify your connector status
- If the connector shows an error, review your connection settings
- Ensure your Neo4j instance allows connections from Vectorize IP addresses
What's Next?
- Using the Retrieval Endpoint - Learn about standard retrieval features
- Advanced Retrieval - Explore query rewriting and reranking
- Understanding Metadata - Learn how metadata becomes graph properties
For Neo4j-specific documentation and Cypher query language, visit the Neo4j Documentation.