
Advanced Query for Built-in Database and Elasticsearch

Overview

The Advanced Query feature provides enhanced search capabilities for the built-in database and Elasticsearch. It supports three distinct search modes: pure text search, pure vector search, and hybrid search that combines both. This allows more flexible and powerful querying than the standard vector-only search.

The Advanced Query feature supports a subset of the Elasticsearch query DSL.

warning

This feature is only available for the built-in database and Elasticsearch. Attempting to use it with other databases will result in an error.

Query Structure

The advanced query is specified using the advanced-query field in your request:

{
"question": "Example question",
"numResults": 5,
"advanced-query": {
"mode": "text|vector|hybrid", // optional, defaults to "vector" if not specified
"text-fields": ["field1", "field2"],
"match-type": "match|match_phrase|multi_match",
"text-boost": 1.0,
"filters": {
// Filters go here, can be a simple object or a complex bool query
}
}
}
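As a rough client-side sketch, the request body above can be assembled in Python. The function name build_request and its validation rules are assumptions made for illustration and are not part of the product's API; only the JSON keys come from the structure shown above.

```python
import json

# Allowed values taken from the request structure documented above.
VALID_MODES = {"text", "vector", "hybrid"}
VALID_MATCH_TYPES = {"match", "match_phrase", "multi_match"}


def build_request(question, num_results=5, mode="vector", text_fields=None,
                  match_type=None, text_boost=None, filters=None):
    """Assemble a request body containing an advanced-query object.

    Hypothetical helper: it only builds a dict and does not call any service.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if match_type is not None and match_type not in VALID_MATCH_TYPES:
        raise ValueError(f"unknown match-type: {match_type!r}")

    advanced = {"mode": mode}
    # Include optional keys only when the caller set them,
    # so that the documented defaults apply otherwise.
    if text_fields is not None:
        advanced["text-fields"] = list(text_fields)
    if match_type is not None:
        advanced["match-type"] = match_type
    if text_boost is not None:
        advanced["text-boost"] = float(text_boost)
    if filters is not None:
        advanced["filters"] = filters

    return {
        "question": question,
        "numResults": num_results,
        "advanced-query": advanced,
    }


body = build_request(
    "Example question",
    mode="text",
    text_fields=["text", "title"],
    match_type="match",
    filters={"term": {"origin": "file-upload"}},
)
print(json.dumps(body, indent=2))
```

Rejecting unknown mode and match-type values before sending is a design choice of this sketch; how the service itself responds to invalid values is not described here.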

Filter Examples

Filters narrow down search results based on specific criteria. A filter can be a simple term filter, a range filter, or a complex boolean query combining multiple conditions. When a filter is applied, only documents that match the specified criteria are returned. Filters work with any of the search modes and are specified in the filters field of the advanced-query object. Here are some examples:

Basic Term Filter

{
"filters": {
"term": {
"origin": "file-upload"
}
}
}

Multiple Terms Filter

{
"filters": {
"terms": {
"filename": ["report.pdf", "summary.docx"]
}
}
}

Range Filter

{
"filters": {
"range": {
"document_metadata.price": {
"gte": 100,
"lte": 500
}
}
}
}
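If you build requests programmatically, the three filter shapes above can be produced by small helpers. This is a minimal sketch; the function names term_filter, terms_filter, and range_filter are hypothetical and simply return dicts in the shapes shown above.

```python
def term_filter(field, value):
    """Exact match on a single value."""
    return {"term": {field: value}}


def terms_filter(field, values):
    """Match any one of several values."""
    return {"terms": {field: list(values)}}


def range_filter(field, gte=None, lte=None):
    """Inclusive range; at least one bound must be given."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    if not bounds:
        raise ValueError("range filter needs at least one bound")
    return {"range": {field: bounds}}


# Rebuild the three examples above and wrap one in the filters field.
origin = term_filter("origin", "file-upload")
files = terms_filter("filename", ["report.pdf", "summary.docx"])
price = range_filter("document_metadata.price", gte=100, lte=500)
advanced_query = {"filters": price}
```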

Nested Filters with AND Logic

You can combine multiple filters using a boolean query. A bool query accepts the following clauses:

  • must: All conditions must be true
  • should: At least one condition must be true
  • must_not: Conditions that must not be true

The following example uses must so that every condition has to be met:

{
"filters": {
"bool": {
"must": [
{"term": {"document_metadata.status": "published"}},
{"range": {"date": {"gte": "2023-01-01"}}}
]
}
}
}

Nested Filters with OR Logic

{
"filters": {
"bool": {
"should": [
{"term": {"chunk_metadata.category": "urgent"}},
{"range": {"chunk_metadata.priority": {"gte": 8}}}
],
"minimum_should_match": 1
}
}
}
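To make the must, should, and must_not semantics concrete, here is a small local evaluator that checks a single document (a plain dict) against a filter. It is an illustration only and does not reflect how the database executes filters. The names get_field and matches are hypothetical, and the default for minimum_should_match (1 when there are should clauses and no must clauses, otherwise 0) follows standard Elasticsearch behavior, which is assumed to apply here.

```python
def get_field(doc, dotted):
    """Resolve a dotted path such as 'chunk_metadata.priority' in nested dicts."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches(doc, flt):
    """Return True if doc satisfies the filter (term, terms, range, or bool)."""
    kind, body = next(iter(flt.items()))
    if kind == "term":
        field, value = next(iter(body.items()))
        return get_field(doc, field) == value
    if kind == "terms":
        field, values = next(iter(body.items()))
        actual = get_field(doc, field)
        actual_values = actual if isinstance(actual, list) else [actual]
        return any(v in values for v in actual_values)
    if kind == "range":
        field, bounds = next(iter(body.items()))
        actual = get_field(doc, field)
        if actual is None:
            return False
        # ISO dates stored as strings compare correctly as plain strings.
        if "gte" in bounds and actual < bounds["gte"]:
            return False
        if "lte" in bounds and actual > bounds["lte"]:
            return False
        return True
    if kind == "bool":
        must = body.get("must", [])
        should = body.get("should", [])
        must_not = body.get("must_not", [])
        if not all(matches(doc, f) for f in must):
            return False
        if any(matches(doc, f) for f in must_not):
            return False
        default_min = 1 if should and not must else 0
        needed = body.get("minimum_should_match", default_min)
        return sum(matches(doc, f) for f in should) >= needed
    raise ValueError(f"unsupported filter type: {kind}")


or_filter = {
    "bool": {
        "should": [
            {"term": {"chunk_metadata.category": "urgent"}},
            {"range": {"chunk_metadata.priority": {"gte": 8}}},
        ],
        "minimum_should_match": 1,
    }
}
high_priority = {"chunk_metadata": {"category": "normal", "priority": 9}}
low_priority = {"chunk_metadata": {"category": "normal", "priority": 3}}
print(matches(high_priority, or_filter), matches(low_priority, or_filter))
```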

Search Modes

1. Vector Mode ("mode": "vector")

Performs semantic similarity search using embeddings (similar to the default behavior).

When to use: When you want to find semantically similar documents regardless of exact keyword matches.

Optional fields:

  • filters: Additional filters to apply

Example:

{
"question": "environmental sustainability initiatives",
"numResults": 10,
"advanced-query": {
"mode": "vector", // optional field, defaults to "vector" if not specified
"filters": {
"range": {
"document_metadata.publish_date": {
"gte": "2023-01-01",
"lte": "2023-12-31"
}
}
}
}
}

2. Hybrid Mode ("mode": "hybrid")

Combines both text and vector search, allowing you to leverage both keyword matching and semantic similarity.

When to use: When you want the benefits of both exact keyword matching and semantic understanding.

Required fields:

  • question: The text to search for
  • embeddings: Vector representation of your query
  • mode: Set to "hybrid"

Optional fields:

  • text-fields: Specifies which document fields to search in. This is an array of field names where the database will look for your search terms.

    • Example: ["text", "title", "summary"] would search across all three fields
    • The fields must exist in your indexed documents
    • Default: ["text"] (searches only in the main text field)
  • match-type: Controls how the text is matched. Options include:

    • "match" (default): Analyzes your query and matches documents containing ANY of the words. For example, searching "API authentication" would find documents with "API" OR "authentication" OR both.
    • "match_phrase": Searches for the exact phrase in the exact order. Searching "API authentication" would only find documents with those words appearing together in that order.
    • "multi_match": Optimized for searching across multiple fields with different relevance scoring. Best used when you specify multiple fields in text-fields.
    • Default: "match"
  • text-boost: Multiplier for text search scores. This allows you to prioritize text matches over vector similarity. Higher values (e.g., 2.0 or 3.0) will give more weight to text matches, while lower values (e.g., 0.5 or 1.0) will favor vector similarity.

    • Default: 1.0
    • Example: Setting text-boost to 2.0 will double the score of text matches compared to vector matches.
  • filters: Additional filters to apply to the search. This can include term filters, range filters, or complex boolean queries.
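The effect of text-boost on ranking can be pictured with a toy model. The exact formula used to combine text and vector scores is not documented here, so the sketch below assumes a simple weighted sum (text_boost * text_score + vector_score). The function combined_score, the candidate names, and the score values are all made up for illustration.

```python
def combined_score(text_score, vector_score, text_boost=1.0):
    """Toy combination: ASSUMED weighted sum, not the documented formula."""
    return text_boost * text_score + vector_score


# Two invented candidates: one strong on keywords, one strong semantically.
candidates = {
    "keyword_heavy": {"text": 0.9, "vector": 0.3},
    "semantic_heavy": {"text": 0.2, "vector": 1.2},
}


def rank(text_boost):
    """Order candidate names from best to worst under the toy model."""
    return sorted(
        candidates,
        key=lambda name: combined_score(
            candidates[name]["text"], candidates[name]["vector"], text_boost
        ),
        reverse=True,
    )


for boost in (0.5, 1.0, 2.0):
    print(boost, rank(boost))
```

Under this assumed model, the semantically strong candidate ranks first at boosts of 0.5 and 1.0, and the keyword-heavy candidate overtakes it at 2.0. That mirrors the guidance above, but treat the numbers as an intuition aid and tune text-boost against your own results.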

note

The similarity score for vector matches will be set to 0, and the score will be returned in the match_score field.

Example:

{
"question": "How do I implement OAuth 2.0 authentication flow in a React application?",
"numResults": 5,
"advanced-query": {
"mode": "hybrid",
"match-type": "match",
"text-boost": 1.5,
"filters": {
"term": {
"document_metadata.category": "tutorial"
}
}
}
}

3. Text Mode ("mode": "text")

Performs traditional text-based search without using embeddings.

When to use: When you want to find documents based on keyword matching rather than semantic similarity. The search will be performed on the value in the question field.

Required fields:

  • mode: Set to "text"

Optional fields:

  • text-fields: Specifies which document fields to search in. This is an array of field names where the database will look for your search terms.

    • Example: ["text", "title", "summary"] would search across all three fields
    • The fields must exist in your indexed documents
    • Default: ["text"] (searches only in the main text field)
  • match-type: Controls how the text is matched. Options include:

    • "match" (default): Analyzes your query and matches documents containing ANY of the words. For example, searching "API authentication" would find documents with "API" OR "authentication" OR both.
    • "match_phrase": Searches for the exact phrase in the exact order. Searching "API authentication" would only find documents with those words appearing together in that order.
    • "multi_match": Optimized for searching across multiple fields with different relevance scoring. Best used when you specify multiple fields in text-fields.
    • Default: "match"
  • filters: Additional filters to apply to the search. This can include term filters, range filters, or complex boolean queries.
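The difference between "match" and "match_phrase" described above can be sketched locally. This is a simplified illustration with a naive tokenizer (lowercasing and splitting on non-alphanumeric characters). The real analyzer, scoring, and the multi_match behavior are not modeled, and the function names are hypothetical.

```python
import re


def tokenize(text):
    """Naive analyzer: lowercase, then split on anything non-alphanumeric."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(query, text):
    """True if the text contains ANY of the query words."""
    doc_tokens = set(tokenize(text))
    return any(token in doc_tokens for token in tokenize(query))


def match_phrase(query, text):
    """True if the query words appear together, in order, in the text."""
    query_tokens = tokenize(query)
    doc_tokens = tokenize(text)
    if not query_tokens:
        return False
    width = len(query_tokens)
    return any(
        doc_tokens[i:i + width] == query_tokens
        for i in range(len(doc_tokens) - width + 1)
    )


query = "API authentication"
loose = "Authentication is required for every request"
exact = "See the API authentication guide"

print(match(query, loose), match_phrase(query, loose))
print(match(query, exact), match_phrase(query, exact))
```

The first document contains only one of the two words, so it satisfies match but not match_phrase. The second contains both words adjacent and in order, so it satisfies both.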

note

The similarity score for vector matches will be set to 0, and the score will be returned in the match_score field.

Example:

{
"question": "When should I use text search instead of vector search?",
"numResults": 5,
"advanced-query": {
"mode": "text",
"filters": {
"term": {
"document_metadata.title": "Api Documentation"
}
}
}
}

Complete Examples

Example 1: Finding Recent Technical Documentation

{
"question": "API authentication OAuth2",
"numResults": 10,
"advanced-query": {
"mode": "text",
"text-fields": ["text", "title", "tags"],
"match-type": "match",
"filters": {
"bool": {
"must": [
{"term": {"document_type": "documentation"}},
{"term": {"category": "technical"}},
{"range": {"last_updated": {"gte": "2023-06-01"}}}
]
}
}
}
}

Example 2: Semantic Search with Metadata Filters

{
"question": "sustainable energy solutions",
"embeddings": [0.234, -0.123, 0.456, ...],
"numResults": 15,
"advanced-query": {
"mode": "vector",
"filters": {
"bool": {
"must": [
{"terms": {"tags": ["renewable", "green", "sustainable"]}},
{"term": {"status": "published"}}
],
"must_not": [
{"term": {"archived": true}}
]
}
}
}
}

Example 3: Hybrid Search for Customer Support

{
"question": "refund policy damaged items",
"embeddings": [0.345, -0.234, 0.567, ...],
"numResults": 5,
"advanced-query": {
"mode": "hybrid",
"text-fields": ["text", "faq_question", "faq_answer"],
"match-type": "match_phrase",
"text-boost": 3.0,
"filters": {
"bool": {
"must": [
{"term": {"content_type": "support"}},
{"terms": {"category": ["refunds", "returns", "policies"]}}
],
"should": [
{"term": {"priority": "high"}}
]
}
}
}
}

Example 4: Searching Within Date Ranges

{
"question": "quarterly earnings report",
"numResults": 10,
"advanced-query": {
"mode": "text",
"text-fields": ["text", "title", "executive_summary"],
"match-type": "match",
"filters": {
"bool": {
"must": [
{"term": {"document_type": "financial_report"}},
{
"range": {
"report_date": {
"gte": "2023-01-01",
"lte": "2023-12-31"
}
}
}
]
}
}
}
}

Best Practices

  1. Choose the right mode:

    • Use text mode for exact keyword searches
    • Use vector mode for semantic/conceptual searches
    • Use hybrid mode when you need both
  2. Text field selection:

    • Include fields that are most likely to contain relevant keywords
    • Order fields by importance when using multi_match
  3. Text boost tuning:

    • Start with default (1.0) and adjust based on results
    • Higher values (2.0-5.0) when exact matches are crucial
    • Lower values (0.5-1.0) when semantic similarity is more important
  4. Filter optimization:

    • Use specific filters to reduce the search space
    • Combine filters logically using bool queries
    • Test filter performance with your data
  5. Match type selection:

    • match: Best for general searches
    • match_phrase: Best for finding specific phrases or quotes
    • multi_match: Best when searching across different field types

Error Handling

This feature is only supported for the built-in database and Elasticsearch. If you attempt to use advanced-query with a different database, you will receive the following error:

InvalidRecordError: advanced-query is not supported for [Database Name]. Use our built in database for this feature.

info

  • String fields in filters automatically receive the .keyword suffix for exact matching.
  • The existing metadata-filters field continues to work alongside advanced-query filters.
  • Pipeline ID filtering is automatically applied to all queries.
  • All existing Elasticsearch query DSL features are supported in the filters section.
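The .keyword behavior mentioned above means you write plain field names in your filters and the suffix is added for you. The sketch below shows what such a rewrite could look like for term, terms, and bool filters. It is a guess at the general idea, not the service's actual implementation: the function name add_keyword_suffix is hypothetical, and which filter types are rewritten (for example, whether range filters are touched) is an assumption.

```python
def add_keyword_suffix(flt):
    """Return a copy of a filter with '.keyword' appended to string-valued fields.

    Illustrative only; range and other filter types are returned unchanged.
    """
    kind, body = next(iter(flt.items()))
    if kind == "term":
        field, value = next(iter(body.items()))
        if isinstance(value, str) and not field.endswith(".keyword"):
            field += ".keyword"
        return {"term": {field: value}}
    if kind == "terms":
        field, values = next(iter(body.items()))
        all_strings = bool(values) and all(isinstance(v, str) for v in values)
        if all_strings and not field.endswith(".keyword"):
            field += ".keyword"
        return {"terms": {field: list(values)}}
    if kind == "bool":
        rewritten = {}
        for key, value in body.items():
            if key in ("must", "should", "must_not", "filter"):
                rewritten[key] = [add_keyword_suffix(f) for f in value]
            else:
                rewritten[key] = value  # e.g. minimum_should_match
        return {"bool": rewritten}
    return flt


user_filter = {
    "bool": {
        "must": [
            {"term": {"status": "published"}},
            {"range": {"last_updated": {"gte": "2023-06-01"}}},
        ],
        "must_not": [{"term": {"archived": True}}],
    }
}
print(add_keyword_suffix(user_filter))
```

In this sketch the string-valued status field becomes status.keyword, while the boolean archived field and the range clause are left as written.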
