Understanding Retrieval Performance Metrics

The Retrieval Performance dashboard provides real-time monitoring and analysis of your retrieval system's effectiveness. This guide explains the key metrics and features available in the dashboard.

Why We Measure Top 2 Chunks

Our metrics focus on the top 2 chunks because this gives a reliable signal for system health while avoiding common measurement pitfalls:

  • Measuring too many chunks can dilute your metrics: low relevance in lower-ranked chunks pulls the score down without necessarily indicating a system issue.
  • RAG systems usually form their answers primarily from the most relevant pieces of information. High relevance in the top 2 chunks means the LLM is more likely to generate accurate and focused responses.

Restricting measurement to the top 2 chunks keeps the metrics clean and actionable.
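
The dilution effect can be illustrated with a small sketch. It assumes the relevance of a query is the mean of its highest-scoring chunks; this page does not state the exact aggregation, so `top_k_relevance` and the sample scores below are hypothetical and only meant to show why a larger window lowers the number without the system being any worse.

```python
from statistics import mean


def top_k_relevance(scores, k=2):
    """Mean relevance of the k highest-scoring chunks.

    Assumption: a plain mean is used as the aggregation; the dashboard's
    actual formula is not documented here.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    return mean(ranked[:k])


# Hypothetical per-chunk relevance scores for a single query.
chunk_scores = [0.9, 0.8, 0.2, 0.1, 0.0]

top2 = top_k_relevance(chunk_scores, k=2)  # only the chunks the LLM leans on
top5 = top_k_relevance(chunk_scores, k=5)  # diluted by low-ranked chunks

print(f"top-2 relevance: {top2:.2f}")
print(f"top-5 relevance: {top5:.2f}")
```

With these sample scores the top-2 value stays high while the top-5 value is dragged down by chunks that the LLM is unlikely to rely on.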

Core Metrics

The dashboard displays three primary metrics to help you evaluate retrieval performance:

Non-Rewritten Relevance

  • Measures the relevance of the top 2 chunks for queries in their original form
  • Ranges from 0 to 1, where higher values indicate better relevance
  • Color-coded for quick assessment:
    • Green (≥ 0.7): High relevance
    • Yellow (≥ 0.3 and < 0.7): Moderate relevance
    • Red (< 0.3): Low relevance
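
If you want to apply the same bands to scores in your own tooling, the thresholds listed above can be sketched as follows. The function name `relevance_color` is hypothetical and not part of any dashboard API; only the 0.7 and 0.3 cut-offs come from this page.

```python
def relevance_color(score: float) -> str:
    """Map a relevance score in [0, 1] to the dashboard's color band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score >= 0.7:
        return "green"   # high relevance
    if score >= 0.3:
        return "yellow"  # moderate relevance
    return "red"         # low relevance


for value in (0.85, 0.7, 0.45, 0.1):
    print(value, relevance_color(value))
```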

Rewritten Question Relevance

  • Shows the relevance of the top 2 chunks for queries after they've been rewritten
  • Uses the same 0 to 1 scale and color coding as non-rewritten relevance
  • Helps evaluate if query rewriting improves retrieval quality

Overall Retrieval Health

  • Combines all relevance metrics into a single health score
  • Provides a high-level view of system performance
  • Uses the same color-coded thresholds to indicate overall health
  • Calculated as the running average of all available metrics
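
The exact formula behind the health score is not spelled out here. One plausible reading, shown below as a sketch only, is a simple mean over whichever relevance metrics currently have a value. The function name `overall_health` and the use of `None` for a missing metric are assumptions made for illustration.

```python
from typing import Optional


def overall_health(
    non_rewritten: Optional[float],
    rewritten: Optional[float],
) -> Optional[float]:
    """Average the relevance metrics that are available.

    Assumption: a metric with no data is passed as None and is skipped,
    so the score reflects only the metrics that have values.
    """
    available = [m for m in (non_rewritten, rewritten) if m is not None]
    if not available:
        return None  # nothing to report yet
    return sum(available) / len(available)


print(overall_health(0.8, 0.6))    # both metrics available
print(overall_health(None, 0.5))   # only rewritten relevance available
print(overall_health(None, None))  # no data
```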

Interactive Visualization

The dashboard includes an interactive time series chart that shows how metrics change over time:

  • View performance trends over the desired time range
  • Toggle individual metrics by clicking on their cards
  • Hover over data points to see exact values
  • Compare rewritten vs non-rewritten performance
  • Track overall health progression

Usage Monitoring

The dashboard also includes usage statistics to help you track resource utilization:

Retrievals

  • Shows current usage against your monthly quota
  • Tracks standard retrieval operations
  • Helps monitor usage patterns and limits

Advanced Retrievals

  • Displays usage of enhanced retrieval features
  • Includes operations like rewritten queries
  • Helps manage resource allocation

Tips for Using the Dashboard

  1. Metric Toggle: Click on any metric card to show/hide its line in the graph
  2. Performance Analysis: Use the time series visualization to identify:
    • Sudden changes in performance
    • Impact of system updates
    • Time-based patterns
  3. Health Monitoring: Keep an eye on the Overall Retrieval Health score for:
    • System-wide performance issues
    • Long-term trends
    • Impact of optimizations

Interpreting Results

When analyzing your metrics, consider:

  • An Overall Retrieval Health score that stays above 0.7 indicates strong retrieval performance
  • Large gaps between rewritten and non-rewritten relevance suggest opportunities for query optimization
  • Sudden drops in metrics may indicate underlying issues that need investigation
  • Usage patterns can help with capacity planning and resource allocation
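
If you record metric values yourself, two of the checks above can be sketched in a few lines. Everything here is illustrative: the helper names, the sample series, and the 0.2 drop threshold are assumptions rather than dashboard behavior, so tune the threshold to your own data.

```python
def rewrite_gap(non_rewritten: float, rewritten: float) -> float:
    """Positive when rewriting helps, negative when it hurts."""
    return rewritten - non_rewritten


def sudden_drops(series, threshold=0.2):
    """Return the indices where a value fell by more than `threshold`
    compared with the previous data point."""
    return [
        i
        for i in range(1, len(series))
        if series[i - 1] - series[i] > threshold
    ]


# Hypothetical daily Overall Retrieval Health values.
health = [0.80, 0.78, 0.50, 0.52]

print("gap:", rewrite_gap(non_rewritten=0.5, rewritten=0.75))
print("drops at indices:", sudden_drops(health))
```

A large positive gap suggests query rewriting is doing useful work, while a flagged index marks a point in the series worth investigating.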

Best Practices

  1. Regular Monitoring: Check the dashboard on a fixed schedule to catch issues early
  2. Comparative Analysis: Compare rewritten vs non-rewritten metrics to optimize your system
  3. Trend Analysis: Use the time series data to identify patterns and make informed improvements
