Retrieval Performance

The Retrieval Performance dashboard provides real-time monitoring and analysis of your retrieval system's effectiveness. This guide explains the key metrics and features available in the dashboard.

Why We Measure Top 2 Chunks

Focusing our metrics on the top 2 chunks provides a reliable signal of system health while avoiding common measurement pitfalls:

  • Measuring too many chunks can dilute your metrics. Lower-ranked chunks are expected to be less relevant, so poor relevance there doesn't necessarily indicate a system issue.

  • RAG systems usually form their answers primarily from the most relevant pieces of information. Having high relevance in the top 2 chunks means the LLM is more likely to generate accurate and focused responses.

By focusing on the top 2 chunks, we get clean, actionable metrics that reliably indicate system health.
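To illustrate the dilution effect, here is a minimal sketch that scores a single query by averaging the relevance of its top 2 chunks. It assumes each retrieved chunk already has a 0-1 relevance score and that scores arrive in rank order. The dashboard's exact aggregation is not documented here, so the simple mean and the function name `top_k_relevance` are illustrative assumptions.

```python
from statistics import mean


def top_k_relevance(chunk_scores: list[float], k: int = 2) -> float:
    """Average relevance of the k highest-ranked chunks (hypothetical helper).

    Assumes chunk_scores is ordered by retrieval rank (best first) and that
    each score is already normalized to the 0-1 range.
    """
    if not chunk_scores:
        raise ValueError("no chunks retrieved")
    # Only the first k ranked chunks contribute to the score.
    return mean(chunk_scores[:k])


# Hypothetical per-chunk relevance scores for one query, in rank order.
scores = [0.9, 0.8, 0.2, 0.1, 0.05]
print(top_k_relevance(scores))  # top 2 chunks only
print(mean(scores))             # all 5 chunks, for comparison
```

With these made-up scores, the top-2 value is about 0.85, while averaging all five chunks drops it to about 0.41, even though the chunks the LLM relies on most are highly relevant.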

Core Metrics

The dashboard displays three primary metrics to help you evaluate retrieval performance:

Non-Rewritten Relevance

  • Measures the relevance of the top 2 chunks for queries in their original form

  • Ranges from 0 to 1, where higher values indicate better relevance

  • Color-coded for quick assessment:

    • Green (≥ 0.7): High relevance

    • Yellow (≥ 0.3 and < 0.7): Moderate relevance

    • Red (< 0.3): Low relevance
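The color bands above can be expressed as a small lookup. This sketch mirrors the listed thresholds. The function name `relevance_color` is hypothetical, and the inclusive lower bounds (a score of exactly 0.7 is green, exactly 0.3 is yellow) follow the "≥" in the list.

```python
def relevance_color(score: float) -> str:
    """Map a 0-1 relevance score to the dashboard's color band (hypothetical helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 0.7:
        return "green"   # high relevance
    if score >= 0.3:
        return "yellow"  # moderate relevance
    return "red"         # low relevance


for s in (0.85, 0.7, 0.45, 0.3, 0.1):
    print(s, relevance_color(s))
```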

Rewritten Question Relevance

  • Shows the relevance of the top 2 chunks for queries after they've been rewritten

  • Uses the same 0-1 scale and color coding as non-rewritten relevance

  • Helps evaluate if query rewriting improves retrieval quality

Overall Retrieval Health

  • Combines all relevance metrics into a single health score

  • Provides a high-level view of system performance

  • Uses the same color-coded thresholds to indicate overall health

  • Calculated as the running average of all available metrics
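"Running average of all available metrics" can be read in more than one way. The sketch below assumes one reading: at each time point, average whichever relevance metrics have data (for example, rewritten relevance may be missing when no queries were rewritten), then fold each point into a cumulative running mean. Both helper names and the daily values are hypothetical.

```python
from typing import Optional


def overall_health(non_rewritten: Optional[float],
                   rewritten: Optional[float]) -> Optional[float]:
    """Average whichever relevance metrics are available (None means no data)."""
    available = [m for m in (non_rewritten, rewritten) if m is not None]
    if not available:
        return None
    return sum(available) / len(available)


class RunningAverage:
    """Incrementally updated mean over all observations seen so far."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Incremental mean update avoids storing the full history.
        self.mean += (value - self.mean) / self.count
        return self.mean


# Hypothetical daily metrics; rewritten relevance is missing on day 2.
days = [(0.8, 0.6), (0.5, None), (0.9, 0.7)]
health = RunningAverage()
for nr, rw in days:
    point = overall_health(nr, rw)
    if point is not None:
        print(round(health.update(point), 3))
```

Skipping missing metrics, rather than treating them as 0, keeps a day without rewritten queries from being scored as unhealthy.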

Interactive Visualization

The dashboard includes an interactive time series chart that shows how metrics change over time:

  • View performance trends over the desired time range

  • Toggle individual metrics by clicking on their cards

  • Hover over data points to see exact values

  • Compare rewritten vs non-rewritten performance

  • Track overall health progression

Usage Monitoring

The dashboard also includes usage statistics to help you track resource utilization:

Retrievals

  • Shows current usage against your monthly quota

  • Tracks standard retrieval operations

  • Helps monitor usage patterns and limits

Advanced Retrievals

  • Displays usage of enhanced retrieval features

  • Includes operations such as query rewriting

  • Helps manage resource allocation
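Both counters boil down to usage measured against a monthly quota. The sketch below models that comparison under stated assumptions: the `UsageCounter` class, the quota figures, and the 80% warning threshold are all hypothetical and are not part of the dashboard.

```python
from dataclasses import dataclass


@dataclass
class UsageCounter:
    """Usage of one retrieval type against its monthly quota (hypothetical model)."""
    name: str
    used: int
    monthly_quota: int

    def __post_init__(self) -> None:
        if self.monthly_quota <= 0:
            raise ValueError("monthly_quota must be positive")

    @property
    def utilization(self) -> float:
        """Fraction of the monthly quota consumed so far."""
        return self.used / self.monthly_quota

    @property
    def remaining(self) -> int:
        """Operations left this month, never negative."""
        return max(self.monthly_quota - self.used, 0)


counters = [
    UsageCounter("Retrievals", used=42_000, monthly_quota=50_000),
    UsageCounter("Advanced Retrievals", used=3_000, monthly_quota=10_000),
]
for c in counters:
    flag = "  <- nearing quota" if c.utilization >= 0.8 else ""
    print(f"{c.name}: {c.utilization:.0%} used, {c.remaining} left{flag}")
```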

Tips for Using the Dashboard

  1. Metric Toggle: Click on any metric card to show/hide its line in the graph

  2. Performance Analysis: Use the time series visualization to identify:

    • Sudden changes in performance

    • Impact of system updates

    • Time-based patterns

  3. Health Monitoring: Keep an eye on the Overall Retrieval Health score for:

    • System-wide performance issues

    • Long-term trends

    • Impact of optimizations
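One way to spot sudden changes outside the chart is to compare each value with the mean of the few values before it. This sketch flags points that fall more than a threshold below that trailing baseline. The window size, the 0.15 threshold, the function name `sudden_drops`, and the sample series are hypothetical choices to tune for your own data.

```python
from statistics import mean


def sudden_drops(series: list[float], window: int = 3,
                 threshold: float = 0.15) -> list[int]:
    """Return indices where a value falls more than `threshold` below the
    mean of the preceding `window` values (hypothetical helper)."""
    flagged = []
    for i in range(window, len(series)):
        baseline = mean(series[i - window:i])
        if baseline - series[i] > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily Overall Retrieval Health values.
health = [0.78, 0.80, 0.79, 0.81, 0.55, 0.60, 0.79]
print(sudden_drops(health))
```

Here only day 4 is flagged: the slow recovery on day 5 stays within the threshold because the baseline already includes the dip.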

Interpreting Results

When analyzing your metrics, consider:

  • A consistent Overall Health score above 0.7 indicates strong retrieval performance

  • Large gaps between rewritten and non-rewritten relevance suggest opportunities for query optimization

  • Sudden drops in metrics may indicate underlying issues that need investigation

  • Usage patterns can help with capacity planning and resource allocation
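The first two checks above can be approximated in code. This sketch computes the average gap between rewritten and non-rewritten relevance over paired observations and tests whether Overall Retrieval Health stays at or above 0.7. The helper names and sample values are hypothetical, and what counts as a "large" gap is a judgment call the documentation leaves open.

```python
def rewrite_gap(non_rewritten: list[float], rewritten: list[float]) -> float:
    """Mean of (rewritten - non_rewritten) over paired observations.

    A positive value means query rewriting improves relevance on average.
    """
    if len(non_rewritten) != len(rewritten) or not rewritten:
        raise ValueError("need equal-length, non-empty series")
    return sum(r - n for n, r in zip(non_rewritten, rewritten)) / len(rewritten)


def consistently_healthy(health: list[float], floor: float = 0.7) -> bool:
    """True if every Overall Retrieval Health value is at or above `floor`."""
    return bool(health) and all(h >= floor for h in health)


# Hypothetical paired daily values.
non_rewritten = [0.50, 0.55, 0.45]
rewritten = [0.75, 0.80, 0.70]
print(round(rewrite_gap(non_rewritten, rewritten), 2))
print(consistently_healthy([0.72, 0.75, 0.71]))
```

A large positive gap suggests that rewriting does much of the work, so the original queries (or how they are phrased before retrieval) may be worth optimizing.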

Best Practices

  1. Regular Monitoring: Check the dashboard regularly to catch issues early

  2. Comparative Analysis: Compare rewritten vs non-rewritten metrics to optimize your system

  3. Trend Analysis: Use the time series data to identify patterns and make informed improvements

Last updated

Was this helpful?