Understanding Retrieval Performance Metrics
The Retrieval Performance dashboard provides real-time monitoring and analysis of your retrieval system's effectiveness. This guide explains the key metrics and features available in the dashboard.
Why We Measure Top 2 Chunks
Focusing on the top 2 chunks gives a reliable signal of system health while avoiding common measurement pitfalls:
- Measuring too many chunks can dilute your metrics. Poor relevance in lower-ranked chunks doesn't necessarily indicate system issues.
- RAG systems usually form their answers primarily from the most relevant pieces of information. Having high relevance in the top 2 chunks means the LLM is more likely to generate accurate and focused responses.
By focusing on the top 2 chunks, we get clean, actionable metrics that reliably indicate system health.
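To illustrate the dilution effect, here is a minimal sketch that averages relevance over the top k chunks. The function name `top_k_relevance`, the use of a plain mean, and the sample scores are assumptions made for illustration; the dashboard's actual scoring method is not described in this guide.

```python
from statistics import mean


def top_k_relevance(chunk_scores, k=2):
    """Mean relevance of the k highest-ranked chunks.

    chunk_scores is assumed to be ordered by rank (best first),
    with each score between 0 and 1.
    """
    top = chunk_scores[:k]
    if not top:
        raise ValueError("no chunks were retrieved")
    return mean(top)


# Hypothetical scores for a single query, ordered by rank.
scores = [0.9, 0.8, 0.2, 0.1, 0.0]

top2 = top_k_relevance(scores, k=2)  # averages only 0.9 and 0.8
top5 = top_k_relevance(scores, k=5)  # also averages the weak lower-ranked chunks
```

With these sample scores the top 2 average stays high while the top 5 average is pulled down by chunks the LLM is unlikely to rely on.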
Core Metrics
The dashboard displays three primary metrics to help you evaluate retrieval performance:
Non-Rewritten Relevance
- Measures the relevance of the top 2 chunks for queries in their original form
- Ranges from 0 to 1, where higher values indicate better relevance
- Color-coded for quick assessment:
  - Green (≥ 0.7): High relevance
  - Yellow (≥ 0.3 and < 0.7): Moderate relevance
  - Red (< 0.3): Low relevance
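The color bands can be expressed as a small lookup. This is a sketch only: the function name `relevance_color` is hypothetical, and it assumes the green check takes precedence, so yellow covers scores from 0.3 up to but not including 0.7.

```python
def relevance_color(score):
    """Map a relevance score (0 to 1) to the dashboard color band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= 0.7:
        return "green"   # high relevance
    if score >= 0.3:
        return "yellow"  # moderate relevance
    return "red"         # low relevance
```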
Rewritten Question Relevance
- Shows the relevance of the top 2 chunks for queries after they've been rewritten
- Uses the same 0-1 scale and color coding as non-rewritten relevance
- Helps evaluate if query rewriting improves retrieval quality
Overall Retrieval Health
- Combines all relevance metrics into a single health score
- Provides a high-level view of system performance
- Uses the same color-coded thresholds to indicate overall health
- Calculated as the running average of all available metrics
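One possible reading of "running average of all available metrics" is a cumulative mean that folds in every metric value as it arrives and skips any that are missing. The sketch below follows that reading; the class name `RunningHealth` and the equal weighting of metrics are assumptions, not the dashboard's documented formula.

```python
class RunningHealth:
    """Cumulative mean over every metric value seen so far."""

    def __init__(self):
        self._total = 0.0
        self._count = 0

    def update(self, *metrics):
        """Fold in the metrics available for one period, skipping None."""
        for value in metrics:
            if value is not None:
                self._total += value
                self._count += 1
        return self.value

    @property
    def value(self):
        """Current health score, or None if no metrics have been seen."""
        if self._count == 0:
            return None
        return self._total / self._count


health = RunningHealth()
first = health.update(0.8, 0.6)    # both relevance metrics available
second = health.update(0.5, None)  # rewritten relevance missing this period
```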
Interactive Visualization
The dashboard includes an interactive time series chart that shows how metrics change over time:
- View performance trends over the desired time range
- Toggle individual metrics by clicking on their cards
- Hover over data points to see exact values
- Compare rewritten vs non-rewritten performance
- Track overall health progression
Usage Monitoring
The dashboard also includes usage statistics to help you track resource utilization:
Retrievals
- Shows current usage against your monthly quota
- Tracks standard retrieval operations
- Helps monitor usage patterns and limits
Advanced Retrievals
- Displays usage of enhanced retrieval features
- Includes operations like rewritten queries
- Helps manage resource allocation
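Usage against a monthly quota reduces to simple arithmetic. The sketch below is illustrative only: the function name `quota_status`, its field names, and the sample numbers are hypothetical and not taken from the dashboard.

```python
def quota_status(used, monthly_quota):
    """Return the fraction of the quota consumed and the operations remaining."""
    if monthly_quota <= 0:
        raise ValueError("monthly_quota must be positive")
    if used < 0:
        raise ValueError("used must not be negative")
    return {
        "fraction_used": used / monthly_quota,
        "remaining": max(monthly_quota - used, 0),  # never report below zero
    }


# Hypothetical numbers for illustration.
status = quota_status(used=7_500, monthly_quota=10_000)
```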
Tips for Using the Dashboard
- Metric Toggle: Click on any metric card to show/hide its line in the graph
- Performance Analysis: Use the time series visualization to identify:
  - Sudden changes in performance
  - Impact of system updates
  - Time-based patterns
- Health Monitoring: Keep an eye on the Overall Retrieval Health score for:
  - System-wide performance issues
  - Long-term trends
  - Impact of optimizations
Interpreting Results
When analyzing your metrics, consider:
- An Overall Retrieval Health score that stays at or above 0.7 (green) indicates strong retrieval performance
- Large gaps between rewritten and non-rewritten relevance suggest opportunities for query optimization
- Sudden drops in metrics may indicate underlying issues that need investigation
- Usage patterns can help with capacity planning and resource allocation
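If you export metric values for your own analysis, two of these checks can be sketched as follows. The function names and the `min_drop` threshold of 0.2 are assumptions chosen for illustration; the dashboard does not define what counts as a "large gap" or a "sudden drop".

```python
def rewrite_gap(rewritten, non_rewritten):
    """Positive when rewriting helps relevance, negative when it hurts."""
    return rewritten - non_rewritten


def sudden_drops(series, min_drop=0.2):
    """Indices where a metric fell by at least min_drop from the previous point."""
    return [
        i
        for i in range(1, len(series))
        if series[i - 1] - series[i] >= min_drop
    ]


# Hypothetical time series of a relevance metric.
history = [0.80, 0.78, 0.50, 0.52]
drops = sudden_drops(history)     # flags the fall from 0.78 to 0.50
gap = rewrite_gap(0.75, 0.5)      # rewriting improved relevance by 0.25
```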
Best Practices
- Regular Monitoring: Check the dashboard regularly to catch issues early
- Comparative Analysis: Compare rewritten vs non-rewritten metrics to optimize your system
- Trend Analysis: Use the time series data to identify patterns and make informed improvements