Using the RAG Sandbox
This section provides a detailed explanation of the key features in the RAG Sandbox, focusing on vector index selection and Large Language Model (LLM) settings.
To open the RAG Sandbox, go to the RAG Evaluation Dashboard and click the button labeled "Open in RAG Sandbox", as shown below. This allows you to interact with the vector indexes that were generated during the evaluation process.
The RAG Sandbox is an interactive tool that enables you to test and refine vector-based retrieval using real-time queries. In this environment, you can evaluate different vector search indexes generated during your RAG Evaluation, configure the Large Language Model (LLM), and fine-tune retrieval settings such as the number of top results (k-value). It provides live feedback on how well the system retrieves relevant document chunks and generates answers.
The Select Vector Search Index section is where you choose from the vector indexes generated during your RAG Evaluation. These indexes are created based on different vectorization plans that vary by embedding model, chunk size, overlap, and other parameters. Each vector index includes detailed metrics such as:
Avg. Relevancy: Indicates the average relevance of the retrieved documents for a query.
NDCG (Normalized Discounted Cumulative Gain): Measures the quality of the ranked document retrieval.
You can use this section to compare vector indexes and select the most appropriate one for your query.
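For intuition on how NDCG is derived, the following is a minimal sketch: it computes Discounted Cumulative Gain over the retrieved chunks in their ranked order and normalizes it by the ideal ordering. The relevance scores are made up for illustration, and the Sandbox's exact scoring implementation may differ.

```python
import math

def dcg(relevances):
    """Discounted Cumulative Gain: each relevance score is discounted by the log2 of its rank."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """NDCG = DCG of the actual ranking divided by DCG of the ideal (descending) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance scores for the top-5 retrieved chunks, in retrieval order.
retrieved_relevance = [0.9, 0.4, 0.8, 0.1, 0.6]
print(f"NDCG: {ndcg(retrieved_relevance):.3f}")
```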
The System Behavior setting allows you to define how the LLM behaves when generating responses. This is similar to the "System" parameter in an API call to a language model. For instance, you can instruct the assistant to act as a "helpful assistant" or specify other personas that match your use case. This setting provides more control over the tone and type of responses generated by the LLM.
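In API terms, the System Behavior text corresponds to the system message sent ahead of the user's question. The sketch below follows the common chat-completion message convention; the Sandbox's internal request format may differ.

```python
system_behavior = "You are a helpful assistant that answers strictly from the provided context."

messages = [
    {"role": "system", "content": system_behavior},             # the System Behavior setting
    {"role": "user", "content": "What is the refund policy?"},  # the Question input
]
```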
In the Question input box, you can type your own custom question to interact with the vector search index. This is where you input the query you want the system to answer based on the retrieved documents. After typing your question, the system will use the selected vector search index to retrieve relevant document chunks and generate a response based on that context.
In addition to typing your own question, you can also select from a list of Sample Questions. These synthetic questions are automatically generated based on the documents used in the RAG Evaluation. Clicking on a sample question will automatically populate the question input box, allowing you to quickly test how well the system retrieves relevant information and generates a response.
The Prompt section allows you to adjust the template that the LLM uses to generate its response. This prompt includes placeholders for documents and context, which will be filled by the retrieved document chunks. You can modify this template to guide the LLM on how to respond based on the retrieved context, ensuring more precise or tailored answers.
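Conceptually, the prompt template behaves like a string with placeholders that are filled in before the model is called. The placeholder names used below ({context} and {question}) are illustrative assumptions, not necessarily the exact names the Sandbox uses.

```python
prompt_template = """Answer the question using only the context below.

Context:
{context}

Question: {question}
"""

# Hypothetical retrieved chunks joined into a single context block.
retrieved_chunks = ["Chunk 1 text ...", "Chunk 2 text ..."]
prompt = prompt_template.format(
    context="\n\n".join(retrieved_chunks),
    question="What is the refund policy?",
)
```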
The Retrieved Context section displays the document chunks retrieved based on the query. This section also includes key metrics such as:
Avg. Relevancy: Shows the average relevance of the retrieved document chunks.
NDCG: Measures how well the retrieved documents are ranked.
Cosine Similarity: Indicates how similar the retrieved document vectors are to the query.
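Cosine similarity compares the direction of two embedding vectors and approaches 1.0 for closely related text. A minimal illustration with made-up vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.2, 0.7, 0.1])     # hypothetical query embedding
chunk_vec = np.array([0.25, 0.65, 0.05])  # hypothetical chunk embedding
print(f"Cosine similarity: {cosine_similarity(query_vec, chunk_vec):.3f}")
```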
You can also control the k-value, which specifies how many of the top results are retrieved from the vector search index, anywhere from the top 1 to the top 10 most relevant document chunks. Adjusting this setting helps balance relevance against the amount of context provided to the LLM.
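A hedged sketch of how a top-k cut-off typically works: chunks are ranked by similarity to the query and only the k highest-scoring ones are kept as context. The chunk texts and vectors below are invented for illustration; the Sandbox's own index lookup is not shown.

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Rank chunks by cosine similarity to the query and keep the top k."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    scored = sorted(
        ((cos(query_vec, vec), text) for vec, text in zip(chunk_vecs, chunks)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return scored[:k]  # the Sandbox limits k to a value between 1 and 10

chunks = ["Refunds are issued within 14 days.", "Shipping takes 3-5 business days."]
chunk_vecs = [np.array([0.24, 0.66, 0.06]), np.array([0.9, 0.05, 0.3])]
query_vec = np.array([0.2, 0.7, 0.1])
print(top_k_chunks(query_vec, chunk_vecs, chunks, k=1))
```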
Once the document chunks are retrieved, the LLM generates a response based on the provided context. This response is displayed in the LLM Response section. The content of the response depends on both the retrieved documents and the prompt template defined earlier.
To open the LLM Settings, click on the model name and gear icon above the LLM response.
Here you can select which Large Language Model (LLM) will be used to generate responses and adjust its generation settings.
The available models include:
GPT-4o
GPT-4o mini
LLaMA3.1 70B
LLaMA3.1 8B
Mixtral-8x7b
Gemma 2 9B
Each model may have different strengths and performance characteristics, allowing you to experiment with various models to find the best fit for your use case.
| Setting | Description |
| --- | --- |
| Temperature | Controls the level of randomness in the LLM's output. A lower temperature makes the model more conservative and deterministic, making the output more focused and predictable. A higher temperature increases randomness, making the model more creative or exploratory. |
| Top P | Determines how much of the probability mass of all possible outcomes is considered when generating the next token. With a lower Top P, the model chooses only from the most probable tokens, keeping responses very focused and factual. With a higher Top P, the model considers more choices, resulting in more diverse responses. |
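These two settings map to the temperature and top_p parameters of a chat-completion request. The sketch below uses the openai Python client as an example; the Sandbox makes this call for you, and the model name and values shown are only illustrative.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",   # one of the models selectable in the Sandbox
    temperature=0.2,       # low temperature: focused, deterministic output
    top_p=0.9,             # nucleus sampling: consider the top 90% of probability mass
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the retrieved context."},
    ],
)
print(response.choices[0].message.content)
```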