Extraction Tester
The Extraction Tester lets you test how different extraction methods process your documents before creating a RAG pipeline. This helps you choose the best extraction method for your specific documents and use case.
Upload your document using the file selector or drag-and-drop interface.
Select an extraction type from the dropdown menu:
Fast: A lightweight extractor optimized for speed and basic document processing
Vectorize Iris: Our advanced document processing solution that intelligently preserves document structure and semantics
Configure extraction settings, then click Extract Text.
Chunk Size: Maximum number of characters in each chunk
Start Page: Beginning page for extraction (optional)
Page Limit: Maximum number of pages to process (optional)
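To build intuition for how these three settings interact, here is a minimal Python sketch. It is not Vectorize's implementation: the function names `select_pages` and `chunk_text` are hypothetical, the sketch assumes Start Page is 1-based, and it uses a naive fixed-size split, whereas the actual extractors may split on different boundaries.

```python
def select_pages(pages, start_page=1, page_limit=None):
    """Return pages beginning at start_page (assumed 1-based), at most page_limit of them."""
    if start_page < 1:
        raise ValueError("start_page must be >= 1")
    if page_limit is not None and page_limit < 0:
        raise ValueError("page_limit must be >= 0")
    begin = start_page - 1
    end = None if page_limit is None else begin + page_limit
    return pages[begin:end]


def chunk_text(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Hypothetical three-page document, one string per page.
pages = ["Page one text. ", "Page two text. ", "Page three text. "]

# Start at page 2 and process a single page.
selected = select_pages(pages, start_page=2, page_limit=1)

# Every chunk holds at most 6 characters; the last one may be shorter.
chunks = chunk_text("".join(selected), chunk_size=6)
print(chunks)
```

A smaller Chunk Size produces more, shorter chunks. Leaving Start Page and Page Limit unset corresponds to processing the whole document in this sketch.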
After processing, you can view your document in four different formats:
Text: Raw extracted text
Text Chunks: Text split into chunks based on your specified chunk size
Markdown: Extracted content with preserved formatting
Markdown Chunks: Formatted content split into chunks
Vectorize Iris is a model-based extraction solution designed for PDFs in RAG systems. It combines extraction and chunking into a single step, which makes it easier to get clean, usable text from complex documents.
When using Vectorize Iris, you'll notice enhanced processing features including:
Preserved document structure and formatting
Intelligent handling of tables and images
For more details about Iris' capabilities, see Understanding Vectorize Iris.
Test your document extraction when you want to:
Compare different extraction methods
Verify how complex documents like PDFs will be processed
Check if tables and formatting are preserved as needed
Ensure your documents are processed correctly before creating a pipeline
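When comparing results from two extraction runs, a quick structural count can complement a visual check. The sketch below is only an illustration: it assumes you have copied the Markdown output of each run into a string, and that tables appear as Markdown pipe tables and headings as `#` lines. The function name `markdown_stats` and both sample strings are hypothetical and do not represent the actual output of any extractor.

```python
import re


def markdown_stats(markdown):
    """Count heading lines and pipe-table rows in a Markdown string."""
    lines = markdown.splitlines()
    headings = sum(1 for line in lines if re.match(r"^#{1,6}\s", line))
    # Counts every row that starts and ends with a pipe, including the separator row.
    table_rows = sum(1 for line in lines if re.match(r"^\s*\|.*\|\s*$", line))
    return {"headings": headings, "table_rows": table_rows}


# Hypothetical outputs pasted from two extraction runs on the same document.
output_a = "Quarterly report\nRegion Sales\nNorth 10\nSouth 12\n"
output_b = (
    "# Quarterly report\n\n"
    "| Region | Sales |\n"
    "| --- | --- |\n"
    "| North | 10 |\n"
    "| South | 12 |\n"
)

stats_a = markdown_stats(output_a)
stats_b = markdown_stats(output_b)
print(stats_a, stats_b)
```

A run whose output shows zero table rows for a document that contains tables is a sign that the table structure was flattened during extraction.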