Vectorize Iris
Vectorize Iris is a model-based extraction solution for PDFs and other documents. It combines extraction and chunking into a single step, so you get clean, usable text from complex documents without chaining separate tools.
Why Use Iris?
Iris replaces what traditionally required multiple tools (like PyPDF for extraction and RecursiveSplitter for chunking) with a single, more capable solution. It's particularly good at maintaining text semantics when converting to markdown, giving your LLMs cleaner context to work with.
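For context, the chunking half of that traditional two-tool approach can be sketched as a recursive, separator-based splitter. This is a simplified stand-in written for illustration. It is not the implementation of any particular library, and it says nothing about how Iris chunks internally. The function name `recursive_split` and the separator list are assumptions.

```python
from typing import List, Optional

# Coarse-to-fine separators: paragraphs, then lines, then words (illustrative choice).
DEFAULT_SEPARATORS = ["\n\n", "\n", " "]


def recursive_split(text: str, max_len: int,
                    separators: Optional[List[str]] = None) -> List[str]:
    """Split text into chunks of at most max_len characters,
    trying coarse separators before falling back to finer ones."""
    if separators is None:
        separators = DEFAULT_SEPARATORS
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut every max_len characters.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_len:
            current = piece
        else:
            # The piece alone is too long: retry with the finer separators.
            chunks.extend(recursive_split(piece, max_len, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "alpha beta\n\ngamma delta epsilon\n\nzeta"
chunks = recursive_split(sample, max_len=12)
print(chunks)
```

A splitter like this only sees raw characters, so it can cut through a table or a heading. That limitation is the reason to do extraction and chunking together.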
Key advantages:
- Advanced format handling - Works with PDFs, images, scans, and embedded content
- Precise table parsing - Handles even complex tables accurately
- Multi-language support - Processes 100+ languages including Hindi, Arabic, and Chinese
- Metadata extraction - Extracts structured data using JSON schemas
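To make the metadata extraction point concrete, here is a minimal sketch of a JSON Schema describing fields to pull from an invoice. The field names (`invoice_number`, `issue_date`, `total_amount`) are hypothetical. How the schema is passed to Iris is not shown here, so check the metadata extraction documentation for the exact format it accepts.

```python
import json

# Hypothetical schema: the field names are illustrative and not defined by Iris.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "Identifier printed on the invoice",
        },
        "issue_date": {"type": "string", "format": "date"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number"],
}

# Serialize to JSON text, the form in which a schema is typically stored or sent.
schema_json = json.dumps(invoice_schema, indent=2)
print(schema_json)
```

Short `description` entries on each property are a common way to tell a model-based extractor what a field means.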
Use Iris in RAG Pipelines
The easiest way to use Iris is directly in your Vectorize RAG pipelines. When creating or configuring a pipeline, select Vectorize Iris as your extraction method. The pipeline will automatically use Iris to process all incoming documents, maintaining structure and semantics for optimal retrieval.
Perfect for:
- Production RAG systems that need reliable extraction
- Pipelines processing diverse document types
- Applications requiring structured metadata extraction
Learn more about creating pipelines and automatic metadata extraction.
Use Iris via API or CLI
For direct extraction outside of pipelines, Iris is available through:
API Extraction
Programmatically extract text from documents using the Vectorize API. Ideal for:
- Custom extraction workflows
- Integration with existing applications
- On-demand document processing
See the full SDK documentation with code examples in Python and JavaScript.
CLI Tool
For local testing and development, use the Vectorize Iris command-line tool:
# Install
curl -fsSL https://raw.githubusercontent.com/vectorize-io/vectorize-iris/refs/heads/main/install.sh | sh
# Extract from a document
vectorize-iris document.pdf
# Process a directory
vectorize-iris ./documents -f ./output
Perfect for quick tests, batch processing, and CI/CD pipelines. Full installation and usage details are in the CLI documentation.
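For a CI/CD job, the single-document command shown above can be wrapped in a small script. The sketch below only uses the `vectorize-iris <file>` invocation from this page. It assumes the CLI is on the PATH and that it exits with a non-zero status when extraction fails; neither is confirmed here. The helper names `build_command` and `extract_all` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List

CLI = "vectorize-iris"  # binary name as used in the commands on this page


def build_command(document: Path) -> List[str]:
    """Build the argument list for extracting a single document."""
    return [CLI, str(document)]


def extract_all(folder: Path, pattern: str = "*.pdf") -> List[Path]:
    """Run the CLI on every matching file and return the files that failed.

    Assumption: a non-zero exit status means the extraction failed.
    Raises FileNotFoundError if the CLI is not installed.
    """
    failed: List[Path] = []
    for document in sorted(folder.glob(pattern)):
        result = subprocess.run(build_command(document),
                                capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(document)
    return failed
```

In a pipeline step, call `extract_all(Path("./documents"))` and fail the build when the returned list is non-empty.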
Test Iris with the Extraction Tester
Before committing to an extraction method, test how Iris handles your specific documents using the Extraction Tester.
The Extraction Tester lets you:
- Upload sample documents
- Compare Iris against other extraction methods
- View results in multiple formats (text, markdown, chunks)
- Test metadata extraction schemas
- Verify table and image handling
Use this to ensure Iris meets your quality requirements before building production pipelines.