
Vectorize Iris

Vectorize Iris is a model-based extraction solution for PDFs and other documents. It combines extraction and chunking into a single process that produces clean, usable text from complex documents.

Why Use Iris?

Iris replaces what traditionally required multiple tools (like PyPDF for extraction and RecursiveSplitter for chunking) with a single, more capable solution. It's particularly good at maintaining text semantics when converting to markdown, giving your LLMs cleaner context to work with.

Key advantages:

  • Advanced format handling - Works with PDFs, images, scans, and embedded content
  • Precise table parsing - Handles even complex tables accurately
  • Multi-language support - Processes 100+ languages including Hindi, Arabic, and Chinese
  • Metadata extraction - Extracts structured data using JSON schemas
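
To make the metadata extraction point concrete, here is a minimal sketch of what a JSON schema for structured data might look like, written in Python. The invoice field names are hypothetical and chosen only for illustration, and the small helper is not part of Iris; it simply checks a record against the schema's required fields. How a schema is actually supplied to Iris is covered in the metadata extraction documentation and is not shown here.

```python
import json

# Hypothetical schema: these field names are illustrative, not defined by Iris.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string", "format": "date"},
        "total_amount": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["invoice_number", "total_amount"],
}

# Maps the JSON Schema type names used above to Python types.
_PY_TYPES = {"string": str, "number": (int, float)}


def missing_or_mistyped(record: dict, schema: dict) -> list:
    """Return the required field names that are absent or have the wrong type."""
    problems = []
    for name in schema["required"]:
        expected = _PY_TYPES[schema["properties"][name]["type"]]
        value = record.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(name)
    return problems


# Serialized form, ready to paste wherever a JSON schema is expected.
schema_json = json.dumps(INVOICE_SCHEMA, indent=2)
```

Keeping the required list short means documents that lack optional fields still produce a usable record.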

Use Iris in RAG Pipelines

The easiest way to use Iris is directly in your Vectorize RAG pipelines. When creating or configuring a pipeline, select Vectorize Iris as your extraction method. The pipeline will automatically use Iris to process all incoming documents, maintaining structure and semantics for optimal retrieval.

Perfect for:

  • Production RAG systems that need reliable extraction
  • Pipelines processing diverse document types
  • Applications requiring structured metadata extraction

Learn more about creating pipelines and automatic metadata extraction.

Use Iris via API or CLI

For direct extraction outside of pipelines, Iris is available through the API and a command-line tool.

API Extraction

Programmatically extract text from documents using the Vectorize API. Ideal for:

  • Custom extraction workflows
  • Integration with existing applications
  • On-demand document processing

See the full SDK documentation with code examples in Python and JavaScript.

CLI Tool

For local testing and development, use the Vectorize Iris command-line tool:

# Install
curl -fsSL https://raw.githubusercontent.com/vectorize-io/vectorize-iris/refs/heads/main/install.sh | sh

# Extract from a document
vectorize-iris document.pdf

# Process a directory
vectorize-iris ./documents -f ./output

The CLI is well suited to quick tests, batch processing, and CI/CD pipelines. See the CLI documentation for full installation and usage details.
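
For batch jobs or CI steps, the two invocations above can be wrapped in a script. The following is a minimal sketch in Python that assumes the vectorize-iris binary is already installed and on PATH. The helper names build_iris_command and run_iris are hypothetical, and the -f flag is copied from the directory example above without assuming anything further about its semantics.

```python
import shutil
import subprocess
from typing import List, Optional


def build_iris_command(source: str, output: Optional[str] = None) -> List[str]:
    """Build an argument list mirroring the two CLI invocations shown above."""
    cmd = ["vectorize-iris", source]
    if output is not None:
        # -f is taken from the directory example; see the CLI docs for its exact meaning.
        cmd += ["-f", output]
    return cmd


def run_iris(source: str, output: Optional[str] = None) -> str:
    """Run the CLI and return whatever it writes to standard output."""
    if shutil.which("vectorize-iris") is None:
        raise FileNotFoundError("vectorize-iris is not on PATH; run the install step first")
    completed = subprocess.run(
        build_iris_command(source, output),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode output as str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.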

Test Iris with the Extraction Tester

Before committing to an extraction method, test how Iris handles your specific documents using the Extraction Tester.

The Extraction Tester lets you:

  • Upload sample documents
  • Compare Iris against other extraction methods
  • View results in multiple formats (text, markdown, chunks)
  • Test metadata extraction schemas
  • Verify table and image handling

Use this to ensure Iris meets your quality requirements before building production pipelines.
