Understanding Vectorize Iris

We're excited to introduce Vectorize Iris, a model-based extraction solution that changes how RAG systems handle PDFs. It combines extraction and chunking into a single step, so you get clean, usable text from complex documents without stitching together separate tools.

Why Use Iris?

Under the hood, Iris replaces what traditionally required multiple tools (such as PyPDF for extraction and RecursiveSplitter for chunking) with a single, more capable solution. It is particularly good at preserving the meaning and structure of the text when converting it to Markdown, which gives your LLMs cleaner context to work with.
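For context, here is a minimal sketch of the chunking half of that traditional pipeline. It is not Iris's implementation and not the code of any particular splitter library; it is a simplified recursive splitter written with the standard library only, and it assumes the text has already been extracted from the PDF by a separate tool. The function name `recursive_split` and the separator list are hypothetical choices made for illustration.

```python
from typing import List, Optional

# Hypothetical separator order: try paragraph breaks first, then lines, then words.
DEFAULT_SEPARATORS = ["\n\n", "\n", " "]


def recursive_split(
    text: str, max_chars: int, separators: Optional[List[str]] = None
) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Coarse separators are tried first; a piece that is still too long
    is split again with the next, finer separator.
    """
    if separators is None:
        separators = DEFAULT_SEPARATORS
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separator left to try: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            # The piece still fits, so keep growing the current chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, max_chars, finer))
    if current:
        chunks.append(current)
    # Drop chunks that contain only whitespace.
    return [chunk for chunk in chunks if chunk.strip()]


if __name__ == "__main__":
    # Stand-in for text produced by a separate PDF extraction step.
    extracted = "Title\n\nFirst paragraph of text.\n\nA second, longer paragraph follows here."
    for chunk in recursive_split(extracted, max_chars=40):
        print(repr(chunk))
```

A splitter like this only sees characters, so tables and headings that the extraction step has already flattened cannot be recovered at chunking time. That limitation of the two-tool approach is the gap a combined extraction and chunking step aims to close.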

Key advantages:

  • Smart PDF processing: splits pages while preserving semantic structure

  • Handles both standalone images and PDFs with embedded images

  • Precise table parsing (yes, even the complex ones!)
