Understanding Vectorize Iris
We're excited to introduce Vectorize Iris, a model-based extraction solution for RAG systems that work with PDFs. It combines extraction and chunking into a single step, so you get clean, usable text from complex documents without stitching separate tools together.
Why Use Iris?
Under the hood, Iris replaces what traditionally required multiple tools (such as PyPDF for extraction and RecursiveSplitter for chunking) with a single model-based step. It is particularly good at preserving a document's meaning and structure when converting it to Markdown, which gives your LLMs cleaner context to work with.
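To make the "chunking" half of the traditional pipeline concrete, here is a minimal, from-scratch sketch of recursive splitting: try the coarsest separator first (paragraph breaks), and fall back to finer ones (lines, sentences, words) only for pieces that are still too long. This is not the code of Iris, PyPDF, or any RecursiveSplitter library; the function name `recursive_split`, the separator list, and the character-based `chunk_size` are hypothetical choices made for illustration.

```python
def recursive_split(text, chunk_size=200, separators=("\n\n", "\n", ". ", " ")):
    """Hypothetical sketch: split text into chunks of at most chunk_size
    characters, preferring the coarsest separator that appears in the text."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        finer = separators[i + 1:]  # separators to try on oversized pieces
        chunks, current = [], ""
        for part in text.split(sep):
            candidate = current + sep + part if current else part
            if len(candidate) <= chunk_size:
                current = candidate  # keep growing the current chunk
                continue
            if current:
                chunks.append(current)
            if len(part) > chunk_size:
                # This piece alone is too long: recurse with finer separators.
                chunks.extend(recursive_split(part, chunk_size, finer))
                current = ""
            else:
                current = part
        if current:
            chunks.append(current)
        return [c for c in chunks if c.strip()]
    # No separator left: fall back to fixed-size character cuts.
    return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]


doc = "Intro paragraph.\n\nSecond paragraph that is somewhat longer than the first one."
print(recursive_split(doc, chunk_size=30))
```

Note that a splitter like this only sees characters: it has no idea whether a break falls inside a table or between a heading and its body, which is the kind of structural information a model-based approach aims to keep.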
Key advantages:
Smart PDF processing - splits pages while preserving semantic structure
Handles both standalone images and PDFs with embedded images
Super precise table parsing (yes, even those complex ones!)