Vectorize Iris CLI
The Vectorize Iris CLI extracts text from documents without requiring you to write any code. It is well suited to testing, development, batch processing, and CI/CD pipelines.
Installation
curl -fsSL https://raw.githubusercontent.com/vectorize-io/vectorize-iris/refs/heads/main/install.sh | sh
Setup
Connect the CLI to your Vectorize account:
vectorize-iris configure
What happens:
- Opens your browser to the Vectorize platform
- Click "Authorize" to grant access
- Credentials are automatically saved to ~/.vectorize-iris/credentials
- Done! You're ready to extract
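Scripts that call the CLI unattended may want to confirm that setup has already run. The following is a minimal sketch, assuming that a non-empty credentials file at the path above is a reasonable signal that configuration completed. `iris_is_configured` is a hypothetical helper name, not part of the CLI:

```shell
# Hypothetical helper: succeeds when the credentials file exists and is non-empty.
# Assumption: the presence of this file implies `vectorize-iris configure` completed.
iris_is_configured() {
  [ -s "${HOME}/.vectorize-iris/credentials" ]
}
```

A script could then run the browser flow only when needed:
iris_is_configured || vectorize-iris configure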
Basic Usage
Extract text from a document:
vectorize-iris document.pdf
Extract from a URL:
vectorize-iris https://arxiv.org/pdf/2206.01062
Output Formats
Save as JSON:
vectorize-iris document.pdf -o json -f output.json
Save as plain text:
vectorize-iris document.pdf -o text -f output.txt
Save as YAML:
vectorize-iris document.pdf -o yaml -f output.yaml
Pipe to other tools:
vectorize-iris document.pdf -o json | jq -r '.text' > output.txt
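The jq pipeline above writes an output file even when the JSON carries no text. A stricter variant might look like the sketch below, assuming the JSON output has a top-level `text` field as in the jq example. `save_text` is a hypothetical helper name, and jq must be installed:

```shell
# Hypothetical helper: reads Iris JSON on stdin and writes the text field to
# the file named by $1. With -e, jq exits non-zero when the filter produces
# no output, so a missing or empty "text" value makes the helper fail.
save_text() {
  jq -e -r '.text | select(type == "string" and length > 0)' > "$1"
}
```

Usage:
vectorize-iris document.pdf -o json | save_text output.txt || echo "no text extracted" >&2
Note that the redirect still creates the output file when the check fails, so test the exit status rather than the file's existence.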
Batch Processing
Process entire directories:
# Process all files in a directory
vectorize-iris ./documents -f ./output
# Process with JSON output
vectorize-iris ./documents -o json -f ./output
# Process scans as text
vectorize-iris ./scans -o text -f ./extracted
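When a directory contains a few problem files, it can help to run the CLI once per file and record which ones failed. The following is a minimal sketch, assuming the CLI exits with a non-zero status when an extraction fails. `iris_batch` is a hypothetical helper name, and restricting the loop to `.pdf` files is a choice made for this example:

```shell
# Sketch: process each PDF on its own so one failure does not hide the rest.
# Assumption: vectorize-iris exits non-zero when extraction fails.
iris_batch() {
  src_dir=$1
  out_dir=$2
  failed=0
  mkdir -p "$out_dir"
  for file in "$src_dir"/*.pdf; do
    [ -e "$file" ] || continue              # glob matched nothing
    name=$(basename "$file" .pdf)
    if ! vectorize-iris "$file" -o json -f "$out_dir/$name.json"; then
      echo "failed: $file" >&2              # report the failure on stderr
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]                       # non-zero status if any file failed
}
```

Usage:
iris_batch ./documents ./output || echo "some files need another look" >&2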
Advanced Options
Semantic Chunking
Split documents into semantic chunks:
vectorize-iris long-document.pdf --chunk-size 512
Custom Parsing Instructions
Guide the extraction with custom instructions:
vectorize-iris report.pdf --parsing-instructions \
"Extract only tables and numerical data, ignore narrative text"
Metadata Extraction
Extract structured metadata with a schema:
vectorize-iris invoice.pdf \
--metadata-schema 'invoice:{"invoice_number":"string","date":"string","total_amount":"number","vendor_name":"string"}' \
-o json
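Inline schemas are awkward to quote once they grow. One option is to keep the JSON in a file and build the `name:{...}` argument from it. The following is a minimal sketch, in which `schema_arg` and the file name `invoice.schema.json` are hypothetical. It joins the file's lines into one, assuming the CLI treats the value as ordinary JSON where line breaks are not significant:

```shell
# Hypothetical helper: prints "<name>:<json>" for use with --metadata-schema.
# $1 is the schema name, $2 is a file holding the JSON schema.
# Line breaks are removed so the argument stays on a single line.
schema_arg() {
  printf '%s:%s\n' "$1" "$(tr -d '\r\n' < "$2")"
}
```

Usage:
vectorize-iris invoice.pdf --metadata-schema "$(schema_arg invoice invoice.schema.json)" -o json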
Document Classification
Classify documents using multiple schemas:
vectorize-iris document.pdf \
--metadata-schema 'invoice:{"invoice_number":"string",...}' \
--metadata-schema 'receipt:{...}' \
--metadata-schema 'contract:{...}' \
--metadata-schema 'cv:{...}' \
-o json
The tool matches your document against the schemas and extracts the relevant fields.
Infer Metadata Schema
Let Iris automatically detect and extract metadata:
vectorize-iris document.pdf --infer-metadata-schema -o json
Performance Tuning
For large documents, raise the timeout and adjust the polling interval:
vectorize-iris large-document.pdf \
--timeout 600 \
--poll-interval 5
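If a suitable timeout is hard to guess in advance, a wrapper can retry once with a longer one. The following is a minimal sketch, assuming the CLI exits with a non-zero status when the timeout is reached or extraction fails. `iris_extract_retry` is a hypothetical helper name, and doubling the timeout is an arbitrary choice for illustration:

```shell
# Sketch: try once with the given timeout, then once more with double that value.
# Assumption: vectorize-iris exits non-zero on a timeout or failed extraction.
iris_extract_retry() {
  file=$1
  timeout=${2:-600}                         # same default as the example above
  vectorize-iris "$file" --timeout "$timeout" --poll-interval 5 && return 0
  echo "retrying $file with --timeout $((timeout * 2))" >&2
  vectorize-iris "$file" --timeout "$((timeout * 2))" --poll-interval 5
}
```

Usage:
iris_extract_retry large-document.pdf 600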
Combined Options
Combine multiple options for complex workflows:
vectorize-iris document.pdf \
--chunk-size 256 \
--infer-metadata-schema \
--parsing-instructions "Focus on extracting structured data" \
-o yaml -f output.yaml
Common Use Cases
Quick testing:
vectorize-iris sample.pdf
Production batch processing:
vectorize-iris ./invoices --metadata-schema 'invoice:{...}' -o json -f ./processed
CI/CD integration:
vectorize-iris ./docs --chunk-size 512 -o json -f ./extracted
Research paper extraction:
vectorize-iris https://arxiv.org/pdf/2206.01062 \
--parsing-instructions "Extract title, abstract, methodology, and conclusions" \
-o json -f paper-analysis.json
Next Steps
- Try the Python or Node.js SDKs for programmatic access
- Learn more about Vectorize Iris
- Test extraction in the Extraction Tester
- Explore the vectorize-iris GitHub repository