Vectorize Iris CLI

The Vectorize Iris CLI extracts text from documents without requiring you to write any code. It is well suited to testing, development, batch processing, and CI/CD pipelines.

Installation

curl -fsSL https://raw.githubusercontent.com/vectorize-io/vectorize-iris/refs/heads/main/install.sh | sh

Setup

Connect the CLI to your Vectorize account:

vectorize-iris configure

What happens:

  1. The CLI opens your browser to the Vectorize platform.
  2. You click "Authorize" to grant access.
  3. Credentials are saved automatically to ~/.vectorize-iris/credentials.
  4. The CLI is now ready to extract.
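
In scripts it can help to confirm that configuration has completed before attempting an extraction. The following is a minimal sketch that only checks for the credentials file at the path listed above. The function name iris_is_configured is hypothetical and not part of the CLI, and the check says nothing about whether the stored credentials are still valid.

```shell
# Hypothetical helper (not part of the CLI): succeeds if the credentials
# file written by `vectorize-iris configure` exists and is non-empty.
iris_is_configured() {
    [ -s "${HOME}/.vectorize-iris/credentials" ]
}

# Example use:
#   iris_is_configured || vectorize-iris configure
```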

Basic Usage

Extract text from a document:

vectorize-iris document.pdf

Extract from a URL:

vectorize-iris https://arxiv.org/pdf/2206.01062

Output Formats

Save as JSON:

vectorize-iris document.pdf -o json -f output.json

Save as plain text:

vectorize-iris document.pdf -o text -f output.txt

Save as YAML:

vectorize-iris document.pdf -o yaml -f output.yaml

Pipe to other tools:

vectorize-iris document.pdf -o json | jq -r '.text' > output.txt

Batch Processing

Process entire directories:

# Process all files in a directory
vectorize-iris ./documents -f ./output

# Process with JSON output
vectorize-iris ./documents -o json -f ./output

# Process a directory of scans with plain-text output
vectorize-iris ./scans -o text -f ./extracted
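
When per-file control is needed (for example, to log failures and keep going), the directory form above can be replaced by an explicit loop. This is a sketch that uses only the -o and -f options shown earlier. The function name extract_each, the IRIS_BIN override variable, the .pdf filter, and the assumption that the CLI exits non-zero on failure are all illustrative rather than documented behavior.

```shell
# Hypothetical wrapper: extract every PDF in a directory one file at a time.
# IRIS_BIN is an assumed override so the loop can be dry-run with a stub.
extract_each() {
    in_dir=$1
    out_dir=$2
    failed=0
    mkdir -p "$out_dir"
    for f in "$in_dir"/*.pdf; do
        [ -e "$f" ] || continue            # no matches: skip the literal pattern
        name=$(basename "$f" .pdf)
        # Assumes a non-zero exit status signals a failed extraction.
        if ! "${IRIS_BIN:-vectorize-iris}" "$f" -o json -f "$out_dir/$name.json"; then
            echo "failed: $f" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every file was processed without error.
    [ "$failed" -eq 0 ]
}

# Example use:
#   extract_each ./documents ./output
```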

Advanced Options

Semantic Chunking

Split documents into semantic chunks:

vectorize-iris long-document.pdf --chunk-size 512

Custom Parsing Instructions

Guide the extraction with custom instructions:

vectorize-iris report.pdf --parsing-instructions \
"Extract only tables and numerical data, ignore narrative text"

Metadata Extraction

Extract structured metadata with a schema:

vectorize-iris invoice.pdf \
--metadata-schema 'invoice:{"invoice_number":"string","date":"string","total_amount":"number","vendor_name":"string"}' \
-o json
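
Inline JSON is easy to misquote in a shell, so longer schemas may be easier to keep in a file. The sketch below builds the name:{...} value in the same form as the command above. The helper schema_arg and the file name invoice-schema.json are hypothetical, and it is an assumption that the CLI accepts the resulting single-line JSON exactly as it accepts the inline form.

```shell
# Hypothetical helper: build a --metadata-schema value of the form
# name:{...} from a schema name and a JSON file, dropping newlines so
# the JSON is passed as a single-line argument.
schema_arg() {
    printf '%s:%s' "$1" "$(tr -d '\n' < "$2")"
}

# Example use (invoice-schema.json is an assumed file name):
#   vectorize-iris invoice.pdf \
#     --metadata-schema "$(schema_arg invoice invoice-schema.json)" \
#     -o json
```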

Document Classification

Classify documents using multiple schemas:

vectorize-iris document.pdf \
--metadata-schema 'invoice:{"invoice_number":"string",...}' \
--metadata-schema 'receipt:{...}' \
--metadata-schema 'contract:{...}' \
--metadata-schema 'cv:{...}' \
-o json

The tool will match your document against the schemas and extract relevant fields.

Infer Metadata Schema

Let Iris automatically detect and extract metadata:

vectorize-iris document.pdf --infer-metadata-schema -o json

Performance Tuning

For large documents, raise the timeout and adjust the polling interval:

vectorize-iris large-document.pdf \
--timeout 600 \
--poll-interval 5

Combined Options

Combine multiple options for complex workflows:

vectorize-iris document.pdf \
--chunk-size 256 \
--infer-metadata-schema \
--parsing-instructions "Focus on extracting structured data" \
-o yaml -f output.yaml

Common Use Cases

Quick testing:

vectorize-iris sample.pdf

Production batch processing:

vectorize-iris ./invoices --metadata-schema 'invoice:{...}' -o json -f ./processed

CI/CD integration:

vectorize-iris ./docs --chunk-size 512 -o json -f ./extracted
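
In a pipeline, a follow-up check can catch a run that produced no usable output. The sketch below is generic shell, not a feature of the CLI. The function name check_outputs is hypothetical, and it assumes the extraction step writes one or more regular files directly into the output directory.

```shell
# Hypothetical CI check: fail if the output directory is missing, has no
# regular files, or contains an empty file.
check_outputs() {
    out_dir=$1
    count=0
    [ -d "$out_dir" ] || { echo "missing: $out_dir" >&2; return 1; }
    for f in "$out_dir"/*; do
        [ -f "$f" ] || continue
        [ -s "$f" ] || { echo "empty: $f" >&2; return 1; }
        count=$((count + 1))
    done
    [ "$count" -gt 0 ]
}

# Example use after the extraction step:
#   vectorize-iris ./docs --chunk-size 512 -o json -f ./extracted
#   check_outputs ./extracted
```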

Research paper extraction:

vectorize-iris https://arxiv.org/pdf/2206.01062 \
--parsing-instructions "Extract title, abstract, methodology, and conclusions" \
-o json -f paper-analysis.json
