
    nvidia-ocr

    High-precision OCR for images, tables, and handwriting using NVIDIA NeMo Retriever.

    Updated Apr 2026
    Security scanned
    One-time purchase

    $12

    One-time purchase · Own forever

    ⚡ Also available via Agensi Pro: your AI agent can load this skill on demand via MCP.

    Included in download

    • Extract tabular data from screenshots or PDFs into structured text.
    • Digitize handwritten notes and save them as searchable markdown.
    • Terminal, network, and environment-variable automation included
    • Includes example output and usage patterns
    • Instant install

    See it in action

    [99.2%] INVOICE #1024
    [98.5%] Date: 2023-11-15
    [95.1%] Total: $1,250.00
    [88.4%] Item: NVIDIA H100 GPU (Qty: 1)
    Full text saved to: ~/.claude-ocr/ocr_1700000000.txt
    Total text blocks: 4
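
The sample output above can be post-processed with a few lines of Python. The following is a minimal sketch, assuming each recognized block is printed as `[confidence%] text` exactly as in the sample; the names `TextBlock`, `parse_blocks`, and `low_confidence` are hypothetical helpers for illustration and are not part of the skill.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class TextBlock(NamedTuple):
    confidence: float  # percentage between 0 and 100
    text: str


# Matches lines such as "[99.2%] INVOICE #1024" (format assumed from the sample).
_BLOCK_RE = re.compile(r"^\s*\[(\d+(?:\.\d+)?)%\]\s*(.*)$")

SAMPLE = """\
[99.2%] INVOICE #1024
[98.5%] Date: 2023-11-15
[95.1%] Total: $1,250.00
[88.4%] Item: NVIDIA H100 GPU (Qty: 1)
Full text saved to: ~/.claude-ocr/ocr_1700000000.txt
Total text blocks: 4
"""


def parse_blocks(output: str) -> list[TextBlock]:
    """Collect every "[NN.N%] text" line; other lines are ignored."""
    blocks = []
    for line in output.splitlines():
        match = _BLOCK_RE.match(line)
        if match:
            blocks.append(TextBlock(float(match.group(1)), match.group(2).strip()))
    return blocks


def low_confidence(blocks: list[TextBlock], threshold: float = 90.0) -> list[TextBlock]:
    """Return the blocks whose confidence falls below the threshold."""
    return [block for block in blocks if block.confidence < threshold]


if __name__ == "__main__":
    for block in low_confidence(parse_blocks(SAMPLE)):
        print(f"Review manually: {block.text} ({block.confidence}%)")
```

Flagging blocks below a threshold for manual review is one way to use the confidence scores; the 90% cutoff here is an arbitrary choice.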

    About This Skill

    What it does

    This skill provides high-performance Optical Character Recognition (OCR) by leveraging the NVIDIA NeMo Retriever API. It allows your AI agent to "see" and extract text from images and documents with professional-grade accuracy. It handles complex structures like tables, charts, receipts, and even handwriting, returning structured text along with confidence scores and bounding box data.

    Why use this skill

    Standard LLM vision capabilities can hallucinate text or struggle with small, dense content such as tables and low-quality screenshots. This skill uses a specialized OCR model optimized for precision. It supports batch processing of entire directories, reports a confidence score for each text block so that unreliable results can be identified, and automatically saves output to files for further analysis. For data extraction tasks, it is designed to be faster and more accurate than generic vision prompting.
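
As an illustration of the batch-processing and file-saving behavior described above, here is a rough sketch in Python. It is not the skill's actual code: the helper names `find_images` and `save_text`, the set of image extensions, and the `ocr_<unix timestamp>` file naming (inferred from the sample output path) are all assumptions.

```python
from __future__ import annotations

import time
from pathlib import Path

# Assumed set of image extensions; the skill's real list may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def find_images(directory: Path) -> list[Path]:
    """Return the image files directly inside `directory`, sorted by path."""
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def save_text(text: str, out_dir: Path, suffix: str = ".txt") -> Path:
    """Write `text` to out_dir/ocr_<unix timestamp><suffix> and return the path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"ocr_{int(time.time())}{suffix}"
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

A caller would loop over `find_images(...)`, run OCR on each file, and pass the combined text to `save_text`, using `suffix=".md"` for markdown output.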

    Supported tools

    • NVIDIA NeMo Retriever: State-of-the-art OCR foundation model.
    • Python Integration: Built-in handling for Base64 encoding and batch file processing.
    • Exporting: Saves results locally in .txt or .md formats for easy developer access.
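
The Base64 handling mentioned in the list above might look something like the following sketch. It only shows encoding an image file as a Base64 data URL using the Python standard library; whether the API expects a data URL or a bare Base64 string, and the shape of the request body, are not specified here, so `to_data_url` is a hypothetical helper rather than the skill's implementation.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(image_path: Path) -> str:
    """Encode an image file as a Base64 data URL (assumed transport format)."""
    mime, _ = mimetypes.guess_type(image_path.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path.name}")
    encoded = base64.b64encode(image_path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Rejecting non-image files before encoding keeps a batch run from sending unsupported input over the network.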

    Use Cases

    • Extract tabular data from screenshots or PDFs into structured text.
    • Digitize handwritten notes and save them as searchable markdown.
    • Batch process a folder of images to extract and aggregate text data.
    • Verify automated test results by extracting text from UI screenshots.


    Security Scanned

    Passed automated security review

    Permissions

    Terminal / Shell
    Network Access
    Environment Variables

    Allowed Hosts

    build.nvidia.com
    ai.api.nvidia.com
