Python Specific Formats Processing OCR packages

OCR packages

Showing projects tagged as Specific Formats Processing and OCR

PyMuPDF

8.6 9.7 Python

PyMuPDF is a high-performance Python library for data extraction, analysis, conversion, and manipulation of PDF (and other) documents.

Kreuzberg

8.4 10.0 Rust

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or usable via CLI, REST API, or MCP server.

pdftabextract

6.4 0.0 L3 Python

A set of tools for extracting tables from PDF files, intended to support data mining on (OCR-processed) scanned documents.

* Code quality rankings and insights are calculated and provided by Lumnify. They range from L1 to L5, with L5 being the highest.

Awesome Python is part of the LibHunt network.
