PDF packages

Showing projects tagged as Specific Formats Processing and PDF

  • PyPDF2

    8.8 9.5 L2 Python
    A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
  • WeasyPrint

    8.6 9.7 L1 Python
    The awesome document factory
  • PyMuPDF

    8.6 9.7 Python
    PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
  • Kreuzberg

    8.4 10.0 Rust
    A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via CLI, REST API, or MCP server.
  • PDFMiner

    8.3 0.0 L3 Python
    DISCONTINUED. Python PDF parser (not actively maintained). Check out pdfminer.six instead.
  • Camelot

    7.3 7.6 Python
    A Python library to extract tabular data from PDFs
  • borb

    6.8 8.9 Python
    borb is a library for reading, creating and manipulating PDF files in Python.
  • pdftabextract

    6.4 0.0 L3 Python
    A set of tools for extracting tables from PDF files, helping with data mining on (OCR-processed) scanned documents.
  • plutoprint

    4.5 9.2 Python
    A Python Library for Generating PDFs and Images from HTML, powered by PlutoBook
  • ReportLab

    3.4 -
    Allows rapid creation of rich PDF documents.
  • Meltano Singer SDK

    2.8 9.8 Python
    Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com