Specific Formats Processing packages

  • PyPDF2

    8.8 9.5 L2 Python
    A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
  • WeasyPrint

    8.6 9.7 L1 Python
    The awesome document factory
  • PyMuPDF

    8.6 9.7 Python
    PyMuPDF is a high-performance Python library for data extraction, analysis, conversion, and manipulation of PDF (and other) documents.
  • Kreuzberg

    8.4 10.0 Rust
    A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via CLI, REST API, or MCP server.
  • PDFMiner

    8.3 0.0 L3 Python
    DISCONTINUED. Python PDF Parser (Not actively maintained). Check out pdfminer.six.
  • python-docx

    8.2 8.1 L5 Python
    Create and modify Word documents with Python
  • csvkit

    8.1 7.5 L3 Python
    A suite of utilities for converting to and working with CSV, the king of tabular file formats.
  • Python-Markdown

    7.8 7.5 Python
    A Python implementation of John Gruber’s Markdown with extension support.
  • tablib

    7.8 4.4 L4 Python
    Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
  • XlsxWriter

    7.6 7.7 L3 Python
    A Python module for creating Excel XLSX files.
  • Kaitai Struct

    7.5 7.5 Shell
    Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby / Rust
  • python-pptx

    7.3 6.9 Python
    Create Open XML PowerPoint documents in Python
  • Camelot

    7.3 7.6 Python
    A Python library to extract tabular data from PDFs
  • xlwings

    7.2 8.0 L4 Python
    xlwings is a Python library that makes it easy to call Python from Excel and vice versa. It works with Excel on Windows and macOS as well as with Google Sheets and Excel on the web.
  • borb

    6.8 8.9 Python
    borb is a library for reading, creating, and manipulating PDF files in Python.
  • markdown2

    6.8 8.3 Python
    markdown2: A fast and complete implementation of Markdown in Python
  • unoconv

    6.7 0.0 Python
    DISCONTINUED. Universal Office Converter - Convert between any document format supported by LibreOffice/OpenOffice.
  • Mistune

    6.7 7.2 L4 Python
    A fast yet powerful Python Markdown parser with renderers and plugins.
  • docxtpl

    6.6 6.8 Python
    Use a docx as a jinja2 template
  • pdftabextract

    6.4 0.0 L3 Python
    A set of tools for extracting tables from PDF files, intended to help with data mining on (OCR-processed) scanned documents.
  • xlwt

    5.4 0.0 L3 Python
    DISCONTINUED. Writing and reading data and formatting information from Excel files.
  • pyexcel

    5.1 8.7 L5 Python
    Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files
  • pymorphy2

    5.0 0.0 Python
    Morphological analyzer / inflection engine for Russian and Ukrainian languages.
  • mistletoe

    4.7 6.8 Python
    A fast, extensible and spec-compliant Markdown parser in pure Python.
  • Construct

    4.7 2.7 Python
    Construct: Declarative data structures for Python that allow symmetric parsing and building
  • plutoprint

    4.5 9.2 Python
    A Python Library for Generating PDFs and Images from HTML, powered by PlutoBook
  • openpyxl

    4.4 -
    A library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
  • unp

    3.5 0.0 L5 Python
    Unpacks things.
  • ReportLab

    3.4 -
    Allows rapid creation of rich PDF documents.
  • Meltano Singer SDK

    2.8 9.8 Python
    Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com