Python Specific Formats Processing packages

Showing projects tagged as Specific Formats Processing

PyPDF2

8.8 9.5 L2 Python

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.
WeasyPrint

8.6 9.7 L1 Python

The awesome document factory
PyMuPDF

8.6 9.7 Python

PyMuPDF is a high-performance Python library for data extraction, analysis, conversion, and manipulation of PDF (and other) documents.
Kreuzberg

8.4 10.0 Rust

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or usable via CLI, REST API, or MCP server.
PDFMiner

8.3 0.0 L3 Python

DISCONTINUED. Python PDF Parser (Not actively maintained). Check out pdfminer.six.
python-docx

8.2 8.1 L5 Python

Create and modify Word documents with Python
csvkit

8.1 7.5 L3 Python

A suite of utilities for converting to and working with CSV, the king of tabular file formats.
Python-Markdown

7.8 7.5 Python

A Python implementation of John Gruber’s Markdown with Extension support.
tablib

7.8 4.4 L4 Python

Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
XlsxWriter

7.6 7.7 L3 Python

A Python module for creating Excel XLSX files.
Kaitai Struct

7.5 7.5 Shell

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby / Rust
python-pptx

7.3 6.9 Python

Create Open XML PowerPoint documents in Python
Camelot

7.3 7.6 Python

A Python library to extract tabular data from PDFs
xlwings

7.2 8.0 L4 Python

xlwings is a Python library that makes it easy to call Python from Excel and vice versa. It works with Excel on Windows and macOS as well as with Google Sheets and Excel on the web.
borb

6.8 8.9 Python

borb is a library for reading, creating, and manipulating PDF files in Python.
markdown2

6.8 8.3 Python

markdown2: A fast and complete implementation of Markdown in Python
unoconv

6.7 0.0 Python

DISCONTINUED. Universal Office Converter - Convert between any document format supported by LibreOffice/OpenOffice.
Mistune

6.7 7.2 L4 Python

A fast yet powerful Python Markdown parser with renderers and plugins.
docxtpl

6.6 6.8 Python

Use a docx as a jinja2 template
pdftabextract

6.4 0.0 L3 Python

A set of tools for extracting tables from PDF files, intended to support data mining on (OCR-processed) scanned documents.
xlwt

5.4 0.0 L3 Python

DISCONTINUED. Writing and reading data and formatting information from Excel files.
pyexcel

5.1 8.7 L5 Python

Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files
pymorphy2

5.0 0.0 Python

Morphological analyzer / inflection engine for Russian and Ukrainian languages.
mistletoe

4.7 6.8 Python

A fast, extensible and spec-compliant Markdown parser in pure Python.
Construct

4.7 2.7 Python

Construct: Declarative data structures for Python that allow symmetric parsing and building
plutoprint

4.5 9.2 Python

A Python Library for Generating PDFs and Images from HTML, powered by PlutoBook
openpyxl

4.4 -

A library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
unp

3.5 0.0 L5 Python

Unpacks things.
ReportLab

3.4 -

Allows rapid creation of rich PDF documents.
Meltano Singer SDK

2.8 9.8 Python

Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com