Top 23 Python Parser Projects
-
MinerU
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
Big growth in projects with commercial intent choosing AGPL-3.0 (Firecrawl, MinerU, Daytona)
-
Project mention: pydantic VS ctxure - a user suggested alternative | libhunt.com/r/pydantic | 2026-06-05
-
I'm curious because I have a similar use case for a querying frontend. Did you consider using https://github.com/tobymao/sqlglot? If so, what was missing to justify writing your own parser?
-
MegaParse
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
MegaParse: The ultimate parser for LLMs.
-
PDFMiner.six Official Documentation
-
Lark
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
I've been using the Lark library to handle parsing. I'd never experimented with it before, but now have a good deal of experience stretching its LALR rules. This gives ideal O(n) performance. I have been surprised at the cost of building the grammar at runtime. Fortunately, the library is quite prepared for this and comes with some high-level caching options.
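The comment above describes paying a one-time cost to build a parser and then avoiding that cost through caching. The sketch below illustrates the build-once, reuse-many pattern with the standard library only; it is not Lark's API. The names `build_tokenizer`, `TOKEN_SPEC`, and `tokenize` are hypothetical, and a compiled regular expression stands in for a real grammar.

```python
import re
from functools import lru_cache

# Hypothetical token specification standing in for a grammar definition.
TOKEN_SPEC = (
    ("NUMBER", r"\d+"),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
)


@lru_cache(maxsize=None)
def build_tokenizer(spec: tuple) -> re.Pattern:
    """Compile the specification once; later calls with the same spec hit the cache."""
    return re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in spec))


def tokenize(text: str) -> list:
    """Split text into (token_name, lexeme) pairs, skipping whitespace."""
    tokenizer = build_tokenizer(TOKEN_SPEC)  # built on first use, reused afterwards
    return [
        (match.lastgroup, match.group())
        for match in tokenizer.finditer(text)
        if match.lastgroup != "SKIP"
    ]
```

Calling `tokenize` repeatedly reuses the same compiled object, which `build_tokenizer.cache_info()` can confirm. A real parser generator would more likely persist its cache across processes, which an in-memory `lru_cache` does not do.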
-
Project mention: Snoop Project Update (search for usernames on 5k websites) | news.ycombinator.com | 2026-01-01
-
oletools
oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.
-
rdflib
RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
-
>For example, it cannot specialize an element type for lists.
Yes, but that would be a CL violation (or an extension to provide via something other than DEFTYPE), since DEFTYPE's body can't be infinitely recursive; cf https://www.lispworks.com/documentation/HyperSpec/Body/m_def...
>if you attempt to (declaim) every function, you will immediately see how vague and insufficient the types come out compared to even Python.
Indeed, but it is 1) used by the compiler itself, while CPython currently ignores annotations, and 2) runtime and build-time typing use the same semantics and syntax, so you don't need band-aids like https://github.com/agronholm/typeguard
But yeah, CL's type system is lacking in many places. In order of practical advantages and difficulty to add (maybe): recursive DEFTYPE, typed HASH-TABLEs (I mean the keys and values), static typing of CLOS slots (invasive, like https://github.com/marcoheisig/fast-generic-functions), ..., parametric typing beyond ARRAYs.
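To make the point about CPython ignoring annotations concrete, here is a minimal sketch using only the standard library. It shows an annotated function accepting wrongly typed arguments, followed by a hand-rolled runtime check. The `check_types` decorator is hypothetical and handles only plain classes; it is not how typeguard is implemented.

```python
import inspect
from functools import wraps
from typing import get_type_hints


def add(a: int, b: int) -> int:
    return a + b


# CPython does not enforce annotations: this call succeeds and concatenates strings.
unchecked = add("1", "2")


def check_types(func):
    """Hypothetical decorator: verify plain-class annotations with isinstance()."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; generics such as list[int] are skipped.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


checked_add = check_types(add)
```

With this in place, `checked_add(1, 2)` returns normally, while `checked_add("1", 2)` raises `TypeError` at call time.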
-
python-user-agents
A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.
-
cinemagoer
Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database (with which we are not affiliated in any way) about movies, people, characters, and companies.
-
Project mention: Compressing Icelandic name declension patterns into a 3.27 kB trie | news.ycombinator.com | 2025-08-02
You might be able to build something similar yourself using declension data extracted from Wiktionary using wiktextract: https://github.com/tatuylonen/wiktextract#pre-extracted-data
-
Construct
Construct: Declarative data structures for Python that allow symmetric parsing and building
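"Symmetric parsing and building" means that a single layout declaration drives both directions: bytes to values and values to bytes. The sketch below illustrates the idea with the standard library's `struct` module rather than Construct's own API. The `Header` layout and the `build` and `parse` helpers are hypothetical examples.

```python
import struct
from typing import NamedTuple


class Header(NamedTuple):
    magic: bytes
    version: int
    length: int


# One declaration describes the layout for both directions:
# 4-byte magic, unsigned 8-bit version, big-endian unsigned 16-bit length.
HEADER_FORMAT = struct.Struct(">4sBH")


def build(header: Header) -> bytes:
    """Serialize a Header into its 7-byte binary form."""
    return HEADER_FORMAT.pack(*header)


def parse(data: bytes) -> Header:
    """Deserialize 7 bytes back into a Header."""
    return Header(*HEADER_FORMAT.unpack(data))
```

Because both helpers share `HEADER_FORMAT`, `parse(build(h))` returns the original header, which is the round-trip property a declarative library generalizes to nested and conditional structures.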
-
guessit
GuessIt is a Python library that extracts as much information as possible from a video filename.
Python Parser related posts
-
YAML? That's Norway Problem
-
Zero: The Programming Language for Agents
-
Architecture Teardown: How Meta Trains LLMs for Code Generation on 100k GPU Clusters
-
Comp Language Syntax
-
Fortifying LLM Applications: Robust Guardrails for AI Outputs in Python
-
A 2.5x faster Postgres parser with Claude Code
-
FastAPI from Zero: Writing Your First API Route
-
Index
What are some of the best open-source Parser projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | MinerU | 67,139 |
| 2 | pydantic | 28,019 |
| 3 | sqlglot | 9,318 |
| 4 | MegaParse | 7,366 |
| 5 | pdfminer.six | 6,988 |
| 6 | Lark | 5,907 |
| 7 | json_repair | 4,967 |
| 8 | sqlparse | 4,006 |
| 9 | snoop | 3,944 |
| 10 | phonenumbers | 3,748 |
| 11 | oletools | 3,351 |
| 12 | rdflib | 2,458 |
| 13 | m3u8 | 2,263 |
| 14 | typeguard | 1,763 |
| 15 | strictyaml | 1,616 |
| 16 | godot-gdscript-toolkit | 1,554 |
| 17 | python-user-agents | 1,515 |
| 18 | cinemagoer | 1,317 |
| 19 | wiktextract | 1,177 |
| 20 | ViperMonkey | 1,118 |
| 21 | pdfsyntax | 1,012 |
| 22 | Construct | 1,000 |
| 23 | guessit | 911 |