Python Parser

Open-source Python projects categorized as Parser

Top 23 Python Parser Projects

  1. MinerU

    Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

    Project mention: AI Killed One Open Source Business Model; Not Open Source | dev.to | 2026-01-13

    Big growth in projects with commercial intent choosing AGPL-3.0 (Firecrawl, MinerU, Daytona)

  3. pydantic

    Data validation using Python type hints

    Project mention: pydantic VS ctxure - a user suggested alternative | libhunt.com/r/pydantic | 2026-06-05
  4. sqlglot

    Python SQL Parser and Transpiler

    Project mention: A 2.5x faster Postgres parser with Claude Code | news.ycombinator.com | 2026-02-06

    I'm curious because I have a similar use case for a querying frontend. Did you consider using https://github.com/tobymao/sqlglot? If so, what was missing to justify writing your own parser?

  5. MegaParse

    File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.

    Project mention: 📰 All Data and AI Weekly #231-02March2026 | dev.to | 2026-03-02

    MegaParse: The ultimate parser for LLMs.

  6. pdfminer.six

    Community maintained fork of pdfminer - we fathom PDF

    Project mention: Extract Text from PDFs with PDFMiner in Python | dev.to | 2025-12-29

    PDFMiner.six Official Documentation

  7. Lark

    Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

    Project mention: Comp Language Syntax | dev.to | 2026-03-30

    I've been using the Lark library to handle parsing. I'd never experimented with it before, but now have a good deal of experience stretching its LALR rules. This gives an ideal O(n) performance. I have been surprised at the cost of building the grammar at runtime. Fortunately the library is quite prepared for this and comes with some high-level caching options.

  8. json_repair

    Repair malformed JSON from LLMs, APIs, logs, and user input in Python.

  9. sqlparse

    A non-validating SQL parser module for Python

  10. snoop

    Snoop: an intelligence-gathering tool based on open data (OSINT world)

    Project mention: Snoop Project Update (search for usernames on 5k websites) | news.ycombinator.com | 2026-01-01
  11. phonenumbers

    Python port of Google's libphonenumber

  12. oletools

    oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.

  13. rdflib

    RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.

  14. m3u8

    Python m3u8 Parser for HTTP Live Streaming (HLS) Transmissions

  15. typeguard

    Run-time type checker for Python

    Project mention: Steel Bank Common Lisp | news.ycombinator.com | 2026-02-24

    >For example, it cannot specialize an element type for lists.

    Yes, but that would be a CL violation (or an extension to provide via something else than DEFTYPE), since DEFTYPE's body can't be infinitely recursive; cf https://www.lispworks.com/documentation/HyperSpec/Body/m_def...

    >if you attempt to (declaim) every function, you will immediately see how vague and insufficient the types come out compared to even Python.

    Indeed, but it is 1) used by the compiler itself while CPython currently ignores annotations and 2) runtime and buildtime typing use the same semantics and syntax, so you don't need band-aids like https://github.com/agronholm/typeguard

    But yeah, CL's type system is lacking in many places. In order of practical advantages and difficulty to add (maybe): recursive DEFTYPE, typed HASH-TABLEs (I mean the keys and values), static typing of CLOS slots (invasive, like https://github.com/marcoheisig/fast-generic-functions), ..., parametric typing beyond ARRAYs.

  16. strictyaml

    Type-safe YAML parser and validator.

    Project mention: YAML? That's Norway Problem | news.ycombinator.com | 2026-05-22
  17. godot-gdscript-toolkit

    Independent set of GDScript tools - parser, linter, formatter, and more

  18. python-user-agents

    A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

  19. cinemagoer

    Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database (with which we are not affiliated in any way) about movies, people, characters and companies

  20. wiktextract

    Wiktionary dump file parser and multilingual data extractor

    Project mention: Compressing Icelandic name declension patterns into a 3.27 kB trie | news.ycombinator.com | 2025-08-02

    You might be able to build something similar yourself using declension data extracted from Wiktionary using wiktextract: https://github.com/tatuylonen/wiktextract#pre-extracted-data

  21. ViperMonkey

    A VBA parser and emulation engine to analyze malicious macros.

  22. pdfsyntax

    A Python library to inspect and modify the internal structure of a PDF file

  23. Construct

    Construct: Declarative data structures for python that allow symmetric parsing and building

  24. guessit

    GuessIt is a python library that extracts as much information as possible from a video filename.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Parser related posts

  • YAML? That's Norway Problem

    2 projects | news.ycombinator.com | 22 May 2026
  • Zero: The Programming Language for Agents

    2 projects | news.ycombinator.com | 16 May 2026
  • Architecture Teardown: How Meta Trains LLMs for Code Generation on 100k GPU Clusters

    4 projects | dev.to | 29 Apr 2026
  • Comp Language Syntax

    1 project | dev.to | 30 Mar 2026
  • Fortifying LLM Applications: Robust Guardrails for AI Outputs in Python

    1 project | dev.to | 18 Mar 2026
  • A 2.5x faster Postgres parser with Claude Code

    2 projects | news.ycombinator.com | 6 Feb 2026
  • FastAPI from Zero: Writing Your First API Route

    1 project | dev.to | 11 Jan 2026

Index

What are some of the best open-source Parser projects in Python? This list will help you:

# Project Stars
1 MinerU 67,139
2 pydantic 28,019
3 sqlglot 9,318
4 MegaParse 7,366
5 pdfminer.six 6,988
6 Lark 5,901
7 json_repair 4,967
8 sqlparse 4,006
9 snoop 3,944
10 phonenumbers 3,747
11 oletools 3,351
12 rdflib 2,458
13 m3u8 2,263
14 typeguard 1,763
15 strictyaml 1,616
16 godot-gdscript-toolkit 1,554
17 python-user-agents 1,515
18 cinemagoer 1,317
19 wiktextract 1,177
20 ViperMonkey 1,118
21 pdfsyntax 1,012
22 Construct 1,000
23 guessit 911

Did you know that Python is the most popular programming language based on number of references?