
Python Data Validation

Open-source Python projects categorized as Data Validation

Top 23 Python Data Validation Projects

Data Validation
  1. cleanlab

    Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  3. jsonschema

    An implementation of the JSON Schema specification for Python

    Project mention: Framework de Tests Automatisés API avec Pytest: Tutoriel Pratique | dev.to | 2026-05-22
  4. pandera

    A light-weight, flexible, and expressive statistical data testing library

  5. deepchecks

    Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.

  6. Cerberus

    Lightweight, extensible data validation library for Python (by pyeve)

  7. schema

    Schema validation just got Pythonic

  8. Schematics

    Python Data Structures for Humans™.

  9. soda-core

    Data Contracts engine for the modern data stack. https://www.soda.io

    Project mention: Show HN: Data contracts engine for the modern data stack | news.ycombinator.com | 2026-01-28
  10. voluptuous

    CONTRIBUTIONS ONLY: Voluptuous, despite the name, is a Python data validation library.

    Project mention: Programmers and software developers lost the plot on naming their tools | news.ycombinator.com | 2025-12-11

    When I told a co-worker about https://pypi.org/project/voluptuous/ he immediately searched for the name alone, then told us not to do the same.

  11. cleanvision

    Automatically find issues in image datasets and practice data-centric computer vision.

  12. dingo

    Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool (by MigoXLab)

    Project mention: Show HN: Dingo 1.9.0 released: With enhanced hallucination detection | news.ycombinator.com | 2025-07-31
  13. colander

    A serialization/deserialization/validation library for strings, mappings and lists.

  14. pointblank

    Data validation toolkit for assessing and monitoring data quality. (by posit-dev)

    Project mention: Pointblank: Data Validation That's Actually Beautiful | dev.to | 2026-03-18

    GitHub: github.com/posit-dev/pointblank (star the repo, file issues, contribute!)

  15. opendataeditor

    The Open Data Editor (ODE) is a no-code application to explore and validate tabular data in a simple way. Forever free and open source project powered by the Frictionless Framework.

  16. valideer

    Lightweight data validation and adaptation Python library.

  17. Validoopsie

    A simple and easy to use Data Validation library for Python.

  18. python-codicefiscale

    Italian fiscal code (Codice Fiscale) encoding, decoding and validation.

  19. snowflake-provisioning

    Snowflake Database, Schema, and Warehouse provisioning with Access Roles & Generating and Provisioning of Functional Roles & Snowflake Source Export, Snowflake cloning, and data tieout tool

  20. laravel-validation

    PHP Laravel-like validation for the Python language

  21. OpenDQV

    Open-source, contract-driven data quality validation. Shift-left enforcement at the point of write — before data enters your pipeline.

    Project mention: OpenDQV – open-source data quality validation at the point of write | news.ycombinator.com | 2026-03-20
  22. data_check

    data and pipeline testing with and for SQL

  23. validatelite

    ValidateLite: A lightweight CLI for database schema validation and data quality checks. Ideal for CI/CD, ETL, and data pipelines.

    Project mention: DevLog #1 - ValidateLite: Building a Zero-Config Data Validation Tool | dev.to | 2025-08-09

    This data validation tool is built on a simple principle: "Cross-cloud ready, code-first, operational in 30 seconds." And it is open source: ValidateLite on GitHub.

  24. SmartExcelGuardian

    SmartExcelGuardian is a professional Python desktop application for Excel data cleanup, validation, and auditing. It automatically detects missing values, duplicates, type issues, and invalid formulas, applies heuristic scoring, conditional formatting, and auto-calculated Excel formulas, and export

    Project mention: SmartExcelGuardian: Open-source Excel data cleaning with heuristics and formulas | news.ycombinator.com | 2026-01-19
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Data Validation related posts

  • Framework de Tests Automatisés API avec Pytest: Tutoriel Pratique

    1 project | dev.to | 22 May 2026
  • Detect, Defend, Prevail: Payments Fraud Detection using ML & Deepchecks

    1 project | dev.to | 13 Jan 2024
  • Deepchecks: Open-source ML testing and validation library

    1 project | news.ycombinator.com | 11 Sep 2023
  • Deepchecks' New Open Source is on Product Hunt, and Needs Your Help

    3 projects | /r/deeplearning | 18 Jun 2023
  • Do you think we need an open-source web scraping monitoring tool?

    2 projects | /r/webscraping | 6 May 2023
  • [D] Is accurately estimating image quality even possible?

    3 projects | /r/MachineLearning | 22 Apr 2023
  • Python: Data validation

    5 projects | dev.to | 20 Jan 2023

Index

What are some of the best open-source Data Validation projects in Python? This list will help you:

#   Project                  Stars
1   cleanlab                11,515
2   jsonschema               4,952
3   pandera                  4,381
4   deepchecks               4,025
5   Cerberus                 3,285
6   schema                   2,945
7   Schematics               2,589
8   soda-core                2,374
9   voluptuous               1,845
10  cleanvision              1,188
11  dingo                      718
12  colander                   463
13  pointblank                 445
14  opendataeditor             306
15  valideer                   261
16  Validoopsie                 88
17  python-codicefiscale        87
18  snowflake-provisioning      50
19  laravel-validation          15
20  OpenDQV                     10
21  data_check                   5
22  validatelite                 3
23  SmartExcelGuardian           3
