Top 23 Python Data Projects
-
30-Days-Of-Python
The 30 Days of Python programming challenge is a step-by-step guide to learn the Python programming language in 30 days. This challenge may take more than 100 days. Follow your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw
A free, beginner‑friendly programming challenge on GitHub, 30 Days of Python breaks down the process of learning Python into daily lessons and exercises that guide you step by step from the basics to more advanced concepts.
-
Scrapling
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
Project mention: Launch HN: Intuned (YC S22) – Build and run reliable browser automations as code | news.ycombinator.com | 2026-06-08
What is the advantage of your product over having Codex generate a script using something like https://github.com/D4Vinci/Scrapling?
-
LLM / agentic frameworks: LangChain, LlamaIndex, LangGraph, AutoGen, MCP, RAG. (Fiddler and Razorpay both list these. "Hands-on counts, not just awareness," as Razorpay puts it.)
-
pandas-ai
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
Pandas-AI: Talk to your dataframes in natural language.
-
Project mention: volnux VS Prefect - a user suggested alternative | libhunt.com/r/volnux | 2025-11-19
-
airbyte
Open-source data movement for ELT pipelines and AI agents — from APIs, databases & files to warehouses, lakes, and AI applications. Both self-hosted and Cloud.
Project mention: Show HN: Airbyte Agents – context for agents across multiple data sources | news.ycombinator.com | 2026-05-05
-
Mage
🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
That’s where Mage AI stood out. From the very first attempt to run it, it feels really easy and straightforward.
-
knowledge-repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
-
CKAN
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
Project mention: CKAN – an open-source DMS (data management system) | news.ycombinator.com | 2026-03-02
-
Project mention: Show HN: DataFlow, Turn raw data into high-quality LLM training datasets | news.ycombinator.com | 2026-03-16
-
preswald
Preswald is a WASM packager for Python-based interactive data apps: bundle full, complex data workflows, particularly visualizations, into single files that run completely in-browser using Pyodide, DuckDB, Pandas, Plotly, Matplotlib, etc. Build dashboards, reports, and notebooks that run offline, load fast, and share like a document.
-
Project mention: LazyLLM, Easiest and laziest way for building multi-agent LLMs applications | news.ycombinator.com | 2025-11-05
-
PyPika
PyPika is a Python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.
-
If you are interested in this topic, we have a fully featured colour science Python package that can of course render the visible spectrum: https://github.com/colour-science/colour?tab=readme-ov-file#...
Python Data related posts
-
Automatically Validate Python Packages
-
CocoIndex Review: Incremental RAG Engine for AI Agents
-
Pydantic vs msgspec vs validatedata: Why Your Validation Library Slows Down on Bad Data
-
Anthropic's 10 Finance Agents: A Buyer's Guide for Banks
-
Show HN: Airbyte Agents – context for agents across multiple data sources
-
Deep Dive into LlamaIndex's RAG Pipeline and Pinecone Vector Database Integration
-
Implementing a Centralized Metrics Layer with dbt and a Semantic Layer
Index
What are some of the best open-source Data projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | 30-Days-Of-Python | 64,710 |
| 2 | Scrapling | 64,713 |
| 3 | llama_index | 50,089 |
| 4 | pandas-ai | 23,587 |
| 5 | Prefect | 22,636 |
| 6 | airbyte | 21,487 |
| 7 | akshare | 20,305 |
| 8 | chinese-xinhua | 11,504 |
| 9 | Mage | 8,749 |
| 10 | knowledge-repo | 5,533 |
| 11 | dlt | 5,460 |
| 12 | superduper | 5,289 |
| 13 | CKAN | 5,053 |
| 14 | Mimesis | 4,813 |
| 15 | DataFlow | 5,047 |
| 16 | datasets | 4,570 |
| 17 | preswald | 4,284 |
| 18 | LazyLLM | 3,842 |
| 19 | docetl | 3,836 |
| 20 | TextRecognitionDataGenerator | 3,660 |
| 21 | pandas-datareader | 3,181 |
| 22 | PyPika | 2,917 |
| 23 | Colour | 2,600 |