
Python web-scraping-python

Open-source Python projects categorized as web-scraping-python

Top 13 Python web-scraping-python Projects

web-scraping-python
  1. Scrapling

    🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

    Project mention: Launch HN: Intuned (YC S22) – Build and run reliable browser automations as code | news.ycombinator.com | 2026-06-08

    What is the advantage of your product over having Codex generate a script using something like https://github.com/D4Vinci/Scrapling?

  3. Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

    Project mention: Why everyone is talking about loop-engineering and how is it changing agentic ai workflows? Claude Code and Web Scraping examples | dev.to | 2026-06-10

    Think about what a mature scraping project already contains. There is a schema that every item must validate against. There are field coverage thresholds, because a run where only 60% of products have prices is a failed run no matter what the exit code says. There are expected item counts, error rate ceilings, and finish reason checks. In the Scrapy world we even have a dedicated framework for all of this, and I wrote about it earlier this year in my post on giving spidey-senses to your spiders with Spidermon.

    Here is the reframe that I cannot stop thinking about: a Spidermon monitor suite is a rubric. Our community spent a decade encoding "what good data looks like" into machine-checkable criteria, because silent failure is scraping's oldest enemy, the spider that runs green for three weeks while quietly shipping garbage. We built the evaluator long before we had a generator capable of acting on its feedback. Every other field adopting loop engineering has to invent its definition of done from scratch. We just have to plug ours in.

    The missing piece was never detection. It was what happens after detection, which until now was a human reading an alert, opening the site, sighing at the redesign, and rewriting selectors. Models like Fable 5, which Anthropic says can work autonomously far longer than any previous Claude model, are finally good enough to sit inside that gap. John Rooney saw early versions of this pattern when he built scraping agents for 30 days, and the lesson that stuck with me from his series is that agents fail not from lack of capability but from lack of structure around them. Loops are that structure.

  4. SeleniumBase

    📊 Python's all-in-one framework for web crawling, scraping, and testing. Supports pytest. CDP Mode provides stealth. Includes many tools.

    Project mention: Scraping German Rental Price Data – Part I: Whole Lotta Captchas | news.ycombinator.com | 2025-07-29

    Not yet! But it's on my list to try out next after giving SeleniumBase[1] a chance.

    [1] https://github.com/seleniumbase/SeleniumBase

  5. botasaurus

    The All in One Framework to Build Undefeatable Scrapers

  6. agentql

    AgentQL is a suite of tools for connecting your AI to the web. Featuring a query language and Playwright integrations for interacting with elements and extracting data quickly, precisely, and at scale. Includes REST API, Python and JavaScript SDKs, browser debugger.

  7. wreq-python

    An ergonomic Python HTTP Client with TLS fingerprint

    Project mention: Hybrid scraping: The architecture for the modern web | dev.to | 2026-02-25

    Python’s requests package, which uses urllib from the standard library, has a very distinctive TLS fingerprint, containing ciphers (amongst other things) that aren’t seen in a browser. This makes it very easy to spot. Both rnet, and other options such as curl-cffi, are able to send a TLS fingerprint similar to that of a browser. This reduces the chances of our request being blocked.

  8. scrapfly-scrapers

    Scalable Python web scraping scripts for +40 popular domains

    Project mention: 5 Best Free Web Scraping Tools in 2026 | dev.to | 2026-04-30

    Instead of trying to bypass blocks manually, you send a request to their API and let it deal with proxies, headers, browser rendering, and fingerprinting. Scrapfly solves this by managing the infrastructure for you.

  9. tls-requests

    TLS Requests is a powerful Python library for secure HTTP requests, offering browser-like TLS client, fingerprinting, anti-bot page bypass, and high performance.

  10. datacrawl

    A simple and easy to use web crawler for Python

  11. CobWeb-lnx

    CobWeb is a Python library for web scraping. The library consists of two classes: Spider and Scraper.

  12. compodio

    Putting the podcast in community radio

  13. gli99

    Web scraper for gifcities.org

  14. pygrounds

    Python web-scraping API for Newgrounds
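
The Scrapy project mention above lists the checks a mature scraping run is judged against: field coverage thresholds, expected item counts, error rate ceilings, and finish reason checks. The following is a minimal sketch of such a rubric in plain Python. It does not use Spidermon's API; `RunStats`, `field_coverage`, `evaluate_run`, and every threshold value are hypothetical names and numbers chosen for illustration. The only detail borrowed from Scrapy is the string "finished", which is the finish reason it reports for a normal close.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Summary of one crawl. Hypothetical structure, for illustration only."""
    items: list
    error_count: int
    request_count: int
    finish_reason: str


def field_coverage(items, field_name):
    """Return the fraction of items with a non-empty value for field_name."""
    if not items:
        return 0.0
    filled = sum(1 for item in items if item.get(field_name) not in (None, ""))
    return filled / len(items)


def evaluate_run(stats, *, required_fields, min_coverage=0.95,
                 min_items=100, max_error_rate=0.05):
    """Return a list of human-readable failures; an empty list means the run passed.

    All thresholds are assumed defaults, not values taken from any real project.
    """
    failures = []

    # Finish reason check: anything other than a normal close is suspicious.
    if stats.finish_reason != "finished":
        failures.append(f"unexpected finish reason: {stats.finish_reason!r}")

    # Expected item count.
    if len(stats.items) < min_items:
        failures.append(f"only {len(stats.items)} items, expected at least {min_items}")

    # Error rate ceiling (guard against division by zero).
    error_rate = stats.error_count / max(stats.request_count, 1)
    if error_rate > max_error_rate:
        failures.append(f"error rate {error_rate:.0%} exceeds {max_error_rate:.0%}")

    # Field coverage thresholds, one check per required field.
    for name in required_fields:
        coverage = field_coverage(stats.items, name)
        if coverage < min_coverage:
            failures.append(
                f"coverage for {name!r} is {coverage:.0%}, below {min_coverage:.0%}"
            )

    return failures


if __name__ == "__main__":
    # Ten products, only six of which have a price: the 60% case from the quote.
    items = [{"name": f"p{i}", "price": 9.99 if i < 6 else None} for i in range(10)]
    stats = RunStats(items=items, error_count=0, request_count=10,
                     finish_reason="finished")
    for failure in evaluate_run(stats, required_fields=["name", "price"], min_items=10):
        print("FAIL:", failure)
```

Returning a list of failures rather than a single boolean is a deliberate choice in this sketch: it gives whatever sits after detection, human or agent, a concrete description of what went wrong.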

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
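
The wreq-python project mention above notes that a client's TLS fingerprint includes the ciphers it offers. As a rough illustration, the sketch below uses only the standard library `ssl` module to list the cipher suites a Python `SSLContext` would offer. The helper name `offered_cipher_names` is hypothetical, and the default context is only a stand-in: an HTTP client may configure its own context, and real fingerprints also depend on other handshake fields such as extensions.

```python
import ssl


def offered_cipher_names(context=None):
    """Return the names of the cipher suites an SSLContext would offer.

    Hypothetical helper for illustration; uses the default context if none is given.
    """
    if context is None:
        context = ssl.create_default_context()
    # get_ciphers() returns a list of dicts, each with a "name" key.
    return [cipher["name"] for cipher in context.get_ciphers()]


if __name__ == "__main__":
    names = offered_cipher_names()
    print(len(names), "cipher suites offered by the default context")
    for name in names[:5]:
        print("  ", name)

    # Changing the cipher string changes what is offered, and therefore
    # one of the inputs to the fingerprint a server can observe.
    restricted = ssl.create_default_context()
    restricted.set_ciphers("ECDHE+AESGCM")
    print(len(offered_cipher_names(restricted)), "cipher suites after restricting")
```

The exact lists depend on the Python build and the OpenSSL version it links against, so no expected output is shown.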

Python web-scraping-python related posts

  • How to write and publish a Python package to PyPI

    3 projects | dev.to | 11 May 2026
  • Contributing to Larger Open Source Project - Scrapy

    4 projects | dev.to | 6 Dec 2025
  • AgentQL MCP Server: Structured Web Data for Claude, Cursor, Windsurf, and more

    1 project | dev.to | 12 Mar 2025
  • AgentQL Launch Week Recap—make the web AI-ready

    1 project | dev.to | 18 Nov 2024
  • Stealth Mode—Enhanced Bot Detection Evasion—Launch week day 3

    1 project | dev.to | 12 Nov 2024

Index

What are some of the best open-source web-scraping-python projects in Python? This list will help you:

# Project Stars
1 Scrapling 62,873
2 Scrapy 62,224
3 SeleniumBase 12,780
4 botasaurus 4,797
5 agentql 1,397
6 wreq-python 1,373
7 scrapfly-scrapers 1,003
8 tls-requests 154
9 datacrawl 64
10 CobWeb-lnx 39
11 compodio 6
12 gli99 4
13 pygrounds 2

Did you know that Python is the most popular programming language based on number of references?