Stream Processing

Open-source projects categorized as Stream Processing

Top 23 Stream Processing Open-Source Projects

  1. pathway

    Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

    Project mention: GitHub's Fake Star Economy | news.ycombinator.com | 2026-04-20
  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. mediapipe

    Cross-platform, customizable ML solutions for live and streaming media.

    Project mention: Building a Jedi-Style Hand Gesture Interface with TensorFlow.js: Control Your Browser Without Touching Anything | dev.to | 2026-02-09

    In this tutorial, I'll show you how to build a production-ready hand gesture control system using TensorFlow.js and MediaPipe Hands that transforms any webcam into a precision input device.

  4. vector

    A high-performance observability data pipeline.

    Project mention: We Cut Log Costs by 35% Using Vector 0.30 and Loki 3.0: Lessons from a 3-Month Tuning | dev.to | 2026-05-04

    We evaluated three alternatives: ClickHouse for log storage, Fluent Bit for log collection, and the Vector (https://github.com/vectordotdev/vector) + Loki (https://github.com/grafana/loki) stack. ClickHouse had great query performance but required manual index management, which would add operational overhead. Fluent Bit was lightweight but lacked the transform capabilities we needed to mask PII and drop low-value logs. Vector and Loki stood out: Vector is a Rust-based agent with 1/10th the memory footprint of Filebeat, and Loki is designed for cost-efficient log storage with a query model that aligns with how our team actually debugs (using labels, not full-text search).

  5. awesome-bigdata

    A curated list of awesome big data frameworks, resources and other awesomeness.

  6. redpanda

    Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!

    Project mention: Top Open-Source Data Engineering Tools- Unravelling the Best in 2026 | dev.to | 2025-12-10

    Redpanda

  7. awesome-system-design

    A curated list of awesome System Design (A.K.A. Distributed Systems) resources.

    Project mention: You might not be a junior anymore, but are you thinking like a senior dev? These 10 mental models reveal the difference. | dev.to | 2025-06-26

    Awesome System Design

  8. watermill

    Building event-driven applications the easy way in Go.

    Project mention: How I built Upple: A modern uptime monitor with Go and React | dev.to | 2026-01-02

    I'm using Watermill for the event bus with Redis Streams as the backend. Redis Streams has this concept of consumer groups; consumers in the same group split messages between them, while different groups each receive all messages.

  9. risingwave

    Event streaming platform for agentic AI. Continuously ingest, transform, and serve event streams in real time, at scale.

    Project mention: Building a Real-Time Crypto Arbitrage Monitoring System | dev.to | 2025-11-24

    In crypto markets, these price differences, or spreads, appear and vanish in milliseconds. If your data pipeline takes five seconds to process a batch of prices, the opportunity is already gone. This post demonstrates how to use RisingWave—an open-source real-time event streaming platform—to detect arbitrage opportunities with sub-second latency using standard SQL.

  10. connect

    Fancy stream processing made operationally mundane (by redpanda-data)

  11. fluent-bit

    Fast and Lightweight Logs, Metrics and Traces processor for Linux, BSD, OSX and Windows

    Project mention: Benchmark: Vector 0.40 vs. Fluent Bit 3.0 Log Processing Throughput for 100k Logs/Second | dev.to | 2026-04-28


  12. Faust

    Python Stream Processing

  13. Hazelcast

    Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.

  14. materialize

    The live data layer for apps and AI agents. Create up-to-the-second views into your business, just using SQL (by MaterializeInc)

    Project mention: ANN v3: 200ms p99 query latency over 100B vectors | news.ycombinator.com | 2026-01-25

    I agree our sample may not be representative but we try to stay focused on the current and next crop of tpuf customers. So far "CI prohibits network access during tests" just hasn't come up as a pain point for any of them, but as I mentioned in another comment [0], we're definitely keeping an open mind about introducing an offline dev experience.

    At my last company an engineer spent a year implementing Bazel [0][1] only to have it ripped out after they left [2] due to the maintenance burden. You might say it was a little bit of a hassle. :)

    [0]: https://news.ycombinator.com/item?id=46758156

    [1]: https://github.com/MaterializeInc/materialize/pull/24243

    [2]: https://github.com/MaterializeInc/materialize/pull/31006

    [3]: https://github.com/MaterializeInc/materialize/pull/33895

  15. hudi

    Upserts, Deletes And Incremental Processing on Big Data.

    Project mention: Top Open-Source Data Engineering Tools- Unravelling the Best in 2026 | dev.to | 2025-12-10

    Apache Hudi

  16. river

    🌊 Online machine learning in Python

  17. faststream

    FastStream is an asynchronous Python framework for building event-driven applications. It brings together message broker integration, dependency injection, validation, testing utilities, and AsyncAPI documentation generation in a single toolkit

    Project mention: FastStream 0.7: MQTT support – in-memory tests, AsyncAPI generation and more | news.ycombinator.com | 2026-06-01
  18. fluvio

    🦀 event stream processing for developers to collect and transform data in motion to power responsive data intensive applications.

  19. danfojs

    Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.

  20. arroyo

    Distributed stream processing engine in Rust

  21. Memgraph

    High-performance open-source in-memory graph database for GraphRAG, AI memory, agentic AI, and real-time graph analytics. Cypher-compatible, built in C++.

    Project mention: CI/CD Auto-Remediation: The Complete Guide for SRE and Platform Teams (2026) | dev.to | 2026-05-11

    Auto-remediating into a worse state. The classic failure is auto-scaling a service to handle elevated error rates that are themselves caused by a downstream dependency. The service scales, hammers the dependency harder, and the dependency collapses. Fix: never auto-remediate without dependency-graph awareness. Aurora uses Memgraph for this; HolmesGPT uses its toolset structure; pure-L1 stacks should require manual escalation when the failure crosses service boundaries.

  22. peerdb

    A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage.

    Project mention: Postgres and ClickHouse forming the default data stack for AI | news.ycombinator.com | 2025-12-27

    You should try PeerDB, it was acquired by ClickHouse for exactly this use-case - Fast, simple Postgres replication to ClickHouse. https://github.com/PeerDB-io/peerdb

    In ClickHouse Cloud, you have ClickPipes, which is a simpler/managed manifestation of PeerDB https://clickhouse.com/cloud/clickpipes/postgres-cdc-connect...

  23. awesome-streaming

    A curated list of awesome streaming frameworks, applications, etc.

    Project mention: Streaming | news.ycombinator.com | 2025-09-25
  24. numaflow

    Kubernetes-native platform to run massively parallel data/streaming jobs

    Project mention: Rewriting Numaflow (for AI), an open-source stream processing platform, in Rust | news.ycombinator.com | 2025-08-18

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Stream Processing related posts

  • Bytewax: Stream processing library built using Python and Rust

    1 project | news.ycombinator.com | 22 May 2026
  • Benchmark: Vector 0.40 vs. Fluent Bit 3.0 Log Processing Throughput for 100k Logs/Second

    3 projects | dev.to | 28 Apr 2026
  • Pushing and Pulling: Three Reactivity Algorithms

    3 projects | news.ycombinator.com | 8 Mar 2026
  • Building a Real-Time Crypto Arbitrage Monitoring System

    1 project | dev.to | 24 Nov 2025
  • Composeable stream processing: reactive dataflow graphs in Python

    1 project | news.ycombinator.com | 12 Oct 2025
  • Build a Self-Hosted Apache Iceberg Lakehouse in Minutes with RisingWave

    4 projects | dev.to | 9 Oct 2025
  • Streaming

    1 project | news.ycombinator.com | 25 Sep 2025

Index

What are some of the best open-source Stream Processing projects? This list will help you:

#    Project                  Stars
1    pathway                  62,902
2    mediapipe                35,690
3    vector                   22,062
4    awesome-bigdata          14,452
5    redpanda                 12,227
6    awesome-system-design    12,213
7    watermill                 9,762
8    risingwave                9,088
9    connect                   8,684
10   fluent-bit                7,936
11   Faust                     6,823
12   Hazelcast                 6,572
13   materialize               6,314
14   hudi                      6,175
15   river                     5,846
16   faststream                5,239
17   fluvio                    5,234
18   danfojs                   5,049
19   arroyo                    4,938
20   Memgraph                  4,174
21   peerdb                    3,154
22   awesome-streaming         2,987
23   numaflow                  2,714

