Real-time Streaming Pipeline with Kafka + Flink

Build a real-time event-processing pipeline with Kafka and Flink (or Spark Streaming). Senior-level data engineering.

Apache Kafka, Apache Flink, Python or Java, Docker Compose

About this project

Streaming is the senior data-engineer specialty. This project teaches Kafka (producers, consumers, and topic management), Flink for stream processing (windowing, watermarks, and state), and the operational tail (exactly-once semantics, dead-letter queues, and consumer-lag monitoring). Build a realistic event source (simulated user clicks, NYC taxi rides, or a Twitter firehose) and run aggregations over it in real time.
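To make the windowing, watermark, and dead-letter ideas concrete, here is a minimal sketch in plain Python. It does not use the Kafka or Flink APIs; it only imitates the behaviour of an event-time tumbling window with a bounded out-of-orderness watermark. The names `ClickEvent`, `TumblingWindowCounter`, `window_size`, and `max_out_of_orderness` are hypothetical and chosen for illustration, and the in-memory `dead_letters` list stands in for what would be a separate dead-letter topic in a real pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ClickEvent:
    """A simulated user click (hypothetical schema)."""
    user_id: str
    event_time: int  # seconds; when the click happened, not when it arrived


@dataclass
class TumblingWindowCounter:
    """Counts clicks per user in fixed, non-overlapping event-time windows."""
    window_size: int = 60           # window length in seconds
    max_out_of_orderness: int = 10  # how far the watermark trails the newest event
    watermark: float = float("-inf")
    open_windows: dict = field(default_factory=lambda: defaultdict(int))
    closed: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)

    def process(self, event: ClickEvent) -> list:
        """Ingest one event; return any (window_start, user_id, count) results fired."""
        window_start = event.event_time - event.event_time % self.window_size
        window_end = window_start + self.window_size

        # The window for this event has already fired: route it to the dead-letter list.
        if window_end <= self.watermark:
            self.dead_letters.append(event)
            return []

        # Keyed state: one running count per (window, user).
        self.open_windows[(window_start, event.user_id)] += 1

        # The watermark only moves forward, trailing the largest event time seen.
        self.watermark = max(self.watermark,
                             event.event_time - self.max_out_of_orderness)
        return self._fire_ready_windows()

    def _fire_ready_windows(self) -> list:
        """Emit and clear every window whose end is at or before the watermark."""
        ready = sorted(key for key in self.open_windows
                       if key[0] + self.window_size <= self.watermark)
        results = [(start, user, self.open_windows.pop((start, user)))
                   for start, user in ready]
        self.closed.extend(results)
        return results


# Simulated stream: ("bob", 58) is out of order but within the tolerance,
# while ("bob", 30) arrives after its window has fired.
stream = [("alice", 5), ("bob", 20), ("alice", 50), ("alice", 65),
          ("bob", 58), ("alice", 75), ("bob", 30)]

counter = TumblingWindowCounter(window_size=60, max_out_of_orderness=10)
for user_id, event_time in stream:
    for result in counter.process(ClickEvent(user_id, event_time)):
        print(result)
```

In this sketch the first window (0 to 60 seconds) fires only once an event with time 75 pushes the watermark to 65, so the slightly late click at 58 is still counted, while the click at 30 that arrives afterwards is set aside rather than silently dropped. A real Flink job would express the same trade-off through its watermark strategy and late-data handling, and would keep the per-key counts in managed state instead of a dictionary.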

Why build this in 2026?

AI use cases increasingly depend on streaming data, and senior data engineers with hands-on streaming experience are scarce.

What you'll ship

GitHub repo with docker-compose

Architecture diagram

Demo video showing streaming aggregation

Skills you'll practice

Kafka, Spark, Python, distributed systems