Real-time Streaming Pipeline with Kafka + Flink

Build a real-time event-processing pipeline with Kafka and Flink (or Spark Streaming). Senior-level data engineering.

Apache Kafka, Apache Flink, Python or Java, Docker Compose

About this project

Streaming is the senior data-engineer specialty. This project teaches Kafka (producer + consumer + topic management), Flink for stream processing (windowing, watermarks, state), and the operational tail (exactly-once semantics, dead-letter queues, monitoring lag). Build a real event source — simulated user clicks, NYC taxi rides, Twitter firehose — and run aggregations in real time.
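To make windowing, watermarks, and late-event handling concrete, here is a minimal plain-Python sketch that counts simulated clicks per user in tumbling event-time windows. It is an illustration of the ideas only, not Flink's or Kafka's API: the names Click, run_pipeline, WINDOW_SIZE, and MAX_OUT_OF_ORDERNESS are hypothetical, the watermark rule (largest event time seen minus a fixed delay) is a simplification, and the in-memory "late" list merely stands in for a dead-letter queue.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SIZE = 10          # assumed tumbling-window length, in seconds
MAX_OUT_OF_ORDERNESS = 5  # assumed bound on how late an event may arrive


@dataclass(frozen=True)
class Click:
    """Hypothetical simulated click event."""
    user: str
    event_time: int  # event time in seconds, set by the producer


def window_start(ts, size=WINDOW_SIZE):
    """Return the start of the tumbling window that contains ts."""
    return ts - (ts % size)


def run_pipeline(events, size=WINDOW_SIZE, max_delay=MAX_OUT_OF_ORDERNESS):
    """Count clicks per user per window; return (results, late_events)."""
    open_windows = defaultdict(lambda: defaultdict(int))  # start -> user -> count
    results = []  # emitted rows: (window_start, user, count)
    late = []     # events whose window had already closed
    watermark = float("-inf")

    for ev in events:
        start = window_start(ev.event_time, size)
        if start + size <= watermark:
            # The window for this event was already emitted: divert it.
            late.append(ev)
            continue

        open_windows[start][ev.user] += 1

        # Simplified watermark: largest event time seen minus the allowed delay.
        watermark = max(watermark, ev.event_time - max_delay)

        # Emit every window whose end is at or before the watermark.
        closable = sorted(s for s in open_windows if s + size <= watermark)
        for s in closable:
            for user, count in sorted(open_windows.pop(s).items()):
                results.append((s, user, count))

    # End of the simulated stream: flush whatever is still open.
    for s in sorted(open_windows):
        for user, count in sorted(open_windows[s].items()):
            results.append((s, user, count))
    return results, late


if __name__ == "__main__":
    stream = [
        Click("a", 1), Click("b", 3), Click("a", 8),
        Click("a", 12), Click("b", 17),
        Click("a", 4),   # arrives after window [0, 10) has closed
        Click("b", 25),
    ]
    rows, late_events = run_pipeline(stream)
    for row in rows:
        print(row)
    print("late:", late_events)
```

In a real pipeline the events would be read from a Kafka topic and the window state would live in the stream processor; the sketch keeps everything in memory so the interaction between event time, the watermark, and window closing is easy to trace.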

Why build this in 2026?

AI use cases need streaming data, and senior data engineers with streaming experience are scarce.

What you'll ship

  • GitHub repo with docker-compose
  • Architecture diagram
  • Demo video showing streaming aggregation

Skills you'll practice

kafka, spark, python, distributed systems