Kafka, Kafka Connect, SQL Server CDC, and Event Hub Streaming on Kubernetes: A 3-Part Guide
In today’s data-driven world, real-time data processing is increasingly important for businesses looking to gain timely insights from their data. This 3-part guide walks you through setting up a powerful streaming pipeline using Kafka and Kafka Connect running on Kubernetes with Strimzi, SQL Server with Change Data Capture (CDC), and Azure Event Hub as an alternative to an in-cluster Kafka instance.
Prerequisites
- Kubernetes cluster: in this example, we’ll use the Kubernetes cluster that ships with Docker Desktop.
- Basic understanding of Kafka, Kafka Connect, and CDC.
- Basic knowledge of Azure Event Hub.
In part 1, we will cover how to use Strimzi to run Kafka and Kafka Connect on a Kubernetes cluster.
In part 2, we will delve deeper into snapshotting and streaming data changes from SQL Server into Kafka topics using our in-cluster Kafka instance, CDC, Kafka Connect, and a Debezium SQL Server connector.
In part 3, we’ll jump into the world of Azure Event Hub, which will act as a managed Kafka instance housing our topics instead of an in-cluster Kafka instance.
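As a quick preview of what part 1 sets up: with the Strimzi operator installed, a Kafka cluster is declared as a Kubernetes custom resource. A minimal sketch might look like the following (the cluster name, replica counts, and storage type are illustrative assumptions, not the series code):

```yaml
# Minimal single-broker Kafka cluster managed by the Strimzi operator.
# Values here (name, replicas, ephemeral storage) are illustrative only.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral
  zookeeper:
    replicas: 1
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}
```

Part 1 walks through the full manifests and the Kafka Connect resource that builds on this.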
The full code for this series can be found here.
Links to each part:
Part 1: Running Kafka and Kafka Connect on Kubernetes with Strimzi
Part 2: Running SQL Server and Streaming Changes with CDC
Part 3: Using Azure Event Hub for Stream Processing and Integration