This article is contributed. See the original author and article here.
SQL server is a batch oriented service, just like any DBMS. One program has to query it in order to get the result – so to have real time analytics we would have to change this batch behavior to a streaming /event/push behavior.
On the other side, we have azure with Azure Event Hubs, Stream analytics and Streaming datasets on Power BI. They work pretty well together if the data source is a streaming, producing events (something we can have with an custom code application or some Open Source solution with Kafka.
The challenge here was to find something to make the bridge between SQL Server and Event Hubs.
After some time looking for solutions on the internet, I found this Docs <reference> page with an approach to bring CDC data from Postgres to Event Hubs.
The solution proposed to use an Apache Kafka based solution named Debezium <reference>. This was so far unknown to me. With Debezium, one solution can monitor a CDC source and generate the events on a Kafka enables hub.
The initial setup seems a bit complex and it do not run on windows, so I had to find easier ways to enable this deployment.
Have you event thought about Docker?
It works pretty well on windows 10 or any build after (Build number…18363?) because of WSL2 <reference>. If you still never tried WSL2, I hightly recommend it.
After a few tries, I installs a WSL2 – Ubuntu 20.04 based and with that I could install Docker Desktop on my Windows 10 Surface Book :smiling_face_with_smiling_eyes:
Note: Docker Desktop works well for your development env. For production, you can install it on ACI or AKS <reference, reference>
With that said, Debezium had a Docker Image available on hub.docker.com, named Debeziumserver.
Debezium Server is a lightweight version that do not have Kafka installed. The image has already installes the SQL Server connector and can output the events directly to Event Hubs.
To install and configure the docker, I ran only this line of code
docker run -d -it –name SampleDebezium -v $PWD/conf:/debezium/conf -v $PWD/data:/debezium/data debezium/server
My SQL Server is hosted on Azure and to create a lab enviorement, I created a single table and ebabled CDC on it
<script>
Debezium will query the latest changed rows on CDC based on how it is setted on the configuration file and create the events on Event Hub
<sample configuration file>
Event Hub:
I created a sample event hub to hold this experiment.
But how to consume the events?? -> Stream analytics
With Stream analytics I consumes the events and created 2 outputs.
Azure Data Lake output
I’m saving every event on a Azure data lake for doing later analytics. This approach even enables to move from a OLTP on premises database to a modern Data Warehouse architecture with a incremental CDC based load. More need to be worked and tested here – Maybe a next post
Power BI Output
The Power BI output created a real tine dataset that can be consumes from a Power BI Dashboard.
Difference between real time dataset and Direct Query <reference>
Make it work and test
I created a script who negates data on my CDC enables table to test this method. The script will <describe the script>
<Add the script from GitHub>
Lets assure the Debezium server container is running
Show query output on Steam Analytics
Create the tiles on a Power BI Dashboard
See it running
<Add a .gif with real time enables dashboard and the SQL Server creating data>
Known problem
It seems when a big transaction (like a update on 200k rows) happens, Debezium stops complaining the message was bicker than the maximum size defined from Event Hubs. Maybe there is a way to break it on smaller messages, maybe it is how it works, so our system shoun only run OLTP (row by row) workload.
Acknowledgment
Special thanks to Goran, who guided me on my first steps on Stream Analytics
<Add a gif >
# References
# Architecture https://debezium.io/documentation/reference/architecture.html#_debezium_server
# Docker image with example https://hub.docker.com/r/debezium/server
# Debezium Server https://debezium.io/documentation/reference/1.4/operations/debezium-server.html
# SQL Server connector https://debezium.io/documentation/reference/connectors/sqlserver.html
# Azure Event hubs connection https://debezium.io/documentation/reference/1.4/operations/debezium-server.html#_azure_event_hubs
Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.
Recent Comments