No More Silos: Integrating Databases and Apache Kafka

Session Number: 8035
Track: Big Data & Data Warehousing
Sub-Categorization: Data Engineering
Session Type: Tips, Techniques and Tuning
Primary Presenter: Robin Moffatt [Developer Advocate - Confluent]
Time: Jun 25, 2019 (08:50 AM - 09:50 AM)
Room: 310, Level 3

Speaker Bio: Robin is a developer advocate at Confluent, the company founded by the creators of Apache Kafka, as well as an Oracle Developer Champion and ACE Director Alumnus. His career has always involved data, from the old worlds of COBOL and Db2, through the worlds of Oracle and Hadoop, and into the current world with Kafka. His particular interests are analytics, systems architecture, performance testing, and optimization. He blogs and can be found tweeting grumpy geek thoughts as @rmoff. Outside of work he enjoys drinking good beer and eating fried breakfasts, although generally not at the same time.
Technologies or Products Used: Kafka

Session Summary for Attendees: Companies new and old are all recognizing the importance of a low-latency, scalable, fault-tolerant data backbone in the form of the Apache Kafka streaming platform. With Kafka, developers can integrate multiple sources and systems, which enables low-latency analytics, event-driven architectures and the population of multiple downstream systems.

In this talk we’ll look at one of the most common integration requirements: connecting databases to Kafka. We’ll consider the concept that all data is a stream of events, including data that resides in a database. We’ll look at why we’d want to stream data from a database, including driving applications in Kafka from events upstream. We’ll discuss the different methods for connecting databases to Kafka, and the pros and cons of each. Techniques including change data capture (CDC) and Kafka Connect will be covered, as will an exploration of the power of KSQL for performing transformations such as joins on the inbound data.

Attendees of this talk will learn:

- That all data is a stream of events; a database is just a materialized view of that stream.
- The best ways to integrate databases with Kafka.
- Anti-patterns to be aware of.
- The power of KSQL for transforming streams of data in Kafka.
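
The first point above, that a database is a materialized view of a stream of events, can be illustrated with a small sketch. This is not code from the talk and it does not use Kafka; it is plain Python under assumed names (`ChangeEvent`, `materialize`, and the sample customer records are all hypothetical). It replays an ordered list of change events, roughly the kind of record a CDC tool would capture from a database, and folds them into the current state of a table.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One change to a row. Hypothetical shape, loosely modelled on a CDC record."""
    op: str                       # "insert", "update", or "delete"
    key: int                      # primary key of the affected row
    value: Optional[dict] = None  # new row contents; None for a delete


def materialize(events: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Replay events in order and return the resulting table state."""
    table: Dict[int, dict] = {}
    for event in events:
        if event.op == "delete":
            # A delete removes the row if it is present.
            table.pop(event.key, None)
        elif event.op in ("insert", "update"):
            # Inserts and updates both set the latest value for the key.
            table[event.key] = event.value
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return table


# Illustrative event stream (made-up data).
events = [
    ChangeEvent("insert", 1, {"name": "Alice", "city": "London"}),
    ChangeEvent("insert", 2, {"name": "Bob", "city": "York"}),
    ChangeEvent("update", 1, {"name": "Alice", "city": "Leeds"}),
    ChangeEvent("delete", 2),
]

customers = materialize(events)
```

Replaying only the first two events gives the table as it looked at that earlier point, which is the sense in which the table is a view derived from the stream rather than the other way round.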