Event-Driven Architecture (2) — Change data capture


The reason I decided to write about change data capture (CDC) is tightly coupled with the story of event-driven architecture. In my experience, this question pops up frequently: how do you pull data out of non-event systems, and how do you transform their data into events? It is something that every organization starting the transformation into an event-centric organization asks itself at the very beginning.

This happens when you have a big (possibly old) core monolithic system without event streaming capabilities that is slow and expensive to change. Slow and expensive means it is difficult to add new features, integrate with your new services, or deliver new value to your customers. And since you want to keep a competitive edge and improve product delivery, shortening time to market and improving the agility and autonomy of your delivery teams, you end up looking at new architectures and technologies that can help you in your quest.

We all know that customer expectations and superior user experience are the main drivers of change. We need to respond to them faster, more efficiently, and at a lower cost. Event-driven architecture concepts and principles will bring those capabilities to our organization, but not without some trade-offs.

The Monolith

Usually, this is your core legacy system that has been around for some time and for which you have an army of developers, engineers, and business analysts keeping it running. This system is battle-proven, packs a mean infrastructure, and gets the job done. Amen. But it is expensive to maintain, difficult to improve, and almost impossible to turn into a source of innovation in your organization.

Monolith seen here in his natural environment
Photo by Hulki Okan Tabak on Unsplash

Several patterns describe how to move away from a monolithic architecture toward a microservice one. When moving to an event-driven one, you begin with a question: how can I extract events from my monolithic system and publish them into the open? The main goal is to make your data available so that other systems (microservices) can use it and so you can build your new features around it. In doing so you lay the foundations for a modern, decoupled architecture. And if you succeed, everybody will applaud you.

A CDC tool, used as a means to pull data from your databases, can be applied in several ways depending on how your application works, that is, on how the database keeps records of your domain entities (EDA works hand in hand with DDD and is modeled to provide decoupling between Bounded Contexts).

Simple CDC, very rare to find in its natural habitat, has one defining characteristic: the database structure is simple and the entire entity lives in one table. When your CDC tool emits changes from the database, they can effectively be interpreted as commands/events on your entity.

simple CDC
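To make this concrete, here is a minimal sketch of a relay for the simple case, assuming Kafka as the streaming integration solution and assuming the CDC tool already writes one record per changed row to a topic. The topic names cdc.customer and customer-events and the pass-through mapping are illustrative assumptions, not a prescribed implementation:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class SimpleCdcRelay {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "simple-cdc-relay");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            // "cdc.customer" is an assumed topic name, fed by the CDC tool with
            // one record per changed row of the single customer table.
            consumer.subscribe(List.of("cdc.customer"));
            while (true) {
                for (ConsumerRecord<String, String> change : consumer.poll(Duration.ofSeconds(1))) {
                    // Because the whole entity lives in one table, a row change
                    // already is the entity change: forward it as a domain event.
                    producer.send(new ProducerRecord<>("customer-events", change.key(), change.value()));
                }
            }
        }
    }
}
```

In practice the record value would carry the CDC tool's change envelope (Debezium, for example, wraps each change in an op field plus before/after row state), so a real relay would interpret that envelope and translate it into a created/updated/deleted event rather than forwarding the raw payload.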

CDC and stream processing

Usually, the situation is that your domain entity's data (let's say customer, as a common domain) spans several tables, and business logic changes records in those tables in some fashion (usually no one knows exactly how). The CDC tool will emit those changes as they happen, sending out rows from tables as some backend process changes them. In this situation, you need to know how your application works and how it writes to your database, so that you can capture the changes and transform them into meaningful events.

each table publishes one topic

The CDC tool captures changes in the core system's database and publishes them to the streaming integration solution as CDC topics, e.g. one topic streaming the changes from each table.

The stream processing component consumes those streams and combines the row-level changes into a business-relevant event, a domain event, which it publishes back on the streaming integration solution. This component can be developed as a microservice, a Kafka Streams application, a Flink job, a Spark job, or with another stream processing technology.
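As a sketch of such a component, the following Kafka Streams topology joins two assumed CDC topics (cdc.customer and cdc.customer_address, one per table, keyed by customer id) and publishes the combined record as a domain event. The topic names, string payloads, and join logic are illustrative assumptions:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;

import java.util.Properties;

public class CustomerEventBuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "customer-event-builder");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        StreamsBuilder builder = new StreamsBuilder();

        // Each KTable mirrors one table of the monolith's database via its
        // CDC topic, keyed by customer id (topic names are assumptions).
        KTable<String, String> customers = builder.table("cdc.customer");
        KTable<String, String> addresses = builder.table("cdc.customer_address");

        // Combine the row-level changes into one business-relevant record.
        // A real implementation would parse the CDC envelopes and build a
        // proper domain event payload instead of concatenating strings.
        customers.join(addresses, (customer, address) -> customer + "|" + address)
                 .toStream()
                 .to("customer-events");

        new KafkaStreams(builder.build(), props).start();
    }
}
```

The table-table join re-emits a combined record whenever either side changes, which is exactly the behavior you want when several backend processes touch different tables of the same entity.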

Complex CDC solutions involve stream processing from various sources as well as from the CDC topics. This is the case when your domain events are derived from entities in different bounded contexts (e.g. microservices). These "other" systems must either have event streaming capabilities or have one of the CDC techniques applied to them.

As before, the CDC tool captures changes in the core system's database and publishes them to the streaming integration solution as CDC topics, one topic streaming the changes from each table. The stream processing component (again a microservice, Kafka Streams application, Flink or Spark job) now consumes those streams together with the event streams from the other systems, combines them into domain events, and publishes the result on the streaming integration solution.
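Below is a hedged sketch of the complex case, again with Kafka Streams: a CDC-fed table of customer state is joined with a native event stream from another bounded context. The topic names, and the assumption that order events are keyed by customer id (so the two sources are co-partitioned), are illustrative:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

import java.util.Properties;

public class ComplexCdcTopology {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "complex-cdc-topology");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        StreamsBuilder builder = new StreamsBuilder();

        // Customer state mirrored from the monolith's database by the CDC tool.
        KTable<String, String> customers = builder.table("cdc.customer");
        // Native event stream from another bounded context, assumed to be
        // keyed by customer id so the two sources are co-partitioned.
        KStream<String, String> orders = builder.stream("order-events");

        // Enrich each in-flight order event with the CDC-captured customer
        // state and publish the combined, business-relevant domain event.
        orders.join(customers, (order, customer) -> order + "|" + customer)
              .to("customer-order-events");

        new KafkaStreams(builder.build(), props).start();
    }
}
```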

This type of CDC architecture (or landscape) is very similar to CEP (complex event processing), as it requires you to process events "in-flight".

Whichever pattern you choose will depend on your use case, your database technology, and of course your event streaming capability. In the end, your implementation may differ in pattern and architecture. If so, feel free to share!

