evitaDB - Fast e-commerce database

Change data capture

Change data capture (CDC) is a design pattern used to track and capture changes made to the schema and data in a database. evitaDB supports CDC through all its APIs, so developers can monitor and respond to changes in near real time in their preferred programming language. This document explains how to implement CDC using our API.

The database maintains a so-called Write-Ahead Log (WAL) that records all changes made to the database. This log is used to ensure data integrity and durability, but it can also be leveraged to implement change data capture (CDC) functionality. Once the catalogue is switched to the ACTIVE (transactional) stage, clients can start consuming information about changes made to both the schema and the data in the catalogue.
There is also a special CDC available for the entire database engine that allows clients to monitor high-level operations such as catalogue creation, deletion, and other global events (for more details, consult the Control Engine chapter).
Change data capture is not available for catalogues in the WARMING_UP stage since the WAL is not being recorded during that phase. This phase is considered "introductory" and clients should not work (query) with the data in that phase anyway. Clients should wait until the catalogue reaches the ACTIVE stage and perceive all the data at that moment as a consistent snapshot of the first version of the catalogue.

Engine-level and catalogue-level CDC cannot be combined into a single stream because they operate on different levels (engine vs. catalogue). Catalogue-level CDC is always tied to one particular catalogue (identified by name). To capture all changes across all catalogues, subscribe to engine-level CDC and, in addition, to catalogue-level CDC for each catalogue separately. The engine-level CDC reports catalogue creation and deletion events, so clients can dynamically subscribe to and unsubscribe from catalogue-level CDC as catalogues are created and deleted.
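
The dynamic subscribe/unsubscribe pattern described above can be sketched in plain Java. This is only an illustration under stated assumptions: `EngineEvent`, `EngineEventType`, and `CatalogueSubscriptionManager` are hypothetical names, not evitaDB types, and the function passed to the constructor stands in for whatever call actually opens a catalogue-level subscription and returns a way to cancel it.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical engine-level event types; not part of the evitaDB API.
enum EngineEventType { CATALOGUE_CREATED, CATALOGUE_DELETED }

// Hypothetical engine-level event carrying the affected catalogue name.
record EngineEvent(EngineEventType type, String catalogueName) {}

// Keeps exactly one catalogue-level subscription per existing catalogue.
final class CatalogueSubscriptionManager {
    // catalogue name -> handle that cancels its catalogue-level subscription
    private final Map<String, Runnable> cancelHandles = new TreeMap<>();
    // Assumed callback: opens a catalogue-level subscription, returns a cancel handle.
    private final Function<String, Runnable> subscribeToCatalogue;

    CatalogueSubscriptionManager(Function<String, Runnable> subscribeToCatalogue) {
        this.subscribeToCatalogue = subscribeToCatalogue;
    }

    // Called for every event received from the engine-level stream.
    void onEngineEvent(EngineEvent event) {
        switch (event.type()) {
            // Subscribe only once, even if the creation event is seen twice.
            case CATALOGUE_CREATED ->
                cancelHandles.computeIfAbsent(event.catalogueName(), subscribeToCatalogue);
            case CATALOGUE_DELETED -> {
                Runnable cancel = cancelHandles.remove(event.catalogueName());
                if (cancel != null) {
                    cancel.run();
                }
            }
        }
    }

    // Names of catalogues that currently have an open subscription.
    Set<String> subscribedCatalogues() {
        return Set.copyOf(cancelHandles.keySet());
    }
}
```

Keeping the cancel handles in a map keyed by catalogue name mirrors the fact that a catalogue-level stream is tied to a single catalogue name.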

The basic principle in all APIs is the same:

  1. clients define a predicate/condition that specifies which changes they are interested in,
  2. define a starting point in the form of a catalogue version from which they want to start receiving changes,
  3. and subscribe to the change stream.

From that point onwards, clients will receive notifications about all changes that match their criteria. The changes are delivered in the order they were made, ensuring that clients can process them sequentially. The second step is optional: if no starting version is specified, the change stream will start from the next version of the catalogue.
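
To make the three steps concrete, here is a minimal in-memory sketch in Java. It does not use the evitaDB client: `ChangeEvent` and `InMemoryChangeStream` are hypothetical stand-ins, and the sketch assumes a change can be identified by a catalogue version and an index within that version. It only illustrates the semantics described above: a filter predicate, an optional starting version, and ordered delivery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical event shape; not an evitaDB type.
record ChangeEvent(long version, int index, String entityType, String operation) {}

// Hypothetical in-memory stand-in for a catalogue change stream.
final class InMemoryChangeStream {
    private record Registration(Predicate<ChangeEvent> filter, long sinceVersion,
                                Consumer<ChangeEvent> consumer) {}

    private final List<ChangeEvent> log = new ArrayList<>();
    private final List<Registration> registrations = new ArrayList<>();

    // Step 1: filter, step 2: optional starting version (null = omitted), step 3: subscribe.
    void subscribe(Predicate<ChangeEvent> filter, Long sinceVersion, Consumer<ChangeEvent> consumer) {
        // Without a starting version the stream begins with the next catalogue version.
        long start = sinceVersion != null ? sinceVersion : currentVersion() + 1;
        // Replay already recorded changes in their original order.
        for (ChangeEvent event : log) {
            if (event.version() >= start && filter.test(event)) {
                consumer.accept(event);
            }
        }
        registrations.add(new Registration(filter, start, consumer));
    }

    // Records a new change and delivers it to every matching subscriber.
    void append(ChangeEvent event) {
        log.add(event);
        for (Registration r : registrations) {
            if (event.version() >= r.sinceVersion() && r.filter().test(event)) {
                r.consumer().accept(event);
            }
        }
    }

    // Version of the most recently recorded change, or 0 for an empty log.
    long currentVersion() {
        return log.isEmpty() ? 0 : log.get(log.size() - 1).version();
    }
}
```

A subscriber interested only in products from version 1 onwards would call `stream.subscribe(e -> e.entityType().equals("Product"), 1L, handler)`, while passing `null` as the starting version delivers only changes recorded after the subscription.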

Subscription lifecycle

Once subscribed, the change stream remains active until one of the following occurs:

  1. the client explicitly cancels the subscription
  2. the client can't keep up with the rate of incoming changes (backpressure)
  3. the client throws an exception during processing
  4. the client doesn't react within a timeout
  5. the server shuts down or the catalogue is deleted
  6. the server doesn't react within a timeout
  7. the subscription TTL (time-to-live) expires (see configuration settings)

As you can see, there are many reasons why a subscription may end. Therefore, clients should be prepared to handle such situations gracefully. The standard approach is to implement the AutoCloseable interface in your subscriber and re-establish the subscription in the close() method or schedule a re-establishment by another application service. Your subscriber should also track the last successfully processed version and index so that it can resume from the correct point when re-establishing the subscription. The criteria handle version and index as inclusive, so you should skip the first event after resumption if it matches the last processed version and index.
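
The bookkeeping for resumption can be sketched as follows. `Position` and `ResumeCursor` are hypothetical helper types, not part of evitaDB. The sketch assumes events are ordered by version first and index second, and it is slightly more defensive than the rule above: it skips every redelivered event at or before the last processed position, not only an exact match.

```java
// Hypothetical position of an event: catalogue version plus index within that version.
record Position(long version, int index) {}

// Tracks the last successfully processed position and filters out redelivered events.
final class ResumeCursor {
    private Position lastProcessed; // null until the first event has been processed

    // True if the event is new; false if it was already processed before the resume.
    boolean shouldProcess(Position incoming) {
        return lastProcessed == null || compare(incoming, lastProcessed) > 0;
    }

    // Call only after the event has been handled successfully.
    void markProcessed(Position position) {
        lastProcessed = position;
    }

    // Position to pass as the (inclusive) starting point when re-subscribing; may be null.
    Position resumeFrom() {
        return lastProcessed;
    }

    // Orders positions by version first, then by index within the version.
    private static int compare(Position a, Position b) {
        int byVersion = Long.compare(a.version(), b.version());
        return byVersion != 0 ? byVersion : Integer.compare(a.index(), b.index());
    }
}
```

After a reconnect, the subscriber would re-subscribe from `resumeFrom()` and pass each incoming event through `shouldProcess` before handling it, so the inclusive first event is dropped instead of being applied twice.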

Hierarchy of mutations

Not all mutations operate on the same level, and some mutations may encapsulate others. For example, when an entity is upserted, it may contain multiple mutations within it (multiple attribute, associated data, price operations, etc.). The hierarchy of mutations is as follows:

When you don't specify any filtering criteria, you will receive all mutations in flattened form, i.e. you will receive all mutations regardless of their hierarchy. So, for example, an entity attribute upsert will be delivered once as part of the entity upsert mutation and once as a standalone attribute upsert mutation. In practice, a client usually wants either high-level information about entity changes (so only entity mutations) or very specific low-level changes (e.g. only changes to attributes of a particular name). The approach with a simple flattened stream that is filtered by a single predicate covers all these use cases very well, and it is very easy to understand and implement.
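
The two typical use cases can be illustrated with a small Java sketch over a flattened list of mutations. The `Mutation` record, the `Level` enum, and the sample data are hypothetical and only model the idea; real evitaDB mutation types and criteria are not shown here.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical mutation levels; the real hierarchy has more members.
enum Level { ENTITY, ATTRIBUTE, PRICE }

// Hypothetical flattened mutation; name is null for entity-level mutations.
record Mutation(Level level, String entityType, int entityPrimaryKey, String name) {}

final class FlattenedStreamDemo {
    // High-level view: only entity mutations.
    static final Predicate<Mutation> ENTITY_ONLY = m -> m.level() == Level.ENTITY;

    // Low-level view: only changes to attributes of a particular name.
    static Predicate<Mutation> attributeNamed(String attributeName) {
        return m -> m.level() == Level.ATTRIBUTE && attributeName.equals(m.name());
    }

    // One entity upsert delivered in flattened form: the entity mutation
    // itself plus each nested mutation as a standalone item.
    static List<Mutation> sampleFlattenedStream() {
        return List.of(
            new Mutation(Level.ENTITY, "Product", 1, null),
            new Mutation(Level.ATTRIBUTE, "Product", 1, "name"),
            new Mutation(Level.ATTRIBUTE, "Product", 1, "code"),
            new Mutation(Level.PRICE, "Product", 1, "basic")
        );
    }

    // Applies a single predicate to the flattened stream.
    static List<Mutation> select(List<Mutation> stream, Predicate<Mutation> filter) {
        return stream.stream().filter(filter).toList();
    }
}
```

With this model, `select(sampleFlattenedStream(), ENTITY_ONLY)` keeps just the entity upsert, and `select(sampleFlattenedStream(), attributeNamed("code"))` keeps just the one attribute change, which is the reason a single predicate over a flattened stream is sufficient for both cases.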

Author: Ing. Jan Novotný

Date updated: 21.10.2025
