
Change data capture
Change data capture (CDC) is a design pattern used to track and capture changes made to the schema and data in a database. evitaDB supports CDC through all its APIs, so developers can monitor and respond to data changes in near real time in their preferred programming language. This document explains how to implement CDC using our API.
Engine-level and catalogue-level CDC cannot be combined into a single stream because they operate on different levels (engine vs. catalogue). Catalogue-level CDC is always tied to one particular catalogue, identified by its name. To capture all changes across all catalogues, subscribe to engine-level CDC and, separately, to catalogue-level CDC for each catalogue. Engine-level CDC reports catalogue creation and deletion events, so clients can subscribe to and unsubscribe from catalogue-level CDC dynamically as catalogues are created and deleted.
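The dynamic subscribe/unsubscribe flow described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the evitaDB API: `EngineEvent`, `EngineEventType`, `CatalogueSubscription` and `onEngineEvent` are hypothetical names introduced only to show how engine-level events might drive catalogue-level subscriptions.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; the event and subscription types below are not evitaDB API.
final class CatalogueSubscriptionSketch {

    enum EngineEventType { CATALOGUE_CREATED, CATALOGUE_DELETED }

    /** Assumed shape of an engine-level notification. */
    record EngineEvent(EngineEventType type, String catalogueName) {}

    /** Stand-in for a catalogue-level subscription that can be cancelled. */
    static final class CatalogueSubscription {
        private final String catalogueName;
        private boolean active = true;

        CatalogueSubscription(String catalogueName) { this.catalogueName = catalogueName; }

        String catalogueName() { return catalogueName; }
        void cancel() { active = false; }
        boolean isActive() { return active; }
    }

    // One catalogue-level subscription per catalogue name.
    private final Map<String, CatalogueSubscription> byCatalogue = new TreeMap<>();

    /** Reacts to an engine-level event by opening or closing a catalogue-level subscription. */
    void onEngineEvent(EngineEvent event) {
        switch (event.type()) {
            case CATALOGUE_CREATED ->
                byCatalogue.computeIfAbsent(event.catalogueName(), CatalogueSubscription::new);
            case CATALOGUE_DELETED -> {
                CatalogueSubscription removed = byCatalogue.remove(event.catalogueName());
                if (removed != null) {
                    removed.cancel();
                }
            }
        }
    }

    /** Names of catalogues that currently have a catalogue-level subscription. */
    Set<String> subscribedCatalogues() {
        return new TreeSet<>(byCatalogue.keySet());
    }

    /** Small demonstration: two catalogues are created, then one is deleted. */
    static Set<String> demo() {
        CatalogueSubscriptionSketch manager = new CatalogueSubscriptionSketch();
        manager.onEngineEvent(new EngineEvent(EngineEventType.CATALOGUE_CREATED, "shop"));
        manager.onEngineEvent(new EngineEvent(EngineEventType.CATALOGUE_CREATED, "blog"));
        manager.onEngineEvent(new EngineEvent(EngineEventType.CATALOGUE_DELETED, "shop"));
        return manager.subscribedCatalogues();
    }
}
```

In this sketch the engine-level stream acts purely as a control channel, while the data changes themselves would arrive through the per-catalogue subscriptions.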
The basic principle in all APIs is the same:
- clients define a predicate/condition that specifies which changes they are interested in,
- define a starting point in the form of a catalogue version from which they want to start receiving changes,
- and subscribe to the change stream.
From that point onwards, clients receive notifications about all changes that match their criteria. Changes are delivered in the order in which they were made, so clients can process them sequentially. The second step is optional: if no starting version is specified, the change stream starts from the next version of the catalogue.
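The three steps above can be illustrated with a small in-memory model. It is only a sketch under stated assumptions: `ChangeStreamSketch`, `Change`, `commit` and `subscribe` are hypothetical names and do not mirror the evitaDB API, and the starting version is assumed to be inclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical in-memory model of a change stream; none of these types are evitaDB API.
final class ChangeStreamSketch {

    /** A single captured change, tagged with the catalogue version that produced it. */
    record Change(long version, String entityType, String operation) {}

    /** A subscriber: the predicate selects changes, the consumer receives them. */
    private record Subscription(Predicate<Change> filter, Consumer<Change> sink) {}

    private final List<Change> history = new ArrayList<>();
    private final List<Subscription> subscriptions = new ArrayList<>();
    private long currentVersion = 0;

    /** Records a change under the next catalogue version and notifies matching subscribers. */
    void commit(String entityType, String operation) {
        Change change = new Change(++currentVersion, entityType, operation);
        history.add(change);
        for (Subscription s : subscriptions) {
            if (s.filter().test(change)) {
                s.sink().accept(change);
            }
        }
    }

    /**
     * Subscribes to the stream. When sinceVersion is present, matching changes from that
     * version onwards (assumed inclusive) are replayed first, in commit order. When it is
     * empty, delivery starts with the next committed version.
     */
    void subscribe(Predicate<Change> filter, OptionalLong sinceVersion, Consumer<Change> sink) {
        if (sinceVersion.isPresent()) {
            long from = sinceVersion.getAsLong();
            for (Change c : history) {
                if (c.version() >= from && filter.test(c)) {
                    sink.accept(c);
                }
            }
        }
        subscriptions.add(new Subscription(filter, sink));
    }

    /** Demonstration returning the versions each of two subscribers received. */
    static List<List<Long>> demo() {
        ChangeStreamSketch stream = new ChangeStreamSketch();
        stream.commit("Product", "UPSERT");   // version 1
        stream.commit("Brand", "UPSERT");     // version 2
        stream.commit("Product", "REMOVE");   // version 3

        List<Long> replayed = new ArrayList<>();
        List<Long> liveOnly = new ArrayList<>();
        Predicate<Change> productsOnly = c -> c.entityType().equals("Product");

        // Step 1: predicate, step 2: optional starting version, step 3: subscribe.
        stream.subscribe(productsOnly, OptionalLong.of(2), c -> replayed.add(c.version()));
        stream.subscribe(productsOnly, OptionalLong.empty(), c -> liveOnly.add(c.version()));

        stream.commit("Product", "UPSERT");   // version 4
        return List.of(replayed, liveOnly);
    }
}
```

In `demo()`, the subscriber that starts at version 2 receives the matching historical change and the new one, while the subscriber without a starting version receives only the change committed after it subscribed.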
Hierarchy of mutations
Not all mutations operate on the same level, and some mutations may encapsulate others. For example, an entity upsert may contain multiple mutations within it (attribute, associated data, and price operations, etc.). The hierarchy of mutations is as follows:
- (complete listing, available in engine change capture)
- (complete listing, available in catalog schema change capture)
- (complete listing, available in catalog data change capture)
- (available in all change capture streams)
When you don't specify any filtering criteria, you receive all mutations in flattened form, that is, every mutation at every level of the hierarchy. For example, an entity attribute upsert is delivered once as part of the entity upsert mutation and once as a standalone attribute upsert mutation. In practice, a client usually wants either high-level information about entity changes (only entity mutations) or very specific low-level changes (e.g. only changes to attributes of a particular name). A single flattened stream filtered by one predicate covers both use cases and is easy to understand and implement.
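A rough sketch of the flattened stream and predicate filtering might look like this. The `Mutation` record, the kind strings and the helper methods are hypothetical and chosen only for illustration; they are not evitaDB types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical model of nested mutations; the names are illustrative, not evitaDB API.
final class FlattenedMutationSketch {

    /** A mutation that may wrap nested, lower-level mutations. */
    record Mutation(String kind, String name, List<Mutation> nested) {}

    /** Emits the mutation itself, followed by every nested mutation as a standalone item. */
    static List<Mutation> flatten(Mutation root) {
        List<Mutation> out = new ArrayList<>();
        out.add(root);
        for (Mutation child : root.nested()) {
            out.addAll(flatten(child));
        }
        return out;
    }

    /** Applies a single predicate to the flattened stream and returns "kind:name" labels. */
    static List<String> select(Mutation root, Predicate<Mutation> filter) {
        return flatten(root).stream()
            .filter(filter)
            .map(m -> m.kind() + ":" + m.name())
            .collect(Collectors.toList());
    }

    /** An entity upsert that wraps one attribute upsert and one price upsert. */
    static Mutation sampleUpsert() {
        return new Mutation("ENTITY_UPSERT", "Product", List.of(
            new Mutation("ATTRIBUTE_UPSERT", "code", List.of()),
            new Mutation("PRICE_UPSERT", "basic", List.of())
        ));
    }
}
```

With no filter (`m -> true`), `select` returns the entity upsert plus both nested mutations as separate items. A predicate such as `m -> m.kind().equals("ENTITY_UPSERT")` keeps only the high-level view, while `m -> m.kind().equals("ATTRIBUTE_UPSERT") && m.name().equals("code")` keeps only one specific low-level change.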
