A schema is the logical representation of a catalog that specifies the types of entities that can be stored and
the relationships between them. It allows you to maintain the consistency of your data and is very useful
for automatically generating web APIs on top of it.
evitaDB internally maintains a schema for each entity collection and catalog,
although it also supports a relaxed approach in which the schema is built automatically from the data
inserted into the database.
The schema is not only crucial for maintaining data consistency, but is also a key source for web API schema
generation. It allows us to create OpenAPI and GraphQL schemas. If you
pay close attention to the schema definition, you'll be rewarded with nice, understandable, and self-documented APIs.
Every single piece of information in the schema affects the way the web APIs look. For example, relation cardinality
(zero or one, exactly one, zero or more, one or more) affects whether the API marks the relation as optional, returns
a single value/object, or returns an array of them. Filterable attributes are propagated to the documented query
language blocks, while non-filterable attributes are not. The data type of an attribute determines which query
constraints can be used with that attribute, and so on. The documentation you write in the evitaDB schema
is propagated to all your APIs. You can read more about this projection in the dedicated Web API chapters of the
documentation.
Mutations and versioning
The schema can only be changed by so-called mutations. While this is a rather cumbersome approach, it has some
big advantages for the system:
a mutation represents an isolated change to the schema - the client making the schema change
sends only deltas to the server, which saves a lot of network traffic, and the server-side logic doesn't
need to resolve deltas internally
a mutation is used directly as a WAL (write-ahead log) entry - the mutation
represents an atomic operation in the transactional log that is distributed across the cluster, and it is also
the place where conflict resolution takes place (if the server receives similar mutations from two
parallel sessions, it can easily decide whether to throw a concurrent change exception - if the mutations are equal,
there is no conflict; if they are different, the first mutation is accepted and the second is rejected with an
exception)
The schema is versioned - each time a schema mutation is performed, its version number is incremented by one. If you
have two schema instances on the client side, you can easily tell if they're the same by comparing their version
number, and if not, which one is newer.
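To make the two properties above concrete, here is a minimal Java sketch under simplifying assumptions: the schema holds only attribute names and types, and the only mutation is "create attribute". Every accepted mutation produces a new immutable snapshot with the version incremented by one. An equal mutation arriving from a parallel session is a harmless no-op, and a different one is rejected. `VersionedSchema`, `CreateAttributeMutation` and `ConcurrentSchemaChangeException` are hypothetical names, not the evitaDB API, and the starting version of 1 is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the evitaDB API: an immutable schema that changes only through mutations.
final class VersionedSchema {
    private final int version;
    private final Map<String, Class<?>> attributes;

    private VersionedSchema(int version, Map<String, Class<?>> attributes) {
        this.version = version;
        this.attributes = Map.copyOf(attributes); // immutable snapshot
    }

    // Starting at version 1 is an assumption made for this illustration.
    static VersionedSchema empty() {
        return new VersionedSchema(1, Map.of());
    }

    int version() {
        return version;
    }

    boolean isNewerThan(VersionedSchema other) {
        return version > other.version;
    }

    // Applies one delta; an effective change yields a new snapshot with version + 1.
    VersionedSchema apply(CreateAttributeMutation mutation) {
        Class<?> existing = attributes.get(mutation.name());
        if (existing == null) {
            Map<String, Class<?>> next = new HashMap<>(attributes);
            next.put(mutation.name(), mutation.type());
            return new VersionedSchema(version + 1, next);
        }
        if (existing.equals(mutation.type())) {
            // An equal mutation was already applied (e.g. by a parallel session): no conflict.
            return this;
        }
        // A different mutation already won, so this one is rejected.
        throw new ConcurrentSchemaChangeException(
            "attribute `" + mutation.name() + "` already exists as " + existing.getSimpleName());
    }

    public static void main(String[] args) {
        VersionedSchema v1 = VersionedSchema.empty();
        VersionedSchema v2 = v1.apply(new CreateAttributeMutation("code", String.class));
        System.out.println(v2.version());       // 2
        System.out.println(v2.isNewerThan(v1)); // true
        System.out.println(v2.apply(new CreateAttributeMutation("code", String.class)) == v2); // true
        try {
            v2.apply(new CreateAttributeMutation("code", Integer.class));
        } catch (ConcurrentSchemaChangeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Hypothetical mutation: "create attribute `name` of type `type`".
record CreateAttributeMutation(String name, Class<?> type) {}

// Hypothetical exception raised for a conflicting parallel change.
class ConcurrentSchemaChangeException extends RuntimeException {
    ConcurrentSchemaChangeException(String message) {
        super(message);
    }
}
```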
Do you have to write mutations by hand? Hopefully not. We're aware that writing mutations is cumbersome, so our drivers
provide better support. The client drivers wrap the immutable schemas inside builder objects, so you can just call
alter methods on them and the builder generates the list of mutations at the end.
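The builder pattern can be sketched as follows, assuming a schema reduced to a map of attribute names to types. The builder never touches the immutable original. It keeps a working copy and records one mutation per effective change, so redundant calls produce no traffic. `SchemaBuilder`, `SetAttribute` and `RemoveAttribute` are hypothetical names and do not reproduce the evitaDB client driver API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical builder, not the evitaDB client API: it records one mutation per effective change.
final class SchemaBuilder {
    private final Map<String, Class<?>> working;            // schema view including pending changes
    private final List<Mutation> mutations = new ArrayList<>();

    SchemaBuilder(Map<String, Class<?>> immutableSchema) {
        this.working = new LinkedHashMap<>(immutableSchema); // the original map is never modified
    }

    SchemaBuilder withAttribute(String name, Class<?> type) {
        if (!type.equals(working.get(name))) {               // no-op calls produce no mutation
            working.put(name, type);
            mutations.add(new SetAttribute(name, type));
        }
        return this;
    }

    SchemaBuilder withoutAttribute(String name) {
        if (working.remove(name) != null) {
            mutations.add(new RemoveAttribute(name));
        }
        return this;
    }

    // The delta the client would send to the server.
    List<Mutation> toMutations() {
        return List.copyOf(mutations);
    }

    public static void main(String[] args) {
        List<Mutation> delta = new SchemaBuilder(Map.of("code", String.class))
            .withAttribute("code", String.class)   // unchanged: nothing recorded
            .withAttribute("priority", Long.class)
            .withoutAttribute("code")
            .toMutations();
        System.out.println(delta); // two mutations: set `priority`, remove `code`
    }
}

// Hypothetical mutation types.
interface Mutation {}
record SetAttribute(String name, Class<?> type) implements Mutation {}
record RemoveAttribute(String name) implements Mutation {}
```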
However, if you want to use evitaDB on a platform that is not yet covered by a specific client driver,
you have to work directly with our web APIs, which only accept mutations, and you have no option other than to write
the mutations directly or to write your own client driver. If you do, consider open-sourcing it to help the community, and let us
know about it!
The name validation logic and reserved words are present in the class
.
There is also a special property called nameVariants in the schema of each named object. It contains variants
of the object name in different "developer" notations such as camelCase, PascalCase, snake_case and so on. See
.
for a complete listing.
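As an illustration of what name variants contain, the sketch below splits a name into words and re-joins them in several notations. The splitting rules (case changes and non-alphanumeric separators) and the `Notation` enum are assumptions of this example. evitaDB's own algorithm and list of notations may differ, for instance in how it treats acronyms such as `URLCode`.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of deriving name variants; evitaDB's own rules and notation list may differ.
final class NameVariants {

    enum Notation { CAMEL_CASE, PASCAL_CASE, SNAKE_CASE, UPPER_SNAKE_CASE, KEBAB_CASE }

    // Splits "productUrlCode", "product_url_code" or "Product URL code" into lower-case words.
    static List<String> words(String name) {
        // Mark lower/digit -> upper boundaries, then split on any run of non-alphanumeric characters.
        String separated = name.replaceAll("([a-z0-9])([A-Z])", "$1 $2");
        List<String> words = new ArrayList<>();
        for (String part : separated.split("[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                words.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    static Map<Notation, String> of(String name) {
        List<String> words = words(name);
        String pascal = words.stream().map(NameVariants::capitalize).collect(Collectors.joining());
        Map<Notation, String> variants = new EnumMap<>(Notation.class);
        variants.put(Notation.PASCAL_CASE, pascal);
        variants.put(Notation.CAMEL_CASE,
            pascal.isEmpty() ? pascal : Character.toLowerCase(pascal.charAt(0)) + pascal.substring(1));
        variants.put(Notation.SNAKE_CASE, String.join("_", words));
        variants.put(Notation.UPPER_SNAKE_CASE, String.join("_", words).toUpperCase(Locale.ROOT));
        variants.put(Notation.KEBAB_CASE, String.join("-", words));
        return variants;
    }

    private static String capitalize(String word) {
        return word.isEmpty() ? word : Character.toUpperCase(word.charAt(0)) + word.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(of("productUrlCode"));
        System.out.println(of("Product URL code"));
    }
}
```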
List of mutations related to catalog
Top level mutations:
Within ModifyCatalogSchemaMutation you can use mutations:
Global attribute schema has the same structure as attribute schema except for one additional
characteristic. A global attribute can be made uniqueGlobally, which means that values of such an attribute must be
unique across all entities and entity types in the entire catalog.
Globally unique attributes are useful, for example, for entity URLs, which we naturally want to be unique among all
entities in the catalog. A globally unique attribute allows us to ask evitaDB for an entity with a specific value without
knowing its type in advance. This solves the use case when a new request arrives in your application and you need to
check whether there is an entity that matches it (whether it's a product, category, brand, group, or any other type in your project).
A global attribute can also be used as a "dictionary definition" for an attribute that is used in multiple entity
collections, and we want to make sure it's named and described the same in all of them. An entity collection cannot
define an attribute with the same name as the global attribute. It can only "use" the global attribute with that name
and thus share its complete definition.
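Here is a minimal sketch of how a globally unique attribute can serve as a catalog-wide lookup. It assumes a single in-memory map from attribute value to the owning entity's type and primary key. `CatalogUniqueIndex` and `EntityReference` are hypothetical names, not evitaDB classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical catalog-wide index for one globally unique attribute such as `url`.
final class CatalogUniqueIndex {
    private final String attributeName;
    private final Map<String, EntityReference> ownerByValue = new HashMap<>();

    CatalogUniqueIndex(String attributeName) {
        this.attributeName = attributeName;
    }

    // Rejects the value if another entity of ANY type already holds it.
    void register(String value, EntityReference owner) {
        EntityReference existing = ownerByValue.putIfAbsent(value, owner);
        if (existing != null && !existing.equals(owner)) {
            throw new IllegalStateException(
                attributeName + " `" + value + "` is already used by " + existing);
        }
    }

    // Resolves an incoming value (e.g. a request URL) without knowing the entity type up front.
    Optional<EntityReference> find(String value) {
        return Optional.ofNullable(ownerByValue.get(value));
    }

    public static void main(String[] args) {
        CatalogUniqueIndex urls = new CatalogUniqueIndex("url");
        urls.register("/en/iphone-15", new EntityReference("Product", 12));
        urls.register("/en/apple", new EntityReference("Brand", 3));
        System.out.println(urls.find("/en/apple")); // Optional[EntityReference[entityType=Brand, primaryKey=3]]
        try {
            urls.register("/en/apple", new EntityReference("Category", 7));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Which entity owns a value: its entity type and primary key.
record EntityReference(String entityType, int primaryKey) {}
```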
When primary key generation is enabled, evitaDB assigns a unique primary key number to a newly inserted entity.
The primary key always starts with 1 and is incremented by 1. evitaDB guarantees its uniqueness within the same
entity type. The primary keys generated in this way are optimal for binary operations in the data structures used.
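The key generation rule itself is simple. This hypothetical `PrimaryKeySequences` class (not part of evitaDB) keeps one counter per entity type, starting at 1.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-entity-type key sequence: starts at 1, grows by 1, independent per entity type.
final class PrimaryKeySequences {
    private final Map<String, AtomicInteger> lastKeyByType = new ConcurrentHashMap<>();

    int nextPrimaryKey(String entityType) {
        // AtomicInteger starts at 0, so the first key handed out is 1.
        return lastKeyByType.computeIfAbsent(entityType, type -> new AtomicInteger()).incrementAndGet();
    }

    public static void main(String[] args) {
        PrimaryKeySequences keys = new PrimaryKeySequences();
        System.out.println(keys.nextPrimaryKey("Product")); // 1
        System.out.println(keys.nextPrimaryKey("Product")); // 2
        System.out.println(keys.nextPrimaryKey("Brand"));   // 1 (separate sequence)
    }
}
```

Because the resulting keys are dense, small positive integers, sets of them fit well into compact, bit-oriented structures (for example `java.util.BitSet` or compressed bitmaps). In such structures, intersections and unions are cheap binary operations. This sketch does not show which structures evitaDB actually uses.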
List of mutations related to primary key
Within ModifyEntitySchemaMutation you can use mutation:
Evolution
We recommend the schema-first approach, but there are cases where you don't want to bother with the schema and just want
to insert and query the data (e.g. rapid prototyping). When a new catalog is created, it is set up
in "auto evolution" mode, where the schema adapts to the data on first insertion. If you want to control the schema
strictly, you have to limit the evolution by changing the default schema. In strict mode, evitaDB throws an exception
if the input data violates the schema.
You still need to create entity collections manually, but after that you can immediately insert
your data and the schema will be built accordingly. The existing schemas will still be validated on each entity
insertion/update - you will not be allowed to store the same attribute as a number type the first time and as a string
the next time. The first use will set up the schema, which must be respected from that moment on.
If the first entity has its primary key set, evitaDB expects all entities to have their primary key set when inserting.
If the first entity has its primary key set to NULL, evitaDB will generate primary keys for you and will reject
external primary keys. New attribute schemas are implicitly created as nullable and filterable, and attributes of
non-array data types are also created as sortable. This means that the client is immediately able to filter/sort on
almost anything, but the database itself will consume a lot of resources. The references will be created as indexed but not faceted.
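The inference rules above can be modelled roughly like this. The sketch assumes a collection that only tracks attribute types and the primary key mode. The class and its method names are hypothetical, and the sketch leaves out flags such as nullable, filterable or sortable that a real schema records. The entity is validated completely before the schema is updated, so a rejected entity leaves no trace in the schema.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of "auto evolution" for one entity collection.
final class AutoEvolvingCollection {

    enum PrimaryKeyMode { UNDECIDED, GENERATED, EXTERNAL }

    private final Map<String, Class<?>> attributeTypes = new HashMap<>();
    private PrimaryKeyMode pkMode = PrimaryKeyMode.UNDECIDED;
    private int lastGeneratedPk = 0;

    // `primaryKey` may be null; attribute values must be non-null. Returns the entity's primary key.
    int insert(Integer primaryKey, Map<String, Object> attributes) {
        // The first entity decides whether keys are generated by the database or supplied by the client.
        PrimaryKeyMode mode = pkMode != PrimaryKeyMode.UNDECIDED
            ? pkMode
            : (primaryKey == null ? PrimaryKeyMode.GENERATED : PrimaryKeyMode.EXTERNAL);
        if (mode == PrimaryKeyMode.GENERATED && primaryKey != null) {
            throw new IllegalArgumentException("keys are generated, external key " + primaryKey + " rejected");
        }
        if (mode == PrimaryKeyMode.EXTERNAL && primaryKey == null) {
            throw new IllegalArgumentException("primary key is required");
        }
        // The first use of an attribute fixes its type; later entities must respect it.
        for (Map.Entry<String, Object> attribute : attributes.entrySet()) {
            Class<?> expected = attributeTypes.get(attribute.getKey());
            Class<?> actual = attribute.getValue().getClass();
            if (expected != null && !expected.equals(actual)) {
                throw new IllegalArgumentException("attribute `" + attribute.getKey() + "` is "
                    + expected.getSimpleName() + ", got " + actual.getSimpleName());
            }
        }
        // Validation passed: commit the evolved schema.
        pkMode = mode;
        attributes.forEach((name, value) -> attributeTypes.putIfAbsent(name, value.getClass()));
        if (mode == PrimaryKeyMode.GENERATED) {
            return ++lastGeneratedPk;
        }
        return primaryKey;
    }

    public static void main(String[] args) {
        AutoEvolvingCollection products = new AutoEvolvingCollection();
        System.out.println(products.insert(null, Map.of("code", "A-1")));             // 1
        System.out.println(products.insert(null, Map.of("code", "A-2", "stock", 5))); // 2
        try {
            products.insert(null, Map.of("code", 42)); // `code` was first seen as a String
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```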
There are several partial lax modes between strict and fully automatic evolution mode - see
for details.
For example, you can strictly control the entire schema, except for new locale or currency definitions, which are
allowed to be added automatically on first use.
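The partial lax mode from the example can be sketched as a set of explicitly allowed evolution steps. An empty set means fully strict, and adding `ADDING_LOCALES` / `ADDING_CURRENCIES` lets unknown locales and currencies extend the schema on first use. `GuardedSchema` and the `Evolution` names are illustrative assumptions, not evitaDB's actual API.

```java
import java.util.Currency;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: a schema that is strict except for the evolution steps explicitly allowed.
final class GuardedSchema {

    // Illustrative permission names only; evitaDB's actual evolution modes may be named differently.
    enum Evolution { ADDING_ATTRIBUTES, ADDING_LOCALES, ADDING_CURRENCIES }

    private final Set<Evolution> allowedEvolution;
    private final Set<Locale> locales;
    private final Set<Currency> currencies;

    GuardedSchema(Set<Evolution> allowedEvolution, Set<Locale> locales, Set<Currency> currencies) {
        this.allowedEvolution = Set.copyOf(allowedEvolution); // empty set = fully strict
        this.locales = new HashSet<>(locales);
        this.currencies = new HashSet<>(currencies);
    }

    void useLocale(Locale locale) {
        admit(locales, locale, Evolution.ADDING_LOCALES);
    }

    void useCurrency(Currency currency) {
        admit(currencies, currency, Evolution.ADDING_CURRENCIES);
    }

    Set<Locale> locales() {
        return Set.copyOf(locales);
    }

    // Known values pass; unknown ones are added only if the matching evolution step is allowed.
    private <T> void admit(Set<T> known, T value, Evolution required) {
        if (known.contains(value)) {
            return;
        }
        if (!allowedEvolution.contains(required)) {
            throw new IllegalArgumentException(value + " is not allowed by the schema");
        }
        known.add(value); // the schema evolves on first use
    }

    public static void main(String[] args) {
        // Strict everywhere except new locales and currencies.
        GuardedSchema lax = new GuardedSchema(
            EnumSet.of(Evolution.ADDING_LOCALES, Evolution.ADDING_CURRENCIES),
            Set.of(Locale.ENGLISH), Set.of(Currency.getInstance("EUR")));
        lax.useLocale(Locale.GERMAN);
        System.out.println(lax.locales().contains(Locale.GERMAN)); // true

        GuardedSchema strict = new GuardedSchema(
            Set.of(), Set.of(Locale.ENGLISH), Set.of(Currency.getInstance("EUR")));
        try {
            strict.useCurrency(Currency.getInstance("CZK"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```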
List of mutations related to evolution mode
Within ModifyEntitySchemaMutation you can use mutations:
Locales and currencies
The schema specifies a list of allowed currencies and locales. We assume that the list of allowed currencies / locales
will be relatively small (a few units, at most low tens) and if the system knows them in advance, it can generate enums
for each of them in the web APIs. This helps developers write queries with auto-completion. There is another positive
effect: e-commerce systems rarely extend the list of used currencies or locales (because there are usually a lot
of manual operations involved), and having the allowed set guarded by the system eliminates the possibility of inserting
invalid prices or localizations by mistake.
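To show why knowing the allowed values in advance pays off, the following hypothetical generator turns the sets of locales and currencies into GraphQL enum type definitions (GraphQL SDL syntax). The type names `LocaleEnum` and `CurrencyEnum` and the `en_US` style of locale values are assumptions of this sketch. The names evitaDB generates may differ.

```java
import java.util.Collection;
import java.util.Currency;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical generator turning the schema's allowed values into GraphQL enum type definitions.
// The type names `LocaleEnum` / `CurrencyEnum` are made up; evitaDB's generated names may differ.
final class EnumTypeGenerator {

    // GraphQL enum values must match [_A-Za-z][_0-9A-Za-z]*, so "en-US" becomes "en_US".
    static String localeEnum(Set<Locale> locales) {
        return enumType("LocaleEnum",
            locales.stream().map(locale -> locale.toLanguageTag().replace('-', '_')).toList());
    }

    static String currencyEnum(Set<Currency> currencies) {
        return enumType("CurrencyEnum", currencies.stream().map(Currency::getCurrencyCode).toList());
    }

    // Sorting keeps the generated schema stable regardless of set iteration order.
    private static String enumType(String typeName, Collection<String> values) {
        return values.stream()
            .sorted()
            .map(value -> "  " + value)
            .collect(Collectors.joining("\n", "enum " + typeName + " {\n", "\n}"));
    }

    public static void main(String[] args) {
        System.out.println(currencyEnum(Set.of(Currency.getInstance("EUR"), Currency.getInstance("CZK"))));
        System.out.println(localeEnum(Set.of(Locale.forLanguageTag("en-US"), Locale.forLanguageTag("cs"))));
    }
}
```

In this sketch, adding a new locale or currency changes the generated API schema. That is the reason for assuming a small, stable set of such values, while high-cardinality data such as price lists stays out of the enums.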
The price lists are closer to "data" than locales or currencies. The set of price lists is expected to change very
often, and their number can be very high (thousands, tens of thousands). It wouldn't be practical to generate
enumeration values for them and change the Web API schemas every time a price list is added or removed.
List of mutations related to locales & currencies
Within ModifyEntitySchemaMutation you can use mutations:
Hierarchy placement
When hierarchy placement is enabled, entities of this type can form a tree structure. Each entity can have a maximum
of one parent node and zero or more child entities. Neither the depth of the tree nor the number of siblings at each
level is limited.
Enabling hierarchy placement implies the creation of a new
for the involved
entity type. When another entity references a hierarchy entity and the reference is marked as indexed, the special
is created for each hierarchical entity. This index will
hold reduced attribute and price indices of the referencing entity, allowing quick evaluation of
withinHierarchy filter conditions.
Orphan hierarchy nodes
The typical problem associated with creating a tree structure is the order in which nodes are attached to it. In
order to have a consistent tree, one should start from the root nodes and gradually descend along the axis of their
children. This isn't always easy to do when we need to copy an existing tree to an external system (for scripting
purposes, it's much easier and more efficient to index in batch using the natural order of records). A similar
situation arises when an intermediate tree node needs to be removed, but its children do not. We could force developers to
rewire children to different parents before removing their parent, but they often don't have direct control over the
order of operations and can't easily do that.
That's why evitaDB recognizes so-called orphan hierarchy nodes. An orphan node is a node that declares itself to be
a child of a parent node with a primary key that evitaDB doesn't know yet (or of a parent that is itself an orphan node). Orphan
nodes do not participate in the evaluation of queries on hierarchical structures,
but are present in the index. When the node with the referenced primary key is attached to the main hierarchy tree, its
orphan nodes (sub-trees) are attached as well. In this way, the hierarchy tree eventually becomes consistent.
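The orphan handling can be sketched as follows. The sketch assumes an insert-only index (no moves or removals). A node is attached when it is a root or when its parent is already attached, and attaching a node also pulls in every orphan sub-tree that was waiting for it. `HierarchyIndex` and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, insert-only hierarchy index that tolerates children arriving before their parents.
final class HierarchyIndex {
    private final Set<Integer> known = new HashSet<>();                      // every indexed node
    private final Map<Integer, List<Integer>> childrenOf = new HashMap<>(); // parent pk -> children (parent may be unknown)
    private final Set<Integer> attached = new HashSet<>();                  // nodes connected to a root

    void addNode(int pk, Integer parentPk) {
        known.add(pk);
        if (parentPk != null) {
            childrenOf.computeIfAbsent(parentPk, key -> new ArrayList<>()).add(pk);
        }
        // A root, or a child of an attached node, joins the tree immediately together with any
        // orphans that were waiting for it; otherwise the node stays an orphan.
        if (parentPk == null || attached.contains(parentPk)) {
            attachSubtree(pk);
        }
    }

    // Orphans are present in the index but would be skipped by hierarchy queries.
    boolean isOrphan(int pk) {
        return known.contains(pk) && !attached.contains(pk);
    }

    private void attachSubtree(int root) {
        Deque<Integer> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            int node = pending.pop();
            if (attached.add(node)) {
                for (int child : childrenOf.getOrDefault(node, List.of())) {
                    pending.push(child);
                }
            }
        }
    }

    public static void main(String[] args) {
        HierarchyIndex categories = new HierarchyIndex();
        categories.addNode(3, 2);    // parent 2 unknown yet: orphan
        categories.addNode(4, 3);    // child of an orphan: orphan too
        categories.addNode(1, null); // root
        System.out.println(categories.isOrphan(4)); // true
        categories.addNode(2, 1);    // attaching 2 pulls in the waiting sub-tree 3 -> 4
        System.out.println(categories.isOrphan(4)); // false
    }
}
```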
List of mutations related to hierarchy placement
Within ModifyEntitySchemaMutation you can use mutation:
Prices
When prices are enabled, entities of this type can have a set of prices associated with them and can be
filtered and sorted by price constraints. A single entity
can have zero or more prices (the system is designed for situations where an entity has tens or hundreds of prices attached
to it). For each combination of priceList and currency there is a special
.
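Conceptually, the per-combination index groups prices by a (price list, currency) key, so a query restricted to one price list and currency only touches one bucket. The following is a hypothetical sketch of that grouping (`PriceIndexes`, `Price` and `PriceIndexKey` are made-up names), not the structure evitaDB actually uses.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Currency;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: prices bucketed by (price list, currency), one bucket per distinct pair.
final class PriceIndexes {
    private final Map<PriceIndexKey, List<Price>> buckets = new HashMap<>();

    void add(Price price) {
        buckets.computeIfAbsent(new PriceIndexKey(price.priceList(), price.currency()), key -> new ArrayList<>())
            .add(price);
    }

    // A query for "price list `basic` in EUR" touches a single bucket instead of scanning all prices.
    List<Price> find(String priceList, Currency currency) {
        return buckets.getOrDefault(new PriceIndexKey(priceList, currency), List.of());
    }

    int bucketCount() {
        return buckets.size();
    }

    public static void main(String[] args) {
        Currency eur = Currency.getInstance("EUR");
        Currency czk = Currency.getInstance("CZK");
        PriceIndexes indexes = new PriceIndexes();
        indexes.add(new Price(1, "basic", eur, new BigDecimal("10.00")));
        indexes.add(new Price(1, "basic", czk, new BigDecimal("250.00")));
        indexes.add(new Price(2, "basic", eur, new BigDecimal("12.50")));
        System.out.println(indexes.bucketCount());             // 2
        System.out.println(indexes.find("basic", eur).size()); // 2
    }
}

record Price(int entityPrimaryKey, String priceList, Currency currency, BigDecimal amount) {}
record PriceIndexKey(String priceList, Currency currency) {}
```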
List of mutations related to prices
Within ModifyEntitySchemaMutation you can use mutation:
Attributes
An entity type can have zero or more attributes. The system is designed for situations where an entity has tens of
attributes. You should pay attention to the number of filterable / sortable / unique attributes. There is a
separate instance of
for each filterable
attribute, for each
sortable attribute and
or for each
unique attribute. Attributes that are neither filterable / sortable / unique don't consume operating memory.
Attribute schema can be marked as localized, meaning that it only makes sense in a specific locale.
Attribute schema can be made deprecated, which will be propagated to generated web API documentation.
List of mutations related to attribute
Within ModifyEntitySchemaMutation you can use mutation:
Default value
An attribute may have a default value defined. The value is used when a new entity is created and no value has been
assigned to a particular attribute. There is no other situation where the default value matters.
Allowed decimal places
The allowed decimal places setting is an optimization that allows rich numeric types (such
as
type, which is much more
compact and can be used for fast binary searches in array/bitset representation. The original rich format is still
present in an attribute container, but internally the database uses the primitive form when an attribute is part of
filter or sort conditions.
If a number cannot be converted to the compact form (for example, it has more digits in the fractional part than allowed),
an exception is thrown and the entity update is refused.
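The conversion can be illustrated with `java.math.BigDecimal`. With two allowed decimal places, `12.5` becomes the integer `1250`, while `12.345` is rejected because rounding would lose information. This is a hypothetical helper. It stores the compact form as a `long`, which is an assumption of this sketch rather than the type evitaDB uses.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical sketch of "allowed decimal places": a BigDecimal is stored as a scaled long so that
// indexes can compare primitives. The choice of `long` is an assumption of this sketch.
final class DecimalCompaction {

    static long toCompact(BigDecimal value, int allowedDecimalPlaces) {
        try {
            // UNNECESSARY makes setScale fail instead of silently rounding digits away.
            BigDecimal scaled = value.setScale(allowedDecimalPlaces, RoundingMode.UNNECESSARY);
            // The unscaled value is the number without its decimal point: 12.50 -> 1250.
            return scaled.unscaledValue().longValueExact();
        } catch (ArithmeticException e) {
            throw new IllegalArgumentException(
                value + " does not fit into " + allowedDecimalPlaces + " decimal places", e);
        }
    }

    static BigDecimal fromCompact(long compact, int allowedDecimalPlaces) {
        return BigDecimal.valueOf(compact, allowedDecimalPlaces);
    }

    public static void main(String[] args) {
        System.out.println(toCompact(new BigDecimal("12.5"), 2)); // 1250
        System.out.println(fromCompact(1250, 2));                 // 12.50
        try {
            toCompact(new BigDecimal("12.345"), 2);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Comparing two compact `long` values gives the same ordering as comparing the original decimals, as long as both use the same number of decimal places. This property is what makes the primitive form usable for filtering and sorting.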
Sortable attribute compounds
A sortable attribute compound is a virtual attribute composed of the values of several other attributes, which can only be
used for sorting. evitaDB requires a previously prepared sort index to be able to sort entities. This makes sorting
much faster than ad-hoc sorting by attribute value. Also, the sorting mechanism of evitaDB is somewhat different from
what you might be used to. If you sort entities by two attributes in an orderBy clause of the query, evitaDB sorts
them first by the first attribute (if present) and then by the second (but only those where the first attribute is
missing). If two entities have the same value of the first attribute, they are not sorted by the second attribute, but
by the primary key (in ascending order). If we want to use fast "pre-sorted" indexes, there is no other way to do it,
because the secondary order would not be known until query time.
This default sorting behavior by multiple attributes is not always desirable, so evitaDB allows you to define a sortable
attribute compound, which is a virtual attribute composed of the values of several other attributes. evitaDB also allows
you to specify the order of the "pre-sorting" behavior (ascending/descending) for each of these attributes, and also
the behavior for NULL values (first/last) if the attribute is completely missing in the entity. The sortable attribute
compound is then used in the orderBy clause of the query instead of specifying the multiple individual attributes to
achieve the expected sorting behavior while maintaining the speed of the "pre-sorted" indexes.
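The difference between the two behaviors can be reproduced on in-memory data. This hypothetical sketch sorts products that have two optional attributes, `brand` and `rank`. It first uses the default per-attribute ordering and then a compound `brand ASC NULLS LAST, rank DESC NULLS FIRST`. The class and attribute names are illustrative. Where entities lacking both attributes end up in the default ordering is an assumption, and evitaDB itself relies on pre-sorted indexes rather than sorting at query time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical comparison of the two behaviors described above on in-memory data;
// evitaDB itself uses pre-sorted indexes rather than sorting at query time.
final class CompoundSortingDemo {

    // Default behavior: each attribute orders only the entities not placed by an earlier attribute,
    // and ties are broken by primary key (ascending).
    static List<Integer> defaultOrder(List<Product> products) {
        List<Integer> result = new ArrayList<>();
        products.stream()
            .filter(p -> p.brand() != null)
            .sorted(Comparator.comparing(Product::brand).thenComparingInt(Product::pk))
            .forEach(p -> result.add(p.pk()));
        products.stream()
            .filter(p -> p.brand() == null && p.rank() != null)
            .sorted(Comparator.comparing(Product::rank).thenComparingInt(Product::pk))
            .forEach(p -> result.add(p.pk()));
        // Assumption of this sketch: entities lacking both attributes come last, by primary key.
        products.stream()
            .filter(p -> p.brand() == null && p.rank() == null)
            .sorted(Comparator.comparingInt(Product::pk))
            .forEach(p -> result.add(p.pk()));
        return result;
    }

    // Compound "brand ASC NULLS LAST, rank DESC NULLS FIRST": one key compared lexicographically.
    static List<Integer> compoundOrder(List<Product> products) {
        Comparator<Product> compound = Comparator
            .comparing(Product::brand, Comparator.nullsLast(Comparator.<String>naturalOrder()))
            .thenComparing(Product::rank, Comparator.nullsFirst(Comparator.<Integer>reverseOrder()))
            .thenComparingInt(Product::pk);
        return products.stream().sorted(compound).map(Product::pk).toList();
    }

    public static void main(String[] args) {
        List<Product> products = List.of(
            new Product(1, "B", 5), new Product(2, "A", 1), new Product(3, null, 9),
            new Product(4, "A", 7), new Product(5, null, 2), new Product(6, null, null));
        System.out.println(defaultOrder(products));  // [2, 4, 1, 5, 3, 6]
        System.out.println(compoundOrder(products)); // [4, 2, 1, 6, 3, 5]
    }
}

// Hypothetical entity with two optional sortable attributes.
record Product(int pk, String brand, Integer rank) {}
```

Products 2 and 4 share the brand `A`. The default ordering breaks the tie by primary key and ignores `rank`, while the compound orders them by `rank` descending.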
Sortable attribute compound schema can be made deprecated, which will be propagated to generated web API documentation.
List of mutations related to sortable attribute compound
Within ModifyEntitySchemaMutation you can use mutation:
The sortable attribute compound schema is described by:
Associated data
An entity type may have zero or more associated data. The system is designed for the situation when an entity has
tens of associated data.
Associated data schema can be marked as localized, meaning that it only makes sense in a specific locale.
Associated data schema can be made deprecated, which will be propagated to generated web API documentation.
List of mutations related to associated data
Within ModifyEntitySchemaMutation you can use mutation:
Reference
An entity type may have zero or more references. References can be managed or unmanaged. The managed references refer
to entities within the same catalog and can be checked for consistency by evitaDB. The unmanaged references refer
to entities that are managed by external systems outside the scope of evitaDB. An entity can have a self-reference
that refers to the same entity type. An entity type can have several references to the same entity type.
References can have zero or more attributes that apply only to a particular "link" between these two entity instances.
A global attribute cannot be used as a reference attribute. Otherwise, the same rules apply
for reference attributes as for regular entity attributes.
References are unidirectional in nature, which means that if the reference points from entity A to entity B, it does
not mean that entity B automatically references entity A. It is possible to set up a bi-directional reference by creating
a so-called "reflected reference" on the other entity type and identifying the original reference that should be reflected.
The reflected reference may or may not inherit attributes from the original reference, and it may also define its own
separate attributes. This can be described by the following ERD diagram:
erDiagram
A ||--o{ A_to_B : references
B ||--o{ A_to_B : references
A_to_B {
string A1
string A2
}
B ||--o{ B_to_A : references
A ||--o{ B_to_A : references
B_to_A {
string A1
string B2
}
Reflected references are automatically created, updated, and removed when the original reference is manipulated. It also
works the other way around - when the reflected reference is manipulated, the original reference is updated.
There is a subtle difference between the original reference and the reflected reference. The original reference can
exist even if the referenced entity does not (yet) exist (the reference is orphaned). On the other hand, when you create
a reflected reference, the referenced entity must exist. This is because the reflected reference immediately creates
the original reference, and the original reference must have a valid target. This behavior is needed to maintain
consistency when moving entities between different scopes that treat original and reflected references
differently.
If the reference contains an attribute that is not defined on the other side and the reference is created, the missing
attribute on the other side is created with its default value (if no default value is defined, an exception is thrown).
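The rules above can be summarized in a small model of one reference "A to B" and its reflection, mirroring the ERD: `A_to_B` defines `A1` and `A2`, and `B_to_A` inherits `A1` and adds `B2`. In this sketch, shared attributes are copied across, and attributes missing on the other side get their default value or cause an exception. An original reference to a missing B is stored without a reflection, and a reflected reference to a missing A is rejected. All class and method names are hypothetical, and the sketch does not model what happens when a missing entity appears later.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of one reference "A -> B" and its reflected counterpart "B -> A".
// It does not model what happens when a missing entity appears later.
final class ReflectedReferences {

    private record Link(int aPk, int bPk) {}

    private final Set<Integer> entitiesA = new HashSet<>();
    private final Set<Integer> entitiesB = new HashSet<>();
    private final Map<Link, Map<String, Object>> originals = new HashMap<>(); // stored on A
    private final Map<Link, Map<String, Object>> reflected = new HashMap<>(); // stored on B
    private final Set<String> originalAttributes;  // attributes defined on A's side
    private final Set<String> reflectedAttributes; // attributes defined on B's side
    private final Map<String, Object> defaults;    // default values by attribute name

    ReflectedReferences(Set<String> originalAttributes, Set<String> reflectedAttributes,
                        Map<String, Object> defaults) {
        this.originalAttributes = Set.copyOf(originalAttributes);
        this.reflectedAttributes = Set.copyOf(reflectedAttributes);
        this.defaults = Map.copyOf(defaults);
    }

    void addEntityA(int pk) { entitiesA.add(pk); }
    void addEntityB(int pk) { entitiesB.add(pk); }

    // The original may point to a B that does not exist (yet); then no reflection is created.
    void createOriginal(int aPk, int bPk, Map<String, Object> attributes) {
        Link link = new Link(aPk, bPk);
        Map<String, Object> mirror = entitiesB.contains(bPk) ? complete(reflectedAttributes, attributes) : null;
        originals.put(link, Map.copyOf(attributes));
        if (mirror != null) {
            reflected.put(link, mirror);
        }
    }

    // A reflected reference immediately creates the original one, so entity A must exist.
    void createReflected(int bPk, int aPk, Map<String, Object> attributes) {
        if (!entitiesA.contains(aPk)) {
            throw new IllegalStateException("referenced entity A#" + aPk + " does not exist");
        }
        Link link = new Link(aPk, bPk);
        Map<String, Object> mirror = complete(originalAttributes, attributes);
        reflected.put(link, Map.copyOf(attributes));
        originals.put(link, mirror);
    }

    Map<String, Object> originalOf(int aPk, int bPk) { return originals.get(new Link(aPk, bPk)); }
    Map<String, Object> reflectedOf(int bPk, int aPk) { return reflected.get(new Link(aPk, bPk)); }

    // Shared attributes are copied; attributes missing on the other side get their default or fail.
    private Map<String, Object> complete(Set<String> otherSide, Map<String, Object> supplied) {
        Map<String, Object> result = new HashMap<>();
        for (String name : otherSide) {
            Object value = supplied.containsKey(name) ? supplied.get(name) : defaults.get(name);
            if (value == null) {
                throw new IllegalStateException("attribute `" + name + "` has no value and no default");
            }
            result.put(name, value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Mirrors the ERD above: A_to_B {A1, A2}, B_to_A {A1 (inherited), B2}.
        ReflectedReferences refs = new ReflectedReferences(
            Set.of("A1", "A2"), Set.of("A1", "B2"), Map.of("A2", "-", "B2", 0));
        refs.addEntityA(1);
        refs.addEntityB(10);
        refs.createOriginal(1, 10, Map.of("A1", "x", "A2", "y"));
        System.out.println(refs.reflectedOf(10, 1)); // A1=x, B2=0 (default)
        refs.createOriginal(1, 99, Map.of("A1", "x", "A2", "y")); // B#99 missing: no reflection
        System.out.println(refs.reflectedOf(99, 1)); // null
        try {
            refs.createReflected(10, 42, Map.of("A1", "z", "B2", 5)); // A#42 missing
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```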
When another entity references an entity and the reference is marked as indexed, the special
is created for each referenced entity. This index will
hold reduced attribute and price indices of the referencing entity, allowing quick evaluation of
referencedEntityHaving filter conditions and
referenceProperty sorting.
If the reference is marked as faceted, the special
is created for
the entity type. This index contains optimized data structures for facet summary
computation. All reference instances of a given type are then inserted into the facet reference index (there is no
way to exclude a reference from indexing in the facet reference index). References can (but don't have to) be organized
into facet groups that refer to a managed or non-managed entity type.
Each reference schema has a certain cardinality. The cardinality describes the expected number of relations of this
type. In evitaDB we define only one-way relations from the perspective of the entity. We follow the ERD modeling
standards. Cardinality affects the design of the Web API schemas
(returning only single references or arrays) and also helps us to protect the consistency of the data so that it
conforms to the creator's mental model.
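The following hypothetical enum (not the evitaDB class) shows how the four cardinalities listed at the start of this chapter translate into both concerns mentioned above. It derives the shape of the web API field (single value or array, optional or required) and checks the number of references an entity actually has.

```java
// Hypothetical enum for the four cardinalities listed earlier in this chapter (not the evitaDB class).
enum Cardinality {
    ZERO_OR_ONE(0, 1),
    EXACTLY_ONE(1, 1),
    ZERO_OR_MORE(0, Integer.MAX_VALUE),
    ONE_OR_MORE(1, Integer.MAX_VALUE);

    private final int min;
    private final int max;

    Cardinality(int min, int max) {
        this.min = min;
        this.max = max;
    }

    // Web API projection: a single object or an array?
    boolean isArray() {
        return max > 1;
    }

    // Web API projection: may the value be missing / empty?
    boolean isOptional() {
        return min == 0;
    }

    // Consistency check for the number of references of this type on one entity.
    void validate(String referenceName, int count) {
        if (count < min || count > max) {
            throw new IllegalArgumentException(
                "reference `" + referenceName + "` expects " + this + " but entity has " + count);
        }
    }

    public static void main(String[] args) {
        for (Cardinality c : values()) {
            System.out.println(c + ": array=" + c.isArray() + ", optional=" + c.isOptional());
        }
        EXACTLY_ONE.validate("brand", 1);
        try {
            EXACTLY_ONE.validate("brand", 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```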
List of mutations related to reference
Within ModifyEntitySchemaMutation you can use mutation:
The ModifyReferenceAttributeSchemaMutation expects a nested attribute mutation.
Scopes
Scopes are separate areas of memory where entity indexes are stored. They separate live data from archived
data and are used to handle so-called "soft deletes" - the application can choose between a hard delete and
archiving the entity, which simply moves the entity to the archive scope. The reasons for this feature are explained in
the dedicated blog post.
By default, archived entities have no indexes other than the primary key index. This is because archived entities are
not normally queried and are only looked up by their primary key. By not maintaining the indexes of archived entities,
we save memory and CPU resources. Because there may be cases where you want to query the archived entities, you
have full control over which indexes are maintained in the archive scope when you define the entity schema. Note that
the more indexes you maintain, the more memory and CPU resources you will consume as the list of archived entities grows.
Changes in reference behavior
When you move an entity from one scope to another, the original references are retained, while the reflected references
are removed if either of the following conditions is met:
the reflected reference schema is not marked as indexed in the target scope
the primary reference schema (i.e. the original reference being reflected) is not marked as indexed in the target scope.
Reflected references are something that is maintained by the evitaDB engine, and it requires appropriate indexes to be
present in the target scope in order to work. By default, the archive scope does not maintain any indexes other than
the primary key and a few others explicitly specified by you in the entity schema.
Therefore, the reflected references are usually removed when the entity is moved to the archive scope. The engine can
recreate them if the entity is moved back to the live scope where appropriate indexes exist.
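The retention rule for reflected references reduces to a simple predicate. In this hypothetical sketch, a reference schema is reduced to the set of scopes in which it is indexed. The `Scope` values, the reference names, and the "indexed only in the live scope" setup used in the demo are illustrative assumptions.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the retention rule for reflected references when an entity changes scope.
final class ScopeMove {

    enum Scope { LIVE, ARCHIVED }

    // Minimal stand-in for a reference schema: only the scopes in which it is indexed.
    record ReferenceSchema(String name, Set<Scope> indexedIn) {}

    // Original references are always kept. A reflected reference is kept only if both its own schema
    // and the schema of the reference it reflects are indexed in the target scope.
    static boolean keepsReflectedReference(ReferenceSchema reflected, ReferenceSchema original, Scope target) {
        return reflected.indexedIn().contains(target) && original.indexedIn().contains(target);
    }

    public static void main(String[] args) {
        // Assumed setup: both reference schemas are indexed only in the live scope.
        ReferenceSchema productCategory = new ReferenceSchema("productCategory", EnumSet.of(Scope.LIVE));
        ReferenceSchema categoryProducts = new ReferenceSchema("categoryProducts", EnumSet.of(Scope.LIVE));
        System.out.println(keepsReflectedReference(categoryProducts, productCategory, Scope.ARCHIVED)); // false
        System.out.println(keepsReflectedReference(categoryProducts, productCategory, Scope.LIVE));     // true
    }
}
```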