Data model
This article describes the structure of a database entity (the counterpart of a record in a relational database or a document in some NoSQL databases). Understanding the entity structure is crucial for working with evitaDB.
Terms used in this document
- facet
- A facet is an entity property used for quick filtering of entities by the user. It is displayed as a checkbox in the filter bar, or as a slider when there are many distinct numerical values. Facets help the customer narrow down the current category list, manufacturer list, or full-text search results. Without them, the customer would have to go through dozens of pages of results and would probably be forced to look for a subcategory or a better search phrase. That is frustrating, and facets make the process easier: with a few clicks, the user can narrow down the results to the relevant ones. The key is to provide enough information to guide the user to the most relevant facet combinations. It is very helpful to disable facets as soon as they would cause no results to be returned, or even to inform the user that selecting a particular facet would narrow the results to very few records and severely limit their freedom of choice.
- facet group
- A facet group is used to group facets of the same type. The facet group controls the mechanism of facet filtering: it lets you define whether facets in the group are combined with a Boolean OR or AND relation when used in filtering. It also lets you define how the facet group is combined with other facet groups in the same query (i.e. AND, OR, NOT). This Boolean logic affects the facet statistics calculation and is a crucial part of facet evaluation.
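The Boolean relations described for facet groups can be illustrated with a small sketch. It is not evitaDB's API: the `FacetFilterSketch` class, the `matches` method, and the sample data are hypothetical, and the sketch assumes one particular setup in which facets inside a group are combined with OR and the groups themselves are combined with AND.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not evitaDB API: facets inside one group are OR-ed,
// and the groups themselves are AND-ed.
final class FacetFilterSketch {

    // entityFacets: facet group name -> facets the entity has in that group
    // selected:     facet group name -> facets the user selected in that group
    static boolean matches(Map<String, Set<String>> entityFacets,
                           Map<String, Set<String>> selected) {
        for (Map.Entry<String, Set<String>> group : selected.entrySet()) {
            if (group.getValue().isEmpty()) {
                continue; // nothing selected in this group, so it does not restrict
            }
            Set<String> owned = entityFacets.getOrDefault(group.getKey(), Set.of());
            // OR inside the group: at least one selected facet must be present
            boolean anyMatch = group.getValue().stream().anyMatch(owned::contains);
            // AND between groups: every group with a selection must match
            if (!anyMatch) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> phone = Map.of(
            "color", Set.of("black"),
            "brand", Set.of("acme"));
        Map<String, Set<String>> selection = Map.of(
            "color", Set.of("black", "white"),
            "brand", Set.of("acme"));
        System.out.println(matches(phone, selection));
    }
}
```

Switching a group to AND would mean replacing `anyMatch` with `allMatch` for that group, which is why the chosen relation directly changes how many entities each facet would return.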
The evitaDB data model consists of three layers:
- catalog
- entity collection
- entity (data)
Catalog
Collection
Collections in evitaDB are not isolated and entities in them can be related to entities in different collections. Currently, the relationships are only unidirectional.
Although evitaDB requires a schema for each entity type, it supports automatic evolution if you allow it. If you don't specify otherwise, evitaDB learns about entity attributes, their data types and all necessary relations as you add new data. Once the attributes, associated data or other contours of the entity are known, they are enforced by evitaDB. This mechanism is somewhat similar to the schema-less approach, but results in a much more consistent data store.
Entity
Minimal entity definition consists of:
Entity type
Primary key
We chose this library for two main reasons:
- it allows us to store int arrays in a more compressed format than a simple array of primitive integers,
- and it contains algorithms for fast Boolean operations on these integer sets.
Working with long (64-bit) integers instead would have two drawbacks:
- it uses twice as much memory,
- it's much slower for Boolean operations.
Since evitaDB is an in-memory database, we expect that the number of entities will not exceed two billion.
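To illustrate why integer primary keys suit set-based Boolean operations, here is a minimal sketch that uses `java.util.BitSet` from the Java standard library. It is a stand-in for illustration only and is not the library the text refers to; the class name `IntSetSketch` and the sample keys are hypothetical.

```java
import java.util.BitSet;

// Illustration only: java.util.BitSet stands in for a compressed integer set.
final class IntSetSketch {

    // Builds a bit set in which each set bit represents one primary key.
    static BitSet of(int... primaryKeys) {
        BitSet set = new BitSet();
        for (int pk : primaryKeys) {
            set.set(pk);
        }
        return set;
    }

    // AND: keys present in both sets (two conditions that must both hold).
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    // OR: keys present in at least one of the sets.
    static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet inStock = of(1, 2, 5, 8);
        BitSet onSale = of(2, 3, 8, 13);
        System.out.println(and(inStock, onSale)); // prints {2, 8}
        System.out.println(or(inStock, onSale));  // prints {1, 2, 3, 5, 8, 13}
    }
}
```

Each filtering condition yields a set of matching primary keys, and combining conditions reduces to bitwise operations over those sets.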
Hierarchical placement
Entities can be organized hierarchically. This means that an entity can refer to a single parent entity and can be referred to by multiple child entities. A hierarchy always consists of entities of the same type.
Each entity can be part of at most one hierarchy (tree).
Most e-commerce systems organize their products in a hierarchical category system. The categories are the source for the catalog menus, and when users browse a category, they usually see the products of the entire subtree of that category. That's why hierarchies are directly supported by evitaDB.
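The subtree behaviour described above can be sketched as follows. This is not evitaDB's implementation; the `HierarchySketch` class, the `subtree` method, and the child-to-parent map are hypothetical and only show how a single parent reference per entity is enough to collect a whole category subtree.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: each entity refers to at most one parent of the same type.
final class HierarchySketch {

    // parentOf: child primary key -> parent primary key (root entities are absent)
    static Set<Integer> subtree(Map<Integer, Integer> parentOf, int root) {
        // Invert the mapping to parent -> children.
        Map<Integer, List<Integer>> childrenOf = new HashMap<>();
        parentOf.forEach((child, parent) ->
            childrenOf.computeIfAbsent(parent, k -> new ArrayList<>()).add(child));

        // Depth-first walk that collects the root and all of its descendants.
        Set<Integer> result = new TreeSet<>();
        Deque<Integer> toVisit = new ArrayDeque<>();
        toVisit.push(root);
        while (!toVisit.isEmpty()) {
            int current = toVisit.pop();
            if (result.add(current)) {
                for (int child : childrenOf.getOrDefault(current, List.of())) {
                    toVisit.push(child);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Categories 2 and 3 are children of 1, category 4 is a child of 2,
        // and category 6 is a child of the separate root 5.
        Map<Integer, Integer> parentOf = Map.of(2, 1, 3, 1, 4, 2, 6, 5);
        System.out.println(subtree(parentOf, 1)); // prints [1, 2, 3, 4]
    }
}
```

Listing the products of category 1 then means selecting products that reference any category in the returned set.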
Attributes (unique, filterable, sortable, localized)
The entity attributes allow you to define a set of data to be fetched in bulk along with the entity body. Each attribute schema can be marked as filterable to allow filtering by it, or sortable to allow sorting by it.
Attributes added by the automatic schema evolution mechanism are automatically marked as filterable / sortable, so that the "I don't care" approach to the schema is easy and "just works". However, filterable and sortable attributes require indexes that evitaDB keeps entirely in memory, so indexing every attribute wastes resources. Therefore, we recommend using the schema-first approach and marking as filterable / sortable only those attributes that are really used for filtering / sorting.
Attributes are also recommended for frequently used data that accompanies the entity (for example "name", "perex", "main motive"), even if you don't necessarily need it for filtering / sorting purposes. evitaDB stores and fetches all attributes in a single block, so keeping this frequently used data in attributes reduces the overall I/O.
Localized attributes
Data types in attributes
Sortable attribute compounds
Sortable attribute compounds are not inserted into an entity. The database creates them automatically when an entity is inserted and maintains the index for the defined entity / reference attribute values. Attribute compounds can only be used to sort entities, in the same way as an attribute.
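The ordering that such a compound represents can be sketched with a chained comparator. This is an illustration only, not evitaDB's API or index structure: the `CompoundSortSketch` class, the `Product` record, and the chosen attributes (`brand` ascending, then `rating` descending) are hypothetical, and the sketch sorts on demand, whereas the database maintains an index. It assumes Java 16 or newer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the ordering a compound of two attributes expresses.
final class CompoundSortSketch {

    record Product(int primaryKey, String brand, int rating) {}

    // Compound order: brand ascending, then rating descending.
    static final Comparator<Product> BRAND_THEN_RATING =
        Comparator.comparing(Product::brand)
                  .thenComparing(Product::rating, Comparator.reverseOrder());

    // Returns the primary keys in compound order without modifying the input.
    static List<Integer> sortedKeys(List<Product> products) {
        List<Product> copy = new ArrayList<>(products);
        copy.sort(BRAND_THEN_RATING);
        return copy.stream().map(Product::primaryKey).toList();
    }

    public static void main(String[] args) {
        List<Product> products = List.of(
            new Product(1, "acme", 3),
            new Product(2, "acme", 5),
            new Product(3, "bolt", 4));
        System.out.println(sortedKeys(products)); // prints [2, 1, 3]
    }
}
```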
Associated data
Localized associated data
References
References are unidirectional in nature, which means that if the reference points from entity A to entity B, it does not mean that entity B automatically references entity A. It is possible to set up a bi-directional reference by creating a so-called "reflected reference" on the other entity type and identifying the original reference that should be reflected.
Prices
Prices are specific to very few entity types (usually products, shipping methods, and so on), but since correct price calculation is a very complex and important part of e-commerce systems and highly affects the performance of entity filtering and sorting, they deserve first-class support in the entity model. It is quite common in B2B systems that a single product has dozens of prices assigned to different customers.
The price has the following structure:
- int priceId
Contains the identification of the price in the external systems. This ID is expected to be used for synchronization of the price in relation to the primary source of the prices. The price with the same ID must be unique within the same entity. The prices with the same ID in multiple entities should represent the same price in terms of other values - such as validity, currency, price list, the price itself, and all other properties. These values can be different for a limited time (for example, the prices of Entity A and Entity B can be the same, but Entity A is updated in a different session/transaction and at a different time than Entity B).
- String priceList
Contains the identification of the price list in the external system. Every price must refer to a price list. The price list identification can refer to another evitaDB entity or contain any external price list identification (e.g. ID or unique name of the price list in the external system). A single entity is expected to have a single price for the price list unless `validity` is specified. In other words, it makes no sense to have multiple concurrently valid prices for the same entity that are rooted in the same price list.
- Currency currency
Identification of the currency. Three-letter form according to ISO 4217.
- int innerRecordId
Some special products (such as master products or product sets) may contain prices of all "child" products so that the aggregating product can display them in certain views of the product. In this case, it is necessary to distinguish the projected prices of the subordinate products in the product that represents them.
- BigDecimal priceWithoutTax
Price without tax.
- BigDecimal priceWithTax
Price with tax.
- BigDecimal taxRate
Tax percentage (e.g. 19.00 for 19%).
- DateTimeRange validity
Date and time interval for which the price is valid (inclusive).
- boolean indexed
Controls whether the price is subject to filtering / sorting logic. Non-indexed prices are fetched along with the entity, but are not considered when evaluating the query. They can be used for "informational" prices, such as the reference price (the crossed-out price often shown on e-commerce sites as the "usual price"), which are not used as the "price for sale".
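The price structure listed above can be sketched as a plain Java record. This is not evitaDB's actual class: `PriceSketch`, the local `DateTimeRange` stand-in, and the `priceForSale` helper are hypothetical, and the sketch assumes that a missing validity means the price is always valid. It requires Java 16 or newer.

```java
import java.math.BigDecimal;
import java.time.OffsetDateTime;
import java.util.Currency;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch mirroring the fields described in the text.
final class PriceSketch {

    // Stand-in for the validity range; both bounds are inclusive, null = open end.
    record DateTimeRange(OffsetDateTime from, OffsetDateTime to) {
        boolean contains(OffsetDateTime moment) {
            return (from == null || !moment.isBefore(from))
                && (to == null || !moment.isAfter(to));
        }
    }

    record Price(int priceId, String priceList, Currency currency, int innerRecordId,
                 BigDecimal priceWithoutTax, BigDecimal priceWithTax, BigDecimal taxRate,
                 DateTimeRange validity, boolean indexed) {}

    // Picks the first indexed price of the given price list and currency that is
    // valid at the given moment; non-indexed (informational) prices are skipped.
    static Optional<Price> priceForSale(List<Price> prices, String priceList,
                                        Currency currency, OffsetDateTime moment) {
        return prices.stream()
            .filter(Price::indexed)
            .filter(p -> p.priceList().equals(priceList))
            .filter(p -> p.currency().equals(currency))
            // Assumption: no validity range means the price is always valid.
            .filter(p -> p.validity() == null || p.validity().contains(moment))
            .findFirst();
    }

    public static void main(String[] args) {
        Currency eur = Currency.getInstance("EUR");
        List<Price> prices = List.of(
            // Informational reference price, not indexed.
            new Price(1, "reference", eur, 0, new BigDecimal("100.00"),
                new BigDecimal("119.00"), new BigDecimal("19.00"), null, false),
            // Regular selling price without a validity restriction.
            new Price(2, "basic", eur, 0, new BigDecimal("80.00"),
                new BigDecimal("95.20"), new BigDecimal("19.00"), null, true));
        OffsetDateTime now = OffsetDateTime.parse("2024-06-15T12:00:00Z");
        System.out.println(priceForSale(prices, "basic", eur, now).isPresent());
        System.out.println(priceForSale(prices, "reference", eur, now).isPresent());
    }
}
```

The second lookup finds nothing because the reference price is not indexed, which matches the role of informational prices described above.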