Data types

This article introduces the data types of the evitaDB query language, both simple and complex, and provides code examples that demonstrate their usage.

This document lists all data types supported by evitaDB that can be used in attributes or associated data for storing client-relevant information.

There are two categories of data types:

simple data types that can be used both for attributes and associated data
complex data types that can be used only for associated data

Simple data types

evitaDB data types are limited to the following list:

String, formatted as "string"
Byte, formatted as 5
Short, formatted as 5
Integer, formatted as 5
Long, formatted as 5
Boolean, formatted as true
Character, formatted as 'c'
BigDecimal, formatted as 1.124
OffsetDateTime, formatted as 2021-01-01T00:00:00+01:00
LocalDateTime, formatted as 2021-01-01T00:00:00
LocalDate, formatted as 2021-01-01
LocalTime, formatted as 00:00:00
DateTimeRange, formatted as [2021-01-01T00:00:00+01:00,2022-01-01T00:00:00+01:00]
BigDecimalNumberRange, formatted as [1.24,78]
LongNumberRange, formatted as [5,9]
IntegerNumberRange, formatted as [5,9]
ShortNumberRange, formatted as [5,9]
ByteNumberRange, formatted as [5,9]
Locale, formatted as language tag 'cs-CZ'
Currency, formatted as 'CZK'
UUID, formatted as 2fbbfcf2-d4bb-4db9-9658-acf1d287cbe9
Predecessor, formatted as 789
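
As an illustration, several of the formats listed above can be parsed with plain JDK calls. The sketch below assumes that the evitaDB types correspond to the like-named classes in java.math, java.time and java.util; the class SimpleTypeLiterals and its methods are hypothetical and use no evitaDB API.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.util.Currency;
import java.util.Locale;
import java.util.UUID;

// Hypothetical helper: each method parses one textual format from the list above.
class SimpleTypeLiterals {
    static BigDecimal bigDecimal() { return new BigDecimal("1.124"); }
    static OffsetDateTime offsetDateTime() { return OffsetDateTime.parse("2021-01-01T00:00:00+01:00"); }
    static LocalDateTime localDateTime() { return LocalDateTime.parse("2021-01-01T00:00:00"); }
    static LocalDate localDate() { return LocalDate.parse("2021-01-01"); }
    static LocalTime localTime() { return LocalTime.parse("00:00:00"); }
    // Locale is written as an IETF language tag
    static Locale locale() { return Locale.forLanguageTag("cs-CZ"); }
    // Currency is written as an ISO 4217 code
    static Currency currency() { return Currency.getInstance("CZK"); }
    static UUID uuid() { return UUID.fromString("2fbbfcf2-d4bb-4db9-9658-acf1d287cbe9"); }

    public static void main(String[] args) {
        System.out.println(bigDecimal());
        System.out.println(offsetDateTime().getOffset());
        System.out.println(locale().toLanguageTag());
        System.out.println(currency().getCurrencyCode());
        System.out.println(uuid());
    }
}
```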

String

The string type is internally encoded with the character set UTF-8. evitaDB query language and other I/O methods of evitaDB implicitly use this encoding.

Dates and times

Although evitaDB supports local variants of date and time such as LocalDateTime, these values are always converted to OffsetDateTime using the default time zone of the evitaDB server system. You can control the default Java time zone in several ways. If your data is time zone specific, we recommend working directly with OffsetDateTime on the client side and being explicit about the offset from day one.
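
To see why the zone matters, the following sketch performs such a conversion with the standard java.time API. It is not evitaDB's internal code; the class LocalToOffset and the method toOffset are hypothetical, and the zone is passed in explicitly so that the effect of different defaults is visible.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;

// Hypothetical helper illustrating a local-to-offset conversion.
class LocalToOffset {
    /** Resolves a local date time against the given zone and keeps only the resulting offset. */
    static OffsetDateTime toOffset(LocalDateTime local, ZoneId zone) {
        return local.atZone(zone).toOffsetDateTime();
    }

    public static void main(String[] args) {
        LocalDateTime local = LocalDateTime.parse("2021-01-01T00:00:00");
        // The same local value denotes different instants depending on the zone applied.
        System.out.println(toOffset(local, ZoneId.of("Europe/Prague")));
        System.out.println(toOffset(local, ZoneId.of("UTC")));
        // The zone a JVM falls back to when none is given explicitly.
        System.out.println(toOffset(local, ZoneId.systemDefault()));
    }
}
```

Sending an OffsetDateTime from the client avoids this dependency on the server's default zone altogether.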

Why do we internally use OffsetDateTime for time information?

Offset/time zone handling varies from database to database. We wanted to avoid setting the time zone in session or database configuration properties, as this mechanism is error-prone and impractical. Saving/loading date times with time zone information would be the best option, but we ran into problems with parsing in certain environments, and only the date with offset information seems to be widely supported. The offset information is good enough for our case - it identifies a globally valid time that is known at the time the data value is stored.

DateTimeRange

The DateTimeRange is a range whose from and to boundaries are defined by OffsetDateTime values. The offset date times are written in the ISO format.

Range is written as:

when both boundaries are specified: [2021-01-01T00:00:00+01:00,2022-01-01T00:00:00+01:00]

when a left boundary (since) is specified: [2021-01-01T00:00:00+01:00,]

when a right boundary (until) is specified: [,2022-01-01T00:00:00+01:00]
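
The textual form can be produced with the JDK's ISO formatter, as in the minimal sketch below. The class DateTimeRangeText is hypothetical and is not the evitaDB implementation. Two points are assumptions of this sketch: a missing boundary is rendered as an empty side of the bracket, and a range with no boundary at all is rejected.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical formatter for the bracketed date time range notation.
class DateTimeRangeText {
    private static final DateTimeFormatter ISO = DateTimeFormatter.ISO_OFFSET_DATE_TIME;

    /** Formats the range; a null boundary is rendered as an empty side (assumption). */
    static String format(OffsetDateTime from, OffsetDateTime to) {
        if (from == null && to == null) {
            // assumption: at least one boundary has to be present
            throw new IllegalArgumentException("At least one boundary is required");
        }
        String left = from == null ? "" : ISO.format(from);
        String right = to == null ? "" : ISO.format(to);
        return "[" + left + "," + right + "]";
    }

    public static void main(String[] args) {
        OffsetDateTime since = OffsetDateTime.parse("2021-01-01T00:00:00+01:00");
        OffsetDateTime until = OffsetDateTime.parse("2022-01-01T00:00:00+01:00");
        System.out.println(format(since, until));
        System.out.println(format(since, null));
        System.out.println(format(null, until));
    }
}
```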

NumberRange

The NumberRange is a range whose from and to boundaries are defined by Number values. The supported number types are: Byte, Short, Integer, Long and BigDecimal.

Both boundaries of the number range must be of the same type - you cannot mix, for example, BigDecimal as the lower bound and Byte as the upper bound.

Range is written as:

when both boundaries are specified: [1.24,78]

when a left boundary (since) is specified: [1.24,]

when a right boundary (until) is specified: [,78]
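
The same-type rule can be illustrated with a small runtime check. The sketch below is hypothetical (the class NumberRangeText is not part of evitaDB) and assumes, as above, that a missing boundary is written as an empty side of the bracket.

```java
import java.math.BigDecimal;

// Hypothetical formatter that enforces the same-type rule for number ranges.
class NumberRangeText {
    /** Formats a number range, rejecting boundaries of different numeric types. */
    static String format(Number from, Number to) {
        if (from == null && to == null) {
            // assumption: at least one boundary has to be present
            throw new IllegalArgumentException("At least one boundary is required");
        }
        if (from != null && to != null && !from.getClass().equals(to.getClass())) {
            throw new IllegalArgumentException(
                "Both boundaries must be of the same type, got "
                    + from.getClass().getSimpleName() + " and " + to.getClass().getSimpleName());
        }
        return "[" + text(from) + "," + text(to) + "]";
    }

    private static String text(Number n) {
        if (n == null) {
            return "";
        }
        // toPlainString avoids scientific notation for BigDecimal values
        return n instanceof BigDecimal ? ((BigDecimal) n).toPlainString() : n.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(new BigDecimal("1.24"), new BigDecimal("78")));
        System.out.println(format(Integer.valueOf(5), Integer.valueOf(9)));
    }
}
```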

Predecessor

The Predecessor is a special data type used to define a singly linked list of entities of the same type. It represents a pointer to the previous entity in the list. The head element is a special case and is represented by the constant Predecessor#HEAD. The predecessor attribute can only be used in the attributes of an entity or its reference to another entity. It cannot be used to filter entities, but is very useful for sorting.

Motivation for linked lists in database sorting

The linked list is a data structure well suited to sorting entities in a database that holds large amounts of data. Inserting a new element into a linked list is a constant-time operation and requires only two updates:

inserting a new element into the list, pointing to an existing element as its predecessor
updating the original element pointing to the predecessor to point to the new element.
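
The two updates can be sketched with a pair of in-memory maps. This is only an illustration of the idea, not evitaDB's storage model: the class PredecessorChain, the HEAD marker value and the reverse-lookup map are all hypothetical, and the sketch does not handle re-inserting an existing key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a predecessor-based linked list.
class PredecessorChain {
    /** Hypothetical marker for "no predecessor", standing in for the head constant. */
    static final int HEAD = -1;

    // primary key -> primary key of its predecessor
    private final Map<Integer, Integer> predecessorOf = new HashMap<>();
    // predecessor primary key -> primary key of its successor (reverse lookup)
    private final Map<Integer, Integer> successorOf = new HashMap<>();

    /** Inserts newPk right after predecessorPk and returns the number of elements updated. */
    int insertAfter(int newPk, int predecessorPk) {
        int updates = 0;
        Integer formerSuccessor = successorOf.get(predecessorPk);
        // update 1: the new element points to the chosen predecessor
        predecessorOf.put(newPk, predecessorPk);
        successorOf.put(predecessorPk, newPk);
        updates++;
        if (formerSuccessor != null) {
            // update 2: the element that used to follow the predecessor now follows the new element
            predecessorOf.put(formerSuccessor, newPk);
            successorOf.put(newPk, formerSuccessor);
            updates++;
        }
        return updates;
    }

    /** Returns the predecessor of the given primary key, or null if the key is unknown. */
    Integer predecessor(int pk) {
        return predecessorOf.get(pk);
    }
}
```

Regardless of the length of the list, an insertion touches at most two elements: the new one and the element that previously followed the chosen predecessor.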

Moving (updating) an element or removing an existing element from a linked list is also a constant-time operation, requiring two similar updates. The disadvantages of the linked list are its poor random access performance (get the element at the n-th index) and its list traversal, which requires many random accesses to different parts of memory. However, these disadvantages can be mitigated by keeping the linked list in the form of an array or binary tree of properly positioned primary keys.
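
The array form mentioned above can be sketched as follows: the chain is walked once from its head, and the primary keys are stored in list order so that the n-th element becomes a plain array lookup. The class ChainToArray and the HEAD marker are hypothetical, and the sketch assumes a consistent chain (it throws otherwise).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that flattens a predecessor chain into an ordered array.
class ChainToArray {
    /** Hypothetical marker for "no predecessor". */
    static final int HEAD = -1;

    /** Walks a consistent chain from its head and returns the primary keys in list order. */
    static int[] toOrderedArray(Map<Integer, Integer> predecessorOf) {
        // invert the pointers once so that each step of the walk is a constant-time lookup
        Map<Integer, Integer> successorOf = new HashMap<>();
        for (Map.Entry<Integer, Integer> entry : predecessorOf.entrySet()) {
            successorOf.put(entry.getValue(), entry.getKey());
        }
        int[] ordered = new int[predecessorOf.size()];
        Integer current = successorOf.get(HEAD);
        int index = 0;
        while (current != null && index < ordered.length) {
            ordered[index++] = current;
            current = successorOf.get(current);
        }
        if (index != ordered.length) {
            throw new IllegalStateException("Chain is not consistent");
        }
        return ordered;
    }

    public static void main(String[] args) {
        // 1 is the head, 3 follows 1, 2 follows 3
        int[] ordered = toOrderedArray(Map.of(1, HEAD, 3, 1, 2, 3));
        System.out.println(java.util.Arrays.toString(ordered));
    }
}
```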

There are alternative approaches to this problem, but they all have their downsides. Some of them are summarized in the article "Keeping an ordered collection in PostgreSQL" by Nicolas Goy. We went through a similar journey and concluded that the linked list is the least of all evils:

it doesn't require mass updates of surrounding entities or occasional "reshuffling"
it doesn't force the client logic to be complicated (and it plays well with the UI drag'n'drop repositioning flow)
it is very data efficient - it only requires a single int (4B) per item in the list

Maintaining consistency of the linked list

Constructing a linked list can be a tricky process from a consistency point of view - especially in the warm-up phase, when you need to reconstruct the data from an external primary store. To be consistent at all times, you'd need to start with the entity that represents the head of the chain, then insert its successor, and so on. This is often not trivial, and if you have two predecessor attributes with different "order" for the same entities, it's absolutely impossible.

That's why we designed our linked list implementation to tolerate partial inconsistencies, and to converge to a consistent state as missing data is inserted. We support these inconsistency scenarios:

multiple head elements
multiple successor elements for a single predecessor
circular dependencies, where a head element points to an element in its tail

Sorting by an inconsistent predecessor attribute orders the entities chain by chain, with the chains taken in the following order:

the chains starting with a head element (starting with the chain with most elements, to the chain with least elements)
the chains with elements sharing the same predecessor (starting with the chain with most elements, to the chain with least elements)
the chains with circular dependencies (starting with the chain with most elements, to the chain with least elements)
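
The ordering rules above amount to a two-level comparison: first by the kind of chain, then by chain size in descending order. The sketch below assumes that the chains have already been identified and classified; the types Kind and Chain are hypothetical and are not evitaDB classes. The source does not say how chains of the same kind and size are ordered, so the sketch simply keeps their input order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of ordering entities by inconsistent chains.
class ChainOrdering {
    /** Hypothetical classification of a chain, declared in the priority order described above. */
    enum Kind { STARTS_WITH_HEAD, SHARED_PREDECESSOR, CIRCULAR }

    /** Hypothetical chain descriptor: its kind and its primary keys in chain order. */
    static final class Chain {
        final Kind kind;
        final List<Integer> primaryKeys;

        Chain(Kind kind, List<Integer> primaryKeys) {
            this.kind = kind;
            this.primaryKeys = primaryKeys;
        }
    }

    static final Comparator<Chain> ORDER = (a, b) -> {
        int byKind = a.kind.compareTo(b.kind);
        // within the same kind, chains with more elements come first
        return byKind != 0 ? byKind : Integer.compare(b.primaryKeys.size(), a.primaryKeys.size());
    };

    /** Concatenates the chains in the described order into one list of primary keys. */
    static List<Integer> sortedPrimaryKeys(List<Chain> chains) {
        List<Chain> sorted = new ArrayList<>(chains);
        sorted.sort(ORDER); // List.sort is stable, so ties keep their input order
        List<Integer> result = new ArrayList<>();
        for (Chain chain : sorted) {
            result.addAll(chain.primaryKeys);
        }
        return result;
    }
}
```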

When the dependencies are fixed, the sort order will converge to the correct one. The underlying chain structure will then contain only a single chain of correctly ordered elements and will return true when the isConsistent() method is called on it.

The inconsistent state is also allowed in the transactional phase, but we recommend avoiding it and updating all the elements involved (in any order) within a single transaction, which will ensure that the linked list remains consistent for all other transactions.

Complex data types

The complex types are types that don't qualify as simple evitaDB types (or an array of simple evitaDB types). Complex types are stored in a ComplexDataObject data structure that is intentionally similar to the JSON data structure so that it can be easily converted to JSON format and can also accept and store any valid JSON document.

Associated data may even contain an array of complex objects. Such data will be automatically converted to an array of ComplexDataObject types - i.e. ComplexDataObject[].

The complex type can contain properties of:

any simple evitaDB types
any other complex types (additional inner POJOs)
generic Lists
generic Sets
generic Maps
any array of simple evitaDB types or complex types
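
As an illustration, a class along the following lines touches each of the property kinds listed above. ProductDetail and all of its fields are hypothetical; the sketch shows only the shape of the data and makes no claim about the accessor or annotation conventions evitaDB expects when converting such an object.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical complex type; all names are illustrative only.
class ProductDetail {
    /** Inner POJO, i.e. another complex type. */
    static final class Dimensions {
        final int widthMm;
        final int heightMm;

        Dimensions(int widthMm, int heightMm) {
            this.widthMm = widthMm;
            this.heightMm = heightMm;
        }
    }

    final String label;                          // simple type
    final BigDecimal weightKg;                   // simple type
    final Dimensions dimensions;                 // other complex type
    final List<String> tags;                     // generic List
    final Set<Integer> relatedProductIds;        // generic Set
    final Map<String, Integer> stockByWarehouse; // generic Map
    final String[] alternativeCodes;             // array of simple types
    final Dimensions[] packageVariants;          // array of complex types

    ProductDetail(String label, BigDecimal weightKg, Dimensions dimensions, List<String> tags,
                  Set<Integer> relatedProductIds, Map<String, Integer> stockByWarehouse,
                  String[] alternativeCodes, Dimensions[] packageVariants) {
        this.label = label;
        this.weightKg = weightKg;
        this.dimensions = dimensions;
        this.tags = tags;
        this.relatedProductIds = relatedProductIds;
        this.stockByWarehouse = stockByWarehouse;
        this.alternativeCodes = alternativeCodes;
        this.packageVariants = packageVariants;
    }

    /** Builds a small sample instance with made-up values. */
    static ProductDetail sample() {
        return new ProductDetail(
            "Sample product", new BigDecimal("1.124"), new Dimensions(120, 80),
            List.of("new", "sale"), Set.of(5, 9), Map.of("main", 12),
            new String[] {"A-1", "A-2"}, new Dimensions[] {new Dimensions(130, 90)});
    }
}
```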

Author: Ing. Jan Novotný

Date updated: 23.8.2023
