
The Tough Life with High Cardinality
Working with "one-to-many" references is very common in data models and presents a variety of problems that are not immediately obvious. If the database itself does not provide enough tools and expressive means to solve them, the application developer is forced to address these issues themselves, which usually leads to a significant increase in workload and performance problems. In version 2025.2, evitaDB introduces new ways to work with these types of references, helping to address common scenarios in e-commerce applications.

Sorting by 1:N References
erDiagram
    Product {
        string id PK
        string name
        string description
    }
    Group {
        string id PK
        string name
    }
    ProductGroup {
        string productId FK
        string groupId FK
        int order
    }
    Product ||--o{ ProductGroup : "belongs to"
    Group ||--o{ ProductGroup : "aggregates"
If we were to solve this using a relational database like PostgreSQL, the SQL query might look like this:
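A minimal sketch of such a query, not a definitive implementation: it runs against an in-memory SQLite database through Python's built-in `sqlite3` module instead of PostgreSQL so that it is self-contained. It assumes the goal is to list each product once, positioned by its first group (alphabetically by group name) and then by its order within that group. Table and column names follow the diagram; the helper names and sample rows are made up for illustration.

```python
import sqlite3

# Schema taken from the diagram; "Group" and "order" are quoted because
# they are SQL keywords.
SCHEMA = """
CREATE TABLE Product (id TEXT PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE "Group" (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE ProductGroup (
    productId TEXT REFERENCES Product(id),
    groupId   TEXT REFERENCES "Group"(id),
    "order"   INTEGER
);
"""

# For every product, rank its group memberships and keep only the first
# one (rn = 1); then sort products by that group's name and by the
# product's order inside that group.
SORT_QUERY = """
WITH ranked AS (
    SELECT pg.productId,
           g.name     AS groupName,
           pg."order" AS orderInGroup,
           ROW_NUMBER() OVER (
               PARTITION BY pg.productId
               ORDER BY g.name, pg."order"
           ) AS rn
    FROM ProductGroup pg
    JOIN "Group" g ON g.id = pg.groupId
)
SELECT productId
FROM ranked
WHERE rn = 1
ORDER BY groupName, orderInGroup, productId;
"""


def build_sample_db():
    """Create an in-memory database with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
        ("p1", "Phone", ""), ("p2", "Case", ""), ("p3", "Charger", ""),
    ])
    conn.executemany('INSERT INTO "Group" VALUES (?, ?)', [
        ("g1", "Accessories"), ("g2", "Bestsellers"),
    ])
    conn.executemany("INSERT INTO ProductGroup VALUES (?, ?, ?)", [
        ("p2", "g1", 2), ("p3", "g1", 1),  # Accessories: Charger, Case
        ("p1", "g2", 1), ("p2", "g2", 2),  # Bestsellers: Phone, Case
    ])
    return conn


def sorted_product_ids(conn):
    """Return product ids only; other properties need a further query."""
    return [row[0] for row in conn.execute(SORT_QUERY)]
```

With the sample data, `sorted_product_ids(build_sample_db())` returns `['p3', 'p2', 'p1']`: the case (`p2`) belongs to both groups but appears only once, at its position in Accessories.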
Besides the complexity of the query, it is clear at first glance what other problems the application developer will have to solve. The query only handles the correct sorting and does not retrieve any other product properties. Fetching those would require an additional database query or wrapping the sorting query in an outer query that selects them. To get the details of the groups the product belongs to, yet another SQL query would be needed, and then everything would have to be combined in the application logic.
In the web environment, such a problem is very common, so we strive to make things easier for developers. In evitaDB, this query would look much simpler:
References to Entities Organized in Tree Structures
The example we have been working with concerns relationships between entities that are not organized into tree structures. If we replaced the group with a product category, which is often organized hierarchically, it would make more sense to list products sorted logically according to the hierarchical structure and then secondarily by the sorting attribute within each category. The model might look like this:
erDiagram
    Product {
        int id PK
        string name
    }
    ProductCategory {
        int productId FK
        int categoryId FK
        int order
    }
    Category {
        int id PK
        int parentCategoryId FK
        string name
    }
    Product ||--o{ ProductCategory : "belongs to"
    Category ||--o{ ProductCategory : "classified in"
    Category ||--o{ Category : "parent of"
Even in a relational database, this query is very complicated to understand. In NoSQL databases we fare even worse: the vast majority of them do not support recursive searching, so we have no choice but to solve the problem in the application logic.
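To illustrate what solving this in application logic involves, here is a minimal Python sketch. It assumes the categories and product-category links are already loaded into memory, that sibling categories are visited in name order, and that a product reachable through several categories is listed only at its first occurrence. The function name and the tuple layouts are hypothetical.

```python
from collections import defaultdict


def sort_products_by_hierarchy(categories, product_links):
    """Return product ids in depth-first category order.

    categories:    iterable of (category_id, parent_id or None, name)
    product_links: iterable of (product_id, category_id, order)
    """
    # Index child categories by parent so the tree can be walked top-down.
    children = defaultdict(list)
    for cat_id, parent_id, name in categories:
        children[parent_id].append((name, cat_id))

    # Index products by category together with their order attribute.
    products_in = defaultdict(list)
    for product_id, cat_id, order in product_links:
        products_in[cat_id].append((order, product_id))

    result, seen = [], set()

    def visit(cat_id):
        # Products of this category first, by their order within it.
        for _, product_id in sorted(products_in[cat_id]):
            if product_id not in seen:
                seen.add(product_id)
                result.append(product_id)
        # Then descend into child categories, by name.
        for _, child_id in sorted(children[cat_id]):
            visit(child_id)

    for _, root_id in sorted(children[None]):
        visit(root_id)
    return result


# Invented sample data: two root categories, one with two children.
sample_categories = [
    (1, None, "Electronics"), (2, 1, "Phones"), (3, 1, "Audio"),
    (4, None, "Books"),
]
sample_links = [
    (10, 2, 1), (11, 2, 2), (12, 3, 1), (11, 3, 2), (13, 4, 1), (14, 1, 1),
]
ordered = sort_products_by_hierarchy(sample_categories, sample_links)
```

The sketch assumes the categories form a proper tree (no cycles) and uses recursion, so a very deep hierarchy would need an explicit stack instead.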
The approach in evitaDB adheres to the declarative principle – the developer simply specifies the desired output without worrying about how it is achieved. That’s the database’s job. Our query for hierarchical structures in evitaDB would look like this:
The Relationship Between Traversing and Sorting by the Selected Reference
There are many situations where these alternative approaches might be useful.
Deep Sorting
Limiting the Scope of 1:N References Listings
For references with high cardinality, another problem arises when trying to list them within the main entity. A typical example is the product detail page, where we need to list all the groups the product belongs to, as well as all tags, parameters, related products, associated images, technical documentation, and more. All of these are examples of references with high cardinality. Many of them do not need to be modeled structurally and can be captured as unstructured JSON, but many candidates for modeling with references still remain for various reasons.
In the case of a relational database, all sub-records with high cardinality must be merged into a single column output and then processed in the application logic. The merge can be into a simple structure (e.g., comma-separated):
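A minimal sketch of the comma-separated variant, using Python's built-in `sqlite3` module with an in-memory database as a stand-in for a relational database. SQLite's `group_concat` plays the role that `string_agg` would play in PostgreSQL. The table layout and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE ProductGroup (productId TEXT, groupId TEXT, "order" INTEGER);
INSERT INTO Product VALUES ('p1', 'Phone'), ('p2', 'Case'), ('p3', 'Cable');
INSERT INTO ProductGroup VALUES
    ('p1', 'g2', 1), ('p2', 'g1', 1), ('p2', 'g2', 2);
""")

# One row per product; all group ids are merged into a single text column.
rows = conn.execute("""
SELECT p.id, p.name, group_concat(pg.groupId, ',') AS groupIds
FROM Product p
LEFT JOIN ProductGroup pg ON pg.productId = p.id
GROUP BY p.id, p.name
ORDER BY p.id
""").fetchall()

# The application has to split the merged column back into a list.
# Products without any group yield NULL (None) and map to an empty list.
products = {
    product_id: {
        "name": name,
        "groupIds": sorted(ids.split(",")) if ids else [],
    }
    for product_id, name, ids in rows
}
```

The splitting step only works as long as the identifiers never contain the separator, which is one reason this variant suits simple values only.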
Or into a more complex JSON structure, which allows passing larger amounts of structured data at once:
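A minimal sketch of the JSON variant, using Python's built-in `sqlite3` module with an in-memory database as a stand-in. It assumes an SQLite build that includes the JSON functions (they are built in by default since SQLite 3.38); in PostgreSQL, `json_agg` and `json_build_object` would serve the same purpose. The table layout and sample rows are invented for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE "Group" (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE ProductGroup (productId TEXT, groupId TEXT, "order" INTEGER);
INSERT INTO Product VALUES ('p1', 'Phone'), ('p2', 'Case');
INSERT INTO "Group" VALUES ('g1', 'Accessories'), ('g2', 'Bestsellers');
INSERT INTO ProductGroup VALUES
    ('p1', 'g2', 1), ('p2', 'g1', 1), ('p2', 'g2', 2);
""")

# Each product row carries a JSON array with one object per group reference.
rows = conn.execute("""
SELECT p.id,
       json_group_array(
           json_object('id', g.id, 'name', g.name, 'order', pg."order")
       ) AS groupsJson
FROM Product p
JOIN ProductGroup pg ON pg.productId = p.id
JOIN "Group" g ON g.id = pg.groupId
GROUP BY p.id
ORDER BY p.id
""").fetchall()

# The application parses the JSON text back into structured data and
# sorts it itself, because the order inside the aggregate is not guaranteed.
products = {
    product_id: sorted(json.loads(groups_json), key=lambda g: g["order"])
    for product_id, groups_json in rows
}
```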
Again, in application logic, we need to handle the conversion of this information back into a format we can work with in the application. It is also clear that these transformations incur some performance cost both on the database and application logic sides. Unfortunately, the row-based relational approach doesn’t provide much better options in this case. NoSQL databases have an advantage here.
Queries in evitaDB do not suffer from this problem, as they can return entities loaded to any depth in a structured way. The issue arises when there is a risk of a large number of references being returned. Besides the fact that we might not be able to present such large amounts of data to the user, we also risk unnecessary load on the database machine and the transmission of large amounts of data over the network (i.e., increased query latency). Therefore, it’s useful to limit the number of references loaded in such cases. In evitaDB, we would do it like this:
In many situations, we don’t even need to load the actual references: we only need their count, or simply the fact that at least one such reference exists. This often occurs when we need to filter the references further, for example to check whether a product has at least one reference to a warehouse marked as a priority with a stock quantity greater than one. We would handle it like this (the GraphQL query variant is more representative here):
Spacing
The current world is driven by marketing needs, so listings of articles, products, and other entities are commonly interspersed with links to articles, advertising banners, and other elements unrelated to the listed entities. The same may be required at the level of references to other entities. For example, on the article detail page, the section of similar articles is listed with pagination, and we might want to leave space for an advertising banner on every second page.
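The effect of such reserved slots on pagination can be sketched in plain Python. This is only an illustration of the arithmetic, not evitaDB syntax: it assumes that each reserved slot reduces the number of records shown on that page, so every later page starts at a shifted offset. The function names are hypothetical.

```python
def page_slice(page_number, page_size, gap_size_for):
    """Return (offset, limit) of the records shown on a 1-based page.

    gap_size_for(page) gives the number of slots reserved on that page
    (for example, for a banner); reserved slots hold no records.
    """
    if page_number < 1:
        raise ValueError("page_number must be 1 or greater")
    offset = 0
    # Sum the records actually shown on all preceding pages.
    for page in range(1, page_number):
        offset += page_size - min(gap_size_for(page), page_size)
    limit = page_size - min(gap_size_for(page_number), page_size)
    return offset, limit


def banner_on_even_pages(page):
    """Reserve one slot on every second page."""
    return 1 if page % 2 == 0 else 0
```

With a page size of 10, `page_slice(2, 10, banner_on_even_pages)` yields `(10, 9)` and `page_slice(3, 10, banner_on_even_pages)` yields `(19, 10)`: the second page shows only nine records, and the third page starts one record earlier than it would without the gap.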
Spacing serves this purpose, and it can also be applied to reference listings. The example described above would be handled like this:
Summary
The examples for PostgreSQL and MongoDB were generated using an LLM and may contain errors. However, they represent valid approaches to the given problems that a developer would likely explore at some stage of application development.