evitaDB - Fast e-commerce database

Histogram

Histograms serve a pivotal role in e-commerce parametrized filtering by visually representing the distribution of product attributes, enabling customers to adjust their search criteria efficiently. They facilitate a more interactive and precise filtering experience, allowing users to modify the range of properties like price or size based on actual item availability.

There are actually only a few use cases in e-commerce websites where histograms are used. The most common is the price histogram, which is used to filter products by price. You can see an example of such a histogram on the Booking.com website:

Booking.com price histogram filter

It's a shame that the histogram isn't used more often, because it's a very useful tool for gaining insight into the distribution of product attributes with high cardinality values such as weight, height, width and so on.

The histogram data structure is optimized for frontend rendering. It contains the following fields:

  • min - the minimum value of the attribute in the current filter context
  • max - the maximum value of the attribute in the current filter context
  • overallCount - the number of elements whose attribute value falls into any of the buckets (it's basically a sum of all bucket occurrences)
  • buckets - a sorted array of buckets, each of which contains the following fields:
    • threshold - the minimum value of the attribute in the bucket, the maximum value is the threshold of the next bucket (or max for the last bucket)
    • occurrences - the number of elements whose attribute value falls into the bucket
    • relativeFrequency - a value used for visualizing bucket height in UI (0-100 scale):
      • For standard histograms: percentage of total occurrences, calculated as (occurrences / overallCount) * 100
      • For equalized histograms: normalized value density that considers both occurrences and bucket width:
        1. Raw frequency is calculated as occurrences * (totalRange / bucketWidth) - this rewards buckets with many occurrences packed into narrow ranges
        2. Values are then normalized to sum to 100 across all buckets
        3. Empty buckets always have relativeFrequency = 0
    • requested:
      • contains true if the query didn't contain any attributeBetween or priceBetween constraints
      • contains true if the query contained attributeBetween or priceBetween constraint for particular attribute / price and the bucket threshold lies within the range (inclusive) of the constraint
      • contains false otherwise
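
The relativeFrequency and requested rules above can be sketched in a few lines of Python. This is only an illustration of the formulas as described, not evitaDB's implementation: the Bucket class and all function names are hypothetical, and the guard for zero-width buckets is an assumption the text does not mention.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bucket:
    threshold: float   # lower bound of the bucket
    occurrences: int   # number of elements that fall into the bucket


def standard_relative_frequency(buckets: List[Bucket]) -> List[float]:
    """Percentage of all occurrences per bucket: (occurrences / overallCount) * 100."""
    overall_count = sum(b.occurrences for b in buckets)
    if overall_count == 0:
        return [0.0 for _ in buckets]
    return [b.occurrences / overall_count * 100 for b in buckets]


def equalized_relative_frequency(
    buckets: List[Bucket], min_value: float, max_value: float
) -> List[float]:
    """Density-style value: occurrences * (totalRange / bucketWidth), normalized to sum to 100."""
    total_range = max_value - min_value
    # The upper bound of a bucket is the next threshold, or max for the last bucket.
    upper_bounds = [b.threshold for b in buckets[1:]] + [max_value]
    raw = []
    for bucket, upper in zip(buckets, upper_bounds):
        width = upper - bucket.threshold
        if bucket.occurrences == 0 or width <= 0:
            # Empty buckets always get 0; the zero-width guard is an assumption.
            raw.append(0.0)
        else:
            raw.append(bucket.occurrences * (total_range / width))
    total = sum(raw)
    return [r / total * 100 if total else 0.0 for r in raw]


def is_requested(threshold: float, between: Optional[Tuple[float, float]]) -> bool:
    """True when the query has no range constraint, or the threshold lies within it (inclusive)."""
    if between is None:
        return True
    low, high = between
    return low <= threshold <= high


buckets = [Bucket(0, 10), Bucket(10, 30), Bucket(50, 0)]
print(standard_relative_frequency(buckets))  # [25.0, 75.0, 0.0]
print(equalized_relative_frequency(buckets, 0, 100))
```

Note how the equalized variant ranks the first bucket above the second even though it holds fewer elements: its ten occurrences are packed into a much narrower range.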

Attribute histogram

argument:int!

the number of columns (buckets) in the histogram; number should be chosen so that the histogram fits well into the available space on the screen

argument:enum(STANDARD|OPTIMIZED|EQUALIZED|EQUALIZED_OPTIMIZED)

The behavior of the histogram calculation:

  • STANDARD (default): Returns exactly the requested number of buckets with equal-width intervals across the value range.
  • OPTIMIZED: Returns fewer buckets when data is sparse to avoid large gaps (empty buckets).
  • EQUALIZED: Returns exactly the requested number of buckets, but positions bucket boundaries based on the cumulative frequency distribution so that each bucket covers an approximately equal portion of the total records. This provides a better user experience when data is heavily skewed.
  • EQUALIZED_OPTIMIZED: Combines EQUALIZED bucketing with optimization to reduce empty buckets.
argument:string+
one or more names of the entity attribute whose values will be used to generate the histograms
The attribute histogram can be computed from any filterable attribute whose type is numeric. The histogram is computed only from the attributes of elements that match the current mandatory part of the filter. Range selections on attributes placed inside the userFilter container (both attributeBetween and histogramHaving) are excluded from the attribute-histogram baseline so the slider does not contract under its own handle as the user drags it. Facet selections (facetHaving) and the price range (priceBetween) remain applied, so the histogram reflects the range of attribute values actually reachable under the user's current facet and price picks. The rationale and a worked example are covered in Baseline relaxation below.

To demonstrate the use of the histogram, we will use the following example:

The simplified result looks like this:

The histogram result in JSON format is a bit more verbose, but it's still quite readable:

Attribute histogram contents optimization

During user testing, we found that histograms with scarce data are not very useful. Besides the fact that they don't look good, they are often harder to manipulate with the widget that controls the histogram and tries to stick to the bucket thresholds. Therefore, we have introduced a new histogram calculation mode - OPTIMIZED. In this mode, the histogram calculation algorithm tries to reduce the number of buckets when the data is sparse and there would be large gaps (empty buckets) between buckets. This results in more compact histograms that provide a better user experience.

To demonstrate the optimization of the histogram, we will use the following example:

The simplified result looks like this:

The optimized histogram result in JSON format is a bit more verbose, but it's still quite readable:

As you can see, the number of buckets has been adjusted to fit the data, contrary to the default behavior.
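
The difference between STANDARD and OPTIMIZED can be illustrated with a small Python sketch. The text does not specify the heuristic evitaDB uses to decide when gaps are "large", so the strategy below (shrink the bucket count until no bucket is empty) is an assumption chosen for simplicity, and both function names are hypothetical.

```python
def standard_histogram(values, bucket_count):
    """Equal-width buckets across [min, max]; returns (threshold, occurrences) pairs."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical: a single bucket holds everything.
        return [(low, len(values))]
    width = (high - low) / bucket_count
    counts = [0] * bucket_count
    for value in values:
        # Clamp so that the maximum value lands in the last bucket.
        index = min(int((value - low) / width), bucket_count - 1)
        counts[index] += 1
    return [(low + i * width, count) for i, count in enumerate(counts)]


def optimized_histogram(values, bucket_count):
    """Assumed heuristic: reduce the bucket count until every bucket is non-empty."""
    if bucket_count < 1:
        raise ValueError("bucket_count must be at least 1")
    for n in range(bucket_count, 0, -1):
        buckets = standard_histogram(values, n)
        if all(count > 0 for _, count in buckets):
            return buckets
    raise AssertionError("unreachable: a single bucket is never empty")


values = [1, 2, 3, 4, 5, 100]
print(standard_histogram(values, 5))   # three of the five buckets are empty
print(optimized_histogram(values, 5))  # collapses to two non-empty buckets
```

A production heuristic would likely tolerate some empty buckets; the point here is only that the returned bucket count may be lower than the requested one.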

Attribute histogram equalization

Standard histograms use equal-width buckets across the entire value range. This works well for uniformly distributed data but can be problematic when data is heavily skewed. For example, if 90% of products have width between 10-50 cm and only 10% have width between 50-500 cm, equal-width buckets would cram most products into the first few buckets while leaving many empty buckets in the upper range.

The EQUALIZED behavior solves this by positioning bucket boundaries based on cumulative frequency distribution. Instead of dividing the value range into equal intervals, it divides the records into approximately equal groups. Each bucket then covers roughly the same number of items, providing a more balanced and informative histogram.
This technique is inspired by histogram equalization in image processing, adapted for filter slider UX. The algorithm:
  1. Calculates the total weight (sum of all record counts)
  2. Calculates cumulative frequency for each unique value
  3. Positions bucket boundaries at the points where the cumulative frequency crosses each threshold i/bucketCount
  4. Counts actual occurrences in each resulting bucket
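
The four steps above can be sketched in Python as follows. This is a simplified illustration rather than evitaDB's code: the function name is hypothetical, the choice to start the next bucket at the first value after the crossing point is an assumption, and duplicate boundaries are simply dropped, so this sketch may return fewer buckets than requested when there are few distinct values.

```python
from bisect import bisect_right
from collections import Counter


def equalized_histogram(values, bucket_count):
    """Returns (threshold, occurrences) pairs with roughly equal record counts per bucket."""
    counts = Counter(values)
    unique = sorted(counts)

    # Step 1: total weight (sum of all record counts).
    total = sum(counts.values())

    # Step 2: cumulative count for each unique value (kept as integers).
    cumulative = []
    running = 0
    for value in unique:
        running += counts[value]
        cumulative.append(running)

    # Step 3: boundary i starts right after the value at which the cumulative
    # frequency reaches i / bucket_count (compared without floating point).
    thresholds = [unique[0]]
    for i in range(1, bucket_count):
        k = next(j for j, c in enumerate(cumulative) if c * bucket_count >= i * total)
        boundary = unique[min(k + 1, len(unique) - 1)]
        if boundary > thresholds[-1]:
            thresholds.append(boundary)

    # Step 4: count the actual occurrences in each resulting bucket.
    occurrences = [0] * len(thresholds)
    for value, count in counts.items():
        occurrences[bisect_right(thresholds, value) - 1] += count
    return list(zip(thresholds, occurrences))


skewed = [1, 1, 1, 2, 2, 3, 4, 5, 50, 100]
print(equalized_histogram(skewed, 4))  # [(1, 3), (2, 2), (3, 3), (50, 2)]
```

With equal-width buckets the same data would put eight of the ten records into the first bucket; here the narrow buckets sit where the data is dense.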

To demonstrate equalized histogram, we will use the following example:

The simplified result looks like this:

The equalized histogram result in JSON format is a bit more verbose, but it's still quite readable:

As you can see, unlike standard histograms where bucket widths are equal, equalized histograms adjust bucket widths to distribute records more evenly. This makes the histogram more useful for filtering when data has a skewed distribution.

Price histogram

argument:int!

the number of columns (buckets) in the histogram; number should be chosen so that the histogram fits well into the available space on the screen

argument:enum(STANDARD|OPTIMIZED|EQUALIZED|EQUALIZED_OPTIMIZED)

The behavior of the histogram calculation:

  • STANDARD (default): Returns exactly the requested number of buckets with equal-width intervals across the value range.
  • OPTIMIZED: Returns fewer buckets when data is sparse to avoid large gaps (empty buckets).
  • EQUALIZED: Returns exactly the requested number of buckets, but positions bucket boundaries based on the cumulative frequency distribution so that each bucket covers an approximately equal portion of the total records. This provides a better user experience when data is heavily skewed.
  • EQUALIZED_OPTIMIZED: Combines EQUALIZED bucketing with optimization to reduce empty buckets.
The price histogram is computed from the price for sale. Only priceBetween placed inside userFilter is excluded from the price-histogram baseline so the price slider does not contract under its own handle as the user drags it. Attribute range sliders (attributeBetween, histogramHaving) and facet selections (facetHaving) remain applied, so the price histogram reflects the prices actually reachable under the user's current attribute range and facet picks.
The priceType requirement determines the source price property for the histogram computation. If no such requirement is present, the histogram visualizes the price with tax.

To demonstrate the use of the histogram, we will use the following example:

The simplified result looks like this:

The histogram result in JSON format is a bit more verbose, but it's still quite readable:

Price histogram contents optimization

During user testing, we found that histograms with scarce data are not very useful. Besides the fact that they don't look good, they are often harder to manipulate with the widget that controls the histogram and tries to stick to the bucket thresholds. Therefore, we have introduced a new histogram calculation mode - OPTIMIZED. In this mode, the histogram calculation algorithm tries to reduce the number of buckets when the data is sparse and there would be large gaps (empty buckets) between buckets. This results in more compact histograms that provide a better user experience.

To demonstrate the optimization of the histogram, we will use the following example:

The simplified result looks like this:

The optimized histogram result in JSON format is a bit more verbose, but it's still quite readable:

As you can see, the number of buckets has been adjusted to fit the data, contrary to the default behavior.

Price histogram equalization

Just as with attribute histograms, standard price histograms use equal-width buckets which can be problematic for skewed price distributions. For example, in a marketplace where most items cost $10-$50 but a few luxury items cost $500-$5000, equal-width buckets would waste slider space on the expensive (but sparse) end.

The EQUALIZED behavior for price histograms positions bucket boundaries based on cumulative frequency distribution, so each bucket covers approximately the same number of products. This provides a better filtering experience, especially for e-commerce catalogs with diverse price ranges.

To demonstrate equalized price histogram, we will use the following example:

The simplified result looks like this:

The equalized histogram result in JSON format is a bit more verbose, but it's still quite readable:

As you can see, the bucket boundaries are positioned to distribute products more evenly across the slider range.

Baseline relaxation — sliders don't contract under their own handles

Every histogram answers a "what-if" question: what range of values would still be reachable if I let go of this slider and moved it to the extremes? A histogram whose [min, max] shrank every time the user dragged the slider inward would trap the user in a collapsing range — each drag would make the next drag have less room, and returning to a wider range would be impossible without resetting the slider to its full extent. To avoid this, every histogram's [min, max] baseline must hide the user's own range picks while still honouring picks made on other filter surfaces (facet buttons, the price slider, etc.).

How evitaDB applies the relaxation

evitaDB classifies every child of userFilter into one of three mutually exclusive filter surfaces:
  1. Attribute range sliders: attributeBetween and histogramHaving. These drive attribute histograms, both on plain entity attributes and on reference-level histograms.
  2. Facet selections: facetHaving. These drive the facet summary and its impact calculations.
  3. Price range: priceBetween. This drives the price histogram.
When an extra-result projection (attribute histogram, facet summary impact, price histogram) is computed, evitaDB peels away only the surface that projection belongs to and leaves the other two applied. The main entity page returned by the query is still narrowed by all three surfaces — the relaxation applies strictly to the [min, max] spans and bucket distributions of the extra-result projections.

Worked example

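Suppose the user is browsing Product and has made three independent picks inside userFilter: a brand facet (facetHaving("brand", …)), a price range (priceBetween(100, 500)), and a height range (attributeBetween("height", …)).
The query also requests attributeHistogram(20, "height", "width"), priceHistogram(20), and a facet summary with IMPACT. evitaDB computes four baselines in one pass: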
| Self-computation | What the baseline hides | What the baseline keeps applied |
|---|---|---|
| height histogram | every attribute range slider: attributeBetween("height", …) and every other attributeBetween or histogramHaving in the same userFilter | facetHaving("brand", …), priceBetween(100, 500) |
| width histogram | the same: every attribute range slider is peeled for any attribute histogram in the query | facetHaving("brand", …), priceBetween(100, 500) |
| facet impact for other brands | every facetHaving selection | attributeBetween("height", …), priceBetween(100, 500) |
| price histogram | priceBetween(100, 500) | facetHaving("brand", …), attributeBetween("height", …) |
This also means that adding a second slider on the same filter surface does not contract the first one: if the query contains both attributeBetween("height", 50, 120) and attributeBetween("width", 10, 40), each attribute histogram is computed with both range sliders peeled, so neither slider contracts the other's [min, max] as the user drags.
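
The peeling rule can be modelled as a simple lookup. The Python sketch below is a conceptual model only, not evitaDB's internal design: constraints are represented as plain tuples, and the surface labels, the projection keys, and the baseline_filter function are all hypothetical names introduced for illustration.

```python
# Which filter surface each userFilter child belongs to (labels are illustrative).
SURFACE_OF_CONSTRAINT = {
    "attributeBetween": "attribute_range",
    "histogramHaving": "attribute_range",
    "facetHaving": "facet",
    "priceBetween": "price",
}

# Which surface each extra-result projection peels away (keys are illustrative).
SURFACE_OF_PROJECTION = {
    "attributeHistogram": "attribute_range",
    "facetSummary": "facet",
    "priceHistogram": "price",
}


def baseline_filter(user_filter, projection):
    """Return the userFilter children that stay applied when computing the projection."""
    peeled = SURFACE_OF_PROJECTION[projection]
    return [c for c in user_filter if SURFACE_OF_CONSTRAINT[c[0]] != peeled]


user_filter = [
    ("attributeBetween", "height", 50, 120),
    ("attributeBetween", "width", 10, 40),
    ("facetHaving", "brand"),
    ("priceBetween", 100, 500),
]

# Both range sliders are peeled for any attribute histogram,
# while the facet and price picks remain applied.
print(baseline_filter(user_filter, "attributeHistogram"))
# Only the price range is peeled for the price histogram.
print(baseline_filter(user_filter, "priceHistogram"))
```

The main entity page would still be filtered by the complete user_filter list; only the extra-result baselines use the reduced lists.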
Pick the userFilter child that matches where the slider lives — each one is recognised by evitaDB as a range carrier and is peeled from the appropriate histogram baseline:
| Slider lives on … | Recommended userFilter child |
|---|---|
| a plain entity attribute (Product.width, Product.height, …) | attributeBetween |
| a reference-level histogram (e.g. parameterValues.height on Product) | histogramHaving: the first-class carrier for reference histograms; also disambiguates between multiple histograms on the same reference |
| the price for sale | priceBetween |
| a facet selection | facetHaving |
Plain referenceHaving is not accepted inside userFilter — it has no slider semantics and would not participate in baseline relaxation. Use histogramHaving for slider carriers on references.

Author: Ing. Jan Novotný

Date updated: 7.11.2023
