evitaDB - Fast e-commerce database

Histogram

Histograms serve a pivotal role in e-commerce parametrized filtering by visually representing the distribution of product attributes, enabling customers to adjust their search criteria efficiently. They facilitate a more interactive and precise filtering experience, allowing users to modify the range of properties like price or size based on actual item availability.

There are actually only a few use cases in e-commerce websites where histograms are used. The most common is the price histogram, which is used to filter products by price. You can see an example of such a histogram on the Booking.com website:

Booking.com price histogram filter

It's a shame that the histogram isn't used more often, because it's a very useful tool for gaining insight into the distribution of product attributes with high cardinality values such as weight, height, width and so on.

The histogram data structure is optimized for frontend rendering. It contains the following fields:

  • min - the minimum value of the attribute in the current filter context
  • max - the maximum value of the attribute in the current filter context
  • overallCount - the number of elements whose attribute value falls into any of the buckets (it's basically a sum of all bucket occurrences)
  • buckets - a sorted array of buckets, each of which contains the following fields:
    • threshold - the minimum value of the attribute in the bucket, the maximum value is the threshold of the next bucket (or max for the last bucket)
    • occurrences - the number of elements whose attribute value falls into the bucket
    • requested:
      • contains true if the query didn't contain any attributeBetween or priceBetween constraints
      • contains true if the query contained attributeBetween or priceBetween constraint for particular attribute / price and the bucket threshold lies within the range (inclusive) of the constraint
      • contains false otherwise
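
The structure above can be sketched in a few lines of Python. This is only an illustrative model, not the evitaDB client API: the class names `Bucket` and `Histogram`, the helper `bucket_range` and the function `is_requested` are hypothetical names chosen for this sketch. It shows how the upper bound of a bucket is derived from the next threshold (or from max for the last bucket) and how the requested flag follows the three rules listed above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bucket:
    threshold: float   # lower bound of the bucket
    occurrences: int   # number of elements that fall into the bucket
    requested: bool    # see is_requested() below


@dataclass
class Histogram:
    min: float
    max: float
    overall_count: int
    buckets: List[Bucket]   # sorted by threshold

    def bucket_range(self, index: int) -> Tuple[float, float]:
        """Return the (lower, upper) bounds of the bucket at `index`."""
        lower = self.buckets[index].threshold
        is_last = index == len(self.buckets) - 1
        # The upper bound is the next threshold, or max for the last bucket.
        upper = self.max if is_last else self.buckets[index + 1].threshold
        return lower, upper


def is_requested(threshold: float,
                 between: Optional[Tuple[float, float]] = None) -> bool:
    """Hypothetical helper mirroring the rules for the `requested` flag.

    `between` is the (from, to) range of an attributeBetween / priceBetween
    constraint, or None when the query contains no such constraint.
    """
    if between is None:
        return True
    low, high = between
    return low <= threshold <= high   # inclusive on both ends
```

A frontend widget could use `bucket_range` to label each column and `requested` to highlight the columns that lie inside the range currently selected by the user.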

Attribute histogram

argument:int!

the number of columns (buckets) in the histogram; number should be chosen so that the histogram fits well into the available space on the screen

argument:enum(STANDARD|OPTIMIZED)

The behavior of the histogram calculation - either STANDARD (default), where exactly the requested number of buckets is returned, or OPTIMIZED, where the number of columns is reduced when the data is sparse and there would be large gaps (empty buckets) between buckets. This results in more compact histograms that provide a better user experience.

argument:string+
one or more names of the entity attribute whose values will be used to generate the histograms
The attribute histogram can be computed from any filterable attribute whose type is numeric. The histogram is computed only from the attributes of elements that match the current mandatory part of the filter. The interval-related constraints in the userFilter part (i.e. attributeBetween and priceBetween) are excluded from the histogram calculation. If this weren't the case, the user narrowing the filtered range based on the histogram results would be driven into a narrower and narrower range and eventually into a dead end.
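
To illustrate why the interval constraints are left out of the calculation, here is a minimal Python sketch. It assumes equal-width buckets between the minimum and the maximum value, which is an assumption of this example and not a documented detail of the evitaDB algorithm. The function name `build_histogram` and its parameters are hypothetical. Note that the `between` range only affects the requested flags, never the counts.

```python
from typing import Dict, List, Optional, Tuple


def build_histogram(values: List[float],
                    bucket_count: int,
                    between: Optional[Tuple[float, float]] = None) -> Optional[Dict]:
    """Build a histogram from values matching the mandatory filter.

    `values` must NOT be pre-filtered by the user's between range;
    `between` is only used to compute the `requested` flags.
    """
    if not values or bucket_count < 1:
        return None
    lo, hi = min(values), max(values)
    width = (hi - lo) / bucket_count   # assumption: equal-width buckets
    thresholds = [lo + i * width for i in range(bucket_count)]

    occurrences = [0] * bucket_count
    for v in values:
        # Values equal to max belong to the last bucket.
        idx = 0 if width == 0 else min(int((v - lo) / width), bucket_count - 1)
        occurrences[idx] += 1

    buckets = []
    for threshold, count in zip(thresholds, occurrences):
        requested = between is None or between[0] <= threshold <= between[1]
        buckets.append({"threshold": threshold,
                        "occurrences": count,
                        "requested": requested})
    return {"min": lo, "max": hi,
            "overallCount": sum(occurrences), "buckets": buckets}
```

Calling `build_histogram([10, 12, 20, 35, 50], 4, between=(20, 35))` still counts all five values, so the user can always widen the range again; only the buckets with thresholds 20 and 30 are marked as requested.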

To demonstrate the use of the histogram, we will use the following example:

The simplified result looks like this:

The histogram result in JSON format is a bit more verbose, but it's still quite readable:

Attribute histogram contents optimization

During user testing, we found that histograms with sparse data are not very useful. Besides the fact that they don't look good, they are often harder to manipulate, because the widget that controls the histogram tries to stick to the bucket thresholds. Therefore, we have introduced a new histogram calculation mode - OPTIMIZED. In this mode, the histogram calculation algorithm tries to reduce the number of buckets when the data is sparse and there would be large gaps (empty buckets) between buckets. This results in more compact histograms that provide a better user experience.
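
The idea of reducing the bucket count can be sketched with a simple heuristic. The exact algorithm used by evitaDB is not described here, so the following Python code is only one possible approach, chosen for illustration: it lowers the number of equal-width buckets step by step until no bucket is empty. The function names `count_per_bucket` and `optimized_bucket_count` are hypothetical.

```python
from typing import List


def count_per_bucket(values: List[float], bucket_count: int) -> List[int]:
    """Count values in equal-width buckets between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bucket_count
    counts = [0] * bucket_count
    for v in values:
        # Values equal to max belong to the last bucket.
        idx = 0 if width == 0 else min(int((v - lo) / width), bucket_count - 1)
        counts[idx] += 1
    return counts


def optimized_bucket_count(values: List[float], requested_count: int) -> int:
    """Illustrative heuristic (an assumption, not the evitaDB algorithm):
    use the largest bucket count <= requested_count with no empty bucket."""
    if not values:
        return 0
    for n in range(requested_count, 1, -1):
        if 0 not in count_per_bucket(values, n):
            return n
    return 1
```

With the values `[1, 2, 3, 100]` and five requested buckets, the STANDARD behavior would leave three empty columns between the first and the last bucket, while this heuristic settles on two buckets. For evenly spread data such as `[1, 2, 3, 4, 5]` the requested count of five is kept.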

To demonstrate the optimization of the histogram, we will use the following example:

The simplified result looks like this:

The optimized histogram result in JSON format is a bit more verbose, but it's still quite readable:

As you can see, the number of buckets has been adjusted to fit the data, contrary to the default behavior.

Price histogram

argument:int!

the number of columns (buckets) in the histogram; number should be chosen so that the histogram fits well into the available space on the screen

argument:enum(STANDARD|OPTIMIZED)

The behavior of the histogram calculation - either STANDARD (default), where exactly the requested number of buckets is returned, or OPTIMIZED, where the number of columns is reduced when the data is sparse and there would be large gaps (empty buckets) between buckets. This results in more compact histograms that provide a better user experience.

The price histogram is computed from the price for sale. The interval-related constraints in the userFilter part (i.e. attributeBetween and priceBetween) are excluded from the histogram calculation. If this weren't the case, the user narrowing the filtered range based on the histogram results would be driven into a narrower and narrower range and eventually into a dead end.
The priceType requirement determines the source price property for the histogram computation. If no such requirement is specified, the histogram visualizes the price with tax.
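
The selection of the source price property can be sketched as follows. This Python model is only an illustration: the enum `PriceType` with the values `WITH_TAX` and `WITHOUT_TAX`, the class `PriceForSale` and the function `histogram_source_values` are names assumed for this example and are not taken from the evitaDB API. The only behavior taken from the text above is the default, which is the price with tax.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class PriceType(Enum):
    WITH_TAX = "WITH_TAX"
    WITHOUT_TAX = "WITHOUT_TAX"


@dataclass
class PriceForSale:
    price_without_tax: float
    price_with_tax: float


def histogram_source_values(prices: List[PriceForSale],
                            price_type: Optional[PriceType] = None) -> List[float]:
    """Pick the price property that feeds the price histogram.

    When no price type is requested, the price with tax is used.
    """
    effective = price_type if price_type is not None else PriceType.WITH_TAX
    if effective is PriceType.WITH_TAX:
        return [p.price_with_tax for p in prices]
    return [p.price_without_tax for p in prices]
```

The returned values would then be bucketed in the same way as the values of a numeric attribute.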

To demonstrate the use of the histogram, we will use the following example:

The simplified result looks like this:

The histogram result in JSON format is a bit more verbose, but it's still quite readable:

Price histogram contents optimization

During user testing, we found that histograms with sparse data are not very useful. Besides the fact that they don't look good, they are often harder to manipulate, because the widget that controls the histogram tries to stick to the bucket thresholds. Therefore, we have introduced a new histogram calculation mode - OPTIMIZED. In this mode, the histogram calculation algorithm tries to reduce the number of buckets when the data is sparse and there would be large gaps (empty buckets) between buckets. This results in more compact histograms that provide a better user experience.

To demonstrate the optimization of the histogram, we will use the following example:

The simplified result looks like this:

The optimized histogram result in JSON format is a bit more verbose, but it's still quite readable:

As you can see, the number of buckets has been adjusted to fit the data, contrary to the default behavior.

Author: Ing. Jan Novotný

Date updated: 7.11.2023
