Bitmap indexes
This topic describes how to create and manage a bitmap index, along with usage cases.
Introduction
A bitmap index is a special database index that uses bitmaps, which are an array of bits. A bit is always in one of two values: 0 and 1. Each bit in the bitmap corresponds to a single row in the table. The value of each bit depends upon the value of the corresponding row.
A bitmap index can help improve the query performance on a given column. If a query's filter conditions match a prefix index, it can significantly improve query efficiency and quickly return results. However, a table can only have one prefix index. If the query's filter conditions do not include the prefix of the prefix index, a bitmap index can be created for that column to improve query efficiency.
How to design a bitmap index to accelerate queries
The primary considerations for choosing a bitmap index are column cardinality and the filtering effect of the bitmap index on queries. Contrary to popular belief, bitmap indexes in StarRocks are more suitable for queries on columns with high cardinality and queries on the combination of multiple low cardinality columns. Furthermore, bitmap indexes need to effectively filter out data, potentially filtering out at least 999/1000 of the data, thereby reducing the amount of Page data read.
For queries on a single low cardinality column, the filtering effect of a bitmap index is poor, because too many rows need to be read and scattered across multiple Pages.
You need to consider the cost of loading data when evaluating the filtering effect of bitmap indexes on queries. In StarRocks, underlying data is organized and loaded by Pages (default size is 64K). The cost of loading data includes the time to load Pages from disk, decompress Pages, and decode.
However, excessively high cardinality can also cause issues such as occupying more disk space, and because bitmap indexes need to be constructed during data loading, frequent data loading can affect loading performance.
Additionally, the overhead of loading bitmap indexes during queries should be considered. During a query, bitmap indexes are loaded on demand, and the larger the value of number of column values involved in query conditions/cardinality x bitmap index
, the greater the overhead of loading bitmap indexes during queries.
To determine the appropriate cardinality and query conditions for bitmap indexes, it is recommended to refer to the Performance test on bitmap index in this topic to conduct performance tests. You can use actual business data and queries to create bitmap indexes on columns of different cardinalities, to analyze the filtering effect of bitmap indexes on queries (at least filtering out 999/1000 of the data), the disk space usage, the impact on loading performance, and the overhead of loading bitmap indexes during queries.
StarRocks has a built-in adaptive selection mechanism for bitmap indexes. If a bitmap index fails to accelerate queries, for example, if it cannot filter out many Pages, or the overhead of loading bitmap indexes during queries is high, it will not be used during the query, so query performance will not be significantly affected.