Gather statistics for CBO
This topic describes the basic concepts of the StarRocks cost-based optimizer (CBO) and how to collect statistics so that the CBO can select an optimal query plan. StarRocks 2.4 introduces histograms to gather accurate data distribution statistics.
Synchronous materialized view
This topic describes how to create, use, and manage a synchronous materialized view (Rollup).
Asynchronous materialized views
4 items
Colocate Join
For shuffle join and broadcast join, the rows of the two joined tables that meet the join condition are brought together on a single node to complete the join. Neither of these two join methods can avoid the latency and overhead caused by transmitting data between nodes over the network.
Use Lateral Join for column-to-row conversion
Column-to-row conversion is a common operation in ETL processing. Lateral is a special Join keyword that can associate a row with an internal subquery or table function. By using Lateral in conjunction with unnest(), you can expand one row into multiple rows. For more information, see unnest.
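The row expansion described above can be illustrated outside SQL. The following is a minimal plain-Python sketch of the semantics only, not StarRocks syntax; the function name `lateral_unnest` and the sample rows are hypothetical, and the choice to drop rows with an empty array is an assumption of this sketch.

```python
def lateral_unnest(rows, array_key, out_key):
    """Expand each input row into one output row per element of row[array_key]."""
    expanded = []
    for row in rows:
        # Keep every column except the array column itself.
        base = {k: v for k, v in row.items() if k != array_key}
        for element in row[array_key]:
            expanded.append({**base, out_key: element})
    return expanded


# Hypothetical sample data: one array-valued column per row.
rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": ["c"]},
    {"id": 3, "tags": []},  # empty array: contributes no output rows in this sketch
]
result = lateral_unnest(rows, "tags", "tag")
for row in result:
    print(row)
```

Each output row pairs the non-array columns of its source row with one array element, which is the "one row into multiple rows" effect the topic describes.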
Query Cache
The query cache is a StarRocks feature that can greatly improve the performance of aggregate queries. By storing the intermediate results of local aggregations in memory, the query cache avoids unnecessary disk access and computation for new queries that are identical or similar to previous ones, which saves time and resources and improves scalability. The query cache is especially useful in high-concurrency scenarios where many users run similar queries on large and complex data sets.
Data Cache
From v3.1.7 and v3.2.3 onwards, StarRocks uses Data Cache to accelerate queries in shared-data clusters, replacing the File Cache of earlier versions. Data Cache loads data from remote storage in blocks (on the order of MBs) as needed, whereas File Cache loads entire data files in the background each time, regardless of how many data rows are actually needed.
Computing the number of distinct values
2 items
Sorted streaming aggregate
Common aggregation methods in database systems include hash aggregate and sort aggregate.
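To make the two methods concrete, here is a minimal plain-Python sketch of a grouped SUM computed both ways. It illustrates the general techniques only and is not how StarRocks implements them; the function names and sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Hash aggregate: keep one running total per key in a hash table."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


def sort_aggregate(rows):
    """Sort aggregate: sort by key, then sum each run of equal keys in one pass."""
    ordered = sorted(rows, key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


# Hypothetical (key, value) input rows in arbitrary order.
rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]
print(hash_aggregate(rows))
print(sort_aggregate(rows))
```

The hash variant needs memory for every distinct key at once, while the sort variant only needs the current group once the input is ordered, so input that already arrives sorted by the grouping key can skip the sort step.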
Accelerate COUNT(DISTINCT) and Joins with AUTO INCREMENT and Global Dictionary
This topic describes how to accelerate COUNT(DISTINCT) calculation and Joins using AUTO INCREMENT columns and Global Dictionary.
[Preview] Flat JSON
This topic introduces the basic concepts of the Flat JSON feature and how to use it.