approx_top_k
Descriptionβ
Returns the top k most frequently occurring item values in an expr along with their approximate counts.
This function is supported from v3.0.
Syntaxβ
APPROX_TOP_K(<expr> [ , <k> [ , <counter_num> ] ] )
Argumentsβ
-
expr: An expression of STRING, BOOLEAN, DATE, DATETIME, or numeric type. -
k: An optional INTEGER literal greater than 0. Ifkis not specified, it defaults to5. The maximum value is100000. -
counter_num: An optional INTEGER literal greater than or equal tok, The larger thecounter_numis, the more accurate the result will be. However, this also comes with increased CPU and memory costs.- The maximum value is
100000. - If
counter_numis not specified, it defaults tomax(min(2 * k, 100), 100000).
- The maximum value is
Returnsβ
Results are returned as an ARRAY of type STRUCT, where each STRUCT contains an item field for the value (with its original input type) and a count field (of type BIGINT) with the approximate number of occurrences. The array is sorted by count descending.
The aggregate function returns the top k most frequently occurring item values in an expression expr along with their approximate counts. The error in each count may be up to 2.0 * numRows / counter_num where numRows is the total number of rows. Higher values of counter_num provide better accuracy at the cost of increased memory usage. Expressions that have fewer than counter_num distinct items will yield exact item counts. Results include NULL values as their own item in the results.
Examplesβ
Use data in the scores table as an example.
-- Calculate the score distribution of each subject.
MySQL > SELECT subject, APPROX_TOP_K(score) AS top_k FROM scores GROUP BY subject;
+---------+--------------------------------------------------------------------------------------------------------------------+
| subject | top_k |
+---------+--------------------------------------------------------------------------------------------------------------------+
| physics | [{"item":99,"count":2},{"item":null,"count":1},{"item":100,"count":1},{"item":85,"count":1},{"item":60,"count":1}] |
| english | [{"item":null,"count":1},{"item":92,"count":1},{"item":98,"count":1},{"item":100,"count":1},{"item":85,"count":1}] |
| NULL | [{"item":90,"count":1}] |
| math | [{"item":80,"count":2},{"item":null,"count":1},{"item":92,"count":1},{"item":95,"count":1},{"item":70,"count":1}] |
+---------+--------------------------------------------------------------------------------------------------------------------+
-- Calculate the score distribution of the math subject.
MySQL > SELECT subject, APPROX_TOP_K(score) AS top_k FROM scores WHERE subject IN ('math') GROUP BY subject;
+---------+-------------------------------------------------------------------------------------------------------------------+
| subject | top_k |
+---------+-------------------------------------------------------------------------------------------------------------------+
| math | [{"item":80,"count":2},{"item":null,"count":1},{"item":95,"count":1},{"item":92,"count":1},{"item":70,"count":1}] |
+---------+-------------------------------------------------------------------------------------------------------------------+
keywordβ
APPROX_TOP_K