📄️ any_value
Obtains an arbitrary row from each aggregated group. You can use this function to optimize a query that has a GROUP BY clause.
📄️ approx_count_distinct
Returns an approximate count of distinct values, similar to the result of COUNT(DISTINCT col).
📄️ approx_top_k
Returns the top k most frequently occurring item values in an expr along with their approximate counts.
📄️ avg
Returns the average value of selected fields.
📄️ bitmap
Here is a simple example to illustrate the usage of several aggregate functions in Bitmap. For detailed function definitions or more Bitmap functions, see bitmap-functions.
📄️ corr
Returns the Pearson correlation coefficient between two expressions. This function is supported from v2.5.10. It can also be used as a window function.
📄️ count
Returns the total number of rows specified by an expression.
📄️ count_if
Returns the number of records that meet the specified condition or 0 if no records satisfy the condition.
📄️ covar_pop
Returns the population covariance of two expressions. This function is supported from v2.5.10. It can also be used as a window function.
📄️ covar_samp
Returns the sample covariance of two expressions. This function is supported from v2.5.10. It can also be used as a window function.
📄️ ds_hll_count_distinct
Returns an approximate count of distinct values, similar to the result of COUNT(DISTINCT col). APPROX_COUNT_DISTINCT(expr) is a similar function.
📄️ group_concat
Concatenates non-null values from a group into a single string, separated by the sep argument, which defaults to a comma (,) if not specified. This function can be used to concatenate values from multiple rows of a column into one string.
📄️ grouping
Indicates whether a column is an aggregate column. If it is an aggregate column, 0 is returned. Otherwise, 1 is returned.
📄️ grouping_id
grouping_id is used to distinguish the grouping statistics results of the same grouping standard.
📄️ hll_raw_agg
This function is an aggregate function that is used to aggregate HLL fields. It returns an HLL value.
📄️ hll_union
Returns the concatenation of a set of HLL values.
📄️ hll_union_agg
HLL is an engineering implementation based on the HyperLogLog algorithm, which is used to save the intermediate results of the HyperLogLog calculation process.
📄️ mann_whitney_u_test
Performs the Mann-Whitney rank test on samples from two populations.
📄️ max
Returns the maximum value of the expr expression.
📄️ max_by
Returns the value of x associated with the maximum value of y.
📄️ min
Returns the minimum value of the expr expression.
📄️ min_by
Returns the value of x associated with the minimum value of y.
📄️ multi_distinct_count
Returns the number of distinct values of expr, equivalent to count(distinct expr).
📄️ multi_distinct_sum
Returns the sum of distinct values in expr, equivalent to sum(distinct expr).
📄️ percentile_approx
Returns the approximation of the pth percentile, where the value of p is between 0 and 1.
📄️ percentile_approx_weight
Returns the approximation of the pth percentile with weight. percentile_approx_weight is a weighted version of PERCENTILE_APPROX, and allows users to specify a weight (a constant value or numeric column) for each input value.
📄️ percentile_cont
Computes the percentile value of expr with linear interpolation.
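To make "linear interpolation" concrete, here is a minimal Python sketch of the idea. It is not the engine's implementation: the function name mirrors the SQL function only for readability, and the position formula `p * (n - 1)` is an assumption about how the fractional rank is chosen.

```python
def percentile_cont(values, p):
    """Interpolated percentile of a non-empty list, with 0 <= p <= 1.

    Assumption: the fractional position is p * (n - 1) over the sorted input.
    """
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)          # fractional index into the sorted data
    lo = int(pos)                         # lower neighbour (floor, since pos >= 0)
    hi = min(lo + 1, len(ordered) - 1)    # upper neighbour, clamped to the last index
    frac = pos - lo                       # how far pos lies between lo and hi
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac
```

Under this assumption, the median of 1, 2, 3, 4 falls halfway between 2 and 3, so the result is 2.5 rather than one of the input values.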
📄️ percentile_disc
Returns a percentile value based on a discrete distribution of the input column expr. If the exact percentile value cannot be found, this function returns the larger value between the two closest values.
📄️ percentile_disc_lc
Returns a percentile value based on a discrete distribution of the input column expr. It behaves the same as percentile_disc but uses a different algorithm. percentile_disc needs to obtain all input data, and the merge sort it uses to find the percentile value consumes memory proportional to the full input. percentile_disc_lc instead builds a hash table of key->count, so when the input cardinality is low, memory use does not increase noticeably even if the input data size is large.
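The difference between the two strategies can be sketched in Python. This is an illustration under stated assumptions, not the engine's code: both helper names are hypothetical, and the rank rule `ceil(p * (n - 1))` is an assumed reading of "returns the larger value between the two closest values".

```python
import math
from collections import Counter


def percentile_disc_sorted(values, p):
    """Sort-based variant: holds every input value in memory (non-empty input)."""
    ordered = sorted(values)
    rank = math.ceil(p * (len(ordered) - 1))  # assumed rank rule, rounds up
    return ordered[rank]


def percentile_disc_counted(values, p):
    """Count-based variant: holds only one key -> count entry per distinct value."""
    counts = Counter(values)
    n = sum(counts.values())
    rank = math.ceil(p * (n - 1))             # same assumed rank rule
    seen = 0
    for key in sorted(counts):                # sort distinct keys only
        seen += counts[key]
        if rank < seen:                       # the target rank falls in this key's run
            return key
    raise ValueError("empty input")
```

With two distinct values repeated a thousand times each, the sorted variant keeps 2,000 elements while the counted variant keeps two table entries, and both return the same value.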
📄️ retention
Calculates the user retention rate within a specified period of time. This function accepts 1 to 31 conditions and evaluates whether each condition is true. If the condition evaluates to true, 1 is returned. Otherwise, 0 is returned. It eventually returns an array of 0 and 1. You can calculate the user retention rate based on this data.
📄️ std
Returns the standard deviation of an expression. Since v2.5.10, this function can also be used as a window function.
📄️ stddev,stddev_pop,std
Returns the population standard deviation of the expr expression. Since v2.5.10, this function can also be used as a window function.
📄️ stddev_samp
Returns the sample standard deviation of an expression. Since v2.5.10, this function can also be used as a window function.
📄️ sum
Returns the sum of non-null values for expr. You can use the DISTINCT keyword to compute the sum of distinct non-null values.
📄️ var_samp,variance_samp
Returns the sample variance of an expression. Since v2.5.10, this function can also be used as a window function.
📄️ variance,var_pop,variance_pop
Returns the population variance of an expression. Since v2.5.10, this function can also be used as a window function.
📄️ window_funnel
Searches for an event chain in a sliding window and calculates the maximum number of consecutive events in the event chain. This function is commonly used for analyzing conversion rate. It is supported from v2.3.
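The sliding-window search can be illustrated with a simplified Python sketch. It is only a model of the idea, not the function's actual behavior or signature: the `(timestamp, step)` event encoding, the parameter names, and the inclusive window boundary are all assumptions, and any mode options the real function may offer are ignored.

```python
def window_funnel(window, events, num_steps):
    """Longest prefix of the step chain 0, 1, ..., num_steps - 1 reached within
    `window` time units of some occurrence of step 0.

    events: iterable of (timestamp, step) pairs (hypothetical encoding).
    """
    ordered = sorted(events)                  # order by timestamp, then step
    best = 0
    for i, (start, step) in enumerate(ordered):
        if step != 0:
            continue                          # a chain can only begin at step 0
        level = 1                             # step 0 matched
        for ts, s in ordered[i + 1:]:
            if ts - start > window:
                break                         # outside the window for this start
            if s == level:                    # next expected step found
                level += 1
                if level == num_steps:
                    break                     # whole chain completed
        best = max(best, level)
    return best
```

For conversion analysis, the returned level would typically be computed per user and then counted per level to see how many users reached each stage.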