Version: Candidate-3.4

array_sortby

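Sorts the elements of an array in ascending order according to the elements of a separate key array, or of a key array generated by a Lambda function. For details on Lambda expressions, see Lambda expression. This function is supported from v2.5.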

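For example, given two arrays a = [3,1,4] and b = [7,5,6], sort the elements of a using b as the sort key.

The elements of b, [7,5,6], correspond one-to-one by position to the elements of a, [3,1,4].

Before sorting: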

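+-------+---------------+----------------+---------------+
| Array | First element | Second element | Third element |
+-------+---------------+----------------+---------------+
| a     | 3             | 1              | 4             |
| b     | 7             | 5              | 6             |
+-------+---------------+----------------+---------------+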

After sorting, b is in ascending order, [5,6,7], and the elements of a move with their keys, giving [1,4,3].

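+-------+---------------+----------------+---------------+
| Array | First element | Second element | Third element |
+-------+---------------+----------------+---------------+
| a     | 1             | 4              | 3             |
| b     | 5             | 6              | 7             |
+-------+---------------+----------------+---------------+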
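
The example above can be reproduced directly with array literals. This is a minimal sketch, assuming a StarRocks session on v2.5 or later; the column alias sorted_a is only illustrative.

```sql
-- Sort a = [3,1,4] using b = [7,5,6] as the sort key.
-- Per the tables above, the expected result is [1,4,3].
SELECT array_sortby([3,1,4], [7,5,6]) AS sorted_a;
```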

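When more than two arrays are involved, the function sorts array0 in ascending order by the values of multiple key arrays (array1, array2, array3, and so on). The rules are:

  1. Compare the corresponding elements of array1 first;
  2. If they are equal, compare the corresponding elements of array2;
  3. And so on, through the last key array.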

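Note:

  • The elements of the arrays involved in sorting must be of a sortable type or of JSON type.
  • Every array involved in sorting must have the same number of elements as the original array (NULL excepted).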

Worked example:

Given the following four arrays:

array0 = [1, 2, 3, 4, 5]
array1 = [6, 5, 5, 5, 4]
array2 = ["d", "b", "a", "b", "4"]
array3 = ["2023-01-01", "2023-01-04", "2023-01-03", "2023-01-05", "2023-01-02"]

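Sorting steps (positions are 1-based):

  1. Compare array1:
  • Because 4 < 5 = 5 = 5 < 6, position 5 comes first, positions 2, 3, and 4 are tied, and position 1 comes last.
  2. For the tied elements [5, 5, 5] in array1, compare array2:
  • The array2 elements at positions 2, 3, and 4 are [b, a, b]. Because a < b = b, position 3 comes first and positions 2 and 4 remain tied.
  3. For the elements [b, b] that are tied in both array1 and array2, compare array3:
  • The array3 elements at positions 2 and 4 are [2023-01-04, 2023-01-05]. Because 2023-01-04 < 2023-01-05, position 2 comes before position 4.

The resulting position order is [5, 3, 2, 4, 1], so the final sort result for array0 is:

array0: [5, 3, 2, 4, 1]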
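
A query sketch for the walkthrough above, with the four arrays written as literals. Two assumptions: sorting by multiple key arrays is available in the version you run, and the dates are passed as plain strings, which order the same way as dates in this ISO format. The alias sorted_array0 is illustrative.

```sql
-- array0 is sorted by array1, then array2, then array3.
-- Per the walkthrough above, the expected result is [5,3,2,4,1].
SELECT array_sortby(
    [1, 2, 3, 4, 5],
    [6, 5, 5, 5, 4],
    ['d', 'b', 'a', 'b', '4'],
    ['2023-01-01', '2023-01-04', '2023-01-03', '2023-01-05', '2023-01-02']
) AS sorted_array0;
```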

Syntax

array_sortby(array0, array1)
array_sortby(<lambda_function>, array0 [, array1...])
array_sortby(array0, array1, [array2, array3...])
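  • array_sortby(array0, array1)

    Sorts array0 in ascending order by the elements of the key array array1.

  • array_sortby(<lambda_function>, array0 [, array1...])

    Sorts array0 in ascending order by the elements of the key array generated by lambda_function.

  • array_sortby(array0, array1, [array2, array3...])

    Sorts array0 in ascending order by the values of multiple key arrays (array1, array2, array3, and so on). The corresponding elements of array1 are compared first; if they are equal, the corresponding elements of array2 are compared, and so on, through the last key array.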
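
As a rough illustration of how the first two forms relate, the two calls below should be interchangeable: the lambda receives one element from each array and returns the value to sort by. The literal arrays, the parameter names v and k, and the aliases are made up for illustration.

```sql
-- Array form: sort ['x','y','z'] by the keys [3,1,2].
SELECT array_sortby(['x', 'y', 'z'], [3, 1, 2]) AS by_key_array;

-- Lambda form: v is bound to elements of the first array, k to the second;
-- returning k uses the second array as the sort key.
SELECT array_sortby((v, k) -> k, ['x', 'y', 'z'], [3, 1, 2]) AS by_lambda;

-- With the keys 3, 1, 2, both calls should return ["y","z","x"].
```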

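Parameters

  • array0: the array to sort. The supported data type is ARRAY, or null. The elements of the array must be sortable.
  • array1: the key array used for sorting. The supported data type is ARRAY, or null.
  • lambda_function: the lambda function used to generate the sort key array.
  • array1, [array2, array3...]: the key arrays used for sorting. The supported data type is ARRAY, or null.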

Return value

Returns a value of the ARRAY type.

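Usage notes

  • Only ascending order is supported.
  • To sort in descending order, call reverse() on the sorted result.
  • null values are placed first.
  • Elements of the returned array have the same type and nullability as elements of the input array.
  • If the key array or expression used for sorting is null, the data is left unchanged.
  • All arrays involved in sorting must have the same number of elements; otherwise an error is returned.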
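
Two of the notes above, sketched as queries with literal arrays (the data and aliases are illustrative). The outputs in the comments are what the notes imply rather than captured results.

```sql
-- Descending order: sort ascending, then reverse the result.
-- array_sortby gives [1,4,3] here, so reverse() should give [3,4,1].
SELECT reverse(array_sortby([3,1,4], [7,5,6])) AS sorted_desc;

-- A null sort key is placed first: keys [2,null,1] should give [20,30,10].
SELECT array_sortby([10,20,30], [2,null,1]) AS nulls_first;
```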

Examples

The following examples use this table.

CREATE TABLE `test_array` (
`c1` int(11) NULL COMMENT "",
`c2` ARRAY<int(11)> NULL COMMENT "",
`c3` ARRAY<int(11)> NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`c1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`c1`)
PROPERTIES (
"replication_num" = "3",
"storage_format" = "DEFAULT",
"enable_persistent_index" = "false",
"compression" = "LZ4"
);

insert into test_array values
(1,[4,3,5],[82,1,4]),
(2,null,[23]),
(3,[4,2],[6,5]),
(4,null,null),
(5,[],[]),
(6,NULL,[]),
(7,[],null),
(8,[null,null],[3,6]),
(9,[432,21,23],[5,4,null]);

select * from test_array order by c1;
+------+-------------+------------+
| c1 | c2 | c3 |
+------+-------------+------------+
| 1 | [4,3,5] | [82,1,4] |
| 2 | NULL | [23] |
| 3 | [4,2] | [6,5] |
| 4 | NULL | NULL |
| 5 | [] | [] |
| 6 | NULL | [] |
| 7 | [] | NULL |
| 8 | [null,null] | [3,6] |
| 9 | [432,21,23] | [5,4,null] |
+------+-------------+------------+
9 rows in set (0.00 sec)

Example 1: Sort array c3 in ascending order by the values of c2.

select c1, c3, c2, array_sort(c2), array_sortby(c3,c2)
from test_array order by c1;
+------+------------+-------------+----------------+----------------------+
| c1 | c3 | c2 | array_sort(c2) | array_sortby(c3, c2) |
+------+------------+-------------+----------------+----------------------+
| 1 | [82,1,4] | [4,3,5] | [3,4,5] | [1,82,4] |
| 2 | [23] | NULL | NULL | [23] |
| 3 | [6,5] | [4,2] | [2,4] | [5,6] |
| 4 | NULL | NULL | NULL | NULL |
| 5 | [] | [] | [] | [] |
| 6 | [] | NULL | NULL | [] |
| 7 | NULL | [] | [] | NULL |
| 8 | [3,6] | [null,null] | [null,null] | [3,6] |
| 9 | [5,4,null] | [432,21,23] | [21,23,432] | [4,null,5] |
+------+------------+-------------+----------------+----------------------+

Example 2: Sort array c3 in ascending order by the key array generated by a Lambda expression. This query is equivalent to the previous example.

select 
c1,
c3,
c2,
array_sort(c2) as sorted_c2_asc,
array_sortby((x,y) -> y, c3, c2) as sorted_c3_by_c2
from test_array order by c1;
+------+------------+-------------+---------------+-----------------+
| c1 | c3 | c2 | sorted_c2_asc | sorted_c3_by_c2 |
+------+------------+-------------+---------------+-----------------+
| 1 | [82,1,4] | [4,3,5] | [3,4,5] | [1,82,4] |
| 2 | [23] | NULL | NULL | [23] |
| 3 | [6,5] | [4,2] | [2,4] | [5,6] |
| 4 | NULL | NULL | NULL | NULL |
| 5 | [] | [] | [] | [] |
| 6 | [] | NULL | NULL | [] |
| 7 | NULL | [] | [] | NULL |
| 8 | [3,6] | [null,null] | [null,null] | [3,6] |
| 9 | [5,4,null] | [432,21,23] | [21,23,432] | [4,null,5] |
+------+------------+-------------+---------------+-----------------+

Example 3: Sort array c3 in ascending order by the sum c2 + c3.

select
c3,
c2,
array_map((x,y)-> x+y,c3,c2) as sum,
array_sort(array_map((x,y)-> x+y,c3,c2)) as sorted_sum,
array_sortby((x,y) -> x+y , c3,c2) as sorted_c3_by_sum
from test_array where c1=1;
+----------+---------+----------+------------+------------------+
| c3 | c2 | sum | sorted_sum | sorted_c3_by_sum |
+----------+---------+----------+------------+------------------+
| [82,1,4] | [4,3,5] | [86,4,9] | [4,9,86] | [1,4,82] |
+----------+---------+----------+------------+------------------+

The following examples sort by more than one key array and use the table below.

CREATE TABLE test_array_sortby_multi (
id INT(11) not null,
array_col1 ARRAY<INT>,
array_col2 ARRAY<DOUBLE>,
array_col3 ARRAY<VARCHAR(20)>,
array_col4 ARRAY<DATE>
) ENGINE=OLAP
DUPLICATE KEY(id)
COMMENT "OLAP"
DISTRIBUTED BY HASH(id)
PROPERTIES (
"replication_num" = "1",
"storage_format" = "DEFAULT",
"enable_persistent_index" = "false",
"compression" = "LZ4"
);

INSERT INTO test_array_sortby_multi VALUES
(1, [4, 3, 5], [1.1, 2.2, 2.2], ['a', 'b', 'c'], ['2023-01-01', '2023-01-02', '2023-01-03']),
(2, [6, 7, 8], [6.6, 5.5, 6.6], ['d', 'e', 'd'], ['2023-01-04', '2023-01-05', '2023-01-06']),
(3, NULL, [7.7, 8.8, 8.8], ['g', 'h', 'h'], ['2023-01-07', '2023-01-08', '2023-01-09']),
(4, [9, 10, 11], NULL, ['k', 'k', 'j'], ['2023-01-10', '2023-01-12', '2023-01-11']),
(5, [12, 13, 14], [10.10, 11.11, 11.11], NULL, ['2023-01-13', '2023-01-14', '2023-01-15']),
(6, [15, 16, 17], [14.14, 13.13, 14.14], ['m', 'o', 'o'], NULL),
(7, [18, 19, 20], [16.16, 16.16, 18.18], ['p', 'p', 'r'], ['2023-01-16', NULL, '2023-01-18']),
(8, [21, 22, 23], [19.19, 20.20, 19.19], ['a', 't', 'a'], ['2023-01-19', '2023-01-20', '2023-01-21']),
(9, [24, 25, 26], NULL, ['y', 'y', 'z'], ['2023-01-25', '2023-01-24', '2023-01-26']),
(10, [24, 25, 26], NULL, ['y', 'y', 'z'], ['2023-01-25', NULL, '2023-01-26']);


select * from test_array_sortby_multi order by id asc;
+------+------------+---------------------+---------------+------------------------------------------+
| id | array_col1 | array_col2 | array_col3 | array_col4 |
+------+------------+---------------------+---------------+------------------------------------------+
| 1 | [4,3,5] | [1.1,2.2,2.2] | ["a","b","c"] | ["2023-01-01","2023-01-02","2023-01-03"] |
| 2 | [6,7,8] | [6.6,5.5,6.6] | ["d","e","d"] | ["2023-01-04","2023-01-05","2023-01-06"] |
| 3 | NULL | [7.7,8.8,8.8] | ["g","h","h"] | ["2023-01-07","2023-01-08","2023-01-09"] |
| 4 | [9,10,11] | NULL | ["k","k","j"] | ["2023-01-10","2023-01-12","2023-01-11"] |
| 5 | [12,13,14] | [10.1,11.11,11.11] | NULL | ["2023-01-13","2023-01-14","2023-01-15"] |
| 6 | [15,16,17] | [14.14,13.13,14.14] | ["m","o","o"] | NULL |
| 7 | [18,19,20] | [16.16,16.16,18.18] | ["p","p","r"] | ["2023-01-16",null,"2023-01-18"] |
| 8 | [21,22,23] | [19.19,20.2,19.19] | ["a","t","a"] | ["2023-01-19","2023-01-20","2023-01-21"] |
| 9 | [24,25,26] | NULL | ["y","y","z"] | ["2023-01-25","2023-01-24","2023-01-26"] |
| 10 | [24,25,26] | NULL | ["y","y","z"] | ["2023-01-25",null,"2023-01-26"] |
+------+------------+---------------------+---------------+------------------------------------------+

Example 4: Sort array array_col1 in ascending order by the values of array_col2 and array_col3.

select id, array_col1, array_col2, array_col3, array_sortby(array_col1, array_col2, array_col3) from test_array_sortby_multi order by id asc;
+------+------------+---------------------+---------------+--------------------------------------------------------+
| id | array_col1 | array_col2 | array_col3 | array_sortby(array_col1, array_col2, array_col3) |
+------+------------+---------------------+---------------+--------------------------------------------------------+
| 1 | [4,3,5] | [1.1,2.2,2.2] | ["a","b","c"] | [4,3,5] |
| 2 | [6,7,8] | [6.6,5.5,6.6] | ["d","e","d"] | [7,6,8] |
| 3 | NULL | [7.7,8.8,8.8] | ["g","h","h"] | NULL |
| 4 | [9,10,11] | NULL | ["k","k","j"] | [11,9,10] |
| 5 | [12,13,14] | [10.1,11.11,11.11] | NULL | [12,13,14] |
| 6 | [15,16,17] | [14.14,13.13,14.14] | ["m","o","o"] | [16,15,17] |
| 7 | [18,19,20] | [16.16,16.16,18.18] | ["p","p","r"] | [18,19,20] |
| 8 | [21,22,23] | [19.19,20.2,19.19] | ["a","t","a"] | [21,23,22] |
| 9 | [24,25,26] | NULL | ["y","y","z"] | [24,25,26] |
| 10 | [24,25,26] | NULL | ["y","y","z"] | [24,25,26] |
+------+------------+---------------------+---------------+--------------------------------------------------------+

Example 5: Sort array array_col1 in ascending order by the values of array_col2, array_col3, and array_col4.

select id, array_col1, array_col2, array_col3, array_col4, array_sortby(array_col1, array_col2, array_col3, array_col4) from test_array_sortby_multi order by id asc;
+------+------------+---------------------+---------------+------------------------------------------+--------------------------------------------------------------------+
| id | array_col1 | array_col2 | array_col3 | array_col4 | array_sortby(array_col1, array_col2, array_col3, array_col4) |
+------+------------+---------------------+---------------+------------------------------------------+--------------------------------------------------------------------+
| 1 | [4,3,5] | [1.1,2.2,2.2] | ["a","b","c"] | ["2023-01-01","2023-01-02","2023-01-03"] | [4,3,5] |
| 2 | [6,7,8] | [6.6,5.5,6.6] | ["d","e","d"] | ["2023-01-04","2023-01-05","2023-01-06"] | [7,6,8] |
| 3 | NULL | [7.7,8.8,8.8] | ["g","h","h"] | ["2023-01-07","2023-01-08","2023-01-09"] | NULL |
| 4 | [9,10,11] | NULL | ["k","k","j"] | ["2023-01-10","2023-01-12","2023-01-11"] | [11,9,10] |
| 5 | [12,13,14] | [10.1,11.11,11.11] | NULL | ["2023-01-13","2023-01-14","2023-01-15"] | [12,13,14] |
| 6 | [15,16,17] | [14.14,13.13,14.14] | ["m","o","o"] | NULL | [16,15,17] |
| 7 | [18,19,20] | [16.16,16.16,18.18] | ["p","p","r"] | ["2023-01-16",null,"2023-01-18"] | [19,18,20] |
| 8 | [21,22,23] | [19.19,20.2,19.19] | ["a","t","a"] | ["2023-01-19","2023-01-20","2023-01-21"] | [21,23,22] |
| 9 | [24,25,26] | NULL | ["y","y","z"] | ["2023-01-25","2023-01-24","2023-01-26"] | [25,24,26] |
| 10 | [24,25,26] | NULL | ["y","y","z"] | ["2023-01-25",null,"2023-01-26"] | [25,24,26] |
+------+------------+---------------------+---------------+------------------------------------------+--------------------------------------------------------------------+

Related documents

array_sort