AUTO_INCREMENT
Since version 3.0, StarRocks supports the AUTO_INCREMENT
column attribute, which can simplify data management. This topic introduces the application scenarios, usage and features of the AUTO_INCREMENT
column attribute.
Introduction
When a new data row is loaded into a table and values are not specified for the AUTO_INCREMENT
column, StarRocks automatically assigns an integer value for the row's AUTO_INCREMENT
column as its unique ID across the table. The subsequent values for the AUTO_INCREMENT
column automatically increase at a specific step starting from the ID of the row. An AUTO_INCREMENT
column can be used to simplify data management and speed up some queries. Here are some application scenarios of an AUTO_INCREMENT
column:
- Serve as primary keys: An
AUTO_INCREMENT
column can be used as the primary key to ensure that each row has a unique ID and make it easy to query and manage data. - Join tables: When multiple tables are joined, an
AUTO_INCREMENT
column can be used as the Join Key, which can expedite queries compared to using a column whose data type is STRING, for example, UUID. - Count the number of distinct values in a high-cardinality column: An
AUTO_INCREMENT
column can be used to represent the unique value column in a dictionary. Compared to directly counting distinct STRING values, counting distinct integer values of theAUTO_INCREMENT
column can sometimes improve the query speed by several times or even tens of times.
You need to specify an AUTO_INCREMENT
column in the CREATE TABLE statement. The data types of an AUTO_INCREMENT
column must be BIGINT. The value for an AUTO_INCREMENT column can be implicitly assigned or explicitly specified. It starts from 1, and increments by 1 for each new row.
Basic operations
Specify AUTO_INCREMENT
column at table creation
Create a table named test_tbl1
with two columns, id
and number
. Specify the column number
as the AUTO_INCREMENT
column.
CREATE TABLE test_tbl1
(
id BIGINT NOT NULL,
number BIGINT NOT NULL AUTO_INCREMENT
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id)
PROPERTIES("replicated_storage" = "true");
Assign values for AUTO_INCREMENT
column
Assign values implicitly
When you load data into a StarRocks table, you do not need to specify the values for the AUTO_INCREMENT
column. StarRocks automatically assigns unique integer values for that column and inserts them into the table.
INSERT INTO test_tbl1 (id) VALUES (1);
INSERT INTO test_tbl1 (id) VALUES (2);
INSERT INTO test_tbl1 (id) VALUES (3),(4),(5);
View data in the table.
mysql > SELECT * FROM test_tbl1 ORDER BY id;
+------+--------+
| id | number |
+------+--------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
+------+--------+
5 rows in set (0.02 sec)
When you load data into a StarRocks table, you can also specify the values as DEFAULT
for the AUTO_INCREMENT
column. StarRocks automatically assigns unique integer values for that column and inserts them into the table.
INSERT INTO test_tbl1 (id, number) VALUES (6, DEFAULT);
View data in the table.
mysql > SELECT * FROM test_tbl1 ORDER BY id;
+------+--------+
| id | number |
+------+--------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
+------+--------+
6 rows in set (0.02 sec)
In actual usage, the following result may be returned when you view the data in the table. This is because StarRocks cannot guarantee that the values for the AUTO_INCREMENT
column are strictly monotonic. But StarRocks can guarantee that the values roughly increase in chronological order. For more information, see Monotonicity.
mysql > SELECT * FROM test_tbl1 ORDER BY id;
+------+--------+
| id | number |
+------+--------+
| 1 | 1 |
| 2 | 100001 |
| 3 | 200001 |
| 4 | 200002 |
| 5 | 200003 |
| 6 | 200004 |
+------+--------+
6 rows in set (0.01 sec)
Specify values explicitly
You can also explicitly specify the values for the AUTO_INCREMENT
column and insert them into the table.
INSERT INTO test_tbl1 (id, number) VALUES (7, 100);
-- view data in the table.
mysql > SELECT * FROM test_tbl1 ORDER BY id;
+------+--------+
| id | number |
+------+--------+
| 1 | 1 |
| 2 | 100001 |
| 3 | 200001 |
| 4 | 200002 |
| 5 | 200003 |
| 6 | 200004 |
| 7 | 100 |
+------+--------+
7 rows in set (0.01 sec)
Moreover, explicitly specifying values does not affect the subsequent values generated by StarRocks for newly inserted data rows.
INSERT INTO test_tbl1 (id) VALUES (8);
-- view data in the table.
mysql > SELECT * FROM test_tbl1 ORDER BY id;
+------+--------+
| id | number |
+------+--------+
| 1 | 1 |
| 2 | 100001 |
| 3 | 200001 |
| 4 | 200002 |
| 5 | 200003 |
| 6 | 200004 |
| 7 | 100 |
| 8 | 2 |
+------+--------+
8 rows in set (0.01 sec)
NOTICE
We recommend that you do not use implicitly assigned values and explicitly specified values for the AUTO_INCREMENT
column at the same time. Because the specified values may be the same as the values generated by StarRocks, breaking the global uniqueness of auto-incremented IDs.
Basic features
Uniqueness
In general, StarRocks guarantees that the values for a AUTO_INCREMENT
column are globally unique across a table. We recommend that you do not implicitly assign and explicitly specify the values for the AUTO_INCREMENT
column at the same time. If you do so, it may break the global uniqueness of auto-incremented IDs. Here is a simple example: Create a table named test_tbl2
with two columns, id
and number
. Specify the column number
as the AUTO_INCREMENT
column.
CREATE TABLE test_tbl2
(
id BIGINT NOT NULL,
number BIGINT NOT NULL AUTO_INCREMENT
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id)
PROPERTIES("replicated_storage" = "true");
Implicitly assign and explicitly specify the values for the AUTO_INCREMENT
column number
in the table test_tbl2
.
INSERT INTO test_tbl2 (id, number) VALUES (1, DEFAULT);
INSERT INTO test_tbl2 (id, number) VALUES (2, 2);
INSERT INTO test_tbl2 (id) VALUES (3);
Query the table test_tbl2
.
mysql > SELECT * FROM test_tbl2 ORDER BY id;
+------+--------+
| id | number |
+------+--------+
| 1 | 1 |
| 2 | 2 |
| 3 | 100001 |
+------+--------+
3 rows in set (0.08 sec)
Monotonicity
In order to improve the performance of allocating auto-incremented IDs, BEs cache some auto-incremented IDs locally. In this situation, StarRocks cannot guarantee that the values for the AUTO_INCREMENT
column are strictly monotonic. It can only be ensured that the values roughly increase in chronological order.
NOTE
The number of auto-incremented IDs cached by the BEs is determined by the FE dynamic parameter
auto_increment_cache_size
, which defaults to100,000
. You can modify the value by usingADMIN SET FRONTEND CONFIG ("auto_increment_cache_size" = "xxx");
For example, a StarRocks cluster has one FE node and two BE nodes. Create a table named test_tbl3
and insert five rows of data as follows:
CREATE TABLE test_tbl3
(
id BIGINT NOT NULL,
number BIGINT NOT NULL AUTO_INCREMENT
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id)
PROPERTIES("replicated_storage" = "true");
INSERT INTO test_tbl3 VALUES (1, DEFAULT);
INSERT INTO test_tbl3 VALUES (2, DEFAULT);
INSERT INTO test_tbl3 VALUES (3, DEFAULT);
INSERT INTO test_tbl3 VALUES (4, DEFAULT);
INSERT INTO test_tbl3 VALUES (5, DEFAULT);
The auto-incremented IDs in the table test_tbl3
do not monotonically increase, because the two BE nodes cache auto-incremented IDs, [1, 100000] and [100001, 200000], respectively. When data is loaded by using multiple INSERT statements, the data is sent to different BE nodes which allocate auto-incremented IDs independently. Therefore, it cannot be guaranteed that auto-incremented IDs are strictly monotonic.
mysql > SELECT * FROM test_tbl3 ORDER BY id;
+------+--------+
| id | number |
+------+--------+
| 1 | 1 |
| 2 | 100001 |
| 3 | 200001 |
| 4 | 2 |
| 5 | 100002 |
+------+--------+
5 rows in set (0.07 sec)
Partial updates and AUTO_INCREMENT
column
This section explains how to update only a few specified columns in a table that contains an AUTO_INCREMENT
column.
NOTE
Currently, only primary key tables support partial updates.
AUTO_INCREMENT
column is primary key
You need to specify the primary key during partial updates. Therefore, if the AUTO_INCREMENT
column is the primary key or part of the primary key, the user behavior for partial updates is exactly the same as when the AUTO_INCREMENT
column is not defined.
-
Create a table
test_tbl4
in the databaseexample_db
and insert one data row.-- Create a table.
CREATE TABLE test_tbl4
(
id BIGINT AUTO_INCREMENT,
name BIGINT NOT NULL,
job1 BIGINT NOT NULL,
job2 BIGINT NOT NULL
)
PRIMARY KEY (id, name)
DISTRIBUTED BY HASH(id)
PROPERTIES("replicated_storage" = "true");
-- Prepared data.
mysql > INSERT INTO test_tbl4 (id, name, job1, job2) VALUES (0, 0, 1, 1);
Query OK, 1 row affected (0.04 sec)
{'label':'insert_6af28e77-7d2b-11ed-af6e-02424283676b', 'status':'VISIBLE', 'txnId':'152'}
-- Query the table.
mysql > SELECT * FROM test_tbl4 ORDER BY id;
+------+------+------+------+
| id | name | job1 | job2 |
+------+------+------+------+
| 0 | 0 | 1 | 1 |
+------+------+------+------+
1 row in set (0.01 sec) -
Prepare the CSV file my_data4.csv to update table
test_tbl4
. The CSV file includes values for theAUTO_INCREMENT
column and does not include values for the columnjob1
. The primary key of the first row already exists in tabletest_tbl4
, while the primary key of the second row does not exist in the table.0,0,99
1,1,99 -
Run a Stream Load job and use the CSV file to update table
test_tbl4
.curl --location-trusted -u <username>:<password> -H "label:1" \
-H "column_separator:," \
-H "partial_update:true" \
-H "columns:id,name,job2" \
-T my_data4.csv -XPUT \
http://<fe_host>:<fe_http_port>/api/example_db/test_tbl4/_stream_load -
Query the updated table. The first row of data already exists in table
test_tbl4
, and the value for the columnjob1
remains unchanged. The second row of data is newly inserted, and because the default value for the columnjob1
is not specified, the partial update framework directly sets the value for this column to0
.mysql > SELECT * FROM test_tbl4 ORDER BY id;
+------+------+------+------+
| id | name | job1 | job2 |
+------+------+------+------+
| 0 | 0 | 1 | 99 |
| 1 | 1 | 0 | 99 |
+------+------+------+------+
2 rows in set (0.01 sec)
AUTO_INCREMENT
column is not primary key
If the AUTO_INCREMENT
column is not a primary key or a part of the primary key, and auto-incremented IDs are not provided in a Stream Load job, the following situations occur:
- If the row already exists in the table, StarRocks does not update the auto-incremented ID.
- If the row is newly loaded into the table, StarRocks generates a new auto-incremented ID.
This feature can be used to build a dictionary table for quickly computing distinct STRING values.
-
In the database
example_db
, create a tabletest_tbl5
and specify the columnjob1
as theAUTO_INCREMENT
column and insert a data row into the tabletest_tbl5
.-- Create a table.
CREATE TABLE test_tbl5
(
id BIGINT NOT NULL,
name BIGINT NOT NULL,
job1 BIGINT NOT NULL AUTO_INCREMENT,
job2 BIGINT NOT NULL
)
PRIMARY KEY (id, name)
DISTRIBUTED BY HASH(id)
PROPERTIES("replicated_storage" = "true");
-- Prepare data.
mysql > INSERT INTO test_tbl5 VALUES (0, 0, -1, -1);
Query OK, 1 row affected (0.04 sec)
{'label':'insert_458d9487-80f6-11ed-ae56-aa528ccd0ebf', 'status':'VISIBLE', 'txnId':'94'}
-- Query the table.
mysql > SELECT * FROM test_tbl5 ORDER BY id;
+------+------+------+------+
| id | name | job1 | job2 |
+------+------+------+------+
| 0 | 0 | -1 | -1 |
+------+------+------+------+
1 row in set (0.01 sec) -
Prepare a CSV file my_data5.csv to update table
test_tbl5
. The CSV file does not contain values for theAUTO_INCREMENT
columnjob1
. The primary key of the first row already exists in the table while the primary keys of the second and third rows do not.0,0,99
1,1,99
2,2,99 -
Run a Stream Load job to load data from the CSV file into table
test_tbl5
.curl --location-trusted -u <username>:<password> -H "label:2" \
-H "column_separator:," \
-H "partial_update:true" \
-H "columns: id,name,job2" \
-T my_data5.csv -XPUT \
http://<fe_host>:<fe_http_port>/api/example_db/test_tbl5/_stream_load -
Query the updated table. The first row of data already exists in table
test_tbl5
, so theAUTO_INCREMENT
columnjob1
retains its original value. The second and third rows of data are newly inserted, so StarRocks generate new values for theAUTO_INCREMENT
columnjob1
.mysql > SELECT * FROM test_tbl5 ORDER BY id;
+------+------+--------+------+
| id | name | job1 | job2 |
+------+------+--------+------+
| 0 | 0 | -1 | 99 |
| 1 | 1 | 1 | 99 |
| 2 | 2 | 100001 | 99 |
+------+------+--------+------+
3 rows in set (0.01 sec)
Limits
-
When a table with an
AUTO_INCREMENT
column is created,'replicated_storage' = 'true'
must be set to ensure that all replicas have the same auto-incremented IDs. -
Each table can have only one
AUTO_INCREMENT
column. -
The data type of the
AUTO_INCREMENT
column must be BIGINT. -
The
AUTO_INCREMENT
column must beNOT NULL
and does not have a default value. -
You can delete data from a Primary Key table with an
AUTO_INCREMENT
column. However, if theAUTO_INCREMENT
column is not the primary key or part of the primary key, you need to note the following limits when you delete data in the following scenarios:- During the DELETE operation, there is also a load job for partial updates, which only contains UPSERT operations. If both the UPSERT and DELETE operations hit the same data row and the UPSERT operation is executed after the DELETE operation, the UPSERT operation may not take effect.
- There is a load job for partial updates, which includes multiple UPSERT and DELETE operations on the same data row. If a certain UPSERT operation is executed after the DELETE operation, the UPSERT operation may not take effect.
-
Adding the
AUTO_INCREMENT
attribute by using ALTER TABLE is not supported. -
Since version 3.1, StarRocks's shared-data mode supports the
AUTO_INCREMENT
attribute. -
StarRocks does not support specifying the starting value and step size for the
AUTO_INCREMENT
column.
Keywords
AUTO_INCREMENT, AUTO INCREMENT