Back up and restore data
This topic describes how to back up and restore data in StarRocks, or migrate data to a new StarRocks cluster.
StarRocks supports backing up data as snapshots into a remote storage system and restoring the data to any StarRocks clusters.
From v3.4.0 onwards, StarRocks have enhanced the functionality of BACKUP and RESTORE by supporting more objects and refactoring the syntax for better flexibility.
StarRocks supports the following remote storage systems:
- Apacheโข Hadoopยฎ (HDFS) cluster
- AWS S3
- Google GCS
- MinIO
StarRocks supports backing up the following objects:
- Internal databases, tables (of all types and partitioning strategies), and partitions
- Metadata of external catalogs (supported from v3.4.0 onwards)
- Synchronous materialized views and asynchronous materialized views
- Logical views (supported from v3.4.0 onwards)
- User-defined functions (supported from v3.4.0 onwards)
NOTE
Shared-data StarRocks clusters do not support data BACKUP and RESTORE.
Create a repositoryโ
Before backing up data, you need to create a repository, which is used to store data snapshots in a remote storage system. You can create multiple repositories in a StarRocks cluster. For detailed instructions, see CREATE REPOSITORY.
- Create a repository in HDFS
The following example creates a repository named test_repo in an HDFS cluster.
CREATE REPOSITORY test_repo
WITH BROKER
ON LOCATION "hdfs://<hdfs_host>:<hdfs_port>/repo_dir/backup"
PROPERTIES(
"username" = "<hdfs_username>",
"password" = "<hdfs_password>"
);
-
Create a repository in AWS S3
You can choose IAM user-based credential (Access Key and Secret Key), Instance Profile, or Assumed Role as the credential method for accessing AWS S3.
- The following example creates a repository named
test_repoin the AWS S3 bucketbucket_s3using IAM user-based credentials as the credential method.
CREATE REPOSITORY test_repo
WITH BROKER
ON LOCATION "s3a://bucket_s3/backup"
PROPERTIES(
"aws.s3.access_key" = "XXXXXXXXXXXXXXXXX",
"aws.s3.secret_key" = "yyyyyyyyyyyyyyyyyyyyyyyy",
"aws.s3.region" = "us-east-1"
);- The following example creates a repository named
test_repoin the AWS S3 bucketbucket_s3using Instance Profile as the credential method.
CREATE REPOSITORY test_repo
WITH BROKER
ON LOCATION "s3a://bucket_s3/backup"
PROPERTIES(
"aws.s3.use_instance_profile" = "true",
"aws.s3.region" = "us-east-1"
);- The following example creates a repository named
test_repoin the AWS S3 bucketbucket_s3using Assumed Role as the credential method.
CREATE REPOSITORY test_repo
WITH BROKER
ON LOCATION "s3a://bucket_s3/backup"
PROPERTIES(
"aws.s3.use_instance_profile" = "true",
"aws.s3.iam_role_arn" = "arn:aws:iam::xxxxxxxxxx:role/yyyyyyyy",
"aws.s3.region" = "us-east-1"
); - The following example creates a repository named
NOTE
StarRocks supports creating repositories in AWS S3 only according to the S3A protocol. Therefore, when you create repositories in AWS S3, you must replace
s3://in the S3 URI you pass as a repository location inON LOCATIONwiths3a://.
- Create a repository in Google GCS
The following example creates a repository named test_repo in the Google GCS bucket bucket_gcs.
CREATE REPOSITORY test_repo
WITH BROKER
ON LOCATION "s3a://bucket_gcs/backup"
PROPERTIES(
"fs.s3a.access.key" = "xxxxxxxxxxxxxxxxxxxx",
"fs.s3a.secret.key" = "yyyyyyyyyyyyyyyyyyyy",
"fs.s3a.endpoint" = "storage.googleapis.com"
);
NOTE
- StarRocks supports creating repositories in Google GCS only according to the S3A protocol. Therefore, when you create repositories in Google GCS, you must replace the prefix in the GCS URI you pass as a repository location in
ON LOCATIONwiths3a://.- Do not specify
httpsin the endpoint address.
- Create a repository in MinIO
The following example creates a repository named test_repo in the MinIO bucket bucket_minio.
CREATE REPOSITORY test_repo
WITH BROKER
ON LOCATION "s3://bucket_minio/backup"
PROPERTIES(
"aws.s3.access_key" = "XXXXXXXXXXXXXXXXX",
"aws.s3.secret_key" = "yyyyyyyyyyyyyyyyy",
"aws.s3.endpoint" = "http://minio:9000"
);
After the repository is created, you can check the repository via SHOW REPOSITORIES. After restoring data, you can delete the repository in StarRocks using DROP REPOSITORY. However, data snapshots backed up in the remote storage system cannot be deleted through StarRocks. You need to delete them manually in the remote storage system.
Back up dataโ
After the repository is created, you need to create a data snapshot and back up it in the remote repository. For detailed instructions, see BACKUP. BACKUP is an asynchronous operation. You can check the status of a BACKUP job using SHOW BACKUP, or cancel a BACKUP job using CANCEL BACKUP.
StarRocks supports FULL backup on the granularity level of database, table, or partition.
If you have stored a large amount of data in a table, we recommend that you back up and restore data by partition. This way, you can reduce the cost of retries in case of job failures. If you need to back up incremental data on a regular basis, you can configure a partitioning plan for your table, and back up only new partitions each time.
Back up databaseโ
Performing a full BACKUP on a database will back up all tables, synchronous and asynchronous materialized views, logical views, and UDFs within the database.
The following examples back up the database sr_hub in the snapshot sr_hub_backup and upload the snapshot to the repository test_repo.
-- Supported from v3.4.0 onwards.
BACKUP DATABASE sr_hub SNAPSHOT sr_hub_backup
TO test_repo;
-- Compatible with the syntax in earlier versions.
BACKUP SNAPSHOT sr_hub.sr_hub_backup
TO test_repo;
Back up tableโ
StarRocks supports backing up and restoring tables of all types and partitioning strategies. Performing a full BACKUP on a table will back up the table and the synchronous materialized views built on it.
The following examples back up the table sr_member from the database sr_hub in the snapshot sr_member_backup and upload the snapshot to the repository test_repo.
-- Supported from v3.4.0 onwards.
BACKUP DATABASE sr_hub SNAPSHOT sr_member_backup
TO test_repo
ON (TABLE sr_member);
-- Compatible with the syntax in earlier versions.
BACKUP SNAPSHOT sr_hub.sr_member_backup
TO test_repo
ON (sr_member);
The following example backs up two tables, sr_member and sr_pmc, from the database sr_hub in the snapshot sr_core_backup and upload the snapshot to the repository test_repo.
BACKUP DATABASE sr_hub SNAPSHOT sr_core_backup
TO test_repo
ON (TABLE sr_member, TABLE sr_pmc);
The following example backs up all tables from the database sr_hub in the snapshot sr_all_backup and upload the snapshot to the repository test_repo.
BACKUP DATABASE sr_hub SNAPSHOT sr_all_backup
TO test_repo
ON (ALL TABLES);
Back up partitionโ
The following examples back up the partition p1 of the table sr_member from the database sr_hub in the snapshot sr_par_backup and upload the snapshot to the repository test_repo.
-- Supported from v3.4.0 onwards.
BACKUP DATABASE sr_hub SNAPSHOT sr_par_backup
TO test_repo
ON (TABLE sr_member PARTITION (p1));
-- Compatible with the syntax in earlier versions.
BACKUP SNAPSHOT sr_hub.sr_par_backup
TO test_repo
ON (sr_member PARTITION (p1));
You can specify multiple partition names separated by commas (,) to back up partitions in batch.
Back up materialized viewโ
You do not need to manually back up synchronous materialized views because they will be backed up along with the BACKUP operation of the base table.
Asynchronous materialized views can be backed up along with the BACKUP operation of the database it belongs to. You can also manually back up them.
The following example backs up the materialized view sr_mv1 from the database sr_hub in the snapshot sr_mv1_backup and upload the snapshot to the repository test_repo.
BACKUP DATABASE sr_hub SNAPSHOT sr_mv1_backup
TO test_repo
ON (MATERIALIZED VIEW sr_mv1);
The following example backs up two materialized views, sr_mv1 and sr_mv2, from the database sr_hub in the snapshot sr_mv2_backup and upload the snapshot to the repository test_repo.
BACKUP DATABASE sr_hub SNAPSHOT sr_mv2_backup
TO test_repo
ON (MATERIALIZED VIEW sr_mv1, MATERIALIZED VIEW sr_mv2);
The following example backs up all materialized views from the database sr_hub in the snapshot sr_mv3_backup and upload the snapshot to the repository test_repo.
BACKUP DATABASE sr_hub SNAPSHOT sr_mv3_backup
TO test_repo
ON (ALL MATERIALIZED VIEWS);
Back up logical viewโ
The following example backs up the logical view sr_view1 from the database sr_hub in the snapshot sr_view1_backup and upload the snapshot to the repository test_repo.
BACKUP DATABASE sr_hub SNAPSHOT sr_view1_backup
TO test_repo
ON (VIEW sr_view1);
The following example backs up two logical views, sr_view1 and sr_view2, from the database sr_hub in the snapshot sr_view2_backup and upload the snapshot to the repository test_repo.
BACKUP DATABASE sr_hub SNAPSHOT sr_view2_backup
TO test_repo
ON (VIEW sr_view1, VIEW sr_view2);
The following example backs up all logical views from the database sr_hub in the snapshot sr_view3_backup and upload the snapshot to the repository test_repo.
BACKUP DATABASE sr_hub SNAPSHOT sr_view3_backup
TO test_repo
ON (ALL VIEWS);
Back up UDFโ
The following example backs up the UDF sr_udf1 from the database sr_hub in the snapshot sr_udf1_backup and upload the snapshot to the repository test_repo.
BACKUP DATABASE sr_hub SNAPSHOT sr_udf1_backup
TO test_repo
ON (FUNCTION sr_udf1);
The following example backs up two UDFs, sr_udf1 and sr_udf2, from the database sr_hub in the snapshot sr_udf2_backup and upload the snapshot to the repository test_repo.
BACKUP DATABASE sr_hub SNAPSHOT sr_udf2_backup
TO test_repo
ON (FUNCTION sr_udf1, FUNCTION sr_udf2);
The following example backs up all UDFs from the database sr_hub in the snapshot sr_udf3_backup and upload the snapshot to the repository test_repo.
BACKUP DATABASE sr_hub SNAPSHOT sr_udf3_backup
TO test_repo
ON (ALL FUNCTIONS);
Back up metadata of external catalogโ
The following example backs up the metadata of the external catalog iceberg in the snapshot iceberg_backup and upload the snapshot to the repository test_repo.
BACKUP EXTERNAL CATALOG (iceberg) SNAPSHOT iceberg_backup
TO test_repo;
The following example backs up the metadata of two external catalogs, iceberg and hive, in the snapshot iceberg_hive_backup and upload the snapshot to the repository test_repo.
BACKUP EXTERNAL CATALOGS (iceberg, hive) SNAPSHOT iceberg_hive_backup
TO test_repo;
The following example backs up the metadata of all external catalogs in the snapshot all_catalog_backup and upload the snapshot to the repository test_repo.
BACKUP ALL EXTERNAL CATALOGS SNAPSHOT all_catalog_backup
TO test_repo;
To cancel the BACKUP operation on external catalogs, execute the following statement:
CANCEL BACKUP FOR EXTERNAL CATALOG;
Restore dataโ
You can restore the data snapshot backed up in the remote storage system to the current or other StarRocks clusters to restore or migrate data.
When you restore an object from a snapshot, you must specify the timestamp of the snapshot.
Use the RESTORE statement to restore data snapshots in the remote storage system.
RESTORE is an asynchronous operation. You can check the status of a RESTORE job using SHOW RESTORE, or cancel a RESTORE job using CANCEL RESTORE.
(Optional) Create a repository in the new clusterโ
To migrate data to another StarRocks cluster, you need to create a repository with the same repository name and location in the target cluster, otherwise, you will not be able to view the previously backed-up data snapshots. See Create a repository for details.
Obtain snapshot timestampโ
Before restoring data, you can check the snapshots in the repository to obtain the timestamps using SHOW SNAPSHOT.
The following example checks the snapshot information in test_repo.
mysql> SHOW SNAPSHOT ON test_repo;
+------------------+-------------------------+--------+
| Snapshot | Timestamp | Status |
+------------------+-------------------------+--------+
| sr_member_backup | 2023-02-07-14-45-53-143 | OK |
+------------------+-------------------------+--------+
1 row in set (1.16 sec)
Restore databaseโ
The following examples restore the database sr_hub in the snapshot sr_hub_backup to the database sr_hub in the target cluster. If the database does not exist in the snapshot, the system will return an error. If the database does not exist in the target cluster, the system will create it automatically.
-- Supported from v3.4.0 onwards.
RESTORE SNAPSHOT sr_hub_backup
FROM test_repo
DATABASE sr_hub
PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842");
-- Compatible with the syntax in earlier versions.
RESTORE SNAPSHOT sr_hub.sr_hub_backup
FROM `test_repo`
PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842");
The following example restore the database sr_hub in the snapshot sr_hub_backup to the database sr_hub_new in the target cluster. If the database sr_hub does not exist in the snapshot, the system will return an error. If the database sr_hub_new does not exist in the target cluster, the system will create it automatically.
-- Supported from v3.4.0 onwards.
RESTORE SNAPSHOT sr_hub_backup
FROM test_repo
DATABASE sr_hub AS sr_hub_new
PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842");
Restore tableโ
The following examples restore the table sr_member of the database sr_hub in the snapshot sr_member_backup to the table sr_member of the database sr_hub in the target cluster.
-- Supported from v3.4.0 onwards.
RESTORE SNAPSHOT sr_member_backup
FROM test_repo
DATABASE sr_hub
ON (TABLE sr_member)
PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
-- Compatible with the syntax in earlier versions.
RESTORE SNAPSHOT sr_hub.sr_member_backup
FROM test_repo
ON (sr_member)
PROPERTIES ("backup_timestamp"="2024-12-09-10-52-10-940");
The following examples restore the table sr_member of the database sr_hub in the snapshot sr_member_backup to the table sr_member_new of the database sr_hub_new in the target cluster.
RESTORE SNAPSHOT sr_member_backup
FROM test_repo
DATABASE sr_hub AS sr_hub_new
ON (TABLE sr_member AS sr_member_new)
PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
The following example restores two tables, sr_member and sr_pmc, of the database sr_hub in the snapshot sr_core_backup to two tables, sr_member and sr_pmc, of the database sr_hub in the target cluster.
RESTORE SNAPSHOT sr_core_backup
FROM test_repo
DATABASE sr_hub
ON (TABLE sr_member, TABLE sr_pmc)
PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
The following example restores all tables from the database sr_hub in the snapshot sr_all_backup.
RESTORE SNAPSHOT sr_all_backup
FROM test_repo
DATABASE sr_hub
ON (ALL TABLES);
The following example restores one of all tables from the database sr_hub in the snapshot sr_all_backup.
RESTORE SNAPSHOT sr_all_backup
FROM test_repo
DATABASE sr_hub
ON (TABLE sr_member)
PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
Restore partitionโ
The following examples restore the partition p1 of the table sr_member in the snapshot sr_par_backup to the partition p1 of the table sr_member in the target cluster.
-- Supported from v3.4.0 onwards.
RESTORE SNAPSHOT sr_par_backup
FROM test_repo
DATABASE sr_hub
ON (TABLE sr_member PARTITION (p1))
PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
-- Compatible with the syntax in earlier versions.
RESTORE SNAPSHOT sr_hub.sr_par_backup
FROM test_repo
ON (sr_member PARTITION (p1))
PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
You can specify multiple partition names separated by commas (,) to restore partitions in batch.
Restore materialized viewโ
The following example restores the materialized view sr_mv1 from the database sr_hub in the snapshot sr_mv1_backup to the target cluster.
RESTORE SNAPSHOT sr_mv1_backup
FROM test_repo
DATABASE sr_hub
ON (MATERIALIZED VIEW sr_mv1)
PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"