Load data from Microsoft Azure Storage
StarRocks allows you to load data in bulk from Microsoft Azure Storage by using Broker Load.
Broker Load runs in asynchronous mode. After you submit a load job, the job connects to Azure, pulls the data, and stores the data in StarRocks.
Broker Load supports the Parquet, ORC, and CSV file formats.
Advantages of Broker Load
- Broker Load runs in the background, so clients do not need to stay connected for the job to continue.
- Broker Load is preferred for long-running jobs. The default timeout is 4 hours.
- In addition to Parquet and ORC file formats, Broker Load supports CSV files.
Data flow
- The user creates a load job.
- The frontend (FE) creates a query plan and distributes the plan to the backend nodes (BEs) or compute nodes (CNs).
- The BEs or CNs pull the data from the source and load the data into StarRocks.
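The flow above can be sketched as one statement that creates the load job, followed by a query that checks on it. This is a minimal sketch, not a definitive recipe. It assumes a hypothetical database mydatabase with an existing table user_behavior whose columns match the Parquet file, a hypothetical label, Shared Key authentication, and a StarRocks version that exposes the information_schema.loads view. The container, storage account, and file names are illustrative, and the key is a placeholder that you must supply.

```sql
-- Step 1: the user creates a load job. The statement returns as soon as the
-- job is accepted; the data is loaded in the background.
-- Assumptions: mydatabase.user_behavior already exists; the label is hypothetical.
LOAD LABEL mydatabase.user_behavior_adls2
(
    -- ADLS Gen2 paths use the form abfss://<container>@<account>.dfs.core.windows.net/<path>
    DATA INFILE("abfss://starrocks-container@starrocks.dfs.core.windows.net/user_behavior_ten_million_rows.parquet")
    INTO TABLE user_behavior
    FORMAT AS "parquet"
)
WITH BROKER
(
    -- Shared Key authentication is assumed here; replace the placeholder with your key.
    "azure.adls2.storage_account" = "starrocks",
    "azure.adls2.shared_key" = "<storage_account_shared_key>"
)
PROPERTIES
(
    -- Optional: override the default timeout (in seconds).
    "timeout" = "3600"
);

-- Steps 2 and 3 run inside the cluster (the FE plans, the BEs or CNs pull and load).
-- Because the job is asynchronous, check its state from any session afterwards.
SELECT * FROM information_schema.loads
WHERE LABEL = 'user_behavior_adls2';
```

Because the job is tracked by its label, the session that submitted it can disconnect and a later session can still run the status query.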
Before you begin
Make source data ready
Make sure that the source data you want to load into StarRocks is properly stored in a container within your Azure storage account.
In this topic, suppose you want to load the data of a Parquet-formatted sample dataset (user_behavior_ten_million_rows.parquet) stored in the root directory of a container (starrocks-container) within an Azure Data Lake Storage Gen2 (ADLS Gen2) storage account (starrocks).
Check privileges
You can load data into StarRocks tables only as a user who has the INSERT privilege on those StarRocks tables. If you do not have the INSERT privilege, follow the instructions provided in GRANT to grant the INSERT privilege to the user that you use to connect to your StarRocks cluster.
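As a rough illustration of the privilege check, the statements below grant INSERT on a single table and then list the user's grants. This is a sketch under assumptions: the user 'loader'@'%', the database mydatabase, and the table user_behavior are hypothetical names, the syntax follows the GRANT statement of StarRocks v3.0 and later, and the statements must be run by a user who is allowed to grant privileges.

```sql
-- Hypothetical user, database, and table names; adjust to your cluster.
GRANT INSERT ON TABLE mydatabase.user_behavior TO USER 'loader'@'%';

-- Verify that the INSERT privilege now appears for the user.
SHOW GRANTS FOR 'loader'@'%';
```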