# Feature Support: Data Loading and Unloading
This document outlines the features supported by the various data loading and unloading methods in StarRocks.
## File format

### Loading file formats
| Loading method | Data source | CSV | JSON [3] | Parquet | ORC | Avro | ProtoBuf | Thrift |
|---|---|---|---|---|---|---|---|---|
| Stream Load | Local file systems, applications, connectors | Yes | Yes | To be supported | To be supported | To be supported | To be supported | To be supported |
| INSERT from FILES | HDFS, S3, OSS, Azure, GCS | Yes (v3.3+) | To be supported | Yes (v3.1+) | Yes (v3.1+) | To be supported | To be supported | To be supported |
| Broker Load | HDFS, S3, OSS, Azure, GCS | Yes | Yes (v3.2.3+) | Yes | Yes | To be supported | To be supported | To be supported |
| Routine Load | Kafka | Yes | Yes | To be supported | To be supported | Yes (v3.0+) [1] | To be supported | To be supported |
| Spark Load | HDFS, S3, OSS, Azure, GCS | Yes | To be supported | Yes | Yes | To be supported | To be supported | To be supported |
| Connectors | Flink, Spark | Yes | Yes | To be supported | To be supported | To be supported | To be supported | To be supported |
| Kafka Connector [2] | Kafka | Yes (v3.0+) | Yes (v3.0+) | To be supported | To be supported | Yes (v3.0+) | To be supported | To be supported |
| PIPE [4] | Consistent with INSERT from FILES | | | | | | | |
[1], [2]: Schema Registry is required.
[3]: JSON supports a variety of CDC formats. For details about the JSON CDC formats supported by StarRocks, see JSON CDC formats below.
[4]: Currently, only INSERT from FILES is supported for loading with PIPE.
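For reference, the following is a minimal INSERT from FILES sketch that loads a Parquet file from S3 (supported from v3.1 onwards). The table name, path, and credential values are placeholders:

```sql
-- Assumes an existing target table; all names and credentials below are placeholders.
INSERT INTO my_table
SELECT * FROM FILES(
    "path" = "s3://my-bucket/data/orders.parquet",  -- placeholder path
    "format" = "parquet",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-west-2"
);
```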
### JSON CDC formats
| CDC format | Stream Load | Routine Load | Broker Load | INSERT from FILES | Kafka Connector [1] |
|---|---|---|---|---|---|
| Debezium | To be supported | To be supported | To be supported | To be supported | Yes (v3.0+) |
| Canal | To be supported | To be supported | To be supported | To be supported | To be supported |
| Maxwell | To be supported | To be supported | To be supported | To be supported | To be supported |
[1]: You must configure the transforms parameter while loading Debezium CDC format data into Primary Key tables in StarRocks.
### Unloading file formats
| Unloading method | Table format | Remote storage | CSV | JSON | Parquet | ORC |
|---|---|---|---|---|---|---|
| INSERT INTO FILES | N/A | HDFS, S3, OSS, Azure, GCS | Yes (v3.3+) | To be supported | Yes (v3.2+) | Yes (v3.3+) |
| INSERT INTO Catalog | Hive | HDFS, S3, OSS, Azure, GCS | Yes (v3.3+) | To be supported | Yes (v3.2+) | Yes (v3.3+) |
| INSERT INTO Catalog | Iceberg | HDFS, S3, OSS, Azure, GCS | To be supported | To be supported | Yes (v3.2+) | To be supported |
| INSERT INTO Catalog | Hudi/Delta | HDFS, S3, OSS, Azure, GCS | To be supported | To be supported | To be supported | To be supported |
| EXPORT | N/A | HDFS, S3, OSS, Azure, GCS | Yes [1] | To be supported | To be supported | To be supported |
| PIPE | To be supported [2] | | | | | |
[1]: Configuring the Broker process is supported.
[2]: Currently, unloading data using PIPE is not supported.
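As an example of INSERT INTO Catalog, the following is a minimal sketch of unloading query results into an existing Hive table through a Hive catalog (Parquet is supported from v3.2 onwards). The catalog, database, and table names are placeholders:

```sql
-- Assumes a Hive catalog named hive_catalog has already been created;
-- sales_db, sales_export, and sales are placeholder names.
INSERT INTO hive_catalog.sales_db.sales_export
SELECT * FROM sales
WHERE dt = '2024-01-01';
```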
## File format-related parameters

### Loading file format-related parameters
| File format | Parameter | Stream Load | INSERT from FILES | Broker Load | Routine Load | Spark Load |
|---|---|---|---|---|---|---|
| CSV | column_separator | Yes | Yes (v3.3+) | Yes [1] | Yes [1] | Yes [1] |
| CSV | row_delimiter | Yes | Yes (v3.3+) | Yes [2] (v3.1+) | Yes [3] (v2.2+) | To be supported |
| CSV | enclose | Yes (v3.0+) | Yes (v3.3+) | Yes (v3.0+) | Yes (v3.0+) | To be supported |
| CSV | escape | Yes (v3.0+) | Yes (v3.3+) | Yes (v3.0+) | Yes (v3.0+) | To be supported |
| CSV | skip_header | Yes (v3.0+) | Yes (v3.3+) | Yes (v3.0+) | To be supported | To be supported |
| CSV | trim_space | Yes (v3.0+) | Yes (v3.3+) | Yes (v3.0+) | Yes (v3.0+) | To be supported |
| JSON | jsonpaths | Yes | To be supported | Yes (v3.2.3+) | Yes | To be supported |
| JSON | strip_outer_array | Yes | To be supported | Yes (v3.2.3+) | Yes | To be supported |
| JSON | json_root | Yes | To be supported | Yes (v3.2.3+) | Yes | To be supported |
| JSON | ignore_json_size | Yes | To be supported | Yes (v3.2.3+) | To be supported | To be supported |
[1]: The corresponding parameter is COLUMNS TERMINATED BY.
[2]: The corresponding parameter in Broker Load is ROWS TERMINATED BY.
[3]: The corresponding parameter in Routine Load is ROWS TERMINATED BY.
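To illustrate notes [1] and [2], the following is a hedged Broker Load sketch that sets the CSV column and row delimiters through the corresponding clauses. The label, paths, table name, and credentials are placeholders:

```sql
LOAD LABEL example_db.label_csv_demo
(
    DATA INFILE("s3://my-bucket/input/*.csv")  -- placeholder path
    INTO TABLE example_table
    COLUMNS TERMINATED BY ","      -- equivalent of column_separator [1]
    ROWS TERMINATED BY "\n"        -- equivalent of row_delimiter [2]
    FORMAT AS "CSV"
)
WITH BROKER
(
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-west-2"
);
```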
### Unloading file format-related parameters
| File format | Parameter | INSERT INTO FILES | EXPORT |
|---|---|---|---|
| CSV | column_separator | Yes (v3.3+) | Yes |
| CSV | line_delimiter [1] | Yes (v3.3+) | Yes |
[1]: The corresponding parameter in data loading is row_delimiter.
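For example, the following is a minimal EXPORT sketch that sets both parameters. The table name, HDFS path, and credentials are placeholders:

```sql
EXPORT TABLE sales
TO "hdfs://<hdfs_host>:<hdfs_port>/export/"  -- placeholder destination
PROPERTIES
(
    "column_separator" = ",",
    "line_delimiter" = "\n"
)
WITH BROKER
(
    "username" = "<hdfs_username>",
    "password" = "<hdfs_password>"
);
```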
## Compression formats

### Loading compression formats
| File format | Compression format | Stream Load | Broker Load | INSERT from FILES | Routine Load | Spark Load |
|---|---|---|---|---|---|---|
| CSV | gzip, deflate, bzip2, zstd | Yes [1] | Yes [2] | To be supported | To be supported | To be supported |
| JSON | gzip, deflate, bzip2, zstd | Yes (v3.2.7+) [3] | To be supported | N/A | To be supported | N/A |
| Parquet | gzip, lz4, snappy, zstd | N/A | Yes [4] | Yes [4] | To be supported | Yes [4] |
| ORC | gzip, lz4, snappy, zstd | N/A | Yes [4] | Yes [4] | To be supported | Yes [4] |
[1]: Currently, only when loading CSV files with Stream Load can you specify the compression format, by using format=gzip for gzip-compressed CSV files. The deflate and bzip2 formats are also supported.
[2]: Broker Load does not support specifying the compression format of CSV files with the parameter format. Instead, it identifies the compression format by the file suffix: .gz for gzip-compressed files and .zst for zstd-compressed files. In addition, other format-related parameters, such as trim_space and enclose, are not supported.
[3]: Supports specifying the compression format by using compression = gzip.
[4]: Supported by the Arrow library. You do not need to configure the compression parameter.
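As a sketch of note [2], the following Broker Load job reads gzip-compressed CSV files, relying on the .gz suffix for compression detection. The label, paths, table name, and credentials are placeholders:

```sql
LOAD LABEL example_db.label_csv_gzip
(
    DATA INFILE("s3://my-bucket/input/*.csv.gz")  -- compression inferred from the .gz suffix
    INTO TABLE example_table
    FORMAT AS "CSV"
)
WITH BROKER
(
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-west-2"
);
```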
### Unloading compression formats
| File format | Compression format | INSERT INTO FILES | INSERT INTO Catalog (Hive) | INSERT INTO Catalog (Iceberg) | INSERT INTO Catalog (Hudi/Delta) | EXPORT |
|---|---|---|---|---|---|---|
| CSV | gzip, deflate, bzip2, zstd | To be supported | To be supported | To be supported | To be supported | To be supported |
| JSON | N/A | N/A | N/A | N/A | N/A | N/A |
| Parquet | gzip, lz4, snappy, zstd | Yes (v3.2+) | Yes (v3.2+) | Yes (v3.2+) | To be supported | N/A |
| ORC | gzip, lz4, snappy, zstd | Yes (v3.2+) | Yes (v3.2+) | Yes (v3.2+) | To be supported | N/A |
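For example, the following is a minimal sketch of unloading zstd-compressed Parquet files with INSERT INTO FILES (supported from v3.2 onwards). The path, source table, and credentials are placeholders:

```sql
INSERT INTO FILES(
    "path" = "s3://my-bucket/unload/",  -- placeholder destination
    "format" = "parquet",
    "compression" = "zstd",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-west-2"
)
SELECT * FROM sales;
```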
## Credentials

### Loading - Authentication
| Authentication | Stream Load | INSERT from FILES | Broker Load | Routine Load | External Catalog |
|---|---|---|---|---|---|
| Single Kerberos | N/A | Yes (v3.1+) | Yes [1] (versions earlier than v2.5) | Yes [2] (v3.1.4+) | Yes |
| Kerberos Ticket Granting Ticket (TGT) | N/A | To be supported | To be supported | To be supported | Yes (v3.1.10+/v3.2.1+) |
| Single KDC Multiple Kerberos | N/A | To be supported | To be supported | To be supported | Yes (v3.1.10+/v3.2.1+) |
| Basic access authentications (Access Key pair, IAM Role) | N/A | Yes (HDFS and S3-compatible object storage) | Yes (HDFS and S3-compatible object storage) | Yes [3] | Yes |
[1]: For HDFS, StarRocks supports both simple authentication and Kerberos authentication.
[2]: When the security protocol is set to sasl_plaintext or sasl_ssl, both SASL and GSSAPI (Kerberos) authentications are supported.
[3]: When the security protocol is set to sasl_plaintext or sasl_ssl, both SASL and PLAIN authentications are supported.
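To illustrate note [3], the following is a hedged Routine Load sketch using SASL/PLAIN over sasl_ssl. The job name, table, topic, broker address, and credential values are placeholders:

```sql
CREATE ROUTINE LOAD example_db.orders_job ON orders
PROPERTIES ("format" = "json")
FROM KAFKA
(
    "kafka_broker_list" = "broker1.example.com:9093",  -- placeholder broker
    "kafka_topic" = "orders_topic",                    -- placeholder topic
    "property.security.protocol" = "sasl_ssl",
    "property.sasl.mechanism" = "PLAIN",
    "property.sasl.username" = "<username>",
    "property.sasl.password" = "<password>"
);
```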
### Unloading - Authentication
| Authentication | INSERT INTO FILES | EXPORT |
|---|---|---|
| Single Kerberos | To be supported | To be supported |
## Loading - Other parameters and features
| Parameter and feature | Stream Load | INSERT from FILES | INSERT from SELECT/VALUES | Broker Load | PIPE | Routine Load | Spark Load |
|---|---|---|---|---|---|---|---|
| partial_update | Yes (v3.0+) | Yes [1] (v3.3+) | Yes [1] (v3.3+) | Yes (v3.0+) | N/A | Yes (v3.0+) | To be supported |
| partial_update_mode | Yes (v3.1+) | To be supported | To be supported | Yes (v3.1+) | N/A | To be supported | To be supported |
| COLUMNS FROM PATH | N/A | Yes (v3.2+) | N/A | Yes | N/A | N/A | Yes |
| timezone or session variable time_zone [2] | Yes [3] | Yes [4] | Yes [4] | Yes [4] | To be supported | Yes [4] | To be supported |
| Time accuracy - Microsecond | Yes | Yes | Yes | Yes (v3.1.11+/v3.2.6+) | To be supported | Yes | Yes |
[1]: From v3.3 onwards, StarRocks supports Partial Updates in Row mode for INSERT INTO by specifying the column list.
[2]: Setting the time zone by the parameter or the session variable will affect the results returned by functions such as strftime(), alignment_timestamp(), and from_unixtime().
[3]: Only the parameter timezone is supported.
[4]: Only the session variable time_zone is supported.
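As a sketch of note [1], the following performs a partial update through INSERT with a column list (v3.3+, Row mode), assuming orders is a Primary Key table; the table and column names are placeholders:

```sql
-- Only order_id (the key) and status are written; the other columns keep their values.
INSERT INTO orders (order_id, status)
VALUES (1001, 'shipped');
```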
## Unloading - Other parameters and features
| Parameter and feature | INSERT INTO FILES | EXPORT |
|---|---|---|
| target_max_file_size | Yes (v3.2+) | To be supported |
| single | Yes (v3.2+) | To be supported |
| partition_by | Yes (v3.2+) | To be supported |
| Session variable time_zone | To be supported | To be supported |
| Time accuracy - Microsecond | To be supported | To be supported |
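For example, the following is a hedged INSERT INTO FILES sketch combining the parameters above (supported from v3.2 onwards). The path, partition column, source table, and credentials are placeholders, and target_max_file_size is specified in bytes:

```sql
INSERT INTO FILES(
    "path" = "s3://my-bucket/unload/",         -- placeholder destination
    "format" = "parquet",
    "partition_by" = "dt",                     -- write one subdirectory per dt value
    "target_max_file_size" = "1073741824",     -- best-effort 1 GB per file
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-west-2"
)
SELECT * FROM sales;
```

Setting "single" = "true" instead would write all results into one file; it is not combined with partition_by here.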