File external table
File external table is a special type of external table. It allows you to directly query Parquet and ORC data files in external storage systems without loading data into StarRocks. In addition, file external tables do not rely on a metastore. In the current version, StarRocks supports the following external storage systems: HDFS, Amazon S3, and other S3-compatible storage systems.
This feature is supported from StarRocks v2.5.
- From v3.1 onwards, StarRocks supports directly loading data from files on cloud storage using the INSERT command and the FILES function, thereby you do not need to create an external catalog or file external table first. Besides, FILES() can automatically infer the table schema of the files, greatly simplifying the process of data loading.
- The File External Table feature was designed to help with loading data into StarRocks, NOT to perform efficient queries against external systems as a normal operation. A more performant solution would be to load the data into StarRocks.
Limitsβ
- File external tables must be created in databases within the default_catalog. You can run SHOW CATALOGS to query catalogs created in the cluster.
- Only Parquet, ORC, Avro, RCFile, and SequenceFile data files are supported.
- You can only use file external tables to query data in the target data file. Data write operations such as INSERT, DELETE, and DROP are not supported.
Prerequisitesβ
Before you create a file external table, you must configure your StarRocks cluster so that StarRocks can access the external storage system where the target data file is stored. The configurations required for a file external table are the same as those required for a Hive catalog, except that you do not need to configure a metastore. See Hive catalog - Integration preparations for more information about configurations.
Create a database (Optional)β
After connecting to your StarRocks cluster, you can create a file external table in an existing database or create a new database to manage file external tables. To query existing databases in the cluster, run SHOW DATABASES. Then you can run USE <db_name> to switch to the target database.
The syntax for creating a database is as follows.
CREATE DATABASE [IF NOT EXISTS] <db_name>
Create a file external tableβ
After accessing the target database, you can create a file external table in this database.
Syntaxβ
CREATE EXTERNAL TABLE <table_name>
(
<col_name> <col_type> [NULL | NOT NULL] [COMMENT "<comment>"]
)
ENGINE=file
COMMENT ["comment"]
PROPERTIES
(
FileLayoutParams,
StorageCredentialParams
)
Parametersβ
| Parameter | Required | Description |
|---|---|---|
| table_name | Yes | The name of the file external table. The naming conventions are as follows:
|
| col_name | Yes | The column name in the file external table. The column names in the file external table must be the same as those in the target data file but are not case-sensitive. The order of columns in the file external table can be different from that in the target data file. |
| col_type | Yes | The column type in the file external table. You need to specify this parameter based on the column type in the target data file. For more information, see Mapping of column types. |
| NULL | NOT NULL | No | Whether the column in the file external table is allowed to be NULL.
|
| comment | No | The comment of column in the file external table. |
| ENGINE | Yes | The type of engine. Set the value to file. |
| comment | No | The description of the file external table. |
| PROPERTIES | Yes |
|