StarRocks version 4.0

Warning

Downgrade Notes

  • After upgrading StarRocks to v4.0, do not downgrade it directly to v3.5.0 or v3.5.1. Doing so causes metadata incompatibility and FE crashes. To avoid these issues, downgrade the cluster to v3.5.2 or later.

  • Before downgrading a cluster from v4.0.3 to any version from v3.5.2 through v3.5.10, execute the following statement:

    SET GLOBAL enable_rewrite_simple_agg_to_meta_scan=false;
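
Before starting the downgrade, it can help to confirm that the setting took effect. The following is a minimal sketch, assuming the variable is visible through the MySQL-compatible SHOW VARIABLES statement:

```sql
-- Disable the rewrite of simple aggregations to MetaScan cluster-wide
-- (the statement from the downgrade note).
SET GLOBAL enable_rewrite_simple_agg_to_meta_scan = false;

-- Check the global value before downgrading; it is expected to show as disabled.
SHOW GLOBAL VARIABLES LIKE 'enable_rewrite_simple_agg_to_meta_scan';
```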

4.0.3

Release Date: December 25, 2025

Improvements

  • Supports ORDER BY clauses for STRUCT data types. #66035
  • Supports creating Iceberg views with properties and displaying properties in the output of SHOW CREATE VIEW. #65938
  • Supports altering Iceberg table partition specs using ALTER TABLE ADD/DROP PARTITION COLUMN. #65922
  • Supports COUNT/SUM/AVG(DISTINCT) aggregation over framed windows (for example, ORDER BY/PARTITION BY) with optimization options. #65815
  • Optimized CSV parsing performance by using memchr for single-character delimiters. #63715
  • Added an optimizer rule to push down Partial TopN to the Pre-Aggregation phase to reduce network overhead. #61497
  • Enhanced Data Cache monitoring:
    • Added new metrics for memory/disk quota and usage. #66168
    • Added Page Cache statistics to the api/datacache/stat HTTP endpoint. #66240
    • Added hit rate statistics for native tables. #66198
  • Optimized Sort and Aggregation operators to support rapid memory release in OOM scenarios. #66157
  • Added TableSchemaService in FE for shared-data clusters to allow CNs to fetch specific schemas on demand. #66142
  • Optimized Fast Schema Evolution to retain history schemas until all dependent ingestion jobs are finished. #65799
  • Enhanced filterPartitionsByTTL to properly handle NULL partition values to prevent all partitions from being filtered. #65923
  • Optimized FusedMultiDistinctState to clear the associated MemPool upon reset. #66073
  • Made ICEBERG_CATALOG_SECURITY property check case-insensitive in Iceberg REST Catalog. #66028
  • Added HTTP endpoint GET /service_id to retrieve StarOS Service ID in shared-data clusters. #65816
  • Replaced deprecated metadata.broker.list with bootstrap.servers in Kafka consumer configurations. #65437
  • Added FE configuration lake_enable_fullvacuum (Default: false) to allow disabling the Full Vacuum Daemon. #66685
  • Updated lz4 library to v1.10.0. #67080
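
As an illustration of the DISTINCT aggregation over framed windows listed above, here is a minimal sketch. The table `orders` and its columns are hypothetical, and the exact frame clauses supported should be checked against #65815:

```sql
-- Hypothetical table, used only for illustration:
--   orders(user_id BIGINT, order_ts DATETIME, sku VARCHAR(32), amount DECIMAL(10, 2))
SELECT
    user_id,
    order_ts,
    -- Distinct SKUs among the current row and the two preceding rows per user.
    COUNT(DISTINCT sku) OVER (
        PARTITION BY user_id
        ORDER BY order_ts
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS distinct_skus_last_3,
    -- Sum of distinct amounts over the same frame.
    SUM(DISTINCT amount) OVER (
        PARTITION BY user_id
        ORDER BY order_ts
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS distinct_amount_last_3
FROM orders;
```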

Bug Fixes

The following issues have been fixed:

  • latest_cached_tablet_metadata could cause versions to be incorrectly skipped during batch Publish. #66558
  • Potential issues caused by ClusterSnapshot-related checks in CatalogRecycleBin when running in shared-nothing clusters. #66501
  • BE crash when writing complex data types (ARRAY/MAP/STRUCT) to Iceberg tables during Spill operations. #66209
  • Potential hang in Connector Chunk Sink when the writer's initialization or initial write fails. #65951
  • Connector Chunk Sink bug where PartitionChunkWriter initialization failure caused a null pointer dereference during close. #66097
  • Setting a non-existent system variable would silently succeed instead of reporting an error. #66022
  • Bundle metadata parsing failure when Data Cache is corrupted. #66021
  • MetaScan returned NULL instead of 0 for count columns when the result is empty. #66010
  • SHOW VERBOSE RESOURCE GROUP ALL displayed NULL instead of default_mem_pool for resource groups created in earlier versions. #65982
  • A RuntimeException during query execution after disabling the flat_json table configuration. #65921
  • Type mismatch issue in shared-data clusters caused by rewriting min/max statistics to MetaScan after Schema Change. #65911
  • BE crash caused by ranking window optimization when PARTITION BY and ORDER BY are missing. #67093
  • Incorrect can_use_bf check when merging runtime filters, which could lead to wrong results or crashes. #67062
  • Pushing down runtime bitset filters into nested OR predicates caused incorrect results. #67061
  • Potential data race and data loss issues caused by write or flush operations after the DeltaWriter has finished. #66966
  • Execution error caused by mismatched nullable properties when rewriting simple aggregation to MetaScan. #67068
  • Incorrect row count calculation in the MetaScan rewrite rule. #66967
  • Versions might be incorrectly skipped during batch Publish due to inconsistent cached tablet metadata. #66575
  • Improper error handling for memory allocation failures in HyperLogLog operations. #66827

4.0.2

Release Date: December 4, 2025

New Features

  • Introduced a new resource group attribute, mem_pool, allowing multiple resource groups to share the same memory pool and enforce a joint memory limit for the pool. This feature is backward compatible. default_mem_pool is used if mem_pool is not specified. #64112
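
A minimal sketch of two resource groups sharing one memory pool follows. The group names, users, and pool name are hypothetical. Passing `mem_pool` as a property in the WITH clause is an assumption based on the attribute name above, and how the joint limit is derived from each group's `mem_limit` is not specified in these notes, so both groups use the same value here:

```sql
-- Hypothetical groups that share the memory pool 'etl_pool'.
CREATE RESOURCE GROUP rg_etl_hourly
TO (user = 'etl_hourly')
WITH (
    'cpu_weight' = '4',
    'mem_limit' = '30%',
    'mem_pool' = 'etl_pool'   -- assumed property name, taken from the attribute above
);

CREATE RESOURCE GROUP rg_etl_daily
TO (user = 'etl_daily')
WITH (
    'cpu_weight' = '2',
    'mem_limit' = '30%',
    'mem_pool' = 'etl_pool'
);

-- Groups created without mem_pool are expected to report default_mem_pool.
SHOW VERBOSE RESOURCE GROUP ALL;
```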

Improvements

  • Reduced remote storage access during Vacuum after File Bundling is enabled. #65793
  • The File Bundling feature caches the latest tablet metadata. #65640
  • Improved safety and stability for long-string scenarios. #65433 #65148
  • Optimized the SplitTopNAggregateRule logic to avoid performance regression. #65478
  • Applied the Iceberg/DeltaLake table statistics collection strategy to other external data sources to avoid collecting statistics when the table is a single table. #65430
  • Added Page Cache metrics to the Data Cache HTTP API api/datacache/app_stat. #65341
  • Supports ORC file splitting to enable parallel scanning of a single large ORC file. #65188
  • Added selectivity estimation for IF predicates in the optimizer. #64962
  • Supports constant evaluation of hour, minute, and second for DATE and DATETIME types in the FE. #64953
  • Enabled rewrite of simple aggregation to MetaScan by default. #64698
  • Improved multiple-replica assignment handling in shared-data clusters for enhanced reliability. #64245
  • Exposes cache hit ratio in audit logs and metrics. #63964
  • Estimates per-bucket distinct counts for histograms using HyperLogLog or sampling to provide more accurate NDV for predicates and joins. #58516
  • Supports FULL OUTER JOIN USING with SQL-standard semantics. #65122
  • Prints memory information when Optimizer times out for diagnostics. #65206
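
To illustrate FULL OUTER JOIN USING from the list above, here is a minimal sketch with hypothetical tables. Under SQL-standard semantics, the USING column appears once in the output and takes its value from whichever side has a matching row:

```sql
-- Hypothetical tables, used only for illustration:
--   t_left(id INT, a STRING), t_right(id INT, b STRING)
SELECT id, a, b
FROM t_left
FULL OUTER JOIN t_right USING (id);

-- Roughly equivalent form written with ON, where the join key
-- has to be coalesced by hand.
SELECT COALESCE(t_left.id, t_right.id) AS id, a, b
FROM t_left
FULL OUTER JOIN t_right ON t_left.id = t_right.id;
```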

Bug Fixes

The following issues have been fixed:

  • DECIMAL56 mod-related issue. #65795
  • Issue related to Iceberg scan range handling. #65658
  • MetaScan rewrite issues on temporary partitions and random buckets. #65617
  • JsonPathRewriteRule uses the wrong table after transparent materialized view rewrite. #65597
  • Materialized view refresh failures when partition_retention_condition referenced generated columns. #65575
  • Iceberg min/max value typing issue. #65551
  • Issue with queries against information_schema.tables and views across different databases when enable_evaluate_schema_scan_rule is set to true. #65533
  • Integer overflow in JSON array comparison. #64981
  • MySQL Reader does not support SSL. #65291
  • ARM build issue caused by SVE build incompatibility. #65268
  • Queries based on bucket-aware execution may get stuck for bucketed Iceberg tables. #65261
  • Error propagation and memory safety issues caused by missing memory limit checks in OLAP table scans. #65131

Behavior Changes

  • When a materialized view is inactivated, the system recursively inactivates its dependent materialized views. #65317
  • Uses the original materialized view query SQL (including comments/formatting) when generating SHOW CREATE output. #64318

4.0.1

Release Date: November 17, 2025

Improvements

  • Optimized TaskRun session variable handling to process known variables only. #64150
  • Supports collecting statistics of Iceberg and Delta Lake tables from metadata by default. #64140
  • Supports collecting statistics of Iceberg tables with bucket and truncate partition transform. #64122
  • Supports inspecting FE /proc profile for debugging. #63954
  • Enhanced OAuth2 and JWT authentication support for Iceberg REST catalogs. #63882
  • Improved bundle tablet metadata validation and recovery handling. #63949
  • Improved scan-range memory estimation logic. #64158

Bug Fixes

The following issues have been fixed:

  • Transaction logs were deleted when publishing bundle tablets. #64030
  • The join algorithm could not guarantee the sort property because the sort property was not reset after the join. #64086
  • Issues related to transparent materialized view rewrite. #63962

Behavior Changes

  • Added the property enable_iceberg_table_cache to Iceberg Catalogs to optionally disable the Iceberg table cache so that queries always read the latest data. #64082
  • Ensured INSERT ... SELECT reads the freshest metadata by refreshing external tables before planning. #64026
  • Increased lock table slots to 256 and added rid to slow-lock logs. #63945
  • Temporarily disabled shared_scan due to incompatibility with event-based scheduling. #63543
  • Changed the default Hive Catalog cache TTL to 24 hours and removed unused parameters. #63459
  • Automatically determine the Partial Update mode based on the session variable and the number of inserted columns. #62091
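
As a sketch of the enable_iceberg_table_cache property listed above, the following creates an Iceberg REST catalog with the table cache disabled. The catalog name and URI are hypothetical, and the storage and credential properties a real deployment needs are omitted:

```sql
-- Hypothetical catalog; replace the URI and add storage/credential
-- properties as required by your environment.
CREATE EXTERNAL CATALOG iceberg_fresh
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "iceberg.catalog.uri" = "http://rest-catalog.example.com:8181",
    -- Disable the Iceberg table cache so that reads see the latest data.
    "enable_iceberg_table_cache" = "false"
);
```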

4.0.0

Release Date: October 17, 2025

Data Lake Analytics

  • Unified Page Cache and Data Cache for BE metadata, and adopted an adaptive strategy for scaling. #61640
  • Optimized metadata file parsing for Iceberg statistics to avoid repetitive parsing. #59955
  • Optimized COUNT/MIN/MAX queries against Iceberg metadata by efficiently skipping over data file scans, significantly improving aggregation query performance on large partitioned tables and reducing resource consumption. #60385
  • Supports compaction for Iceberg tables via procedure rewrite_data_files.
  • Supports Iceberg tables with hidden partitions, including creating, writing, and reading the tables. #58914
  • Supports setting sort keys when creating Iceberg tables.
  • Optimized sink performance for Iceberg tables:
    • Iceberg Sink supports spilling large operators, global shuffle, and local sorting to optimize memory usage and address small file issues. #61963
    • Iceberg Sink optimizes local sorting based on Spill Partition Writer to improve write efficiency. #62096
    • Iceberg Sink supports global shuffle for partitions to further reduce small files. #62123
  • Enhanced bucket-aware execution for Iceberg tables to improve concurrency and distribution capabilities of bucketed tables. #61756
  • Supports the TIME data type in the Paimon catalog. #58292
  • Upgraded Iceberg version to 1.10.0. #63667

Security and Authentication

  • In scenarios where JWT authentication and the Iceberg REST Catalog are used, StarRocks supports the passthrough of user login information to Iceberg via the REST Session Catalog for subsequent data access authentication. #59611 #58850
  • Supports vended credentials for the Iceberg catalog.
  • Supports granting StarRocks internal roles to external groups obtained via Group Provider. #63385 #63258
  • Added REFRESH privilege to external tables to control the permission to refresh them. #63385

Storage Optimization and Cluster Management

  • Introduced the File Bundling optimization for the cloud-native table in shared-data clusters to automatically bundle the data files generated by loading, Compaction, or Publish operations, thereby reducing the API cost caused by high-frequency access to the external storage system. File Bundling is enabled by default for tables created in v4.0 or later. #58316
  • Supports Multi-Table Write-Write Transaction to allow users to control the atomic submission of INSERT, UPDATE, and DELETE operations. The transaction supports Stream Load and INSERT INTO interfaces, effectively guaranteeing cross-table consistency in ETL and real-time write scenarios. #61362
  • Supports Kafka 4.0 for Routine Load.
  • Supports full-text inverted indexes on Primary Key tables in shared-nothing clusters.
  • Supports modifying aggregate keys of Aggregate tables. #62253
  • Supports enabling case-insensitive processing on names of catalogs, databases, tables, views, and materialized views. #61136
  • Supports blacklisting Compute Nodes in shared-data clusters. #60830
  • Supports global connection ID. #57256
  • Added the recyclebin_catalogs metadata view to Information Schema to display recoverable deleted metadata. #51007

Query and Performance Improvement

  • Supports the DECIMAL256 data type, raising the maximum precision from 38 to 76 digits. Its 256-bit storage is better suited to high-precision financial and scientific computing, and it mitigates DECIMAL128's precision overflow in very large aggregations and high-order operations. #59645
  • Improved the performance of basic operators. #61691 #61632 #62585 #61405 #61429
  • Optimized the performance of the JOIN and AGG operators. #61691
  • [Preview] Introduced SQL Plan Manager to allow users to bind a query plan to a query, thereby preventing the query plan from changing due to system state changes (mainly data updates and statistics updates), thus stabilizing query performance. #56310
  • Introduced Partition-wise Spillable Aggregate/Distinct operators to replace the original Spill implementation based on sorted aggregation, significantly improving aggregation performance and reducing read/write overhead in complex and high-cardinality GROUP BY scenarios. #60216
  • Flat JSON V2:
    • Supports configuring Flat JSON on the table level. #57379
    • Enhanced JSON columnar storage by retaining the V1 mechanism while adding page- and segment-level indexes (ZoneMaps, Bloom filters), predicate pushdown with late materialization, dictionary encoding, and integration of a low-cardinality global dictionary to significantly boost execution efficiency. #60953
  • Supports an adaptive ZoneMap index creation strategy for the STRING data type. #61960
  • Enhanced query observability:
    • Optimized EXPLAIN ANALYZE output to display the execution metrics by group and by operator for better readability. #63326
    • QueryDetailActionV2 and QueryProfileActionV2 now support JSON format, enhancing cross-FE query capabilities. #63235
    • Supports retrieving Query Profile information across all FEs. #61345
    • SHOW PROCESSLIST statements display Catalog, Query ID, and other information. #62552
    • Enhanced query queue and process monitoring, supporting display of Running/Pending statuses. #62261
  • Materialized view rewrites consider the distribution and sort keys of the original table, improving the selection of optimal materialized views. #62830

Functions and SQL Syntax

  • Added the following functions:
  • Provides the following syntactic extensions:
    • Supports IF NOT EXISTS keywords in CREATE ANALYZE FULL TABLE. #59789
    • Supports EXCLUDE clauses in SELECT. #57411
    • Supports FILTER clauses in aggregate functions, improving readability and execution efficiency of conditional aggregations. #58937

Behavior Changes

  • Adjusted the logic of the materialized view parameter auto_partition_refresh_number to limit the number of partitions to refresh, regardless of whether the refresh is automatic or manual. #62301
  • Flat JSON is enabled by default. #62097
  • The default value of the system variable enable_materialized_view_agg_pushdown_rewrite is set to true, indicating that aggregation pushdown for materialized view query rewrite is enabled by default. #60976
  • Changed the type of some columns in information_schema.materialized_views to better align with the corresponding data. #60054
  • The split_part function returns NULL when the delimiter is not matched. #56967
  • Uses STRING instead of fixed-length CHAR in CTAS/CREATE MATERIALIZED VIEW to avoid inferring the wrong column length, which may cause materialized view refresh failures. #63114 #62476
  • Data Cache-related configurations are simplified. #61640
    • datacache_mem_size and datacache_disk_size are now effective.
    • storage_page_cache_limit, block_cache_mem_size, block_cache_disk_size are deprecated.
  • Added new catalog properties (remote_file_cache_memory_ratio for Hive, and iceberg_data_file_cache_memory_usage_ratio and iceberg_delete_file_cache_memory_usage_ratio for Iceberg) to limit the memory resources used for Hive and Iceberg metadata cache, and set the default values to 0.1 (10%). Adjusted the metadata cache TTL to 24 hours. #63459 #63373 #61966 #62288
  • SHOW DATA DISTRIBUTION no longer merges the statistics of all materialized indexes with the same bucket sequence number. It shows data distribution only at the materialized index level. #59656
  • The default bucket size for automatic bucket tables is changed from 4GB to 1GB to improve performance and resource utilization. #63168
  • The system determines the Partial Update mode based on the corresponding session variable and the number of columns in the INSERT statement. #62091
  • Optimized the fe_tablet_schedules view in the Information Schema. #62073 #59813
    • Renamed the TABLET_STATUS column to SCHEDULE_REASON, the CLONE_SRC column to SRC_BE_ID, and the CLONE_DEST column to DEST_BE_ID.
    • The data types of the CREATE_TIME, SCHEDULE_TIME and FINISH_TIME columns have been changed from DOUBLE to DATETIME.
  • The is_leader label has been added to some FE metrics. #63004
  • Shared-data clusters using Microsoft Azure Blob Storage and Data Lake Storage Gen 2 as object storage will experience Data Cache failure after being upgraded to v4.0. The system will automatically reload the cache.
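
Queries and scripts that read the fe_tablet_schedules view need to use the renamed columns listed above. A minimal sketch, using only the columns named in these notes:

```sql
-- The *_TIME columns are now DATETIME, so they can be compared
-- with date expressions directly.
SELECT
    SCHEDULE_REASON,   -- formerly TABLET_STATUS
    SRC_BE_ID,         -- formerly CLONE_SRC
    DEST_BE_ID,        -- formerly CLONE_DEST
    CREATE_TIME,
    SCHEDULE_TIME,
    FINISH_TIME
FROM information_schema.fe_tablet_schedules
WHERE CREATE_TIME >= DATE_SUB(NOW(), INTERVAL 1 DAY)
ORDER BY CREATE_TIME DESC;
```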