StarRocks version 4.0
warning
Downgrade Notes
- After upgrading StarRocks to v4.0, DO NOT downgrade it directly to v3.5.0 or v3.5.1. Doing so causes metadata incompatibility and FE crashes. Downgrade the cluster to v3.5.2 or later to prevent these issues.
- Before downgrading a cluster from v4.0.3 to any version from v3.5.2 through v3.5.10, execute the following statement:
SET GLOBAL enable_rewrite_simple_agg_to_meta_scan=false;
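As a minimal sketch, the variable can be set and then read back before the downgrade starts. The SHOW GLOBAL VARIABLES check is added here for illustration only and is not part of the required procedure.

```sql
-- Disable the rewrite of simple aggregations to MetaScan cluster-wide,
-- as required before downgrading from v4.0.3 to v3.5.2 through v3.5.10.
SET GLOBAL enable_rewrite_simple_agg_to_meta_scan = false;

-- Optional check (an assumption of this sketch, not a documented step):
-- confirm the global value before stopping any node for the downgrade.
SHOW GLOBAL VARIABLES LIKE 'enable_rewrite_simple_agg_to_meta_scan';
```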
4.0.3
Release Date: December 25, 2025
Improvements
- Supports ORDER BY clauses for STRUCT data types. #66035
- Supports creating Iceberg views with properties and displaying properties in the output of SHOW CREATE VIEW. #65938
- Supports altering Iceberg table partition specs using ALTER TABLE ADD/DROP PARTITION COLUMN. #65922
- Supports COUNT/SUM/AVG(DISTINCT) aggregation over framed windows (for example, ORDER BY/PARTITION BY) with optimization options. #65815
- Optimized CSV parsing performance by using memchr for single-character delimiters. #63715
- Added an optimizer rule to push down Partial TopN to the Pre-Aggregation phase to reduce network overhead. #61497
- Enhanced Data Cache monitoring
- Optimized Sort and Aggregation operators to support rapid memory release in OOM scenarios. #66157
- Added TableSchemaService in FE for shared-data clusters to allow CNs to fetch specific schemas on demand. #66142
- Optimized Fast Schema Evolution to retain history schemas until all dependent ingestion jobs are finished. #65799
- Enhanced filterPartitionsByTTL to properly handle NULL partition values to prevent all partitions from being filtered. #65923
- Optimized FusedMultiDistinctState to clear the associated MemPool upon reset. #66073
- Made ICEBERG_CATALOG_SECURITY property check case-insensitive in Iceberg REST Catalog. #66028
- Added HTTP endpoint GET /service_id to retrieve StarOS Service ID in shared-data clusters. #65816
- Replaced deprecated metadata.broker.list with bootstrap.servers in Kafka consumer configurations. #65437
- Added FE configuration lake_enable_fullvacuum (Default: false) to allow disabling the Full Vacuum Daemon. #66685
- Updated lz4 library to v1.10.0. #67080
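The ORDER BY support for STRUCT values listed above might be exercised as in the sketch below. The inline data and the field names id and name are made up for illustration. The ordering rule, assumed here to compare fields from left to right, is not stated in these notes.

```sql
-- Build two STRUCT values inline (hypothetical data) and sort by the whole struct.
WITH t AS (
    SELECT named_struct('id', 2, 'name', 'b') AS s
    UNION ALL
    SELECT named_struct('id', 1, 'name', 'z')
)
SELECT s
FROM t
ORDER BY s;  -- sorting directly on a STRUCT column is the v4.0.3 addition
```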
Bug Fixes
The following issues have been fixed:
- latest_cached_tablet_metadata could cause versions to be incorrectly skipped during batch Publish. #66558
- Potential issues caused by ClusterSnapshot-related checks in CatalogRecycleBin when running in shared-nothing clusters. #66501
- BE crash when writing complex data types (ARRAY/MAP/STRUCT) to Iceberg tables during Spill operations. #66209
- Potential hang in Connector Chunk Sink when the writer's initialization or initial write fails. #65951
- Connector Chunk Sink bug where PartitionChunkWriter initialization failure caused a null pointer dereference during close. #66097
- Setting a non-existent system variable would silently succeed instead of reporting an error. #66022
- Bundle metadata parsing failure when Data Cache is corrupted. #66021
- MetaScan returned NULL instead of 0 for count columns when the result is empty. #66010
- SHOW VERBOSE RESOURCE GROUP ALL displays NULL instead of default_mem_pool for resource groups created in earlier versions. #65982
- A RuntimeException during query execution after disabling the flat_json table configuration. #65921
- Type mismatch issue in shared-data clusters caused by rewriting min/max statistics to MetaScan after Schema Change. #65911
- BE crash caused by ranking window optimization when PARTITION BY and ORDER BY are missing. #67093
- Incorrect can_use_bf check when merging runtime filters, which could lead to wrong results or crashes. #67062
- Pushing down runtime bitset filters into nested OR predicates causes incorrect results. #67061
- Potential data race and data loss issues caused by write or flush operations after the DeltaWriter has finished. #66966
- Execution error caused by mismatched nullable properties when rewriting simple aggregation to MetaScan. #67068
- Incorrect row count calculation in the MetaScan rewrite rule. #66967
- Versions might be incorrectly skipped during batch Publish due to inconsistent cached tablet metadata. #66575
- Improper error handling for memory allocation failures in HyperLogLog operations. #66827
4.0.2
Release Date: December 4, 2025
New Features
- Introduced a new resource group attribute, mem_pool, allowing multiple resource groups to share the same memory pool and enforce a joint memory limit for the pool. This feature is backward compatible. default_mem_pool is used if mem_pool is not specified. #64112
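A rough sketch of two resource groups sharing one memory pool follows. It assumes mem_pool is passed as a property in the WITH clause alongside the existing cpu_weight and mem_limit properties. The group names, user names, pool name, and the choice of equal mem_limit values are all hypothetical.

```sql
-- Hypothetical names: rg_etl, rg_reports, etl_user, report_user, shared_batch_pool.
CREATE RESOURCE GROUP rg_etl
TO (user = 'etl_user')
WITH (
    'cpu_weight' = '4',
    'mem_limit' = '40%',
    'mem_pool' = 'shared_batch_pool'   -- assumed property key, taken from the attribute name
);

CREATE RESOURCE GROUP rg_reports
TO (user = 'report_user')
WITH (
    'cpu_weight' = '2',
    'mem_limit' = '40%',
    'mem_pool' = 'shared_batch_pool'   -- same pool, so both groups draw on one joint limit
);

-- Inspect the groups; groups created without mem_pool should report default_mem_pool.
SHOW VERBOSE RESOURCE GROUP ALL;
```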
Improvements
- Reduced remote storage access during Vacuum after File Bundling is enabled. #65793
- The File Bundling feature caches the latest tablet metadata. #65640
- Improved safety and stability for long-string scenarios. #65433 #65148
- Optimized the SplitTopNAggregateRule logic to avoid performance regression. #65478
- Applied the Iceberg/DeltaLake table statistics collection strategy to other external data sources to avoid collecting statistics when the table is a single table. #65430
- Added Page Cache metrics to the Data Cache HTTP API api/datacache/app_stat. #65341
- Supports ORC file splitting to enable parallel scanning of a single large ORC file. #65188
- Added selectivity estimation for IF predicates in the optimizer. #64962
- Supports constant evaluation of hour, minute, and second for DATE and DATETIME types in the FE. #64953
- Enabled rewrite of simple aggregation to MetaScan by default. #64698
- Improved multiple-replica assignment handling in shared-data clusters for enhanced reliability. #64245
- Exposes cache hit ratio in audit logs and metrics. #63964
- Estimates per-bucket distinct counts for histograms using HyperLogLog or sampling to provide more accurate NDV for predicates and joins. #58516
- Supports FULL OUTER JOIN USING with SQL-standard semantics. #65122
- Prints memory information when Optimizer times out for diagnostics. #65206
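The FULL OUTER JOIN USING support listed above can be illustrated with a small self-contained query. The inline tables a and b and their columns are made up for this sketch. Under SQL-standard semantics, the USING column appears once in the output and behaves like COALESCE(a.k, b.k).

```sql
-- Two hypothetical inline tables that overlap only on k = 2.
WITH a AS (
    SELECT 1 AS k, 'a1' AS va
    UNION ALL
    SELECT 2, 'a2'
),
b AS (
    SELECT 2 AS k, 'b2' AS vb
    UNION ALL
    SELECT 3, 'b3'
)
SELECT k, va, vb
FROM a
FULL OUTER JOIN b USING (k)   -- k is a single merged column, not a.k and b.k
ORDER BY k;
```

With standard semantics, the result should contain one row per key (1, 2, and 3), with NULL in va or vb where a side has no match.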
Bug Fixes
The following issues have been fixed:
- DECIMAL56 mod-related issue. #65795
- Issue related to Iceberg scan range handling. #65658
- MetaScan rewrite issues on temporary partitions and random buckets. #65617
- JsonPathRewriteRule uses the wrong table after transparent materialized view rewrite. #65597
- Materialized view refresh failures when partition_retention_condition referenced generated columns. #65575
- Iceberg min/max value typing issue. #65551
- Issue with queries against information_schema.tables and views across different databases when enable_evaluate_schema_scan_rule is set to true. #65533
- Integer overflow in JSON array comparison. #64981
- MySQL Reader does not support SSL. #65291
- ARM build issue caused by SVE build incompatibility. #65268
- Queries based on bucket-aware execution may get stuck for bucketed Iceberg tables. #65261
- Error propagation and memory safety issues caused by the lack of memory limit checks in OLAP table scans. #65131
Behavior Changes
- When a materialized view is inactivated, the system recursively inactivates its dependent materialized views. #65317
- Uses the original materialized view query SQL (including comments/formatting) when generating SHOW CREATE output. #64318
4.0.1
Release Date: November 17, 2025
Improvements
- Optimized TaskRun session variable handling to process known variables only. #64150
- Supports collecting statistics of Iceberg and Delta Lake tables from metadata by default. #64140
- Supports collecting statistics of Iceberg tables with bucket and truncate partition transform. #64122
- Supports inspecting FE /proc profile for debugging. #63954
- Enhanced OAuth2 and JWT authentication support for Iceberg REST catalogs. #63882
- Improved bundle tablet metadata validation and recovery handling. #63949
- Improved scan-range memory estimation logic. #64158
Bug Fixes
The following issues have been fixed:
- Transaction logs were deleted when publishing bundle tablets. #64030
- The join algorithm cannot guarantee the sort property because, after joining, the sort property is not reset. #64086
- Issues related to transparent materialized view rewrite. #63962
Behavior Changes
- Added the property enable_iceberg_table_cache to Iceberg Catalogs to optionally disable the Iceberg table cache so that the catalog always reads the latest data. #64082
- Ensured INSERT ... SELECT reads the freshest metadata by refreshing external tables before planning. #64026
- Increased lock table slots to 256 and added rid to slow-lock logs. #63945
- Temporarily disabled shared_scan due to incompatibility with event-based scheduling. #63543
- Changed the default Hive Catalog cache TTL to 24 hours and removed unused parameters. #63459
- Automatically determine the Partial Update mode based on the session variable and the number of inserted columns. #62091
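As an illustrative sketch, enable_iceberg_table_cache might be set when creating an Iceberg catalog, as shown below. The catalog name and REST endpoint are hypothetical, and the example assumes the property accepts "false" to disable the cache.

```sql
-- Hypothetical catalog name and REST endpoint.
CREATE EXTERNAL CATALOG iceberg_fresh
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "iceberg.catalog.uri" = "http://rest-catalog.example.com:8181",
    -- Assumed value format: disable the table cache so reads see the latest data.
    "enable_iceberg_table_cache" = "false"
);
```

Disabling the cache trades extra metadata requests for freshness, so it is likely most useful for catalogs whose tables change frequently.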
4.0.0
Release Date: October 17, 2025
Data Lake Analytics
- Unified Page Cache and Data Cache for BE metadata, and adopted an adaptive strategy for scaling. #61640
- Optimized metadata file parsing for Iceberg statistics to avoid repetitive parsing. #59955
- Optimized COUNT/MIN/MAX queries against Iceberg metadata by efficiently skipping over data file scans, significantly improving aggregation query performance on large partitioned tables and reducing resource consumption. #60385
- Supports compaction for Iceberg tables via procedure rewrite_data_files.
- Supports Iceberg tables with hidden partitions, including creating, writing, and reading the tables. #58914
- Supports setting sort keys when creating Iceberg tables.
- Optimizes sink performance for Iceberg tables.
  - Iceberg Sink supports spilling large operators, global shuffle, and local sorting to optimize memory usage and address small file issues. #61963
  - Iceberg Sink optimizes local sorting based on Spill Partition Writer to improve write efficiency. #62096
  - Iceberg Sink supports global shuffle for partitions to further reduce small files. #62123
- Enhanced bucket-aware execution for Iceberg tables to improve concurrency and distribution capabilities of bucketed tables. #61756
- Supports the TIME data type in the Paimon catalog. #58292
- Upgraded Iceberg version to 1.10.0. #63667
Security and Authentication
- In scenarios where JWT authentication and the Iceberg REST Catalog are used, StarRocks supports the passthrough of user login information to Iceberg via the REST Session Catalog for subsequent data access authentication. #59611 #58850
- Supports vended credentials for the Iceberg catalog.
- Supports granting StarRocks internal roles to external groups obtained via Group Provider. #63385 #63258
- Added REFRESH privilege to external tables to control the permission to refresh them. #63385
Storage Optimization and Cluster Management
- Introduced the File Bundling optimization for the cloud-native table in shared-data clusters to automatically bundle the data files generated by loading, Compaction, or Publish operations, thereby reducing the API cost caused by high-frequency access to the external storage system. File Bundling is enabled by default for tables created in v4.0 or later. #58316
- Supports Multi-Table Write-Write Transaction to allow users to control the atomic submission of INSERT, UPDATE, and DELETE operations. The transaction supports Stream Load and INSERT INTO interfaces, effectively guaranteeing cross-table consistency in ETL and real-time write scenarios. #61362
- Supports Kafka 4.0 for Routine Load.
- Supports full-text inverted indexes on Primary Key tables in shared-nothing clusters.
- Supports modifying aggregate keys of Aggregate tables. #62253
- Supports enabling case-insensitive processing on names of catalogs, databases, tables, views, and materialized views. #61136
- Supports blacklisting Compute Nodes in shared-data clusters. #60830
- Supports global connection ID. #57256
- Added the recyclebin_catalogs metadata view to Information Schema to display recoverable deleted metadata. #51007
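The multi-table transaction and the new Information Schema view listed above might be used as sketched below. All table names are hypothetical, and the sketch assumes the BEGIN WORK / COMMIT WORK transaction statements. Which statement combinations are accepted may depend on table type and deployment mode.

```sql
-- Hypothetical tables: orders_fact, order_items_fact, orders_staging,
-- order_items_staging. Either every statement takes effect or none does.
BEGIN WORK;

INSERT INTO orders_fact
SELECT * FROM orders_staging;

INSERT INTO order_items_fact
SELECT * FROM order_items_staging;

DELETE FROM orders_fact
WHERE order_status = 'cancelled';

COMMIT WORK;   -- use ROLLBACK WORK instead to discard all of the changes above

-- List deleted metadata that can still be recovered.
SELECT * FROM information_schema.recyclebin_catalogs;
```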
Query and Performance Improvement
- Supports the DECIMAL256 data type, expanding the upper limit of precision from 38 to 76 digits. Its 256-bit storage provides better adaptability to high-precision financial and scientific computing scenarios, effectively mitigating DECIMAL128's precision overflow problem in very large aggregations and high-order operations. #59645
- Improved the performance for basic operators. #61691 #61632 #62585 #61405 #61429
- Optimized the performance of the JOIN and AGG operators. #61691
- [Preview] Introduced SQL Plan Manager to allow users to bind a query plan to a query, thereby preventing the query plan from changing due to system state changes (mainly data updates and statistics updates), thus stabilizing query performance. #56310
- Introduced Partition-wise Spillable Aggregate/Distinct operators to replace the original Spill implementation based on sorted aggregation, significantly improving aggregation performance and reducing read/write overhead in complex and high-cardinality GROUP BY scenarios. #60216
- Flat JSON V2:
  - Supports configuring Flat JSON on the table level. #57379
  - Enhances JSON columnar storage by retaining the V1 mechanism while adding page- and segment-level indexes (ZoneMaps, Bloom filters), predicate pushdown with late materialization, dictionary encoding, and integration of a low-cardinality global dictionary to significantly boost execution efficiency. #60953
- Supports an adaptive ZoneMap index creation strategy for the STRING data type. #61960
- Enhanced query observability:
  - Optimized EXPLAIN ANALYZE output to display the execution metrics by group and by operator for better readability. #63326
  - QueryDetailActionV2 and QueryProfileActionV2 now support JSON format, enhancing cross-FE query capabilities. #63235
  - Supports retrieving Query Profile information across all FEs. #61345
  - SHOW PROCESSLIST statements display Catalog, Query ID, and other information. #62552
  - Enhanced query queue and process monitoring, supporting display of Running/Pending statuses. #62261
- Materialized view rewrites consider the distribution and sort keys of the original table, improving the selection of optimal materialized views. #62830
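To make the DECIMAL256 item above concrete, the following sketch casts a literal that needs more than 38 digits of precision. The literal and the column alias are made up, and the sketch assumes that any DECIMAL(p, s) with p greater than 38 and at most 76 is backed by DECIMAL256.

```sql
-- 50 integer digits plus 2 fractional digits: too wide for DECIMAL128 (max precision 38).
SELECT CAST('12345678901234567890123456789012345678901234567890.12' AS DECIMAL(60, 2))
       AS wide_amount;
```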
Functions and SQL Syntax
- Added the following functions:
- Provides the following syntactic extensions:
Behavior Changes
- Adjust the logic of the materialized view parameter auto_partition_refresh_number to limit the number of partitions to refresh regardless of auto refresh or manual refresh. #62301
- Flat JSON is enabled by default. #62097
- The default value of the system variable enable_materialized_view_agg_pushdown_rewrite is set to true, indicating that aggregation pushdown for materialized view query rewrite is enabled by default. #60976
- Changed the type of some columns in information_schema.materialized_views to better align with the corresponding data. #60054
- The split_part function returns NULL when the delimiter is not matched. #56967
- Use STRING to replace fixed-length CHAR in CTAS/CREATE MATERIALIZED VIEW to avoid deducing the wrong column length, which may cause materialized view refresh failures. #63114 #62476
- Data Cache-related configurations are simplified. #61640
  - datacache_mem_size and datacache_disk_size are now effective.
  - storage_page_cache_limit, block_cache_mem_size, block_cache_disk_size are deprecated.
- Added new catalog properties (remote_file_cache_memory_ratio for Hive, and iceberg_data_file_cache_memory_usage_ratio and iceberg_delete_file_cache_memory_usage_ratio for Iceberg) to limit the memory resources used for Hive and Iceberg metadata cache, and set the default values to 0.1 (10%). Adjusted the metadata cache TTL to 24 hours. #63459 #63373 #61966 #62288
- SHOW DATA DISTRIBUTION no longer merges the statistics of all materialized indexes with the same bucket sequence number. It only shows data distribution at the materialized index level. #59656
- The default bucket size for automatic bucket tables is changed from 4GB to 1GB to improve performance and resource utilization. #63168
- The system determines the Partial Update mode based on the corresponding session variable and the number of columns in the INSERT statement. #62091
- Optimized the fe_tablet_schedules view in the Information Schema. #62073 #59813
  - Renamed the TABLET_STATUS column to SCHEDULE_REASON, the CLONE_SRC column to SRC_BE_ID, and the CLONE_DEST column to DEST_BE_ID.
  - The data types of the CREATE_TIME, SCHEDULE_TIME and FINISH_TIME columns have been changed from DOUBLE to DATETIME.
- The is_leader label has been added to some FE metrics. #63004
- Shared-data clusters using Microsoft Azure Blob Storage and Data Lake Storage Gen 2 as object storage will experience Data Cache failure after being upgraded to v4.0. The system will automatically reload the cache.
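A few of the behavior changes above can be checked after an upgrade with statements like the following. The literal values, the time filter, and the choice to disable the variable are illustrative assumptions, not recommendations from these notes.

```sql
-- The delimiter '/' does not occur in the input, so per the note above
-- the result is expected to be NULL.
SELECT split_part('2025-10-17', '/', 1) AS part;

-- Aggregation pushdown for materialized view rewrite is now on by default.
-- It can be switched off for the current session if plans need comparing.
SET enable_materialized_view_agg_pushdown_rewrite = false;

-- Query fe_tablet_schedules using the renamed columns. The time columns
-- are DATETIME in v4.0, so they can be filtered with a datetime literal.
SELECT SCHEDULE_REASON, SRC_BE_ID, DEST_BE_ID, CREATE_TIME, FINISH_TIME
FROM information_schema.fe_tablet_schedules
WHERE CREATE_TIME >= '2025-10-17 00:00:00'
ORDER BY CREATE_TIME DESC
LIMIT 20;
```

Scripts or dashboards that still reference TABLET_STATUS, CLONE_SRC, or CLONE_DEST, or that treat the time columns as DOUBLE, would need to be updated accordingly.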