Troubleshooting asynchronous materialized views
This topic describes how to examine your asynchronous materialized views and solve the problems you have encountered while working with them.
CAUTION
Some of the features shown below are only supported from StarRocks v3.1 onwards.
Examine an asynchronous materialized view
To get a whole picture of asynchronous materialized views that you are working with, you can first check their working state, refresh history, and resource consumption.
Check the working state of an asynchronous materialized view
You can check the working state of an asynchronous materialized view using SHOW MATERIALIZED VIEWS. Among all the information returned, you can focus on the following fields:
is_active
: Whether the state of the materialized view is active. Only an active materialized view can be used for query acceleration and rewrite.last_refresh_state
: The state of the last refresh, including PENDING, RUNNING, FAILED, and SUCCESS.last_refresh_error_message
: The reason why the last refresh failed (if the materialized view state is not active).rows
: The number of data rows in the materialized view. Please note that this value can be different from the actual row count of the materialized view because the updates can be deferred.
For detailed information on other fields returned, see SHOW MATERIALIZED VIEWS - Returns.
Example:
MySQL > SHOW MATERIALIZED VIEWS LIKE 'mv_pred_2'\G
***************************[ 1. row ]***************************
id | 112517
database_name | ssb_1g
name | mv_pred_2
refresh_type | ASYNC
is_active | true
inactive_reason | <null>
partition_type | UNPARTITIONED
task_id | 457930
task_name | mv-112517
last_refresh_start_time | 2023-08-04 16:46:50
last_refresh_finished_time | 2023-08-04 16:46:54
last_refresh_duration | 3.996
last_refresh_state | SUCCESS
last_refresh_force_refresh | false
last_refresh_start_partition |
last_refresh_end_partition |
last_refresh_base_refresh_partitions | {}
last_refresh_mv_refresh_partitions |
last_refresh_error_code | 0
last_refresh_error_message |
rows | 0
text | CREATE MATERIALIZED VIEW `mv_pred_2` (`lo_quantity`, `lo_revenue`, `sum`)
DISTRIBUTED BY HASH(`lo_quantity`, `lo_revenue`) BUCKETS 2
REFRESH ASYNC
PROPERTIES (
"replication_num" = "3",
"storage_medium" = "HDD"
)
AS SELECT `lineorder`.`lo_quantity`, `lineorder`.`lo_revenue`, sum(`lineorder`.`lo_tax`) AS `sum`
FROM `ssb_1g`.`lineorder`
WHERE `lineorder`.`lo_linenumber` = 1
GROUP BY 1, 2;
1 row in set
Time: 0.003s
View the refresh history of an asynchronous materialized view
You can view the refresh history of an asynchronous materialized view by querying the table task_runs
in the database information_schema
. Among all the information returned, you can focus on the following fields:
CREATE_TIME
andFINISH_TIME
: The start and end time of the refresh task.STATE
: The state of the refresh task, including PENDING, RUNNING, FAILED, and SUCCESS.ERROR_MESSAGE
: The reason why the refresh task failed.
Example:
MySQL > SELECT * FROM information_schema.task_runs WHERE task_name ='mv-112517' \G
***************************[ 1. row ]***************************
QUERY_ID | 7434cee5-32a3-11ee-b73a-8e20563011de
TASK_NAME | mv-112517
CREATE_TIME | 2023-08-04 16:46:50
FINISH_TIME | 2023-08-04 16:46:54
STATE | SUCCESS
DATABASE | ssb_1g
EXPIRE_TIME | 2023-08-05 16:46:50
ERROR_CODE | 0
ERROR_MESSAGE | <null>
PROGRESS | 100%
EXTRA_MESSAGE | {"forceRefresh":false,"mvPartitionsToRefresh":[],"refBasePartitionsToRefreshMap":{},"basePartitionsToRefreshMap":{}}
PROPERTIES | {"FORCE":"false"}
***************************[ 2. row ]***************************
QUERY_ID | 72dd2f16-32a3-11ee-b73a-8e20563011de
TASK_NAME | mv-112517
CREATE_TIME | 2023-08-04 16:46:48
FINISH_TIME | 2023-08-04 16:46:53
STATE | SUCCESS
DATABASE | ssb_1g
EXPIRE_TIME | 2023-08-05 16:46:48
ERROR_CODE | 0
ERROR_MESSAGE | <null>
PROGRESS | 100%
EXTRA_MESSAGE | {"forceRefresh":true,"mvPartitionsToRefresh":["mv_pred_2"],"refBasePartitionsToRefreshMap":{},"basePartitionsToRefreshMap":{"lineorder":["lineorder"]}}
PROPERTIES | {"FORCE":"true"}
Monitor the resource consumption of an asynchronous materialized view
You can monitor and analyze the resource consumed by an asynchronous materialized view during and after the refresh.
Monitor resource consumption during refresh
During a refresh task, you can monitor its real-time resource consumption using SHOW PROC '/current_queries'.
Among all the information returned, you can focus on the following fields:
ScanBytes
: The size of data that is scanned.ScanRows
: The number of data rows that are scanned.MemoryUsage
: The size of memory that is used.CPUTime
: CPU time cost.ExecTime
: The execution time of the query.
Example:
MySQL > SHOW PROC '/current_queries'\G
***************************[ 1. row ]***************************
StartTime | 2023-08-04 17:01:30
QueryId | 806eed7d-32a5-11ee-b73a-8e20563011de
ConnectionId | 0
Database | ssb_1g
User | root
ScanBytes | 70.981 MB
ScanRows | 6001215 rows
MemoryUsage | 73.748 MB
DiskSpillSize | 0.000
CPUTime | 2.515 s
ExecTime | 2.583 s
Analyze resource consumption after refresh
After a refresh task, you can analyze its resource consumption via query profiles.
While an asynchronous materialized view refreshes itself, an INSERT OVERWRITE statement is executed. You can check the corresponding query profile to analyze the time and resources consumed by the refresh task.
Among all the information returned, you can focus on the following metrics:
Total
: Total time consumed by the query.QueryCpuCost
: Total CPU time cost of the query. CPU time costs are aggregated for concurrent processes. As a result, the value of this metric may be greater than the actual execution time of the query.QueryMemCost
: Total memory cost of the query.- Other metrics for individual operators, such as join operators and aggregate operators.
For detailed information on how to check the query profile and understand other metrics, see Analyze query profile.
Verify whether queries are rewritten by an asynchronous materialized view
You can check whether a query can be rewritten with an asynchronous materialized view from its query plan using EXPLAIN.
If the metric SCAN
in the query plan shows the name of the corresponding materialized view, the query has been rewritten by the materialized view.
Example 1:
MySQL > SHOW CREATE TABLE mv_agg\G
***************************[ 1. row ]***************************
Materialized View | mv_agg
Create Materialized View | CREATE MATERIALIZED VIEW `mv_agg` (`c_custkey`)
DISTRIBUTED BY RANDOM
REFRESH ASYNC
PROPERTIES (
"replication_num" = "3",
"replicated_storage" = "true",
"storage_medium" = "HDD"
)
AS SELECT `customer`.`c_custkey`
FROM `ssb_1g`.`customer`
GROUP BY `customer`.`c_custkey`;
MySQL > EXPLAIN LOGICAL SELECT `customer`.`c_custkey`
-> FROM `ssb_1g`.`customer`
-> GROUP BY `customer`.`c_custkey`;
+-----------------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------------+
| - Output => [1:c_custkey] |
| - SCAN [mv_agg] => [1:c_custkey] |
| Estimates: {row: 30000, cpu: ?, memory: ?, network: ?, cost: 15000.0} |
| partitionRatio: 1/1, tabletRatio: 12/12 |
| 1:c_custkey := 10:c_custkey |
+-----------------------------------------------------------------------------------+
If you disable the query rewrite feature, StarRocks adopts the regular query plan.
Example 2:
MySQL > SET enable_materialized_view_rewrite = false;
MySQL > EXPLAIN LOGICAL SELECT `customer`.`c_custkey`
-> FROM `ssb_1g`.`customer`
-> GROUP BY `customer`.`c_custkey`;
+---------------------------------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------------------------------+
| - Output => [1:c_custkey] |
| - AGGREGATE(GLOBAL) [1:c_custkey] |
| Estimates: {row: 15000, cpu: ?, memory: ?, network: ?, cost: 120000.0} |
| - SCAN [mv_bitmap] => [1:c_custkey] |
| Estimates: {row: 60000, cpu: ?, memory: ?, network: ?, cost: 30000.0} |
| partitionRatio: 1/1, tabletRatio: 12/12 |
+---------------------------------------------------------------------------------------+
Diagnose and solve problems
Here we list some common problems you might encounter while working with an asynchronous materialized view, and the corresponding solutions.
Materialized view building failure
If you failed to create an asynchronous materialized view, that is, the CREATE MATERIALIZED VIEW statement cannot be executed, you can look into the following aspects:
-
Check whether you have mistakenly used the SQL statement for synchronous materialized views.
StarRocks provides two different materialized views: the synchronous materialized view and the asynchronous materialized view.
The basic SQL statement used to create a synchronous materialized view is as follows:
CREATE MATERIALIZED VIEW <mv_name>
AS <query>However, the SQL statement used to create an asynchronous materialized view contains more parameters:
CREATE MATERIALIZED VIEW <mv_name>
REFRESH ASYNC -- The refresh strategy of the asynchronous materialized view.
DISTRIBUTED BY HASH(<column>) -- The data distribution strategy of the asynchronous materialized view.
AS <query>In addition to the SQL statement, the main difference between the two materialized views is that asynchronous materialized views support all query syntax that StarRocks provides, but synchronous materialized views only support limited choices of aggregate functions.
-
Check whether you have specified a correct partition column.
When creating an asynchronous materialized view, you can specify a partitioning strategy, which allows you to refresh the materialized view on a finer granularity level.
Currently, StarRocks only supports range partitioning, and only supports referencing a single column from the SELECT expression in the query statement that is used to build the materialized view. You can use the date_trunc() function to truncate the column to change the granularity level of the partitioning strategy. Please note that any other expressions are not supported.
-
Check whether you have the necessary privileges to create the materialized view.
When creating an asynchronous materialized view, you need the SELECT privileges of all objects (tables, views, materialized views) that are queried. When UDFs are used in the query, you also need the USAGE privileges of the functions.
Materialized view refresh failure
If the materialized view fails to refresh, that is, the state of the refresh task is not SUCCESS, you can look into the following aspects:
-
Check whether you have adopted an inappropriate refresh strategy.
By default, the materialized view refreshes immediately after it is created. However, in v2.5 and early versions, materialized views that adopt the MANUAL refresh strategy do not refresh after being created. You must refresh it manually using REFRESH MATERIALIZED VIEW.
-
Check whether the refresh task exceeds the memory limit.
Usually, when an asynchronous materialized view involves large-scale aggregation or join calculations that exhaust the memory resources. To solve this problem, you can:
- Specify a partitioning strategy for the materialized view to refresh one partition each time.
- Enable the Spill to Disk feature for the refresh task. From v3.1 onwards, StarRocks supports spilling the intermediate results to disks when refreshing a materialized view. Execute the following statement to enable Spill to Disk:
-- Define the properties when creating the materialized view.
CREATE MATERIALIZED VIEW mv1
REFRESH ASYNC
PROPERTIES ( 'session.enable_spill'='true' )
AS <query>;
-- Add the properties to an existing materialized view.
ALTER MATERIALIZED VIEW mv2 SET ('session.enable_spill' = 'true');