SSB Flat-table Benchmarking
Star schema benchmark (SSB) is designed to test basic performance metrics of OLAP database products. SSB uses a star schema test set that is widely applied in academia and industry. For more information, see the paper Star Schema Benchmark. ClickHouse flattens the star schema into a wide flat table and rewrites the SSB into a single-table benchmark. For more information, see Star schema benchmark of ClickHouse This test compares the performance of StarRocks, Apache Druid, and ClickHouse against SSB single-table datasets.
Test conclusion
- Among the 13 queries performed on SSB standard datasets, StarRocks has an overall query performance 2.1x that of ClickHouse and 8.7x that of Apache Druid.
- After Bitmap Indexing of StarRocks is enabled, the performance is 1.3x compared to when this feature is disabled. The overall performance of StarRocks is 2.8x that of ClickHouse and 11.4x that of Apache Druid.
Test preparation
Hardware
Machine | 4 cloud hosts |
---|---|
CPU | 16-Core Intel (R) Xeon (R) Platinum 8269CY CPU @2.50GHz Cache size: 36608 KB |
Memory | 64 GB |
Network bandwidth | 5 Gbit/s |
Disk | ESSD |
Software
StarRocks, Apache Druid, and ClickHouse are deployed on hosts of the same configurations.
- StarRocks: one FE and three BEs. The FE can be separately or hybrid deployed with BEs.
- ClickHouse: three nodes with distributed tables
- Apache Druid: three nodes. One is deployed with Master Servers and Data Servers, one is deployed with Query Servers and Data Servers, and the third is deployed only with Data Servers.
Kernel version: Linux 3.10.0-1160.59.1.el7.x86_64
OS version: CentOS Linux release 7.9.2009
Software version: StarRocks Community Version 3.0, ClickHouse 23.3, Apache Druid 25.0.0
Test data and results
Test data
Table | Record | Description |
---|---|---|
lineorder | 600 million | Lineorder fact table |
customer | 3 million | Customer dimension table |
part | 1.4 million | Parts dimension table |
supplier | 200 thousand | Supplier dimension table |
dates | 2,556 | Date dimension table |
lineorder_flat | 600 million | lineorder flat table |
Test results
The following table shows the performance test results on thirteen queries. The unit of query latency is ms. All queries are warmed up 1 time and then executed 3 times to take the average value as the result. ClickHouse vs StarRocks
in the table header means using the query response time of ClickHouse to divide the query response time of StarRocks. A larger value indicates better performance of StarRocks.
StarRocks-3.0 | StarRocks-3.0-index | ClickHouse-23.3 | ClickHouse vs StarRocks | Druid-25.0.0 | Druid vs StarRocks | |
---|---|---|---|---|---|---|
Q1.1 | 33 | 30 | 48 | 1.45 | 430 | 13.03 |
Q1.2 | 10 | 10 | 15 | 1.50 | 270 | 27.00 |
Q1.3 | 23 | 30 | 14 | 0.61 | 820 | 35.65 |
Q2.1 | 186 | 116 | 301 | 1.62 | 760 | 4.09 |
Q2.2 | 156 | 50 | 273 | 1.75 | 920 | 5.90 |
Q2.3 | 73 | 36 | 255 | 3.49 | 910 | 12.47 |
Q3.1 | 173 | 233 | 398 | 2.30 | 1080 | 6.24 |
Q3.2 | 120 | 80 | 319 | 2.66 | 850 | 7.08 |
Q3.3 | 123 | 30 | 227 | 1.85 | 890 | 7.24 |
Q3.4 | 13 | 16 | 18 | 1.38 | 750 | 57.69 |
Q4.1 | 203 | 196 | 469 | 2.31 | 1230 | 6.06 |
Q4.2 | 73 | 76 | 160 | 2.19 | 1020 | 13.97 |
Q4.3 | 50 | 36 | 148 | 2.96 | 820 | 16.40 |
sum | 1236 | 939 | 2645 | 2.14 | 10750 | 8.70 |
Test procedure
For more information about how to create a ClickHouse table and load data to the table, see ClickHouse official doc. The following sections describe data generation and data loading of StarRocks.
Generate data
Download the ssb-poc toolkit and compile it.
wget https://starrocks-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-1.0.zip
unzip ssb-poc-1.0.zip
cd ssb-poc-1.0/
make && make install
cd output/
After the compilation, all the related tools are installed to the output
directories and the following operations are all performed under this directory.
First, generate data for SSB standard dataset scale factor=100
.
sh bin/gen-ssb.sh 100 data_dir
Create table schema
-
Modify the configuration file
conf/starrocks.conf
and specify the cluster address. Pay special attention tomysql_host
andmysql_port
. -
Run the following command to create a table:
sh bin/create_db_table.sh ddl_100
Query data
sh bin/benchmark.sh ssb-flat
Enable Bitmap Indexing
StarRocks performs better with Bitmap Indexing enabled. If you want to test the performance of StarRocks with Bitmap Indexing enabled, especially on Q2.2, Q2.3, and Q3.3, you can create Bitmap Indexes for all STRING columns.
-
Create another
lineorder_flat
table and create Bitmap Indexes.sh bin/create_db_table.sh ddl_100_bitmap_index
-
Add the following configuration to the
be.conf
file of all BEs and restart the BEs for the configurations to take effect.bitmap_max_filter_ratio=1000
-
Run the data loading script.
sh bin/flat_insert.sh data_dir
After data is loaded, wait for data version compaction to complete and then perform 4.4 again to query the data after Bitmap Indexing is enabled.
You can view the progress of data version compaction by running select CANDIDATES_NUM from information_schema.be_compactions
. For the three BE nodes, the following results show compaction is completed:
mysql> select CANDIDATES_NUM from information_schema.be_compactions;
+----------------+
| CANDIDATES_NUM |
+----------------+
| 0 |
| 0 |
| 0 |
+----------------+
3 rows in set (0.01 sec)
Test SQL and table creation statements
Test SQL
--Q1.1
SELECT sum(lo_extendedprice * lo_discount) AS `revenue`
FROM lineorder_flat
WHERE lo_orderdate >= '1993-01-01' and lo_orderdate <= '1993-12-31'
AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;
--Q1.2
SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat
WHERE lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-01-31'
AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35;
--Q1.3
SELECT sum(lo_extendedprice * lo_discount) AS revenue
FROM lineorder_flat
WHERE weekofyear(lo_orderdate) = 6
AND lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-12-31'
AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35;
--Q2.1
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q2.2
SELECT
sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand >= 'MFGR#2221' AND p_brand <= 'MFGR#2228' AND s_region = 'ASIA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q2.3
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand = 'MFGR#2239' AND s_region = 'EUROPE'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q3.1
SELECT
c_nation,
s_nation,
year(lo_orderdate) AS year,
sum(lo_revenue) AS revenue FROM lineorder_flat
WHERE c_region = 'ASIA' AND s_region = 'ASIA' AND lo_orderdate >= '1992-01-01'
AND lo_orderdate <= '1997-12-31'
GROUP BY c_nation, s_nation, year
ORDER BY year ASC, revenue DESC;
--Q3.2
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES'
AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q3.3
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ( 'UNITED KI1' ,'UNITED KI5') AND s_city in ('UNITED KI1', 'UNITED KI5')
AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q3.4
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ('UNITED KI1', 'UNITED KI5') AND s_city in ('UNITED KI1', 'UNITED KI5')
AND lo_orderdate >= '1997-12-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q4.1
SELECT year(lo_orderdate) AS year, c_nation, sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr in ('MFGR#1', 'MFGR#2')
GROUP BY year, c_nation
ORDER BY year ASC, c_nation ASC;
--Q4.2
SELECT year(lo_orderdate) AS year,
s_nation, p_category, sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA'
AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31'
AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, s_nation, p_category
ORDER BY year ASC, s_nation ASC, p_category ASC;
--Q4.3
SELECT year(lo_orderdate) AS year, s_city, p_brand,
sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE s_nation = 'UNITED STATES'
AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31'
AND p_category = 'MFGR#14'
GROUP BY year, s_city, p_brand
ORDER BY year ASC, s_city ASC, p_brand ASC;