StarRocks with Helm
Goals
The goals of this quickstart are:
- Deploy the StarRocks Kubernetes Operator and a StarRocks cluster with Helm
- Configure a password for the StarRocks database user root
- Provide high availability with three FEs and three BEs
- Store metadata in persistent storage
- Store data in persistent storage
- Allow MySQL clients to connect from outside the Kubernetes cluster
- Allow loading data from outside the Kubernetes cluster using Stream Load
- Load some public datasets
- Query the data
The datasets and queries are the same as the ones used in the Basic Quick Start. The main difference here is deploying with Helm and the StarRocks Operator.
The data used is provided by NYC OpenData and the National Centers for Environmental Information.
Both of these datasets are large. Because this tutorial is intended to introduce you to working with StarRocks, it does not load data for the past 120 years. You can run it on a GKE Kubernetes cluster built on three e2-standard-4 machines (or similar) with 80 GB disks. Larger deployments are covered in other documentation, which is referenced later.
This document contains a lot of information. The step-by-step content comes first and the technical details come at the end, so that the document serves these purposes in this order:
- Get the system deployed with Helm.
- Allow the reader to load data in StarRocks and analyze that data.
- Explain the basics of data transformation during loading.
Prerequisites
Kubernetes environment
The Kubernetes environment used while writing this guide consists of three nodes with four vCPUs and 16 GB RAM each (GCP e2-standard-4 machines). The Kubernetes cluster was deployed with this gcloud command:
This command is for your reference. If you are using AWS, Azure, or any other Kubernetes provider, you will need to modify it for your environment. In Google Cloud, you will need to specify your own project and an appropriate location.
gcloud container --project enterprise-demo-422514 \
clusters create ee-docs \
--location=southamerica-west1-b \
--machine-type e2-standard-4 --disk-size 80 --num-nodes 3
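Because the project, cluster name, and location differ for every reader, it can help to keep them in variables and review the resulting command before running it. The following is a minimal sketch, not part of the original guide: the variable names, the default values, and the function name `gke_create_command` are hypothetical placeholders, and only the gcloud flags shown in the command above are used.

```shell
# Hypothetical placeholders: replace these with your own project, cluster name, and location.
PROJECT="${PROJECT:-my-gcp-project}"
CLUSTER="${CLUSTER:-starrocks-quickstart}"
LOCATION="${LOCATION:-us-central1-a}"

# Print the cluster-creation command so it can be reviewed before it is run.
# The machine type, disk size, and node count match the environment described in this guide.
gke_create_command() {
  printf 'gcloud container --project %s clusters create %s --location=%s --machine-type e2-standard-4 --disk-size 80 --num-nodes 3\n' \
    "$PROJECT" "$CLUSTER" "$LOCATION"
}

# Dry run: only prints the command, nothing is created.
gke_create_command
```

Once the printed command looks right, it can be copied to the prompt and run as usual.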
Helm
Helm is a package manager for Kubernetes that simplifies the deployment and management of applications. In this lab you will use Helm to deploy the StarRocks Kubernetes Operator and the sample StarRocks cluster.
SQL client
You can use the SQL client provided in the Kubernetes environment, or use one on your system. This guide uses the mysql CLI. Many MySQL-compatible clients will work.
curl
curl is used to issue the data load job to StarRocks and to download the datasets. Check whether you have it installed by running curl or curl.exe at your OS prompt. If curl is not installed, get it from the curl project's download page.
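The prerequisite checks above can be done in one pass. This is a minimal sketch assuming a POSIX shell; the function name `missing_tools` is hypothetical, and `kubectl` is included as an assumption because a Kubernetes cluster is normally managed with it, although this section does not name it.

```shell
# Print the name of every tool in the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# helm, mysql, and curl are named in this guide; kubectl is an assumption.
missing="$(missing_tools helm kubectl mysql curl)"
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing"
else
  printf 'All prerequisite tools were found.\n'
fi
```

If you plan to use the SQL client provided in the Kubernetes environment, a missing local mysql client is not a problem.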
Terminology
FE
Frontend nodes are responsible for metadata management, client connection management, query planning, and query scheduling. Each FE stores and maintains a complete copy of the metadata in its memory, so every FE can provide the same services.
BE
Backend nodes are responsible for both data storage and executing query plans.
Add the StarRocks Helm chart repo
The Helm Chart contains the definitions of the StarRocks Operator and the custom resource StarRocksCluster.
- Add the Helm Chart Repo.
helm repo add starrocks https://starrocks.github.io/starrocks-kubernetes-operator
- Update the Helm Chart Repo to the latest version.
helm repo update
- View the Helm Chart Repo that you added.
helm search repo starrocks
NAME                      CHART VERSION  APP VERSION  DESCRIPTION
starrocks/kube-starrocks  1.9.7          3.2-latest   kube-starrocks includes two subcharts, operator...
starrocks/operator        1.9.7          1.9.7        A Helm chart for StarRocks operator
starrocks/starrocks       1.9.7          3.2-latest   A Helm chart for StarRocks cluster
starrocks/warehouse       1.9.7          3.2-latest   Warehouse is currently a feature of the StarRoc...
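To confirm from a script that the repo was added and the charts are visible, the search output shown above can be checked by chart name. This is a rough sketch rather than part of the official procedure: the helper name `has_chart` is hypothetical, and it relies only on the `helm search repo starrocks` command and the output layout shown above (a header line followed by one chart per line, with the chart name in the first column).

```shell
# Succeed if the chart named in $1 appears in `helm search repo` output read from stdin.
# The first line is skipped because it is the column header.
has_chart() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Guarded so that the sketch does nothing on a machine without Helm.
if command -v helm >/dev/null 2>&1; then
  for chart in starrocks/operator starrocks/starrocks; do
    if helm search repo starrocks | has_chart "$chart"; then
      echo "found $chart"
    else
      echo "missing $chart"
    fi
  done
fi
```

If a chart is reported as missing, rerunning the repo add and update steps is a reasonable first thing to try.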
Download the data
Download these two datasets to your machine.