Understanding the Kubernetes Deployment of a SQL Server Big Data Cluster

Deploying a SQL Server Big Data Cluster is a simple process, because the azdata tool fully automates it, as I showed in my previous blog posting. Once the deployment has succeeded, you end up with 21 Kubernetes Pods hosting 45 Docker Containers in total, and that is just a non-HA deployment!

Therefore, in today’s blog posting I want to give you an overview of the various Kubernetes Pods and Docker Containers involved.

Overview

When you deploy a SQL Server Big Data Cluster, the Kubernetes Pods are grouped by so-called Labels into 2 distinct groups:

Control Plane
Data Plane

The Pods in the Control Plane manage the SQL Server Big Data Cluster, and the Pods in the Data Plane provide the actual services, namely:

SQL Server Master Instance
Apache HDFS
Apache Spark

You can also see which Pod belongs to which Plane by running the following 2 commands through kubectl:

kubectl get pods -n mssql-cluster -l plane=control
kubectl get pods -n mssql-cluster -l plane=data
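
Because this post is about which Docker Containers live in which Pod, it can help to print both in one go. The following is only a minimal sketch: list_containers is a helper name I made up for illustration, and it assumes the mssql-cluster namespace and the plane Label from the 2 commands above.

```shell
# Hypothetical helper (not part of azdata or kubectl): prints one line per Pod
# in the given plane, with the Pod name followed by its container names.
# Assumes the namespace mssql-cluster and the label key "plane".
list_containers() {
  kubectl get pods -n mssql-cluster -l "plane=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}'
}

# Usage (needs a running cluster):
#   list_containers control
#   list_containers data
```

Note that this only prints the regular containers of each Pod. Init Containers are stored in a separate field of the Pod specification.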

Let’s now take a more detailed look at the various Kubernetes Pods in each Plane.

Control Plane

The Control Plane of a SQL Server Big Data Cluster consists of the following 12 Pods:

control
controldb-0
controlwd
gateway-0
logsdb-0
logsui
metricsdb-0
3x metricsdc
metricsui
mgmtproxy
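
If you want to check this number against your own deployment, you can simply count the Pods that carry the Label. This is a rough sketch under the same assumptions as before (namespace mssql-cluster, Label plane); count_pods is a hypothetical helper name.

```shell
# Hypothetical helper: counts the Pods in the given plane.
# --no-headers drops the column header, so every remaining line is one Pod.
count_pods() {
  kubectl get pods -n mssql-cluster -l "plane=$1" --no-headers | wc -l | tr -d ' '
}

# Usage (needs a running cluster):
#   count_pods control
```

For a deployment like the one described here, the result for the Control Plane should be 12. Your own number can differ, depending on your cluster and the deployment configuration you have chosen.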

The first large group of Kubernetes Pods belongs to the Controller Service of the SQL Server Big Data Cluster, which exposes a REST API to manage the cluster. It consists of the following Kubernetes Pods and Docker Containers:

control

controller
security-support
fluentbit

controldb-0

mssql-server
fluentbit

controlwd

controlwatchdog

The gateway-0 Pod contains the following 2 Docker Containers:

knox
fluentbit

This Pod acts as a gateway: through Knox it exposes the HTTPS endpoints for accessing HDFS files (WebHDFS) and Spark.

The next group of Kubernetes Pods belongs to the Logging functionality provided by a SQL Server Big Data Cluster:

logsdb-0

init-sysctl (an Init Container)
elasticsearch

logsui

kibana
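
An Init Container like init-sysctl runs to completion before the regular containers of the Pod start, and Kubernetes keeps it in a separate field of the Pod specification (spec.initContainers instead of spec.containers). A small sketch for looking it up, again assuming the mssql-cluster namespace; show_init_containers is a hypothetical helper name.

```shell
# Hypothetical helper: prints the names of the Init Containers of one Pod.
# Init Containers live under .spec.initContainers, not .spec.containers.
show_init_containers() {
  kubectl get pod "$1" -n mssql-cluster \
    -o jsonpath='{.spec.initContainers[*].name}'
}

# Usage (needs a running cluster):
#   show_init_containers logsdb-0
```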

Another large group of Kubernetes Pods belongs to the Metrics functionality in a SQL Server Big Data Cluster:

metricsdb-0

influxdb

3x metricsdc

telegraf

metricsui

grafana

The last Kubernetes Pod of the Control Plane is the mgmtproxy Pod, which contains the following 2 Docker Containers:

service-proxy
fluentbit

Data Plane

The Data Plane is the heart of a SQL Server Big Data Cluster, because it provides all the application features, namely the SQL Server Master Instance, the Compute Pool, the Data Pool, Apache HDFS, and Apache Spark. Let’s take a more detailed look at these Pods.

master-0: SQL Server Master Instance

mssql-server
collectd
fluentbit

appproxy: Used for Application Deployments

app-service-proxy
fluentbit

compute-0-0: Compute Pool

mssql-server
collectd
fluentbit

data-0-0: Data Pool

mssql-server
collectd
fluentbit

data-0-1: Data Pool

mssql-server
collectd
fluentbit
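
Because almost all of these Pods run more than one Docker Container, kubectl has to know which container you mean when you ask for logs. The following sketch shows the idea with the -c option; container_logs is a hypothetical helper name, and the namespace mssql-cluster is assumed as before.

```shell
# Hypothetical helper: prints the last 20 log lines of one container in a Pod.
# $1 = Pod name, $2 = container name inside that Pod.
container_logs() {
  kubectl logs "$1" -c "$2" -n mssql-cluster --tail=20
}

# Usage (needs a running cluster):
#   container_logs master-0 mssql-server
#   container_logs master-0 fluentbit
```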

The Apache HDFS deployment is provided by the following Kubernetes Pods:

nmnode-0-0: Apache HDFS Name Node

hadoop
fluentbit

storage-0-0: Apache HDFS Data Node

hadoop
mssql-server
collectd
fluentbit

storage-0-1: Apache HDFS Data Node

hadoop
mssql-server
collectd
fluentbit

And finally, the Apache Spark deployment is provided by the following Kubernetes Pod:

sparkhead-0: Apache Spark

hadoop-yarn-jobhistory
hadoop-livy-sparkhistory
hadoop-hivemetastore
fluentbit

Wow, that’s a lot of different Kubernetes Pods and Docker Containers!
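
To double-check the totals from the beginning of this post, you can add up the lists above. The following sketch does not talk to a cluster at all; it just sums the containers per Pod as they are listed in this post. The inventory format (Pod, replicas, containers per replica) is my own choice for illustration, and the Init Container of logsdb-0 is not counted.

```shell
# Containers per Pod as listed in this post (Init Containers excluded).
# Format per line: <pod> <replicas> <containers per replica>
inventory='control 1 3
controldb-0 1 2
controlwd 1 1
gateway-0 1 2
logsdb-0 1 1
logsui 1 1
metricsdb-0 1 1
metricsdc 3 1
metricsui 1 1
mgmtproxy 1 2
master-0 1 3
appproxy 1 2
compute-0-0 1 3
data-0-0 1 3
data-0-1 1 3
nmnode-0-0 1 2
storage-0-0 1 4
storage-0-1 1 4
sparkhead-0 1 4'

# Sum the Pods and the containers across all lines.
totals="$(printf '%s\n' "$inventory" |
  awk '{ pods += $2; containers += $2 * $3 } END { print pods, containers }')"

echo "$totals"   # prints: 21 45
```

The result matches the 21 Pods and 45 Docker Containers mentioned in the introduction. If you also count the init-sysctl Init Container, you end up with 46.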

Summary

A SQL Server Big Data Cluster is a huge Kubernetes Deployment with a lot of different Pods. In this blog posting I gave you an overview of the various Pods involved and their purpose. As you have also seen, Microsoft has integrated a lot of other Open Source technologies into a SQL Server Big Data Cluster, like collectd, fluentbit, Grafana, Kibana, InfluxDB, and Elasticsearch.

Thanks for your time,

-Klaus

1 thought on “Understanding the Kubernetes Deployment of a SQL Server Big Data Cluster”

Amy Luo
03/16/2020 at 5:26 PM

Is the “SQL Server master instance” in Big Data Cluster on Premise or has to be in Cloud as the HDFS is in Cloud? We like to keep our Relational Databases on Premise but add HDFS feature to get non-relational data like Excel, PowerPoint etc in the Big Data Cluster. Is it possible?

thanks!
