Understanding the Kubernetes Deployment of a SQL Server Big Data Cluster

Deploying a SQL Server Big Data Cluster is a simple process, because the azdata tool automates everything, as I showed in my previous blog posting. After a successful deployment, you end up with a total of 21 Kubernetes Pods that host 45 Docker Containers, and that is for a non-HA deployment!

There are a lot of different Kubernetes Pods!

Therefore, in today’s blog posting I want to give you an overview of the various Kubernetes Pods and Docker Containers involved.

Overview

When you deploy a SQL Server Big Data Cluster, the Kubernetes Pods are grouped by so-called Labels into 2 distinct groups:

  • Control Plane
  • Data Plane

The Pods in the Control Plane manage the SQL Server Big Data Cluster, and the Pods in the Data Plane provide you with the actual services, namely:

  • SQL Server Master Instance
  • Apache HDFS
  • Apache Spark

You can also see which Pod belongs to which Plane by running the following 2 commands through kubectl:

kubectl get pods -n mssql-cluster -l plane=control
kubectl get pods -n mssql-cluster -l plane=data
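
The two commands above only show the Pods. To also see which Docker Containers run inside each Pod, a small helper like the following can be used. This is a minimal sketch: the function name list_plane_containers is hypothetical, and it assumes the namespace mssql-cluster and the plane label from the commands above.

```shell
# Print one line per Pod in the given plane: "<pod name><TAB><container names>".
# Assumption: the cluster lives in the namespace mssql-cluster and the Pods
# carry the label plane=control or plane=data, as in the commands above.
list_plane_containers() {
  plane="$1"   # expected values: control or data
  kubectl get pods -n mssql-cluster -l "plane=${plane}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.containers[*]}{.name}{" "}{end}{"\n"}{end}'
}
```

Calling list_plane_containers control or list_plane_containers data then prints the Pods of that Plane together with their containers.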

The Kubernetes Pods of each Plane

Let’s now have a more detailed look at the various Kubernetes Pods in each Plane.

Control Plane

The Control Plane of a SQL Server Big Data Cluster consists of the following 12 Pods:

  • control
  • controldb-0
  • controlwd
  • gateway-0
  • logsdb-0
  • logsui
  • metricsdb-0
  • 3x metricsdc
  • metricsui
  • mgmtproxy
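
If you want to check the number of Pods in a Plane against this list, you can count the lines that kubectl returns. The following is only a sketch with a hypothetical function name (count_plane_pods); it again assumes the namespace mssql-cluster and the plane label.

```shell
# Count the Pods that carry the given plane label.
# --no-headers suppresses the column header so that every line is one Pod.
count_plane_pods() {
  plane="$1"   # expected values: control or data
  kubectl get pods -n mssql-cluster -l "plane=${plane}" --no-headers \
    | wc -l | tr -d ' '   # tr strips the padding some wc versions add
}
```

For a deployment that matches the list above, count_plane_pods control should report 12.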

The first large group of Kubernetes Pods belongs to the Controller Service of the SQL Server Big Data Cluster, which exposes a REST API to manage your cluster. It consists of the following Kubernetes Pods and Docker Containers:

  • control
    • controller
    • security-support
    • fluentbit
  • controldb-0
    • mssql-server
    • fluentbit
  • controlwd
    • controlwatchdog
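
The containers of a single Pod can be read from its Pod specification. The helper below is a sketch under the same assumptions as before (namespace mssql-cluster); the function name pod_containers is made up for this example.

```shell
# Print the names of all containers defined in one Pod, separated by spaces.
pod_containers() {
  pod="$1"   # full Pod name, for example controldb-0
  kubectl get pod "$pod" -n mssql-cluster \
    -o jsonpath='{.spec.containers[*].name}'
}
```

For example, pod_containers controldb-0 should list the mssql-server and fluentbit containers if your deployment matches the list above. Note that Pods without a numeric suffix in the list (like control) may carry a generated suffix in their real name, so look up the exact name with kubectl get pods first.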

The gateway-0 Pod contains the following 2 Docker Containers:

  • knox
  • fluentbit

This Pod acts as a gateway: through Knox it gives you access to HDFS files and Spark, and it exposes the HTTPS endpoints for WebHDFS and Spark.
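
Because a Pod like gateway-0 hosts more than one container, you have to name the container when you ask for its logs. The following sketch wraps the kubectl logs command for that purpose; the function name tail_container_logs and the default of 20 lines are my own choices, and the namespace mssql-cluster is assumed as before.

```shell
# Show the last lines of the log of one container inside a Pod.
tail_container_logs() {
  pod="$1"            # Pod name, for example gateway-0
  container="$2"      # container name, for example knox
  lines="${3:-20}"    # number of log lines, defaults to 20
  kubectl logs "$pod" -c "$container" -n mssql-cluster --tail="$lines"
}
```

A call like tail_container_logs gateway-0 knox shows the most recent log lines of the knox container, and tail_container_logs gateway-0 fluentbit 5 shows the last 5 lines of the fluentbit container.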

The next group of Kubernetes Pods belongs to the Logging functionality provided by a SQL Server Big Data Cluster, namely the logsdb-0 and logsui Pods from the list above.

Another large group of Kubernetes Pods belongs to the Metrics functionality in a SQL Server Big Data Cluster:

  • metricsdb-0
    • influxdb
  • 3x metricsdc
    • telegraf
  • metricsui
    • grafana

The last Kubernetes Pod of the Control Plane is the mgmtproxy Pod, which contains the following 2 Docker Containers:

  • service-proxy
  • fluentbit

Data Plane

The Data Plane is the heart of a SQL Server Big Data Cluster, because it provides you with all the application features, namely the SQL Server Master Instance, the Compute Pool, Apache HDFS, and Apache Spark. Let’s have a more detailed look at these Pods.

The Apache HDFS deployment is provided by the following Kubernetes Pods:

And finally, the Apache Spark deployment is provided by the following Kubernetes Pod:

  • sparkhead-0: Apache Spark
    • hadoop-yarn-jobhistory
    • hadoop-livy-sparkhistory
    • hadoop-hivemetastore
    • fluentbit

Wow, that’s a lot of different Kubernetes Pods and Docker Containers!
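
If you want to compare your own deployment with the totals from the introduction (21 Pods and 45 Docker Containers), you can count both across the whole namespace. This is a sketch with hypothetical function names (count_pods, count_containers). It assumes the namespace mssql-cluster and counts only the regular containers of each Pod, not init containers.

```shell
# Count all Pods in the namespace (one output line per Pod).
count_pods() {
  kubectl get pods -n mssql-cluster --no-headers | wc -l | tr -d ' '
}

# Count all containers by printing one line per entry in spec.containers.
# Init containers are not part of spec.containers and are not counted.
count_containers() {
  kubectl get pods -n mssql-cluster \
    -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.name}{"\n"}{end}{end}' \
    | wc -l | tr -d ' '
}
```

Running count_pods and count_containers one after the other gives you the two numbers for your cluster.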

Summary

A SQL Server Big Data Cluster is a huge Kubernetes Deployment with a lot of different Pods. In this blog posting I gave you an overview of the various Pods involved and their usage. As you have also seen, Microsoft has integrated a lot of other Open Source technologies into a SQL Server Big Data Cluster, like collectd, fluentbit, Grafana, Kibana, InfluxDB, and ElasticSearch.

Thanks for your time,

-Klaus

1 thought on “Understanding the Kubernetes Deployment of a SQL Server Big Data Cluster”

  1. Is the “SQL Server master instance” in Big Data Cluster on Premise or has to be in Cloud as the HDFS is in Cloud? We like to keep our Relational Databases on Premise but add HDFS feature to get non-relational data like Excel, PowerPoint etc in the Big Data Cluster. Is it possible?

    thanks!
