Monitor Kubernetes Effectively

This page provides insights for effectively monitoring your Kubernetes environment with Sumo Logic, and is organized according to the individual areas of the Kubernetes architecture.

Critical areas to monitor

The critical areas for monitoring in Kubernetes include the control plane, individual nodes, and pods. The following illustration provides a high-level view of the Kubernetes cluster architecture.

Control Plane

The Kubernetes control plane manages how Kubernetes communicates with your cluster. The various parts of the control plane work together in managing the health and performance of a cluster. The control plane consists of the API server, etcd, controller manager, and scheduler. Each part of the control plane has specific areas that should be monitored for the optimum health and performance of your cluster.

API Server - kube-apiserver

The API server is the front door to Kubernetes. Any changes you make in the environment are communicated through the API server. The API server uses all your resources to commit the changes inside the cluster. For the kube-apiserver, you should monitor the following:

API server latency
Requests per minute (RPM)
Threshold issues with etcd

ectd

The ectd of the control plane is a value store that Kubernetes uses this for storing all of the desired states for the cluster. Changes in the cluster are compared to the desired state to determine the necessary response. For ectd, you should monitor the following:

Leader changes
Quorum - if quorum is lost, etcd falls into a read-only state and you can't make changes to the API or get new state coming back from your cluster
Disk space - if you lose your backup for etcd you may lose all the state for the entire cluster

Controller Manager

The controller manager is responsible for continually coalescing the running state of the cluster versus the desired state. The controller manager monitors all changes made to the API server to see if the changes are in alignment with the desired state of the cluster, and is in constant contact with the cloud provider. The controller manager checks to see if all nodes are running and that there are enough pods; continually iterating to make sure the cluster maintains the desired state. For the controller manager, you should monitor:

End-to-end scheduling latency
Cloud provider latency
Failed requests to your cloud provider

Scheduler - kube-scheduler

The scheduler is responsible for scheduling the individual pods on the nodes. In order to schedule a pod, the scheduler compares the available resources with resource requests, as well as the availability of individual nodes and new nodes that are coming online due to scaling. For the scheduler, you should monitor:

Pods that fail to schedule
Requests, quota limits, anti-affinity
Bin packing - the number of pods that can fit on an individual node

Nodes

A node is a worker machine in a Kubernetes cluster that serves as a delivery vehicle for your applications. Nodes can be virtual machines (VMs) or physical machines. Nodes are essentially bare-bones servers. Like any other server, you should monitor the following on each node:

CPU
Memory consumption
System load
File system activity
Network activity

Each node is managed by the cluster and contains the services necessary to run pods. Node services include kubelet and kube-proxy.

Kubelet

The kubelet daemon runs on each individual node and monitors the health of the containers on the node, interacting with the container manager. The container manager can be Docker, rkt (pronounced Rocket), or any other Open Source container manager that is supported by Kubernetes.

For the kubelet, you should monitor the container manager which is itself a monitoring service.

Kube Proxy - kube-proxy

The kube-proxy is a network proxy that runs on each node in your cluster. The kube-proxy maintains network rules on nodes that facilitate network communication to the pods from network sessions inside or outside of the cluster. For the kube proxy, you should monitor:

Network traffic load
Cluster DNS
Service cluster IPs and ports

Pods

Pods reside on a given node, and a container can contain several pods. For pods, you should monitor:

Scheduler health for individual pods - so they don't get stuck in a restart loop
Pod health - availability, resource consumption, and performance

Kubernetes logs and metrics

Sumo Logic utilizes the following Open Source applications to provide insights into logs and metrics in your Kubernetes environment:

FluentD, an Open Source collector, is used to collect logs from the Kubernetes cluster, then forwards them to Sumo Logic. This includes cluster and container orchestration logs, as well as logs generated by apps on your pods.
Prometheus, an Open Source monitoring tool, is used to collect metrics that provide data on the cluster, node, and pod level.

Metrics rich monitoring

Sumo Logic provides scraper utility that collects Prometheus formatted metrics from exporters. The metrics are then enriched with metadata and sent to Sumo Logic via HTTP. Sumo Logic Kubernetes apps show Prometheus format metrics in graphic display dashboards for intuitive analysis.

How Kubernetes exposes metrics

Exporters expose metrics on a particular port and endpoint.
Node exporter can be run to expose node metrics, such as CPU and memory utilization which can be used to monitor the health of your nodes.
Kube State add-on exposes cluster level state information.

Monitor Kubernetes Effectively

Critical areas to monitor ​

Control Plane​

API Server - kube-apiserver​

ectd​

Controller Manager​

Scheduler - kube-scheduler​

Nodes​

Kubelet​

Kube Proxy - kube-proxy​

Pods​

Kubernetes logs and metrics​

Metrics rich monitoring​

How Kubernetes exposes metrics​