
Sumo Logic App for Kafka


This guide provides an overview of Kafka-related features and technologies. In addition, it contains recommendations on best practices, tutorials for getting started, and troubleshooting information for common situations.

The Sumo Logic App for Kafka is a unified logs and metrics app. The app helps you to monitor the availability, performance, and resource utilization of Kafka messaging/streaming clusters. Pre-configured dashboards provide insights into the cluster status, throughput, broker operations, topics, replication, zookeepers, node resource utilization, and error logs.

This App has been tested with the following Kafka versions:

  • 2.6.0
  • 2.7.0

Sample Logs

{
  "timestamp": 1617392000686,
  "log": "[2021-04-02 19:33:20,598] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)",
  "stream": "stdout",
  "time": "2021-04-02T19:33:20.599066311Z"
}

Sample Queries

This sample query string is from the Logs panel of the Kafka - Logs dashboard.

messaging_cluster=* messaging_system="kafka"
| json auto maxdepth 1 nodrop | if (isEmpty(log), _raw, log) as kafka_log_message
| parse field=kafka_log_message "[*] * *" as date_time,severity,msg | where severity in ("ERROR", "FATAL")
| count by date_time, severity, msg | sort by date_time | limit 10

Collecting Logs and Metrics for Kafka

This section provides instructions for configuring log and metric collection for the Sumo Logic App for Kafka.

Configure Fields in Sumo Logic

Create the following Fields in Sumo Logic prior to configuring collection. This ensures that your logs and metrics are tagged with relevant metadata, which is required by the app dashboards. For information on setting up fields, see Sumo Logic Fields.

If you're using Kafka in a Kubernetes environment, create the fields:

  • pod_labels_component
  • pod_labels_environment
  • pod_labels_messaging_system
  • pod_labels_messaging_cluster

Configure Collection for Kafka

Sumo Logic supports collection of logs and metrics data from Kafka in both Kubernetes and non-Kubernetes environments.

Click on the appropriate link below based on the environment where your Kafka clusters are hosted.

In Kubernetes environments, we use the Telegraf Operator, which is packaged with our Kubernetes collection. You can learn more about it here. The following diagram illustrates how data is collected from Kafka in Kubernetes environments. In the following architecture, there are four services that make up the metric collection pipeline: Telegraf, Prometheus, Fluentd and FluentBit.


The first service in the pipeline is Telegraf, which collects metrics from Kafka. We run Telegraf in each pod we want to collect metrics from as a sidecar deployment; in other words, Telegraf runs in the same pod as the containers it monitors. Telegraf uses the Jolokia input plugin to obtain metrics. (For simplicity, the diagram doesn’t show the input plugins.) The injection of the Telegraf sidecar container is done by the Telegraf Operator. FluentBit collects logs written to standard out and forwards them to Fluentd, which in turn sends all the logs and metrics data to a Sumo Logic HTTP Source.

Configure Metrics Collection

Follow these steps to collect metrics from a Kubernetes environment:

  1. Set up Kubernetes collection with the Telegraf Operator. Ensure that you are monitoring your Kubernetes clusters with the Telegraf Operator enabled. If you are not, follow these instructions to do so.
  2. Add annotations on your Kafka pods.
    1. Open this yaml file and add the annotations mentioned there.
    2. Enter values for the parameters marked with CHANGE_ME in the yaml file:
    • telegraf.influxdata.com/inputs - Because Telegraf runs as a sidecar, the urls should always point to localhost.
    • In the input plugins section:
      • urls - The URL to the Kafka server. Because Telegraf runs as a sidecar, the urls should always point to localhost. This can be a comma-separated list to connect to multiple Kafka servers.
    • In the tags sections, ([inputs.jolokia2_agent.tags] and [inputs.disk.tags]):
      • environment - This is the deployment environment where the Kafka cluster identified by the value of servers resides. For example: dev, prod, or qa. While this value is optional, we highly recommend setting it.
      • messaging_cluster - Enter a name to identify this Kafka cluster. This cluster name will be shown in the Sumo Logic dashboards.

Here’s an explanation of the additional values set by this configuration. Do not modify these values; doing so will cause the Sumo Logic apps to not function correctly.

  • telegraf.influxdata.com/class: sumologic-prometheus - This instructs the Telegraf operator what output to use. This should not be changed.
  • prometheus.io/scrape: "true" - This ensures our Prometheus plugin will scrape the metrics.
  • prometheus.io/port: "9273" - This tells Prometheus what ports to scrape metrics from. This should not be changed.
  • telegraf.influxdata.com/inputs
    • In the tags sections [inputs.jolokia2_agent/diskio/disk]
      • component: "messaging" - This value is used by Sumo Logic apps to identify application components.
      • messaging_system: "kafka" - This value identifies the messaging system.
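
Putting these settings together, the Kafka pod annotations might look like the following minimal sketch. This is illustrative only: the Jolokia endpoint URL, the port, and the tag values are assumptions based on the settings described above, and the yaml file referenced in step 2 remains the authoritative template.

  annotations:
    telegraf.influxdata.com/class: sumologic-prometheus
    prometheus.io/scrape: "true"
    prometheus.io/port: "9273"
    telegraf.influxdata.com/inputs: |+
      [[inputs.jolokia2_agent]]
        # Telegraf runs as a sidecar, so the agent URL points to localhost.
        urls = ["http://localhost:8778/jolokia"]
        [inputs.jolokia2_agent.tags]
          environment = "prod"
          component = "messaging"
          messaging_system = "kafka"
          messaging_cluster = "kafka_prod_cluster01"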

For more information on all other parameters that can be configured in the Telegraf agent globally, see this doc.

For more information on configuring the Jolokia input plugin for Telegraf, see this doc: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/jolokia2

  3. Configure your Kafka Pod to use the Jolokia Telegraf Input Plugin. The Jolokia agent needs to be available to the Kafka Pods. Starting with Kubernetes 1.10.0, you can store a binary file in a configMap. This makes it very easy to load the Jolokia jar file and make it available to your pods.
    1. Download the latest version of the Jolokia JVM-Agent from Jolokia.
    2. Rename the file to jolokia.jar.
    3. Create a configMap jolokia from the binary file:
      kubectl create configmap jolokia --from-file=jolokia.jar
    4. Modify your Kafka Pod definition to include a volume (type ConfigMap) and volumeMounts. Finally, update the env (environment variable) to start Jolokia, and apply the updated Kafka pod definition.
      spec:
        volumes:
          - name: jolokia
            configMap:
              name: jolokia
        containers:
          - name: XYZ
            image: XYZ
            env:
              - name: KAFKA_OPTS
                value: "-javaagent:/opt/jolokia/jolokia.jar=port=8778,host=0.0.0.0"
            volumeMounts:
              - mountPath: "/opt/jolokia"
                name: jolokia
    5. Verification Step: You can SSH into the Kafka pod and run the following commands to make sure Telegraf (and Jolokia) is scraping metrics from your Kafka Pod:
    curl localhost:9273/metrics
    curl http://localhost:8778/jolokia/list
    echo $KAFKA_OPTS

The echo command should give you the following result:

-javaagent:/opt/jolokia/jolokia.jar=port=8778,host=0.0.0.0
  4. Make sure jolokia.jar exists in the /opt/jolokia/ directory of the Kafka pod. This is an example of what a Pod definition file looks like.
  5. Once this has been done, the Sumo Logic Kubernetes collection will automatically start collecting metrics from the pods having the labels and annotations defined in the previous step. Verify metrics are flowing into Sumo Logic by running the following metrics query:
    component="messaging" and messaging_system="kafka"

Configure Logs Collection

This section explains the steps to collect Kafka logs from a Kubernetes environment.

  1. Collect Kafka logs written to standard output. If your Kafka helm chart/pod is writing the logs to standard output, follow the steps listed below to collect the logs:
    1. Apply the following labels to your Kafka pods (a sketch of how these labels might appear in a pod template follows these steps):
      environment: "prod-CHANGE_ME"
      component: "messaging"
      messaging_system: "kafka"
      messaging_cluster: "kafka_prod_cluster01-CHANGE_ME"
    2. Enter values for the following parameters (marked with CHANGE_ME above):
      • environment - This is the deployment environment where the Kafka cluster identified by the value of servers resides. For example: dev, prod, or qa. While this value is optional, we highly recommend setting it.
      • messaging_cluster - Enter a name to identify this Kafka cluster. This cluster name will be shown in the Sumo Logic dashboards.
      Do not modify the following values, as changing them will cause the Sumo Logic apps to not function correctly:
      • component: "messaging" - This value is used by Sumo Logic apps to identify application components.
      • messaging_system: "kafka" - This value identifies the messaging system.
      For all other parameters, see this doc for parameters that can be configured in the Telegraf agent globally.
    3. The Sumo Logic Kubernetes collection will automatically capture the logs from stdout and send the logs to Sumo Logic. For more information on deploying Sumo Logic Kubernetes collection, see this page.
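
For reference, here is a minimal sketch of how the labels from step 1.1 might appear in a pod template; the values are illustrative and should be replaced with your own:

  metadata:
    labels:
      environment: "prod"
      component: "messaging"
      messaging_system: "kafka"
      messaging_cluster: "kafka_prod_cluster01"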
  2. Collect Kafka logs written to log files (Optional). If your Kafka helm chart/pod is writing its logs to log files, you can use a sidecar to send log files to standard out. To do this:
    1. Determine the location of the Kafka log file on Kubernetes. This can be determined from helm chart configurations.
    2. Install the Sumo Logic tailing sidecar operator.
    3. Add the following annotation in addition to the existing annotations.
      annotations:
        tailing-sidecar: sidecarconfig;<mount>:<path_of_kafka_log_file>/<kafka_log_file_name>
      Example:
      annotations:
        tailing-sidecar: sidecarconfig;data:/opt/Kafka/kafka_<VERSION>/logs/server.log
    4. Make sure that the Kafka pods are running and annotations are applied by using the command:
      kubectl describe pod <Kafka_pod_name>
    5. Sumo Logic Kubernetes collection will automatically start collecting logs from the pods having the annotations defined above.
  3. Add an FER to normalize the fields in Kubernetes environments. Labels created in Kubernetes environments are automatically prefixed with pod_labels. To normalize these for our app to work, we need to create a Field Extraction Rule, if one does not already exist for Messaging Application Components. To do so:
    1. Go to Manage Data > Logs > Field Extraction Rules.
    2. Click the + Add button on the top right of the table.
    3. The Add Field Extraction Rule form will appear. Enter the following options:
    • Rule Name. Enter the name as App Component Observability - Messaging.
    • Applied At. Choose Ingest Time.
    • Scope. Select Specific Data, and enter the following keyword search expression:
      pod_labels_environment=* pod_labels_component=messaging
      pod_labels_messaging_system=kafka pod_labels_messaging_cluster=*
    • Parse Expression. Enter the following parse expression:
      if (!isEmpty(pod_labels_environment), pod_labels_environment, "") as environment
      | pod_labels_component as component
      | pod_labels_messaging_system as messaging_system
      | pod_labels_messaging_cluster as messaging_cluster
    4. Click Save to create the rule.
    5. Verify logs are flowing into Sumo Logic by running the following logs query:
      component="messaging" and messaging_system="kafka"

Installing Kafka Alerts

This section and the sections below provide instructions for installing the Sumo Logic App and Alerts for Kafka, and descriptions of each of the app dashboards. These instructions assume you have already set up collection as described in Collecting Logs and Metrics for Kafka.

Pre-Packaged Alerts

Sumo Logic has provided out-of-the-box alerts available through Sumo Logic monitors to help you quickly determine if the Kafka cluster is available and performing as expected. These alerts are built based on metrics datasets and have preset thresholds based on industry best practices and recommendations. See Kafka Alerts for more details.

  • To install these alerts, you need to have the Manage Monitors role capability.
  • Alerts can be installed by either importing a JSON or a Terraform script.
  • There are limits to how many alerts can be enabled - see the Alerts FAQ for details.

Method A: Importing a JSON file

  1. Download a JSON file that describes the monitors.
    1. The JSON contains alerts based on Sumo Logic searches that do not have any scope filters, and they will therefore apply to all Kafka clusters for which data has been collected via the instructions in the previous sections. However, if you would like to restrict these alerts to specific clusters or environments, update the JSON file by replacing the text messaging_system=kafka with <Your Custom Filter>. Custom filter examples:
    • For alerts applicable only to a specific cluster, your custom filter would be: messaging_cluster=Kafka-prod.01
    • For alerts applicable to all clusters that start with Kafka-prod, your custom filter would be: messaging_cluster=Kafka-prod*
    • For alerts applicable to a specific cluster within a production environment, your custom filter would be: messaging_cluster=Kafka-1 and environment=prod (This assumes you have set the optional environment tag while configuring collection)
  2. Go to Manage Data > Alerts > Monitors.
  3. Click Add.
  4. Click Import to import monitors from the JSON above.
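
If you edit the downloaded JSON from the command line, the text replacement might look like this sketch (the file name kafka_monitors.json is hypothetical; substitute the name of the file you downloaded):

  # Replace the default scope with a custom cluster filter in the monitors JSON.
  sed -i 's/messaging_system=kafka/messaging_cluster=Kafka-prod*/g' kafka_monitors.json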

The monitors are disabled by default. Once you have installed the alerts using this method, navigate to the Kafka folder under Monitors to configure them. See this document to enable monitors. To send notifications to teams or connections, see the instructions detailed in Step 4 of this document.

Method B: Using a Terraform script

  1. Generate an access key and access ID for a user that has the Manage Monitors role capability in Sumo Logic using these instructions. Identify which deployment your Sumo Logic account is in using this link.
  2. Download and install Terraform 0.13 or later.
  3. Download the Sumo Logic Terraform package for Kafka alerts. The alerts package is available in the Sumo Logic GitHub repository. You can either download it through the “git clone” command or as a zip file.
  4. Alert Configuration. After the package has been extracted, navigate to the package directory terraform-sumologic-sumo-logic-monitor/monitor_packages/Kafka.
    1. Edit the monitor.auto.tfvars file and add the Sumo Logic Access ID, Access Key, and Deployment from Step 1.
      access_id   = "<SUMOLOGIC ACCESS ID>"
      access_key = "<SUMOLOGIC ACCESS KEY>"
      environment = "<SUMOLOGIC DEPLOYMENT>"
    2. The Terraform script installs the alerts without any scope filters. If you would like to restrict the alerts to specific clusters or environments, update the variable kafka_data_source. Custom filter examples:
    • For alerts applicable only to a specific cluster, your custom filter would be: messaging_cluster=Kafka-prod.01
    • For alerts applicable to all clusters that start with Kafka-prod, your custom filter would be: messaging_cluster=Kafka-prod*
    • For alerts applicable to a specific cluster within a production environment, your custom filter would be: messaging_cluster=Kafka-1 and environment=prod. This assumes you have set the optional environment tag while configuring collection.
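
For example, to scope all of the alerts to production clusters, the kafka_data_source variable in monitor.auto.tfvars might be set as follows (the filter value is illustrative):

  kafka_data_source = "messaging_cluster=Kafka-prod*"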

All monitors are disabled by default on installation. If you would like to enable all the monitors, set the parameter monitors_disabled to false in this file.

By default, the monitors are configured in a monitor folder called "Kafka". If you would like to change the name of the folder, update the monitor folder name in this file.
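
A sketch of how these two settings might look in the variables file. The monitors_disabled parameter is named above; the folder variable name is an assumption, so verify it against the variable definitions shipped with the package:

  # Enable all monitors on installation (parameter named in this package).
  monitors_disabled = false
  # Assumed variable name for the monitor folder; check the package's vars file.
  folder = "Kafka - Production"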

  5. To send email or connection notifications, modify the notifications.auto.tfvars file and fill in the connection_notifications and email_notifications sections. See the examples for PagerDuty and email notifications below. See this document for creating payloads with other connection types.

PagerDuty Connection Example

connection_notifications = [
  {
    connection_type       = "PagerDuty",
    connection_id         = "<CONNECTION_ID>",
    payload_override      = "{\"service_key\": \"your_pagerduty_api_integration_key\",\"event_type\": \"trigger\",\"description\": \"Alert: Triggered {{TriggerType}} for Monitor {{Name}}\",\"client\": \"Sumo Logic\",\"client_url\": \"{{QueryUrl}}\"}",
    run_for_trigger_types = ["Critical", "ResolvedCritical"]
  },
  {
    connection_type       = "Webhook",
    connection_id         = "<CONNECTION_ID>",
    payload_override      = "",
    run_for_trigger_types = ["Critical", "ResolvedCritical"]
  }
]

Replace <CONNECTION_ID> with the connection id of the webhook connection. The webhook connection id can be retrieved by calling the Monitors API.

Email Notifications Example

email_notifications = [
  {
    connection_type       = "Email",
    recipients            = ["abc@example.com"],
    subject               = "Monitor Alert: {{TriggerType}} on {{Name}}",
    time_zone             = "PST",
    message_body          = "Triggered {{TriggerType}} Alert on {{Name}}: {{QueryURL}}",
    run_for_trigger_types = ["Critical", "ResolvedCritical"]
  }
]
  6. Install the Alerts.
    1. Navigate to the package directory terraform-sumologic-sumo-logic-monitor/monitor_packages/Kafka/ and run terraform init. This will initialize Terraform and will download the required components.
    2. Run terraform plan to view the monitors which will be created/modified by Terraform.
    3. Run terraform apply.
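
Summarized as a shell session, the install sequence from this step is:

  cd terraform-sumologic-sumo-logic-monitor/monitor_packages/Kafka/
  terraform init    # initialize Terraform and download the required components
  terraform plan    # preview the monitors that will be created/modified
  terraform apply   # create the monitors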
  7. Post Installation. If you haven’t enabled alerts and/or configured notifications through the Terraform procedure outlined above, we highly recommend enabling alerts of interest and configuring each enabled alert to send notifications to other people or services. This is detailed in Step 4 of this document.

Installing the Kafka App

This section demonstrates how to install the Kafka App.

Locate and install the app you need from the App Catalog. If you want to see a preview of the dashboards included with the app before installing, click Preview Dashboards.

  1. From the App Catalog, search for and select the app.
  2. Select the version of the service you're using and click Add to Library. Version selection is applicable only to a few apps currently. For more information, see Install the Apps from the Library.
  3. To install the app, complete the following fields.
    • App Name. You can retain the existing name, or enter a name of your choice for the app.

    • Data Source. Choose Enter a Custom Data Filter, and enter a custom Kafka cluster filter. Examples:
      • For all Kafka clusters: messaging_cluster=*
      • For a specific cluster: messaging_cluster=Kafka.dev.01.

      • Clusters within a specific environment: messaging_cluster=Kafka-1 and environment=prod. This assumes you have set the optional environment tag while configuring collection.
  4. Advanced. Select the Location in Library (the default is the Personal folder in the library), or click New Folder to add a new folder.
  5. Click Add to Library.

When an app is installed, it will appear in your Personal folder, or another folder that you specified. From here, you can share it with your organization.

Panels will start to fill automatically. It's important to note that each panel fills with data matching the time range query, starting from the time the panel was created. Results won't immediately be available, but with a bit of time, you'll see full graphs and maps.

Viewing the Kafka Dashboards

Filters with Template Variables

Template variables provide dynamic dashboards that rescope data on the fly. As you apply variables to troubleshoot through your dashboard, you can view dynamic changes to the data for a fast resolution to the root cause. For more information, see the Filter with template variables help page.

Kafka - Cluster Overview

The Kafka - Cluster Overview dashboard gives you an at-a-glance view of your Kafka deployment across brokers, controllers, topics, partitions and zookeepers.

Use this dashboard to:

  • Identify when brokers don’t have active controllers
  • Analyze trends across Request Handler Idle percentage metrics. Kafka’s request handler threads are responsible for servicing client requests (read/write disk). If the request handler threads get overloaded, the time taken for requests to complete will be longer. If the request handler idle percent is constantly below 0.2 (20%), it may indicate that your cluster is overloaded and requires more resources.
  • Determine the number of leaders, partitions and zookeepers across each cluster and ensure they match with expectations

Kafka - Outlier Analysis

The Kafka - Outlier Analysis dashboard helps you identify outliers for key metrics across your Kafka clusters.

Use this dashboard to:

  • Analyze trends and quickly discover outliers across key metrics of your Kafka clusters.

Kafka - Replication

The Kafka - Replication dashboard helps you understand the state of replicas in your Kafka clusters.

Use this dashboard to monitor the following key metrics:

  • In-Sync Replicas (ISR) Expand Rate - The ISR Expand Rate metric displays the one-minute rate of increases in the number of In-Sync Replicas (ISR). ISR expansions occur when a broker comes online, such as when recovering from a failure or adding a new node. This increases the number of in-sync replicas available for each partition on that broker. The expected value for this rate is normally zero.
  • In-Sync Replicas (ISR) Shrink Rate - The ISR Shrink Rate metric displays the one-minute rate of decreases in the number of In-Sync Replicas (ISR). ISR shrinks occur when an in-sync broker goes down, as it decreases the number of in-sync replicas available for each partition replica on that broker. The expected value for this rate is normally zero.
    • ISR Shrink Vs Expand Rate - If you see a spike in ISR Shrink followed by ISR Expand Rate, this may be because of nodes that have fallen behind replication; they may have either recovered or be in the process of recovering now.
    • Failed ISR Updates
    • Under Replicated Partitions Count
    • Under Min ISR Partitions Count - The Under Min ISR Partitions metric displays the number of partitions where the number of In-Sync Replicas (ISR) is less than the minimum number of in-sync replicas specified. The two most common causes of under-min ISR partitions are that one or more brokers are unresponsive, or the cluster is experiencing performance issues and one or more brokers are falling behind. The expected value for this rate is normally zero.

Kafka - Zookeeper

The Kafka - Zookeeper dashboard provides an at-a-glance view of the state of your partitions, active controllers, leaders, throughput, and network across Kafka brokers and clusters.

Use this dashboard to monitor key Zookeeper metrics such as:

  • Zookeeper disconnect rate - This metric indicates if a Zookeeper node has lost its connection to a Kafka broker.
  • Authentication Failures - This metric indicates a Kafka Broker is unable to connect to its Zookeeper node.
  • Session Expiration - When a Kafka broker - Zookeeper node session expires, leader changes can occur and the broker can be assigned a new controller. If this metric is increasing we recommend you:
    1. Check the health of your network.
    2. Check for garbage collection issues and tune your JVMs accordingly.
  • Connection Rate.

Kafka - Broker

The Kafka - Broker dashboard provides an at-a-glance view of the state of your partitions, active controllers, leaders, throughput, and network across Kafka brokers and clusters.

Use this dashboard to:

  • Monitor under-replicated and offline partitions to quickly identify if a Kafka broker is down or overutilized.
  • Monitor Unclean Leader Election count metrics - this metric shows the number of failures to elect a suitable leader per second. Unclean leader elections are caused when there are no available in-sync replicas for a partition (either due to network issues, lag causing the broker to fall behind, or brokers going down completely), so an out-of-sync replica is the only option for the leader. When an out-of-sync replica is elected leader, all data not replicated from the previous leader is lost forever.
  • Monitor producer and fetch request rates.
  • Monitor the log flush rate to determine the rate at which log data is written to disk.

Kafka - Failures and Delayed Operations

The Kafka - Failures and Delayed Operations dashboard gives you insight into all failures and delayed operations associated with your Kafka clusters.

Use this dashboard to:

  • Analyze failed produce requests - A failed produce request occurs when a problem is encountered when processing a produce request. This could be for a variety of reasons, however some common reasons are:
    • The destination topic doesn’t exist (if auto-create is enabled then subsequent messages should be sent successfully).
    • The message is too large.
    • The producer is using request.required.acks=all or -1, and fewer than the required number of acknowledgements are received.
  • Analyze failed Fetch Request - A failed fetch request occurs when a problem is encountered when processing a fetch request. This could be for a variety of reasons, but the most common cause is consumer requests timing out.
  • Monitor delayed Operations metrics - This contains metrics regarding the number of requests that are delayed and waiting in purgatory. The purgatory size metric can be used to determine the root cause of latency. For example, increased consumer fetch times could be explained by an increased number of fetch requests waiting in purgatory. Available metrics are:
    • Fetch Purgatory Size - The Fetch Purgatory Size metric shows the number of fetch requests currently waiting in purgatory. Fetch requests are added to purgatory if there is not enough data to fulfil the request (determined by fetch.min.bytes in the consumer configuration) and the requests wait in purgatory until the time specified by fetch.wait.max.ms is reached, or enough data becomes available.
    • Produce Purgatory Size - The Produce Purgatory Size metric shows the number of produce requests currently waiting in purgatory. Produce requests are added to purgatory if request.required.acks is set to -1 or all, and the requests wait in purgatory until the partition leader receives an acknowledgement from all its followers. If the purgatory size metric keeps growing, some partition replicas may be overloaded. If this is the case, you can choose to increase the capacity of your cluster, or decrease the amount of produce requests being generated.

Kafka - Request-Response Times

The Kafka - Request-Response Times dashboard helps you get insight into key request and response latencies of your Kafka cluster.

Use this dashboard to:

  • Monitor request time metrics - The Request Metrics metric group contains information regarding different types of request to and from the cluster. Important request metrics to monitor:
    1. Fetch Consumer Request Total Time - The Fetch Consumer Request Total Time metric shows the maximum and mean amount of time taken for processing, and the number of requests from consumers to get new data. Reasons for increased time taken could be: increased load on the node (creating processing delays), or perhaps requests are being held in purgatory for a long time (determined by fetch.min.bytes and fetch.wait.max.ms metrics).
    2. Fetch Follower Request Total Time - The Fetch Follower Request Total Time metric displays the maximum and mean amount of time taken while processing, and the number of requests to get new data from Kafka brokers that are followers of a partition. Common causes of increased time taken are increased load on the node causing delays in processing requests, or that some partition replicas may be overloaded or temporarily unavailable.
    3. Produce Request Total Time - The Produce Request Total Time metric displays the maximum and mean amount of time taken for processing, and the number of requests from producers to send data. Some reasons for increased time taken could be: increased load on the node causing delays in processing the requests, or perhaps requests are being held in purgatory for a long time (if the request.required.acks metric is equal to '1' or all).

Kafka - Logs

This dashboard helps you quickly analyze your Kafka error logs across all clusters.

Use this dashboard to:

  • Identify critical events in your Kafka broker and controller logs.
  • Examine trends to detect spikes in Error or Fatal events.
  • Monitor Broker added/started and shutdown events in your cluster.
  • Quickly determine patterns across all logs in a given Kafka cluster.

Kafka Broker - Performance Overview

The Kafka Broker - Performance Overview dashboard helps you get an at-a-glance view of the performance and resource utilization of your Kafka brokers and their JVMs.

Use this dashboard to:

  • Monitor the number of open file descriptors. If the number of open file descriptors reaches the maximum file descriptor count, it can cause an IOException error.
  • Get insight into Garbage collection and its impact on CPU usage and memory
  • Examine how threads are distributed
  • Understand the behavior of class count. If class count keeps on increasing, you may have a problem with the same classes loaded by multiple classloaders.

Kafka Broker - CPU

The Kafka Broker - CPU dashboard shows information about the CPU utilization of individual Broker machines.

Use this dashboard to:

  • Get insights into the process and user CPU load of Kafka brokers. High CPU utilization can make Kafka flaky and can cause read/write timeouts.

Kafka Broker - Memory

The Kafka Broker - Memory dashboard shows the percentage of heap and non-heap memory used, and the physical and swap memory usage, of your Kafka broker’s JVM.

Use this dashboard to:

  • Understand how memory is used across Heap and Non-Heap memory.
  • Examine physical and swap memory usage and make resource adjustments as needed.
  • Examine the pending object finalization count, which, when high, can lead to excessive memory usage.

Kafka Broker - Disk Usage

The Kafka Broker - Disk Usage dashboard helps monitor disk usage across your Kafka Brokers.

Use this dashboard to:

  • Monitor Disk Usage percentage on Kafka Brokers. This is critical as Kafka brokers use disk space to store messages for each topic. Other factors that affect disk utilization are:
    1. Topic replication factor of Kafka topics.
    2. Log retention settings.
  • Analyze trends in disk throughput and find any spikes. This is especially important as disk throughput can be a performance bottleneck.
  • Monitor inode bytes used, and disk reads vs. writes. These metrics are important to monitor because Kafka may not necessarily distribute data from a heavily occupied disk, which itself can bring Kafka down.

Kafka Broker - Garbage Collection

The Kafka Broker - Garbage Collection dashboard shows key Garbage Collector statistics like the duration of the last GC run, objects collected, threads used, and memory cleared in the last GC run of your Java virtual machine.

Use this dashboard to:

  • Understand the amount of time spent in garbage collection. If this time keeps increasing, your Kafka brokers may see higher CPU usage.
  • Understand the amount of memory cleared by garbage collectors across memory pools and their impact on the Heap memory.

Kafka Broker - Threads

The Kafka Broker - Threads dashboard shows key insights into the usage and type of threads created in your Kafka broker JVM.

Use this dashboard to:

  • Understand the dynamic behavior of the system using peak, daemon, and current threads.
  • Gain insights into the memory and CPU time of the last executed thread.

Kafka Broker - Class Loading and Compilation

The Kafka Broker - Class Loading and Compilation dashboard helps you get insights into the behavior of class count trends.

Use this dashboard to:

  • Determine whether the class count keeps increasing, which indicates that the same classes are loaded by multiple classloaders.
  • Get insights into time spent by Java Virtual machines during compilation.

Kafka - Topic Overview

The Kafka - Topic Overview dashboard helps you quickly identify under-replicated partitions, and incoming bytes by Kafka topic, server and cluster.

Use this dashboard to:

  • Monitor under replicated partitions - The Under Replicated Partitions metric displays the number of partitions that do not have enough replicas to meet the desired replication factor. A partition will also be considered under-replicated if the correct number of replicas exist, but one or more of the replicas have fallen significantly behind the partition leader. The two most common causes of under-replicated partitions are that one or more brokers are unresponsive, or the cluster is experiencing performance issues and one or more brokers have fallen behind.

    This metric is tagged with cluster, server, and topic info for easy troubleshooting. The colors in the Honeycomb chart are coded as follows:

  1. Green indicates there are no under Replicated Partitions.
  2. Red indicates a given partition is under replicated.

Kafka - Topic Details

The Kafka - Topic Details dashboard gives you insight into throughput, partition sizes and offsets across Kafka brokers, topics and clusters.

Use this dashboard to:

  • Monitor metrics like log partition size, log start offset, and log segment count.
  • Identify offline/under-replicated partition counts. Partitions can be in this state on account of resource shortages or broker unavailability.
  • Monitor the In Sync replica (ISR) Shrink rate. ISR shrinks occur when an in-sync broker goes down, as it decreases the number of in-sync replicas available for each partition replica on that broker.
  • Monitor In Sync replica (ISR) Expand rate. ISR expansions occur when a broker comes online, such as when recovering from a failure or adding a new node. This increases the number of in-sync replicas available for each partition on that broker.

Kafka Alerts

| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition |
| --- | --- | --- | --- |
| Kafka - High CPU on Broker node | This alert fires when we detect that the average CPU utilization for a broker node is high (>=85%) for an interval of 5 minutes. | | |
| Kafka - High Broker Disk Utilization | This alert fires when we detect that a disk on a broker node is more than 85% full. | >=85 | <85 |
| Kafka - Garbage collection | This alert fires when we detect that the average Garbage Collection time on a given Kafka broker node over a 5 minute interval is more than one second. | >=1 | <1 |
| Kafka - High Broker Memory Utilization | This alert fires when the average memory utilization within a 5 minute interval for a given Kafka node is high (>=85%). | >=85 | <85 |
| Kafka - Large number of broker errors | This alert fires when we detect that there are 5 or more errors on a Broker node within a time interval of 5 minutes. | | |
| Kafka - Large number of broker warnings | This alert fires when we detect that there are 5 or more warnings on a Broker node within a time interval of 5 minutes. | | |
| Kafka - Out of Sync Followers | | | |
| Kafka - Unavailable Replicas | This alert fires when we detect that there are replicas that are unavailable. | | |
| Kafka - Consumer Lag | This alert fires when we detect that a Kafka consumer has a 30-minute and increasing lag. | | |
| Kafka - Fatal Event on Broker | This alert fires when we detect a fatal operation on a Kafka broker node. | >=1 | <1 |
| Kafka - Multiple Errors on Broker | This alert fires when we detect five or more errors on a Kafka broker node in a 5 minute interval. | >=5 | <5 |
| Kafka - Underreplicated Partitions | This alert fires when we detect underreplicated partitions on a given Kafka broker. | | |
| Kafka - Offline Partitions | This alert fires when we detect offline partitions on a given Kafka broker. | | |
| Kafka - High Leader election rate | This alert fires when we detect a high leader election rate. | | |
| Kafka - Failed Zookeeper connections | This alert fires when we detect Broker to Zookeeper connection failures. | | |
| Kafka - Replica Lag | This alert fires when we detect that a Kafka replica has a lag of over 30 minutes. | | |
| Kafka - Lower Producer-Consumer buffer time | This alert fires when we detect that there is only one hour of time remaining between the earliest offset and the consumer position. | | |

Kafka Metrics

Here's a list of available Kafka metrics.

kafka_broker_disk_free
kafka_broker_disk_inodes_total
kafka_broker_disk_inodes_used
kafka_broker_disk_total
kafka_broker_disk_used_percent
kafka_broker_diskio_io_time
kafka_broker_diskio_iops_in_progress
kafka_broker_diskio_merged_reads
kafka_broker_diskio_merged_writes
kafka_broker_diskio_read_bytes
kafka_broker_diskio_read_time
kafka_broker_diskio_reads
kafka_broker_diskio_weighted_io_time
kafka_broker_diskio_write_bytes
kafka_broker_diskio_write_time
kafka_broker_diskio_writes
kafka_controller_ActiveControllerCount_Value
kafka_controller_AutoLeaderBalanceRateAndTimeMs_50thPercentile
kafka_controller_AutoLeaderBalanceRateAndTimeMs_75thPercentile
kafka_controller_AutoLeaderBalanceRateAndTimeMs_98thPercentile
kafka_controller_AutoLeaderBalanceRateAndTimeMs_99thPercentile
kafka_controller_AutoLeaderBalanceRateAndTimeMs_Count
kafka_controller_AutoLeaderBalanceRateAndTimeMs_FifteenMinuteRate
kafka_controller_AutoLeaderBalanceRateAndTimeMs_Max
kafka_controller_AutoLeaderBalanceRateAndTimeMs_Mean
kafka_controller_AutoLeaderBalanceRateAndTimeMs_Min
kafka_controller_AutoLeaderBalanceRateAndTimeMs_StdDev
kafka_controller_ControlledShutdownRateAndTimeMs_99thPercentile
kafka_controller_ControlledShutdownRateAndTimeMs_FiveMinuteRate
kafka_controller_ControlledShutdownRateAndTimeMs_Min
kafka_controller_ControllerChangeRateAndTimeMs_50thPercentile
kafka_controller_ControllerChangeRateAndTimeMs_75thPercentile
kafka_controller_ControllerChangeRateAndTimeMs_98thPercentile
kafka_controller_ControllerChangeRateAndTimeMs_99thPercentile
kafka_controller_ControllerChangeRateAndTimeMs_Max
kafka_controller_ControllerChangeRateAndTimeMs_MeanRate
kafka_controller_ControllerChangeRateAndTimeMs_StdDev
kafka_controller_ControllerShutdownRateAndTimeMs_50thPercentile
kafka_controller_ControllerShutdownRateAndTimeMs_75thPercentile
kafka_controller_ControllerShutdownRateAndTimeMs_99thPercentile
kafka_controller_ControllerShutdownRateAndTimeMs_Count
kafka_controller_ControllerShutdownRateAndTimeMs_FifteenMinuteRate
kafka_controller_ControllerShutdownRateAndTimeMs_Min
kafka_controller_ControllerShutdownRateAndTimeMs_StdDev
kafka_controller_EventQueueSize_Value
kafka_controller_EventQueueTimeMs_95thPercentile
kafka_controller_EventQueueTimeMs_98thPercentile
kafka_controller_EventQueueTimeMs_999thPercentile
kafka_controller_EventQueueTimeMs_Min
kafka_controller_GlobalPartitionCount_Value
kafka_controller_GlobalTopicCount_Value
kafka_controller_IsrChangeRateAndTimeMs_50thPercentile
kafka_controller_IsrChangeRateAndTimeMs_75thPercentile
kafka_controller_IsrChangeRateAndTimeMs_95thPercentile
kafka_controller_IsrChangeRateAndTimeMs_98thPercentile
kafka_controller_IsrChangeRateAndTimeMs_99thPercentile
kafka_controller_IsrChangeRateAndTimeMs_Count
kafka_controller_IsrChangeRateAndTimeMs_FifteenMinuteRate
kafka_controller_IsrChangeRateAndTimeMs_FiveMinuteRate
kafka_controller_LeaderAndIsrResponseReceivedRateAndTimeMs_75thPercentile
kafka_controller_LeaderAndIsrResponseReceivedRateAndTimeMs_95thPercentile
kafka_controller_LeaderAndIsrResponseReceivedRateAndTimeMs_FiveMinuteRate
kafka_controller_LeaderAndIsrResponseReceivedRateAndTimeMs_MeanRate
kafka_controller_LeaderAndIsrResponseReceivedRateAndTimeMs_Min
kafka_controller_LeaderAndIsrResponseReceivedRateAndTimeMs_OneMinuteRate
kafka_controller_LeaderElectionRateAndTimeMs_95thPercentile
kafka_controller_LeaderElectionRateAndTimeMs_999thPercentile
kafka_controller_LeaderElectionRateAndTimeMs_FifteenMinuteRate
kafka_controller_LeaderElectionRateAndTimeMs_Max
kafka_controller_LeaderElectionRateAndTimeMs_Min
kafka_controller_ListPartitionReassignmentRateAndTimeMs_50thPercentile
kafka_controller_ListPartitionReassignmentRateAndTimeMs_95thPercentile
kafka_controller_ListPartitionReassignmentRateAndTimeMs_999thPercentile
kafka_controller_ListPartitionReassignmentRateAndTimeMs_Mean
kafka_controller_ListPartitionReassignmentRateAndTimeMs_Min
kafka_controller_ListPartitionReassignmentRateAndTimeMs_OneMinuteRate
kafka_controller_LogDirChangeRateAndTimeMs_75thPercentile
kafka_controller_LogDirChangeRateAndTimeMs_999thPercentile
kafka_controller_LogDirChangeRateAndTimeMs_99thPercentile
kafka_controller_LogDirChangeRateAndTimeMs_Count
kafka_controller_LogDirChangeRateAndTimeMs_FifteenMinuteRate
kafka_controller_ManualLeaderBalanceRateAndTimeMs_50thPercentile
kafka_controller_ManualLeaderBalanceRateAndTimeMs_75thPercentile
kafka_controller_ManualLeaderBalanceRateAndTimeMs_98thPercentile
kafka_controller_ManualLeaderBalanceRateAndTimeMs_999thPercentile
kafka_controller_ManualLeaderBalanceRateAndTimeMs_FiveMinuteRate
kafka_controller_ManualLeaderBalanceRateAndTimeMs_Mean
kafka_controller_ManualLeaderBalanceRateAndTimeMs_Min
kafka_controller_ManualLeaderBalanceRateAndTimeMs_OneMinuteRate
kafka_controller_PartitionReassignmentRateAndTimeMs_50thPercentile
kafka_controller_PartitionReassignmentRateAndTimeMs_75thPercentile
kafka_controller_PartitionReassignmentRateAndTimeMs_98thPercentile
kafka_controller_PartitionReassignmentRateAndTimeMs_999thPercentile
kafka_controller_PartitionReassignmentRateAndTimeMs_99thPercentile
kafka_controller_PartitionReassignmentRateAndTimeMs_Count
kafka_controller_PartitionReassignmentRateAndTimeMs_FiveMinuteRate
kafka_controller_PartitionReassignmentRateAndTimeMs_Max
kafka_controller_PartitionReassignmentRateAndTimeMs_Mean
kafka_controller_PartitionReassignmentRateAndTimeMs_MeanRate
kafka_controller_PartitionReassignmentRateAndTimeMs_OneMinuteRate
kafka_controller_PreferredReplicaImbalanceCount_Value
kafka_controller_ReplicasIneligibleToDeleteCount_Value
kafka_controller_TopicChangeRateAndTimeMs_99thPercentile
kafka_controller_TopicChangeRateAndTimeMs_Count
kafka_controller_TopicChangeRateAndTimeMs_FiveMinuteRate
kafka_controller_TopicChangeRateAndTimeMs_Mean
kafka_controller_TopicChangeRateAndTimeMs_MeanRate
kafka_controller_TopicChangeRateAndTimeMs_Min
kafka_controller_TopicChangeRateAndTimeMs_StdDev
kafka_controller_TopicDeletionRateAndTimeMs_75thPercentile
kafka_controller_TopicDeletionRateAndTimeMs_95thPercentile
kafka_controller_TopicDeletionRateAndTimeMs_98thPercentile
kafka_controller_TopicDeletionRateAndTimeMs_Count
kafka_controller_TopicDeletionRateAndTimeMs_FifteenMinuteRate
kafka_controller_TopicDeletionRateAndTimeMs_Max
kafka_controller_TopicDeletionRateAndTimeMs_OneMinuteRate
kafka_controller_TopicsToDeleteCount_Value
kafka_controller_TopicUncleanLeaderElectionEnableRateAndTimeMs_98thPercentile
kafka_controller_TopicUncleanLeaderElectionEnableRateAndTimeMs_999thPercentile
kafka_controller_TopicUncleanLeaderElectionEnableRateAndTimeMs_Count
kafka_controller_TopicUncleanLeaderElectionEnableRateAndTimeMs_FifteenMinuteRate
kafka_controller_TotalQueueSize_Value
kafka_controller_UncleanLeaderElectionEnableRateAndTimeMs_50thPercentile
kafka_controller_UncleanLeaderElectionEnableRateAndTimeMs_75thPercentile
kafka_controller_UncleanLeaderElectionEnableRateAndTimeMs_95thPercentile
kafka_controller_UncleanLeaderElectionEnableRateAndTimeMs_98thPercentile
kafka_controller_UncleanLeaderElectionEnableRateAndTimeMs_Count
kafka_controller_UncleanLeaderElectionEnableRateAndTimeMs_FifteenMinuteRate
kafka_controller_UncleanLeaderElectionEnableRateAndTimeMs_FiveMinuteRate
kafka_controller_UncleanLeaderElectionEnableRateAndTimeMs_MeanRate
kafka_controller_UncleanLeaderElectionEnableRateAndTimeMs_Min
kafka_controller_UncleanLeaderElectionsPerSec_FifteenMinuteRate
kafka_controller_UpdateFeaturesRateAndTimeMs_MeanRate
kafka_controller_UpdateFeaturesRateAndTimeMs_StdDev
kafka_java_lang_GarbageCollector_CollectionCount
kafka_java_lang_GarbageCollector_CollectionTime
kafka_java_lang_GarbageCollector_LastGcInfo_endTime
kafka_java_lang_GarbageCollector_LastGcInfo_GcThreadCount
kafka_java_lang_GarbageCollector_LastGcInfo_id
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_Code_Cache_max
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_Code_Cache_used
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_CodeHeap__non_nmethods__init
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_CodeHeap__non_profiled_nmethods__used
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_Compressed_Class_Space_init
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_G1_Eden_Space_committed
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_G1_Eden_Space_init
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_G1_Eden_Space_max
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_G1_Old_Gen_committed
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_G1_Old_Gen_used
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_G1_Survivor_Space_init
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_G1_Survivor_Space_used
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_Code_Cache_init
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_Code_Cache_max
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_CodeHeap__non_nmethods__committed
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_CodeHeap__profiled_nmethods__used
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_Compressed_Class_Space_used
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_G1_Eden_Space_committed
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_G1_Eden_Space_init
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_G1_Eden_Space_max
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_G1_Old_Gen_committed
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_G1_Old_Gen_init
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_G1_Old_Gen_used
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_G1_Survivor_Space_max
kafka_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_Metaspace_used
kafka_java_lang_GarbageCollector_LastGcInfo_startTime
kafka_java_lang_Memory_HeapMemoryUsage_committed
kafka_java_lang_Memory_HeapMemoryUsage_init
kafka_java_lang_Memory_HeapMemoryUsage_used
kafka_java_lang_MemoryPool_CollectionUsage_committed
kafka_java_lang_MemoryPool_CollectionUsage_init
kafka_java_lang_MemoryPool_CollectionUsage_max
kafka_java_lang_MemoryPool_CollectionUsage_used
kafka_java_lang_MemoryPool_CollectionUsageThresholdSupported
kafka_java_lang_MemoryPool_PeakUsage_committed
kafka_java_lang_MemoryPool_PeakUsage_init
kafka_java_lang_MemoryPool_PeakUsage_max
kafka_java_lang_MemoryPool_PeakUsage_used
kafka_java_lang_MemoryPool_Usage_committed
kafka_java_lang_MemoryPool_Usage_init
kafka_java_lang_MemoryPool_Usage_max
kafka_java_lang_MemoryPool_Usage_used
kafka_java_lang_MemoryPool_UsageThresholdSupported
kafka_java_lang_OperatingSystem_CommittedVirtualMemorySize
kafka_java_lang_OperatingSystem_FreePhysicalMemorySize
kafka_java_lang_OperatingSystem_MaxFileDescriptorCount
kafka_java_lang_OperatingSystem_ProcessCpuTime
kafka_java_lang_OperatingSystem_TotalSwapSpaceSize
kafka_java_lang_Runtime_BootClassPathSupported
kafka_java_lang_Threading_CurrentThreadCpuTime
kafka_java_lang_Threading_SynchronizerUsageSupported
kafka_java_lang_Threading_ThreadAllocatedMemoryEnabled
kafka_java_lang_Threading_ThreadAllocatedMemorySupported
kafka_java_lang_Threading_ThreadCpuTimeEnabled
kafka_network_ResponseQueueSizeValue
kafka_partition_LogEndOffset
kafka_partition_LogStartOffset
kafka_partition_NumLogSegments
kafka_partition_Size
kafka_partition_UnderReplicatedPartitions
kafka_purgatory_Heartbeat_NumDelayedOperations
kafka_purgatory_Produce_NumDelayedOperations
kafka_purgatory_Produce_PurgatorySize
kafka_purgatory_Rebalance_NumDelayedOperations
kafka_purgatory_topic_NumDelayedOperations
kafka_purgatory_topic_PurgatorySize
kafka_replica_manager_FailedIsrUpdatesPerSec_Count
kafka_replica_manager_FailedIsrUpdatesPerSec_MeanRate
kafka_replica_manager_FailedIsrUpdatesPerSec_OneMinuteRate
kafka_replica_manager_IsrExpandsPerSec_FifteenMinuteRate
kafka_replica_manager_IsrExpandsPerSec_FiveMinuteRate
kafka_replica_manager_IsrExpandsPerSec_MeanRate
kafka_replica_manager_IsrShrinksPerSec_MeanRate
kafka_replica_manager_LeaderCount_Value
kafka_replica_manager_PartitionCount_Value
kafka_replica_manager_ReassigningPartitions_Value
kafka_replica_manager_UnderMinIsrPartitionCount_Value
kafka_replica_manager_UnderReplicatedPartitions_Value
kafka_request_handlers_MeanRate
kafka_request_LocalTimeMs_50thPercentile
kafka_request_LocalTimeMs_75thPercentile
kafka_request_LocalTimeMs_95thPercentile
kafka_request_LocalTimeMs_98thPercentile
kafka_request_LocalTimeMs_999thPercentile
kafka_request_LocalTimeMs_99thPercentile
kafka_request_LocalTimeMs_Count
kafka_request_LocalTimeMs_Max
kafka_request_LocalTimeMs_Mean
kafka_request_LocalTimeMs_Min
kafka_request_LocalTimeMs_StdDev
kafka_request_MessageConversionsTimeMs_50thPercentile
kafka_request_MessageConversionsTimeMs_75thPercentile
kafka_request_MessageConversionsTimeMs_95thPercentile
kafka_request_MessageConversionsTimeMs_98thPercentile
kafka_request_MessageConversionsTimeMs_99thPercentile
kafka_request_MessageConversionsTimeMs_Count
kafka_request_MessageConversionsTimeMs_Max
kafka_request_MessageConversionsTimeMs_Min
kafka_request_RemoteTimeMs_50thPercentile
kafka_request_RemoteTimeMs_75thPercentile
kafka_request_RemoteTimeMs_95thPercentile
kafka_request_RemoteTimeMs_98thPercentile
kafka_request_RemoteTimeMs_999thPercentile
kafka_request_RemoteTimeMs_99thPercentile
kafka_request_RemoteTimeMs_Count
kafka_request_RemoteTimeMs_Max
kafka_request_RemoteTimeMs_Mean
kafka_request_RemoteTimeMs_Min
kafka_request_RemoteTimeMs_StdDev
kafka_request_RequestBytes_50thPercentile
kafka_request_RequestBytes_75thPercentile
kafka_request_RequestBytes_95thPercentile
kafka_request_RequestBytes_98thPercentile
kafka_request_RequestBytes_999thPercentile
kafka_request_RequestBytes_99thPercentile
kafka_request_RequestBytes_Count
kafka_request_RequestBytes_Max
kafka_request_RequestBytes_Mean
kafka_request_RequestBytes_Min
kafka_request_RequestBytes_StdDev
kafka_request_RequestQueueTimeMs_50thPercentile
kafka_request_RequestQueueTimeMs_75thPercentile
kafka_request_RequestQueueTimeMs_95thPercentile
kafka_request_RequestQueueTimeMs_98thPercentile
kafka_request_RequestQueueTimeMs_999thPercentile
kafka_request_RequestQueueTimeMs_99thPercentile
kafka_request_RequestQueueTimeMs_Count
kafka_request_RequestQueueTimeMs_Max
kafka_request_RequestQueueTimeMs_Mean
kafka_request_RequestQueueTimeMs_Min
kafka_request_RequestQueueTimeMs_StdDev
kafka_request_ResponseQueueTimeMs_50thPercentile
kafka_request_ResponseQueueTimeMs_75thPercentile
kafka_request_ResponseQueueTimeMs_95thPercentile
kafka_request_ResponseQueueTimeMs_98thPercentile
kafka_request_ResponseQueueTimeMs_999thPercentile
kafka_request_ResponseQueueTimeMs_99thPercentile
kafka_request_ResponseQueueTimeMs_Count
kafka_request_ResponseQueueTimeMs_Max
kafka_request_ResponseQueueTimeMs_Mean
kafka_request_ResponseQueueTimeMs_Min
kafka_request_ResponseQueueTimeMs_StdDev
kafka_request_ResponseSendTimeMs_50thPercentile
kafka_request_ResponseSendTimeMs_75thPercentile
kafka_request_ResponseSendTimeMs_95thPercentile
kafka_request_ResponseSendTimeMs_98thPercentile
kafka_request_ResponseSendTimeMs_999thPercentile
kafka_request_ResponseSendTimeMs_99thPercentile
kafka_request_ResponseSendTimeMs_Count
kafka_request_ResponseSendTimeMs_Max
kafka_request_ResponseSendTimeMs_Mean
kafka_request_ResponseSendTimeMs_Min
kafka_request_ResponseSendTimeMs_StdDev
kafka_request_TemporaryMemoryBytes_75thPercentile
kafka_request_TemporaryMemoryBytes_98thPercentile
kafka_request_TemporaryMemoryBytes_999thPercentile
kafka_request_TemporaryMemoryBytes_99thPercentile
kafka_request_TemporaryMemoryBytes_Max
kafka_request_TemporaryMemoryBytes_Mean
kafka_request_TemporaryMemoryBytes_Min
kafka_request_TemporaryMemoryBytes_StdDev
kafka_request_ThrottleTimeMs_50thPercentile
kafka_request_ThrottleTimeMs_75thPercentile
kafka_request_ThrottleTimeMs_95thPercentile
kafka_request_ThrottleTimeMs_98thPercentile
kafka_request_ThrottleTimeMs_999thPercentile
kafka_request_ThrottleTimeMs_99thPercentile
kafka_request_ThrottleTimeMs_Count
kafka_request_ThrottleTimeMs_Max
kafka_request_ThrottleTimeMs_Mean
kafka_request_ThrottleTimeMs_Min
kafka_request_ThrottleTimeMs_StdDev
kafka_request_TotalTimeMs_50thPercentile
kafka_request_TotalTimeMs_75thPercentile
kafka_request_TotalTimeMs_95thPercentile
kafka_request_TotalTimeMs_98thPercentile
kafka_request_TotalTimeMs_999thPercentile
kafka_request_TotalTimeMs_99thPercentile
kafka_request_TotalTimeMs_Count
kafka_request_TotalTimeMs_Max
kafka_request_TotalTimeMs_Mean
kafka_request_TotalTimeMs_Min
kafka_request_TotalTimeMs_StdDev
kafka_topic_BytesInPerSec_Count
kafka_topic_BytesInPerSec_FiveMinuteRate
kafka_topic_BytesInPerSec_MeanRate
kafka_topic_BytesInPerSec_OneMinuteRate
kafka_topic_BytesOutPerSec_FiveMinuteRate
kafka_topic_BytesOutPerSec_MeanRate
kafka_topic_MessagesInPerSec_Count
kafka_topic_TotalFetchRequestsPerSec_FifteenMinuteRate
kafka_topic_TotalFetchRequestsPerSec_FiveMinuteRate
kafka_topic_TotalFetchRequestsPerSec_MeanRate
kafka_topic_TotalFetchRequestsPerSec_OneMinuteRate
kafka_topic_TotalProduceRequestsPerSec_Count
kafka_topic_TotalProduceRequestsPerSec_FifteenMinuteRate
kafka_topic_TotalProduceRequestsPerSec_FiveMinuteRate
kafka_topic_TotalProduceRequestsPerSec_MeanRate
kafka_topics_BytesInPerSec_Count
kafka_topics_BytesInPerSec_FifteenMinuteRate
kafka_topics_BytesInPerSec_MeanRate
kafka_topics_BytesInPerSec_OneMinuteRate
kafka_topics_BytesOutPerSec_MeanRate
kafka_topics_BytesOutPerSec_OneMinuteRate
kafka_topics_BytesRejectedPerSec_Count
kafka_topics_BytesRejectedPerSec_FiveMinuteRate
kafka_topics_BytesRejectedPerSec_MeanRate
kafka_topics_FailedFetchRequestsPerSec_MeanRate
kafka_topics_FailedProduceRequestsPerSec_FifteenMinuteRate
kafka_topics_FailedProduceRequestsPerSec_FiveMinuteRate
kafka_topics_FailedProduceRequestsPerSec_MeanRate
kafka_topics_FailedProduceRequestsPerSec_OneMinuteRate
kafka_topics_InvalidMagicNumberRecordsPerSec_FifteenMinuteRate
kafka_topics_InvalidMagicNumberRecordsPerSec_FiveMinuteRate
kafka_topics_InvalidMagicNumberRecordsPerSec_MeanRate
kafka_topics_InvalidMessageCrcRecordsPerSec_FifteenMinuteRate
kafka_topics_InvalidOffsetOrSequenceRecordsPerSec_FiveMinuteRate
kafka_topics_InvalidOffsetOrSequenceRecordsPerSec_MeanRate
kafka_topics_InvalidOffsetOrSequenceRecordsPerSec_OneMinuteRate
kafka_topics_MessagesInPerSec_Count
kafka_topics_MessagesInPerSec_FifteenMinuteRate
kafka_topics_MessagesInPerSec_FiveMinuteRate
kafka_topics_NoKeyCompactedTopicRecordsPerSec_Count
kafka_topics_NoKeyCompactedTopicRecordsPerSec_FifteenMinuteRate
kafka_topics_NoKeyCompactedTopicRecordsPerSec_FiveMinuteRate
kafka_topics_NoKeyCompactedTopicRecordsPerSec_MeanRate
kafka_topics_ProduceMessageConversionsPerSec_FifteenMinuteRate
kafka_topics_ProduceMessageConversionsPerSec_OneMinuteRate
kafka_topics_ReassignmentBytesInPerSec_Count
kafka_topics_ReassignmentBytesInPerSec_FifteenMinuteRate
kafka_topics_ReassignmentBytesInPerSec_FiveMinuteRate
kafka_topics_ReassignmentBytesInPerSec_MeanRate
kafka_topics_ReassignmentBytesInPerSec_OneMinuteRate
kafka_topics_ReassignmentBytesOutPerSec_Count
kafka_topics_ReassignmentBytesOutPerSec_FifteenMinuteRate
kafka_topics_ReassignmentBytesOutPerSec_MeanRate
kafka_topics_ReassignmentBytesOutPerSec_OneMinuteRate
kafka_topics_ReplicationBytesInPerSec_Count
kafka_topics_ReplicationBytesInPerSec_MeanRate
kafka_topics_ReplicationBytesOutPerSec_Count
kafka_topics_ReplicationBytesOutPerSec_FiveMinuteRate
kafka_topics_ReplicationBytesOutPerSec_MeanRate
kafka_topics_ReplicationBytesOutPerSec_OneMinuteRate
kafka_topics_TotalFetchRequestsPerSec_Count
kafka_topics_TotalFetchRequestsPerSec_FifteenMinuteRate
kafka_topics_TotalFetchRequestsPerSec_FiveMinuteRate
kafka_topics_TotalFetchRequestsPerSec_MeanRate
kafka_topics_TotalProduceRequestsPerSec_FiveMinuteRate
kafka_topics_TotalProduceRequestsPerSec_MeanRate
kafka_topics_TotalProduceRequestsPerSec_OneMinuteRate
kafka_zookeeper_auth_failures_FifteenMinuteRate
kafka_zookeeper_auth_failures_FiveMinuteRate
kafka_zookeeper_authentications_Count
kafka_zookeeper_authentications_OneMinuteRate
kafka_zookeeper_disconnects_FiveMinuteRate
kafka_zookeeper_expires_FifteenMinuteRate
kafka_zookeeper_expires_FiveMinuteRate
kafka_zookeeper_expires_MeanRate
kafka_zookeeper_expires_OneMinuteRate
kafka_zookeeper_readonly_connects_FifteenMinuteRate
kafka_zookeeper_readonly_connects_MeanRate
kafka_zookeeper_sync_connects_FifteenMinuteRate
kafka_zookeeper_sync_connects_MeanRate
kafka_zookeeper_sync_connects_OneMinuteRate
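
Any of these metrics can be charted with a Sumo Logic metrics query. Here's a hedged sketch (the tag names assume the collection setup described earlier in this guide; adjust the metric and aggregation to your needs):

  messaging_system=kafka metric=kafka_broker_disk_used_percent | avg by messaging_cluster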