Host Metrics Sumo Logic App
The Host Metrics app allows you to monitor the performance and resource utilization of hosts and processes that your mission-critical applications depend on. Preconfigured dashboards provide insight into CPU, memory, network, file descriptors, page faults, and TCP connections. This app uses the Sumo Logic Installed Collector to collect host metrics data.
Collecting Metrics for the Host Metrics App
This procedure explains how to collect metrics from a host machine and ingest them into Sumo Logic for metrics visualization.
Configure a Collector
Configure an Installed Collector. Collectors can be installed on Linux, Windows, or Mac OS hosts.
Configure a Source
- Configure a Host Metrics Source. Choose Add Source and select Host Metrics as the source type.
- Configure the Source Fields as follows:
- Name. Required. Description is optional. The source name is stored in a searchable field called _sourceName.
- Source Host. Enter the host name of the machine from which the metrics will be collected.
- Source Category. Required. The Source Category metadata field is a fundamental building block to organize and label Sources. For details, see Best Practices.
- Scan Interval. Select how frequently the Source scans for host metrics data. The default is 1 minute. A shorter interval increases the message volume and could cause your deployment to incur additional charges.
- Metrics. Select the check boxes for the metrics to collect. By default, all CPU and memory metrics are collected. Select a category's top-level check box to select all metrics in that category; a blue checkmark icon indicates that the category is selected. To select individual metrics, click the right-facing arrow to expand the category, then select the metrics you want. The category's check box icon changes accordingly.
- Click Save.
Metric Types
Available metrics include:
- CPU
- Memory
- TCP
- Network
- Disk
The metrics that are collected are described in Host Metrics for Installed Collectors.
Host metrics are gathered by the open-source SIGAR library.
The following tables list the available host metrics.
CPU Metrics
| Metric | Units | Description |
|---|---|---|
| CPU_User | % | Total system CPU user time |
| CPU_Sys | % | Total system CPU kernel time |
| CPU_Nice | % | Total system CPU nice time |
| CPU_Idle | % | Total system CPU idle time |
| CPU_IOWait | % | Total system CPU IO wait time |
| CPU_Irq | % | Total system CPU time servicing interrupts |
| CPU_SoftIrq | % | Total system CPU time servicing softirqs |
| CPU_Stolen | % | Total system CPU involuntary wait time |
| CPU_LoadAvg_1min* | Average | System load average for the past 1 minute |
| CPU_LoadAvg_5min* | Average | System load average for the past 5 minutes |
| CPU_LoadAvg_15min* | Average | System load average for the past 15 minutes |
| CPU_Total | % | Total system CPU usage time |

* Load averages are not available on the Windows platform.
Memory Metrics
| Metric | Units | Description |
|---|---|---|
| Mem_Total | Bytes | Total amount of physical RAM |
| Mem_Free | Bytes | Amount of physical RAM left unused by the system |
| Mem_Used | Bytes | Total used system memory, calculated as Mem_Total - Mem_Free. This metric includes the space allocated in buffers and in the page cache, which can make it appear that more physical RAM is consumed than is actually in use. For a better measure of RAM in use, see Mem_ActualUsed. |
| Mem_ActualFree | Bytes | Actual total free system memory, calculated as Mem_Free + Buffers + Cached, where Buffers and Cached are the memory the system has allocated to I/O buffers and the page cache, which it can reclaim when applications need it. |
| Mem_ActualUsed | Bytes | Actual total used system memory, calculated as Mem_Total - Mem_ActualFree. This metric better represents the amount of physical RAM in use than Mem_Used. |
| Mem_UsedPercent | % | Percent of total system memory in use, calculated as (Mem_Total - Mem_ActualFree) / Mem_Total × 100 |
| Mem_FreePercent | % | Percent of total system memory that is free |
| Mem_PhysicalRam | Bytes | System random access memory |
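The memory formulas can be checked with a little arithmetic. The sketch below is illustrative only: it assumes Mem_Used = Mem_Total - Mem_Free and Mem_ActualFree = Mem_Free + buffers + cached (consistent with the dashboard descriptions of ActualFree and ActualUsed), and the function name memory_metrics and the byte values are hypothetical, not output from a real collector.

```python
def memory_metrics(total: int, free: int, buffers: int, cached: int) -> dict:
    """Derive memory metrics from raw byte counts for one host.

    Assumption: buffers and cached are raw OS readings used only for
    illustration; they are not necessarily emitted as separate metrics.
    """
    used = total - free                       # Mem_Used: still counts buffers and page cache
    actual_free = free + buffers + cached     # Mem_ActualFree: reclaimable memory counts as free
    actual_used = total - actual_free         # Mem_ActualUsed
    used_percent = actual_used / total * 100  # Mem_UsedPercent
    return {
        "Mem_Used": used,
        "Mem_ActualFree": actual_free,
        "Mem_ActualUsed": actual_used,
        "Mem_UsedPercent": used_percent,
    }


if __name__ == "__main__":
    gib = 1024 ** 3
    # Hypothetical host: 16 GiB total, 2 GiB free, 1 GiB buffers, 5 GiB cached.
    for name, value in memory_metrics(16 * gib, 2 * gib, 1 * gib, 5 * gib).items():
        print(name, value)
```

With these hypothetical values, Mem_Used reports 14 GiB while Mem_ActualUsed reports 8 GiB (50%), which is why Mem_ActualUsed is the better indicator of how much RAM applications actually hold.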
TCP Metrics
| Metric | Units | Description |
|---|---|---|
| TCP_InboundTotal | Count | TCP inbound connection count |
| TCP_OutboundTotal | Count | TCP outbound connection count |
| TCP_Established | Count | TCP established connection count |
| TCP_Listen | Count | TCP listen connection count |
| TCP_Idle | Count | TCP idle connection count |
| TCP_Closing | Count | TCP closing connection count |
| TCP_CloseWait | Count | TCP close_wait connection count |
| TCP_Close | Count | TCP close connection count |
| TCP_TimeWait | Count | TCP time_wait connection count |
Networking Metrics
These have two additional dimensions:
- Interface: Name of the network interface (example: eth0)
- Description: Description of the network interface (example: Dual Band Wireless-AC 8265)

Networking metrics are cumulative, so you can use the rate operator to display these metrics as a rate per second. For example: metric=Net_InBytes Interface=eth0 | rate
| Metric | Units | Description |
|---|---|---|
| Net_InPackets | Packets | Number of received packets |
| Net_OutPackets | Packets | Number of sent packets |
| Net_InBytes | Bytes | Number of received bytes |
| Net_OutBytes | Bytes | Number of sent bytes |
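To see what a rate does with a cumulative counter such as Net_InBytes, the following Python sketch divides the change between consecutive samples by the elapsed time. It is a simplified model under stated assumptions, not Sumo Logic's implementation of the rate operator: the function name per_second_rate and the sample values are hypothetical, and skipping decreasing values (for example, after a counter reset on reboot) is a choice made for this sketch.

```python
from typing import List, Tuple


def per_second_rate(samples: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Turn (timestamp_seconds, cumulative_value) samples into per-second rates.

    Each output point is (t2, (v2 - v1) / (t2 - t1)) for consecutive samples.
    Assumption of this sketch: pairs where time does not advance or the
    counter decreases (e.g. a reset) are skipped rather than reported.
    """
    rates = []
    for (t1, v1), (t2, v2) in zip(samples, samples[1:]):
        if t2 <= t1 or v2 < v1:
            continue
        rates.append((t2, (v2 - v1) / (t2 - t1)))
    return rates


if __name__ == "__main__":
    # Hypothetical Net_InBytes readings for eth0, one per minute.
    net_in_bytes = [(0, 1_000_000), (60, 1_600_000), (120, 2_800_000)]
    print(per_second_rate(net_in_bytes))
```

For the three hypothetical one-minute samples, the sketch returns [(60, 10000.0), (120, 20000.0)], that is, 10,000 and then 20,000 bytes received per second.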
Disk Metrics
Disk metrics have two additional dimensions:
- DevName: Device name, such as the mount name (example: udev)
- DirName: Directory name, such as the mount directory (example: /dev)

Disk_Reads, Disk_Writes, Disk_ReadBytes, and Disk_WriteBytes are cumulative, so you can use the rate operator to display these metrics as a rate per second. For example: metric=Disk_WriteBytes | rate
| Metric | Units | Description |
|---|---|---|
| Disk_Reads | Operations | Number of physical disk reads |
| Disk_ReadBytes | Bytes | Number of physical disk bytes read |
| Disk_Writes | Operations | Number of physical disk writes |
| Disk_WriteBytes | Bytes | Number of physical disk bytes written |
| Disk_Queue | Operations | Number of disk queue operations |
| Disk_InodesAvailable* | Nodes | Number of free file nodes |
| Disk_Used | Bytes | Total used bytes on filesystem |
| Disk_UsedPercent | % | Percentage of filesystem space used |
| Disk_Available | Bytes | Total available bytes on filesystem |

* Disk_InodesAvailable is not available on the Windows platform.
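Because each combination of DevName and DirName is its own time series, a per-second rate only makes sense when deltas are taken within a single device; subtracting one disk's counter from another's would produce meaningless values. The following sketch groups hypothetical Disk_WriteBytes samples by those two dimensions before computing rates. The function name rates_by_device and the device names are illustrative assumptions, and this is not a description of how the rate operator is implemented internally.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (DevName, DirName)


def rates_by_device(
    points: List[Tuple[str, str, float, float]]
) -> Dict[Key, List[Tuple[float, float]]]:
    """Compute per-second rates separately for each (DevName, DirName) series.

    points: (dev_name, dir_name, timestamp_seconds, cumulative_value) tuples,
    in any order and with series interleaved.
    """
    series: Dict[Key, List[Tuple[float, float]]] = defaultdict(list)
    for dev, dirname, ts, value in points:
        series[(dev, dirname)].append((ts, value))

    result: Dict[Key, List[Tuple[float, float]]] = {}
    for key, samples in series.items():
        samples.sort()  # order each series by timestamp before taking deltas
        result[key] = [
            (t2, (v2 - v1) / (t2 - t1))
            for (t1, v1), (t2, v2) in zip(samples, samples[1:])
            if t2 > t1 and v2 >= v1  # skip non-advancing time or counter resets
        ]
    return result


if __name__ == "__main__":
    # Hypothetical Disk_WriteBytes samples for two devices, interleaved.
    points = [
        ("/dev/sda1", "/", 0, 0),
        ("/dev/sdb1", "/data", 0, 1_000),
        ("/dev/sda1", "/", 60, 6_000),
        ("/dev/sdb1", "/data", 60, 121_000),
    ]
    print(rates_by_device(points))
```

For these hypothetical points, the sda1 series writes 100.0 bytes per second and the sdb1 series 2000.0 bytes per second, even though the raw samples arrive interleaved.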
Time Intervals
The time interval determines how frequently the Source is scanned for metrics data. Sumo Logic supports pre-specified time intervals (10 seconds, 15 seconds, 30 seconds, 1 minute, and 5 minutes).
You can also specify a time interval in JSON by using the interval parameter, as follows:
"interval" : 60000
The JSON value is in milliseconds. We recommend an interval of 60 seconds (60000 ms) or longer. A shorter interval increases the message volume and could cause your deployment to incur additional charges.
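The volume warning follows from simple arithmetic: if each metric series produces one data point per scan, the number of points per day is 86,400,000 ms divided by the interval. The sketch below, with hypothetical helper names interval_ms and points_per_day, converts the pre-specified intervals to the millisecond value used by the JSON interval parameter and prints the resulting points per series per day. It assumes one data point per metric series per scan and says nothing about how charges are actually computed.

```python
import json

# Pre-specified Scan Interval choices, in seconds.
PRESET_INTERVALS_S = [10, 15, 30, 60, 300]
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms


def interval_ms(seconds: int) -> int:
    """Convert seconds to milliseconds, the unit of the JSON "interval" parameter."""
    return seconds * 1000


def points_per_day(interval_in_ms: int) -> int:
    """Data points one metric series produces per day, assuming one point per scan."""
    return MS_PER_DAY // interval_in_ms


if __name__ == "__main__":
    for seconds in PRESET_INTERVALS_S:
        ms = interval_ms(seconds)
        # For example: {"interval": 60000} -> 1440 points per series per day
        print(json.dumps({"interval": ms}), "->", points_per_day(ms), "points per series per day")
```

At 60000 ms each series yields 1,440 points per day; at 10000 ms it yields 8,640, six times as many.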
AWS Metadata
Collectors running on AWS EC2 instances can optionally collect AWS Metadata such as EC2 tags to make it easier to search for Host Metrics. For more information, see AWS Metadata Source for Metrics.
Only one AWS Metadata Source for Metrics is required to collect EC2 tags from multiple hosts.
Installing the Host Metrics App
Now that you have configured Host Metrics, install the Sumo Logic App for Host Metrics to take advantage of the preconfigured searches and dashboards to analyze your Host Metrics data.
To install the app:
Locate and install the app you need from the App Catalog. If you want to see a preview of the dashboards included with the app before installing, click Preview Dashboards.
- From the App Catalog, search for and select the app.
- Select the version of the service you're using and click Add to Library. Version selection currently applies to only a few apps. For more information, see Install the Apps from the Library.
- To install the app, complete the following fields.
- App Name. You can retain the existing name, or enter a name of your choice for the app.
- Data Source. Select either of these options for the data source.
- Choose Source Category, and select a source category from the list.
- Choose Enter a Custom Data Filter, and enter a custom source category beginning with an underscore. Example: _sourceCategory=MyCategory
- Advanced. Select the Location in Library (the default is the Personal folder in the library), or click New Folder to add a new folder.
- Click Add to Library.
Once an app is installed, it appears in your Personal folder or in the folder you specified. From there, you can share it with your organization.
Panels start to fill automatically. Each panel fills gradually with data that matches its time range query and that was received after the panel was created, so results are not available immediately; after a while, you'll see full graphs and maps.
Viewing Host Metrics Dashboards
Overview
Overall Average CPU Idle. Displays the CPU idle time averaged across all hosts in a line chart on a timeline for the last hour. You can modify the list of hosts using the provided filters.
Overall Average CPU Load (1m, 5m, 15m). Shows the 1-, 5-, and 15-minute CPU load averages, averaged across all hosts, in a line chart on a timeline for the last hour.
Total Free System Memory per Host. Provides information on the total free system memory per host in a line chart on a timeline for the last hour.
Total Used, Less Buffers and Cached Memory per Host. Displays the total memory used less buffers and cached memory per host in a line chart on a timeline for the last hour.
Disk Used Bytes per Host. Shows the disk used bytes per host in a line chart on a timeline for the last hour.
Disk Available Bytes per Host. Provides the disk available bytes per host in a line chart on a timeline for the last hour.
Network InBytes Rate per Host. Displays the rate of network InBytes per host in a line chart on a timeline for the last hour.
Network OutBytes Rate per Host. Shows the rate of network OutBytes per host in a line chart on a timeline for the last hour.
CPU
CPU User Time per Host. Displays the CPU user time per host in a line chart on a timeline for the last hour.
Overall Average CPU User Time. Shows the CPU user time averaged across all hosts in a line chart on a timeline for the last hour.
CPU System Time per Host. Provides details on CPU system time per host in a line chart on a timeline for the last hour.
Overall Average CPU System Time. Displays the CPU system time averaged across all hosts in a line chart on a timeline for the last hour.
CPU 1 min Average Load per Host. Shows the CPU 1 minute average load per host in a line chart on a timeline for the last hour.
Overall Average CPU Load (1m, 5m, 15m). Provides the 1-, 5-, and 15-minute CPU load averages, averaged across all hosts, in a line chart on a timeline for the last hour.
CPU Idle Time per Host. Displays the CPU idle time per host in a line chart on a timeline for the last hour.
Overall Average CPU Idle Time. Shows the CPU idle time averaged across all hosts in a line chart on a timeline for the last hour.
CPU IO Wait Time per Host. Displays the CPU IO wait time per host in a line chart on a timeline for the last hour.
Disk
Disk Used Bytes per Host. Displays disk used bytes per host in a line chart on a timeline for the last hour.
Disk Available Bytes per Host. Shows disk available bytes per host in a line chart on a timeline for the last hour.
Disk Read Rate per Host. Provides details on disk read rate per host in a line chart on a timeline for the last hour.
Disk Read Byte Rate per Host. Displays disk read byte rate per host in a line chart on a timeline for the last hour.
Disk Write Rate per Host. Shows disk write rate per host in a line chart on a timeline for the last hour.
Disk Write Byte Rate per Host. Provides details on disk write byte rate per host in a line chart on a timeline for the last hour.
Memory
Total Memory per Host. Displays total memory per host in a line chart on a timeline for the last hour.
Percent Memory Used per Host. Shows percent memory used per host in a line chart on a timeline for the last hour.
Total Free, Buffers, and Cached Memory per Host. Provides details on the total free, buffers, and cached memory per host (from a metric called ActualFree) in a line chart on a timeline for the last hour.
Total Used, Less Buffers, and Cached Memory per Host. Displays the total used memory, less buffers and cached memory, per host (from the Mem_ActualUsed metric) in a line chart on a timeline for the last hour.
Total Free Memory per Host. Shows the total free memory per host in a line chart on a timeline for the last hour.
Total Used System Memory per Host. Provides details on the total used system memory per host in a line chart on a timeline for the last hour.
Network
Network InPacket Rate per Host. Displays network InPacket rate per host in a line chart on a timeline for the last hour.
Network OutPacket Rate per Host. Shows network OutPacket rate per host in a line chart on a timeline for the last hour.
Network InByte Rate per Host. Provides details on network InByte rate per host in a line chart on a timeline for the last hour.
Network OutByte Rate per Host. Displays network OutByte rate per host in a line chart on a timeline for the last hour.
TCP
Inbound Connections per Host. Displays inbound connections per host in a line chart on a timeline for the last hour.
Outbound Connections per Host. Shows outbound connections per host in a line chart on a timeline for the last hour.
Listen Connections per Host. Provides details on listen connections per host in a line chart on a timeline for the last hour.
Established Connections per Host. Displays established connections per host in a line chart on a timeline for the last hour.
CloseWait Connections per Host. Shows CloseWait connections per host in a line chart on a timeline for the last hour.
TimeWait Connections per Host. Provides details on TimeWait connections per host in a line chart on a timeline for the last hour.
Filters
The supported filters are:
_sourceCategory
_sourceHost
_source
_collector