Windows Sensor Health Status Messages

note

The CSE Windows Sensor has reached end of life and is no longer supported. Please migrate to a Sumo Logic Installed Collector. For more information, see the end of life notice.

This page has information about the health events generated by versions 1.10 and higher of the CSE Windows sensor. You can view health events in the CIP interface to learn about issues that may be affecting the sensor’s effectiveness in reporting event logs and directory inventory from your network.

To view health events, go to Alerts > Health Events in the CIP UI.

Health status messages are grouped into general categories, called Tracker IDs. Each Tracker ID can be broken down into a set of Error IDs that more specifically report potential problems. Each Tracker ID and its associated Error IDs are described below.

CSE Windows Error

Invalid Configuration Detected

Cause: There is a problem with the sensor’s configuration settings that prevents the sensor from running. The health status message Description provides more information about the issue. 

Example Description: The EventLogForwarderEnable setting is set to true, but an EventLogForwarderHostName has not been specified. 

Clears when: This error clears when the configuration issue has been resolved by editing and saving the settings.conf file, and the sensor has been restarted. Allow several minutes after a successful sensor boot for the error to clear.

Troubleshooting: Fix the error described in the health status event Description. Review the sensor log files for additional details. For information about sensor configuration options, see Windows Sensor Configuration Settings.
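As an illustration of the kind of check behind the example Description above, here is a minimal sketch in Python. It assumes the settings have already been read into a dictionary; the format of settings.conf is not described on this page, and the function name is hypothetical, not part of the sensor.

```python
def find_forwarder_config_errors(settings):
    """Return a list of problems with the Event Log forwarder settings.

    `settings` is assumed to be a dict of already-parsed values. Only the
    two setting names quoted in the example Description are checked.
    """
    errors = []
    enabled = str(settings.get("EventLogForwarderEnable") or "false").strip().lower() == "true"
    host = str(settings.get("EventLogForwarderHostName") or "").strip()
    if enabled and not host:
        errors.append(
            "The EventLogForwarderEnable setting is set to true, "
            "but an EventLogForwarderHostName has not been specified."
        )
    return errors


if __name__ == "__main__":
    # Hypothetical settings that would trigger the error.
    for problem in find_forwarder_config_errors({"EventLogForwarderEnable": "true"}):
        print(problem)
```

An empty list means this particular pair of settings is consistent; it says nothing about other configuration problems the sensor may report.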

Runtime Error

Cause: A critical error occurred while the sensor was running. The health status message Description will provide more information about the issue. 

Example Description: An exception occurred in the main processing loop.

Clears when: This error clears when no additional runtime errors have occurred in the past 30 minutes.

Troubleshooting: Review the sensor log files for additional details. You may need to restart the sensor.

Runtime Warning

Source: General, Event Log, or Directory

Cause: The sensor has detected something unusual while running. The health status message Description will provide more information about the specific issue(s) that occurred. A Runtime Warning might occur under the following circumstances:

  • Many Event Log (or Directory) Upload failures have occurred. If the sensor encounters more than 10 errors while trying to upload batch files during the Status Report Interval (5 minutes, by default), a Runtime Warning is reported. The warning clears when the number of errors during the interval drops below 10.

    Example Description: Many Event Log Upload failures have occurred.  

  • The sensor has detected that an internal thread is not running, and has attempted to restart it. The warning clears after 30 minutes if no additional issues are detected during that time.

    Example Description: DirectoryUploadThread not running. Attempting to restart it.

Troubleshooting: Review the sensor log files for additional details. Make sure that the computer running the sensor service has sufficient resources, and that the network connection to the Sumo or CSE server is stable, with firewall and proxy settings properly configured, as appropriate. Confirm that Ingest Budgets have not been exceeded.
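The upload-failure rule described above (raised at more than 10 errors in an interval, cleared below 10) can be sketched as a small state function. This is only an illustration of the stated thresholds, not the sensor's actual code; the function name is hypothetical, and keeping the previous state at exactly 10 errors is an assumption, since the text does not cover that case.

```python
UPLOAD_ERROR_THRESHOLD = 10  # from the text: "more than 10 errors" per interval


def next_upload_warning_state(currently_warning, errors_in_interval,
                              threshold=UPLOAD_ERROR_THRESHOLD):
    """Return whether the Runtime Warning is active after one Status Report Interval."""
    if errors_in_interval > threshold:
        return True   # more than 10 errors: warning is reported
    if errors_in_interval < threshold:
        return False  # fewer than 10 errors: warning clears
    return currently_warning  # exactly 10: assumed to keep the previous state


if __name__ == "__main__":
    state = False
    for errors in [3, 12, 10, 4]:  # hypothetical error counts per interval
        state = next_upload_warning_state(state, errors)
        print(errors, state)
```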

Too Many Event Log Monitors

Cause: When the sensor runs in Domain Controller mode for event log monitoring, the sensor periodically checks the number of event logs that it is monitoring. This warning occurs in two situations, differentiated by the accompanying health status event Description.

  • The Sensor detects more Domain Controllers than the maximum number of recommended Domain Controllers to be monitored by a single CSE sensor instance. (25 Domain Controllers is the maximum, by default.)

    Example Description: Current number of running EventLogWatchers (27) exceeds the maximum recommended (25). Please enable WEC Mode.  

  • The Sensor detects zero Domain Controllers, despite being configured to monitor Domain Controllers.

    Example Description: Event Log Monitoring for Domain Controllers is enabled, but there were no DCs detected.

Clears when: This error clears when the number of active Domain Controllers falls within the expected range.

Troubleshooting: Check the health status event Description and sensor log files for more information. The sensor configuration may need to be adjusted based on the number of Domain Controllers on the domain. You can increase the MaxDomainControllerConnections setting if sensor performance is not considered to be an issue. Otherwise, the sensor may need to be reconfigured to either monitor a subset of the Domain Controllers on the network or to use WEC mode.
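The two situations above can be summarized in a short sketch. The message wording follows the example Descriptions on this page; the function and parameter names are hypothetical, and the sketch assumes the default maximum of 25.

```python
from typing import Optional


def check_event_log_watcher_count(running_watchers: int,
                                  max_recommended: int = 25) -> Optional[str]:
    """Return a warning Description, or None when the count is in range."""
    if running_watchers == 0:
        return ("Event Log Monitoring for Domain Controllers is enabled, "
                "but there were no DCs detected.")
    if running_watchers > max_recommended:
        return ("Current number of running EventLogWatchers ({}) exceeds the "
                "maximum recommended ({}). Please enable WEC Mode."
                .format(running_watchers, max_recommended))
    return None


if __name__ == "__main__":
    for count in (0, 12, 27):  # hypothetical Domain Controller counts
        print(count, check_event_log_watcher_count(count))
```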

CSE Windows Access Error

Invalid User Permissions

An Invalid User Permissions error can occur when the account running the sensor service does not have the required permissions, or when the sensor is unable to connect to one or more Event Logs. You can tell the difference between these error conditions based on the Health Status event Description and Source fields.

Invalid User Permissions: Service Account Permissions 

Source: BootPermissions

Cause: This error occurs when the sensor service is running under a Windows Service account rather than the Local System account, and the service account does not have sufficient permissions. When a BootPermissions error occurs, the sensor service will fail to start.

  • The Windows Service account is not in the Event Log Reader group of the domain (when the sensor is monitoring Domain Controllers). Example Description: Username<username> is not in Event Log Reader Group of domain: mydomain
  • The Windows Service account is not in the Event Log Reader group of the local computer (when the sensor is monitoring the local computer). Example Description: Username<username> is not in Event Log Reader Group of the localhost
  • The Windows Service account is not in the Performance Monitor Users group of the local computer. Example Description: Username<username> is not in Performance Monitor Users Group of the localhost

Clears when: This error clears when the permissions issues have been corrected and the sensor has successfully restarted. Allow several minutes after a successful sensor boot for the error to clear.

Troubleshooting: Check the health status event Description for more information about the specific permissions that are missing. Sensor log files may have additional details. Adjust the user permissions on the Windows Domain or on the local computer and restart the sensor service.

Invalid User Permissions: Unable to Connect to Event Log 

Source: Event Log Server Name - Channel Name

Cause: The sensor service periodically checks all specified Event Log monitors (localhost, Domain Controller, and/or WEC) to make sure that it can both connect to the server and access the specified event channel. When the test connection fails, this error is reported and the sensor ceases to monitor events for this server and channel for a few minutes before attempting to reconnect.

Example descriptions:

The chosen log Security is inaccessible due to security, server name, or server is inaccessible.  Server myserver was reported to be turned on.  Please check permissions.

Unable to connect to event log Security on myserver.  myserver may be offline.

Clears when: This error clears when the sensor service is able to successfully re-connect to the specified channel. Allow several minutes after successful connection for the error status to clear.

Troubleshooting: Check the health status event Description and sensor log files for more information. Make sure that the target server is online and that the sensor service can access it on the network. Confirm that the sensor service account has the appropriate permissions to read the Event Logs.

Unable To Write To Queue Files

Source: Event Log or Directory

Cause: The sensor writes incoming Event Log and Directory data to temporary queue files, stored in a queue folder, while the data awaits uploading. This error occurs when the sensor is unable to find or create the queue folder, or when it fails to write a file to, or delete a file from, the queue directory. When this error occurs, the sensor fails to start.

Clears when: This error clears when the sensor service is able to create and locate the queue directory, and to write and delete a file in this directory. Allow several minutes after a successful sensor restart for the error status to clear.

  
Troubleshooting: Check the health status event Description and sensor log files for more information about the specific error condition that occurred. Queue directories are located on the computer running the sensor service. The Event Log queue folder should be located at C:\ProgramData\Sumo Logic\CSE Windows Sensor\EventLogQueue, and the Directory queue folder is located at C:\ProgramData\Sumo Logic\CSE Windows Sensor\DirectoryQueue. Make sure that the service account running the sensor service has appropriate permissions to create and delete folders and files in these locations.
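To confirm the permissions described above, a check along the following lines can be run under the same account as the sensor service. It is a minimal sketch, not the sensor's own test: it creates the folder if it is missing, writes a temporary file, and deletes it again. The function name is hypothetical.

```python
import os
import tempfile
from typing import Tuple


def check_queue_folder(folder: str) -> Tuple[bool, str]:
    """Try to create the folder, then write and delete a temporary file in it."""
    try:
        os.makedirs(folder, exist_ok=True)
        fd, path = tempfile.mkstemp(prefix="queue_check_", dir=folder)
        with os.fdopen(fd, "w") as handle:
            handle.write("write test")
        os.remove(path)
    except OSError as exc:
        # Covers missing paths, permission problems, and full disks.
        return False, "{}: {}".format(type(exc).__name__, exc)
    return True, "ok"


if __name__ == "__main__":
    # Queue folder locations as documented above.
    base = r"C:\ProgramData\Sumo Logic\CSE Windows Sensor"
    for name in ("EventLogQueue", "DirectoryQueue"):
        print(name, check_queue_folder(os.path.join(base, name)))
```

Because the result depends on the account running the script, a run under an administrator account does not prove that the service account has the same access.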

CSE Windows Sensor Out of Storage

Approaching Storage Limit

Source: General, Event Log Queue or Directory Queue 

Cause: The sensor has detected that it is close to hitting configured limits for available hard disk space or queue folder size. This warning is generated when conditions are within 10% (default, configurable as DiskSpaceWarningThresholdPercentOfLimit) of the configured space limit. For more information about storage limits, see MaxDirectoryQueueFolderDirectorySize, MaxEventLogQueueFolderDirectorySize, and MinPercentDiskSpaceLeft in Windows Sensor Configuration Settings.

 
Clears when: This error clears when storage conditions have improved so that they are no longer within 10% of the space limit, or have escalated to a “Storage Limit Exceeded” error. Allow several minutes after the disk space issue has been resolved for the warning status to clear.

Troubleshooting: Check the health status event Description and Source fields for more information. Look for details about storage limits and current storage status in the health status event fields. General disk space issues may be caused by low hard drive space conditions unrelated to the sensor service, and should be resolved by clearing space on the hard drive. When queue folder sizes are approaching limits, the likely cause is an upload backlog. (See CSE Windows Excessive Backlog, below.)
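One way to read "within 10% of the configured space limit" for a queue folder is that the warning starts once the folder reaches 90% of its maximum size. The sketch below follows that interpretation, which is an assumption, as are the function name and the treatment of a full folder as "limit_exceeded" (the sensor reports that error when it starts dropping messages).

```python
def queue_storage_status(folder_bytes, max_folder_bytes, warning_percent=10.0):
    """Classify queue folder usage as 'ok', 'approaching_limit', or 'limit_exceeded'.

    warning_percent stands in for DiskSpaceWarningThresholdPercentOfLimit;
    max_folder_bytes stands in for a queue folder size limit.
    """
    if max_folder_bytes <= 0:
        raise ValueError("max_folder_bytes must be positive")
    if folder_bytes >= max_folder_bytes:
        return "limit_exceeded"
    warning_start = max_folder_bytes * (1 - warning_percent / 100.0)
    if folder_bytes >= warning_start:
        return "approaching_limit"
    return "ok"


if __name__ == "__main__":
    limit = 1000 * 1024 * 1024  # hypothetical 1000 MB limit
    for used_mb in (500, 950, 1000):
        print(used_mb, queue_storage_status(used_mb * 1024 * 1024, limit))
```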

Storage Limit Exceeded

Source: General, Event Log Queue or Directory Queue 

Cause: The sensor has detected that it has exceeded configured limits for available hard disk space or queue folder size, or that exceeding the limit is imminent. This error is generated when the sensor begins dropping queued messages to free up space. For more information about storage limits, see MaxDirectoryQueueFolderDirectorySize, MaxEventLogQueueFolderDirectorySize, and MinPercentDiskSpaceLeft in Windows Sensor Configuration Settings.

 
Clears when: This error clears when storage conditions have improved so that the sensor is no longer dropping messages. Allow several minutes after the disk space issue has been resolved for the error status to clear.

Troubleshooting: Check the health status event Description and Source fields for more information. Look for details about storage limits and current storage status in the health status event fields. General disk space issues may be caused by low hard drive space conditions unrelated to the sensor service, and should be resolved by clearing space on the hard drive. When queue folder sizes are at or over their limits, the likely cause is an upload backlog. (See CSE Windows Excessive Backlog, below.)

Errors Appending To Queue Files

Source: Directory, Event Log Server Name - Channel Name

Cause: The sensor has detected that a large number of errors have occurred while attempting to write records to queue files. For Directory records, this warning is sent whenever 20% or more of the Active Directory record appends resulted in errors during an Active Directory dump. For Event Log records, this warning is sent when more than 500 event log record appends have resulted in errors in the past hour.

 
Clears when: For Directory records, the warning is cleared whenever less than 5% of the Active Directory record appends resulted in errors. For Event Log records, the warning is cleared when fewer than 20 event log record appends have resulted in errors in the past hour.
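The raise and clear conditions above use different thresholds, so the warning holds its state between them. The following sketch illustrates that behavior with the numbers stated on this page; the function names are hypothetical, and leaving the state unchanged when no records were appended is an assumption.

```python
def next_event_log_append_warning(active, errors_past_hour):
    """Event Log records: raise above 500 errors in the past hour, clear below 20."""
    if errors_past_hour > 500:
        return True
    if errors_past_hour < 20:
        return False
    return active  # between the two thresholds the state does not change


def next_directory_append_warning(active, failed_appends, total_appends):
    """Directory records: raise at 20% or more failed appends, clear below 5%."""
    if total_appends == 0:
        return active  # assumption: nothing appended, so nothing to evaluate
    failure_percent = 100.0 * failed_appends / total_appends
    if failure_percent >= 20:
        return True
    if failure_percent < 5:
        return False
    return active


if __name__ == "__main__":
    state = False
    for errors in (100, 600, 100, 10):  # hypothetical hourly error counts
        state = next_event_log_append_warning(state, errors)
        print(errors, state)
```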

Troubleshooting: Check the health status event Description, Source, and Last Error fields for more information. Sensor log files may provide additional detail, especially with TRACE logging enabled. Make sure that the computer running the sensor service has sufficient resources, check hard drive health and disk space, and confirm that the service account running the sensor still has the appropriate permissions on the queue folder(s).

CSE Windows Parsing Error

Errors While Parsing Records

Source: Directory, Event Log Server Name - Channel Name

Cause: The sensor has detected that a large number of errors have occurred while attempting to parse incoming event log or directory entry records into JSON. For Directory records, this warning is sent whenever 20% or more of the Active Directory record parses resulted in errors during an Active Directory dump. For Event Log records, this warning is sent when more than 500 event log record parses have resulted in errors in the past hour.

 
Clears when: For Directory records, the warning is cleared whenever less than 5% of the Active Directory record parses resulted in errors. For Event Log records, the warning is cleared when fewer than 20 event log record parses have resulted in errors in the past hour.

Troubleshooting: Check the health status event Description, Source, and Last Error fields for more information. Sensor log files may provide additional detail, especially with TRACE logging enabled.

Issues that may cause parse errors include:

  • Event logs that have a format not supported by the CSE Windows Sensor parser. Events that cannot be parsed may be sent in raw XML instead of JSON.
  • Invalid Event Log handles, which can occur if the sensor loses access to the Event Log source between being notified of the event and parsing the event. This can happen if the source computer reboots, or if the event log is deleted from the source based on Event Log data retention policies.
  • An exception occurred while retrieving a directory entry.
  • An exception occurred while serializing a directory entry.
  • Other exceptions while handling incoming event or directory records. See sensor log files for more details.

CSE Windows Excessive Backlog

When the sensor service is not able to send records to the CIP or CSE servers faster than new records are being processed, a backlog may develop. These two Error IDs report symptoms that could indicate that a backlog has formed.

Too Many Files Pending Upload

Source: Event Log or Directory

Cause: The sensor has detected that a larger than expected number of files are queued, awaiting upload. 

Clears when: This error clears when the number of files in the queue directory drops below the threshold. Allow several minutes after the backlog has cleared for the warning status to clear.

Troubleshooting: Check the health status event Source field to determine which type of backlog may be occurring. The Description and Number of Files Pending fields on the health status event will provide clues to the severity of the potential backlog.

If the warning status persists over several hours or more, check the CIP UI to determine if there are larger than expected delays between event time and event receipt time, which would further confirm the possibility of an upload backlog.

On the computer running the sensor service, use Windows File Explorer to view the contents of the Event Log queue folder or the Directory Queue folder, as appropriate. Additionally, the sensor logs should be examined for errors relating to queue file upload.
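As an alternative to browsing the folder by hand, a short script can report how many files are queued and how old the oldest one is. This sketch uses file modification times, which is an assumption: the Oldest Timestamp in Queue value reported by the sensor may be derived differently. The function name is hypothetical.

```python
import os
import time
from typing import Optional, Tuple


def summarize_queue_folder(folder: str,
                           now: Optional[float] = None) -> Tuple[int, Optional[float]]:
    """Return (file_count, age_in_seconds_of_oldest_file) for a queue folder."""
    now = time.time() if now is None else now
    with os.scandir(folder) as entries:
        mtimes = [entry.stat().st_mtime for entry in entries if entry.is_file()]
    if not mtimes:
        return 0, None  # empty queue: no backlog indicated
    return len(mtimes), now - min(mtimes)


if __name__ == "__main__":
    # Event Log queue folder location as documented on this page.
    folder = r"C:\ProgramData\Sumo Logic\CSE Windows Sensor\EventLogQueue"
    if os.path.isdir(folder):
        count, oldest_age = summarize_queue_folder(folder)
        print("files pending:", count, "oldest age (s):", oldest_age)
```

A file count that keeps growing across several runs, together with an increasing oldest age, is consistent with an upload backlog.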

If a backlog is confirmed, a more thorough analysis needs to be done to determine a solution. A minor backlog may be mitigated by adjusting sensor settings like DirectoryMaxParallelUploads, EventLogMaxParallelUploads, and the thresholds mentioned above. For more severe, recurring backlogs, some factors to consider are the resources of the computer running the sensor, including processing power and network bandwidth, the rate of incoming records, the number of Domain Controllers being monitored per sensor, and so on.

Oldest Record Timestamp In Queue Exceeds Threshold

Source: Event Log or Directory

Cause: The sensor has detected at least one file in a queue folder with a timestamp older than the specified threshold. 

Clears when: This error clears when the sensor detects no files in a queue directory that are older than the specified threshold. Allow several minutes after the backlog has cleared for the error status to clear.

Troubleshooting: Check the health status event Source field to determine which type of backlog may be occurring. The Description and Oldest Timestamp in Queue fields on the health status event will provide clues to the severity of the potential backlog.

If the warning status persists over several hours or more, use the CIP UI to determine if there are larger than expected delays between event time and event receipt time, which would further confirm the possibility of an upload backlog.

On the computer running the sensor service, use Windows File Explorer to view the contents of the Event Log queue folder or the Directory Queue folder, as appropriate.

If a backlog is confirmed, you’ll need to perform a more thorough analysis to determine a solution. A minor backlog may be mitigated by adjusting sensor settings like DirectoryMaxParallelUploads, EventLogMaxParallelUploads, and the thresholds mentioned above. For more severe, recurring backlogs, some factors to consider are the resources of the computer running the sensor, including processing power and network bandwidth, the rate of incoming records, the number of Domain Controllers being monitored per sensor, and so on.

note

Occasionally, this warning can be triggered by “orphaned” queue files. If the Oldest Timestamp field is more than several hours old, but there is no unusual delta between event time and event receipt time, a single orphaned queue file may be the culprit rather than an upload backlog. You can resolve orphaned queue files by restarting the sensor service, which will force upload of the orphaned file, or by manually deleting the old file from the queue folder.