Check_MK Monitoring Part V: Event Console and SNMP Traps

This is part 5 of the Check_MK Monitoring Guide, in which we’ll discuss how to configure the system to receive SNMP traps. The advantage of SNMP is that it’s vendor neutral: you can send traps from a wide array of systems, such as hardware appliances, your own deployment shell scripts, Java applications, and so on. Check_MK will process these essentially right away.

This asynchronous alert model is different from everything we’ve looked at so far, where checks were performed by polling agents on a fixed interval.

So far we’ve gone over using Check_MK to do active and passive monitoring.  In both, the Check_MK monitoring server has to connect to the hosts/agents being monitored and retrieve the output of all the checks, even custom checks.  In this way it’s polling the agent and then processing the output.

Using SNMP traps for your monitoring is actually the other way around: rather than wait for the Check_MK monitoring server to connect to your agent at a regular interval, the monitored system sends the message itself. Perhaps you have a batch process, or just an important event, that you’d like the monitoring server to process right away, asynchronously.

First, a word about the Event Console.

Event Console

The Event Console is Check_MK’s enhancement to the underlying Nagios core that allows for real-time processing of events. These events can come from any application, but most commonly they arrive via syslog and, for the purposes of this article, SNMP.

To integrate traps – which can be sent at any time – into the Check_MK monitoring system, we need a slightly different way to process incoming messages. Setting up the Event Console for this means configuring the following components:

  • mkeventd, a daemon to which you send your messages
  • Check_MK GUI (Multisite), which holds the rule logic that converts messages into a state and gives you a real-time view of incoming messages, including the ability to acknowledge them
  • mkeventd snmptrapd, a trap receiver that accepts your SNMP traps and is specific to the CMK Event Console

Enabling mkeventd and snmptrapd

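Note: first disable any native SNMP trap daemon (snmptrapd) on your system to avoid a port conflict; traps arrive on UDP port 162 by default, and only one process can listen there. It is possible to integrate CMK with a native snmptrapd daemon, but that’s outside the scope of this article.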
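One way to check for a conflict is to see whether anything is already bound to the trap port. The sketch below is only an illustration: it assumes the default trap port (UDP 162) and the iproute2 ss tool, and the helper name has_udp_listener is made up for this example.

```shell
#!/bin/sh
# Sketch: check whether something already listens on the SNMP trap port.
# Assumptions: default trap port UDP 162, and `ss` from iproute2 is installed.
TRAP_PORT="${TRAP_PORT:-162}"

# Hypothetical helper: reads `ss -lun` output on stdin and exits 0 if any
# address field ends in :PORT (the header line is skipped).
has_udp_listener() {
    awk -v port="$1" '
        NR > 1 { for (i = 1; i <= NF; i++) if ($i ~ (":" port "$")) found = 1 }
        END { exit !found }
    '
}

if command -v ss >/dev/null 2>&1; then
    if ss -lun | has_udp_listener "$TRAP_PORT"; then
        echo "UDP port $TRAP_PORT is already in use; stop that service first"
    else
        echo "UDP port $TRAP_PORT looks free"
    fi
fi
```

If the port turns out to be in use, stop and disable whichever service owns it with your distribution's service manager before continuing.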

Run omd config <site> and go into the Addons menu to enable mkeventd and snmptrapd (this will require a restart).
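If you prefer to skip the menu, the same switches can probably be flipped non-interactively. This is a hedged sketch rather than the one true procedure: the site name is a placeholder, the variable names MKEVENTD and MKEVENTD_SNMPTRAP are assumptions you should confirm with omd config <site> show on your version, and the script only prints the commands unless you set DRY_RUN=0 and run it as root.

```shell
#!/bin/sh
# Sketch: enable the Event Console daemon and its trap receiver from a script.
# Assumptions: SITE is a placeholder; MKEVENTD and MKEVENTD_SNMPTRAP are the
# config variable names on your OMD version (verify with `omd config SITE show`).
SITE="${SITE:-mysite}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_event_console() {
    run omd stop "$SITE"                              # config changes need a stopped site
    run omd config "$SITE" set MKEVENTD on            # the event daemon itself
    run omd config "$SITE" set MKEVENTD_SNMPTRAP on   # its built-in trap receiver
    run omd start "$SITE"
}

enable_event_console
```

Defaulting to dry-run keeps the example harmless to paste; review the printed commands before switching it off.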

This next step is optional but highly recommended: in newer versions of CMK, unknown traps don’t even get archived in the Event Console, so you never get to verify that they arrived. Let’s change that default behavior by going into Global Settings:

Testing with a command-line trap (Linux)

Now a simple test from the command line is possible. I use snmptrap on my Linux system, but you can use any client of your choice:
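The original command was not preserved here, so what follows is a stand-in sketch and not the author's exact invocation. It sends one SNMPv2c trap with Net-SNMP's snmptrap; the host name, the community string and the choice of OIDs (taken from NET-SNMP-EXAMPLES-MIB purely as a harmless payload) are all assumptions. By default it only prints the command; set DRY_RUN=0 to send the trap.

```shell
#!/bin/sh
# Sketch: send a single SNMPv2c test trap to the monitoring server.
# Assumptions: CMK_HOST and COMMUNITY are placeholders for your own values.
CMK_HOST="${CMK_HOST:-localhost}"
COMMUNITY="${COMMUNITY:-public}"
TRAP_OID="1.3.6.1.4.1.8072.2.3.0.1"      # netSnmpExampleHeartbeatNotification
VARBIND_OID="1.3.6.1.4.1.8072.2.3.2.1"   # netSnmpExampleHeartbeatRate (integer)

send_test_trap() {
    # The empty '' argument tells snmptrap to fill in the sender's uptime.
    set -- snmptrap -v 2c -c "$COMMUNITY" "$CMK_HOST" '' \
        "$TRAP_OID" "$VARBIND_OID" i 123456
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Print each argument quoted so the empty uptime field stays visible.
        printf '+'
        for arg in "$@"; do printf " '%s'" "$arg"; done
        printf '\n'
    else
        "$@"
    fi
}

send_test_trap
```

Numeric OIDs are used so the command does not depend on any MIB files being installed on the sending machine.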

The result should be immediate:

An Event Rule is Now Required

Great, so we’re receiving traps. Since there’s no rule associated with this trap, the system simply archives it. I find that useful when I ask other people to send me traps from their scripts or applications: I can confirm that their traps are arriving, and then it’s up to me to configure the monitoring system to actually do something with them.

Our next step is to create an Event Rule so that the event gets processed; later we can create a Notification rule on it.

First head to the Event Console section of WATO and expand the default rule pack (or create a new rule pack if you prefer):

Next click New Rule at the top:

I won’t go into all the fields; you can explore the many options by clicking Help (again, the books icon at the top right of the screen), which explains what the fields are for. There’s a lot you can do with events, such as toggling alerts on and off depending on arbitrary string matching rules. It’s quite powerful. Here’s my simple sample rule:

Now when I run the snmptrap command again, I can see that the rule is being applied and that the WARN level I set for this event is being honored. You can also configure a cancel / resolve action if you want a problem to be resolved automatically, which can be exceedingly useful. But back to our event rule result:

It made exactly the transformations we told it to in the rule above. We’re almost done; by now you will want to hook this into a Notification rule. Here’s mine for this event:

That’s it.

Final Tip:  It’s helpful to tail /opt/omd/sites/YOURSITEHERE/var/log/notify.log when testing your notification rules.
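A small sketch of that tip, assuming the path layout given above; SITE is a placeholder for your own site name and the function names are made up for this example.

```shell
#!/bin/sh
# Sketch: follow the notification log while you fire test traps.
# Assumption: SITE is a placeholder; the path layout is the one quoted above.
SITE="${SITE:-YOURSITEHERE}"

notify_log_path() {
    echo "/opt/omd/sites/$SITE/var/log/notify.log"
}

watch_notifications() {
    # -F keeps following the file even if it is rotated or recreated.
    tail -F "$(notify_log_path)"
}

echo "Would follow: $(notify_log_path)"
# Uncomment to actually follow the log (stop with Ctrl-C):
# watch_notifications
```

Run it in a second terminal, then send a test trap and watch whether your notification rule fires.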

Happy trapping.