Check_MK Monitoring Part III – Check Types

By now you’ve got some basic knowledge of Check_MK, you’d like to understand what types of checks there are, and you want to be able to quickly spin up new checks as needed.

If you instead want an overview of Check_MK, head to Part I of this series.

If you’re already an expert on the basics, then you should skip this and head to the Tuning Guide in Part IV.

Types of Check_MK Checks

The Check_MK terminology around the check types wasn’t initially clear to me.  Only through trial and error did I learn about the different check types and the advantages of each.  Note:  I attempted to categorize them to the best of my ability, but the most pedantic Check_MK or Nagios experts may disagree with some of the points below.

First we start with the ‘out-of-the-box’ checks.

Check Type:  Built-In Agent  (Passive)

Explanation

Passive in this context means that the checks run locally on the monitored host; the monitoring server simply collects and parses their output.

(Linux)  The monitoring agent (/usr/bin/check_mk_agent on my system), as explained in Part II, is a simple executable shell script that outputs ASCII text when run.  It performs a wide range of checks, from ‘df’-style disk checks to cat’ing and parsing files under /proc to generate metrics.  The sections are separated by a service name surrounded by ‘<<<>>>‘ characters.  The monitoring server connects to (by default) port 6556 on the hosts it’s monitoring, which hits xinetd (CentOS/RH 6) or systemd (CentOS/RH 7, and I’m guessing the later Debian/Ubuntu variants as well), which is instructed to run the shell script.  Once you have the agent installed, you can test this yourself by telnetting to localhost 6556 and looking at the ASCII output.  That’s what CMK is parsing.
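To make that concrete, here’s roughly what a couple of sections of that output look like (trimmed; the exact sections and fields vary by agent version and host):

telnet localhost 6556
<<<check_mk>>>
Version: 1.2.8p21
AgentOS: linux
<<<df>>>
/dev/sda1 ext4 41284928 6725612 32446124 18% /
<<<uptime>>>
1297512.04 1254451.82

Each <<<name>>> header marks the start of one section, and the lines that follow are that check’s raw data.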

Out of the box there are about two dozen checks (again, called “services”), give or take, depending on how many network devices, disks, and so on the host has – it’s all automatically detected.

It is critical to understand the following points:

  1. Out of the box, these checks run every minute, and are initiated from the monitoring server.
  2. Even if you haven’t added the services that Check_MK has discovered (it will flag those as a yellow WARN until you do), the check IS running, and the stats are even being stored in .RRD files.
  3. However, until you Activate Changes (under Hosts in WATO), no alerts will ever occur, no matter what the rest of your configuration says.

Points 2 and 3 apply to almost all the check types (except SNMP traps, which are a different animal).

What about SNMP on the host itself, whether snmpd on Linux or SNMP built into a vendor appliance?  Well, some of these checks overlap with the regular agent (like disks) and some don’t (like CPU temp).  Either way, they show up as services just like other checks.  You configure SNMP at the folder level under Hosts.


Configuration

If you want to get started with these checks, there’s nothing you need to do other than install the agent, found under WATO->Monitoring Agents and provided in the form of .rpm, .deb, and .msi files (for you Windows lovers).
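For example, grabbing the package from that WATO page and installing it on a host looks something like this (package file names vary with your CMK version):

# CentOS/RHEL
rpm -ivh check-mk-agent-*.rpm
# Debian/Ubuntu
dpkg -i check-mk-agent_*.deb
# Sanity check: the agent should answer locally on port 6556
telnet localhost 6556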

But if you want to configure some of the settings, that tends to live in Parameters for Discovered Services under WATO.  That page is absolutely massive, and it’s unclear which entries are plugins and which are built-ins.  (If the CMK developers are reading this, please categorize and simplify not just by check type but also by the SOURCE of the check.)  In any case, here’s one way around it once you have your agent installed and services activated:  click on the hostname of the host in a View (All Hosts, or one of the Overview pages), which takes you to the host’s list of services.  Next to each service is a parameters icon (a ruler); clicking it is a shortcut to the correct rule.

Enjoy!

Check Type:  Generic Plugins

Explanation

I’ve purposely decided that Nagios plugins (run via MRPE, the Check_MK equivalent of NRPE) belong in their own section below.  That’s because in practice, I add and configure them differently.  The word “plugins” is more vague than it should be in Check_MK-land, so read on.

Check_MK’s plugins live in the /usr/lib/check_mk_agent/plugins directory and run every time the agent runs (again, every minute by default).  So, technically they are still passive checks.

To find them, look at the Catalog of Check Plugins under WATO.  To install one, drop the script in the plugins directory above, ensuring that it’s executable.  The next time the system does an inventory check, it should show up.
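As a quick sketch, a by-hand install looks like this (using mk_logwatch, covered below, as the example; cmk -II on the monitoring server forces a fresh inventory of the host):

# On the monitored host
cp mk_logwatch /usr/lib/check_mk_agent/plugins/
chmod +x /usr/lib/check_mk_agent/plugins/mk_logwatch
# On the monitoring server, as the site user
cmk -II myhost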

Configuration

Most likely it will show up orange, for UNKNOWN.  That’s because you need to go into the Parameters for Discovered Services section and find the right place to adjust the settings for the plugin (it may also show up under Parameters for the Service, but I’m not sure).  This is all unnecessarily complex, and I hope future versions of CMK make the link more clear.

Pro tip:  as the monitoring user, type cmk -m to browse the plugins in an old-school ANSI menu, or cmk -M to get an ASCII list.

One example of a very useful plugin is “mk_logwatch,” which lets you monitor for the presence of arbitrary regexes/strings in any log file.  If you want a more extensive explanation of logwatch, here is the manual page on the official Check_MK site.  If you want to get up and running quickly with that useful plugin, I go over that in the Tuning Guide in Part IV.

The configuration of some plugins must be done via config files on disk that you create, as in the logwatch example.  Configuration of other plugins happens on the monitoring server within WATO.
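For the on-disk style, here’s a minimal sketch of what mk_logwatch’s config at /etc/check_mk/logwatch.cfg can look like (files and patterns are just examples; C lines raise CRIT, W lines raise WARN):

/var/log/messages
 C kernel panic
 W segfault
/var/log/secure
 W authentication failure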

Another example check that falls under the generic “manual” check type and requires no local plugin installation is the process check.  This is found under WATO -> Manual Checks, and it’s called “State and count of processes.”  For example, if you wanted to alert when your httpd process went down, you could literally give it a regular expression and a number of processes to look for.


Check Type:  Built-In Server (Active)

Explanation

Active checks in this context mean that the monitoring server initiates the checks.  Generally speaking, nothing needs to be installed on the client side for active checks to work.

There are a few active checks available out of the box on the monitoring server.  Some of these are exceedingly useful.  Also, they’re off by default.

Configuration

You do this via rules in WATO.  Parameters for some of these checks range from simple to quite complex.

One of my favorites is the HTTP check.  Point a rule at a URL and, after a few check cycles, you get response-time metrics for that endpoint.
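Under the hood this active check runs the classic Nagios check_http plugin, so a rule roughly boils down to a command line like the following (hostname, URL, and thresholds are illustrative, and the output line is representative of what check_http prints):

./check_http -H www.example.com -u / -w 2 -c 5
HTTP OK: HTTP/1.1 200 OK - 612 bytes in 0.254 second response time |time=0.254s;2.000;5.000;0.000 size=612B;;;0

The trailing pipe section is the performance data that becomes your graphs.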

Some of the other very useful checks include Ping and SSH, which I recommend for almost any environment – free metrics and monitoring.  There’s a generic SQL one that is useful for a few database types.  There’s an LDAP one that saved the lives of Ops in my most recent gig.  “Connect to a TCP Port” is another interesting one that’s flexible and ready to deploy.

One downside is that the monitoring “host” for these services will be the name of your monitoring server itself.  It’s not a big problem, but it can be potentially confusing.

Check Type:  NRPE/MRPE Plugins


Explanation

MRPE stands for MK’s Remote Plugin Executor.  It was designed by Mathias Kettner as a backwards-compatible way to invoke Nagios plugins.  This is useful because there are a number of free plugins written by people around the world, and you can find them on the Nagios Exchange.  MRPE is backwards-compatible with NRPE; it’s just that the plugins are called the Check_MK way of doing things.

If you want to read all about MRPE, you can do so on the official Check_MK page.  But I wrote this section to give you a summary of what it is, and a context in which you can contrast it with the other check types.  That context is something I personally didn’t get when I read the official docs.


Configuration

Download your chosen plugin and drop it in a directory of your choice.  If you’re coming from a Nagios environment, you might have plugins installed under /usr/lib/nagios/plugins, but it doesn’t matter where you put them.

Create a configuration file at /etc/check_mk/mrpe.cfg.  This simple file takes the form:

SERVICE_NAME       /path/to/the/plugin/script  -warn 10 -crit 20

The service name can be anything, and the last set of columns is whatever arbitrary arguments the script takes.
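For a concrete (hypothetical) example, an entry running the classic check_disk Nagios plugin against /tmp could look like:

Disk_Tmp       /usr/lib/nagios/plugins/check_disk -w 10% -c 5% -p /tmp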

If everything works, the next time the local agent runs it’s going to read and parse that /etc/check_mk/mrpe.cfg file.  You can see this for yourself in the /usr/bin/check_mk_agent script, towards the end.


Pro Tip – .d configuration files hack

If you’re a Linux guru, you might be a fan of the .d configuration style used by packages like sudoers and apache.  Modularizing configuration this way often makes integration with configuration management systems like Puppet more pleasant and clean.

If you don’t mind doing a little surgery on the Check_MK agent script (path above), you can edit the for loop that reads mrpe.cfg and add mrpe.d/*.cfg to that line, so it reads both mrpe.cfg and anything in the mrpe.d directory.
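Here’s a sketch of what the patched loop can look like; treat it as illustrative shell rather than a drop-in replacement, since the real agent’s loop differs between versions and emits a slightly richer line per check:

echo '<<<mrpe>>>'
for cfgfile in /etc/check_mk/mrpe.cfg /etc/check_mk/mrpe.d/*.cfg; do
    [ -f "$cfgfile" ] || continue
    # Skip blank lines and comments, then run each configured plugin line
    grep -Ev '^[[:space:]]*(#|$)' "$cfgfile" | while read -r descr cmdline; do
        OUTPUT=$($cmdline)
        STATUS=$?
        echo "$descr $STATUS $OUTPUT"
    done
done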

Why do I like that?  Because I don’t want my configuration management system to edit a single site-wide mrpe.cfg, using some ugly Ruby templating or worse to mix and match.  I think it’s cleaner to have individual files for different things; for example, for my Postgres databases I have my config management push something like /etc/check_mk/mrpe.d/postgres.cfg.  That just makes more sense, in my opinion, because the file goes only to the hosts that need it, and it’s clear to anyone looking in that directory.

Check Type:  Local/Custom  (Passive)

Explanation

I like to call these quick-and-dirty checks, and I like that you can quickly spin them up and deploy them yourself without too much hassle.  They can be in any language of your choice so long as, when run (with no command line arguments), the output appears in the form:

STATUS      NAME         VARNAME=VALUE[;WARN;CRIT;MIN;MAX]      OUTPUT

where

  • STATUS is one of: 0 for OK, 1 for WARN, 2 for CRIT, or 3 for UNKNOWN
  • NAME is a spaceless name for the check; this becomes the service name
  • VARNAME and VALUE feed the automatic metrics for graphs, which will auto-populate.  These are great, but if you don’t need them, replace this whole field with a minus sign (-)
  • OUTPUT is human-readable text, useful when running the script at the command line or in the service output on the monitoring server itself.

Example:

1  CustomTestPerf_ProcCount count=306 WARN: 306 processes greater than 100

Configuration

You don’t need a configuration file for these.  Write a script in any language of your choice, make it executable, and drop it in /usr/lib/check_mk_agent/local.
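Here’s a minimal (hypothetical) shell example that emits the format above, metric included:

#!/bin/bash
# /usr/lib/check_mk_agent/local/proc_count.sh -- must be executable
COUNT=$(ps ax | wc -l)
if [ "$COUNT" -gt 300 ]; then
    echo "1 Custom_ProcCount count=$COUNT WARN: $COUNT processes greater than 300"
else
    echo "0 Custom_ProcCount count=$COUNT OK: $COUNT processes running"
fi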

Pro Tip – run check less frequently

By default, as you know by now, the monitoring server runs /usr/bin/check_mk_agent once every minute.  That means all the executable scripts in the local directory are also run that often.  If you need a check to run less frequently, simply make a subdirectory in local named for the number of seconds in the caching interval:

/usr/lib/check_mk_agent/local/600/check_something.py

would run (with its output cached) every 10 minutes (600 seconds).

Another tip:  write your monitoring script to take command line arguments (contrary to what I just told you).  Now make sure it’s not executable, and write a simple executable wrapper shell script that calls it with the command line arguments of your choice.  This way, there’s no need to hard-code values!
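A sketch of that pattern, with hypothetical names and thresholds:

#!/bin/bash
# /usr/lib/check_mk_agent/local/check_queue_wrapper.sh -- executable wrapper
# The shared logic lives in a non-executable script elsewhere; only this
# wrapper, with its site-specific thresholds, differs from host to host.
exec /opt/monitoring/check_queue_depth.sh --warn 100 --crit 500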


SNMP

With SNMP and Check_MK, there are essentially two modes of operation.

If the monitoring server is querying hosts for SNMP information, then it’s acting as an SNMP manager, and this functionality is pretty straightforward to configure.  When you add a new host in the Hosts section of WATO, you’re asked to select an Agent Type.  An agent type might be a standalone Check_MK agent, an SNMP agent, or both.  This can also be set at the Folder level in the Folder properties.  In this case, all you’d need is for your server (or network or other device) to have an SNMP agent running (snmpd or equivalent).
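To sanity-check the device side before adding the host, a quick walk from the monitoring server helps (assuming SNMP v2c and a community string of ‘public’; substitute your own):

snmpwalk -v2c -c public myswitch.example.com system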

The other mode of operation is when Check_MK is configured to also accept SNMP traps.  A device or application can send an asynchronous trap so that, rather than waiting for a polling event, it can immediately notify the monitoring server when there’s a problem.  This methodology is more complicated to set up and is detailed in the next section.