Monitoring servers with Icinga usually requires a small amount of setup on the monitored server. Of the available agents, the nrpe daemon running on top of xinetd is easy enough to set up, and it can report any kind of information that is available locally on the monitored server. To automate this setup task, and as a learning exercise, I wrote an Ansible playbook, which I describe here.
The Linux world is big, so you might come here asking: “Why the hell do you use xinetd and nrpe for monitoring?” Well, the official documentation lists many different techniques that can work; I just happen to always use nrpe in an environment of mostly Linux servers. Here is the list of all the options, which I should probably check out in a future post:
- SNMP
- SSH
- NSClient++ (https://nsclient.org/)
- NSCA-NG
- NRPE
- Passive Check Results and SNMP Traps
Starting from the bottom: Configuring Ansible
Ansible uses an agentless architecture, but one host still needs to be the “master”, or point of execution. For the following playbook, Ansible can be installed on the monitoring server or on a different machine, such as a virtual machine on the engineer’s laptop that is connected to the network only during installation or maintenance.
After installing Ansible through the package manager, what remains is establishing the SSH connection between the master and the servers involved in the monitoring setup. An article in the Linux Journal is a great resource on how to go about this. Basically, the more convenient it is to connect to your servers with Ansible, the less secure the setup will be. For the test setup on my laptop I chose an insecure but convenient solution: a passphrase-less SSH key pair that allows the holder of the private key to log into the remote server’s root account. While this really screws any security mechanisms that might have been established before, it is probably optimal for playing around and learning Ansible.
# generate a key pair (leave the passphrase empty for the password-less setup)
ssh-keygen -t rsa -b 4096
# repeat the steps below for every involved server
ssh-copy-id -i ~/.ssh/id_rsa.pub root@remote.computer.ip
ssh root@remote.computer.ip
After the SSH setup we need to declare the servers on the Ansible host. In my setup I mapped the IP addresses to hostnames in /etc/hosts and referred to those hostnames in /etc/ansible/hosts. My test environment doesn’t have static IP addresses, and this way I can conveniently update them all in one place (/etc/hosts) when they change.
The servers involved in the monitoring setup are either monitored servers or monitoring servers, so it makes sense to declare these two groups for Ansible in /etc/ansible/hosts, something like this:
[monitored_servers]
server1.laptop
server2.laptop
[monitoring_servers]
icinga2.laptop
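Before running anything larger, it can be worth confirming that Ansible actually reaches every host in both groups. Here is a minimal sketch, assuming the inventory above and the key-based root login described earlier. The file name ping.yml is my own choice for illustration and not part of the setup.

```yaml
---
# ping.yml (hypothetical file name): connectivity check for all inventory hosts
- name: Verify that Ansible can log in to every host
  hosts: all
  remote_user: root   # assumption: the root login set up with ssh-copy-id
  gather_facts: no    # skip fact gathering, we only test the connection
  tasks:
    - name: Check the SSH login and a usable Python on the remote side
      ping:
```

If a host is misspelled in /etc/hosts or the key was not copied, this small play should fail for that host before the real playbook touches anything.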
The Ansible Playbook
The playbook itself is a .yml file and is easily readable once you get the hang of it. It helps to know that in this file every task-level entry besides “name”, “hosts”, “tasks”, the conditional “when” and the loop “with_items” refers to an Ansible module. So if you stumble across a section of this .yml file and it is not clear to you how it works, simply search for the name in the Ansible module documentation. To me all the different names were confusing at the start; once I figured this out, playbooks became much easier to read.
Things worth pointing out:
- For this task we need a specific monitoring package, which has a different name depending on the platform. That name is exactly what we load at the beginning under “Include OS specific variables”. This depends on two short additional files, which can be found below the playbook.
- Depending on the OS and monitoring package, the check used to test the connectivity at the end is also in a different location. To solve this, the playbook first searches for the check using find and then executes the first match through shell command substitution.
- The playbook copies over the nrpe configuration file, which is expected to be in nrpe.d/nrpe relative to the playbook path. The contents of this file can also be found below.
---
- name: Include OS specific variables
  hosts: all
  tasks:
    - debug: var=hostvars[inventory_hostname]['ansible_distribution']
    - include_vars: oel.yml
      when: ansible_distribution == "OracleLinux"
    - include_vars: ubuntu.yml
      when: ansible_distribution == "Ubuntu"
    # TODO include a default case, for now this works though
- name: Install the nrpe check on the monitoring-server
  hosts: monitoring_servers
  tasks:
    - debug: var=monitoring_checks
    - name: Install required packages
      package:
        name: "{{ monitoring_checks }}"
        state: latest
- name: Install nrpe daemon with monitoring-configuration
  hosts: monitored_servers
  tasks:
    - debug: var=hostvars[inventory_hostname]['ansible_default_ipv4']['address']
    - debug: var=monitoring_checks
    - name: Disable the firewall
      when: ansible_distribution == "OracleLinux" #TODO handle different firewalls
      service:
        name: firewalld
        enabled: no #TODO create a rule instead of just disabling it
        state: stopped
    - name: Install required packages
      package:
        name: nrpe, {{ monitoring_checks }}, xinetd
        state: latest
    - name: Copy over the configuration file for the nrpe daemon
      synchronize:
        src: nrpe.d/nrpe
        dest: /etc/xinetd.d/nrpe
    - name: Add xinetd to autostart and start it
      service:
        name: xinetd
        enabled: yes
        state: started
    - name: Add the required entry to /etc/hosts
      lineinfile:
        path: /etc/hosts
        state: present
        line: "{{ hostvars[item]['ansible_default_ipv4']['address'] }} monitoring-server"
      with_items: "{{ groups.monitoring_servers }}"
    # it should be ok to have multiple IP addresses under the same host-name in case we have multiple
    # monitoring-servers. this way we add them all with the same hostname to /etc/hosts
    - name: Test the installation by executing the command locally
      raw: "$(find / -name check_nrpe 2>/dev/null | head -n 1) -H localhost -c check_users"
    # this command searches for the check_nrpe binary and then executes it.
    # this is necessary because the location differs depending on the OS / packaging.
- name: Test the check_users check from the monitoring_servers
  hosts: monitoring_servers
  tasks:
    - raw: "$(find / -name check_nrpe 2>/dev/null | head -n 1) -H {{ hostvars[item]['ansible_default_ipv4']['address'] }} -c check_users"
      with_items: "{{ groups.monitored_servers }}"
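The firewall task in the playbook carries a TODO about creating a rule instead of simply disabling firewalld. A possible replacement task, as a sketch I have not run in this setup, could open only the nrpe port. It assumes the firewalld module is available (on newer Ansible releases it ships in the ansible.posix collection) and reuses port 5666 from the xinetd service definition.

```yaml
# Sketch of a replacement for the "Disable the firewall" task.
# Assumption: the firewalld module is installed (ansible.posix collection
# on newer Ansible releases) and firewalld is running on the target.
- name: Open the nrpe port in firewalld
  when: ansible_distribution == "OracleLinux"
  firewalld:
    port: 5666/tcp    # same port as in the xinetd service definition
    permanent: yes    # keep the rule across reboots
    immediate: yes    # also apply it to the running firewall
    state: enabled
```

The task would go in the tasks list of the monitored_servers play, indented like the task it replaces.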
The host-specific file ubuntu.yml:
---
# define variables which are specific to the ubuntu setup
monitoring_checks: monitoring-plugins-common
The host-specific file oel.yml:
---
# define variables which are specific to the oel setup
monitoring_checks: nagios-plugins-all, nagios-plugins-nrpe
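The first play has a TODO for a default case. One way to handle it, offered as a sketch rather than what I actually run, is to let Ansible pick the first variables file that exists. This assumes the two files are renamed to match the ansible_distribution fact (OracleLinux.yml and Ubuntu.yml) and that a fallback file default.yml exists; all three names are hypothetical.

```yaml
- name: Include OS specific variables
  hosts: all
  tasks:
    - name: Load the first variables file that exists
      include_vars: "{{ item }}"
      with_first_found:
        - "{{ ansible_distribution }}.yml"   # e.g. OracleLinux.yml or Ubuntu.yml (hypothetical names)
        - default.yml                        # hypothetical fallback for every other distribution
```

With this variant, supporting another distribution only requires dropping in one more variables file, without touching the playbook.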
The nrpe configuration file:
# default: off
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
    flags           = IPv4
    socket_type     = stream
    type            = UNLISTED
    port            = 5666
    wait            = no
    user            = nagios
    group           = nagios
    server          = /usr/sbin/nrpe
    server_args     = -c /etc/nagios/nrpe.cfg --inetd
    log_on_failure  += USERID
    disable         = no
    only_from       = 127.0.0.1 localhost monitoring-server
}
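One thing the playbook does not cover: it only makes sure xinetd is started, so an already running xinetd would not pick up a changed version of this file. A minimal sketch of how that could be handled, assuming the copy module is an acceptable substitute for synchronize here, is to notify a handler that restarts xinetd whenever the file changes. The handler name is my own.

```yaml
- name: Deploy the xinetd service file and restart xinetd when it changes
  hosts: monitored_servers
  tasks:
    - name: Copy over the configuration file for the nrpe daemon
      copy:
        src: nrpe.d/nrpe
        dest: /etc/xinetd.d/nrpe
        owner: root
        group: root
        mode: "0644"
      notify: Restart xinetd   # only triggers if the file actually changed
  handlers:
    - name: Restart xinetd     # hypothetical handler name
      service:
        name: xinetd
        state: restarted
```

Handlers run once at the end of the play, so the restart happens at most one time per host even if several tasks notify it.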
Finally, to execute the playbook on the hosts defined in /etc/ansible/hosts, simply run the following (-b, short for --become, replaces the older -s/--sudo flag, which newer Ansible releases no longer accept):
ansible-playbook -b nrpe.yml
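A closing note on the connectivity test: searching the whole filesystem with find / works, but it is slow on large servers. A sketch of an alternative uses the find module on the usual plugin directories and runs the check only if the binary was located. The two directories are assumptions based on common Debian/Ubuntu and Red Hat style packaging, and nrpe_check is a variable name I made up.

```yaml
- name: Locate check_nrpe in the usual plugin directories
  find:
    paths:
      - /usr/lib/nagios/plugins     # assumption: Debian/Ubuntu packaging
      - /usr/lib64/nagios/plugins   # assumption: Red Hat style packaging
    patterns: check_nrpe
  register: nrpe_check              # hypothetical variable name

- name: Test the installation by executing the command locally
  command: "{{ nrpe_check.files[0].path }} -H localhost -c check_users"
  when: nrpe_check.matched > 0      # skip instead of failing if nothing was found
```

Both tasks would replace the raw task at the end of the monitored_servers play; the same pattern should work for the test from the monitoring servers.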