Nagios 3.0 - An Extensible Host and Service Monitoring Tool
Novell Cool Solutions: Feature
By Rainer Brunold
Posted: 19 Oct 2007
Nagios is a popular host and service monitoring tool used by many administrators to keep an eye on their systems.
Since I wrote a basic installation guide on Cool Solutions in Jan 2006, many new versions have been released and many more Nagios plugins have become available. I think it's time for a series of articles that show some interesting solutions built on them. I hope you find them helpful and can use them in your environment. If you are not yet a Nagios user, I hope I can inspire you to give it a try.
I don't want to write full documentation for Nagios here. Instead, I'll give you a basic installation guide so that you can set it up easily and experiment with it yourself. The guide shows how to install Nagios along with some interesting extensions, and how they integrate with each other. Along the way you will make many modifications to the installation, which will help you understand how it works and how to integrate systems and services. I will also provide articles about monitoring specific services, describing what the checks do and what configuration changes they need. Taken together, this should give you a good overview of how to extend a Nagios installation yourself.
If you would like more detailed information about Nagios, visit the documentation on the project homepage at http://www.nagios.org/docs or read my short article from Jan 2006 at http://www.novell.com/coolsolutions/feature/16723.html
These are the Nagios extensions that I would like to cover:
Nagios 3.0 | This version is currently at beta 4, but installation and configuration should not differ much from the final version due for release later this year. |
Nagios Plugins 1.4.10 | This is the default set of Nagios check programs that do the actual work: checking file systems, memory usage, CPU utilization and so on. Many more check programs are available at the Nagios Exchange web site (http://www.nagiosexchange.org). |
Nagiosgraph 0.9 | This is a Perl extension that writes Nagios check results into round robin databases and creates graphs based on those values. |
NDOUtils 1.4 | The NDOUtils are also in beta right now. They write the whole Nagios configuration and all check results into a database, where other Nagios extensions can use them. |
NSCA 2.7.2 | The NSCA package lets you build distributed Nagios installations that report all their check results to a single central Nagios installation, from which you can keep an overview of your whole network. |
NagVis 1.1 | NagVis uses the database filled by the NDOUtils to draw maps of your network and infrastructure. |
Please note that many more plugins and extensions are available for Nagios. I focus on this list because these are the ones we mainly use. Many administrators use others; please feel free to write an article about your favorites.
Here are screen shots of some of the components, so you can see where this series is heading:
Nagios:
This is a screen shot of our Nagios production system showing the service details of a Linux server. It was taken on a Nagios 2.5 installation because that is the version our production system runs right now; we will migrate after version 3.0 is released. Apart from a few changes in the menu on the left side, version 3.0 looks nearly the same.
Nagiosgraph:
This is a screen shot from a production system where we graph the number of LDAP requests per second.
This screen shot is from a somewhat older NagiosGraph version. The new one formats diagrams better.
NagVis:
This is a sample screen shot from the project homepage at http://sourceforge.net/projects/nagvis
Chapter 1 — Basic Nagios installation and configuration
Nagios is included in all SUSE Linux Enterprise distributions, in different versions. The latest service pack 1 of SLES 10 includes Nagios 2.6, but because I would like to show you the integration of components like the NDOUtils, I have to use the most recent version, 3.0 beta 4, here.
1. Server Preparation
I use a SLES 10 SP1 server for this article. The installation should work the same way on a SLES 9 or OES server.
Nagios needs neither much memory nor much disk space. I would say 256MB of RAM and about 100MB of disk space are enough to monitor a few hundred services. If you start using nagiosgraph to draw graphs, the requirements increase a little, but not by much.
So I did a default installation of SLES 10 SP1 and added the patterns "Web and LAMP Server" and "C/C++ Compiler and Tools". The LAMP packages bring us Apache and the MySQL database for the NDOUtils, and the compiler packages are required for building the software binaries.
After the installation, please check that the following packages are not installed, because they would conflict with the ones we are installing. If they are installed, remove them with YaST.
nagios
nagios-nsca
nagios-nsca-client
nagios-plugins
nagios-plugins-extras
nagios-www
For several Nagios functions we require some additional packages that have to be installed. Please check that the following are installed:
gd-devel
libpng-devel
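Both package checks in this section can be scripted. The following is a minimal sketch, assuming an RPM-based system such as SLES where `rpm -q` reports whether a package is installed; the function names are my own and not part of Nagios.

```shell
#!/bin/sh
# Sketch: print every package from the argument list that `rpm -q`
# reports as installed. Function names are illustrative only.
installed_packages() {
    for pkg in "$@"; do
        if rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}

# Print every package from the argument list that is NOT installed.
missing_packages() {
    for pkg in "$@"; do
        if ! rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}
```

On a correctly prepared server, `installed_packages nagios nagios-nsca nagios-nsca-client nagios-plugins nagios-plugins-extras nagios-www` and `missing_packages gd-devel libpng-devel` should both print nothing.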
2. Software Download and Extraction
For this first section of the Nagios basic installation we require the following two packages:
Software | Download Link | Current File Name by 05/10/2007 |
Nagios 3.0 Beta 4 | http://www.nagios.org/download | nagios-3.0b4.tar.gz |
Nagios Plugins 1.4.10 | http://www.nagios.org/download | nagios-plugins-1.4.10.tar.gz |
Download those two packages and copy them to a temporary installation directory. I use /images for those steps.
# mkdir /images
# cp <nagios-3.0b4.tar.gz> /images
# cp <nagios-plugins-1.4.10.tar.gz> /images
# cd /images
# tar -xvzf nagios-3.0b4.tar.gz
# tar -xvzf nagios-plugins-1.4.10.tar.gz
3. Security Preparation
Nagios itself does not require root permission to run on the system.
A normal installation uses a dedicated nagios user and nagios group for that. Some check programs need root permissions; for those we can use sudo.
Because Apache serves the Nagios front end, we can also use the web interface to submit commands to Nagios.
For this we have to prepare another local Linux group called nagcmd that has permission to write to a named pipe; Nagios reads the commands from the other end. Typical commands are rescheduling a service check to run right now instead of waiting for the normal check interval, or defining a service downtime during which no service-down notifications are sent.
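To make the named pipe less abstract: an external command is a single text line of the form `[timestamp] COMMAND;arguments`. The sketch below builds such a line for SCHEDULE_FORCED_SVC_CHECK, which asks Nagios to run a service check at the given time. The function name is my own, and the host name localhost and service PING in the usage note are assumptions taken from the sample configuration.

```shell
#!/bin/sh
# Sketch: build one line in the Nagios external command format
#   [timestamp] SCHEDULE_FORCED_SVC_CHECK;host;service;check_time
# $1 = UNIX timestamp, $2 = host name, $3 = service description
# The same timestamp is used as submission time and as check time.
forced_check_line() {
    printf '[%s] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%s\n' "$1" "$2" "$3" "$1"
}
```

Once Nagios is running, a member of the nagcmd group could submit the command with `forced_check_line "$(date +%s)" localhost PING > /var/opt/nagios/rw/nagios.cmd`, where the path is the command_file configured later in this article.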
NOTE: On SLES systems apache runs as user wwwrun. If you use a different distribution, add the appropriate user to the nagcmd group.
# useradd -m nagios
# groupadd nagios
# groupadd nagcmd
# usermod -G nagios,nagcmd nagios
# usermod -G nagcmd wwwrun
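It is worth confirming the group memberships before going on, because a missing nagcmd membership only shows up later as a failing web command. This is a small sketch that relies on `id -nG`, which prints a user's group names; the helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if user $1 is a member of group $2.
# `id -nG USER` prints the user's group names separated by spaces.
in_group() {
    for g in $(id -nG "$1" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}
```

For example, `in_group wwwrun nagcmd && in_group nagios nagcmd && echo "group setup OK"` should print the message after the commands above have been run.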
4. Software Compilation and Installation of Nagios 3.0 beta 4
Never compiled a software package? Don't worry, it's quite easy.
The only catch is that some packages require extra parameters when preparing the compilation.
Nagios lets us define the directory structure we would like to have when it gets installed. To do so, we pass the configure command parameters that say where the files should end up after the installation. I deviate from a default Nagios installation here because I would like to follow the LSB (Linux Standard Base) rules, which define where each kind of file should be placed. For example, variable data (log files, databases, ...) should be placed below /var. Because of this, we have to make a few more modifications after the installation.
NOTE: The LSB (Linux Standard Base) tries to standardize Linux distributions. When a developer creates a software package according to the LSB rules, the package is meant to be installable on all LSB-certified distributions like SUSE, Red Hat, ... The LSB not only gives rules for where to place the different file types, it also tells developers which libraries and functions they may use to stay LSB compliant. For more information please visit http://www.linuxbase.org/en
Here are my configuration options:
--prefix=/opt/nagios defines where to put the Nagios files
--with-cgiurl=/nagios/cgi-bin defines the web server URL where the CGIs will be available
--with-htmurl=/nagios defines the web server URL where Nagios will be available
--with-nagios-user=nagios user account under which Nagios will run
--with-nagios-group=nagios group account under which Nagios will run
--with-command-group=nagcmd group account which will allow the apache user to submit commands to Nagios
In the following commands, "configure" prepares the compilation with the given parameters, "make all" compiles the software, and "make install" performs the installation itself.
# cd /images/nagios-3.0b4
# ./configure --prefix=/opt/nagios --with-cgiurl=/nagios/cgi-bin \
--with-htmurl=/nagios --with-nagios-user=nagios \
--with-nagios-group=nagios --with-command-group=nagcmd
# make all
The "make all" compilation should finish without errors and print a description of the next steps. If there were errors, correct them and rerun the configure command before continuing; in particular, make sure the packages listed in "1. Server Preparation" are installed.
# make install
# make install-init
# make install-commandmode
# make install-config
# make install-webconf
Nagios is now installed, but before starting it we have to install some further components and make some changes to the default configuration. I first describe the installation of those components and then their configuration.
5. Software Compilation and Installation of the Nagios Plugins 1.4.10
This works the same way as with the Nagios package. First we prepare the compilation with the configure command, passing options that match those of the Nagios package; then we compile and install.
# cd /images/nagios-plugins-1.4.10
# ./configure --prefix=/opt/nagios --with-nagios-user=nagios \
--with-nagios-group=nagios
If configure has reported no errors, continue with the compilation and installation; otherwise correct the problems and rerun the configure command.
# make
# make install
That's all for the plugins. They need nothing else.
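Each plugin can also be run by hand, which is a quick way to confirm that the build works. Plugins report their result through the exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The sketch below translates that code into the state name. The helper name is my own, and the path in the comment assumes the plugins were installed to /opt/nagios/libexec, the default location below the prefix used here.

```shell
#!/bin/sh
# Sketch: translate a plugin exit code into the Nagios state name.
# Anything outside 0-2 is treated as UNKNOWN.
plugin_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Example use (thresholds are arbitrary illustration values):
#   /opt/nagios/libexec/check_load -w 5,4,3 -c 10,8,6; plugin_state $?
```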
6. Configuration of Nagios 3.0 beta 4
The Nagios package provides a default set of configuration files. They fall into two categories. The files directly in /opt/nagios/etc control how Nagios itself behaves. The files in the objects subdirectory hold all host and service definitions.
Here is an overview of the default configuration files in the /opt/nagios/etc directory:
nagios.cfg | This is the main Nagios configuration file that controls the behavior of the whole application. Here you define many global parameters and reference all the other configuration files. The file itself contains a lot of documentation if you would like to read through it. |
cgi.cfg | This file controls the web interface. Settings like user authentication are stored here. |
resource.cfg | This file should hold sensitive data used in the check command definitions. Imagine you monitor an FTP service that needs a user account and password. All the other Nagios configuration files are readable by everybody; resource.cfg is readable only by the nagios user and the nagios group. So place the username and password in this file and refer to them from the other configuration files through the $USERx$ variables. |
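As an illustration of the resource.cfg idea, the fragment below keeps FTP credentials in $USER3$ and $USER4$ and references them from a command definition. This is only a sketch: the account name, the password, the plugin check_ftp_login and its -u/-p options are all hypothetical and stand in for whatever check program you actually use.

```
# /opt/nagios/etc/resource.cfg (readable only by the nagios user and group)
$USER1$=/opt/nagios/libexec
# Hypothetical credentials for an FTP login check
$USER3$=ftpmonitor
$USER4$=secret

# /opt/nagios/etc/objects/commands.cfg (world readable, so no password here)
# check_ftp_login and its -u/-p options are hypothetical
define command{
        command_name    check_ftp_login
        command_line    $USER1$/check_ftp_login -H $HOSTADDRESS$ -u $USER3$ -p $USER4$
        }
```

The password then appears only in resource.cfg, while the command definition stays safe to share.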
And these are the files in the objects subdirectory:
timeperiods.cfg | All time periods for Nagios are configured here, e.g. a time period called 24x7 or another one named workhours. When another configuration file specifies when, for example, a service outage should trigger notifications, it refers to a definition here. |
contacts.cfg | Every person who needs notifications from Nagios has to be defined here. Some parameters are the email address, type of notification (email, SMS, ...), time periods for notification (24x7, workhours), the groups the user belongs to, ... |
commands.cfg | Every external action (host check, service check, email notification, ...) is just a command that Nagios launches with the appropriate parameters. All those command definitions live here. Nagios provides many so-called macros that help us define those commands and pass parameters to them. If you define a host with its IP address, Nagios provides the $HOSTNAME$ and $HOSTADDRESS$ macros during command execution. |
templates.cfg | When you define several hosts or services, it is easier to collect common parameters in a template and assign that template to the host and service definitions. This is a very easy way to keep the host and service definitions small and clean. Use templates whenever possible! |
localhost.cfg | This is a sample host and service definition for the local host. |
printer.cfg | This is a sample host and service definition for a printer. |
switch.cfg | This is a sample host and service definition for a switch. |
windows.cfg | This is a sample host and service definition for a Windows machine. |
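To show how templates and macros fit together, here is a minimal sketch of a host with one service. It assumes that the sample templates.cfg defines the templates linux-server and generic-service and that the sample commands.cfg defines check_ping with warning and critical thresholds as its two arguments; please verify both in your own files. The host name web01 and the address 192.0.2.10 are made up for illustration.

```
# Hypothetical host that inherits its common settings from a template
define host{
        use             linux-server
        host_name       web01
        alias           Example web server
        address         192.0.2.10
        }

# One service on that host; $HOSTADDRESS$ in the check_ping command
# definition is filled in from the address above at check time
define service{
        use                     generic-service
        host_name               web01
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }
```

If you put such definitions into a new file in the objects subdirectory, that file also needs its own cfg_file= entry in nagios.cfg.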
Because I changed the default directory structure we have to do some modifications to several files.
First there is the main Nagios configuration file nagios.cfg which holds most of the Nagios configuration itself.
# vi /opt/nagios/etc/nagios.cfg
...
log_file=/var/opt/nagios/nagios.log
...
object_cache_file=/var/opt/nagios/objects.cache
...
precached_object_file=/var/opt/nagios/objects.precache
...
status_file=/var/opt/nagios/status.dat
...
command_file=/var/opt/nagios/rw/nagios.cmd
...
lock_file=/var/opt/nagios/nagios.lock
...
temp_file=/var/opt/nagios/nagios.tmp
...
log_archive_path=/var/opt/nagios/archives
...
check_result_path=/var/opt/nagios/spool/checkresults
...
state_retention_file=/var/opt/nagios/retention.dat
...
debug_file=/var/opt/nagios/nagios.debug
...
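After editing, it is easy to overlook one of the path settings. The sketch below lists every active (not commented out) setting that still points to /opt/nagios/var, which is where a default build with --prefix=/opt/nagios keeps its variable data. The function name is my own.

```shell
#!/bin/sh
# Sketch: print, with line numbers, every active setting in the given
# config file whose value still starts with /opt/nagios/var.
# Lines that contain a '#' before the '=' are skipped.
stale_var_paths() {
    grep -n '^[^#]*=/opt/nagios/var' "$1"
}
```

Running `stale_var_paths /opt/nagios/etc/nagios.cfg` should print nothing once all of the changes above are in place.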
Next create the required directories below /var to hold the above configured files:
# mkdir -p /var/opt/nagios/rw
# mkdir -p /var/opt/nagios/spool/checkresults
# mkdir -p /var/opt/nagios/archives
# chown -R nagios:nagios /var/opt/nagios
# chown -R nagios:nagcmd /var/opt/nagios/rw
# chmod 2775 /var/opt/nagios/rw
Also the Nagios runlevel script has to be modified:
# vi /etc/init.d/nagios
...
NagiosStatusFile=/var${prefix}/status.dat
NagiosRetentionFile=/var${prefix}/retention.dat
NagiosCommandFile=/var${prefix}/rw/nagios.cmd
NagiosVarDir=/var${prefix}
NagiosRunFile=/var${prefix}/nagios.lock
...
7. Apache Security Preparation
Nagios has authentication for the web interface activated by default.
That means that once Nagios has been started and you try to access the web interface, a login window appears. Nagios already has a default user defined in contacts.cfg (user: nagiosadmin), so we just have to create an Apache password file that stores its password.
Do this with the following command and set the password to nagios:
# htpasswd2 -c /opt/nagios/etc/htpasswd.users nagiosadmin
Password: nagios
NOTE: LDAP can also be used for user authentication and requires just a few changes to the Apache configuration. The users still have to be defined in contacts.cfg; LDAP is used only to verify the password. I will cover this later.
NOTE: If you do not want to define every user in contacts.cfg and security is not a concern for you, you can modify cgi.cfg and change all "authorized_for..." parameters to "authorized_for...=*". That gives every user who authenticates to the web server (or LDAP) all permissions, even if they do not exist in contacts.cfg. After any modification to cgi.cfg, restart Nagios.
8. Apache and Nagios Startup
Now all configuration is done and we can start up the applications.
During the Nagios installation an Apache config file (nagios.conf) was placed in /etc/apache2/conf.d, so we first have to restart Apache to activate that configuration.
# rcapache2 restart
After that we can start Nagios for the first time. In earlier versions it was always a good idea to verify the configuration before restarting; otherwise Nagios could be interrupted by configuration errors. In this new version Nagios does this automatically when it is started or reloaded. If configuration errors are found, the restart / reload operation is aborted and you are told where the error is. Correct it and retry the operation.
As we use the default configuration and made no mistakes, Nagios should start up:
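The same verification can still be run by hand with the -v option of the nagios binary, which is handy after larger configuration changes. Here is a minimal sketch, assuming the binary and main configuration file sit below the /opt/nagios prefix used in this article; the wrapper function and the two variable names are my own.

```shell
#!/bin/sh
# Paths follow the --prefix=/opt/nagios used in this article; both can
# be overridden through the environment.
NAGIOS_BIN="${NAGIOS_BIN:-/opt/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/opt/nagios/etc/nagios.cfg}"

# Run the built-in configuration check (nagios -v <main config file>).
# The full report is printed only when the check fails.
verify_config() {
    if report=$("$NAGIOS_BIN" -v "$NAGIOS_CFG" 2>&1); then
        echo "configuration OK"
    else
        echo "$report"
        echo "configuration has errors"
        return 1
    fi
}
```

Calling `verify_config && /etc/init.d/nagios reload` would then reload Nagios only when the check passes.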
# /etc/init.d/nagios start
If you would like, add Nagios for automatic startup at system boot time:
# insserv nagios
9. Nagios Test
Now Nagios should be available at the following URL: http://
Because authentication is enabled, you should get a login prompt, where you enter the user nagiosadmin with the password nagios.
10. Nagios Next Steps
Please feel free to play with it and discover the available functions. If you would like to start adding your own services, my old Cool Solutions article has an easy example of how to write a simple check program and add it to Nagios as a service. Here is the link to it: http://www.novell.com/coolsolutions/feature/16723.html
Search for "check_file_exist.sh" and follow the instructions. The configuration is the same for Nagios 2.0 and 3.0.
The next chapter will integrate the NagiosGraph extension, which draws graphs based on service check results. After that I will provide an article on how to add service checks to Nagios.
Rainer Brunold