07
Mar

I have been exploring the open source options for continuous, real-time monitoring of servers, virtual machines and networking devices. I have come across two that are truly open source: one called Nagios and one called Zabbix.

 

Zabbix seemed to me to be a Nagios build-out (it is actually an independent project), so I picked Nagios, and more specifically the Fully Automated Nagios (FAN) ISO install.

 

After researching Nagios, it seems to be one of the most versatile options, but we all know that with freedom come tons of options, and for a newbie it can be overwhelming to get your head around. That said, with a bit of guidance you can get Nagios FAN up and running and monitoring servers and services in about a day. One great thing about Nagios FAN is that it is ready to go after you install it, with little configuration. This takes care of the heavy lifting; if you are interested in installing Nagios separately from the OS, that will be covered in the next post.

 

To start with, I got the idea to install this as a VM on Hyper-V and monitor my in-house servers as a test. This led to some issues with the CentOS 5.4 version of Linux that comes with the Nagios FAN ISO. Specifically, I was not able to get the VM to boot after the install because it was not loading the Hyper-V disk modules at boot. You can correct this by installing from the FAN ISO and, upon reboot, typing linux rescue at the boot prompt and then running two commands at the command prompt so the Hyper-V disk modules are loaded at boot.

 

Step by Step:

 

  1. Download the latest Nagios FAN ISO from http://www.fullyautomatednagios.org/wordpress/download/
  2. Create a VM on Hyper-V with the following specs: 4GB RAM, 2 cores, one NIC, skipping the HDD creation
  3. If you are on Hyper-V 2012 R2, choose a Generation 1 VM for compatibility
  4. Create the VM, then go to Settings and create a fixed hard disk 40GB in size. Let it create the disk (about 1 hour or less)
  5. Attach the ISO image for FAN to the VM and boot
  6. Use the text based install
  7. On the boot screen for FAN type linux text and hit enter
  8. Follow the prompts to format and partition the drive and install the OS and FAN
  9. Upon reboot, at the same FAN install screen, you need to get the Hyper-V drivers preloaded, so do this:
  10. Type linux rescue to get to the rescue console
  11. Let the system boot to the command prompt. You need write access to the drive, so do not choose read only
  12. Run the following command to make the installed system the root (/) environment:
  13. chroot /mnt/sysimage
  14. View the contents of the /etc/grub.conf file to note the exact version of the installed kernel, which will be used in the next command:
  15. cat /etc/grub.conf
  16. Run the following command, replacing “VERSION” with the version you noted from the output of the grub.conf file:
  17. mkinitrd /boot/initrd-VERSION.el5.img VERSION.el5 --preload hv_storvsc --preload hv_vmbus --preload hv_utils -f
  18. For example, if the grub.conf file noted that the kernel was “/vmlinuz-2.6.18-348.1.1.el5”, the version number is 2.6.18-348.1.1. The mkinitrd command would then be the following:
  19. mkinitrd /boot/initrd-2.6.18-348.1.1.el5.img 2.6.18-348.1.1.el5 --preload hv_storvsc --preload hv_vmbus --preload hv_utils -f
  20. Type “exit” twice to reboot, and your system should boot normally and load Nagios FAN.
  21. After the system is online, log in to the root console with the following login:
  22. Default login = nagiosadmin
  23. Default Password = nagiosadmin
  24. Upon reboot you will need to remove the ISO or change the boot order in the Hyper-V VM settings to boot from IDE.
  25. On the first run you will get a basic menu to set the network IP address. Do this, and change the hostname to an FQDN that is the same inside and outside the firewall, something like nagios.yourdomain.com, then set a DNS entry in both your internal and external DNS.
  26. You can change the listening port binding for the web portal (for example http://nagios.yourdomain.com:81) if you like. This is done in the Apache conf file; the basic steps are in the following link: http://httpd.apache.org/docs/2.2/bind.html
  27. I went one step further and added OpenSSL and my wildcard cert so the whole thing can be secured end to end. I will go over these steps in one of the next posts.
  28. Open the Centreon portal and log in using the default username and password (nagiosadmin / nagiosadmin)
  29. Change the nagiosadmin user email so alerts go to the email address for the Spiceworks helpdesk
  30. Add yourself as an additional admin and change the nagiosadmin password
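
The grub.conf lookup and the mkinitrd command from the rescue steps above can be sketched in Python as follows. This is only an illustration under my own assumptions: the function names are made up for this post, the regular expression assumes a kernel line shaped like the CentOS 5 example, the sample grub.conf text is invented, and the script only builds the command as an argument list instead of running it.

```python
import re

# Hyper-V modules the post preloads into the initrd, in the same order.
HYPERV_MODULES = ["hv_storvsc", "hv_vmbus", "hv_utils"]


def kernel_release_from_grub(grub_conf_text):
    """Return the release of the first kernel line, e.g. '2.6.18-348.1.1.el5'.

    Assumes grub.conf kernel lines of the form 'kernel /vmlinuz-<release> ...'.
    """
    match = re.search(r"^\s*kernel\s+\S*vmlinuz-(\S+)", grub_conf_text, re.MULTILINE)
    if match is None:
        raise ValueError("no kernel line found in grub.conf text")
    return match.group(1)


def mkinitrd_command(release):
    """Build the mkinitrd argument list for a kernel release (hypothetical helper)."""
    command = ["mkinitrd", f"/boot/initrd-{release}.img", release]
    for module in HYPERV_MODULES:
        command += ["--preload", module]
    command.append("-f")  # overwrite the existing initrd image
    return command


# Illustrative grub.conf fragment, not copied from a real system.
sample = """
title CentOS (2.6.18-348.1.1.el5)
    root (hd0,0)
    kernel /vmlinuz-2.6.18-348.1.1.el5 ro root=/dev/VolGroup00/LogVol00
    initrd /initrd-2.6.18-348.1.1.el5.img
"""

release = kernel_release_from_grub(sample)
print(" ".join(mkinitrd_command(release)))
```

Note that the sketch keeps the .el5 suffix as part of the release string, so nothing has to be appended by hand as in the VERSION.el5 form of the command.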

OK, now the hard part: setting up all the service templates and host templates for the Windows hosts you will monitor. Actually it is not really that hard, but it took me about 3 days of messing around to get it right. The basic steps are simple once you know the flow of Nagios and Centreon. Start with Configuration, then Services, and then Service Templates. Before you get to setting up service templates, you need to add one command for the Windows servers' SNMP monitor.

Add Command For Windows SNMP

  1. Configuration – Commands – Checks and Add a new one
  2. Command Name = check_snmp_win
  3. Command Type = Check
  4. Command Line = $USER1$/check_snmp_win.pl -H $HOSTADDRESS$ $USER7$ -n $ARG1$
  5. Send it to $ADMINEMAIL$
  6. Under Argument Descriptions = ARG1 : Windows Service Name
  7. Connectors = Perl Connector
  8. Graph Template = Default_Graph
  9. Add a comment if you like
  10. Save it
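
To show what the command line above turns into when a check runs, here is a rough Python sketch of the macro substitution. It is not how Nagios implements it, just the idea. The values for $USER1$, $USER7$ and $HOSTADDRESS$ are hypothetical; in particular I am assuming $USER7$ holds the SNMP community option, and the real values live in your own resource file and host definitions.

```python
import re

# Command line exactly as entered in the Centreon form above.
COMMAND_LINE = "$USER1$/check_snmp_win.pl -H $HOSTADDRESS$ $USER7$ -n $ARG1$"


def expand_macros(command_line, macros):
    """Replace each $NAME$ token with its value; unknown macros are left untouched."""
    def substitute(match):
        name = match.group(1)
        return macros.get(name, match.group(0))
    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, command_line)


# Hypothetical values: the real $USER1$ and $USER7$ come from your resource file.
example_macros = {
    "USER1": "/usr/lib/nagios/plugins",
    "USER7": "-C public",
    "HOSTADDRESS": "192.0.2.10",
    "ARG1": '"Active Directory Domain Services"',
}

print(expand_macros(COMMAND_LINE, example_macros))
```

The $ARG1$ slot is what each service template fills in later with the Windows service name.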

Restart the monitoring service so it reads the new entry. This is necessary with the build I am working with; newer builds might not need it, I hope. One serious note: restarting with a bad configuration can break Nagios. YOU NEED TO TEST YOUR SETTINGS BEFORE YOU COMMIT THEM TO PRODUCTION! I cannot stress this enough: if you break it, you fix it.

  1. Configuration – Monitoring Engines
  2. Check the first two boxes, one to generate the config files and one to run the engine debug. This is how you test the new settings you just put in. If the test fails or gives you errors, correct them and then come back and test again. Do this over and over until all you get is warnings.
  3. Once you are happy with the results, put the settings into production by checking all four boxes on the Monitoring Engines tab. This will run the debug again, copy the files to production and restart the engine.
  4. You should now see your SNMP command listed.
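
The "test until all you get is warnings" rule can be written down as a small Python check. This is a sketch under assumptions: it expects the debug text to end with "Total Warnings" and "Total Errors" summary lines like the ones the Nagios configuration check prints, the sample text is invented, and the function names are mine.

```python
import re


def count_problems(verify_output):
    """Return (warnings, errors) parsed from the 'Total Warnings/Errors' summary lines."""
    warnings = re.search(r"Total Warnings:\s*(\d+)", verify_output)
    errors = re.search(r"Total Errors:\s*(\d+)", verify_output)
    if warnings is None or errors is None:
        raise ValueError("summary lines not found in verification output")
    return int(warnings.group(1)), int(errors.group(1))


def safe_to_commit(verify_output):
    """Warnings are tolerated, as in the post; any error blocks the commit."""
    _, errors = count_problems(verify_output)
    return errors == 0


# Illustrative output only, shaped like the summary of a config check.
sample_output = """
Checking services...
Checking hosts...
Total Warnings: 3
Total Errors:   0
"""

print(count_problems(sample_output))
print(safe_to_commit(sample_output))
```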

Service Templates

 

Add Service Templates For Each Windows Service You Want To Monitor

  1. Configuration – Services – Templates
  2. Under General Information
  3. Alias = Win-Service-ServiceName (e.g. Win-Service-ADDS)
  4. Service Template Name = Windows_Active_Directory_Domain_Services
  5. Service Template Model = generic_service
  6. Is Volatile = Default
  7. Check Period = 24x7
  8. Check Command = check_snmp_win
  9. Args = the actual service name in quotes (e.g. “Active Directory Domain Services”)
  10. Max Check Attempts = any number you wish; I set it to 2
  11. Normal Check Interval = 1
  12. Retry Check Interval = 1
  13. Active Checks Enabled = Yes
  14. Passive Checks Enabled = Yes
  15. Notification Enabled = Yes (put No if you do not want email alerts for this service being down)
  16. Implied Contacts = nagiosadmin_nagiosadmin (you should have changed this contact's email in the user step above to something correct)
  17. Notification Interval = 2 (this is the time to wait before re-sending the email alert while the problem persists)
  18. Notification Period = 24x7
  19. Notification Type = check Warning, Unknown, Critical
  20. First Notification Delay = 2 (this is the time to wait before the first email alert is sent)
  21. Repeat as needed for each Windows service. Some of the services I monitor by default are the Exchange information store, mailbox assistants, mailbox transport, Hyper-V on the host, time service, Lync services, DNS, ADDS and DHCP.
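
To make the two notification settings in the template concrete, here is a simplified Python model of when alert emails would go out. It assumes the time unit is one minute (the usual Nagios default) and ignores check attempts, retries and notification periods, so treat it as a rough picture and not the real scheduler. The function name is my own.

```python
def notification_minutes(first_delay, interval, outage_length):
    """Minutes after the problem starts at which an alert would be sent.

    Simplified model: the first alert goes out once first_delay minutes have
    passed, then one more every interval minutes while the problem lasts.
    An interval of 0 means the alert is sent only once.
    """
    if first_delay > outage_length:
        return []
    if interval <= 0:
        return [first_delay]
    return list(range(first_delay, outage_length + 1, interval))


# Values from the service template above: delay 2, interval 2, 7 minute outage.
print(notification_minutes(2, 2, 7))   # [2, 4, 6]
# With a delay of 5 and an interval of 5, a 12 minute problem gives two alerts.
print(notification_minutes(5, 5, 12))  # [5, 10]
```

A problem that clears before the first delay is up produces no email at all, which is the point of setting a delay.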

Restart the monitoring service to read the new entries, using the same Configuration – Monitoring Engines steps as in the Add Command section above: test with the first two check boxes until you get nothing worse than warnings, then check all four boxes to put the settings into production. You should now see your service templates listed.

 

Windows Drive Space Monitors

Now set up some drive space monitors for the C, D, E and F drives so you can add them to the host templates in the next step.

  1. Configuration – Services – Templates
  2. Alias = Win-Storage-C
  3. Service Template Name = Windows_Storage_Monitor_C_Drive
  4. Service Template Model = generic-service
  5. Is Volatile = Default
  6. Check Period = 24x7
  7. Check Command = check_centreon_remote_storage
  8. Arguments – Path, partition = C:
  9. Warning = 50
  10. Critical = 90
  11. Community = public (this is the SNMP community string you add to each Windows server's SNMP service. I used public for demo purposes; best practice is a random string)
  12. SNMP version = 2c
  13. Max check attempts = 3
  14. Normal Check Interval = 2
  15. Retry Interval = 2
  16. Active Checks Enabled = Yes
  17. Passive Checks Enabled = Yes
  18. Skip custom macros
  19. Notification enabled = yes
  20. Implied Contacts = nagiosadmin_nagiosadmin
  21. Notification Interval = 5
  21. Notification Period = 24x7
  23. Notification Type = Check Warning, Unknown, Critical
  24. First Notification delay = 5
  25. Save and repeat for the other partitions you need to cover. I usually do not have more than 4 partitions per server so I stopped at F:
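
The Warning = 50 and Critical = 90 values above are percentages of used space. The Python sketch below shows how such thresholds map to a check state. It is not the code of check_centreon_remote_storage; the function name is mine, and treating the thresholds as inclusive is an assumption.

```python
# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def storage_state(used_percent, warning=50, critical=90):
    """Map a used-space percentage to a Nagios state.

    Treating the thresholds as inclusive is an assumption of this sketch.
    """
    if used_percent >= critical:
        return CRITICAL
    if used_percent >= warning:
        return WARNING
    return OK


for used in (35, 50, 89.5, 93):
    print(used, storage_state(used))
```

With these numbers a half-full C: drive already raises a warning, so raise the warning value if that is too noisy for your servers.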

Restart the monitoring service to read the new entries, using the same Configuration – Monitoring Engines steps as in the Add Command section above: test with the first two check boxes until you get nothing worse than warnings, then check all four boxes to put the settings into production. You should now see your drive space templates listed.

 

Host Templates

Add Host Templates For Each Server Type You Want To Monitor

Some examples of host templates are listed below. You will notice I am really just grouping the services I want to monitor; I then apply these templates to actual hosts in layers, like a sandwich.

  1. Servers-Exchange-2010
  2. Servers-Exchange-2013
  3. Servers-HyperV
  4. Servers-Linux
  5. Servers-Win2012
  6. Servers-Win2008
  7. Servers-Win2003
  8. Servers-Win2000
  9. Servers-Lync2013
  10. Servers-DC
  11. Servers-E-Drive-Monitor
  12. Servers-D-Drive-Monitor

Steps
Go to Configuration – Hosts – Templates and click Add

  1. Host Template Name = Servers-Win2012
  2. Alias = Windows 2012 R2 Servers
  3. IP / DNS = Blank
  4. SNMP Community String = public (this should be a strong, hard-to-guess string; I used public to make it easy)
  5. Host Parallel Template = generic_host
  6. Rest of first page is blank

Click the Relations tab at the top of the page and check the list of Linked Service Templates. You should see all the service templates you created earlier.

For my Windows 2012 R2 server example I chose the following from the Linked Service Templates section.

  1. Windows_Storage_Monitor_C_Drive
  2. Windows_Memory_Monitor
  3. Windows_DISK_Swap_Monitor
  4. Windows_CPU_Monitor

I then saved the host template and repeated the process for the other types of hosts I need to monitor. For Exchange I monitored the Information Store, Transport and Mailbox Assistants services, as well as the basic CPU and memory checks. As you can see, you can make this whatever suits your needs and level of monitoring.
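
The layering idea can be pictured with a few lines of Python. This is a toy model and not how Centreon stores templates: the Servers-Win2012 entry matches the example above, while the contents of the other two templates and the D: drive service name are hypothetical, named by analogy.

```python
# Service lists per host template. Servers-Win2012 matches the example above;
# the other two entries are hypothetical, named by analogy.
HOST_TEMPLATES = {
    "Servers-Win2012": [
        "Windows_Storage_Monitor_C_Drive",
        "Windows_Memory_Monitor",
        "Windows_DISK_Swap_Monitor",
        "Windows_CPU_Monitor",
    ],
    "Servers-D-Drive-Monitor": ["Windows_Storage_Monitor_D_Drive"],
    "Servers-DC": ["Windows_Active_Directory_Domain_Services"],
}


def services_for_host(template_names):
    """Combine the services of several templates, keeping order and dropping repeats."""
    services = []
    for name in template_names:
        for service in HOST_TEMPLATES[name]:
            if service not in services:
                services.append(service)
    return services


# A hypothetical domain controller with a D: drive stacks three templates.
print(services_for_host(["Servers-Win2012", "Servers-D-Drive-Monitor", "Servers-DC"]))
```

Each host ends up monitored for the combined set of services from every template stacked on it.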

Restart the monitoring service to read the new entries, using the same Configuration – Monitoring Engines steps as in the Add Command section above: test with the first two check boxes until you get nothing worse than warnings, then check all four boxes to put the settings into production. You should now see your host template listed.

 

Add Hosts

Start adding hosts that you want to monitor.

Nagios Add Hosts Final Step
