Monitoring Linux systems with SNMP extend method

An easy way to obtain information about Linux programs and systems using SNMP extend and a Shell script

Rafael Moraes Monteiro
May 18, 2021

About two years ago, when I started a new job at an IT company that works with IP telephony solutions, I was given the challenge of building a new monitoring system with Zabbix. The idea was to replace the existing monitoring system, which ran an older version of Zabbix, with a new one. The older version lacked good templates, and its triggers didn’t work as they should have. Another problem was the number of services and devices to monitor: Zabbix had no suitable items for all the services or for all the devices, so I built the new monitoring system on the newer 4.2.5 version.

The biggest problem I found at the beginning was the large number of hosts to monitor and the many versions of them. There were about 500 hosts, in about 3 different versions in production. Each version or type of host ran a specific set of services: some had Apache listening on port 80, others on ports 80 and 443; some ran services like NTP, LDAP, OpenSIPS and Asterisk, and others ran no services at all. For some, it was important to read information from a specific table in a MySQL database. I also had to account for the mixed types of telephony cards and the number of telephony links, which differed for each client.

With SNMP, information such as CPU, memory, uptime, network interfaces and disks is simple to obtain, but SNMP alone did not give me information about the running services and daemons, or any other host-specific data. Searching the internet, I found a solution: the SNMP extend method. It uses a custom script to obtain information about applications in addition to raw system metrics. Checks on file sizes, the number of files in a given directory, file modification dates, information from the CLI of a given Linux application, and other simple data that can be obtained locally with a single command can all be served by the Net-SNMP daemon. Net-SNMP is used in virtually all Linux distributions and can be extended to provide these (and many other) kinds of functionality.

Configuration

SNMP extend follows a fixed configuration pattern: each check is declared with an extend line in the daemon’s configuration file.

  • /etc/snmp/snmpd.conf
extend name prog args

where name is an identifying string for the extension, prog is the program to run, and args are the arguments to give the program.

Example scripts

Simplistic example

Here is a simple example using echo:

  • snmpd.conf
rocommunity testing
extend test /bin/echo hello
  • retrieving value
$ snmpwalk -v2c -c testing 127.0.0.1 nsExtendOutput1
NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."test" = STRING: hello
NET-SNMP-EXTEND-MIB::nsExtendOutputFull."test" = STRING: hello
NET-SNMP-EXTEND-MIB::nsExtendOutNumLines."test" = INTEGER: 1
NET-SNMP-EXTEND-MIB::nsExtendResult."test" = INTEGER: 0
  • finding the OID
$ snmptranslate -On NET-SNMP-EXTEND-MIB::nsExtendOutput1Line.\"test\"
.1.3.6.1.4.1.8072.1.3.2.3.1.1.4.116.101.115.116
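The numeric OID above follows a regular pattern: the prefix for nsExtendOutput1Line, then the length of the extend name, then the ASCII code of each character in the name. A minimal sketch of that encoding, assuming plain ASCII names (the function name extend_oid is made up for this illustration and is not part of Net-SNMP):

```shell
#!/usr/bin/env bash
# Sketch: build the numeric OID of nsExtendOutput1Line for an extend name.
# Index = length of the name, followed by the ASCII code of each character.
# extend_oid is a hypothetical helper, assuming ASCII-only names.

extend_oid() {
    local name=$1
    local oid=".1.3.6.1.4.1.8072.1.3.2.3.1.1.${#name}"
    local i code
    for (( i = 0; i < ${#name}; i++ )); do
        # A leading single quote makes printf emit the character's numeric code
        printf -v code '%d' "'${name:i:1}"
        oid+=".${code}"
    done
    printf '%s\n' "$oid"
}

extend_oid test
```

For the name test this prints the same OID that snmptranslate returned, which is a handy cross-check on hosts where the MIB files are not installed.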

Script used

Based on the above, I wrote a Shell script that covered all the information and items relevant to monitoring all the equipment and services. With SNMP extend it was possible to obtain information from the CLI of some services, such as Asterisk and MySQL. The SNMP configuration file looks like this:

  • /etc/snmp/snmpd.conf
#Apache2
extend proc-apache2 ../mon_snmp_extend.sh proc-apache2
extend apache-port-80 ../mon_snmp_extend.sh apache-port-80
extend apache-port-443 ../mon_snmp_extend.sh apache-port-443
extend apache-port-8443 ../mon_snmp_extend.sh apache-port-8443
#Asterisk
extend proc-asterisk ../mon_snmp_extend.sh proc-asterisk
extend asterisk-uptime ../mon_snmp_extend.sh asterisk-uptime
extend asterisk-sip-siprouter1 ../mon_snmp_extend.sh asterisk-sip-siprouter1
extend asterisk-sip-siprouter2 ../mon_snmp_extend.sh asterisk-sip-siprouter2
extend asterisk-channels-active ../mon_snmp_extend.sh asterisk-channels-active
extend asterisk-calls-active ../mon_snmp_extend.sh asterisk-calls-active
extend asterisk-calls-processed ../mon_snmp_extend.sh asterisk-calls-processed
extend asterisk-sip-udp-5071 ../mon_snmp_extend.sh asterisk-sip-udp-5071
#BIND9
extend proc-bind9 ../mon_snmp_extend.sh proc-bind9
#DAHDI
extend asterisk-tdm1-pbx ../mon_snmp_extend.sh asterisk-tdm1-pbx
extend asterisk-tdm2-pstn ../mon_snmp_extend.sh asterisk-tdm2-pstn
extend asterisk-tdm3-pbx ../mon_snmp_extend.sh asterisk-tdm3-pbx
extend asterisk-tdm4-pstn ../mon_snmp_extend.sh asterisk-tdm4-pstn
extend dahdi-kernel ../mon_snmp_extend.sh dahdi-kernel
extend dahdi-asterisk-modulo ../mon_snmp_extend.sh dahdi-asterisk-modulo
extend dahdi-asterisk-channels-problems ../mon_snmp_extend.sh dahdi-asterisk-channels-problems
extend dahdi-asterisk-modulo-isdn ../mon_snmp_extend.sh dahdi-asterisk-modulo-isdn
extend dahdi-asterisk-modulo-r2 ../mon_snmp_extend.sh dahdi-asterisk-modulo-r2
#ENUM
extend fone-enum-resolving ../mon_snmp_extend.sh fone-enum-resolving
#ESTATISTICAS N1 SRC e SRL
extend sr-statistics-n1-10m-completed-calls ../mon_snmp_extend.sh sr-statistics-n1-10m-completed-calls
extend sr-statistics-n1-10m-failed ../mon_snmp_extend.sh sr-statistics-n1-10m-failed
extend sr-statistics-n1-completed-calls ../mon_snmp_extend.sh sr-statistics-n1-completed-calls
extend sr-statistics-n1-failed-calls ../mon_snmp_extend.sh sr-statistics-n1-failed-calls
#ESTATISTICAS N1 PBX-IP
extend pb-statistics-n1-10m-completed-calls ../mon_snmp_extend.sh pb-statistics-n1-10m-completed-calls
extend pb-statistics-n1-10m-failed ../mon_snmp_extend.sh pb-statistics-n1-10m-failed
extend pb-statistics-n1-completed-calls ../mon_snmp_extend.sh pb-statistics-n1-completed-calls
extend pb-statistics-n1-failed-calls ../mon_snmp_extend.sh pb-statistics-n1-failed-calls
#ESTATISTICAS N2
extend statistics-n2-10m ../mon_snmp_extend.sh statistics-n2-10m
extend statistics-n2-10m-failed ../mon_snmp_extend.sh statistics-n2-10m-failed
extend statistics-n2-completed-calls ../mon_snmp_extend.sh statistics-n2-completed-calls
extend statistics-n2-failed-calls ../mon_snmp_extend.sh statistics-n2-failed-calls
#KHOMP
extend khomp-summary-serial ../mon_snmp_extend.sh khomp-summary-serial
extend khomp-tdm1 ../mon_snmp_extend.sh khomp-tdm1
extend khomp-tdm2 ../mon_snmp_extend.sh khomp-tdm2
extend khomp-tdm3 ../mon_snmp_extend.sh khomp-tdm3
extend khomp-tdm4 ../mon_snmp_extend.sh khomp-tdm4
extend khomp-summary-driver ../mon_snmp_extend.sh khomp-summary-driver
extend asterisk-kommuter ../mon_snmp_extend.sh asterisk-kommuter
extend asterisk-ebs ../mon_snmp_extend.sh asterisk-ebs
extend asterisk-channels-problems ../mon_snmp_extend.sh asterisk-channels-problems
extend asterisk-channels-problems-khomp ../mon_snmp_extend.sh asterisk-channels-problems-khomp
extend asterisk-channels-problems-khomp-failure ../mon_snmp_extend.sh asterisk-channels-problems-khomp-failure
#Linux
extend users-connected ../mon_snmp_extend.sh users-connected
#MEDIA_PROXY
extend proc-media-relay ../mon_snmp_extend.sh proc-media-relay
extend proc-media-dispatche ../mon_snmp_extend.sh proc-media-dispatche
#MYSQL
extend proc-mysqld ../mon_snmp_extend.sh proc-mysqld
#NTP
extend proc-ntpd ../mon_snmp_extend.sh proc-ntpd
#LDAP
extend proc-slapd ../mon_snmp_extend.sh proc-slapd
#OPENSIPS
extend opensips-status ../mon_snmp_extend.sh opensips-status
extend opensips-uptime ../mon_snmp_extend.sh opensips-uptime
extend opensips-status-peer ../mon_snmp_extend.sh opensips-status-peer
extend opensips-calls-active ../mon_snmp_extend.sh opensips-calls-active
extend opensips-port-5060 ../mon_snmp_extend.sh opensips-port-5060
extend opensips-port-5080 ../mon_snmp_extend.sh opensips-port-5080
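Because each extension is indexed by its name (the quoted string in the snmpwalk output), every extend name in a list this long has to be unique. A small sketch of a check for repeated names, reading a configuration from standard input; the function name and the sample lines are made up for this illustration:

```shell
#!/usr/bin/env bash
# Sketch: list extend names that appear more than once in an snmpd.conf
# read from standard input. Prints nothing when every name is unique.
# duplicate_extends is a hypothetical helper.

duplicate_extends() {
    awk '$1 == "extend" { print $2 }' | sort | uniq -d
}

# Hypothetical sample configuration, not the production file.
duplicate_extends <<'EOF'
#DAHDI
extend tdm1 /usr/local/bin/check.sh tdm1
extend tdm2 /usr/local/bin/check.sh tdm2
extend tdm1 /usr/local/bin/check.sh tdm3
EOF
```

On a real host the same function could be fed the live file, for example duplicate_extends < /etc/snmp/snmpd.conf.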

In the script, an initial rule obtains the MySQL password for each device and checks what type of device is being monitored. That part will not be presented here. Using regular expressions (regex) and Linux commands, each check returns a string, an integer or a boolean that best represents that service or piece of data. Each item is defined by three fields: app, type and check_cmd. The app field must match the argument used in snmpd.conf. The type field is 0 or 1, and check_cmd is the command to be executed. In the Shell script these fields are available as ${BASH_REMATCH[1]}, ${BASH_REMATCH[2]} and ${BASH_REMATCH[3]}, respectively.
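As an illustration of how the three fields can end up in BASH_REMATCH, here is a minimal sketch. The regular expression and the function name parse_entry are assumptions, since the parsing code of the original script is not shown:

```shell
#!/usr/bin/env bash
# Sketch: split one "app,type,check_cmd" entry with a bash regular expression.
# The pattern is an assumption; the original script's regex is not shown.

parse_entry() {
    local line=$1
    # Group 1: app (no commas), group 2: type (0 or 1), group 3: the rest
    local re='^([^,]+),([01]),(.*)$'
    if [[ $line =~ $re ]]; then
        printf '%s|%s|%s\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        return 1   # comment lines and malformed entries do not match
    fi
}

parse_entry 'proc-ntpd,0,ntpd'
parse_entry 'demo,1,cut -d, -f1'
```

Letting the third group take everything after the second comma keeps commands that contain commas themselves, such as cut -d, -f1, in one piece.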

A type of 0 performs a system process check, filtering by the specific application. The following command is executed for the value 0:

if [[ ${BASH_REMATCH[2]} -eq 0 ]]
then
    # Process type: count the running processes with this command name
    VALUE=$(/bin/ps -C "${BASH_REMATCH[3]}" | grep "${BASH_REMATCH[3]}" | /usr/bin/wc -l 2>/dev/null)
fi

With the value 1, the command stored in check_cmd is executed instead.

if [[ ${BASH_REMATCH[2]} -eq 1 ]]
then
    # Argument execution: eval is needed because check_cmd contains pipes and quotes
    VALUE=$(eval "${BASH_REMATCH[3]}")
fi
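Most check_cmd values contain pipes and quotes, so the way the stored string is executed matters. A short sketch with a harmless stand-in command (not one of the real checks) shows the difference between expanding the string and evaluating it:

```shell
#!/usr/bin/env bash
# Sketch: a command string containing a pipe, run two different ways.
# cmd is a harmless stand-in for a real check_cmd entry.

cmd='echo one two | tr a-z A-Z'

# Plain expansion: the string is only split into words, so "|" and what
# follows become literal arguments to echo.
plain=$($cmd)

# eval: the string is parsed again as shell code, so the pipe is a real pipe.
evaluated=$(eval "$cmd")

printf 'plain:     %s\n' "$plain"
printf 'evaluated: %s\n' "$evaluated"
```

Since eval runs whatever the string holds, the file that stores the commands should be writable only by trusted users, especially when the commands call sudo.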

With both check types in place, the entries in the Shell script take the following format:

  • mon_snmp_extend.sh
#APACHE2
proc-apache2,0,apache2
apache-port-80,1,"sudo /bin/netstat -lnptu | grep -i listen | grep -i apache | grep -c :80"
apache-port-443,1,"sudo /bin/netstat -lnptu | grep -i listen | grep -i apache | grep -c :443"
apache-port-8443,1,"sudo /bin/netstat -lnptu | grep -i listen | grep -i apache | grep -c :8443"
#ASTERISK
proc-asterisk,0,asterisk
asterisk-uptime,1,"sudo /usr/sbin/asterisk -rx \"core show uptime\""
asterisk-sip-siprouter1,1,"sudo /usr/sbin/asterisk -rx 'sip show peers like SIPRouter-Local1' | grep SIPRouter | awk '{ print \$5 }' | grep -c OK"
asterisk-sip-siprouter2,1,"sudo /usr/sbin/asterisk -rx 'sip show peers like SIPRouter-Local2' | grep SIPRouter | awk '{ print \$5 }' | grep -c OK"
asterisk-calls-active,1,"sudo /usr/sbin/asterisk -rx 'core show channels count' | grep 'active call' | awk '{ print \$1 }'"
asterisk-calls-processed,1,"sudo /usr/sbin/asterisk -rx 'core show channels count' | grep 'calls processed' | awk '{ print \$1 }'"
asterisk-channels-active,1,"sudo /usr/sbin/asterisk -rx 'core show channels count' | grep 'active channels' | awk '{ print \$1 }'"
asterisk-sip-udp-5071,1,"sudo /bin/netstat -nlup | egrep udp | grep -c 5071"
#BIND9
proc-bind9,0,named
#DAHDI
asterisk-tdm1-pbx,1,'sudo /usr/sbin/asterisk -rx "dahdi show status" | grep " Card 0 Span 1 " | grep OK | wc -l'
asterisk-tdm2-pstn,1,'sudo /usr/sbin/asterisk -rx "dahdi show status" | grep " Card 0 Span 2 " | grep OK | wc -l'
asterisk-tdm3-pbx,1,'sudo /usr/sbin/asterisk -rx "dahdi show status" | grep " Card 1 Span 1 " | grep OK | wc -l'
asterisk-tdm4-pstn,1,'sudo /usr/sbin/asterisk -rx "dahdi show status" | grep " Card 1 Span 2 " | grep OK | wc -l'
dahdi-kernel,1,'sudo /sbin/lsmod | grep "dahdi" | wc -l'
dahdi-asterisk-modulo,1,'sudo /usr/sbin/asterisk -rx "module show like chan_dahdi.so" | grep chan_dahdi | wc -l'
dahdi-asterisk-channels-problems,1,"sudo /usr/sbin/asterisk -rx 'dahdi show channels' | awk '{print \$4}' | egrep -cv \"Language|In|default\""
dahdi-asterisk-modulo-isdn,1,'sudo /usr/sbin/asterisk -rx "pri show channels" | grep -v Span | grep -v PRI | grep -v Idle | grep -v Alerting | wc -l '
dahdi-asterisk-modulo-r2,1,'sudo /usr/sbin/asterisk -rx "mfrc2 show channels" | grep BR | grep -v IDLE | wc -l '
#ENUM
fone-enum-resolving,1,"sudo /usr/local/voip-scripts/cron/testa_bind.sh | grep OK | wc -l"
#Estatisticas N1 - SRC e SRL
sr-statistics-n1-10m-completed-calls,1,"sudo mysql -u root -p$MYSQL_SENHA -e 'select count(*) from voip.completed_calls WHERE _datetime > NOW() - INTERVAL 10 minute;' | grep '[0-9]'"
sr-statistics-n1-10m-failed,1,"sudo mysql -u root -p$MYSQL_SENHA -e 'select count(*) from voip.not_completed_calls WHERE _datetime > NOW() - INTERVAL 10 minute AND _code IN (503, 500, 603, 480, 403, 401);' | grep '[0-9]'"
sr-statistics-n1-completed-calls,1,"sudo mysql -u root -p$MYSQL_SENHA -e 'select count(*) from voip.completed_calls;' | grep '[0-9]'"
sr-statistics-n1-failed-calls,1,"sudo mysql -u root -p$MYSQL_SENHA -e 'select count(*) from voip.not_completed_calls;' | grep '[0-9]'"
#Estatisticas N1 - PBX-IP
pb-statistics-n1-10m-completed-calls,1,"sudo mysql -u root -p$MYSQL_SENHA -e 'select count(*) from voip.completed_calls WHERE _datetime > NOW() - INTERVAL 10 minute;' | grep '[0-9]'"
pb-statistics-n1-10m-failed,1,"sudo mysql -u root -p$MYSQL_SENHA -e 'select count(*) from voip.not_completed_calls WHERE _datetime > NOW() - INTERVAL 10 minute AND _code IN (503, 500, 603, 480, 403, 401);' | grep '[0-9]'"
pb-statistics-n1-completed-calls,1,"sudo mysql -u root -p$MYSQL_SENHA -e 'select count(*) from voip.completed_calls;' | grep '[0-9]'"
pb-statistics-n1-failed-calls,1,"sudo mysql -u root -p$MYSQL_SENHA -e 'select count(*) from voip.not_completed_calls;' | grep '[0-9]'"
#Estatisticas N2
statistics-n2-10m,1,"sudo mysql -u root -p$MYSQL_SENHA -e 'select count(*) from asterisk.cdr WHERE calldate > NOW() - INTERVAL 10 minute AND disposition = \"ANSWERED\";' | grep '[0-9]'"
statistics-n2-10m-failed,1,"sudo mysql -u root -p$MYSQL_SENHA -e 'select count(*) from asterisk.cdr WHERE calldate > NOW() - INTERVAL 10 minute AND disposition IN (\"CONGESTION\", \"FAILED\");' | grep '[0-9]'"
statistics-n2-completed-calls,1,"sudo mysql -u root -p$MYSQL_SENHA -e 'select count(*) from asterisk.cdr WHERE disposition = \"ANSWERED\";' | grep '[0-9]'"
statistics-n2-failed-calls,1,"sudo mysql -u root -p$MYSQL_SENHA -e 'select count(*) from asterisk.cdr WHERE disposition != \"ANSWERED\";' | grep '[0-9]'"
#KHOMP
khomp-summary-serial,1,"sudo asterisk -rx \"khomp summary\" | grep serial | cut -d\"'\" -f2"
khomp-tdm1,1,"sudo asterisk -rx \"khomp links show concise\" | grep L00 | cut -d: -f2 | cut -d, -f1 | sed 's/kes//'"
khomp-tdm2,1,"sudo asterisk -rx \"khomp links show concise\" | grep L01 | cut -d: -f2 | cut -d, -f1 | sed 's/kes//'"
khomp-tdm3,1,"sudo asterisk -rx \"khomp links show concise\" | grep L02 | cut -d: -f2 | cut -d, -f1 | sed 's/kes//'"
khomp-tdm4,1,"sudo asterisk -rx \"khomp links show concise\" | grep L03 | cut -d: -f2 | cut -d, -f1 | sed 's/kes//'"
khomp-summary-driver,1,"sudo asterisk -rx \"khomp summary\" | grep driver | cut -d\" \" -f7"
asterisk-kommuter,1,'sudo /bin/cat /etc/asterisk/cli.conf | grep "^khomp kommuter on" | wc -l'
asterisk-ebs,1,'sudo /usr/sbin/asterisk -rx "khomp summary" | grep -o UP | wc -l'
asterisk-channels-problems-khomp,1,"sudo /usr/sbin/asterisk -rx 'khomp channels show' | awk '{ print \$9 }' | egrep -cv \"Free|status|Busy|-|^$|\|\" "
asterisk-channels-problems-khomp-failure,1,"sudo /usr/sbin/asterisk -rx 'khomp channels show' | awk '{ print \$7 }' | grep -c 'Failure'"
#Linux
users-connected,1,"sudo w | grep user | cut -d, -f3 | awk '{print \$1 }'"
#MEDIA_PROXY
proc-media-relay,0,media-relay
proc-media-dispatche,0,media-dispatche
#MYSQL
proc-mysqld,0,mysqld
#NTP
proc-ntpd,0,ntpd
#LDAP
proc-slapd,0,slapd
#OPENSIPS
opensips-status,1,"sudo /bin/ps -C opensips | grep opensips | /usr/bin/wc -l"
opensips-uptime,1,"sudo opensipsctl fifo uptime | grep since | sed -s 's/:: /@/' | cut -d@ -f2"
opensips-status-peer,1,"sudo mysql -u root -pmysql.root -e 'select ((select count(*) from voip.monitora_peer m INNER JOIN voip.historico_peer h ON h.id = m.id WHERE date > NOW() - INTERVAL 10 MINUTE AND (pkt_transfer != pkt_received OR ping_max > 100)) / (select count(*) from voip.monitora_peer m INNER JOIN voip.historico_peer h ON h.id = m.id WHERE date > NOW() - INTERVAL 10 MINUTE)) as percentage;' | grep '[0-9.]'"
opensips-calls-active,1,"mysql -u root -pmysql.root opensips -e 'select count(*) from dialog;' | grep '[0-9]'"
opensips-port-5060,1,"sudo /bin/netstat -nlup | egrep udp | grep -c 5060"
opensips-port-5080,1,"sudo /bin/netstat -nlup | egrep udp | grep -c 5080"
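One caveat with the port checks: grep matches substrings, so a pattern like :80 also counts a line for port 8080. A sketch of a stricter count, assuming the usual netstat -lnptu layout in which the fourth column is the local address (the function name and the sample lines are made up for this illustration):

```shell
#!/usr/bin/env bash
# Sketch: count listening sockets on an exact port from "netstat -lnptu"-style
# lines on standard input. Column 4 is assumed to be the local address.
# count_listeners is a hypothetical helper.

count_listeners() {
    local port=$1
    awk -v port="$port" '
        $4 ~ (":" port "$") { n++ }   # local address must END in :port
        END { print n + 0 }
    '
}

# Sample lines written by hand for illustration, not captured output.
count_listeners 80 <<'EOF'
tcp   0   0 0.0.0.0:80     0.0.0.0:*   LISTEN   1201/apache2
tcp   0   0 0.0.0.0:8080   0.0.0.0:*   LISTEN   1450/java
tcp6  0   0 :::443         :::*        LISTEN   1201/apache2
EOF
```

With the sample input only the first line counts for port 80; the 8080 listener is ignored because its local address does not end in :80.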

Zabbix

With the SNMP configuration file and the script in place, all that remained was to obtain the OID of each item and create the corresponding items in Zabbix. Rather than creating a single template with every item and applying it to all the equipment, I created one template per application: Asterisk, NTP, OpenSIPS, MySQL, etc. With the service templates created, I built key templates for each type of equipment and linked the service templates to the equipment templates, forming a template hierarchy.
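With dozens of extend lines, listing the OID for each item by hand gets tedious. The sketch below prints one name and symbolic OID per extend line, using the nsExtendOutput1Line form seen in the snmpwalk output earlier. It assumes the Zabbix server has the NET-SNMP-EXTEND-MIB file available to resolve symbolic names; if it does not, each name can be translated to numeric form with snmptranslate -On. The function name is made up for this illustration:

```shell
#!/usr/bin/env bash
# Sketch: print one "name<TAB>OID" pair per extend line read from stdin,
# using the symbolic nsExtendOutput1Line form.
# list_extend_items is a hypothetical helper.

list_extend_items() {
    awk '$1 == "extend" {
        printf "%s\tNET-SNMP-EXTEND-MIB::nsExtendOutput1Line.\"%s\"\n", $2, $2
    }'
}

list_extend_items <<'EOF'
#NTP
extend proc-ntpd ../mon_snmp_extend.sh proc-ntpd
EOF
```

The resulting list can then be pasted into the SNMP OID field of each item when building the per-application templates.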

Template hierarchy in Zabbix

In this way, I built a simple yet extremely efficient and scalable monitoring system. SNMP extend made it easier to obtain application-specific information, and it fit perfectly into the existing device structure, in which each device has its own peculiarities.

Rafael Moraes Monteiro

Junior Support Analyst at CAM Tecnologia | MEng student in Cybersecurity -PPEE - UnB | LPIC-1