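Notes on Nagios with openSUSE Leap 42.2
Goals: monitor the current project's equipment as well as my own
Check whether public services and hosts are alive
OS: openSUSE Leap 42.2
Install the nagios packages. The plugins package has been renamed and is now called monitoring-plugins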
# zypper install nagios monitoring-plugins
Set the nagiosadmin password
The traditional (interactive) way
# htpasswd2 -c /etc/nagios/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
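Because I plan to automate this with Ansible later, I also tried the -b and -i options. Interestingly, in my tests -b or -i had to be merged with -c into a single option group: -b -c did not work, but -bc did
-b (batch mode): the password goes right after the user name ( note that it is then visible on the command line )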
# htpasswd2 -bc /etc/nagios/htpasswd.users nagiosadmin test
Adding password for user nagiosadmin
-i (read from stdin): feed the password in through STDIN
# echo test | htpasswd2 -ic /etc/nagios/htpasswd.users nagiosadmin
Adding password for user nagiosadmin
Check whether nagios is enabled at boot
# systemctl is-enabled nagios
nagios.service is not a native service, redirecting to systemd-sysv-install
Executing /usr/lib/systemd/systemd-sysv-install is-enabled nagios
disabled
Enable nagios at boot
# systemctl enable nagios
nagios.service is not a native service, redirecting to systemd-sysv-install
Executing /usr/lib/systemd/systemd-sysv-install enable nagios
# systemctl is-enabled nagios
nagios.service is not a native service, redirecting to systemd-sysv-install
Executing /usr/lib/systemd/systemd-sysv-install is-enabled nagios
enabled
Try to start apache2; at this point an error appears
# systemctl restart apache2.service
Job for apache2.service failed because the control process exited with error code. See "systemctl status apache2.service" and "journalctl -xe" for details.
Checking with status shows the cause: the access-control syntax differs between Apache 2.4 and Apache 2.2
# systemctl status apache2.service
Dec 31 11:11:50 template start_apache2[7143]: AH00526: Syntax error on line 15 of /etc/apache2/conf.d/nagios.conf:
Dec 31 11:11:50 template start_apache2[7143]: Invalid command 'Order', perhaps misspelled or defined by a module not included in the server configuration
Dec 31 11:11:51 template systemd[1]: apache2.service: Main process exited, code=exited, status=1/FAILURE
For reference
- 2.2 uses Order, but 2.4 uses Require
Fix: enable the access_compat module ( on openSUSE / SUSE, authz_host is already enabled by default )
# a2enmod mod_access_compat
List the enabled apache2 modules
# apache2ctl -M
- Enabling the module results in this line in /etc/apache2/sysconfig.d/loadmodule.conf: LoadModule access_compat_module /usr/lib64/apache2-prefork/mod_access_compat.so
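To see which lines of a config file still use the 2.2 syntax, a small grep helper can be enough. This is only a minimal sketch: the function name find_legacy_acl and the sample file are made up for illustration, and the sample is not the real nagios.conf.

```shell
# Hypothetical helper: print (with line numbers) the Apache 2.2-style
# access directives that Apache 2.4 rejects unless mod_access_compat is loaded.
find_legacy_acl() {
    grep -nEi '^[[:space:]]*(Order|Allow[[:space:]]+from|Deny[[:space:]]+from)[[:space:]]' "$1"
}

# Demo on a made-up sample file (not the real /etc/apache2/conf.d/nagios.conf).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Directory "/usr/lib/nagios/cgi">
    Options ExecCGI
    Order allow,deny
    Allow from all
</Directory>
EOF
find_legacy_acl "$sample"
rm -f "$sample"
```

If you would rather not load access_compat, the 2.4 spelling of "Order allow,deny" plus "Allow from all" is "Require all granted".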
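Restart apache2
# systemctl restart apache2.service
Check the status
# systemctl status apache2.service
Start nagios
# systemctl start nagios
Check the status
# systemctl status nagios
Open the http service in the firewall
# yast2 firewall
By default, once nagios is running it checks the local http service and raises a warning if there is no default web page. Also, if many items are monitored, the total process count exceeds its threshold. So I adjusted /etc/nagios/objects/localhost.cfg
# vi /etc/nagios/objects/localhost.cfg
Comment out the HTTP service and the linux-servers hostgroup, and adjust Total Processes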
# 2014/1/8 edit by sakana, temp disable HTTP monitor
#define service{
# use local-service ; Name of service template to use
# host_name localhost
# service_description HTTP
# check_command check_http
# notifications_enabled 0
# }
# Define an optional hostgroup for Linux machines
#
#define hostgroup{
# hostgroup_name linux-servers ; The name of the hostgroup
# alias Linux Servers ; Long name of the group
# members localhost ; Comma separated list of hosts that belong to this group
# }
# 2014/1/8 edit by sakana change check_local_procs from 250 to 400, 400 to 800
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Total Processes
check_command check_local_procs!400!800!RSZDT
}
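For automation, the threshold change can also be done with sed instead of an editor. A minimal sketch, assuming the stock line reads check_local_procs!250!400!RSZDT as the comment above implies; the function name set_procs_thresholds is made up.

```shell
# Hypothetical helper: rewrite the warning/critical thresholds of
# check_local_procs in a Nagios object file. Edits the file in place
# and leaves a .bak copy next to it.
set_procs_thresholds() {
    file=$1 warn=$2 crit=$3
    sed -E -i.bak "s/(check_local_procs)![0-9]+![0-9]+!/\1!${warn}!${crit}!/" "$file"
}

# Demo on a scratch file rather than /etc/nagios/objects/localhost.cfg.
cfg=$(mktemp)
printf 'check_command check_local_procs!250!400!RSZDT\n' > "$cfg"
set_procs_thresholds "$cfg" 400 800
cat "$cfg"
rm -f "$cfg" "$cfg.bak"
```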
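The above is just an explanation. If your needs match mine and you would rather not edit by hand, you can fetch the version I already modified
( really this is for my own automation )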
# wget https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/localhost.cfg
--2016-12-31 12:21:29-- https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/localhost.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.100.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.100.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5546 (5.4K) [text/plain]
Saving to: ‘localhost.cfg’
100%[=====================================================================================================================>] 5,546 --.-K/s in 0s
2016-12-31 12:21:30 (28.9 MB/s) - ‘localhost.cfg’ saved [5546/5546]
localhost.cfg is now in the current directory
# ls
bin Desktop Documents Downloads inst-sys localhost.cfg Music Pictures Public Templates Videos
Replace /etc/nagios/objects/localhost.cfg with localhost.cfg ( a little voice says: remember to back up first? )
# mv localhost.cfg /etc/nagios/objects/localhost.cfg
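About that backup: one way to make it automatic is a small wrapper around mv. This is a sketch only, and replace_with_backup is a made-up name, not part of any package.

```shell
# Hypothetical helper: keep a timestamped copy of the destination
# (if it exists) before moving the new file over it.
replace_with_backup() {
    src=$1 dest=$2
    if [ -e "$dest" ]; then
        cp -p "$dest" "$dest.$(date +%Y%m%d%H%M%S).bak" || return 1
    fi
    mv "$src" "$dest"
}

# Intended use on the Nagios server (as root):
#   replace_with_backup localhost.cfg /etc/nagios/objects/localhost.cfg
```

The copy ends in .bak rather than .cfg, so a cfg_dir= scan, which only picks up .cfg files, would not load it a second time.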
Change the notification e-mail
# vi /etc/nagios/objects/contacts.cfg
Change the default e-mail address
define contact{
contact_name nagiosadmin ; Short name of user
use generic-contact ; Inherit default values from generic-contact template (defined above)
alias Nagios Admin ; Full name of user
email your_account@your_mail_domain ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
}
Verify the Nagios configuration
# nagios -v /etc/nagios/nagios.cfg
Restart Nagios
# systemctl restart nagios.service
The result is as follows (screenshot omitted)
Install the nagios-nrpe packages
# zypper install nrpe monitoring-plugins-nrpe
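Download the template file I created earlier
# wget https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/templates.cfg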
--2016-12-31 16:13:07-- https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/templates.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.100.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.100.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19833 (19K) [text/plain]
Saving to: ‘templates.cfg’
100%[=====================================================================================================================>] 19,833 --.-K/s in 0.06s
2016-12-31 16:13:07 (324 KB/s) - ‘templates.cfg’ saved [19833/19833]
Confirm that templates.cfg is in the current directory
# ls
Desktop Documents Downloads Music Pictures Public Templates Videos bin inst-sys templates.cfg
Move it over the original configuration file, overwriting it
# mv templates.cfg /etc/nagios/objects/templates.cfg
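Download the commands.cfg template I created earlier
# wget https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/commands.cfg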
--2016-12-31 16:18:12-- https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/commands.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.100.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.100.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7876 (7.7K) [text/plain]
Saving to: ‘commands.cfg’
100%[=====================================================================================================================>] 7,876 --.-K/s in 0s
2016-12-31 16:18:13 (60.2 MB/s) - ‘commands.cfg’ saved [7876/7876]
Confirm that commands.cfg has been downloaded
# ls
Desktop Documents Downloads Music Pictures Public Templates Videos bin commands.cfg inst-sys
Move it over the original configuration file, overwriting it
# mv commands.cfg /etc/nagios/objects/commands.cfg
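Create the directories that will later hold the configuration files for servers, ordinary workstations, and the other categories
# mkdir /etc/nagios/servers
# mkdir /etc/nagios/pcs
# mkdir /etc/nagios/racks
# mkdir /etc/nagios/switches
# mkdir /etc/nagios/projects
# mkdir /etc/nagios/labs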
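The six mkdir calls can also be written as a loop, which is easier to re-run from a script. A minimal sketch; the function name make_nagios_dirs is made up, and the demo runs against a scratch directory instead of /etc/nagios.

```shell
# Hypothetical helper: create the per-category config directories under a
# base directory (the notes use /etc/nagios). mkdir -p makes re-runs harmless.
make_nagios_dirs() {
    base=${1:-/etc/nagios}
    for d in servers pcs racks switches projects labs; do
        mkdir -p "$base/$d" || return 1
    done
}

# Dry run against a scratch directory.
scratch=$(mktemp -d)
make_nagios_dirs "$scratch"
ls "$scratch"
rm -rf "$scratch"
```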
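Fetch the pre-written linuxPublic.cfg and windowsPublic.cfg, the templates for public services, and move them to /etc/nagios/objects
# wget https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/linuxPublic.cfg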
--2016-12-31 16:37:12-- https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/linuxPublic.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.100.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.100.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2652 (2.6K) [text/plain]
Saving to: ‘linuxPublic.cfg’
100%[=====================================================================================================================>] 2,652 --.-K/s in 0s
2016-12-31 16:37:12 (28.7 MB/s) - ‘linuxPublic.cfg’ saved [2652/2652]
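# mv linuxPublic.cfg /etc/nagios/objects/
# wget https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/windowsPublic.cfg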
--2016-12-31 16:38:37-- https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/windowsPublic.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.100.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.100.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2345 (2.3K) [text/plain]
Saving to: ‘windowsPublic.cfg’
100%[=====================================================================================================================>] 2,345 --.-K/s in 0s
2016-12-31 16:38:37 (23.4 MB/s) - ‘windowsPublic.cfg’ saved [2345/2345]
# mv windowsPublic.cfg /etc/nagios/objects/
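Fetch the pre-written switchSimple.cfg and rackHost.cfg, the templates for switches and rack machines, and move them to /etc/nagios/objects
# wget https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/switchSimple.cfg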
--2016-12-31 16:41:58-- https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/switchSimple.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.100.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.100.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3365 (3.3K) [text/plain]
Saving to: ‘switchSimple.cfg’
100%[=====================================================================================================================>] 3,365 --.-K/s in 0s
2016-12-31 16:41:58 (29.6 MB/s) - ‘switchSimple.cfg’ saved [3365/3365]
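# mv switchSimple.cfg /etc/nagios/objects/
# wget https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/rackHost.cfg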
--2016-12-31 16:43:38-- https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/rackHost.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.100.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.100.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2716 (2.7K) [text/plain]
Saving to: ‘rackHost.cfg’
100%[=====================================================================================================================>] 2,716 --.-K/s in 0s
2016-12-31 16:43:38 (18.2 MB/s) - ‘rackHost.cfg’ saved [2716/2716]
# mv rackHost.cfg /etc/nagios/objects/
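Fetch the pre-written windows.cfg, the template for Windows machines, and move it to /etc/nagios/objects
# wget https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/windows.cfg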
--2016-12-31 16:53:25-- https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/windows.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.100.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.100.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4023 (3.9K) [text/plain]
Saving to: ‘windows.cfg’
100%[=====================================================================================================================>] 4,023 --.-K/s in 0s
2016-12-31 16:53:25 (45.9 MB/s) - ‘windows.cfg’ saved [4023/4023]
# mv windows.cfg /etc/nagios/objects/
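Fetch the pre-written nagios.cfg, whose main change is the use of cfg_dir=, and move it to /etc/nagios
# wget https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/nagios.cfg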
--2016-12-31 16:48:01-- https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/nagios.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.100.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.100.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 44650 (44K) [text/plain]
Saving to: ‘nagios.cfg’
100%[=====================================================================================================================>] 44,650 --.-K/s in 0.1s
2016-12-31 16:48:01 (397 KB/s) - ‘nagios.cfg’ saved [44650/44650]
# mv nagios.cfg /etc/nagios/
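If you would rather keep your own nagios.cfg, the cfg_dir= lines can be added by hand or with a small idempotent helper. A sketch under the assumption that one cfg_dir line per directory is what you want (I have not reproduced the downloaded file here); ensure_cfg_dir is a made-up name.

```shell
# Hypothetical helper: append "cfg_dir=<dir>" to a nagios.cfg-style file
# unless that exact line is already present.
ensure_cfg_dir() {
    cfg=$1 dir=$2
    grep -qxF "cfg_dir=$dir" "$cfg" || printf 'cfg_dir=%s\n' "$dir" >> "$cfg"
}

# Demo on a scratch file; the names match the directories created earlier.
cfg_demo=$(mktemp)
for name in servers pcs racks switches projects labs; do
    ensure_cfg_dir "$cfg_demo" "/etc/nagios/$name"
done
cat "$cfg_demo"
rm -f "$cfg_demo"
```

With cfg_dir, Nagios reads every .cfg file in the directory, so adding a host later only means dropping a file in place and restarting.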
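Verify the Nagios configuration
# nagios -v /etc/nagios/nagios.cfg
There may be warnings here, because for the rack hosts we only check the host and not any services
Restart Nagios
# systemctl restart nagios.service
The result is as follows (screenshot omitted)
Part II: Nagios client -- setting up a Linux server as a client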
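On the client
1. Install the nagios-nrpe packages
# zypper install nrpe monitoring-plugins-nrpe monitoring-plugins
2. Configure nagios-nrpe
(It is also worth confirming that /etc/services contains the entry nrpe 5666/tcp # nagios nrpe)
# grep 5666 /etc/services
Edit the configuration file to allow connections from the Nagios server ( the file location has changed; it is now /etc/nrpe.cfg )
# vi /etc/nrpe.cfg
allowed_hosts=127.0.0.1,192.168.100.199
(Adjust this to the actual IP; it may be 10.x.x.x )
Note that 127.0.0.1 must be followed by a comma, and there must be no space before the server IP
Otherwise the SSL handshake cannot be established
(A note on this: when nrpe is started SystemV-style, a bad entry always ends in "could not complete SSL handshake", whereas under xinetd it does not)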
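Because a stray space here is easy to miss, a quick check before restarting nrpe can save some head-scratching. A minimal sketch; check_allowed_hosts is a made-up name, and it only looks at the last uncommented allowed_hosts= line.

```shell
# Hypothetical helper: succeed only if the file has an allowed_hosts= line
# and that line contains no spaces or tabs.
check_allowed_hosts() {
    line=$(grep '^allowed_hosts=' "$1" | tail -n 1)
    [ -n "$line" ] || return 1
    if printf '%s\n' "$line" | grep -q '[[:space:]]'; then
        return 1
    fi
    return 0
}

# Demo on a scratch file instead of /etc/nrpe.cfg.
demo=$(mktemp)
echo 'allowed_hosts=127.0.0.1,192.168.100.199' > "$demo"
if check_allowed_hosts "$demo"; then echo "allowed_hosts looks fine"; fi
rm -f "$demo"
```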
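Start NRPE
# systemctl start nrpe
Check the status ( the log shows that /run/nrpe/nrpe.pid cannot be written, but in my tests so far this does not affect queries; if you are uneasy about it, create /run/nrpe and give the nagios user write permission )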
# systemctl status nrpe
● nrpe.service - Daemon to remotely execute Nagios plugins
Loaded: loaded (/usr/lib/systemd/system/nrpe.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2016-12-31 19:17:50 CST; 3min 50s ago
Process: 3457 ExecStart=/usr/sbin/nrpe -c /etc/nrpe.cfg -d (code=exited, status=0/SUCCESS)
Main PID: 3461 (nrpe)
Tasks: 1 (limit: 512)
CGroup: /system.slice/nrpe.service
└─3461 /usr/sbin/nrpe -c /etc/nrpe.cfg -d
Dec 31 19:17:50 template systemd[1]: Starting Daemon to remotely execute Nagios plugins...
Dec 31 19:17:50 template systemd[1]: Started Daemon to remotely execute Nagios plugins.
Dec 31 19:17:50 template nrpe[3461]: Starting up daemon
Dec 31 19:17:50 template nrpe[3461]: Cannot write to pidfile '/run/nrpe/nrpe.pid' - check your privileges.
Enable NRPE at boot
# systemctl enable nrpe
Confirm that it is enabled at boot
# systemctl is-enabled nrpe
enabled
On the client
Run check_nrpe as a test; on success it should print the NRPE version
# /usr/lib/nagios/plugins/check_nrpe -H 127.0.0.1
NRPE v2.15
*************************************************************
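On the server
Test nagios-nrpe against the nagios client; on success it should print the NRPE version
# /usr/lib/nagios/plugins/check_nrpe -H 192.168.100.100
NRPE v2.15
(Make sure the firewall is either off or lets nrpe through. For testing you can turn the firewall off with # yast2 firewall or with the command # rcSuSEfirewall2 stop )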
*************************************************************
On the client
Add the relevant nrpe commands ( there is no hda1 any more )
# vi /etc/nrpe.cfg
Add
#command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
command[check_sda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1
command[check_sda2]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda2
command[check_ssh]=/usr/lib/nagios/plugins/check_ssh 127.0.0.1
command[check_smtp]=/usr/lib/nagios/plugins/check_smtp 127.0.0.1
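The name inside the brackets is what gets passed to check_nrpe -c later, so it helps to be able to list the names a config file actually defines. A minimal sketch; list_nrpe_commands is a made-up name and the sample file is shortened for the demo.

```shell
# Hypothetical helper: print the names of the active (uncommented)
# command[...] definitions in an nrpe.cfg-style file.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)\]=.*/\1/p' "$1"
}

# Demo on a scratch file instead of /etc/nrpe.cfg.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
command[check_sda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1
command[check_ssh]=/usr/lib/nagios/plugins/check_ssh 127.0.0.1
EOF
list_nrpe_commands "$demo"
rm -f "$demo"
```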
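You can also use the nrpe.cfg I prepared earlier
# wget https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/nrpe.cfg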
--2016-12-31 19:49:29-- https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/nrpe.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.100.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.100.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8474 (8.3K) [text/plain]
Saving to: ‘nrpe.cfg’
100%[===========================================================================>] 8,474 --.-K/s in 0s
2016-12-31 19:49:29 (53.1 MB/s) - ‘nrpe.cfg’ saved [8474/8474]
# vi nrpe.cfg
(If your IP is different, change the IP after allowed_hosts= )
Overwrite the original file
# mv nrpe.cfg /etc/
Restart
# systemctl restart nrpe
Test the commands
# /usr/lib/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_sda2
DISK OK - free space: / 12753 MB (69% inode=97%);| /=5661MB;14732;16573;0;18415
# /usr/lib/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_users
USERS OK - 2 users currently logged in |users=2;5;10;0
# /usr/lib/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_load
OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0; load5=0.000;10.000;25.000;0; load15=0.000;5.000;20.000;0;
# /usr/lib/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_total_procs
PROCS OK: 208 processes | procs=208;250;300;0;
Also test the commands from the server
# /usr/lib/nagios/plugins/check_nrpe -H 192.168.100.100 -c check_sda2
DISK OK - free space: / 12753 MB (69% inode=97%);| /=5661MB;14732;16573;0;18415
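In the check output above, everything after the "|" is performance data in the form label=value;warn;crit;min;max, while the part before it is the human-readable status. If you want to pull the two apart in a script, something like this sketch works; split_plugin_output is a made-up name.

```shell
# Hypothetical helper: split one line of plugin output into the
# human-readable part and the performance data that follows the "|".
split_plugin_output() {
    text=${1%%|*}
    perf=${1#*|}
    if [ "$perf" = "$1" ]; then
        perf=""        # no "|" in the line means no performance data
    fi
    printf 'text: %s\n' "$text"
    printf 'perf: %s\n' "$perf"
}

split_plugin_output 'USERS OK - 2 users currently logged in |users=2;5;10;0'
```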
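Part III: Adding the Linux server (Nagios client) to Nagios monitoring
On the server
Download the linux.cfg template I prepared earlier
# wget https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/linux.cfg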
--2016-12-31 20:14:49-- https://raw.githubusercontent.com/sakanamax/LearnAnsible/master/playbook/general/nagios/files/linux.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.100.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.100.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3664 (3.6K) [text/plain]
Saving to: ‘linux.cfg’
100%[===================================================================>] 3,664 --.-K/s in 0s
2016-12-31 20:14:49 (19.2 MB/s) - ‘linux.cfg’ saved [3664/3664]
Move it to /etc/nagios/objects
# mv linux.cfg /etc/nagios/objects/
**Configure Nagios to load the Linux client configuration file**
Copy the linux.cfg template into a different directory depending on the kind of machine. We created
- /etc/nagios/labs - lab machines
- /etc/nagios/servers - service machines
- /etc/nagios/pcs - PCs
- /etc/nagios/projects - project machines
- /etc/nagios/racks - rack IPMI
- /etc/nagios/switches - switches
Suppose there is a Linux service machine to monitor
Copy the linux.cfg from above into /etc/nagios/servers for that Linux server
# cp /etc/nagios/objects/linux.cfg /etc/nagios/servers/linux100.cfg
Confirm the IP settings are correct
# vi /etc/nagios/servers/linux100.cfg
- address 192.168.3.129 change this to the actual IP
- host_name suseserver129 change this to the actual name
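When there are many hosts, the copy-and-edit step can be scripted. This is a minimal sketch that assumes the template writes the two values as "host_name <name>" and "address <ip>", one directive per line, and that the new name and IP contain no "/" or "&". The function name make_host_cfg and the sample template are made up.

```shell
# Hypothetical helper: copy a template and substitute host_name and address.
make_host_cfg() {
    template=$1 out=$2 name=$3 ip=$4
    sed -E \
        -e "s/^([[:space:]]*host_name[[:space:]]+)[^[:space:];]+/\1${name}/" \
        -e "s/^([[:space:]]*address[[:space:]]+)[^[:space:];]+/\1${ip}/" \
        "$template" > "$out"
}

# Demo with a made-up two-directive template and scratch files.
tpl=$(mktemp); out_cfg=$(mktemp)
cat > "$tpl" <<'EOF'
define host{
        host_name       suseserver129
        address         192.168.3.129
        }
EOF
make_host_cfg "$tpl" "$out_cfg" linux100 192.168.100.100
cat "$out_cfg"
rm -f "$tpl" "$out_cfg"
```

On the real server the call would look like make_host_cfg /etc/nagios/objects/linux.cfg /etc/nagios/servers/linux100.cfg linux100 192.168.100.100, followed by the usual nagios -v check.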
Verify the configuration
# nagios -v /etc/nagios/nagios.cfg
Restart nagios so the change takes effect
# systemctl restart nagios
The result is as follows (screenshot omitted)
All done
~ enjoy it