Nagios Monitoring of a MongoDB Sharded Cluster in Practice




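1. Download the monitoring plugin
The MongoDB plugin can be fetched with: git clone git://github.com/mzupan/nagios-plugin-mongodb.git. I did not have a git environment installed at first, so I asked a fellow netizen, 草根, to download it for me. I then uploaded it to the CSDN resource page; the new location is: http://download.csdn.net/detail/mchdba/8019077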


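2. Add the new MongoDB monitoring commands

The MongoDB service shares a physical machine with a MySQL slave. Basic Nagios monitoring and MySQL service monitoring were already in place, so only the MongoDB commands and services need to be added on top of the existing configuration. For monitoring MySQL with Nagios, see http://blog.itpub.net/26230597/viewspace-760141/ and http://blog.itpub.net/26230597/viewspace-1217246/. The MongoDB monitoring commands to add are as follows: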

[[email protected] objects]# cd /usr/local/nagios/etc/objects
[[email protected] objects]# vim commands.cfg
define command {
    command_name check_mongodb
    command_line $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$
}
define command {
    command_name check_mongodb_database
    command_line $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -d $ARG5$
}
define command {
    command_name check_mongodb_collection
    command_line $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -d $ARG5$ -c $ARG6$
}
define command {
    command_name check_mongodb_replicaset
    command_line $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -r $ARG5$
}
define command {
    command_name check_mongodb_query
    command_line $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -q $ARG5$
}
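To make the link between these command definitions and the check_command lines used later more concrete, here is a minimal Python sketch of how the "!"-separated arguments fill the $ARG1$ to $ARG4$ macros. It only imitates the substitution for illustration and is not Nagios's own code; the $USER1$ directory and the host address are assumed values.

```python
import re

# Command template copied from the check_mongodb definition in commands.cfg.
COMMAND_LINE = (
    "$USER1$/nagios-plugin-mongodb/check_mongodb.py "
    "-H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$"
)


def expand_check_command(check_command, command_line, macros):
    """Expand a 'name!arg1!arg2...' check_command into a full command line."""
    _name, *args = check_command.split("!")
    values = dict(macros)
    for position, value in enumerate(args, start=1):
        values["ARG%d" % position] = value
    # Replace every $MACRO$ with its value; unknown macros become empty strings.
    return re.sub(r"\$(\w+)\$", lambda m: values.get(m.group(1), ""), command_line)


# Assumed values for illustration only: the plugin directory and the host address.
example_macros = {"USER1": "/usr/local/nagios/libexec", "HOSTADDRESS": "192.0.2.10"}
print(expand_check_command("check_mongodb!connect!30000!2!5", COMMAND_LINE, example_macros))
```

Read this way, check_mongodb!connect!30000!2!5 means action connect, port 30000, warning threshold 2 and critical threshold 5.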


3. Add the MongoDB monitoring services

The MongoDB services also have to be added separately, as follows:
# Check the MongoDB connection time: warn above 2 seconds, critical above 5 seconds
define service{
        host_name dbm1slave1
        service_description Mongo Connect Check
        check_command check_mongodb!connect!30000!2!5
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check MongoDB connection usage: warn when 70% of the available connections are used, critical at 80%
define service{
        host_name dbm1slave1
        service_description Mongo Free Connections
        check_command check_mongodb!connections!27017!70!80
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check MongoDB replication lag, to make sure the primary and the secondary stay in sync
define service{
        host_name dbm1slave1
        service_description Mongo Replication Lag
        check_command check_mongodb!replication_lag!27017!15!30
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check MongoDB memory usage. The thresholds depend on the total memory of the machine running MongoDB
define service{
        host_name dbm1slave1
        service_description Mongo Memory Usage
        check_command check_mongodb!memory!27017!20!28
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check MongoDB mapped memory usage. The thresholds depend on the total memory of the machine running MongoDB
define service{
        host_name dbm1slave1
        service_description Mongo Mapped Memory Usage
        check_command check_mongodb!memory_mapped!27017!20!28
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check the lock time percentage: warn if lock time reaches 5% of mongo's elapsed running time, critical above 10%
define service{
        host_name dbm1slave1
        service_description Mongo Lock Percentage
        check_command check_mongodb!lock!27017!5!10
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check Average Flush Time: check the mongo server's average flush time (warn above 100 ms, critical above 200 ms)
define service{
        host_name dbm1slave1
        service_description Mongo Flush Average
        check_command check_mongodb!flushing!27017!100!200
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check Last Flush Time: warn if the most recent flush took more than 200 ms, critical above 400 ms
define service{
        host_name dbm1slave1
        service_description Mongo Last Flush Time
        check_command check_mongodb!last_flush_time!27017!200!400
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check status of mongodb replicaset: check the replication state of mongo
define service{
        host_name dbm1slave1
        service_description MongoDB state
        check_command check_mongodb!replset_state!27017!0!0
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check status of index miss ratio: check how often index lookups miss
define service{
        host_name dbm1slave1
        service_description MongoDB Index Miss Ratio
        check_command check_mongodb!index_miss_ratio!27017!.005!.01
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check number of databases and number of collections
define service{
        host_name dbm1slave1
        service_description MongoDB Number of databases
        check_command check_mongodb!databases!27017!300!500
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

define service{
        host_name dbm1slave1
        service_description MongoDB Number of collections
        check_command check_mongodb!collections!27017!300!500
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check size of a database
define service{
        host_name dbm1slave1
        service_description MongoDB Database size your-database
        check_command check_mongodb_database!database_size!27017!300!500!your-database
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check index size of a database
define service{
        host_name dbm1slave1
        service_description MongoDB Database index size your-database
        check_command check_mongodb_database!database_indexes!27017!50!100!your-database
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check index size of a collection
define service{
        host_name dbm1slave1
        service_description MongoDB Collection index size your-database.your-collection
        check_command check_mongodb_collection!collection_indexes!27017!50!100!your-database!your-collection
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check the primary server of replicaset: check which server is the primary of the replica set
# Example: check_command check_mongodb_replicaset!replica_primary!27017!0!1!shard2
define service{
        host_name dbm1slave1
        service_description MongoDB Replicaset Master Monitor: your-replicaset
        check_command check_mongodb_replicaset!replica_primary!27017!0!1!your-replicaset
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check the number of queries per second
define service{
        host_name dbm1slave1
        service_description MongoDB Updates per Second
        check_command check_mongodb_query!queries_per_second!27017!200!150!update
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check Primary Connection: check the connection time to the primary of the replica set; warn above 2 seconds, critical above 4 seconds
define service{
        host_name dbm1slave1
        service_description Mongo Connect Primary Check
        check_command check_mongodb!connect_primary!27017!2!4
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }

# Check Collection State: run against every host in the mongo server group to verify the availability of an important collection (locks, timeouts, availability of the config servers). An alert is raised as soon as a query fails.
define service{
        host_name dbm1slave1
        service_description Mongo Collection State
        check_command check_mongodb!collection_state!27017!your-database!your-collection
        max_check_attempts 5
        normal_check_interval 3
        retry_check_interval 2
        check_period 24x7
        notification_interval 10
        notification_period 24x7
        notification_options w,u,c,r
        contact_groups ops
        }
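Most of the services above pass a warning and a critical threshold as the last two numeric arguments. The following Python sketch illustrates the usual Nagios convention for turning a measured value into a plugin state. It is a simplified illustration under the assumption that higher values are worse, not the actual logic of check_mongodb.py.

```python
# Standard Nagios plugin states for a successful measurement.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(value, warning, critical):
    """Map a measured value to a Nagios state, assuming higher is worse."""
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


# Thresholds taken from the service definitions above; measured values are made up.
print(classify(0.4, 2, 5))    # connect time in seconds against 2 / 5
print(classify(3.2, 2, 5))
print(classify(85, 70, 80))   # connection usage in percent against 70 / 80
```

Nagios then applies max_check_attempts and the notification settings of each service before anyone is alerted.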




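4. View some of the monitoring results

After the services are configured on the Nagios side, restart Nagios with service nagios restart. After a few minutes, the Nagios monitoring interface shows the complete set of mongo services.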




5. Determine the MongoDB architecture from ps

[[email protected] ~]# ps -eaf|grep mongo
mongodb   2457     1  0  2013 ?        2-03:39:08 ./mongod --configsvr --dbpath /home/data/mongodb/config --port 20000 --logpath /home/data/mongodb/config.log --logappend --fork
mongodb   2804     1  0  2013 ?        1-10:02:33 mongos --configdb 192.168.12.62:20000,192.168.12.63:20000,192.168.12.72:20000 --port 30000 --chunkSize 64 --logpath /home/data/mongodb/mongos.log --logappend --fork
mongodb   3072     1  0  2013 ?        1-10:17:20 mongod --shardsvr --replSet shard1 --port 27017 --dbpath /home/data/mongodb/shard11 --oplogSize 2048 --logpath /home/data/mongodb/shard11.log --logappend --fork
root     11179  9391  0 11:14 pts/1    00:00:00 grep mongo
mongodb  30414     1  0 Feb14 ?        1-06:20:50 mongod --shardsvr --replSet shard2 --port 27018 --dbpath /home/data/mongodb/shard21 --oplogSize 2048 --logpath /home/data/mongodb/shard21.log --logappend --fork
[[email protected] ~]#

 

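There are four mongo processes:

a)         A process started with "--configdb" is the entry point of the cluster.

b)         Shard Server: a process started with "--shardsvr --replSet" is a member of one shard's replica set and stores the actual data chunks. These are the MongoDB instances on ports 27017 and 27018. To find out which member of the replica set on port 27017 is the primary and which is a secondary, log in to port 27017 and run rs.status().

c)         Config Server: a process started with "--configsvr" stores the metadata of the whole cluster, including chunk information. This is the MongoDB instance on port 20000.

d)         Route Server: a process started with "mongos --configdb" is the front-end router through which clients connect. It makes the whole cluster look like a single database, so front-end applications can use it transparently. This is the MongoDB instance on port 30000.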
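The rules in the list above can be written down as a small Python sketch that derives the role and port of each process from its startup flags. The command lines are shortened copies of the ps output (log options omitted), and the function names are hypothetical helpers introduced only for this illustration.

```python
def mongo_role(command_line):
    """Return the cluster role implied by the startup flags of a mongo process."""
    tokens = command_line.split()
    if "--configsvr" in tokens:
        return "config server"
    if "--shardsvr" in tokens:
        return "shard server"
    if "--configdb" in tokens:
        return "route server (mongos)"
    return "unknown"


def option_value(command_line, option):
    """Return the token following an option such as --port, or None if absent."""
    tokens = command_line.split()
    if option in tokens:
        index = tokens.index(option)
        if index + 1 < len(tokens):
            return tokens[index + 1]
    return None


# Shortened command lines from the ps output above.
PS_COMMANDS = [
    "./mongod --configsvr --dbpath /home/data/mongodb/config --port 20000",
    "mongos --configdb 192.168.12.62:20000,192.168.12.63:20000,192.168.12.72:20000 --port 30000",
    "mongod --shardsvr --replSet shard1 --port 27017",
    "mongod --shardsvr --replSet shard2 --port 27018",
]

for command in PS_COMMANDS:
    print(option_value(command, "--port"), mongo_role(command), option_value(command, "--replSet"))
```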



6. Errors encountered during debugging

Error 1:

[[email protected] nagios ~]# tail -f /usr/local/nagios/var/nagios.log
[1412819956] Warning: Return code of 13 for check of service 'Mongo Memory Usage' on host 'dbm1slave1' was out of bounds.
[1412819956] SERVICE ALERT: dbm1slave1;Mongo Memory Usage;CRITICAL;SOFT;1;(Return code of 13 is out of bounds)
[1412819975] Warning: Return code of 13 for check of service 'Mongodb Connect Check' on host 'dbm1slave1' was out of bounds.
[1412819975] SERVICE ALERT: dbm1slave1;Mongodb Connect Check;CRITICAL;SOFT;1;(Return code of 13 is out of bounds)
[1412820058] Warning: Return code of 13 for check of service 'Mongo Free Connections' on host 'dbm1slave1' was out of bounds.

The nagios user needs to own the plugin script and have full permissions on it, including execute permission:

chmod 770 /usr/lib/nagios/plugins/check_mongodb.py
chown nagios:nagios /usr/lib/nagios/plugins/check_mongodb.py
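As background for the "out of bounds" message, Nagios only accepts plugin return codes 0 to 3. The Python sketch below shows that mapping, plus one plausible reading of the value 13: on Linux it is the errno value EACCES (permission denied), which would fit the permission fix above. That interpretation is an assumption, not something the log itself states.

```python
import errno
import os

# Return codes that Nagios accepts from a plugin.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def describe_return_code(code):
    """Name the Nagios state for a return code, or flag it as out of bounds."""
    return NAGIOS_STATES.get(code, "out of bounds")


print(describe_return_code(2))
print(describe_return_code(13))

# Assumption: the 13 in the log is an errno value leaking through as an exit code.
print(errno.errorcode.get(13), os.strerror(13))
```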

 

Error 2:

The Status Information column of the monitoring interface shows the error "No module named pymongo".

This message means the pymongo module is missing. Install it with easy_install pymongo, as shown below:

[[email protected] objects]# easy_install pymongo
Searching for pymongo
Reading http://pypi.python.org/simple/pymongo/
Best match: pymongo 2.7.2
......
zip_safe flag not set; analyzing archive contents...
Adding pymongo 2.7.2 to easy-install.pth file

Installed /usr/lib/python2.6/site-packages/pymongo-2.7.2-py2.6-linux-x86_64.egg
Processing dependencies for pymongo
Finished processing dependencies for pymongo
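A quick way to confirm that the module is now visible is to ask the interpreter directly. This minimal sketch assumes a Python 3 interpreter (the article's host runs Python 2.6, where importlib.util.find_spec does not exist), and it must be run with the same interpreter that the plugin uses.

```python
import importlib.util


def module_available(name):
    """Return True if the named top-level module can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


print("pymongo available:", module_available("pymongo"))
```

If this prints False for the interpreter the nagios user runs, the "No module named pymongo" error will persist.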

----------------------------------------------------------------------------------------------------------------

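<All rights reserved. This article may be reposted, but the source address must be credited with a link; otherwise legal action will be pursued!>
Original blog address: http://blog.itpub.net/26230597/viewspace-1293589/
Original author: 黃杉 (mchdba)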

----------------------------------------------------------------------------------------------------------------


Reference: https://github.com/mzupan/nagios-plugin-mongodb/blob/master/README.md

 

