Fixing "Zabbix agent unreachable" alarms that appear after Zabbix has been running for a while

After Zabbix has been in use for a while, it keeps reporting that the Zabbix agent is unreachable. The alert text is as follows:

Zabbix Server Messages: PROBLEM: Zabbix agent on Zabbix Server is unreachable for 5 minutes

First, look at the Zabbix agent log and find the critical error message. The log reads as follows:

From /tmp/zabbix_agentd.log:

mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)'
Check that mysqld is running and that the socket: '/tmp/mysql.sock' exists!

So the Zabbix agent cannot connect to the database (an administrator should know that the Zabbix agent does not normally connect to the database at all). More precisely, it cannot connect to the local database server through /tmp/mysql.sock. Because this is a socket file, its default permissions already grant read and write access to the group and to other users, and checking the file's current permissions confirms this.

The database service is running, the socket file does exist, and its permissions are normal. Connecting from the command line also confirms that the socket file really can be used to reach the server.
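The three checks above (the socket file exists, its permissions are as expected, and a connection through it succeeds) can be sketched in Python. This is a minimal sketch using only the standard library; the helper name `check_unix_socket` is hypothetical, and /tmp/mysql.sock is the path taken from the agent log. It only shows that something accepts connections on the socket, not that a MySQL login would succeed.

```python
import os
import socket
import stat


def check_unix_socket(path, timeout=3.0):
    """Describe whether `path` exists, is a socket file, and accepts connections."""
    result = {"exists": False, "is_socket": False, "mode": None, "connectable": False}
    try:
        st = os.stat(path)
    except OSError:
        return result  # file is missing or cannot be inspected
    result["exists"] = True
    result["is_socket"] = stat.S_ISSOCK(st.st_mode)
    result["mode"] = stat.filemode(st.st_mode)  # permission string, first char 's' for a socket
    if not result["is_socket"]:
        return result
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
        result["connectable"] = True
    except OSError:
        pass  # nothing is listening, or permission was denied
    finally:
        client.close()
    return result


if __name__ == "__main__":
    # Path assumed from the error message in the agent log.
    print(check_unix_socket("/tmp/mysql.sock"))
```

Running this as the same user the Zabbix agent runs as is the more telling test, since permission problems only show up for that user.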

Problem analysis and solution ideas:

The database server has been running normally all along; as the administrator you know this even without running any command. The database server itself is therefore probably not at fault, but the way the client connects to it cannot be ruled out.

View the configuration file of the MySQL database:

# delsc /etc/my.cnf
[client]
port = 3306
socket = /tmp/mysql.sock
[mysqld]
port = 3306
socket = /tmp/mysql.sock
datadir = /usr/local/mysql/var
skip-external-locking
skip-name-resolve
key_buffer_size = 384M
max_connections = 5000
max_allowed_packet = 1M
table_open_cache = 64K
sort_buffer_size = 128M
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 128M
slow-query-log = 0
tmp_table_size = 8G
max_heap_table_size = 8G
table_cache = 512
binlog_cache_size = 6144M
query_cache_type = 1
query_cache_size = 128M
query_cache_limit = 128M
query_cache_min_res_unit = 1024
myisam-recover-options = BACKUP
innodb_data_home_dir = /usr/local/mysql/var
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_group_home_dir = /usr/local/mysql/var
innodb_buffer_pool_size = 8G
innodb_write_io_threads = 8
innodb_read_io_threads = 8
innodb_thread_concurrency = 16
innodb_file_format = Barracuda
innodb_log_file_size = 512M
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
innodb_lock_wait_timeout = 50
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
innodb_lock_wait_timeout = 120
innodb_file_format = Barracuda
innodb_use_sys_malloc = 0
innodb_additional_mem_pool_size = 2G
innodb_file_per_table = 1
[mysqld_safe]
log-error = /usr/local/mysql/var/mysql-error.log
pid-file = /usr/local/mysql/var/mysql.pid
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[myisamchk]
key_buffer_size = 512M
sort_buffer_size = 512M
read_buffer = 8M
write_buffer = 8M
[mysqlhotcopy]
interactive-timeout

The [client] section contains the line socket = /tmp/mysql.sock, so it is possible (the details are discussed further on) that MySQL client programs connect through the socket by default.

Next, check the Zabbix agent configuration file for anything that connects to MySQL through the socket. (As mentioned earlier, an administrator should know that the Zabbix agent does not connect to the database; to be accurate, it does not do so by default.) In this case MySQL-related monitoring items had been added to the Zabbix agent. They call the MySQL client programs but do not pass a socket option; in fact, using localhost as the connection host name is enough to make the client use the socket.

To narrow down the problem, first comment out the socket line in the [client] section of /etc/my.cnf (make sure beforehand that no other client connections depend on it). Note: with the socket commented out in [client], the local MySQL client programs (including mysqladmin, mysqldump, and so on) need the host name, port, and password specified explicitly; otherwise they look for the socket at the default location "/var/lib/mysql/mysql.sock", and the socket file is not necessarily there.

What is the difference between connecting to MySQL through a socket and connecting through a port number? A MySQL client on a UNIX platform can connect to the MySQL server in two ways: through a UNIX socket, which is a file in the file system (/tmp/mysql.sock by default, although some packages place it elsewhere, such as under /var/lib/mysql), or through TCP/IP using a port number. A UNIX socket is faster than TCP/IP but can only be used to reach a server on the same computer. If no host name is specified, or the special host name localhost is specified, a UNIX socket is used (note in particular that giving localhost as the host name results in a socket connection). A socket connection can be thought of as taking the place of an IP address plus port.
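The host-name rule described above can be summarised as a small decision function. This is only an illustrative sketch of the behaviour the paragraph describes on UNIX; `choose_transport` is a hypothetical name, not part of any MySQL client library, and it ignores client options such as --protocol that can override the choice.

```python
def choose_transport(host=None):
    """Return the transport a MySQL client on UNIX would pick for `host`.

    No host name, or the special name "localhost", means the UNIX socket
    file; any other host name or IP address means TCP/IP.
    """
    if host is None or host == "" or host == "localhost":
        return "unix-socket"
    return "tcp"


if __name__ == "__main__":
    for candidate in (None, "localhost", "127.0.0.1", "db.example.com"):
        print(candidate, "->", choose_transport(candidate))
```

This is why replacing localhost with 127.0.0.1 in a monitoring command changes the connection method even though both names point at the same machine.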

Therefore, following the theory above, change localhost to 127.0.0.1 so that the connection goes over TCP/IP instead of the socket. In hindsight, if you specify a host name (other than localhost) or an IP address together with the port number, the socket is not used at all, so the socket line in [client] can be uncommented again, which is convenient for the administrator's everyday connections and debugging.
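Once the monitoring commands use 127.0.0.1, it is worth confirming that the server actually accepts TCP connections on its port. A minimal sketch using only the standard library; `tcp_port_open` is a hypothetical helper, and port 3306 is the value from the [mysqld] section of the configuration file. It checks reachability only and does not attempt a MySQL login.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 127.0.0.1 forces TCP/IP; 3306 is assumed from the [mysqld] section.
    print(tcp_port_open("127.0.0.1", 3306))
```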

Because Zabbix server also connects to the database, its configuration is worth checking at the same time. This turns up something useful: the comment on DBPort reads "Database port when not using local socket. Ignored for SQLite." In other words, DBPort is used only when the socket is not; by default DBPort is not set and the socket is preferred.

Therefore DBPort is enabled (uncommented) as well.

Other possible influencing factors:

The iptables rule -A INPUT -p tcp -m state --state NEW -m tcp --dport 3306 -j ACCEPT is in place and shows up correctly in the rule listing: tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:3306.

SELinux had been disabled beforehand.

However, after the changes above the problem remained and Zabbix still reported the error (Zabbix Server Messages: PROBLEM: Zabbix agent on Zabbix Server is unreachable for 5 minutes). The Zabbix server log contained error messages (later found to have been emitted by the database).

For the first error: testing this key from the command line shows that it does return data, but slowly (it takes more than three seconds), so it is also worth checking whether the item has a timeout-like setting.

So although ntp[pool.ntp.org] does return data, it takes a long time, and no timeout-related setting for it could be found in the Zabbix web management interface; consider switching to a different NTP server or simply disabling the item. It later turned out that this key could not fetch any data at the moments the Zabbix agent unreachable alarm was triggered.
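Because the item does return data but needs more than three seconds, timing the check against a fixed limit makes the problem visible. A minimal sketch, assuming a limit of 3 seconds, which is the default of the Timeout parameter in the agent and server configuration files (it is not set in the web interface); `timed_check` is a hypothetical helper, and the command passed to it stands in for whatever command reproduces the slow item.

```python
import subprocess
import time


def timed_check(command, limit=3.0):
    """Run `command` and report how long it took and whether it finished within `limit` seconds."""
    start = time.monotonic()
    try:
        subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=limit,  # the child is killed once the limit is exceeded
            check=False,
        )
        within_limit = True
    except subprocess.TimeoutExpired:
        within_limit = False
    elapsed = time.monotonic() - start
    return {"elapsed": elapsed, "within_limit": within_limit}
```

For example, `timed_check(["sleep", "5"])` reports `within_limit` as False after roughly three seconds. An item that regularly lands near or over the limit is a candidate for a faster data source or for being disabled, as suggested above.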

For the second error: the cause can be more complex, for example the maximum packet size the database accepts (max_allowed_packet) or the database timeouts (connect_timeout, wait_timeout), and so on.

Therefore max_allowed_packet = 1M is changed to a larger value, for example max_allowed_packet = 2M.

After these changes, Zabbix server no longer raised the Zabbix agent unreachable alarm.

Summary:

(1) Narrow down the problem area as quickly and effectively as possible, and work fast to minimize downtime.

(2) If a problem can be fixed right away, fix it first and do not linger over studying it; if it can only be solved slowly, study it carefully.

(3) Treat small problems as seriously as big ones, so that they do not grow into big problems.

(4) To locate a problem quickly, you need to know your server environment and how the components in your whole working environment relate to one another.

(5) If you do not maintain these services yourself, be sure to communicate with the colleagues who do in a timely manner.

(6) Keep good records of problems. Reviewing old problems teaches you something new, so even the time spent writing a blog post is worthwhile.

--end--

This article is from the "Communication, My Favorites" blog; please keep this source: http://dgd2010.blog.51cto.com/1539422/1626956
