I ran into this problem while developing a heartbeat server. The mistake itself is very simple, but I am writing it up to make one point: compiler warnings matter! The project has a lot of code and the build prints a lot of output, and while busy debugging I overlooked one warning. That warning pointed straight at the problem; because I ignored it, I spent far more time locating the bug from the core file. The purpose of this article is to remind everyone of this naive, easy-to-make mistake: please do not ignore any compiler warning, because paying attention to them saves a great deal of time when locating bugs! Along the way, the article also shows how to track down an illegal-instruction crash, and how the source line that GDB's bt reports for a core can be misleading.
The heartbeat server is a multi-threaded server that provides UDP and HTTP services and uses log4cpp for logging. Because of various project delays, I had not touched this server's code for a while. I then changed a lot of code, passed functional testing, and started stress testing. While the stress client simulated a few hundred clients there was no problem; only at higher client counts did core dumps start to appear occasionally. The problem could be reproduced, but only under heavy load, where following the running process step by step is hard. So the only option was to generate a core file and analyze that first:
    gdb ./rhs core.7714
    ...
    Core was generated by `./rhs'.
    Program terminated with signal 4, Illegal instruction.
    [New process 7716]
    [New process 7721]
    [New process 7720]
    [New process 7719]
    [New process 7718]
    [New process 7717]
    [New process 7715]
    [New process 7714]
    #0  0x000000000040f06c in HeartbeatProcesser::compete (this=0x149dcac0) at ProcessHeartbeat.cpp:405
    405         break;
Strangely, the core points at a break statement. How can a break crash? The code is structured as follows:
    int HeartbeatProcesser::compete()
    {
        int procnum = 0;
        const struct timeval& now = getsystemtime();
        for (unsigned int i = 0; i < m_vecagentstatus.size(); i++) {
            ...
            do {
                ...
                ++procnum;
                ...
                if (procnum >= m_icompetemaxnumonetime) {
                    ...
                    break;  // This is line 405! The core is here! How is that possible?
                }
                info_log("ID[%s]...", packet->strid, ...);
                ...
            } while (...);
        }
        return procnum;
    }
Tracing the code path that leads to the break did not reveal any cause. Next, let's see which assembly instruction the process actually died on:
    (gdb) x/i $pc
    0x40f06c <_ZN18HeartbeatProcesser7competeEv+752>:    ud2a
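That is unexpected. ud2a is the assembler's name for the x86 UD2 instruction, whose only job is to raise an invalid-opcode exception (delivered to the process as signal 4, illegal instruction), so it usually means the compiler put it there on purpose at compile time! Next, disassemble the compete function and compare the assembly with the source: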
    (gdb) disas compete
    Dump of assembler code for function _ZN18HeartbeatProcesser7competeEv:
    ...
    // The next instruction corresponds to int procnum = 0. Here -0x44(%rbp) is the procnum variable, which is used often in the analysis below.
    0x000000000040ed8c <_ZN18HeartbeatProcesser7competeEv+16>:   movl   $0x0,-0x44(%rbp)
    0x000000000040ed93 <_ZN18HeartbeatProcesser7competeEv+23>:   callq  0x4161fc <_Z13getsystemtimev>
    ...
    // The next two instructions correspond to for (unsigned int i = 0; i < m_vecagentstatus.size(); i++), where -0x34(%rbp) is the i variable.
    0x000000000040edb8 <_ZN18HeartbeatProcesser7competeEv+60>:   movl   $0x0,-0x34(%rbp)
    0x000000000040edbf <_ZN18HeartbeatProcesser7competeEv+67>:   jmpq   0x40f082 <_ZN18HeartbeatProcesser7competeEv+774>
    ...
    // The next instruction corresponds to ++procnum; as mentioned above, -0x44(%rbp) is the procnum variable.
    0x000000000040eeb4 <_ZN18HeartbeatProcesser7competeEv+312>:  addl   $0x1,-0x44(%rbp)
    ...
    // The next three instructions correspond to if (procnum >= m_icompetemaxnumonetime) {, where 0xf8(%rax) is m_icompetemaxnumonetime.
    0x000000000040f03f <_ZN18HeartbeatProcesser7competeEv+707>:  mov    0xf8(%rax),%eax
    0x000000000040f045 <_ZN18HeartbeatProcesser7competeEv+713>:  cmp    -0x44(%rbp),%eax
    0x000000000040f048 <_ZN18HeartbeatProcesser7competeEv+716>:  jg     0x40f06c <_ZN18HeartbeatProcesser7competeEv+752>
    // The following code runs only when the if condition is true. Offsets +718 to +745 only do the logging.
    0x000000000040f04a <_ZN18HeartbeatProcesser7competeEv+718>:  mov    -0x30(%rbp),%rax
    0x000000000040f04e <_ZN18HeartbeatProcesser7competeEv+722>:  mov    0x18(%rax),%ecx
    0x000000000040f051 <_ZN18HeartbeatProcesser7competeEv+725>:  mov    0x26d5c0(%rip),%rdi        # 0x67c618 <perflog>
    0x000000000040f058 <_ZN18HeartbeatProcesser7competeEv+732>:  mov    -0x44(%rbp),%edx
    0x000000000040f05b <_ZN18HeartbeatProcesser7competeEv+735>:  mov    $0x4563b0,%esi
    0x000000000040f060 <_ZN18HeartbeatProcesser7competeEv+740>:  mov    $0x0,%eax
    0x000000000040f065 <_ZN18HeartbeatProcesser7competeEv+745>:  callq  0x41d590 <_ZN7log4cpp8Category6noticeEPKcz>
    // The next instruction is the real break. As you can see, the core cannot be here.
    0x000000000040f06a <_ZN18HeartbeatProcesser7competeEv+750>:  jmp    0x40f07e <_ZN18HeartbeatProcesser7competeEv+770>
    // The core is on this instruction, which belongs to the info_log(...) statement.
    0x000000000040f06c <_ZN18HeartbeatProcesser7competeEv+752>:  ud2a
    // What follows is the i++, the loop condition check and so on of for (unsigned int i = 0; i < m_vecagentstatus.size(); i++).
    0x000000000040f07e <_ZN18HeartbeatProcesser7competeEv+770>:  addl   $0x1,-0x34(%rbp)
    ...
    0x000000000040f09a <_ZN18HeartbeatProcesser7competeEv+798>:  jne    0x40edc4 <_ZN18HeartbeatProcesser7competeEv+72>
    // The following is the return from compete, corresponding to return procnum; the return value is normally placed in %eax.
    0x000000000040f0a0 <_ZN18HeartbeatProcesser7competeEv+804>:  mov    -0x44(%rbp),%eax
    0x000000000040f0a3 <_ZN18HeartbeatProcesser7competeEv+807>:  add    $0x98,%rsp
    0x000000000040f0aa <_ZN18HeartbeatProcesser7competeEv+814>:  pop    %rbx
    0x000000000040f0ab <_ZN18HeartbeatProcesser7competeEv+815>:  leaveq
    0x000000000040f0ac <_ZN18HeartbeatProcesser7competeEv+816>:  retq
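As an aside, the effect of a deliberately placed trap instruction like the ud2a at +752 can be reproduced in isolation. The following is only a minimal sketch, assuming a POSIX system and GCC or Clang; signal_from_deliberate_trap and check_trap_demo are hypothetical names, and __builtin_trap() merely stands in for whatever the compiler emitted in the server. The trap runs in a child process so that the parent survives and can inspect how the child died.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: executes a deliberate trap in a child process and
// returns the number of the signal that killed the child, 0 if the child
// was not killed by a signal, or -1 if fork/waitpid failed.
int signal_from_deliberate_trap()
{
    const pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        // GCC/Clang builtin. On x86 it is emitted as the ud2 instruction,
        // so the child dies much like the heartbeat server did.
        __builtin_trap();
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

// Usage check: the child must have been terminated by some signal.
void check_trap_demo()
{
    assert(signal_from_deliberate_trap() > 0);
}
```

On x86 Linux the returned value should correspond to SIGILL, matching the "signal 4, Illegal instruction" line in the GDB session above; other architectures may report a different signal, which is why the check only requires a positive signal number.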
Now it is clear: the C source line (the break) that bt reports is wrong (possibly a side effect of how the compiler generated the code)! According to the assembly, the crash is on the info_log(...) statement that writes the log, and the assembly is of course what counts. So when the condition if (procnum >= m_icompetemaxnumonetime) is false, execution actually goes on to info_log("ID[%s]...", packet->strid, ...);. But why did the compiler turn that call into ud2a? Rereading the code, I found that the first argument after the format string in info_log("ID[%s]... had recently been changed to packet->strid, and strid is not a char*; it is a C++ STL string object! A very simple mistake. In fact the compiler had flagged it all along, but the make output ran to several screens, so I ignored it and started testing as soon as the build finished. The compile-time warning is perfectly clear:
    [root@houyi-vm02 rhs0.1]# make
    cd src/server; make all
    ... ...
    g++ -c -I../../include/ -Wall -g -fpermissive -DCM_UNIX -DCM_LINUX -DCM_DEBUG -o ProcessHeartbeat.o ProcessHeartbeat.cpp
    ProcessHeartbeat.cpp: In member function ‘int HeartbeatProcesser::compete()’:
    ProcessHeartbeat.cpp:408: warning: cannot pass objects of non-POD type ‘struct std::string’ through ‘...’; call will abort at runtime
    ... ...
GCC's message could not be clearer: a string object was passed where it cannot be used! After writing all this, my point is simple: take compile-time warnings seriously on every build. They save a great deal of time when locating problems; otherwise you end up digging through a core file to find out what went wrong.
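The mistake described above, and one way to fix it, can be sketched in a few lines. This is not the server's actual code: format_log, Packet and log_packet_id are hypothetical stand-ins for info_log and the packet structure, and the sketch assumes GCC or Clang. The format attribute is a compiler extension that tells the compiler to check the arguments of a custom printf-style function against its format string, so that a mismatch is reported at build time.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical printf-style logger standing in for info_log.
// With the attribute, -Wall checks each argument against the format string.
#if defined(__GNUC__)
std::string format_log(const char* fmt, ...) __attribute__((format(printf, 1, 2)));
#endif

std::string format_log(const char* fmt, ...)
{
    assert(fmt);
    char buf[256];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buf, sizeof(buf), fmt, args);  // truncates if the message is too long
    va_end(args);
    return std::string(buf);
}

// Hypothetical packet type whose id is a std::string, mirroring packet->strid.
struct Packet {
    std::string strid;
};

std::string log_packet_id(const Packet& packet)
{
    // Wrong: format_log("ID[%s]", packet.strid);
    //   A std::string object cannot be passed through "..."; this is the
    //   call that the warning in the article complains about.
    // Right: pass the underlying C string, which is what %s expects.
    return format_log("ID[%s]", packet.strid.c_str());
}
```

For a Packet whose strid is "agent-7", log_packet_id returns "ID[agent-7]". Treating this class of warning as an error in the build (for example with GCC's -Werror option) is another way to make sure it cannot scroll past unnoticed.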