[Linux] Debugging a core file with gdb

Source: Internet
Author: User

Server-side programs crash from time to time. Fortunately, Linux can produce a core file that preserves the state of the process at the moment of the crash. Sometimes the call stack of the crashing thread and the values of its variables are enough to find the cause, but sometimes that call stack is of no help. This article describes how a combination of several gdb commands uncovered the cause of one such crash.

Let's go to the scene and work out the cause step by step.

First, as usual, run gdb on the executable and the core file, gdb wbxgs_crash core.5797, to look at the scene.

[Email protected] bin]# gdb wbxgs_crash core.5797
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4RH)
......
#0  0x00000038e8d70540 in strlen () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x00000038e8d70540 in strlen () from /lib64/tls/libc.so.6
#1  0x000000000057cfc0 in t120_trace::text_formator::advance (this=0x7e800a70, lpsz=0x1 <Address 0x1 out of bounds>)
    at ./t120trace.cpp:1464
#2  0x000000000057ceb1 in t120_trace::text_formator::operator<< (this=0x7e800a70, lpsz=0x1 <Address 0x1 out of bounds>)
    at ./t120trace.cpp:1411
#3  0x0000000000407927 in ~func_tracer (this=0x7e804bd0) at ../h/t120trace.h:381
#4  0x00000000004432fd in Cgssocketserver::readheader (this=0x8e4130, socketfd=1088,
    buf=0x7e806cc0 "GET /detectservice?cmd=selfcheck HTTP/1.1\r\nConnection:close\r\nHost:10.224.122.94\r\n\r\n", bufsize=1024)
    at mgr/gssocketserver.cpp:337
#5  0x0000000000443981 in Cgssocketserver::handle (this=0x8e4130, socketfd=1088, [e-mail protected]) at mgr/gssocketserver.cpp:424
#6  0x0000000000442f5e in Cgssocketserver::readthread (parg=0x9ae9c0) at mgr/gssocketserver.cpp:304
#7  0x00000038e980610a in start_thread () from /lib64/tls/libpthread.so.0
#8  0x00000038e8dc68b3 in clone () from /lib64/tls/libc.so.6
#9  0x0000000000000000 in ?? ()

The call stack shows that the program crashed while writing a log entry. We had seen a similar crash before, and that time the cause was an infinite loop, but a code review found no infinite loop here. So the current call stack is of no use for finding the cause. How can we dig further? Could an error in another thread have caused the crash on this thread? To find the deeper cause, we used gdb's thread-related commands to check the other threads for problems, starting with info threads to see what each thread was doing at the time of the crash.

(gdb) info threads
  21 process 5797  0x00000038e8d7186d in memset () from /lib64/tls/libc.so.6
  20 process 5839  0x00000038e8dc6c8c in epoll_wait () from /lib64/tls/libc.so.6
  19 process 5842  0x00000038e8d8f7d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
  18 process 5845  0x00000038e8d8f7d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
  17 process 5846  0x00000038e980a66f in sem_wait () from /lib64/tls/libpthread.so.0
  16 process 5847  0x00000038e980a66f in sem_wait () from /lib64/tls/libpthread.so.0
  15 process 5848  0x00000038e980a66f in sem_wait () from /lib64/tls/libpthread.so.0
  14 process 5849  0x00000038e980a66f in sem_wait () from /lib64/tls/libpthread.so.0
  13 process 5850  0x00000038e980a66f in sem_wait () from /lib64/tls/libpthread.so.0
  12 process 5852  0x00000038e8dbf946 in __select_nocancel () from /lib64/tls/libc.so.6
  11 process 5854  0x00000038e980a66f in sem_wait () from /lib64/tls/libpthread.so.0
  10 process 5856  0x00000038e980a66f in sem_wait () from /lib64/tls/libpthread.so.0
  9 process 5857  0x00000038e980a66f in sem_wait () from /lib64/tls/libpthread.so.0
  8 process 5858  0x00000038e980a66f in sem_wait () from /lib64/tls/libpthread.so.0
  7 process 5859  0x00000038e8d8f7d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
  6 process 5861  0x00000038e980a66f in sem_wait () from /lib64/tls/libpthread.so.0
  5 process 5862  0x00000038e980a66f in sem_wait () from /lib64/tls/libpthread.so.0
  4 process 5863  0x00000038e8d8f7d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
  3 process 5864  0x00000038e8d8f7d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
  2 process 5883  0x00000038e8d8f7d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
* 1 process 5853  0x00000038e8d70540 in strlen () from /lib64/tls/libc.so.6

It is normal for a thread to be stopped in a sleep or wait call, but thread 21 looks unusual: it is stopped in memset. Whether that is actually a problem needs a closer look.

Next, switch to thread 21 with the command thread 21 and look at its call stack.

(gdb) thread 21
[Switching to thread 21 (process 5797)]#0  0x00000038e8d7186d in memset () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x00000038e8d7186d in memset () from /lib64/tls/libc.so.6
#1  0x000000000049da0d in Cgspdufactory::streamstringfrom ([email protected], [e-mail protected]) at common/pdu/gspdu.cpp:422
#2  0x00000000004d1f25 in Cgsothsharduserrsppdu::streamfrom (this=0x2aaaec951650, [e-mail protected]) at common/pdu/pdugs.cpp:2707
#3  0x000000000049cb2d in Cgspdufactory::derivepdu ([e-mail protected], ulpdulen=30506) at common/pdu/gspdu.cpp:79
#4  0x000000000049c78e in Cgspdufactory::streampdufrom (pdatapacket=0x2aaaeca31d70) at common/pdu/gspdu.cpp:35
#5  0x0000000000449681 in Cgswdmsmanager::on_wdms_message_indication (this=0x8e3680, msg=0x2aaae9894360)
    at mgr/gswdmsmanager.cpp:344
......
#18 0x0000000000407733 in main (argc=1, argv=0x7fff9b44ac98) at gsmain.cpp:118
(gdb) f 3
#3  0x000000000049cb2d in Cgspdufactory::derivepdu ([e-mail protected], ulpdulen=30506) at common/pdu/gspdu.cpp:79
79      common/pdu/gspdu.cpp: No such file or directory.
        in common/pdu/gspdu.cpp

Use the command i locals (short for info locals) to print the values of all local variables in this frame.

(gdb) i locals
ppdu = (CBASEPDU *) 0x2aaaec951650
ppduheader = (Cpduheader *) 0x2aaaea1c4190
ulpdutype = 50

So far nothing looks obviously wrong, so print the PDU header:

(gdb) p *ppduheader
$ = {m_ulheadlen =, m_ulversion = 2080000, m_ulpdutype =, m_ulsrcsvrtype = webex_connect_gs, m_strsrcsvraddr = {
    static npos = 18446744073709551615,
    _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
      _M_p = 0x2aaaeca52a68 "10.224.95.109:9900"}, m_strsubject = {static npos = 18446744073709551615,
    _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
      _M_p = 0x2aaaec929b28 "Qawin.qazone.GS"}, m_ulsequence = 0}

The m_strsrcsvraddr field shows that this PDU was sent from the server at 10.224.95.109.

At that time, every server in the QA test environment had an IP address starting with 10.224.122, so where did a PDU from this address come from? We asked QA and learned that 10.224.95.109 was a server in another data center running an old version. The version under test had removed two PDUs and added four, so when the old server sent a PDU, the new version parsed it as one of the new PDUs. The parse was therefore wrong, and so was the length it extracted. The local variables can be inspected by switching to frame 1 with the f 1 command.

(gdb) f 1
#1  0x000000000049da0d in Cgspdufactory::streamstringfrom ([email protected], [e-mail protected]) at common/pdu/gspdu.cpp:422
422     in common/pdu/gspdu.cpp
(gdb) i locals
strtmp = 0x2aaaf1c00010 ""
iret = 0
ullen = 1179995975

The parsed length is a very large value, 1179995975. Thread 21 stopped right after allocating memory of that size, inside the memset call that clears it. The log also shows that thread 21 got stuck at this point and made no further progress.
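
A defensive check on the length field would turn this kind of failure into a clean parse error. The following is only a minimal sketch, not the actual streamstringfrom code: the function name read_length_prefixed, the 4-byte host-order length prefix, and the buffer layout are all assumptions made for illustration. The idea is to compare the decoded length with the number of bytes actually left in the packet before allocating or clearing any memory. For comparison, frame 3 reports ulpdulen=30506, far smaller than 1179995975. As an aside, 1179995975 is 0x46554F47, and each of its four bytes falls in the printable ASCII range, which would be consistent with string data being read as a length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: reads a 4-byte length (host byte order, an assumption)
// followed by that many bytes of string data, starting at buf[offset].
// Returns false and leaves offset and out untouched if the length field
// claims more data than the packet still holds.
inline bool read_length_prefixed(const unsigned char* buf, std::size_t size,
                                 std::size_t& offset, std::string& out) {
    assert(buf != nullptr);
    std::uint32_t len = 0;
    if (offset > size || size - offset < sizeof(len)) {
        return false;  // not even room for the length field
    }
    std::memcpy(&len, buf + offset, sizeof(len));
    const std::size_t remaining = size - offset - sizeof(len);
    if (len > remaining) {
        return false;  // corrupt or misparsed length: reject before allocating
    }
    out.assign(reinterpret_cast<const char*>(buf + offset + sizeof(len)), len);
    offset += sizeof(len) + len;
    return true;
}
```

With a check like this, a length of 1179995975 in a packet of about 30 KB would be rejected immediately instead of driving a huge allocation and memset.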

Two servers crashed at that time. The core file from the other server showed the same call stack as this one. After QA upgraded the server at 10.224.95.109, the crash did not occur again.
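
The version mismatch behind this crash can be illustrated with a small sketch. Everything in it is hypothetical: the enum names, the PDU names, and the assumption that type IDs are assigned implicitly by position are made up for illustration and are not taken from the actual server code. It shows how removing two entries and adding four makes the same numeric ID mean different PDU types in the two builds.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical type IDs in the OLD build. The values are implicit, so each
// ID is just the entry's position in the list.
namespace old_build {
enum PduType : std::uint32_t { kLogin, kLogout, kPing, kUserInfo, kShardRsp };
}  // namespace old_build

// Hypothetical NEW build: two entries (kLogout, kPing) were removed and four
// were added, so the IDs of the surviving entries silently shift.
namespace new_build {
enum PduType : std::uint32_t {
    kLogin, kUserInfo, kShardRsp, kRoster, kPresence, kHistory, kAck
};

// The name the new build gives to a type ID read off the wire.
inline std::string name_of(std::uint32_t id) {
    switch (id) {
        case kLogin:    return "Login";
        case kUserInfo: return "UserInfo";
        case kShardRsp: return "ShardRsp";
        case kRoster:   return "Roster";
        case kPresence: return "Presence";
        case kHistory:  return "History";
        case kAck:      return "Ack";
        default:        return "Unknown";
    }
}
}  // namespace new_build

// The old build sends kUserInfo as ID 3; the new build decodes ID 3 as Roster
// and would parse the payload with the wrong field layout.
inline void demo_mismatch() {
    assert(new_build::name_of(old_build::kUserInfo) == "Roster");
}
```

Once the receiver picks the wrong PDU class, every field after the header is read from the wrong position, which is how a length field can end up holding an arbitrary value.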

This example shows that when a server crashes, the call stack of the crashing thread may be of little value, but examining the call stacks of all threads can turn up clues that help solve the problem.

The lesson is that when changing an interface between servers, compatibility with older versions must be considered. A PDU should be kept even if it may never be used again, because production is upgraded in stages, first GSB and then primary, so two versions run side by side for a while. Removing PDUs or changing their order can leave the whole system unable to function.
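
One way to apply this lesson, sketched under assumptions: give every PDU type an explicit numeric ID, keep retired IDs reserved rather than reusing them, append new types at the end, and reject IDs the build does not recognize. The PduType enum and the is_known helper below are hypothetical and are not taken from the actual server code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wire IDs with explicit values, so that removing or adding an
// entry never changes the ID of any other entry.
enum class PduType : std::uint32_t {
    kLogin    = 0,
    // IDs 1 and 2 belonged to retired PDUs. They stay reserved and are
    // never reused, so an old peer cannot be misread.
    kUserInfo = 3,
    kShardRsp = 4,
    kRoster   = 5,  // new types are appended with fresh IDs
};

// Returns true only for IDs this build knows how to parse.
inline bool is_known(std::uint32_t id) {
    switch (static_cast<PduType>(id)) {
        case PduType::kLogin:
        case PduType::kUserInfo:
        case PduType::kShardRsp:
        case PduType::kRoster:
            return true;
    }
    return false;  // retired or unknown ID: drop the PDU instead of parsing it
}

// Usage: a retired ID from an old peer is rejected rather than misparsed.
inline void demo_stable_ids() {
    assert(is_known(3));
    assert(!is_known(1));
}
```

The design choice here is that an unrecognized PDU is dropped at the type check, before any payload field is decoded.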

I hope this article is a useful reference for solving crash problems and for avoiding similar crashes.
