Has your Java/C/C++ program crashed? Demystifying the segmentation fault (3)

Source: Internet
Author: User
Tags: signal handler dmesg

Preface

This article follows on from two earlier ones:

Has your C/C++ program crashed? Demystifying the segmentation fault (1)
Has your C/C++ program crashed? Demystifying the segmentation fault (2)

Having written this far, the deeper I dig, the more I realize how little I really know about the kernel.
But since this is a study, let's settle down and get to the bottom of the segmentation fault.

This article closes the series. It goes deeper into segmentation faults and program debugging, and into finding the cause of a crash (you usually do not get such a neat stack trace or a friendly error hint).

Tools and commands used in this article:
    1. dmesg
    2. strace
    3. gdb
    4. Linux kernel 3.10 source code

Scene Reproduction

The previous two articles revolved around this problem:

    // wild pointer
    char **p;
    // zero pointer, i.e. null pointer
    NULL;
    // segmentation fault
    *p = (char *)malloc(sizeof(char));

Problem Code

To keep this article readable, here is a small program built around the problem above:

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    int main(int argc, char **args) {
        char *p = NULL;
        *p = 0x0;
    }

Segmentation fault

Finding the Problem

Step 1: check the signal details with strace

The previous article showed how to find the faulting code with gdb and a core dump. This article goes straight to strace:

strace -i -x -o segfault. ./segfault.o

From the output we know:

1. Signal: SIGSEGV
2. Error code: SEGV_MAPERR
3. Faulting memory address: 0x0
4. The fault occurred at logical address 0x400507.

We can guess:

The program accesses a null pointer, and the attempt to write to 0x0 raises the segmentation fault.

Step 2: check the error scene with dmesg

Run dmesg:

dmesg

From its output we know:

1. Error type: segfault, i.e. segmentation fault.
2. Faulting ip: 0x400507
3. Error number: 6, i.e. binary 110

Step 3: collect what we know

The error number and the ip are the key here. Check the error number against the following:

    /*
     * Page fault error code bits:
     *
     *   bit 0 ==  0: no page found         1: protection fault
     *   bit 1 ==  0: read access           1: write access
     *   bit 2 ==  0: kernel-mode access    1: user-mode access
     *   bit 3 ==                           1: use of reserved bit detected
     *   bit 4 ==                           1: fault was an instruction fetch
     */
    enum x86_pf_error_code {
        PF_PROT  = 1 << 0,
        PF_WRITE = 1 << 1,
        PF_USER  = 1 << 2,
        PF_RSVD  = 1 << 3,
        PF_INSTR = 1 << 4,
    };

Checking against this table:

Error number 6 = binary 110 = (PF_USER | PF_WRITE | 0).
That is: a user-mode access, a write access, and no page found for the specified address.

The above information coincides with our initial inference.

The conclusions so far can be summarized as follows:

1. Error type: segfault, i.e. segmentation fault.

2. Faulting ip: 0x400507

3. Error number: 6, i.e. binary 110

4. Error code: SEGV_MAPERR, the address is not mapped to an object.

5. Cause: a write to 0x0 raised the segmentation fault, because no page or mapping is associated with 0x0.

Step 4: find the faulting code from the conclusions

Run gdb:

gdb ./segfault.o

Looking up ip = 0x400507 from our conclusions leads straight to the faulting line, and this validates our conclusion:

We tried to write the value 0x0 to the address 0x0, and the write to an unmapped address raised a segmentation fault.

The faulting code is line 9 of stack.c.

Tracing It to the Root

Obviously we are not satisfied with just this. Why does accessing 0x0 cause this error and crash the program?

The second article already covered the virtual address space of a process. When we perform a write, the virtual address has to be mapped to a physical address, because the data (here the value 0x0, not to be confused with our address 0x0) is ultimately written to physical memory.

0x0 is a logical address. Linux manages memory mappings in pages, and 0x0 does not correspond to any page, so nothing in memory backs it and writing to it causes a page fault. This part is handled by the Linux memory management module (mm).

Page Fault Handling

1. __do_page_fault

After a page fault, control enters __do_page_fault. To save space I have removed some of the source comments, and the code path relevant to us is annotated:

    /*
     * This routine handles page faults.  It determines the address,
     * and the problem, and then passes it off to one of the appropriate
     * routines.
     */
    static void __kprobes
    __do_page_fault(struct pt_regs *regs,
                    unsigned long error_code /* note: our error code is 6, i.e. binary 110 */)
    {
        struct vm_area_struct *vma;
        struct task_struct *tsk;
        unsigned long address;
        struct mm_struct *mm;
        int fault;
        int write = error_code & PF_WRITE;
        unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
                             (write ? FAULT_FLAG_WRITE : 0);

        tsk = current;
        mm = tsk->mm;

        /* this is where we obtain address = 0x0 */
        /* Get the faulting address: */
        address = read_cr2();

        if (kmemcheck_active(regs))
            kmemcheck_hide(regs);
        prefetchw(&mm->mmap_sem);

        if (unlikely(kmmio_fault(regs, address)))
            return;

        if (unlikely(fault_in_kernel_space(address))) {
            /* omitted here, we do not hit this branch */
            /* ... */
            return;
        }

        /* a lot of code omitted, including the opening of the block closed below */
        /* ... */
    retry:
            down_read(&mm->mmap_sem);
        } else {
            might_sleep();
        }

        vma = find_vma(mm, address);
        if (unlikely(!vma)) {
            /*
             * We end up in bad_area(). Note: find_vma() returns the first
             * VMA that ends above address, so for 0x0 the same bad_area()
             * call may instead be reached from a later check in the code
             * omitted below; the result is the same.
             */
            bad_area(regs, error_code, address);
            /* return after handling */
            return;
        }

        /* a lot of code omitted */
        /* ... */
    }

2. bad_area

The key call here is bad_area(regs, error_code, address):

    static void
    bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address)
    {
        /* note that the si_code is set to SEGV_MAPERR here */
        __bad_area(regs, error_code, address, SEGV_MAPERR);
    }

This clearly shows where the SEGV_MAPERR in our conclusions comes from.

This code means that the address is not mapped to an object. Compare it with what strace reported, where si_code=SEGV_MAPERR:

    --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0} ---
    +++ killed by SIGSEGV (core dumped) +++

Execution eventually arrives here:

    static void
    __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
                           unsigned long address, int si_code)
    {
        struct task_struct *tsk = current;

        /* our error code is 6 = binary 110 and PF_USER = binary 100, so we enter this if */
        if (error_code & PF_USER) {
            /* interrupts may be off here, so enable them */
            local_irq_enable();

            /* ... omitted ... */

            if (address >= TASK_SIZE)
                error_code |= PF_PROT;

            /* this prints the error message */
            if (likely(show_unhandled_signals))
                show_signal_msg(regs, error_code, address, tsk);

            tsk->thread.cr2        = address;
            tsk->thread.error_code = error_code;
            tsk->thread.trap_nr    = X86_TRAP_PF;

            /* this forces the SIGSEGV (segmentation fault) signal to be sent */
            force_sig_info_fault(SIGSEGV, si_code, address, tsk, 0);

            return;
        }

        /* ... omitted ... */
    }

Note the two key calls in the code above:

    show_signal_msg        // prints the error message
    force_sig_info_fault   // forces the signal to be sent

3. show_signal_msg

    /*
     * Print out info about fatal segfaults, if the show_unhandled_signals
     * sysctl is set:
     */
    static inline void
    show_signal_msg(struct pt_regs *regs, unsigned long error_code,
                    unsigned long address, struct task_struct *tsk)
    {
        /* ... omitted ... */

        /* print the segfault message, readable through /proc/kmsg */
        printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
               task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
               tsk->comm, task_pid_nr(tsk), address,
               (void *)regs->ip, (void *)regs->sp, error_code);

        print_vma_addr(KERN_CONT " in ", regs->ip);

        printk(KERN_CONT "\n");
    }

This is the code that prints the segfault message, which is exactly what we read with dmesg.

You can compare the format string with the dmesg message for our segmentation fault from step 2.

4. force_sig_info_fault

Finally, the signal is sent.

    static void
    force_sig_info_fault(int si_signo, int si_code, unsigned long address,
                         struct task_struct *tsk, int fault)
    {
        unsigned lsb = 0;
        siginfo_t info;

        info.si_signo = si_signo;
        info.si_errno = 0;
        info.si_code  = si_code;
        info.si_addr  = (void __user *)address;
        if (fault & VM_FAULT_HWPOISON_LARGE)
            lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
        if (fault & VM_FAULT_HWPOISON)
            lsb = PAGE_SHIFT;
        info.si_addr_lsb = lsb;

        /* force the SIGSEGV signal to be sent */
        force_sig_info(si_signo, &info, tsk);
    }

force_sig_info:

    int
    force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
    {
        unsigned long int flags;
        int ret, blocked, ignored;
        struct k_sigaction *action;

        spin_lock_irqsave(&t->sighand->siglock, flags);
        /* the handler for the signal is looked up here */
        action = &t->sighand->action[sig-1];

        /* ... omitted ... */

        /* the signal must be forced */
        if (action->sa.sa_handler == SIG_DFL)
            /* we do not want recursive SIGSEGVs, so SIGNAL_UNKILLABLE is cleared */
            t->signal->flags &= ~SIGNAL_UNKILLABLE;
        /* send */
        ret = specific_send_sig_info(sig, info, t);
        spin_unlock_irqrestore(&t->sighand->siglock, flags);

        return ret;
    }

The code above shows how the handler for the signal is looked up. For the segmentation fault signal SIGSEGV, the default action is to terminate the process and dump core.

5. Core Dump

At this point we already have the core dump, which gives the second way to find the code that raised the segmentation fault. This is also the recommended approach:

gdb ./segfault.o core.36054

Isn't it immediately clear that line 9 of stack.c, *p = 0x0, is the culprit?

Conclusion

With that, the whole exploration of the segmentation fault is complete. I hope readers have gained as much from it as I have.

Here are several common causes of segmentation faults:

1. Array out of bounds

    int a[10] = {0, 1};
    printf("%d", a[10000]);

2. Zero pointer or null pointer

    // the example used in this series
    char *p = NULL;
    *p = 0x0;

3. Dangling pointer

If the pointer p is dangling, the address it points to may or may not be valid, and you do not know when that address is writable and when it is protected (mprotect).
If it is protected as read-only and you write to it, you get a segmentation fault!

4. Access rights, unauthorized access

See 3.

5. Multi-threaded operation on shared pointer variables

This is not limited to C/C++. Android and Java programs can crash the JVM in the same way, so check the variables shared between your threads!

If you find any mistakes, please let me know.
