Linux kernel: locating and resolving a wild pointer access fault in elv_queue_empty

Source: Internet
Author: User

1. Fault description

Steps leading to the fault:

A USB flash drive is attached to the board. Before the problem occurred, files were being copied to the board over FTP; during the copy, the board restarted automatically.

Symptom:

due to oops @  0xffffffffc08d3d84

[0]MORE> 

[0]kdb>  

2. Information collection

[0]kdb> bt

Stack Traceback for PID 4

0xc000000594069e38 4 2 1 0 R 0xc00000059406a1a0 *ksoftirqd/0

Stack: fefc4d0e7b71a7af c00000058b4be318 c000000489904298 ffffffffc08d8174
0000000000000001 00000000f0000000 c000000489904258 ffffffffc0950f14
c000000594093c30 c000000594093c30 ffffffffc0e0f298 ffffffffc094ad00
c0000004b9780368 c00000058b5d6f30 c00000058b4be318 c00000058b4be318
fffffffffffffffb 0000000000000000 0000000000000000 c000000561912de8
c00000058b4be318 ffffffffc09534a0 c00000058b5d6f30 c000000561912de8
0000000000040000 ffffffffc09536ac 0000000000000000 c000000244074000
0000000000000000 ffffffffc0dddaa0 0000000000000101 ffffffffc0ddda80
ffffffffc0e71140 0000000000000000 ffffffffc0faf480 ffffffffc0f48c40
ffffffffc0de0000 ffffffffc08df110 c000000594093d20 c000000594093d20

...

Call Trace: [Jiffies:0x1003fe6ae]
[<ffffffffc08d3d84>] elv_queue_empty+0x24/0x48
[<ffffffffc08d7ee0>] __blk_run_queue+0x38/0x1d8
[<ffffffffc08d8174>] blk_run_queue+0x2c/0x50
[<ffffffffc0950f14>] scsi_run_queue+0x10c/0x418
[<ffffffffc09534a0>] scsi_next_command+0x48/0x68
[<ffffffffc09536ac>] scsi_io_completion+0x16c/0x550
[<ffffffffc08df110>] blk_done_softirq+0x98/0xb0
[<ffffffffc0679a58>] __do_softirq+0x120/0x1f0
[<ffffffffc0679ba0>] do_softirq+0x78/0x80
[<ffffffffc0679ca0>] ksoftirqd+0xf8/0x250
[<ffffffffc068fe9c>] kthread+0x94/0xa0
[<ffffffffc0638ef0>] kernel_thread_helper+0x10/0x20

<1>CPU 0 Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6bab, epc == ffffffffc08d3d84, ra == ffffffffc08d7ee0

Cpu 0
$0: 0000000000000000 0000000000000014 6b6b6b6b6b6b6b6b c00000058b4be318
$4: c00000058b4be318 c0000004b83837e0 0000000000000000 c0000004b9780270
$8: 0000000000000004 c0000004b97803b0 0000000000000001 0000000000275c43
$12: 0000000000000028 ffffffffc0607568 ffffffffc0694b98 1ebdefda014b0000
$16: c00000058b4be318 c000000489904298 c00000058b4be318 c0000004c6c59a38
$20: c0000004c6c59a10 0400000000000000 ffffffffffffffbf 0000000000000001
$24: 0000000000000004 ffffffffc06d7f50
$28: c000000594090000 c000000594093bf0 ffffffffc0fc0000 ffffffffc08d7ee0
Hi: 0000000000000000
Lo: 0000000000000400
epc: ffffffffc08d3d84 elv_queue_empty+0x24/0x48
Not tainted
ra: ffffffffc08d7ee0 __blk_run_queue+0x38/0x1d8
Status: 5400ffe2 KX SX UX KERNEL EXL
Cause: 00800008
BadVA: 6b6b6b6b6b6b6bab

[0]kdb> MD 0XC0000004B83837E0
0xc0000004b83837e0 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b KKKKKKKKKKKKKKKK
0xc0000004b83837f0 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b KKKKKKKKKKKKKKKK
0xc0000004b8383800 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b KKKKKKKKKKKKKKKK
0xc0000004b8383810 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b KKKKKKKKKKKKKKKK
0xc0000004b8383820 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b KKKKKKKKKKKKKKKK
0xc0000004b8383830 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b KKKKKKKKKKKKKKKK
0xc0000004b8383840 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b KKKKKKKKKKKKKKKK
0xc0000004b8383850 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6ba5 kkkkkkkkkkkkkkk.

3. Failure analysis

Disassembling the elv_queue_empty function and applying the offset reported in the oops (elv_queue_empty+0x24) locates the faulting instruction, which is the second of these two loads:

"ffffffffc08d3e98: dca20000 ld v0,0(a1)

ffffffffc08d3e9c: dc590040 ld t9,64(v0)"

They correspond to the C statement

"if (e->ops->elevator_queue_empty_fn)"

Here a1 (e) = c0000004b83837e0 and v0 (e->ops) = 6b6b6b6b6b6b6b6b. 0x6b is the signature value that slab poisoning writes into freed memory, so it can be inferred that the elevator_queue object pointed to by the elevator field of the request_queue has already been kfree'd while the elevator field still holds its old address, which leaves e->ops with an illegal value.

The memory dump at address c0000004b83837e0 shows the same thing: every byte has been poisoned to 0x6b on free (apart from the trailing 0xa5 end marker), and the poisoned region matches the size of struct elevator_queue, which further supports the inference.

Because the object e has been freed, e is now a wild pointer and e->ops holds an illegal address. Evaluating e->ops->elevator_queue_empty_fn therefore reads from the illegal address 6b6b6b6b6b6b6b6b + 64 = 6b6b6b6b6b6b6bab, which is exactly the BadVA reported by the exception.

int elv_queue_empty(struct request_queue *q)
{
        struct elevator_queue *e = q->elevator;

        if (!list_empty(&q->queue_head))
                return 0;

        /* e->ops is loaded here; with e freed, this reads poisoned memory */
        if (e->ops->elevator_queue_empty_fn)
                return e->ops->elevator_queue_empty_fn(q);

        return 1;
}

ffffffffc08d3e78 <elv_queue_empty>:
ffffffffc08d3e78: dc830000 ld v1,0(a0)
ffffffffc08d3e7c: dc850018 ld a1,24(a0)
ffffffffc08d3e80: 10830005 beq a0,v1,ffffffffc08d3e98 <elv_queue_empty+0x20>
ffffffffc08d3e84: 00000000 nop
ffffffffc08d3e88: 0000102d move v0,zero
ffffffffc08d3e8c: 03e00008 jr ra
ffffffffc08d3e90: 00000000 nop
ffffffffc08d3e94: 00000000 nop
ffffffffc08d3e98: dca20000 ld v0,0(a1)
ffffffffc08d3e9c: dc590040 ld t9,64(v0)
ffffffffc08d3ea0: 13200003 beqz t9,ffffffffc08d3eb0 <elv_queue_empty+0x38>
ffffffffc08d3ea4: 00000000 nop
ffffffffc08d3ea8: 03200008 jr t9
ffffffffc08d3eac: 00000000 nop
ffffffffc08d3eb0: 24020001 li v0,1
ffffffffc08d3eb4: 03e00008 jr ra
ffffffffc08d3eb8: 00000000 nop
ffffffffc08d3ebc: 00000000 nop

4. Solution

According to the above analysis, the exception is caused by accessing the elevator_queue object through a wild pointer after the object has been destroyed.

The elevator_queue object is created and destroyed together with the scsi_host device object. Its destruction call chain is as follows:

scsi_host_dev_release -> scsi_free_queue -> blk_cleanup_queue -> elevator_exit -> elevator_release

Disassembling the scsi_io_completion function shows that, because of compiler optimization, execution actually goes through the scsi_end_request branch, which is the resource release path taken after a request in the request queue completes. The real call chain therefore differs slightly from the call trace printed at the time of the exception.

The call chain of the real exception is as follows:

scsi_io_completion -> scsi_end_request -> scsi_next_command -> scsi_run_queue -> blk_run_queue -> __blk_run_queue -> elv_queue_empty

The analysis leads to the following conclusions:

1) When the scsi host dev object is released, its associated request_queue and elevator_queue are released as well, but the pointer to the elevator_queue held in the request_queue object (the elevator field) is not cleared, so it becomes a wild pointer.

2) A comparison with the 2.6.32 LTS version shows that when the scsi host dev object is released, its scsi_device object is released too, but the pointer to that scsi_device object, namely the queuedata field, is not updated in time and becomes a wild pointer. If it were set to NULL in time, scsi_run_queue would know that the scsi_device object no longer exists and that there is therefore no request queue to run, so it could return immediately without executing further and avoid the subsequent exception.

Based on these two conclusions, both wild pointer cases are fixed as follows:

1) When the elevator_queue object is released, the pointer to it held in the request_queue object is promptly set to NULL;

2) When the scsi_device object is released, the pointer to the scsi_device object held in the scsi host dev object (the queuedata field noted above) is promptly set to NULL.

