Page faults and copy_from_user in the kernel

Source: Internet
Author: User
Page faults in kernel mode

Some time ago a colleague asked me whether a page fault can occur in the kernel.

I could not give an accurate answer on the spot, and my immediate reaction was: do I not understand kernel memory management well enough? I used to be fairly confident in this area.

The question looks simple. From what I understood before, taking classic 32-bit x86 as an example, the low kernel addresses are linearly mapped and their page tables are created in advance during kernel initialization, so accesses to these addresses should not cause a page fault. The vmalloc area, however, is mapped dynamically through the page tables, so a page fault is clearly possible there.

That seemed to settle it, but my colleague was not asking about faults on kernel addresses. The question was whether a page fault can occur when code running in kernel mode accesses a user-space address.

I asked in return: under what circumstances does the kernel need to access a user-space address? The answer: copy_from_user, for example.

Now the question was clear, and in this case a page fault is indeed possible.

Hardware level

At the hardware level, when the CPU performs a memory access, the MMU walks the page tables looking for a matching mapping entry, and raises a page fault automatically if none is found.
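The walk described above can be sketched in user-space C. This is only a toy model, assuming the classic non-PAE 32-bit x86 split of a virtual address into 10 + 10 + 12 bits; the names toy_pgd, toy_pt and toy_walk are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ENTRIES     1024u  /* entries per level, non-PAE 32-bit x86 */
#define TOY_PAGE_SHIFT  12u    /* 4 KiB pages */
#define TOY_PTE_PRESENT 0x1u   /* bit 0: the mapping is valid */

/* Toy second-level table: each entry is (frame base | flags). */
struct toy_pt {
	uint32_t pte[TOY_ENTRIES];
};

/* Toy top-level directory: a NULL slot means "no table here". */
struct toy_pgd {
	struct toy_pt *pt[TOY_ENTRIES];
};

/*
 * Translate vaddr. Returns 0 and stores the physical address in *paddr,
 * or returns -1 at the point where real hardware would raise a page fault.
 */
static int toy_walk(const struct toy_pgd *pgd, uint32_t vaddr, uint32_t *paddr)
{
	uint32_t dir = vaddr >> 22;                         /* top 10 bits */
	uint32_t idx = (vaddr >> TOY_PAGE_SHIFT) & 0x3ffu;  /* middle 10 bits */
	uint32_t off = vaddr & 0xfffu;                      /* low 12 bits */
	const struct toy_pt *pt;

	assert(pgd != NULL && paddr != NULL);

	pt = pgd->pt[dir];
	if (pt == NULL)
		return -1;                      /* no second-level table */
	if (!(pt->pte[idx] & TOY_PTE_PRESENT))
		return -1;                      /* entry not present */

	*paddr = (pt->pte[idx] & ~0xfffu) | off;
	return 0;
}
```

Calling toy_walk on an address whose entry was never filled in returns -1, which is the situation the rest of this article is about.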

So, from this point of view, whenever the page table entry for the user-space address passed to copy_from_user does not exist, a page fault will occur. In principle, though, this is not supposed to happen.

In other words, when the kernel uses copy_from_user to copy data from user space, the data should already be in place, that is, the corresponding physical memory should already have been allocated, so no page fault should occur.

Normally this is true. But what if the kernel does end up accessing an unmapped user-space address? What if the user-space data is not ready at that moment? Nobody can guarantee it.

So, although it should not happen, it still can.

Is there anything different?

In this case, the page fault that occurs in the kernel differs from the page fault we usually see in user mode.

In user mode, when a page fault occurs, the CPU switches to privileged (kernel) mode to handle the exception. When the fault occurs in the kernel, the CPU is already in kernel mode, so the handling is bound to differ. This is probably only one aspect; there should be other differences as well.

How is it handled?

In the page fault path you can see the vmalloc_fault function, which specially handles faults that occur in the vmalloc area.
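As I understand it, on 32-bit x86 this special handling amounts to lazily copying the missing top-level entry from the kernel's reference page table into the current process's page table. The sketch below is a simplified user-space model of that idea only; toy_pgd and toy_vmalloc_fault are hypothetical names, and the real function performs more checks than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ENTRIES 1024u

/* Toy top-level page table: a NULL slot means "not present". */
struct toy_pgd {
	void *slot[TOY_ENTRIES];
};

/*
 * Model of a fault in the vmalloc area.
 * Returns 0 if the fault is resolved by syncing the entry from the
 * reference (master) table, -1 if the master has no mapping either,
 * in which case the access really was bad.
 */
static int toy_vmalloc_fault(struct toy_pgd *task_pgd,
			     const struct toy_pgd *master_pgd,
			     uint32_t vaddr)
{
	size_t dir = vaddr >> 22;  /* top 10 bits select the slot */

	assert(task_pgd != NULL && master_pgd != NULL);

	if (master_pgd->slot[dir] == NULL)
		return -1;

	/* Lazily copy the entry the current task was missing. */
	if (task_pgd->slot[dir] == NULL)
		task_pgd->slot[dir] = master_pgd->slot[dir];
	return 0;
}
```

The point of the model is that this kind of kernel-address fault is repaired silently and the faulting instruction is simply retried.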

What about the copy_from_user case?

It is handled too. The do_page_fault path contains exception-table processing. I had never read that code carefully before, and it turns out to exist for exactly this situation.

Why use copy_from_user?

Back to the old question: why use copy_from_user at all? As I understand it, copying data just means moving it from one address to another, so would memcpy not do?

No, because a page fault may occur. memcpy has no handling for that case, and the consequences may be unpredictable.
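The difference shows up in the calling contract: memcpy has no way to report a failed access, whereas copy_from_user returns the number of bytes it could not copy (0 on success). Below is a small user-space model of that contract. It is not kernel code: toy_user_mem and toy_copy_from_user are made-up names, "user addresses" are plain offsets, and a "fault" is simulated by a bounds check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy user address space: addresses [0, mapped) are backed by memory. */
struct toy_user_mem {
	const unsigned char *backing;
	size_t mapped;
};

/*
 * Copy n bytes from toy user address uaddr into the buffer at to.
 * Like copy_from_user, the return value is the number of bytes that
 * could NOT be copied, so 0 means complete success.
 */
static size_t toy_copy_from_user(void *to, const struct toy_user_mem *um,
				 size_t uaddr, size_t n)
{
	size_t ok = 0;  /* bytes that can be copied before the "fault" */

	assert(to != NULL && um != NULL);

	if (uaddr < um->mapped) {
		ok = um->mapped - uaddr;
		if (ok > n)
			ok = n;
	}
	if (ok > 0)
		memcpy(to, um->backing + uaddr, ok);

	return n - ok;
}
```

A caller would typically treat any nonzero return as an error, which is also how kernel code commonly turns a failed copy_from_user into -EFAULT.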

copy_from_user deals with this case by adding a .fixup section to its code. do_page_fault looks it up through the exception table and uses it to recover from the fault. I have not had time to study this in depth; interested readers can dig further.
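The lookup can be pictured roughly as follows: each exception table entry pairs the address of an instruction that is allowed to fault with the address to resume at, and the fault handler searches the table using the faulting instruction pointer. This is a user-space sketch under that assumption; the toy_ names are hypothetical, and the real kernel entry layout differs between versions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_extable_entry {
	uintptr_t insn;   /* address of an instruction that may fault */
	uintptr_t fixup;  /* address to resume at if it does */
};

/* Binary search by faulting address; tbl must be sorted by insn. */
static const struct toy_extable_entry *
toy_search_extable(const struct toy_extable_entry *tbl, size_t n, uintptr_t ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tbl[mid].insn == ip)
			return &tbl[mid];
		if (tbl[mid].insn < ip)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/*
 * Called from the toy fault handler. Returns 1 and redirects *ip to the
 * fixup code when the faulting instruction is listed; returns 0 when it
 * is not, which in a real kernel would mean an unrecoverable fault.
 */
static int toy_fixup_exception(const struct toy_extable_entry *tbl, size_t n,
			       uintptr_t *ip)
{
	const struct toy_extable_entry *e;

	assert(ip != NULL);

	e = toy_search_extable(tbl, n, *ip);
	if (e == NULL)
		return 0;
	*ip = e->fixup;
	return 1;
}
```

In the listing that follows, each _ASM_EXTABLE(from, to) line contributes one such pair: the numbered label of a mov that may fault, and the label where execution should continue.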

The copy_from_user code (here the 32-bit x86 helper __copy_user_intel) is as follows:

static unsigned long
__copy_user_intel(void __user *to, const void *from, unsigned long size)
{
	int d0, d1;
	__asm__ __volatile__(
		       "       .align 2,0x90\n"
		       "1:     movl 32(%4), %%eax\n"
		       "       cmpl $67, %0\n"
		       "       jbe 3f\n"
		       "2:     movl 64(%4), %%eax\n"
		       "       .align 2,0x90\n"
		       "3:     movl 0(%4), %%eax\n"
		       "4:     movl 4(%4), %%edx\n"
		       "5:     movl %%eax, 0(%3)\n"
		       "6:     movl %%edx, 4(%3)\n"
		       "7:     movl 8(%4), %%eax\n"
		       "8:     movl 12(%4), %%edx\n"
		       "9:     movl %%eax, 8(%3)\n"
		       "10:    movl %%edx, 12(%3)\n"
		       "11:    movl 16(%4), %%eax\n"
		       "12:    movl 20(%4), %%edx\n"
		       "13:    movl %%eax, 16(%3)\n"
		       "14:    movl %%edx, 20(%3)\n"
		       "15:    movl 24(%4), %%eax\n"
		       "16:    movl 28(%4), %%edx\n"
		       "17:    movl %%eax, 24(%3)\n"
		       "18:    movl %%edx, 28(%3)\n"
		       "19:    movl 32(%4), %%eax\n"
		       "20:    movl 36(%4), %%edx\n"
		       "21:    movl %%eax, 32(%3)\n"
		       "22:    movl %%edx, 36(%3)\n"
		       "23:    movl 40(%4), %%eax\n"
		       "24:    movl 44(%4), %%edx\n"
		       "25:    movl %%eax, 40(%3)\n"
		       "26:    movl %%edx, 44(%3)\n"
		       "27:    movl 48(%4), %%eax\n"
		       "28:    movl 52(%4), %%edx\n"
		       "29:    movl %%eax, 48(%3)\n"
		       "30:    movl %%edx, 52(%3)\n"
		       "31:    movl 56(%4), %%eax\n"
		       "32:    movl 60(%4), %%edx\n"
		       "33:    movl %%eax, 56(%3)\n"
		       "34:    movl %%edx, 60(%3)\n"
		       "       addl $-64, %0\n"
		       "       addl $64, %4\n"
		       "       addl $64, %3\n"
		       "       cmpl $63, %0\n"
		       "       ja  1b\n"
		       "35:    movl  %0, %%eax\n"
		       "       shrl  $2, %0\n"
		       "       andl  $3, %%eax\n"
		       "       cld\n"
		       "99:    rep; movsl\n"
		       "36:    movl %%eax, %0\n"
		       "37:    rep; movsb\n"
		       "100:\n"
		       ".section .fixup,\"ax\"\n"
		       "101:   lea 0(%%eax,%0,4),%0\n"
		       "       jmp 100b\n"
		       ".previous\n"
		       _ASM_EXTABLE(1b,100b) _ASM_EXTABLE(2b,100b) _ASM_EXTABLE(3b,100b) _ASM_EXTABLE(4b,100b)
		       _ASM_EXTABLE(5b,100b) _ASM_EXTABLE(6b,100b) _ASM_EXTABLE(7b,100b) _ASM_EXTABLE(8b,100b)
		       _ASM_EXTABLE(9b,100b) _ASM_EXTABLE(10b,100b) _ASM_EXTABLE(11b,100b) _ASM_EXTABLE(12b,100b)
		       _ASM_EXTABLE(13b,100b) _ASM_EXTABLE(14b,100b) _ASM_EXTABLE(15b,100b) _ASM_EXTABLE(16b,100b)
		       _ASM_EXTABLE(17b,100b) _ASM_EXTABLE(18b,100b) _ASM_EXTABLE(19b,100b) _ASM_EXTABLE(20b,100b)
		       _ASM_EXTABLE(21b,100b) _ASM_EXTABLE(22b,100b) _ASM_EXTABLE(23b,100b) _ASM_EXTABLE(24b,100b)
		       _ASM_EXTABLE(25b,100b) _ASM_EXTABLE(26b,100b) _ASM_EXTABLE(27b,100b) _ASM_EXTABLE(28b,100b)
		       _ASM_EXTABLE(29b,100b) _ASM_EXTABLE(30b,100b) _ASM_EXTABLE(31b,100b) _ASM_EXTABLE(32b,100b)
		       _ASM_EXTABLE(33b,100b) _ASM_EXTABLE(34b,100b) _ASM_EXTABLE(35b,100b) _ASM_EXTABLE(36b,100b)
		       _ASM_EXTABLE(37b,100b) _ASM_EXTABLE(99b,101b)
		       : "=&c"(size), "=&D" (d0), "=&S" (d1)
		       : "1"(to), "2"(from), "0"(size)
		       : "eax", "edx", "memory");
	return size;
}

Overall the code resembles a memcpy implementation; the main difference is the .fixup section. Interested readers can compare the two.


Original address: http://happyseeker.github.io/kernel/2016/12/30/page-fault-in-kernel.html
