About kernel compatibility 10: Windows Process Creation and image loading

Last Update:2018-12-03 Source: Internet

Author: User

Tags apc posix


Chapter 6 of Microsoft Windows Internals, 4th edition, describes how Windows creates a process and loads its image. This article offers a brief account based on that chapter, mixing translation, explanation, and discussion. According to the book, creating a process is divided into six stages, which take place in three parts of the operating system: the Windows client-side dynamic-link libraries in the application process (kernel32.dll among them), the Windows "executive", that is, the kernel (more precisely, its upper layer), and CSRSS, the service process of the Windows subsystem. The six stages are:
1. Open the target image file.
2. Create the Windows "executive process object", that is, the process control block data structure in the kernel.
3. Create the initial (first) thread of the process, including its stack, its context, and the "executive thread object", that is, the thread control block data structure in the kernel.
4. Notify the Windows subsystem of the new process.
5. Start the initial thread running (unless the CREATE_SUSPENDED flag was set in the parameters, in which case the thread is created suspended).
6. Complete the initialization of the user space in the context of the new process and thread, including loading the required DLLs, and then start running the target program.
Each stage is described below.

Stage 1: Open the target image file
In the Win32 API, a process is created by CreateProcess(). This is actually a macro, defined as either CreateProcessA() or CreateProcessW() depending on how the program is compiled. Both functions are in kernel32.dll (as can be seen with the Depends tool). The two differ only in how strings are represented: the former takes 8-bit (ANSI) character strings, the latter "wide characters", that is, Unicode. Windows uses wide characters internally, so the former merely converts its strings to wide-character form and then calls the latter.
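The macro-plus-forwarding arrangement can be sketched in portable C. This is only an illustration of the pattern, not the kernel32.dll code: create_process, create_process_a, create_process_w, and TOY_UNICODE are hypothetical names, and the "wide" worker here merely reports the length of the command line it receives.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for the wide-character worker (the role of
   CreateProcessW). It only reports the length of the command line. */
int create_process_w(const wchar_t *command_line)
{
    assert(command_line != NULL);
    return (int)wcslen(command_line);
}

/* Hypothetical stand-in for the narrow-character wrapper (the role of
   CreateProcessA): convert to wide characters, then forward. */
int create_process_a(const char *command_line)
{
    wchar_t wide[260];
    size_t n;

    assert(command_line != NULL);
    n = mbstowcs(wide, command_line, 259);
    if (n == (size_t)-1)
        return -1;            /* invalid multibyte sequence */
    wide[n] = L'\0';          /* mbstowcs does not terminate on truncation */
    return create_process_w(wide);
}

/* The generic name is only a macro that selects one of the two. */
#ifdef TOY_UNICODE
#define create_process create_process_w
#else
#define create_process create_process_a
#endif
```

With TOY_UNICODE undefined, create_process("notepad.exe") goes through the narrow wrapper and ends up in the wide worker.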
Several kinds of executable can be run on Windows, and each is handled differently:
● Windows 32-bit .exe image: run directly.
● Windows 16-bit .exe: start ntvdm.exe with the original command line as its parameter.
● DOS .exe, .com, or .pif: start ntvdm.exe with the original command line as its parameter.
● DOS .bat or .cmd command file (script): start cmd.exe with the original command line as its parameter.
● POSIX executable: start posix.exe with the original command line as its parameter.
● OS/2 executable image: start os2.exe with the original command line as its parameter.
The most important of these is the 32-bit .exe image; the last two kinds are rarely seen. Looking at how the kinds other than the 32-bit .exe are handled, the reader may wish to compare this with the way Wine handles .exe images and see what similarities there are.
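The dispatch described in the list above can be summarized as a small lookup. The enum and function names are invented for this sketch, and it deliberately ignores how the image kind is detected in the first place.

```c
#include <assert.h>
#include <stddef.h>

/* Image kinds from the list above; the identifiers are illustrative. */
enum image_kind {
    IMAGE_WIN32, IMAGE_WIN16, IMAGE_DOS,
    IMAGE_BATCH, IMAGE_POSIX, IMAGE_OS2
};

/* Returns the support image that is started for a given kind of
   executable, or NULL when the image is run directly. */
const char *support_image_for(enum image_kind kind)
{
    switch (kind) {
    case IMAGE_WIN32: return NULL;          /* run directly */
    case IMAGE_WIN16:
    case IMAGE_DOS:   return "ntvdm.exe";
    case IMAGE_BATCH: return "cmd.exe";
    case IMAGE_POSIX: return "posix.exe";
    case IMAGE_OS2:   return "os2.exe";
    }
    assert(!"unknown image kind");
    return NULL;
}
```

In every case other than IMAGE_WIN32, the original command line would be passed on to the support image as its parameter.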
Of course, only the 32-bit .exe image concerns us here. For this kind of image, CreateProcess() first opens the image file and creates (allocates) a "section" for it, that is, a memory region. The section exists so that the image file can be mapped into it, but the mapping is not done right away; a few things are checked first. The first check is that the opened file really contains a valid .exe image (what if it is a DLL image?). The second may surprise the reader: the following path in the registry is consulted:
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options

Depends shows that ntdll.dll has a function LdrQueryImageFileExecutionOptions(), which is dedicated to this task.
If this key has a subkey named after the target image file, including its extension, for example "image.exe", and that subkey contains a value named "Debugger", then the value (a string) replaces the original target file name as the new target image name, and the first-stage operations are carried out again. The intent is clearly to make debugging easier, but imagine what happens if an attacker or a trojan adds such an entry to the registry: the user believes program A is being started, while program B is started instead.
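The redirection can be modelled with an in-memory table standing in for the registry key. Everything here is hypothetical (the table, its single entry, and resolve_image); the real lookup goes through the registry, and this toy uses exact string matching and an arbitrary bound on the number of repeats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the registry key: image name -> "Debugger" value. */
struct ifeo_entry {
    const char *image;
    const char *debugger;
};

static const struct ifeo_entry ifeo_table[] = {
    { "image.exe", "debugger.exe" },   /* hypothetical entry */
};

/* Returns the name of the image that would actually be opened when
   `requested` is asked for. */
const char *resolve_image(const char *requested)
{
    const char *current = requested;
    size_t hops, i;

    assert(requested != NULL);
    /* Stage 1 is repeated each time a Debugger value is found; the
       bound of 8 is an arbitrary safeguard for this sketch. */
    for (hops = 0; hops < 8; hops++) {
        const char *next = NULL;
        for (i = 0; i < sizeof ifeo_table / sizeof ifeo_table[0]; i++) {
            if (strcmp(ifeo_table[i].image, current) == 0)
                next = ifeo_table[i].debugger;
        }
        if (next == NULL)
            return current;    /* no redirection: open this image */
        current = next;
    }
    return current;
}
```

A request for "image.exe" resolves to "debugger.exe", which is exactly the substitution an unwanted registry entry could exploit.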

Stage 2: Create process objects in the kernel
As we know, every process (thread) in Linux has a "process control block", the task_struct data structure, which holds the vast majority of the information about that process or thread. Windows is different. First, Windows processes and threads have separate "objects", that is, separate data structures. Conceptually, threads and processes are kept apart: a thread is a specific (execution) context and the unit of CPU scheduling, while a process is only a set of threads that share an address space and some attributes (such as scheduling priority). The process therefore has its own data structure and each thread has its own, much as if the "common factor" had been extracted from a group of task_struct structures. This is easy to understand. Furthermore, Windows splits the process data, which could have been stored in one place, into several objects, some in the kernel and some in user space.
Process-related objects in the kernel include:
● EPROCESS, that is, struct _EPROCESS, called the "process block" in "Internals". It represents a Windows process. The 'E' stands for "executive": Microsoft calls the upper layer of the Windows kernel the "executive" to distinguish it from the lower layer beneath it. "Executive" also carries the senses of "management" and "operation" (which is why a CEO is a chief executive officer).
● KPROCESS. This is embedded in EPROCESS as a component, under the field name Pcb.
● W32PROCESS. As described later, user space contains CSRSS, the service process of the "Windows subsystem". CSRSS maintains a data structure for every Windows application process in the system, holding information related to windows and the graphical interface. Window and graphics operations were originally carried out by CSRSS at the request of "client" processes, but this functionality was later moved into the kernel for efficiency. Part of the data structure had to move into the kernel with it, and that part became W32PROCESS.
Since KPROCESS is part of EPROCESS, there are really only two kinds of process-related objects in the kernel, EPROCESS and W32PROCESS. This does not count the "open object table", which every process also has (in the Linux kernel, too, the "open file table" lives outside the process control block).
Process-related objects in user space are:
● The per-process data structure in CSRSS. As mentioned above, even after W32PROCESS was moved into the kernel, CSRSS still has to keep some other information about each Windows process, so it still maintains a data structure per process.
● PEB, the "process environment block". The PEB records the process's run-time parameters, the image load address, and other information. Its position in user space is fixed, always 0x7ffdf000. In Windows the dividing line between user space and system space is at 2 GB, that is, 0x80000000, so the PEB is near the top of user space.

The "Internals" book does not give a definition of the data structure, but it does show the internal layout of EPROCESS as displayed by a debugger:

Code:

+0x000 Pcb : _KPROCESS
+0x06c ProcessLock : _EX_PUSH_LOCK
+0x070 CreateTime : _LARGE_INTEGER
+0x078 ExitTime : _LARGE_INTEGER
+0x080 RundownProtect : _EX_RUNDOWN_REF
+0x084 UniqueProcessId : Ptr32 Void
+0x088 ActiveProcessLinks : _LIST_ENTRY
+0x090 QuotaUsage : [3] Uint4B
+0x09c QuotaPeak : [3] Uint4B
+0x0a8 CommitCharge : Uint4B
+0x0ac PeakVirtualSize : Uint4B
+0x0b0 VirtualSize : Uint4B
+0x0b4 SessionProcessLinks : _LIST_ENTRY
+0x0bc DebugPort : Ptr32 Void
+0x0c0 ExceptionPort : Ptr32 Void
+0x0c4 ObjectTable : Ptr32 _HANDLE_TABLE
+0x0c8 Token : _EX_FAST_REF
+0x0cc WorkingSetLock : _FAST_MUTEX
+0x0ec WorkingSetPage : Uint4B
+0x0f0 AddressCreationLock : _FAST_MUTEX
+0x110 HyperSpaceLock : Uint4B
+0x114 ForkInProgress : Ptr32 _ETHREAD
+0x118 HardwareTrigger : Uint4B

It can be seen that the first component of EPROCESS is Pcb, whose type is _KPROCESS, that is, KPROCESS. This is a data structure of size 0x6c, and the book gives its internal layout as well.
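The first few offsets of the listing can be reproduced with a toy structure. Only the sizes and the order of the members follow the listing; the type and member names are made up for this sketch, KPROCESS is treated as an opaque 0x6c-byte blob, and 32-bit pointers are modelled as uint32_t so that the layout does not depend on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-in for KPROCESS: only its size (0x6c) matters here. */
struct toy_kprocess {
    unsigned char opaque[0x6c];
};

/* Stand-in for the first few EPROCESS members in the listing. */
struct toy_eprocess {
    struct toy_kprocess pcb;    /* +0x000 */
    uint32_t process_lock;      /* +0x06c */
    int64_t  create_time;       /* +0x070 */
    int64_t  exit_time;         /* +0x078 */
    uint32_t rundown_protect;   /* +0x080 */
    uint32_t unique_process_id; /* +0x084, a 32-bit pointer in the listing */
};

/* Because pcb is the first member, a pointer to the outer structure
   and a pointer to pcb refer to the same address. */
struct toy_kprocess *pcb_of(struct toy_eprocess *process)
{
    assert(process != NULL);
    return &process->pcb;
}
```

Checking offsetof(struct toy_eprocess, create_time) against 0x70 is a quick way to confirm that the toy layout agrees with the debugger output.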
The book "Undocumented Windows 2000 Secrets" also shows the internal layout of this data structure, again obtained with a debugger, but what it lists differs greatly from the above, perhaps because of a version difference. Judging by content, the "Secrets" listing seems the more plausible one: the EPROCESS structure it lists contains a component Vm of size 0x50, which is absent here, yet virtual memory (the address space) is obviously a major resource of a process and ought to have a place in EPROCESS. Seen this way, the "Secrets" book is closer to the reality of desktop and server systems, while the "Internals" listing may be closer to an embedded system without an MMU. The "Secrets" book also gives, in its Appendix C, definitions (code) of EPROCESS, PEB, and other data structures obtained by reverse engineering, which is of course very valuable.
So what does it matter if different versions have different EPROCESS layouts? First, applications in user space cannot access the EPROCESS structure in the kernel directly. The concrete layout of EPROCESS is therefore an internal implementation detail of the kernel: as long as all components inside the kernel match one another and form a self-consistent whole, there is no problem, much as with conditional compilation and configuration trimming in the Linux kernel. Dynamically loaded .sys modules are another matter. If such a module needed to access the data structure directly, things could go wrong, because a .sys module is distributed as a binary image and, unlike a Linux module, cannot be recompiled from source. What can be done about this? The answer can be found in the Windows DDK.
The DDK header files contain a declaration of the function IoGetCurrentProcess():

Code:

NTKERNELAPI PEPROCESS IoGetCurrentProcess( VOID );

This is a support function that the kernel provides for .sys modules, the counterpart of a function exported by the Linux kernel. Its return type is PEPROCESS, a pointer to the EPROCESS data structure. Clearly this resembles current in the Linux kernel: it is called to obtain (a pointer to) the EPROCESS structure of the current process. However, the DDK header files do not define the EPROCESS structure, so the call yields only a pointer, which is in effect no different from a void pointer. This means that a .sys module is not allowed to access the structure's components directly. How, then, does a .sys module use this pointer? Here is an example from the DDK:

Code:

NTKERNELAPI VOID MmProbeAndLockProcessPages (
    IN OUT PMDL MemoryDescriptorList,
    IN PEPROCESS Process,
    IN KPROCESSOR_MODE AccessMode,
    IN LOCK_OPERATION Operation );

This function locks some of a process's memory pages (so that they are not swapped out). One of its input parameters is a pointer to the process's EPROCESS structure. This function, too, is provided by the kernel (as part of what we call the device driver interface). The provider and the consumer of the pointer are thus both the kernel. As long as the two agree with each other there is no problem, since the .sys module merely passes the pointer along.
Suppose proc is a pointer to a process control block and the block contains an integer component x. A dynamically loaded Linux module can access the component directly as "proc->x", whereas a Windows .sys module can reach it only through support functions such as get_x() and set_x(). Separating the content of a data structure from the methods that operate on it is exactly what distinguishes an "object" from a "data structure". Microsoft has to encapsulate the content because it does not want to disclose the data structure.
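The get_x()/set_x() idea corresponds to the usual opaque-pointer idiom in C. The sketch below puts both sides in one file for brevity: the top part is what a module would see in a header, the bottom part is what only the provider sees. struct proc, current_proc(), and the member layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* --- What a module sees: an incomplete type and accessor functions --- */
struct proc;                               /* layout not disclosed */
struct proc *current_proc(void);
int  get_x(const struct proc *p);
void set_x(struct proc *p, int value);

/* --- What only the providing side sees: the actual layout --- */
struct proc {
    int reserved;   /* may change between versions without breaking callers */
    int x;
};

static struct proc the_current_proc;

struct proc *current_proc(void)
{
    return &the_current_proc;
}

int get_x(const struct proc *p)
{
    assert(p != NULL);
    return p->x;
}

void set_x(struct proc *p, int value)
{
    assert(p != NULL);
    p->x = value;
}
```

A caller that only has the declarations can write set_x(current_proc(), 7) but cannot write current_proc()->x, so the layout of struct proc can change without recompiling the caller.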
For development of a compatible kernel, this means we need not insist on an EPROCESS structure identical to that of Windows (although Appendix C of "Secrets" defines one), and some internal operations and processing need not be identical either, as long as they conform to the interface specified by the DDK.

With the process objects introduced, let us return to the main topic.
Creating the process object in the kernel really means creating a data structure centered on EPROCESS. This is done by the system call NtCreateProcess(), and mainly involves:
● Allocate and set up the EPROCESS data structure.
● Set up other related data structures, such as the "open object table".
● Create an initial address space for the target process.
● Initialize the "kernel process block", KPROCESS, of the target process.
● Map the system DLL image into the (user) address space of the target process.
● Map the image of the target process into its own user space.
● Set up the "process environment block", PEB, for the target process.
● Map other data structures that need to be visible in user space, for example those related to "national language support" (NLS).
● Finish setting up EPROCESS and link it into the process list (note that it is the thread queue that gets scheduled, not the process list).
Mapping the system DLL, ntdll.dll, into the user space of the target process is critical. Besides its other main functions, ntdll.dll also plays the role that the ELF interpreter plays in Linux: it is responsible for establishing the dynamic links of the target image.

Note that NtCreateProcess() is different from CreateProcess(). CreateProcess() creates a process and sets it (its initial thread) running, unless it is told to create the process suspended. NtCreateProcess() only creates the EPROCESS structure of the process in the kernel and an address space for it. The result is just an empty shell, and there is no question of running it yet, because it has no thread, and it is threads rather than processes that are scheduled. In addition, NtCreateProcess() has a precondition: the target image must already have been mapped into a section.

Stage 3: Create an initial thread
As mentioned above, the process is just an empty shell, and what actually runs is the threads inside it. The next step is therefore to create the initial thread of the target process, that is, its first thread.
Corresponding to EPROCESS, the data structure of a thread is ETHREAD. Its first component is the KTHREAD data structure, under the field name Tcb. As with EPROCESS, the internal layout of ETHREAD differs between the "Internals" and "Secrets" books. Appendix C of the latter gives a definition of the ETHREAD structure obtained by reverse engineering.
Likewise, some functions declared in the Windows DDK show that a .sys module only passes ETHREAD or KTHREAD pointers around (since KTHREAD is the first component of ETHREAD, the two pointers are in fact the same) and does not access the components directly.

Code:

PKTHREAD NTAPI KeGetCurrentThread();

NTKERNELAPI KPRIORITY KeQueryPriorityThread (IN PKTHREAD Thread);

NTKERNELAPI LONG KeSetBasePriorityThread (IN PKTHREAD Thread, IN LONG Increment);

NTKERNELAPI PDEVICE_OBJECT IoGetDeviceToVerify(IN PETHREAD Thread);

In addition, just as a process has a "process environment block" (PEB), a thread has a "thread environment block" (TEB). The KTHREAD structure contains a pointer to the TEB in user space. As mentioned above, the PEB is at a fixed position in user space. The TEBs lie below the PEB: a process has as many TEBs as it has threads, and each TEB occupies one 4 KB page.
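The address arithmetic implied here is simple enough to write down. The PEB address and the 2 GB boundary are the values given in this article; that TEB pages are packed contiguously directly below the PEB, in thread-creation order, is an assumption made only for this sketch, and toy_teb_address is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define PEB_ADDRESS    UINT32_C(0x7ffdf000)  /* fixed PEB address from the text */
#define USER_SPACE_TOP UINT32_C(0x80000000)  /* 2 GB user/system boundary */
#define PAGE_SIZE_4K   UINT32_C(0x1000)      /* one TEB per 4 KB page */

/* Address of the TEB of the n-th thread (0-based), ASSUMING that TEB
   pages are handed out contiguously downward from just below the PEB. */
uint32_t toy_teb_address(uint32_t thread_index)
{
    assert(thread_index < 0x1000);
    return PEB_ADDRESS - (thread_index + 1) * PAGE_SIZE_4K;
}
```

Under that assumption the first thread's TEB would sit at 0x7ffde000, and the PEB itself lies 0x21000 bytes (132 KB) below the 2 GB boundary.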

The operations of this stage are carried out by calling NtCreateThread(), and include:
● Create and set up the ETHREAD data structure of the target thread, and adjust its relationship with EPROCESS (for example, the thread count in the process block).
● Create and set up the TEB of the target thread in the user space of the target process.
● Set the user-space start address of the target thread to BaseProcessStart() or BaseThreadStart() in kernel32.dll. The former is used for the first thread of a process, the latter for subsequent threads. When NtCreateThread() is called, the address of a user-level start function is supplied as well; BaseProcessStart() or BaseThreadStart() calls this start function once initialization is complete. The ETHREAD data structure has two components for storing these two addresses.
● Set up the KTHREAD data structure of the target thread and allocate a stack for it. In particular, the resume point (return address) in its context is set to the kernel routine KiThreadStartup, so that the thread begins executing there once it is scheduled to run.
● Call any "notification" functions that have been registered with the system to be invoked whenever a thread is created.

Stage 4: Notify the Windows Subsystem
Windows, or more precisely Windows NT, was designed from the outset to support applications from three different systems. The first is Windows' own application software, the so-called "native" Windows software, which was Microsoft's real reason for developing Windows NT. The second is OS/2 application software, because Microsoft was in partnership with IBM at the time. The third is Unix-like software conforming to the POSIX standard, because the US military required it. In practice, Microsoft's support for the latter two was half-hearted from the very beginning, and became even more perfunctory once Microsoft was strong enough to stand on its own. Nevertheless, the original design did provide for different "platforms": the same kernel is surrounded by different peripheral software to form different run-time environments for applications. Microsoft calls these "subsystems". Hence the "Windows subsystem", the "OS/2 subsystem", and the "POSIX subsystem" on top of the Windows kernel. Today, of course, only the Windows subsystem is left.
So what does a subsystem consist of? The "Internals" book spells out the composition of the Windows subsystem, which consists of the following elements.
1. The subsystem service process, csrss.exe. It supports the following components and functions:
● Console (text) window operations. Console/terminal-oriented applications do not handle window operations themselves (moving, enlarging, shrinking, or covering a window), yet they have to run inside a window, so extra support is needed.
● Process and thread management. For example, popping up a dialog saying that a process is not responding and asking the user whether to end it. The csrss.exe process must also be notified whenever a Windows process or thread is created or exits.
● Running DOS software and 16-bit Windows software on (32-bit) Windows.
● Others, including support for local languages (input methods).
This process is called CSRSS, which stands for "Client/Server Run-Time Subsystem". CSRSS is the service process of the Windows subsystem. In fact all three subsystems have a client/server structure, yet the service process of the OS/2 subsystem is called os2ss and that of the POSIX subsystem is called psxss. According to "Internals", the reason is that the service processes of the three subsystems were originally combined into one, namely CSRSS. The other two were later moved out into processes of their own, but what remained kept the name CSRSS.
2. The graphics device driver in the kernel, that is, the win32k.sys module. Its functions include:
● Window management: controlling the display of windows and other screen output (such as the cursor), and receiving input from the keyboard, mouse, and other devices and dispatching it to the appropriate application.
● Providing a graphics function library for application software.
3. System DLLs, such as kernel32.dll, advapi32.dll, user32.dll, and gdi32.dll.

The second element, win32k.sys, was originally part of csrss.exe as well, so this functionality too was provided by the service process in user space. An application process sent graphics requests to CSRSS through inter-process communication, and CSRSS carried out the graphics operations. It later turned out that the frequent inter-process communication and scheduling had become a bottleneck, so this functionality was taken out and moved into the kernel as win32k.sys. As a result, there is now very little that a 32-bit Windows application needs CSRSS to do, or needs to do through CSRSS. CSRSS must still be notified when a Windows process is created, however, because it manages all Windows processes. In addition, on receiving the notification, CSRSS displays the hourglass cursor on the screen if the new process is a windowed one.
Note that the notification to CSRSS is sent by the parent process, that is, the process that called CreateProcess(), not by the newly created process, which has not started running yet.

At this point the work of CreateProcess() is done, and control returns from CreateProcess() out of kernel32.dll to the application or to a higher-level DLL. These four stages all run in user space in the context of the parent process. Along the way many system calls are made, each returning to user space when it completes: NtCreateProcess() is called in the second stage, NtCreateThread() in the third, and so on. Creating a process thus involves many system calls (some concern details and were not mentioned above).
The creation of a Linux process is not accomplished by a single system call either: a typical sequence includes fork(), execve(), and other system calls. Windows simply needs more of them. This has to do with the design of the Windows system call interface as a whole. Take the allocation of user-space memory as an example. The Linux system call brk() has a single parameter, the new end address of the data segment, whereas the Windows system call NtAllocateVirtualMemory() has six parameters, the first of which, ProcessHandle, is a handle to an open process object. What does this mean? It means that a Linux process can allocate space only for itself, while a Windows process can allocate space for other processes. In other words, a Linux process must be "self-reliant" in allocating memory, while a Windows process can have it done on its behalf.
The effect of this on system design may be far greater than the reader imagines. Consider allocating the user-space stack for the first thread of a child process. Since a Linux process (thread) can allocate space only for itself, and its user-space stack must exist before it first runs in user space, the stack has to be allocated in the kernel. A Windows process, by contrast, can allocate space for other processes, so the parent process can do this for the child from user space. Thus some things that Linux can do only in the kernel, Windows can do in user space. This may be one reason why some people call Windows a "microkernel". It also makes it easy to see why CreateProcess() in Windows involves more system calls.
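The difference between the two interface styles can be shown with a toy model in which a "process" is just a counter of reserved bytes. None of these functions is a real system call: toy_grow_self mimics an interface whose target is implicitly the caller, toy_allocate_in mimics one that takes the target explicitly (the role played by the ProcessHandle parameter), and toy_prepare_child_stack is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Toy process: only tracks how many bytes of address space are reserved. */
struct toy_process {
    size_t reserved_bytes;
};

/* The process that is "currently running" in this toy model. */
static struct toy_process *toy_current;

/* brk-style interface: no target parameter, always acts on the caller. */
void toy_grow_self(size_t bytes)
{
    assert(toy_current != NULL);
    toy_current->reserved_bytes += bytes;
}

/* Handle-style interface: the target process is an explicit parameter. */
void toy_allocate_in(struct toy_process *target, size_t bytes)
{
    assert(target != NULL);
    target->reserved_bytes += bytes;
}

/* With the handle-style interface the parent can set up the stack of a
   child that has never run. Returns the child's reserved byte count. */
size_t toy_prepare_child_stack(struct toy_process *parent,
                               struct toy_process *child,
                               size_t stack_bytes)
{
    toy_current = parent;              /* the parent is the one running */
    toy_allocate_in(child, stack_bytes);
    return child->reserved_bytes;
}
```

With only toy_grow_self available, the child's stack could not be reserved until the child itself was running, which is the situation the text describes for Linux.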
Now, although the parent process has returned from the library function CreateProcess(), the child process has not yet started running; it still has to go through the fifth and sixth stages below.

Stage 5: Start the initial thread
The newly created thread may not be schedulable right away, because the creator may have set the CREATE_SUSPENDED flag. In that case the thread must wait until another process restores its eligibility to run through a system call before it can be scheduled. Otherwise it is eligible for scheduling immediately; when it actually runs depends on its priority and other conditions. Once scheduled, it runs as the new process and no longer has anything to do with the caller of CreateProcess().
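The effect of the flag can be modelled with a suspend counter. This is a toy model, not the kernel's data structure: struct toy_suspendable and its functions are hypothetical, and TOY_CREATE_SUSPENDED merely mirrors the numeric value of the Win32 CREATE_SUSPENDED constant (0x00000004).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CREATE_SUSPENDED 0x4u   /* same value as Win32 CREATE_SUSPENDED */

/* Toy thread: eligible for scheduling only while suspend_count is zero. */
struct toy_suspendable {
    unsigned suspend_count;
};

void toy_thread_init(struct toy_suspendable *t, unsigned creation_flags)
{
    assert(t != NULL);
    t->suspend_count = (creation_flags & TOY_CREATE_SUSPENDED) ? 1u : 0u;
}

bool toy_thread_schedulable(const struct toy_suspendable *t)
{
    assert(t != NULL);
    return t->suspend_count == 0;
}

/* Called on behalf of some other, already running thread. */
void toy_thread_resume(struct toy_suspendable *t)
{
    assert(t != NULL);
    if (t->suspend_count > 0)
        t->suspend_count--;
}
```

A thread initialized with TOY_CREATE_SUSPENDED stays unschedulable until toy_thread_resume() is called once; a thread initialized with flags of 0 is schedulable at once.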
As mentioned above, when the first thread of a process is scheduled to run for the first time, it begins in KiThreadStartup because of the way its (system-space) stack was set up. This routine lowers the IRQL of the target thread from DPC level to APC level and then calls the kernel function PspUserThreadStartup().
Finally, PspUserThreadStartup() queues the function LdrInitializeThunk(), which resides in ntdll.dll in user space, to the APC queue as an APC function, and then attempts to "return" to user space. The Windows APC is similar to the signal mechanism in Linux and amounts to an "interrupt service" in user space. On the way back to user space, therefore, the APC queue is checked and any pending APC function is executed.
The CPU therefore enters user space twice. The first entry is caused by the pending APC request: the APC function LdrInitializeThunk() is executed, and control returns to system space afterwards. The second entry is the actual "return" to user space. Return to where? As mentioned above, to BaseProcessStart() or BaseThreadStart() in kernel32.dll; for the first thread of a process it is BaseProcessStart(). The (thread) entry point supplied by the user program is passed to BaseProcessStart() or BaseThreadStart() as a parameter (a function pointer), and both functions use this pointer to call the user-supplied entry function.
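The order of events on the first run can be traced with a toy model. All names prefixed with toy_ are hypothetical and the functions only record their own names; the model assumes at most one pending APC and does not represent the switch between system space and user space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static char trace[128];

static void note(const char *step)
{
    strcat(trace, step);
    strcat(trace, ";");
}

typedef void (*entry_fn)(void);

/* Toy thread: one optional pending APC, a start routine, a user entry. */
struct toy_thread {
    void (*pending_apc)(void);        /* role of the queued LdrInitializeThunk */
    void (*start_routine)(entry_fn);  /* role of BaseProcessStart */
    entry_fn user_entry;              /* entry point supplied by the program */
};

static void toy_ldr_initialize_thunk(void) { note("LdrInitializeThunk"); }
static void toy_user_entry(void)           { note("user_entry"); }

static void toy_base_process_start(entry_fn entry)
{
    note("BaseProcessStart");
    entry();                          /* call the user-supplied entry */
}

/* "Returning" to user space: deliver the pending APC first, then
   continue at the thread's start routine. */
void toy_return_to_user(struct toy_thread *t)
{
    assert(t != NULL);
    if (t->pending_apc != NULL) {
        void (*apc)(void) = t->pending_apc;
        t->pending_apc = NULL;
        apc();                        /* first entry into user space */
    }
    t->start_routine(t->user_entry);  /* second entry into user space */
}

/* Runs the sequence once and returns the recorded trace. */
const char *toy_first_run(void)
{
    struct toy_thread t = {
        toy_ldr_initialize_thunk, toy_base_process_start, toy_user_entry
    };
    trace[0] = '\0';
    toy_return_to_user(&t);
    return trace;
}
```

The trace returned by toy_first_run() lists the APC function first, then the start routine, then the user entry, matching the two entries into user space described above.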

Stage 6: User-space initialization and DLL linking
User-space initialization and DLL linking are done by LdrInitializeThunk(), executing as an APC function.
As far as linking application software with dynamic-link libraries is concerned, Linux, Windows, and Wine are consistent: in all three it is done in user space:
● On Linux, linking of .so modules is done by the "interpreter" in user space. The interpreter amounts to a dynamic library that needs no prior linking, because its entry point is fixed. The image of the interpreter is loaded into user space by the kernel.
● On Windows, DLL linking is done by LdrInitializeThunk() in ntdll.dll, in user space. Before that, ntdll.dll has not been linked with the application, but it has already been mapped into user space. The location of LdrInitializeThunk() within the image is determined and recorded in advance, during system initialization, so no linking is needed to enter this function.
● In Wine, dynamic library linking falls into two cases: .so modules in ELF format and DLLs in PE format. Both are linked in user space, the former by the ELF interpreter ld-linux.so.2 as usual, the latter by the helper program wine-kthread. The call path for the latter is:
main() > wine_init() > __wine_process_init() > __wine_kernel_init() >
wine_switch_to_stack() > start_process() > LdrInitializeThunk()
This was covered in the article "Loading and starting Wine binary images". Note that the function that finally performs the DLL linking is also called LdrInitializeThunk(). Evidently the Wine authors know Windows well.

From the description above, process creation in Windows differs considerably from that in Linux, but loading the PE image and linking DLLs is similar to Linux, except that the "interpreter" is integrated into the system DLL and executed as an APC function; there is no great difference between the interpreter and other functions. By comparison, the PE image loading process of Wine (see "Loading and starting Wine binary images") is noticeably complicated and inefficient.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
