CLR via C#, 3rd Edition (translation): 25.2 Thread Overhead


25.2 Thread Overhead

 

Threads are very powerful because they allow Windows to stay responsive even while long-running tasks are executing. Threads also let a user run one application (such as Task Manager) to forcibly kill another application that appears to be frozen. However, like all virtualization mechanisms, threads cost space (memory consumption) and time (run-time execution performance). Let us examine this overhead in more detail. The following resources are allocated for every thread:

Thread kernel object: The OS allocates and initializes one of these data structures for each thread created in the system. The data structure contains a set of properties that describe the thread (discussed later in this chapter). It also contains what is called the thread's context. The context is a block of memory that holds a set of the CPU's registers. When Windows runs on a machine with x86 CPUs, the thread context uses about 700 bytes of memory. For x64 and IA64 CPUs, the context uses about 1,240 and 2,500 bytes of memory, respectively.

Thread environment block (TEB): The TEB is a block of memory allocated and initialized in user mode (address space that application code can access quickly). The TEB consumes one page of memory (4 KB on x86 and x64 CPUs, 8 KB on IA64 CPUs). The TEB contains the head of the thread's exception-handling chain; each try block the thread enters inserts a node at the head of this chain, and the node is removed when the thread exits the try block. In addition, the TEB contains the thread's thread-local storage data as well as some data structures used by GDI and OpenGL graphics.

User-mode stack: The user-mode stack is used to store local variables and arguments passed to methods. It also contains the address indicating what the thread should execute next when the current method returns. By default, Windows allocates 1 MB of memory for each thread's user-mode stack.
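The 1 MB figure is only a default, and most thread APIs let you request a different size (in .NET, the Thread constructor's maxStackSize parameter plays this role). As an illustrative sketch not taken from the book, Python's threading.stack_size serves the same purpose:

```python
import threading

results = []

def worker():
    results.append("ran")

# Request a smaller per-thread stack; the platform default is
# much larger (typically 1 MB reserved on Windows).
threading.stack_size(256 * 1024)   # 256 KB for threads created from now on
t = threading.Thread(target=worker)
t.start()
t.join()
threading.stack_size(0)            # restore the platform default
print(results)                     # ['ran']
```

Shrinking the stack matters mostly when an application creates many threads, since the stack dominates each thread's memory cost.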

Kernel-mode stack: The kernel-mode stack is used when application code passes arguments to a kernel-mode function in the operating system. For security reasons, Windows copies any arguments passed from user-mode code to the kernel from the thread's user-mode stack to the thread's kernel-mode stack. Once copied, the kernel can verify the arguments' values, and because application code cannot access the kernel-mode stack, the application cannot modify those values after they have been validated. The OS kernel code then operates on the copied values. In addition, the kernel calls methods of its own, and it uses the kernel-mode stack to pass its own arguments, store a function's local variables, and store return addresses. The kernel-mode stack is 12 KB when running on 32-bit Windows and 24 KB when running on 64-bit Windows.
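Adding up the figures above gives a rough lower bound on the memory cost of a single thread. The arithmetic below (a sketch using the 32-bit Windows numbers quoted in the text; actual figures vary by OS version) shows that the user-mode stack dominates:

```python
# Rough per-thread memory overhead on 32-bit Windows (x86),
# using the figures from the text.
KB = 1024
context = 700            # thread context inside the kernel object
teb = 4 * KB             # thread environment block (one page)
user_stack = 1024 * KB   # default user-mode stack reservation
kernel_stack = 12 * KB   # kernel-mode stack

total = context + teb + user_stack + kernel_stack
print(total)             # 1065660 bytes: just over 1 MB, dominated by the user-mode stack
```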

DLL thread-attach and thread-detach notifications: Whenever a thread is created in a process, Windows calls the DllMain method of every unmanaged DLL loaded in that process, passing it a DLL_THREAD_ATTACH flag. Similarly, whenever a thread dies, the DllMain method of every DLL in the process is called with a DLL_THREAD_DETACH flag. Some DLLs need these notifications to perform special initialization or (resource) cleanup for each thread created/destroyed in the process. For example, the C-Runtime library DLL allocates some thread-local storage state, which is required when the thread uses functions contained in the C-Runtime library.
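The per-thread state that the C runtime sets up on DLL_THREAD_ATTACH is a form of thread-local storage. As a loose analogue (not the book's code), Python's threading.local shows the idea of state that is private to each thread:

```python
import threading

# threading.local gives each thread its own private copy of its
# attributes -- a rough analogue of the per-thread state a DLL
# (such as the C runtime) sets up when it gets DLL_THREAD_ATTACH.
state = threading.local()

def worker(name, seen):
    state.name = name        # visible only to this thread
    seen.append(state.name)

seen = []
threads = [threading.Thread(target=worker, args=("t%d" % i, seen))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(seen))          # ['t0', 't1', 't2']
```

Each worker sees only its own `state.name`, even though all three threads share the same `state` object.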

 

In the early days of Windows, many processes loaded only 5 or 6 DLLs, but today some processes load several hundred. On many machines, Microsoft Office Outlook loads about 250 DLLs into its process address space. This means that whenever a new thread is created in Outlook, about 250 DLL functions must be called before the thread can start doing what it was created to do. And these 250 functions must be called again whenever a thread in Outlook dies. This seriously affects the performance of creating and destroying threads within a process.

 

So now you see all the space and time overhead required to create a thread, let it sit in the system, and eventually destroy it. But the situation gets even worse; now let's discuss context switching. A computer with a single CPU can do only one thing at a time, so Windows has to share the physical CPU among all the threads (logical CPUs) in the system.

 

At any given moment, Windows assigns one thread to a CPU. That thread is allowed to run for a time-slice (sometimes referred to as a quantum). When the time-slice expires, Windows context-switches to another thread. Each context switch requires Windows to perform the following actions:

1. Save the values in the CPU's registers to the context structure inside the currently running thread's kernel object.

2. Select one thread from the set of existing threads to schedule next. If this thread is owned by another process, Windows must also switch the virtual address space seen by the CPU before it starts executing any code or touching any data.

3. Load the values in the selected thread's context structure into the CPU's registers.

 

When the context switch is complete, the CPU executes the selected thread until its time-slice expires, and then another context switch happens. Windows performs context switches about every 30 ms. Context switches are pure overhead; that is, the cost of a context switch is not traded for any memory or performance gain. Windows performs context switching to provide end users with a robust and responsive operating system.
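The cost of switching can be made visible with a crude micro-benchmark. The sketch below (an illustration in Python, not from the book) bounces control between two threads so that every round trip forces at least two thread switches; dividing the count by the elapsed time gives a rough switches-per-second figure:

```python
import threading
import time

def switches_per_second(iterations=20_000):
    """Bounce control between two threads and time it; each round
    trip forces at least two thread switches."""
    ping, pong = threading.Event(), threading.Event()

    def worker():
        for _ in range(iterations):
            ping.wait()      # sleep until the main thread hands over control
            ping.clear()
            pong.set()       # hand control back

    t = threading.Thread(target=worker)
    t.start()
    start = time.perf_counter()
    for _ in range(iterations):
        ping.set()
        pong.wait()
        pong.clear()
    elapsed = time.perf_counter() - start
    t.join()
    return (2 * iterations) / elapsed

rate = switches_per_second()
print(f"approx. {rate:,.0f} thread switches per second")
```

The absolute number depends on the machine and OS, which is exactly the point made below: the cost of a switch cannot be quoted as a single figure.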

 

Now, if an application's thread enters an infinite loop, Windows periodically preempts it, assigns a different thread to the actual CPU, and lets the new thread run for a while. Assuming the new thread is Task Manager's thread, the end user can use Task Manager to kill the process containing the thread stuck in the infinite loop. That process dies, and all the data it was working on is destroyed too, but all the other processes in the system continue running without corruption of their data. And, of course, the user does not have to reboot the machine. So context switching sacrifices performance in exchange for a much better user experience.

 

In fact, the performance hit is even worse than you might think. Yes, a performance penalty is incurred when Windows context-switches to another thread. But on top of that, the CPU now has to execute a different thread, and the previous thread's code and data are still in the CPU's cache, which had been sparing the CPU frequent accesses to RAM (RAM is much slower than the CPU's cache). When Windows context-switches to a new thread, that new thread is most likely going to execute different code and access different data that are not in the CPU's cache. The CPU must therefore access RAM to populate its cache before it can get back up to a good execution speed. But then, about 30 ms later, another context switch occurs.

 

The time required to perform a context switch depends on the CPU architecture and speed, and the time required to refill the CPU's cache depends on the applications running in the system, the size of the CPU's cache, and various other factors. So I cannot give you an absolute figure, or even an estimate, of the cost of a context switch. What I can tell you is that context switches should be avoided as much as possible if you want to build high-performance applications and components.

 

Important: At the end of a time-slice, if Windows decides to schedule the same thread again (rather than switching to another thread), then Windows does not perform a context switch. Instead, the thread simply continues running. This improves performance significantly, and avoiding context switches is something to strive for when designing your own code.

 

Important: A thread can voluntarily end its time-slice early, and this happens quite frequently because threads often wait for I/O operations (keyboard, mouse, file, network, and so on) to complete. For example, the Notepad process's thread usually sits idle with nothing to do: that thread is waiting for input. If the user presses the J key on the keyboard, Windows wakes the Notepad thread up so it can process the keystroke. The Notepad thread may spend, say, 5 ms processing the keystroke, and then it calls a Win32 function telling Windows that it is ready to process the next input event. If there are no more input events, Windows puts the Notepad thread into a wait state (relinquishing the remainder of its time-slice) so that the thread is not scheduled on any CPU until the next input event occurs. This improves overall system performance, because threads waiting for I/O operations to complete are not scheduled on a CPU and therefore do not waste CPU time; the time saved can be used to schedule other threads.
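This pattern is easy to reproduce in code: a thread that blocks waiting for input is in a wait state and costs no CPU time until an item arrives. A minimal Python sketch (the Notepad scenario is simulated with a queue, not real input handling):

```python
import queue
import threading

def input_loop(q, handled):
    # get() blocks until an item arrives; while blocked, the thread
    # is in a wait state and is not scheduled on any CPU.
    while True:
        item = q.get()
        if item is None:          # sentinel: shut down
            break
        handled.append(item.upper())

q = queue.Queue()
handled = []
t = threading.Thread(target=input_loop, args=(q, handled))
t.start()
q.put("j")                        # simulate a keystroke
q.put(None)
t.join()
print(handled)                    # ['J']
```

The worker spends essentially all of its life blocked inside get(), just as the Notepad thread spends most of its life waiting for the next input event.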

 

In addition, when performing a garbage collection, the CLR must suspend all the threads, walk their stacks to find the roots used to mark objects in the heap, walk their stacks again (updating roots to objects that moved during compaction), and then resume all the threads. So reducing the number of threads also significantly improves the performance of the garbage collector. And every time you use a debugger and hit a breakpoint, Windows suspends all the threads in the application being debugged and resumes them when you single-step or run the application. So the more threads you have, the worse your debugging experience will be.
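The CLR's suspend-and-walk behavior is internal, but most runtimes expose hooks that let you observe collections as discrete pause events. As a loose analogue (Python's collector, not the CLR's), gc.callbacks reports the start and stop of each collection:

```python
import gc

phases = []

def on_gc(phase, info):
    # phase is "start" or "stop"; a real profiler would timestamp
    # these to measure how long each collection pause lasts.
    phases.append(phase)

gc.callbacks.append(on_gc)
gc.collect()                  # force a full collection
gc.callbacks.remove(on_gc)
print(phases)                 # contains 'start' and 'stop' for the collection
```

The more live state a collector has to scan, the longer the interval between each "start" and "stop" grows; fewer threads means fewer stacks to scan.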

 

From all of this discussion, you should conclude that you must use threads as sparingly as possible: they consume a lot of memory, and they take a lot of time to create, destroy, and manage. Time is also wasted when Windows context-switches between threads and when garbage collections occur. However, this discussion should also make it clear that threads are sometimes necessary, because they make Windows more robust and responsive.

 

I should also point out that a computer with multiple CPU cores can actually run several threads simultaneously, which increases the scalability of an application (the ability to do more work in less time). Windows assigns one thread to each CPU core, and each core performs its own context switching to other threads. Windows makes sure that a single thread is never scheduled on more than one core at a time, because that would wreak havoc. Today, many computers contain multiple CPUs, hyperthreaded CPUs, or multi-core CPUs. But when Windows was originally designed, single-CPU computers were the mainstream, so Windows added threads to improve system responsiveness and reliability. Today, threads are also used to improve application scalability, but that can happen only on machines with multiple cores.

 

In the remaining sections of this book, we discuss the various mechanisms that Windows and the CLR offer, so that you can keep your code responsive while creating as few threads as possible, and, when the code runs on a multi-core machine, improve its scalability as well.
