In-Depth Understanding of Computer Systems: Key Notes

Introduction

In-Depth Understanding of Computer Systems (the Chinese edition of Computer Systems: A Programmer's Perspective) is a big book for me. To tell the truth, I did not finish the whole thing; I selectively read the chapters I thought were important or interesting, and still benefited a lot. The book presents the essential, first-principles content of computer systems, which is crucial for every programmer who wants to learn programming in depth. Only by understanding how the system runs our code can we write high-quality, high-efficiency code that plays to the system's characteristics. I need to study this book a few more times; today I will summarize some of the knowledge I have learned from it.

Key Notes

Writing an efficient program requires the following kinds of activity (a sketch of the pointer-aliasing problem mentioned below follows this list):

1) Select an appropriate set of algorithms and data structures. Good data structures can sometimes make an algorithm substantially faster, so programmers should be familiar with the commonly used data structures and algorithms.

2) Write source code that the compiler can effectively optimize into an efficient executable. For this it is important to understand the capabilities and limitations of compiler optimizations: a seemingly trivial change in how a program is written can cause a big change in how well the compiler can optimize it. Some programming languages are much easier to optimize than others; features of C such as pointer arithmetic and unchecked type casts make it hard for the compiler to optimize.

3) Exploit parallelism. Divide a task with a particularly large amount of processing into multiple parts that can be computed in parallel on some combination of multiple cores and multiple processors.
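As an illustration of the point about pointers, here is a sketch of the classic aliasing problem, in the spirit of the book's twiddle example. The compiler cannot rewrite twiddle1 into the seemingly equivalent twiddle2, because xp and yp might point to the same location:

    /* If xp == yp, twiddle1 quadruples *xp but twiddle2 only triples it,
     * so the compiler must keep both memory reads and both writes. */
    void twiddle1(long *xp, long *yp)
    {
        *xp += *yp;     /* reads *yp, writes *xp              */
        *xp += *yp;     /* reads the possibly updated value   */
    }

    void twiddle2(long *xp, long *yp)
    {
        *xp += 2 * *yp; /* one read of *yp, one write of *xp  */
    }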

Let the compiler unroll loops
When it comes to program optimization, many people mention loop unrolling. Compilers can now perform loop unrolling easily; with the optimization level set high enough, many compilers do it routinely. Invoking GCC with the command-line option -funroll-loops performs loop unrolling.
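For comparison, a minimal sketch of the same transformation done by hand (2x1 unrolling of a summation loop; the function name is invented for the example):

    /* Sum n elements two at a time; a cleanup loop handles an odd n. */
    long sum_unrolled(const long *a, long n)
    {
        long i;
        long acc = 0;

        for (i = 0; i + 1 < n; i += 2)  /* main loop: 2 elements per iteration */
            acc = acc + a[i] + a[i + 1];

        for (; i < n; i++)              /* finish any leftover element */
            acc += a[i];

        return acc;
    }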

Performance improvement techniques:

1) High-level design. Choose appropriate algorithms and data structures for the problem at hand, and be especially alert to avoid algorithms or coding techniques that asymptotically produce poor performance.

2) Basic coding principles. Avoid optimization blockers so that the compiler can generate efficient code. Eliminate excessive function calls: where possible, move computations out of loops, weighing the modularity of the program against efficiency. Eliminate unnecessary memory references: introduce temporary variables to hold intermediate results, and store a result in an array or global variable only once the final value has been computed.

3) Low-level optimization. Try various pointer forms as alternatives to array code. Reduce loop overhead by unrolling loops. Find ways to keep the pipelined functional units busy, for example by splitting an iteration across multiple accumulators (see the sketch after this list).
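A minimal sketch combining two of these ideas: accumulate in local variables instead of memory, and split the sum across two accumulators so that independent additions can overlap in the pipelined functional units (names invented for the example):

    /* Writes *dest once at the end instead of on every iteration. */
    void sum_fast(const long *a, long n, long *dest)
    {
        long i;
        long acc0 = 0, acc1 = 0;        /* two independent accumulators */

        for (i = 0; i + 1 < n; i += 2) {
            acc0 += a[i];               /* these two additions do not    */
            acc1 += a[i + 1];           /* depend on each other          */
        }
        for (; i < n; i++)
            acc0 += a[i];

        *dest = acc0 + acc1;            /* single memory write */
    }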

When performance improvement comes up, you may hear sayings like these:

(1) Do not optimize prematurely; premature optimization is the root of all evil.
(2) An optimization that takes a lot of time to make may yield an improvement that is barely noticeable and not worth the effort.
(3) Memory and CPUs are so cheap now that performance optimization is not that important.
...

In fact, my view is this: we do not necessarily have to dig out previously written programs just to optimize them, spending n times the effort merely to save a few seconds or minutes. However, when we refactor someone else's code, or when we first start designing, we should know these performance improvement techniques, write code according to the basic principles from the beginning, and produce code that others will not need to refactor later for performance. Moreover, some very simple techniques, such as moving loop-invariant expensive computations or large memory operations out of a loop, yield a genuinely noticeable improvement in overall performance.
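The textbook illustration of this kind of code motion is lower-case conversion with strlen in the loop test; a sketch:

    #include <string.h>
    #include <ctype.h>

    /* Quadratic: strlen walks the whole string on every iteration. */
    void lower_slow(char *s)
    {
        for (size_t i = 0; i < strlen(s); i++)
            s[i] = (char)tolower((unsigned char)s[i]);
    }

    /* Linear: the loop-invariant call is hoisted out of the loop. */
    void lower_fast(char *s)
    {
        size_t len = strlen(s);         /* computed once */
        for (size_t i = 0; i < len; i++)
            s[i] = (char)tolower((unsigned char)s[i]);
    }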

How to tune code using a code profiler, i.e., a performance analysis tool.
Program profiling works by inserting instrumentation code into a version of the running program to determine how much time each part of the program needs.
Unix systems provide a profiler called gprof, which produces two kinds of information:

First, it determines how much CPU time was spent in each function of the program.
Second, it counts how many times each function was called, classified by which function performed the call, so the report shows which functions each function called and which functions it was called by.

Profiling with gprof takes three steps. Suppose the source program is prog.c:
1) Compile: gcc -O1 -pg prog.c -o prog (just add the -pg option).
2) Run: ./prog
This generates a gmon.out file for the gprof analyzer to use (the program runs more slowly than usual).
3) Analyze: gprof prog
This analyzes the data in gmon.out and displays it.
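As an illustration, a toy prog.c whose run time is dominated by one function, so the step-3 report has something interesting to show (the functions are invented for the example):

    /* Build: gcc -O1 -pg prog.c -o prog; run ./prog; then: gprof prog */
    #include <stdio.h>

    static double heavy(long n)          /* should dominate the profile */
    {
        double s = 0.0;
        for (long i = 1; i <= n; i++)
            s += 1.0 / (double)i;
        return s;
    }

    static double light(long n)
    {
        double s = 0.0;
        for (long i = 1; i <= n; i++)
            s += (double)i;
        return s;
    }

    int main(void)
    {
        printf("%f %f\n", heavy(200000000), light(1000000));
        return 0;
    }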

The first part of the profiling report lists the time spent executing each function, sorted in descending order.
The second part of the profiling report shows the call history of the functions. For concrete examples, refer to material available online.

Some properties of gprof are worth noting. Its timing is not very precise: it relies on a simple interval-counting mechanism in which the compiled program maintains a counter for each function recording the time spent executing that function, so it is only reasonably accurate for programs that run for a long time. The call information, by contrast, is quite reliable. By default, calls to library functions are not shown; instead, a library function's time is charged to the function that called it.

A very important difference between static and dynamic linking is that with dynamic linking, none of the dynamic link library's code and data sections are actually copied into the executable. Instead, the linker copies only some relocation and symbol table information, which allows references to the library's code and data to be resolved at run time.
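The run-time side of this machinery can also be driven explicitly through the dlopen interface (run-time loading, a close relative of load-time dynamic linking). A minimal sketch, assuming a Linux system where the math library is installed as libm.so.6 (build with gcc prog.c -ldl on older glibc):

    #include <stdio.h>
    #include <dlfcn.h>

    int main(void)
    {
        /* Load the shared library and resolve the symbol at run time. */
        void *handle = dlopen("libm.so.6", RTLD_LAZY);
        if (handle == NULL) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return 1;
        }

        double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
        if (cosine == NULL) {
            fprintf(stderr, "dlsym: %s\n", dlerror());
            return 1;
        }

        printf("cos(0.0) = %f\n", cosine(0.0));
        dlclose(handle);
        return 0;
    }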

Memory mapping
Memory mapping refers to mapping space on disk into a region of virtual memory. Unix processes can use the mmap function to create new virtual memory regions and map objects into these regions; this is a low-level form of allocation.
Ordinary C programs allocate memory dynamically with malloc and free, which is how the heap is used.
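A minimal mmap sketch, assuming a Unix system: map a file read-only into the address space and write it to stdout through an ordinary pointer, with no read calls:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* The file's contents now look like ordinary memory. */
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        fwrite(p, 1, st.st_size, stdout);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }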

The main reason for low heap utilization is fragmentation, which occurs when there is unused memory that cannot be used to satisfy allocation requests.
Fragmentation comes in two forms, internal and external, which differ as follows. Internal fragmentation occurs when an allocated block is larger than its payload. For example, some allocators add an extra word of storage to each block in order to satisfy their constraints; that one-word space is internal fragmentation. The total is the sum, over all allocated blocks, of the difference between the block size and the payload size. External fragmentation occurs when the free memory, added together, would be enough to satisfy an allocation request, but no single free block is large enough to handle the request.

Modern operating systems provide three basic approaches to implementing concurrent programming:

1) Processes. Each logical control flow is a process, scheduled and maintained by the kernel. Because processes have separate virtual address spaces, a flow that wants to communicate with other flows must use interprocess communication (IPC).

2) I/O multiplexing. In this form of concurrency, the application explicitly schedules its own logical flows in the context of a single process. Each logical flow is modeled as a state machine; as data arrives on file descriptors, the main program explicitly transitions each flow from one state to another. Because the program is a single process, all flows share the same address space.

3) Threads. A thread is a logical flow running in the context of a single process, scheduled by the kernel. Threads can be seen as a hybrid of the other two approaches: they are scheduled by the kernel like processes, and they share one virtual address space like I/O-multiplexed flows.

(1) Process-based concurrent servers
The simplest way to build concurrency is with processes, using functions like fork. For example, a concurrent server accepts client connection requests in the parent process and then creates a new child process to serve each new client. To see how this works, suppose we have two clients and a server that is listening for connection requests on a listening descriptor (say, descriptor 3). The server accepts a request from one client, forks a child to serve it on the connected descriptor, and immediately goes back to accepting the other client's request.
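A minimal sketch of that pattern (error handling trimmed for brevity; port 8000 is chosen arbitrarily; echo just bounces bytes back to the client):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <signal.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <sys/wait.h>

    static void echo(int connfd)                 /* serve one client */
    {
        char buf[1024];
        ssize_t n;
        while ((n = read(connfd, buf, sizeof(buf))) > 0)
            write(connfd, buf, n);
    }

    static void reap(int sig)                    /* avoid zombie children */
    {
        (void)sig;
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;
    }

    int main(void)
    {
        signal(SIGCHLD, reap);

        int listenfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8000);
        bind(listenfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(listenfd, 16);

        for (;;) {
            int connfd = accept(listenfd, NULL, NULL);
            if (connfd < 0)
                continue;                        /* e.g. interrupted by SIGCHLD */
            if (fork() == 0) {                   /* child: serve this client */
                close(listenfd);                 /* child does not need it */
                echo(connfd);
                exit(0);                         /* also closes connfd */
            }
            close(connfd);                       /* parent: hand off and loop */
        }
    }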

Regarding the merits and drawbacks of processes: the process model gives a very clear picture of what state is shared between parent and child processes: file tables are shared, but user address spaces are not. Having separate address spaces is both an advantage and a disadvantage. Because the address spaces are independent, one process cannot accidentally overwrite the virtual memory of another. On the other hand, interprocess communication is cumbersome and comparatively costly.

(2) Concurrent programming based on I/O multiplexing
Consider, for example, a server with two I/O events to handle: 1) a network client initiating a connection request, and 2) the user typing a command line at the keyboard. Which event should the server wait for first? Neither choice is ideal. If it waits in accept for a connection, it cannot respond to input commands; if it waits in read for an input command, it cannot respond to any connection requests (all of this within a single process).
One solution to this dilemma is I/O multiplexing. The basic idea is to use the select function, which asks the kernel to suspend the process and return control to the application only after one or more I/O events have occurred.
The pros and cons of I/O multiplexing: because everything runs in the context of a single process, every logical flow has access to the entire address space of the process, and the overhead is much lower than with multiple processes; the disadvantage is the higher complexity of the programming.
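A minimal sketch of the select-based pattern for exactly the two events described above (port 8000 arbitrary; the client-serving code is elided):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    int main(void)
    {
        int listenfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8000);
        bind(listenfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(listenfd, 16);

        for (;;) {
            fd_set read_set;
            FD_ZERO(&read_set);
            FD_SET(STDIN_FILENO, &read_set);     /* keyboard input */
            FD_SET(listenfd, &read_set);         /* connection requests */

            /* Block until at least one of the two descriptors is ready. */
            select(listenfd + 1, &read_set, NULL, NULL, NULL);

            if (FD_ISSET(STDIN_FILENO, &read_set)) {
                char line[256];
                if (fgets(line, sizeof(line), stdin))
                    printf("command: %s", line);
            }
            if (FD_ISSET(listenfd, &read_set)) {
                int connfd = accept(listenfd, NULL, NULL);
                /* ... serve the client, then ... */
                close(connfd);
            }
        }
    }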

(3) Thread-based concurrent programming
Each thread has its own thread context, including a thread ID, stack, stack pointer, program counter, general-purpose registers, and condition codes. All threads running in a process share that process's entire virtual address space, including its code, data, heap, shared libraries, and open files. So, as I see it, there is no separate inter-thread communication mechanism to speak of, only the concept of locking between threads.

The thread execution model. The execution models of threads and processes are similar in some ways. Each process begins its life cycle as a single thread, which we call the main thread. One has to be aware, though, that threads are peers; the main thread differs from the other threads only in that it is the first to execute.
In general, a thread's code and local data are encapsulated in a thread routine (a function). The routine usually takes a single generic pointer argument and returns a generic pointer value.
Threads in Unix are either joinable or detached. A joinable thread can be reaped or killed by other threads; a detached thread cannot be reaped or killed by other threads, and its memory resources are released automatically by the system when it terminates.
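A minimal pthreads sketch of a thread routine, creation, and reaping (build with gcc -pthread):

    #include <stdio.h>
    #include <pthread.h>

    /* A thread routine: one generic pointer in, one generic pointer out. */
    static void *thread_routine(void *vargp)
    {
        (void)vargp;
        printf("Hello from a peer thread\n");
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;

        pthread_create(&tid, NULL, thread_routine, NULL);

        /* The peer is joinable by default, so the main thread reaps it.
         * Calling pthread_detach(tid) instead would make the system
         * reclaim its resources automatically when it terminates. */
        pthread_join(tid, NULL);
        return 0;
    }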

The thread memory model: each thread has its own separate thread context, including a thread ID, stack, stack pointer, program counter, condition codes, and general-purpose registers. Each thread shares the rest of the process state with the other threads, including the entire user virtual address space, which consists of the code segment, data segment, heap, and the code and data regions of all shared libraries. The stacks of different threads are not protected from one another, which means that if a thread somehow obtains a pointer into another thread's stack, it can read or write any part of that stack.

Which variables can be shared between threads, and which cannot?
There are three classes of variables: global variables, local automatic variables (local variables), and local static variables. Local automatic variables are stored on each thread's own stack and are not shared; global variables and local static variables can be shared.
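A short sketch of the three classes (build with gcc -pthread; the increments of the shared variables are deliberately left unsynchronized, so a real program would protect them with a lock):

    #include <stdio.h>
    #include <pthread.h>

    int global_count = 0;               /* global: shared by all threads */

    static void *worker(void *vargp)
    {
        static int static_count = 0;    /* local static: one shared copy */
        int local_count = 0;            /* local automatic: one private  */
                                        /* copy per thread's stack       */
        (void)vargp;
        local_count++;                  /* never visible to other threads */
        static_count++;                 /* shared: a race without a lock  */
        global_count++;                 /* shared: a race without a lock  */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("global_count = %d\n", global_count);
        return 0;
    }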
