This is the second post in this blog series. If you are just joining, you may want to start from the beginning.
This post explains the flow of data through the Visual C++ compiler: it starts with a C++ source program and ends with the corresponding binary program. This post is an easy one; we are just getting started.
First, let's look at how to compile a single-file program, App.cpp, from the command line.
(If you launch the build from Visual Studio, some higher-level software is involved too, but in the end it issues much the same commands.)
Suppose we simply type:

CL /O2 App.cpp
CL stands for 'Compile and Link'. The /O2 switch tells the compiler to optimize for speed: to generate machine code that runs as fast as possible. This command launches a process running the CL.EXE program, a driver that invokes other software which, chained together, processes the source code in App.cpp and eventually produces a binary file, App.exe. When executed, that binary carries out the operations specified in our source code.
Let's walk through the chart above to see what happens.
CL.EXE parses our command line and checks that it makes sense. It then invokes C1XX.DLL, the C++ 'front end'. ('XX' stands for '++', since '+' cannot be used in file names.) The front end is the part of the chain that understands the C++ language. It scans and parses App.cpp, converts it into an equivalent tree, and passes that tree on to the next component via five temporary files. These files hold 'CIL', which stands for 'C Intermediate Language'. Don't confuse this with the intermediate code produced by managed languages such as C#: that is sometimes called MSIL, but, unfortunately, the ECMA-335 standard also names it 'CIL'.
Next, CL.EXE invokes the so-called 'back end', which lives in C2.DLL. The back end is known as 'UTC', short for 'Universal Tuple Compiler', although that name does not appear in any of the binaries included with Visual Studio. The back end first converts the information from the front end into 'tuples': a binary stream of instructions. Displayed in human-readable form, tuples look like a kind of high-level assembly language.
Because we asked the compiler to optimize for speed with the /O2 switch, the optimizer part of the back end analyzes the tuples and transforms them into another form that runs faster but is semantically equivalent: it produces the same results as the original tuples would. After this step, the tuples are passed to the CodeGen part of the back end, which generates the binary machine code.
In the chart above, black arrows show the flow of data: text or binary files. Red arrows show the flow of control.
Later in this series, when we cover whole-program optimization (the /GL compiler switch and the /LTCG linker switch), we will come back to this chart. (We will see the same blocks, but connected in different ways.)
Summary:
1. The front end is the part that understands C++ source code. The other components, such as the back end and the linker, are mostly independent of the original source language: they operate on the tuples described above, a kind of high-level binary assembly language. The original program could be written in any imperative language, such as FORTRAN or Pascal; the back end doesn't really care.
2. The optimizer part of the back end transforms tuples into faster, more efficient forms. These transformations are called optimizations. (Really we should call them 'improvements', since there may be other transformations that would produce code that runs even faster; we simply do our best to approach the ideal. But the term 'optimization' was coined decades ago, and we are stuck with it.) There are many such optimizations, with names like 'constant folding', 'common subexpression elimination', 'hoisting of loop-invariant expressions', 'dead code elimination', 'function inlining', 'auto-vectorization', and so on. For the most part, these optimizations are independent of the processor the program will eventually run on: they are machine-independent optimizations.
3. The CodeGen part of the back end decides how to lay out the runtime stack (used to implement 'activation frames'); how to make best use of the available machine registers; how to honor the details of the function calling convention in force; and how to use its detailed knowledge of the target machine to transform the code so that it runs faster.
(For example, if you ever look at assembly code, perhaps while debugging with Visual Studio's disassembly window (Alt+8), you may notice that to set EAX to zero the compiler emits 'xor eax, eax' rather than the more obvious 'mov eax, 0'. Why? Because the XOR instruction is only two bytes long, rather than five, and it executes at least as fast. We call this a 'micro-optimization'. You may wonder whether it is worth the trouble. Remember the proverb?)
In contrast to the optimizer, CodeGen must know in detail the processor architecture the code will run on. In some cases, based on its knowledge of the target processor, it may even change the order in which machine instructions are laid out; this is called 'scheduling'. To be precise: CodeGen always knows whether it is targeting x86, x64, or ARM-32, but it rarely knows the specific micro-architecture, such as Nehalem or Sandy Bridge, that the code will eventually run on. (The /favor:ATOM switch is one exception.)
This series will focus mostly on the optimizer part of the compiler, and will say little about the components that make up the front end, CodeGen, or the linker.
This post has thrown out a lot of terms that I don't expect you to fully understand yet: after all, this is just an overview. I hope it has scattered enough interesting ideas to make sure you come back next time, when I will start explaining what all these terms mean.
Next time, we will look at one of the simplest optimizations and how it works: constant folding.
Source: http://blog.jobbole.com/47148/