The difference between LLVM and GCC

Source: Internet
Author: User
Tags posix

While developing with Xcode on Mac OS X Mountain Lion recently, I noticed two compilers listed in the build settings: Apple LLVM compiler 4.2 and LLVM GCC 4.2.

For years I have heard that LLVM is better than GCC, but I never had time to study the difference. Starting from this observation, I asked myself a series of questions:

    • What are cc, c89, and c99? How do they differ?
    • What are gcc, g++, cpp, and gpp?
    • Does LLVM differ greatly from GCC?
    • What is the difference between Apple LLVM compiler 4.2 and LLVM GCC 4.2?
    • Is LLVM GCC 4.2 ultimately LLVM or GCC?

To answer them, let's go through the history together.

cc, c89, c99

After Unix was born, many vendors developed their own Unix systems, each with its own compiler. As a result, the command used to compile C code differed from one Unix system to another. The POSIX standard for commands and utilities therefore specified cc as a uniform command interface in front of the different compilers, along with the options that the cc command must accept.

Once the ISO C standard was settled, POSIX specified c89, and later c99, as the interface to an ISO C compiler, while cc remained the interface to non-standard C. In practice, however, most C compilers now implement ISO C, so POSIX treats cc as a historical legacy and has dropped it from the standard.

gcc, g++, cpp, gpp

With the rise of the open source movement, the Free Software Foundation developed its own free, open source C compiler, the GNU C Compiler, or GCC for short. GCC includes a C preprocessor, abbreviated cpp. Later, GCC added support for other languages such as C++, so its name was changed to GNU Compiler Collection. g++ is the command dedicated to C++. The official GNU manual has a section on g++ and gcc that describes the difference between the two: g++ is one front end of the GNU Compiler Collection. The concepts of front end and back end are described in more detail below. The name gpp is unusual, and if you use Linux you may not have this command at all. On some systems, such as DOS, a file name cannot contain special characters like the "+" in g++. So, following the practice of the DJGPP compiler, gpp is simply g++ under another name.

LLVM and GCC

Looking back at the history of GCC: it has been a great success, but its original goal was simply to provide a free, open source compiler, nothing more. Later, as GCC came to support more and more languages, problems with its architecture became apparent. What are those problems? Let's look at the LLVM chapter of The Architecture of Open Source Applications. The strengths of LLVM are precisely the weaknesses of GCC.

Traditional compilers

A traditional compiler has a three-stage design: a front end, an optimizer, and a back end. The front end parses the source code, checks it for syntax errors, and translates it into an abstract syntax tree (AST), which may then be converted into an intermediate code for optimization. The optimizer transforms this intermediate code to make it more efficient. The back end converts the optimized intermediate code into code for the target machine, making the best possible use of the target's special instructions to improve performance.
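The three stages can be sketched in a few lines of Python. This is only a toy under stated assumptions: the front end borrows Python's own `ast` parser, the optimizer performs nothing but constant folding, and the back end emits instructions for a hypothetical stack machine whose opcodes (PUSH, LOAD, ADD, MUL) are invented for illustration.

```python
import ast


def frontend(source):
    """Front end: parse source text into an AST (syntax errors raise here)."""
    return ast.parse(source, mode="eval").body


def optimize(node):
    """Optimizer: fold 'constant op constant' into a single constant."""
    if isinstance(node, ast.BinOp):
        left, right = optimize(node.left), optimize(node.right)
        if isinstance(left, ast.Constant) and isinstance(right, ast.Constant):
            if isinstance(node.op, ast.Add):
                return ast.Constant(left.value + right.value)
            if isinstance(node.op, ast.Mult):
                return ast.Constant(left.value * right.value)
        return ast.BinOp(left, node.op, right)
    return node


def backend(node):
    """Back end: emit code for a hypothetical stack machine."""
    if isinstance(node, ast.Constant):
        return [f"PUSH {node.value}"]
    if isinstance(node, ast.Name):
        return [f"LOAD {node.id}"]
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
        opcode = "ADD" if isinstance(node.op, ast.Add) else "MUL"
        return backend(node.left) + backend(node.right) + [opcode]
    raise ValueError("unsupported construct")


def compile_expression(source):
    """Run the three stages in order."""
    return backend(optimize(frontend(source)))


print(compile_expression("x + 2 * 3"))  # ['LOAD x', 'PUSH 6', 'ADD']
```

Each stage only sees the output of the previous one, which is the property the rest of this section relies on.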

In fact, this model applies not only to statically compiled languages but also to interpreters and JIT compilers. The Java Virtual Machine (JVM) follows it as well, with Java bytecode serving as the interface between the front end and the optimizer.

The benefit of this model is that supporting more source languages only requires adding front ends, and supporting more target machines only requires adding back ends. The optimizer in the middle works on a common intermediate code and is shared by all of them.

The three-stage design has another benefit: a front-end developer only needs to know how to translate source code into intermediate code that the optimizer understands, without knowing how the optimizer works or anything about the target machine. This greatly lowers the barrier to compiler development and lets more developers take part.
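A minimal sketch of why a shared intermediate code pays off: two front ends and two back ends, written independently, give four working compilers. Everything here is hypothetical, including the two toy source languages (a prefix and a postfix notation), the tuple-based intermediate code, and all function names.

```python
# Shared intermediate code: ("num", n) or (op, lhs, rhs) with op in {"+", "*"}.

def frontend_prefix(text):
    """Front end for a toy prefix language, e.g. '(+ 1 (* 2 3))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos):
        if tokens[pos] == "(":
            op = tokens[pos + 1]
            lhs, pos = parse(pos + 2)
            rhs, pos = parse(pos)
            return (op, lhs, rhs), pos + 1  # skip the closing ")"
        return ("num", int(tokens[pos])), pos + 1

    return parse(0)[0]


def frontend_postfix(text):
    """Front end for a toy postfix language, e.g. '1 2 3 * +'."""
    stack = []
    for token in text.split():
        if token in ("+", "*"):
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((token, lhs, rhs))
        else:
            stack.append(("num", int(token)))
    return stack.pop()


def backend_stack(ir):
    """Back end for a hypothetical stack machine."""
    if ir[0] == "num":
        return [f"PUSH {ir[1]}"]
    op, lhs, rhs = ir
    return backend_stack(lhs) + backend_stack(rhs) + [{"+": "ADD", "*": "MUL"}[op]]


def backend_c(ir):
    """Back end that emits a C expression as text."""
    if ir[0] == "num":
        return str(ir[1])
    op, lhs, rhs = ir
    return f"({backend_c(lhs)} {op} {backend_c(rhs)})"


FRONTENDS = {"prefix": frontend_prefix, "postfix": frontend_postfix}
BACKENDS = {"stack": backend_stack, "c": backend_c}


def compile_source(text, language, target):
    """Any front end can be paired with any back end through the shared IR."""
    return BACKENDS[target](FRONTENDS[language](text))


print(compile_source("(+ 1 (* 2 3))", "prefix", "c"))  # (1 + (2 * 3))
```

Adding a third language means writing one more front end, not one more compiler per target. Neither front end knows anything about the back ends.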

Although the three-stage design has many advantages and appears in the textbooks, in practice it has almost never been implemented perfectly. The implementations that come closest are the Java and .NET virtual machines. These virtual machines take bytecode as their input, so in theory any language can be translated into bytecode and then run on the virtual machine. However, this model, with mechanisms such as garbage collection built in, does not suit C well, so translating C into bytecode is very inefficient.
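To make the bytecode idea concrete, here is a minimal sketch of a stack-based bytecode interpreter. The opcodes and the `run_bytecode` function are invented for illustration; they are not the JVM or .NET instruction sets, and a real virtual machine adds verification, JIT compilation, and garbage collection on top.

```python
def run_bytecode(program, variables=None):
    """Execute a list of (opcode, argument) pairs on a tiny stack machine."""
    env = dict(variables or {})
    stack = []
    for opcode, arg in program:
        if opcode == "PUSH":      # push a constant
            stack.append(arg)
        elif opcode == "LOAD":    # push the value of a variable
            stack.append(env[arg])
        elif opcode == "ADD":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif opcode == "MUL":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# Bytecode for x + 2 * 3. Any source language could have produced it.
program = [("LOAD", "x"), ("PUSH", 2), ("PUSH", 3), ("MUL", None), ("ADD", None)]
print(run_bytecode(program, {"x": 4}))  # 10
```

The interpreter never sees the source language, only the bytecode, which is what lets several languages share one virtual machine.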

GCC also implements the three-stage design fairly well, with front ends for many languages. But the fatal flaw of such compilers is that each is a single monolithic executable that offers no interface through which developers of other languages can reuse its code. Even though GCC is open source, reusing its source code is very difficult.

LLVM

LLVM was originally an abbreviation of Low Level Virtual Machine: it was positioned as a virtual machine, but a relatively low-level one. To solve the problem of compiler code reuse, LLVM took a higher-level view and defined LLVM IR, an intermediate representation language. LLVM IR was designed with many scenarios in mind, such as calling LLVM from an IDE for real-time syntax checking, and compiling and optimizing both static and dynamic languages.

In terms of the three-stage architecture, there is no essential difference between LLVM and GCC. The biggest difference between LLVM and other compilers is that it is not just a compiler collection but also a collection of libraries. For example, suppose I want to write an optimizer for a language called XYZ. I implement a PassXYZ algorithm myself to handle what makes XYZ different from other languages, while the PassA and PassB algorithms provided by the LLVM optimizer handle what XYZ has in common with other languages. When I link my XYZ optimizer, I can choose to link in only the LLVM-provided algorithms it needs. LLVM is not just a compiler; it is also an SDK.
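The idea of an optimizer assembled from library passes can be sketched as follows. This is not LLVM's API: the tuple-based intermediate code, the `rewrite` helper, and what each pass does are assumptions chosen for illustration. Only the names PassA, PassB, and PassXYZ come from the text, and here they stand for constant folding, removal of "+ 0", and lowering of a made-up XYZ-only "square" node.

```python
# IR: ("num", n), ("var", name), ("square", x), or (op, lhs, rhs) with op in {"+", "*"}.

def rewrite(ir, rule):
    """Apply `rule` bottom-up to every node of the tuple-based IR."""
    if ir[0] in ("num", "var"):
        return rule(ir)
    return rule((ir[0],) + tuple(rewrite(child, rule) for child in ir[1:]))


def pass_a(ir):
    """Reusable library pass: fold operations on two constants."""
    def rule(node):
        if node[0] in ("+", "*") and node[1][0] == "num" and node[2][0] == "num":
            a, b = node[1][1], node[2][1]
            return ("num", a + b if node[0] == "+" else a * b)
        return node
    return rewrite(ir, rule)


def pass_b(ir):
    """Reusable library pass: simplify x + 0 to x."""
    def rule(node):
        if node[0] == "+" and node[2] == ("num", 0):
            return node[1]
        return node
    return rewrite(ir, rule)


def pass_xyz(ir):
    """Language-specific pass: lower XYZ's 'square' node to a multiplication."""
    def rule(node):
        if node[0] == "square":
            return ("*", node[1], node[1])
        return node
    return rewrite(ir, rule)


def run_pipeline(ir, passes):
    """Run the chosen passes in order; each one maps IR to IR."""
    for optimization_pass in passes:
        ir = optimization_pass(ir)
    return ir


# The XYZ optimizer picks its own pass plus the library passes it needs.
XYZ_OPTIMIZER = [pass_xyz, pass_a, pass_b]

code = ("+", ("var", "x"), ("+", ("square", ("num", 3)), ("num", -9)))
print(run_pipeline(code, XYZ_OPTIMIZER))  # ('var', 'x')
```

Because every pass has the same shape (IR in, IR out), an optimizer for another language could reuse `pass_a` and `pass_b` unchanged and leave out `pass_xyz`.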

Apple LLVM compiler 4.2 and LLVM GCC 4.2

Now we can answer the questions raised at the very beginning of this article. Apple LLVM compiler 4.2 is a true LLVM compiler: its front end is Clang, and it is built on what was then the latest release, LLVM 3.2. The core of the LLVM GCC 4.2 compiler is still LLVM, but its front end is GCC 4.2. As the LLVM download page shows, LLVM releases 1.0 through 2.5 used GCC as the front end; the Clang front end was first provided in release 2.6.
