Code generation techniques in Impala


Cloudera Impala is an open-source MPP (massively parallel processing) database built for the Hadoop ecosystem, designed primarily for analytic query workloads rather than OLTP. Impala uses a number of modern techniques to make the most of current hardware and to execute queries efficiently. Run-time code generation with LLVM is one of the techniques used to improve execution performance.

Introduction to LLVM

LLVM is a library of compiler components and related tools (a toolchain). Unlike traditional stand-alone compilers, LLVM is modular and reusable, which allows an application such as Impala to perform JIT (just-in-time) compilation inside a running process. Although LLVM offers some notable capabilities and well-known tools, such as the Clang compiler that competes with GCC, it is its internal architecture that really distinguishes LLVM from other compilers.

Classic static compilers (like most C compilers) follow a three-phase design consisting of a front end, an optimizer, and a back end. The front end parses the source code and builds an abstract syntax tree (AST). The optimizer applies a series of transformations to improve code performance. The back end (also called the code generator) translates the optimized code into the instruction set of the target platform. This model also applies to interpreters and JIT compilers; the JVM is an implementation of this model as well, using bytecode as the interface between the front end and the optimizer.



This classic design matters for multi-language support, on both the source and the target side. As long as the optimizer uses a common internal code representation, a front end can be written for any source language and a back end for any target platform. When the compiler must be ported to support a new language, only a new front end needs to be implemented; the optimizer and the back end can be reused. Without this separation, supporting M source languages on N target platforms would require implementing M×N complete compilers.


Although compiler textbooks extol the advantages of the three-phase design, it had rarely been fully realized in practice. Compiler implementations such as those for Perl, Python, Ruby, and Java share no code with one another. In addition, there are a variety of special-purpose compilers, such as JIT compilers for CPU-intensive sub-domains like image processing or regular expressions. GCC's components are hard to reuse because of its tangled code structure; for example, its front end and back end share global variables, so GCC cannot be embedded into an application. LLVM is a genuine realization of the three-phase design.


LLVM in Impala

Impala uses LLVM at run time to generate fully optimized, query-specific versions of functions, which perform better than general-purpose precompiled functions. This matters most for functions that execute many times in the inner loop of a single query. For example, a function that parses a file record and loads it into an Impala in-memory tuple is called for every record of every file scanned. For such a function, even removing just a few instructions yields a large speedup.

Without run-time code generation, a function must always contain inefficient code to handle run-time data that is unknown at compile time. For example, a record-parsing function that handles only integers is much faster on integer-only data than a generic function that handles many data types. But the schema of the files to be scanned is not known at compile time, so the general-purpose function, however inefficient, is necessary.

Figure 1 shows a code example. The number and types of record fields are unknown at compile time, so the processing function is written as generically as possible to avoid unhandled cases. The JIT approach is exactly the opposite: the function is compiled at run time into the form most efficient for the data actually at hand. This may not even look like a function in the usual sense, because it is completely non-generic, with its logic hard-coded as constants, but that is the JIT strategy. A dynamically generated MaterializeTuple function will therefore have completely different compiled versions for different run-time information, such as different queries.
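As a concrete illustration of this contrast, the sketch below shows a generic tuple-materialization function next to what a JIT-specialized version for one known schema effectively reduces to. The type names, offsets, and layout here are invented for illustration and are not Impala's actual code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical column types; names are illustrative, not Impala's API.
enum class ColType { INT32, INT64 };

// Generic version: a precompiled function must loop over the columns and
// branch on run-time type information, because the schema is unknown at
// compile time.
void MaterializeTupleGeneric(const char* record,
                             const std::vector<ColType>& types,
                             const std::vector<int>& offsets,
                             char* tuple) {
  for (size_t i = 0; i < types.size(); ++i) {
    switch (types[i]) {  // branch on run-time type info, every column
      case ColType::INT32:
        std::memcpy(tuple + offsets[i], record + offsets[i], 4);
        break;
      case ColType::INT64:
        std::memcpy(tuple + offsets[i], record + offsets[i], 8);
        break;
    }
  }
}

// What a JIT-specialized version for one known schema (an INT32 at offset 0,
// an INT64 at offset 4) effectively looks like: the loop is unrolled, the
// branches are gone, and the offsets and types are baked in as constants.
void MaterializeTupleSpecialized(const char* record, char* tuple) {
  std::memcpy(tuple + 0, record + 0, 4);  // column 0: INT32
  std::memcpy(tuple + 4, record + 4, 8);  // column 1: INT64
}
```

Both functions produce identical tuples; the specialized one is simply straight-line code that the compiler can optimize much more aggressively.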


Common optimization techniques in code generation:

   Removing conditional branches: because run-time information is known, if/switch statements can be optimized away. This is the simplest and most effective technique, because branch instructions in the final machine code hinder instruction pipelining and instruction-level parallelism. By unrolling the for loop (we already know the number of iterations) and resolving the data types, branch instructions can be removed entirely.

   Removing memory loads: loading data from memory is a costly operation that hinders pipelining. If a load returns the same value every time, we can replace it through code generation. For example, the arrays offsets_ and types_ in Figure 1 are created at the start of each query and do not change afterward, so in the code-generated version of the function their values can be inlined directly once the for loop is unrolled.

   Removing virtual function calls: virtual calls have a large performance impact, especially when the called function is small and simple, because they cannot be inlined. When the type of the object instance is known at run time, we can use code generation to replace the virtual call and inline the callee. This is especially valuable for evaluating expression trees. In Impala, an expression consists of a tree of operators and functions, as shown in Figure 2. Each expression appearing in the tree is implemented by overriding a function in the expression base class, which recursively invokes its child expressions. Many expression functions are very simple, such as adding two numbers, so the cost of the virtual call can exceed the cost of the work itself. By using code generation to remove the virtual calls and inline the callees, an expression tree can be evaluated directly without any function calls. Inlining also lets the compiler perform further optimizations, such as common subexpression elimination.
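The third technique can be sketched as follows. The tiny expression-tree hierarchy below is loosely modeled on the description above; the class and method names are illustrative, not Impala's actual interface:

```cpp
#include <cassert>
#include <memory>

// Interpreted evaluation: one virtual call per tree node, per row.
struct Expr {
  virtual ~Expr() = default;
  virtual long Eval() const = 0;
};

struct Literal : Expr {
  long v;
  explicit Literal(long v) : v(v) {}
  long Eval() const override { return v; }
};

struct Add : Expr {
  std::unique_ptr<Expr> lhs, rhs;
  Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
      : lhs(std::move(l)), rhs(std::move(r)) {}
  // Each Add node pays two virtual dispatches just to add two numbers.
  long Eval() const override { return lhs->Eval() + rhs->Eval(); }
};

// What code generation can reduce the fixed tree (col + 10) + 20 to:
// straight-line code with no virtual dispatch, where the compiler is
// also free to fold 10 + 20 into a single constant.
long EvalCodegen(long col) { return col + 10 + 20; }
```

For a tree built as `Add(Add(Literal(5), Literal(10)), Literal(20))`, both paths return 35, but the generated version evaluates with no function calls at all.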



Generating Code with LLVM

When Impala receives a query plan (generated by the Java front end), LLVM is used in the backend to generate and compile query-specific versions of the functions that are critical to performance. LLVM generates code through an IR (intermediate representation): a front end such as the Clang C++ compiler produces IR, and LLVM optimizes the IR and compiles it into machine code. IR is similar to assembly language, consisting of simple instructions that map directly to machine code. Impala uses two techniques to produce IR functions: generating IR instructions programmatically with LLVM's IRBuilder API, and cross-compiling C++ functions into IR with Clang.

An example of IR is shown in the figure. As can be seen, IR is a RISC-like virtual instruction set: it supports addition, subtraction, comparison, branch, and other instructions, and it supports labels. But unlike most RISC instruction sets:

  • LLVM IR is strongly typed, with a simple type system; types such as i32 or i32** appear directly in typed instructions like add i32.

  • LLVM IR supports an unlimited number of temporary registers, whose names begin with %.
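These two properties can be seen in a small hand-written IR function (a sketch for illustration, not actual Impala codegen output), which adds two values and clamps a negative result to zero:

```llvm
; Every value and instruction is typed (i32, i1); %-names are
; unlimited virtual registers.
define i32 @add_clamp(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  %neg = icmp slt i32 %sum, 0
  br i1 %neg, label %clamp, label %done
clamp:
  br label %done
done:
  %r = phi i32 [ 0, %clamp ], [ %sum, %entry ]
  ret i32 %r
}
```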

Because the optimizer should not be constrained by any particular source language or target platform, the IR is designed around the same principle of independence.


In LLVM, the optimizer is organized as a pipeline of optimization passes; common passes include inlining, expression reassociation, loop-invariant code motion, and so on. Each pass is implemented as a C++ class that inherits from the Pass class and is defined in an anonymous namespace, together with a function exposed to the outside world for obtaining the pass.


We can decide which passes to run and in what order. For example, when implementing a JIT compiler for an image-processing language, we can drop the passes that would be useless: if functions are typically large, there is no need to spend time on inlining; if pointers are rare, alias analysis and memory optimizations become dispensable. But LLVM cannot make these choices for us: the PassManager itself knows nothing about the internal logic of each pass, so the selection is determined by our implementation.


References

[1] Runtime Code Generation in Cloudera Impala

[2] The Architecture of Open Source Applications: LLVM, http://www.aosabook.org/en/llvm.html
