In just a few years, the LLVM platform has changed the direction of many programming languages and given rise to a large number of distinctive new ones. It deserves to be called the king of compiler architectures, and it won the 2012 ACM Software System Award. (From the preface)
Copyright notice: this is an original article by West Wind Getaway. If you reproduce it, please credit the source: Westerly World, http://blog.csdn.net/xfxyy_sxfancy
Methods of translating functions
The previous articles described many features of the compiler's architecture: how the syntax tree is organized and how it is scanned in more than one pass. Starting today, we design the core of this compiler: how to design compile-time macros, and then how to use LLVM to generate the module step by step.
Design macros
Our compiler can be described as macro-driven: whenever we scan a syntax node, we check whether it names a valid macro. As an example, let us analyze the sample code from the previous chapter:
void hello(int k, int g) { ...... }
I have hidden the function body for now; let us first look at the syntax tree of the function head:
String function
String void
String hello
Node
    Node
        String set
        String int
        String k
    Node
        String set
        String int
        String g
Node
    ......
Each layer of our syntax tree is effectively a linked list; the next element is reached through the next pointer.
The syntax tree begins with the macro name, "function", which tells us which macro function to use for the translation.
The nodes that follow are, in order: the return type, the function name, the parameter list, and the function body.
The parameter list, for example, has a lot of content inside it, but during this scan it is recognized as a single unit.
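The layered sibling list described above can be sketched roughly as follows. This is only a minimal stand-in, not the compiler's real Node class: SketchNode, NodeArena, linkNext, nth and buildHeader are all names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for the compiler's Node class (all names are assumptions).
struct SketchNode {
    std::string text;    // token text; left empty for a list node
    SketchNode* next;    // next sibling on the same layer
    SketchNode* child;   // first element of a nested list, or nullptr
    explicit SketchNode(const std::string& t) : text(t), next(nullptr), child(nullptr) {}
};

// Owns every node. std::deque keeps existing elements in place when it grows
// at the end, so the raw pointers handed out stay valid.
struct NodeArena {
    std::deque<SketchNode> storage;
    SketchNode* make(const std::string& text) {
        storage.emplace_back(text);
        return &storage.back();
    }
};

// Make b the next sibling of a and return b, so that calls can be chained.
SketchNode* linkNext(SketchNode* a, SketchNode* b) {
    assert(a != nullptr && b != nullptr);
    a->next = b;
    return b;
}

// Walk `steps` siblings forward; returns nullptr when the layer is too short.
SketchNode* nth(SketchNode* head, std::size_t steps) {
    SketchNode* cur = head;
    while (cur != nullptr && steps > 0) {
        cur = cur->next;
        --steps;
    }
    return cur;
}

// Build only the top layer of the function head: five siblings in a row.
SketchNode* buildHeader(NodeArena& arena) {
    SketchNode* head = arena.make("function");
    SketchNode* tail = linkNext(head, arena.make("void"));
    tail = linkNext(tail, arena.make("hello"));
    tail = linkNext(tail, arena.make(""));  // parameter list: a single node on this layer
    linkNext(tail, arena.make(""));         // function body: also a single node
    return head;
}
```

However much is nested inside the parameter list, on this layer it occupies exactly one slot, so nth(head, 3) always reaches it.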
So the form of our macro actually looks like this:
(function return-type function-name (parameter-list) (function-body))
A section enclosed in parentheses represents a list, not a single element.
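To make "macro-driven" a little more concrete, here is a rough sketch of looking up a macro by the name at the head of a form. It is not the compiler's actual dispatch code: SketchContext, SketchForm, SketchMacro and MacroTable are hypothetical stand-ins, and the macros return a string here where the real ones would return an LLVM value.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins: the real compiler passes its own context class and
// the current syntax node.
struct SketchContext { int handled = 0; };
struct SketchForm { std::string head; };  // only the leading macro name matters here

typedef std::string (*SketchMacro)(SketchContext&, const SketchForm&);

static std::string functionMacro(SketchContext& ctx, const SketchForm&) {
    ++ctx.handled;
    return "function";
}

static std::string setMacro(SketchContext& ctx, const SketchForm&) {
    ++ctx.handled;
    return "set";
}

struct MacroTable {
    std::map<std::string, SketchMacro> macros;

    // Returns true and fills `result` when form.head names a registered macro;
    // returns false otherwise and leaves the decision to the caller.
    bool dispatch(SketchContext& ctx, const SketchForm& form, std::string& result) const {
        auto it = macros.find(form.head);
        if (it == macros.end()) return false;
        assert(it->second != nullptr);
        result = it->second(ctx, form);
        return true;
    }
};

MacroTable makeDefaultTable() {
    MacroTable table;
    table.macros["function"] = &functionMacro;
    table.macros["set"] = &setMacro;
    return table;
}
```

A name such as hello is not in the table, so dispatch reports false instead of guessing.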
Writing the macro function
We defined the function form of a macro earlier: it takes our own context class and the node currently being processed, and returns LLVM's Value type (the abstract base class for every statement).
typedef Value* (*CodeGenFunction)(CodeGenContext*, Node*);
So we're going to implement this function:
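static Value* function_macro(CodeGenContext* context, Node* node) {
    // First argument: return type

    // Second argument: function name
    node = node->getNext();

    // Third argument: parameter list
    Node* args_node = node = node->getNext();

    // Fourth argument: code block
    node = node->getNext();

    return F;  // F is the Function built from these pieces further below
}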
It is often not easy to get the type that a string represents, especially for structs and classes. We usually have to look up the symbol table to check whether the string names a type at all and, if so, what kind: a basic type, a struct, a function pointer, a pointer to another structure, and so on.
Obtaining types is a very important step when working with LLVM.
For now we only write against the symbol table's interface, not its implementation; the next chapter introduces the classic stack-based symbol table implementation.
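As a rough idea of what such a lookup interface has to answer, here is a small LLVM-free sketch. It is an assumption for illustration only, not the symbol table of this compiler: TypeKind, SketchTypeTable, declare and find are made-up names, and treating a trailing '*' as "pointer to a known type" is just a convention chosen for the sketch. The real FindType hands back an LLVM type, not an enum.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical classification of what a name can turn out to be.
enum class TypeKind { NotAType, Basic, Struct, FunctionPointer, Pointer };

class SketchTypeTable {
public:
    SketchTypeTable() {
        // Built-in types that are known before any user code is scanned.
        kinds_["void"] = TypeKind::Basic;
        kinds_["int"] = TypeKind::Basic;
        kinds_["double"] = TypeKind::Basic;
    }

    // Register a user-defined name, for example a struct seen in an earlier scan.
    void declare(const std::string& name, TypeKind kind) {
        assert(kind != TypeKind::NotAType);
        kinds_[name] = kind;
    }

    // Classify a name. A trailing '*' is treated as a pointer to a known type
    // (a convention assumed by this sketch only).
    TypeKind find(const std::string& name) const {
        if (!name.empty() && name.back() == '*') {
            TypeKind base = find(name.substr(0, name.size() - 1));
            return base == TypeKind::NotAType ? TypeKind::NotAType : TypeKind::Pointer;
        }
        auto it = kinds_.find(name);
        return it == kinds_.end() ? TypeKind::NotAType : it->second;
    }

private:
    std::map<std::string, TypeKind> kinds_;
};
```

The point of the sketch is only that "is this string a type, and which kind?" is a question for the table, so the macro code never has to hard-code type names.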
The first argument is resolved through that interface. The second argument is the function name, which we save in a temporary variable:
static Value* function_type_macro(CodeGenContext* context, Node* node) {
    // First argument: return type
    Type* ret_type = context->FindType(node);

    // Second argument: function name
    node = node->getNext();
    std::string function_name = node->getStr();

    // Third argument: parameter list
    Node* args_node = node = node->getNext();

    // Fourth argument: code block
    node = node->getNext();

    return F;  // F is the Function built from these pieces further below
}
The parameter list that comes next is probably the messiest part to implement, because its nesting is more complex. The idea is simple, though: keep scanning the nodes. So we can write the following code:
// Third argument: parameter list
Node* args_node = node = node->getNext();
std::vector<Type*> type_vec;        // list of parameter types
std::vector<std::string> arg_name;  // list of parameter names
if (args_node->getChild() != NULL) {
    for (Node* pC = args_node->getChild(); pC != NULL; pC = pC->getNext()) {
        // Each entry looks like (set int k): skip "set" to reach the type
        Node* pSec = pC->getChild()->getNext();
        Type* t = context->FindType(pSec);
        type_vec.push_back(t);
        arg_name.push_back(pSec->getNext()->getStr());
    }
}
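The same walk can be tried out without LLVM. The sketch below mirrors the loop above on a hypothetical stand-in tree: PNode, PArena, collectParams and buildParams are names invented for this illustration, and type names stay plain strings where the real code resolves them to a Type*. Unlike the loop above, the sketch skips malformed entries; that check is an addition, not part of the original code.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the compiler's Node class.
struct PNode {
    std::string text;
    PNode* next;
    PNode* child;
    explicit PNode(const std::string& t) : text(t), next(nullptr), child(nullptr) {}
};

struct PArena {
    std::deque<PNode> storage;  // growing at the end keeps element addresses stable

    PNode* str(const std::string& t) {
        storage.emplace_back(t);
        return &storage.back();
    }

    // Build a list node whose children are the given strings, linked in order.
    PNode* list(const std::vector<std::string>& items) {
        PNode* parent = str("");
        PNode* tail = nullptr;
        for (const std::string& item : items) {
            PNode* n = str(item);
            if (tail == nullptr) parent->child = n; else tail->next = n;
            tail = n;
        }
        return parent;
    }
};

// Collect (type name, parameter name) pairs from entries shaped like (set int k).
std::vector<std::pair<std::string, std::string>> collectParams(const PNode* args) {
    assert(args != nullptr);
    std::vector<std::pair<std::string, std::string>> out;
    for (const PNode* pc = args->child; pc != nullptr; pc = pc->next) {
        const PNode* head = pc->child;  // "set"
        if (head == nullptr || head->next == nullptr || head->next->next == nullptr)
            continue;                   // malformed entry: skipped (sketch-only check)
        const PNode* typeNode = head->next;  // "int"
        out.emplace_back(typeNode->text, typeNode->next->text);
    }
    return out;
}

// Build ((set int k) (set int g)).
PNode* buildParams(PArena& arena) {
    PNode* args = arena.str("");
    PNode* first = arena.list({"set", "int", "k"});
    PNode* second = arena.list({"set", "int", "g"});
    args->child = first;
    first->next = second;
    return args;
}
```

An empty parameter list simply has no child, so the loop body never runs and the result is empty.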
In fact, with the first three arguments we can already construct the function declaration in LLVM, without writing any of the function body code.
Many objects in LLVM share this property: for a function you can declare only the head and fill in the body after it has been parsed. Structs work the same way: declare the symbol first and fill in the type information later. This makes declarations easy to generate and gives us flexibility when implementing multiple passes.
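Why declaring first is useful can be shown with a toy model that does not touch LLVM at all. Everything in it is an assumption for illustration: SketchFunction, declareAll and resolveBodies are invented names, and "generating a body" is reduced to checking that each callee has already been declared.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a source function: its name, its parameter count,
// and the names of the functions its body calls.
struct SketchFunction {
    std::string name;
    std::size_t arity;
    std::vector<std::string> calls;
};

// Scan 1: record every function head, regardless of source order.
std::map<std::string, std::size_t> declareAll(const std::vector<SketchFunction>& fns) {
    std::map<std::string, std::size_t> declared;
    for (const SketchFunction& f : fns) {
        assert(!f.name.empty());  // a function head always carries a name
        declared[f.name] = f.arity;
    }
    return declared;
}

// Scan 2: visit each body and return the callees that have no declaration.
std::vector<std::string> resolveBodies(const std::vector<SketchFunction>& fns,
                                       const std::map<std::string, std::size_t>& declared) {
    std::vector<std::string> unresolved;
    for (const SketchFunction& f : fns) {
        for (const std::string& callee : f.calls) {
            if (declared.find(callee) == declared.end()) unresolved.push_back(callee);
        }
    }
    return unresolved;
}
```

With the heads collected first, a body may call a function that appears later in the source, and only a name that was never declared ends up in the unresolved list.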
Let's declare this function below:
// First assemble the function type, then create the function in the module
FunctionType* FT = FunctionType::get(ret_type, type_vec, /*not vararg*/ false);
Module* M = context->getModule();
Function* F = Function::Create(FT, Function::ExternalLinkage, function_name, M);
Here we use FunctionType, which is one of the classes derived from Type; calling getPointerTo on a function type yields the corresponding function pointer type.
In addition, passing Function::ExternalLinkage when the function is created is roughly equivalent to the extern keyword in C: it makes the function export its symbol. That way, the functions you write can be linked against from outside, or can serve as declarations of external functions.
Special problems with functions
Next we create the code block for the function. This part, however, is not implemented in the same function as the code above; more precisely, the two do not even run in the same scan.
If a code block inside one function is to be able to call a function declared anywhere in the program, we must first process the three arguments discussed above for every function, so that all the function declarations exist. Then, in the subsequent main scan, comes the following code-block generation part:
// Fourth argument: code block
node = node->getNext();
BasicBlock* bb = context->createBlock(F);  // create a new block

// Special handling of the parameter list. This spot is a real pitfall: you must
// manually allocate space for every function parameter with an AllocaInst and
// then save the argument into it with a StoreInst, otherwise a later load fails.
// context->MacroMake(args_node->getChild());
if (args_node->getChild() != NULL) {
    context->MacroMake(args_node);
    size_t i = 0;
    for (auto arg = F->arg_begin(); i != arg_name.size(); ++arg, ++i) {
        arg->setName(arg_name[i]);
        Value* argumentValue = &*arg;
        ValueSymbolTable* st = bb->getValueSymbolTable();
        Value* v = st->lookup(arg_name[i]);
        new StoreInst(argumentValue, v, false, bb);
    }
}
context->MacroMake(node);

// End of block processing
bb = context->getNowBlock();
if (bb->getTerminator() == NULL)
    ReturnInst::Create(*(context->getContext()), bb);
return F;
There are quite a few issues in this part, so I will leave it as a cliffhanger for now. The special handling in this section comes up again in the upcoming tutorials on code blocks and on storing and loading variables.
The King of Compiler Architectures, LLVM (7): Function Translation Methods