In just a few years, the LLVM platform has changed the direction of many programming languages and has given rise to a large number of new languages with distinctive features. It deserves to be called the king of compiler architectures, and it won the 2012 ACM Software System Award. (Preface)
Copyright notice: This is an original article by West Wind Getaway. If you repost it, please credit the source: westerly world http://blog.csdn.net/xfxyy_sxfancy
The basic structure of the syntax tree model
Last time we looked at translating with Lex and Yacc. Some readers did not follow the semantic-action part, and it was unclear how the abstract syntax tree gets built. Today we go into more detail on how to build an abstract syntax tree (AST) conveniently.
Linking nodes: the left-child, right-sibling binary tree
An AST is a multiway tree, which is awkward to represent directly. So we use a classic transformation from data structures and convert the multiway tree into a left-child, right-sibling binary tree.
The idea is simple: each level is really a linked list that chains the sibling nodes together. Every node keeps one pointer to its first child and one pointer to its next sibling, and that is all we need.
class Node {
public:
    Node();
    Node(Node* n);
    ~Node();

    // List-building part
    void addChildren(Node* n);
    void addBrother(Node* n);
    static Node* make_list(int num, ...);
    static Node* getList(Node* node);

    Node* getNext() { return next; }
    Node* getChild() { return child; }

protected:
    Node* next;
    Node* child;
};
This gives us the Node class, the same one we saw in the parser script last time. Simple, isn't it?
We also write a make_list function to make building linked lists convenient. How it is written is covered later.
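To make the left-child, right-sibling idea concrete, here is a minimal sketch. MiniNode and childLabels are names made up for this illustration and are not part of the project; the label field is added only so that the traversal has something to show.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the article's Node, with a label added.
struct MiniNode {
    std::string label;
    MiniNode* next = nullptr;   // next sibling on the same level
    MiniNode* child = nullptr;  // first child on the level below

    explicit MiniNode(std::string l) : label(std::move(l)) {}

    // Append n at the end of this node's sibling chain.
    void addBrother(MiniNode* n) {
        assert(n != nullptr);
        MiniNode* p = this;
        while (p->next != nullptr) p = p->next;
        p->next = n;
    }

    // Append n as the last child of this node.
    void addChildren(MiniNode* n) {
        assert(n != nullptr);
        if (child == nullptr) child = n;
        else child->addBrother(n);
    }
};

// Collect the labels of all direct children, in order:
// go down once through child, then sideways through next.
std::vector<std::string> childLabels(const MiniNode& parent) {
    std::vector<std::string> out;
    for (const MiniNode* p = parent.child; p != nullptr; p = p->next)
        out.push_back(p->label);
    return out;
}
```

For a parent with three children a, b, c, only the parent's child pointer refers to a; b and c are reached through a's next chain. That is how a node with any number of children fits into two pointers.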
Type support
So far our syntax tree cannot hold any data, yet the whole point of writing the AST is to store data at each node: strings, characters, integers, floating-point numbers, identifiers, and so on.
Beyond that requirement, and more importantly, the syntax tree should make it easy to generate LLVM statements. A convenient design is to use polymorphism. It is not as good as a union in efficiency or memory use, but it is more convenient.
So we build a set of classes derived from Node, and Node itself gains some functions for determining the type of the current node.
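For contrast, the union-based alternative mentioned above might look roughly like the sketch below. It is not code from the project; UnionNode, ValueTag, and the helper functions are hypothetical names used only to show the trade-off.

```cpp
#include <cassert>
#include <string>

// Hypothetical tagged-union node, shown only for comparison with the
// class hierarchy that the article actually uses.
enum ValueTag { tag_int, tag_float, tag_char };

struct UnionNode {
    ValueTag tag;  // records which union member is currently valid
    union {
        long long i;
        double f;
        char c;
    } data;
};

UnionNode makeInt(long long v) {
    UnionNode n;
    n.tag = tag_int;
    n.data.i = v;
    return n;
}

UnionNode makeFloat(double v) {
    UnionNode n;
    n.tag = tag_float;
    n.data.f = v;
    return n;
}

UnionNode makeChar(char v) {
    UnionNode n;
    n.tag = tag_char;
    n.data.c = v;
    return n;
}

// Every operation has to switch on the tag by hand.
std::string typeName(const UnionNode& n) {
    switch (n.tag) {
        case tag_int:   return "int";
        case tag_float: return "float";
        case tag_char:  return "char";
    }
    assert(false && "unknown tag");
    return "";
}
```

All values share one storage slot and no virtual call is needed, but each new operation needs another hand-written switch. With derived classes, that dispatch is handled by virtual functions instead.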
Node.h
#include <string>
#include <llvm/IR/Type.h>   // Type
#include <llvm/IR/Value.h>  // Value
using namespace llvm;

enum NodeType // Type enumeration
{
    node_t = 0, int_node_t, float_node_t, char_node_t, id_node_t, string_node_t
};

class CodeGenContext;

class Node
{
public:
    Node();
    Node(Node* n); // Directly adds n as a child of this node
    ~Node();

    // List-building part
    void addChildren(Node* n);
    void addBrother(Node* n);
    bool isSingle();
    static Node* make_list(int num, ...);
    static Node* getList(Node* node);

    void print(int k); // Print the current node
    Node* getNext() { return next; }
    Node* getChild() { return child; }
    virtual Value* codeGen(CodeGenContext* context); // Code generation for LLVM

    // Get or set the LLVM type of the current node; an unknown type returns NULL
    virtual Type* getLLVMType();
    virtual void setLLVMType(Type* t);

    // If this node contains a string, return it; otherwise report an error
    std::string& getStr();

    // Type-related
    std::string getTypeName();
    virtual NodeType getType();
    bool isNode();
    bool isIntNode();
    bool isFloatNode();
    bool isIDNode();
    bool isStringNode();
    bool isCharNode();

protected:
    virtual void printSelf(); // Print the node's own name
    void init();

    Type* llvm_type;
    Node* next;
    Node* child;
};
IDNode.h defines our identifier class, which inherits from Node. The other node types follow the same pattern, so I will not list them all; for the complete code, please see the source on GitHub.
#include "Node.h"
#include <string>

using namespace std;

class IDNode : public Node {
public:
    IDNode(const char* _value) {
        this->value = _value;
    }
    IDNode(char _value) {
        this->value = _value;
    }
    std::string& getStr() { return value; }
    virtual Value* codeGen(CodeGenContext* context);
    virtual NodeType getType();
protected:
    virtual void printSelf();
private:
    string value;
};
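Since the other node classes are not listed here, the following is only a guess at the pattern, with all LLVM parts removed so that it compiles on its own. SketchNode, SketchIntNode, and intValueOf are hypothetical names; the real classes are in the GitHub source.

```cpp
#include <cassert>

enum NodeType {
    node_t = 0, int_node_t, float_node_t, char_node_t, id_node_t, string_node_t
};

// Reduced base class: only the type query, no LLVM members.
class SketchNode {
public:
    virtual ~SketchNode() {}
    virtual NodeType getType() const { return node_t; }
    bool isIntNode() const { return getType() == int_node_t; }
    bool isIDNode() const { return getType() == id_node_t; }
};

// Hypothetical integer node, following the same pattern as IDNode:
// store the value and override getType() to report the node kind.
class SketchIntNode : public SketchNode {
public:
    explicit SketchIntNode(long long v) : value(v) {}
    NodeType getType() const override { return int_node_t; }
    long long getValue() const { return value; }
private:
    long long value;
};

// Check the type tag first, then downcast, in the same spirit as
// the getStr() function of the article's Node class.
long long intValueOf(const SketchNode* n) {
    assert(n != nullptr && n->isIntNode());
    return static_cast<const SketchIntNode*>(n)->getValue();
}
```

The parser only ever holds base-class pointers, so the virtual getType() is what lets later stages find out which kind of node they were handed.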
A problem in AST construction
There is one particular problem in building the syntax tree, and it comes from a weak spot in the design: I did not create a separate list type to hold child elements, but packed them directly into Node. That makes it hard to tell whether the node about to be inserted is a single element or a list of elements. So I wrote an isSingle function, which decides whether the current element stands alone by checking whether its next pointer is null. When building a list, a single element can be appended directly to the end of the current sequence. Otherwise we create a new Node and point its child pointer at the element being inserted.
So our make_list and getList functions are written like this:
Node* Node::make_list(int num, ...) {
    va_list argp;
    Node* para = NULL;
    Node* ans = NULL;
    va_start(argp, num);
    for (int i = 0; i < num; ++i) {
        para = va_arg(argp, Node*);
        if (!para->isSingle())
            para = new Node(para);
        if (ans == NULL)
            ans = para;
        else
            ans->addBrother(para);
    }
    va_end(argp);
    return ans;
}

Node* Node::getList(Node* node) {
    if (!node->isSingle())
        return new Node(node);
    return node;
}
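The wrapping behaviour is easier to see on a small case. The sketch below repeats the same logic on a reduced, hypothetical ListNode type. The id field, the wrap helper, and the convention that id -1 marks a wrapper are all assumptions made for this illustration, and cleanup is left out.

```cpp
#include <cassert>
#include <cstdarg>

// Reduced stand-in for Node; id is only here so nodes can be told apart.
struct ListNode {
    int id;
    ListNode* next = nullptr;
    ListNode* child = nullptr;

    explicit ListNode(int i) : id(i) {}

    bool isSingle() const { return next == nullptr; }

    void addBrother(ListNode* n) {
        assert(n != nullptr);
        ListNode* p = this;
        while (p->next != nullptr) p = p->next;
        p->next = n;
    }
};

// Put an existing chain under a fresh parent (id -1 marks a wrapper here).
ListNode* wrap(ListNode* chain) {
    ListNode* w = new ListNode(-1);
    w->child = chain;
    return w;
}

// Same logic as make_list above: single elements are appended as they are,
// elements that already head a chain are wrapped first.
ListNode* make_list(int num, ...) {
    va_list argp;
    ListNode* ans = nullptr;
    va_start(argp, num);
    for (int i = 0; i < num; ++i) {
        ListNode* para = va_arg(argp, ListNode*);
        if (!para->isSingle()) para = wrap(para);
        if (ans == nullptr) ans = para;
        else ans->addBrother(para);
    }
    va_end(argp);
    return ans;
}

// Number of nodes reachable through next, starting at n.
int chainLength(const ListNode* n) {
    int len = 0;
    for (; n != nullptr; n = n->next) ++len;
    return len;
}
```

If b already has c as its next sibling, then make_list(3, &a, &b, &d) produces a chain of three nodes: a, a wrapper whose child is b, and d. Without the wrapper, appending b would drag c into the new list as well.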
Basic LLVM Statement Generation
The point of building all these classes is to use them to generate LLVM statements, so let's start by generating a few simple ones.
First we need to introduce the LLVM type system. Every LLVM statement has a type, and an LLVM statement can be converted to a Value pointer, so we can get its type as follows:
Type* t = value->getType();
Type objects are also easy to work with. For example, to get the corresponding pointer type:
PointerType* ptr_type = t->getPointerTo();
The Type class also has a number of static functions for obtaining basic types:
// Get basic types
static Type* getVoidTy(LLVMContext& C)
static Type* getFloatTy(LLVMContext& C)
static Type* getDoubleTy(LLVMContext& C)
static Type* getMetadataTy(LLVMContext& C)

// Get integer types of different widths
static IntegerType* getInt8Ty(LLVMContext& C)
static IntegerType* getInt16Ty(LLVMContext& C)
static IntegerType* getInt32Ty(LLVMContext& C)
static IntegerType* getInt64Ty(LLVMContext& C)

// Get pointer types pointing to different types
static PointerType* getFloatPtrTy(LLVMContext& C, unsigned AS = 0)
static PointerType* getDoublePtrTy(LLVMContext& C, unsigned AS = 0)
static PointerType* getInt8PtrTy(LLVMContext& C, unsigned AS = 0)
static PointerType* getInt16PtrTy(LLVMContext& C, unsigned AS = 0)
static PointerType* getInt32PtrTy(LLVMContext& C, unsigned AS = 0)
static PointerType* getInt64PtrTy(LLVMContext& C, unsigned AS = 0)
The basic node types in our AST are actually constants in the language (except IDNode), so they should all generate constant values.
The base class for constants is Constant; the commonly used subclasses are ConstantInt, ConstantFP, and ConstantExpr.
Below we write the LLVM code generation for integers, floating-point numbers, and global strings directly.
Value* IntNode::codeGen(CodeGenContext* context) {
    Type* t = Type::getInt64Ty(*(context->getContext()));
    setLLVMType(t);
    return ConstantInt::get(t, value);
}

Value* FloatNode::codeGen(CodeGenContext* context) {
    Type* t = Type::getFloatTy(*(context->getContext()));
    setLLVMType(t);
    return ConstantFP::get(t, value);
}

Value* StringNode::codeGen(CodeGenContext* context) {
    Module* M = context->getModule();
    LLVMContext& ctx = M->getContext(); // Don't use the global context
    Constant* strConstant = ConstantDataArray::getString(ctx, value);
    Type* t = strConstant->getType();
    setLLVMType(t);
    GlobalVariable* GVStr = new GlobalVariable(*M, t, true,
            GlobalValue::InternalLinkage, strConstant, "");
    Constant* zero = Constant::getNullValue(IntegerType::getInt32Ty(ctx));
    Constant* indices[] = {zero, zero};
    Constant* strVal = ConstantExpr::getGetElementPtr(GVStr, indices, true);
    return strVal;
}
The most complex case here is the constant string. First, a constant string is created with ConstantDataArray::getString. However, functions usually do not accept a value of string (array) type. As in C, you pass the address of its first character as the argument. Remember the declaration of printf we wrote earlier? Its first parameter is a char* pointer.
So here we use ConstantExpr::getGetElementPtr to take that address. indices is an array of index values. The first index treats the pointer as if it pointed to an array of such objects and selects which object to address. The second index steps inside the selected object, as you would with a struct, and selects the address of an element within it, here the first one.
We pass the constant 0 for both. One more thing to watch: the constant 0 used for these indices apparently should not be of int64 type. Perhaps the range is considered too large and there is a worry about going out of bounds, and int32 is long enough for ordinary arrays. I had not noticed this before and used int64, and I kept running into inexplicable problems.
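A plain C++ analogy may help with the two zero indices. The sketch below uses no LLVM API at all; kHello and elementAddress are hypothetical names, and the array length 6 is just what "hello" plus its terminator needs.

```cpp
#include <cassert>
#include <cstddef>

// Plays the role of the global constant: an array of 6 chars.
const char kHello[6] = "hello";

// base points to the whole array, the way the global variable does.
// The first index steps over whole arrays, the second index steps
// over the chars inside the selected array.
const char* elementAddress(const char (*base)[6],
                           std::size_t first, std::size_t second) {
    assert(second < 6);
    return &base[first][second];
}
```

Calling elementAddress(&kHello, 0, 0) returns the address of the 'h'. Index 0 first says "the array the pointer refers to", and index 0 again says "its first character". That char* style address is what a function like printf expects.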
Appendix: the complete implementation of the Node class
/*
 * @Author: sxf
 * @Date: 2015-09-22 19:21:40
 * @Last Modified by: sxf
 * @Last Modified time: 2015-11-01 21:05:14
 */

#include "Node.h"
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h> // exit
#include "nodes.h"
#include <iostream>

void Node::init() {
    llvm_type = NULL;
    next = child = NULL;
}

Node::Node() {
    init();
}

Node::Node(Node* n) {
    init();
    addChildren(n);
}

Node::~Node() {}

void Node::addChildren(Node* n) {
    if (child == NULL) {
        child = n;
    } else {
        child->addBrother(n);
    }
}

void Node::addBrother(Node* n) {
    Node* p = this;
    while (p->next != NULL) {
        p = p->next;
    }
    p->next = n;
}

void Node::print(int k) {
    // Indent by depth, print this node, then print every child one level deeper
    for (int i = 0; i < k; ++i)
        printf(" ");
    printSelf();
    printf("\n");

    Node* p = child;
    int t = 0;
    while (p != NULL) {
        p->print(k + 1);
        p = p->next;
        ++t;
    }
    if (t >= 3) printf("\n");
}

void Node::printSelf() {
    printf("Node");
}

NodeType Node::getType() {
    return node_t;
}

bool Node::isSingle() {
    return next == NULL;
}

Node* Node::make_list(int num, ...) {
    va_list argp;
    Node* para = NULL;
    Node* ans = NULL;
    va_start(argp, num);
    for (int i = 0; i < num; ++i) {
        para = va_arg(argp, Node*);
        if (!para->isSingle())
            para = new Node(para);
        if (ans == NULL)
            ans = para;
        else
            ans->addBrother(para);
    }
    va_end(argp);
    return ans;
}

Node* Node::getList(Node* node) {
    if (!node->isSingle())
        return new Node(node);
    return node;
}

Type* Node::getLLVMType() {
    return llvm_type;
}

void Node::setLLVMType(Type* t) {
    llvm_type = t;
}

bool Node::isNode() {
    return getType() == node_t;
}

bool Node::isIntNode() {
    return getType() == int_node_t;
}

bool Node::isFloatNode() {
    return getType() == float_node_t;
}

bool Node::isIDNode() {
    return getType() == id_node_t;
}

bool Node::isStringNode() {
    return getType() == string_node_t;
}

bool Node::isCharNode() {
    return getType() == char_node_t;
}

std::string Node::getTypeName() {
    switch (getType()) {
        case node_t: return "Node";
        case int_node_t: return "IntNode";
        case string_node_t: return "StringNode";
        case id_node_t: return "IDNode";
        case char_node_t: return "CharNode";
        case float_node_t: return "FloatNode";
    }
    return ""; // Not reached for valid NodeType values
}

std::string& Node::getStr() {
    if (this->isStringNode()) {
        StringNode* string_this = (StringNode*)this;
        return string_this->getStr();
    }
    if (this->isIDNode()) {
        IDNode* string_this = (IDNode*)this;
        return string_this->getStr();
    }
    std::cerr << "getStr() - failed to get the string, incorrect node type: "
              << getTypeName() << std::endl;
    exit(1);
}
The King of Compiler Architectures, LLVM (5): The basic structure of the syntax tree model