The complexity of a symbol table's structure tracks the complexity of the language's semantic rules. In C#, each symbol carries a lot of information, such as its location, namespace, and type. In JavaScript a symbol table is barely needed, because everything is dynamic and very little is checked at compile time. Semantic analysis outputs the symbol table, and code generation takes the symbol table and the syntax tree as input. So apart from what naturally belongs in the syntax tree, semantic information (such as the type of each expression and the scope of each statement) is best placed in the symbol table. The semantic analysis results of Cminus show how a real symbol table is organized.
First we need to solve the problem of representing types. A complex language has many kinds of types. "Kind" here does not mean the difference between int and string, but the difference between, say, a function type and a struct type. Each kind carries its own additional attributes. During semantic analysis we constantly check whether two types are the same, so the type representation in the symbol table should be easy to read, modify, and compare.
There are usually two solutions. The first is an inheritance hierarchy: define a base class TypeBase and derive a pile of classes from it. At first glance this is OOP, but not really. Semantic analysis needs some special operations on each kind of type, so let's use the test for type equality as an illustration. Virtual functions in OOP solve dispatch along one dimension: given a base, the call base->method always resolves to the actual type of base. But what if we need to dispatch on two types at once? Take equal(base1, base2): the comparison is only meaningful when base1 and base2 are actually the same kind of type. Even if we rewrite it as base1->equal(base2), we still cannot avoid a dynamic_cast or something similar on base2.
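To make the limitation concrete, here is a minimal sketch of the inheritance approach. The names TypeBase, IntType, and PointerType are hypothetical and are not taken from the Cminus source; the point is only that the virtual call resolves the left operand, while the right operand still needs a runtime cast.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical type hierarchy, for illustration only.
struct TypeBase {
    virtual ~TypeBase() = default;
    virtual bool Equal(const TypeBase& other) const = 0;
};

struct IntType : TypeBase {
    bool Equal(const TypeBase& other) const override {
        // Virtual dispatch already picked IntType for *this,
        // but the kind of 'other' must still be tested at run time.
        return dynamic_cast<const IntType*>(&other) != nullptr;
    }
};

struct PointerType : TypeBase {
    std::shared_ptr<TypeBase> element;

    explicit PointerType(std::shared_ptr<TypeBase> e) : element(std::move(e)) {
        assert(element != nullptr);
    }

    bool Equal(const TypeBase& other) const override {
        // Same problem: a cast is needed before the attributes can be compared.
        const auto* p = dynamic_cast<const PointerType*>(&other);
        return p != nullptr && element->Equal(*p->element);
    }
};
```

Every new kind of type repeats the same cast-then-compare pattern, and every new two-operand operation repeats it again.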
So I personally prefer the second way: give every type a unique ID. For example, int is 0, int(int,int) is 1, int* is 2, and so on. Comparing two types then means comparing their IDs: equal IDs mean equal types, and different IDs mean different types. How is this done in practice? Semantic analysis produces a stream of new types, and in theory there can be infinitely many of them. Each type has some attributes. The primitive types are finite and can be described with an enum. A function type needs a return type and a list of parameter types. So when we hand the attributes to the symbol table to obtain an ID, it first checks whether such a type already exists. If it does, the existing ID is returned; if not, a new record is created and bound to a new ID. For example, the Cminus type table uses the following interface to assign IDs:
class VL_CMinusTypeTable : public VL_Base
{
public:
    // Each Get* function returns the ID of the type described by its arguments.
    VInt GetPrimitiveType(VLE_CMinusPrimitiveType Type);
    VInt GetPointer(VInt Type);
    VInt GetArray(VInt Type , VInt Count);
    VInt GetFunction(VInt ReturnType , VL_List<VInt , true>& ParameterTypes);
    VInt CreateStruct();
    // Looks up the record stored for a type ID.
    VL_CMinusTypeSlot* GetType(VInt Type);
};
If we know the ID of a type and want the ID of its pointer type, we call GetPointer(typeID). With this set of functions we never have to worry about two IDs referring to the same type, or about one type accidentally receiving more than one ID, which makes the types very easy to manage.
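The check-then-create behaviour described above can be sketched as follows. This is not the Cminus implementation: TypeTable, TypeKey, Kind, and Intern are hypothetical names, plain int stands in for VInt, and struct types are left out (since CreateStruct takes no attributes, it presumably hands out a fresh ID on every call).

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical sketch of an ID-assigning type table.
enum class Kind { Primitive, Pointer, Array, Function };

// The attributes that identify a type; the meaning of 'parts' depends on 'kind'.
struct TypeKey {
    Kind kind;
    std::vector<int> parts;
    bool operator<(const TypeKey& o) const {
        return std::tie(kind, parts) < std::tie(o.kind, o.parts);
    }
};

class TypeTable {
public:
    int GetPrimitive(int which) { return Intern({Kind::Primitive, {which}}); }
    int GetPointer(int element) { return Intern({Kind::Pointer, {element}}); }
    int GetArray(int element, int count) { return Intern({Kind::Array, {element, count}}); }
    int GetFunction(int returnType, const std::vector<int>& parameterTypes) {
        std::vector<int> parts{returnType};
        parts.insert(parts.end(), parameterTypes.begin(), parameterTypes.end());
        return Intern({Kind::Function, parts});
    }
    // Returns the record stored for a known ID.
    const TypeKey& GetType(int id) const {
        assert(id >= 0 && id < static_cast<int>(slots_.size()));
        return slots_[id];
    }

private:
    // Return the existing ID for these attributes, or bind a new one.
    int Intern(const TypeKey& key) {
        auto it = ids_.find(key);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(slots_.size());
        slots_.push_back(key);
        ids_.emplace(key, id);
        return id;
    }

    std::map<TypeKey, int> ids_;   // attributes -> ID
    std::vector<TypeKey> slots_;   // ID -> attributes
};
```

Because composite types are described by the IDs of their components, structural equality of arbitrarily nested types reduces to a single integer comparison.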
The second problem is where to save the type of each expression and the scope of each statement. I do not recommend keeping this information in the syntax tree. The reason is a bit involved: the same piece of code may mean different things in different contexts, and one day we may suddenly need to keep the semantic analysis results of that code for several contexts at once. If the results live inside the syntax tree, we are stuck, and the only way out is to copy the tree. So I recommend storing in the symbol table everything that syntax analysis cannot determine. Because expressions and statements are referred to by pointer, a few maps keyed on those pointers are all we need to hold the additional information about expressions and statements.
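A minimal sketch of such side tables, assuming node pointers stay stable for the lifetime of the analysis. SemanticInfo, Expression, Statement, and StatementScope are hypothetical stand-ins rather than Cminus classes, and -1 is used here as an arbitrary "not analysed" marker.

```cpp
#include <cassert>
#include <unordered_map>

// Empty stand-ins for syntax tree nodes and scopes (hypothetical).
struct Expression {};
struct Statement {};
struct StatementScope {};

// Holds analysis results outside the syntax tree, keyed by node address.
class SemanticInfo {
public:
    void SetExpressionType(const Expression* e, int typeId) {
        assert(e != nullptr);
        expressionTypes_[e] = typeId;
    }
    // Returns -1 if the expression has not been analysed in this context.
    int GetExpressionType(const Expression* e) const {
        auto it = expressionTypes_.find(e);
        return it == expressionTypes_.end() ? -1 : it->second;
    }
    void SetStatementScope(const Statement* s, StatementScope* scope) {
        assert(s != nullptr);
        statementScopes_[s] = scope;
    }
    // Returns nullptr if no scope was recorded for the statement.
    StatementScope* GetStatementScope(const Statement* s) const {
        auto it = statementScopes_.find(s);
        return it == statementScopes_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<const Expression*, int> expressionTypes_;
    std::unordered_map<const Statement*, StatementScope*> statementScopes_;
};
```

Analysing the same tree in two contexts then only requires two SemanticInfo objects; the tree itself is never copied or modified.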
The third issue is scope. A variable or parameter is only visible within a limited region, so we build a scope tree in which each node only looks toward its parent; whether a node can see its children does not matter. From the point of view of one particular scope, then, the tree collapses into a linked list: the scope holds all the symbol names declared in it, and it can reach every enclosing scope directly or indirectly through its parent. Here is a concrete example. Suppose we have this code:
int A=0;
int B(int C,int D)
{
    int E=0;
}
To handle this code we create three scopes. The first is the global scope, which records A and B. The second is the function scope, which records C and D. The third is the scope belonging to the statement block, which records E. We then chain them into a list: statement scope -> function scope -> global scope.
The advantage is that lookups are easy. If the current context is the statement scope, it should be able to see local variables, parameters, global functions, and global variables, and walking the list finds all of them. Adding a symbol is just as easy: as long as the current scope does not already contain the name, we can add it no matter what the outer scopes contain, and a symbol with the same name in an outer scope is simply shadowed.
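The lookup and insertion rules just described can be sketched like this. Scope, Add, and Find are hypothetical names chosen for the example, and each symbol is reduced to a name mapped to a type ID.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical scope node: knows its own symbols and its parent, nothing else.
class Scope {
public:
    explicit Scope(Scope* parent = nullptr) : parent_(parent) {}

    // Only the current scope is checked, so outer names can be shadowed.
    // Returns false if the name is already declared in this scope.
    bool Add(const std::string& name, int typeId) {
        assert(!name.empty());
        return symbols_.emplace(name, typeId).second;
    }

    // Walks from this scope outward; the innermost declaration wins.
    // Returns a pointer to the type ID, or nullptr if the name is unknown.
    const int* Find(const std::string& name) const {
        for (const Scope* s = this; s != nullptr; s = s->parent_) {
            auto it = s->symbols_.find(name);
            if (it != s->symbols_.end()) return &it->second;
        }
        return nullptr;
    }

private:
    Scope* parent_;
    std::map<std::string, int> symbols_;
};
```

For the sample program, three such objects are chained as statement scope -> function scope -> global scope: looking up A from the statement scope succeeds, while looking up E from the global scope fails.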
A scope can also record other things, such as the nearest enclosing loop (used to decide whether a break is allowed), the function it belongs to (used to check the expression after return), and many other odds and ends.
The fourth question is how to build the symbol table. In the previous article we organized statements and expressions into two large inheritance hierarchies. Expressions get a function called GetType that returns a type ID. Statements get a function called validate that checks whether the statement is legal. Both take the symbol table and the current scope as arguments. Computing the type of an expression creates a pile of type IDs along the way, and scopes are created so that expressions can find the type of each variable. After one recursive pass down the tree, the symbol table has been built and the types have been checked. That is why semantic analysis produces the symbol table.
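The recursive pass can be sketched as below, under heavy simplification: a single Analysis object stands in for both the symbol table and the current scope, only one type (int) exists, and the class names Expr, Literal, Variable, and Add are hypothetical. What the sketch is meant to show is that each call to GetType both checks a node and records its result as a side effect.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

constexpr int kInt = 0;     // hypothetical type ID for int
constexpr int kError = -1;  // hypothetical marker for a failed check

struct Expr;

// Stand-in for "symbol table plus current scope".
struct Analysis {
    std::map<std::string, int> variables;            // name -> type ID
    std::unordered_map<const Expr*, int> exprTypes;  // filled during the walk
};

struct Expr {
    virtual ~Expr() = default;
    // Computes the type, then records it so later phases can look it up.
    int GetType(Analysis& a) const {
        int id = Compute(a);
        a.exprTypes[this] = id;
        return id;
    }
protected:
    virtual int Compute(Analysis& a) const = 0;
};

struct Literal : Expr {
protected:
    int Compute(Analysis&) const override { return kInt; }
};

struct Variable : Expr {
    std::string name;
    explicit Variable(std::string n) : name(std::move(n)) {}
protected:
    int Compute(Analysis& a) const override {
        auto it = a.variables.find(name);
        return it == a.variables.end() ? kError : it->second;
    }
};

struct Add : Expr {
    std::unique_ptr<Expr> left, right;
    Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : left(std::move(l)), right(std::move(r)) {
        assert(left && right);
    }
protected:
    int Compute(Analysis& a) const override {
        // Recurse first; the operands record their own types on the way.
        int l = left->GetType(a);
        int r = right->GetType(a);
        return (l == kInt && r == kInt) ? kInt : kError;
    }
};
```

A statement-side validate function would follow the same shape, calling GetType on the expressions it contains and opening a child scope where the language requires one.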
That concludes the introduction to the symbol table. The basic problems every high-level language runs into are much the same. The following articles will address more specific, language-dependent issues such as inheritance, reflection, and garbage collection.