2. Handling of IDs and reserved words
The C language reserves a number of keywords, also called reserved words, such as int, short and char for data types and if and else for controlling branch execution.
A keyword is in essence also an identifier (ID): it has a length (3 for int, 5 for short) and content (the spelling int or short). Compared with an ordinary ID, however,
it also has other properties. For example, int denotes a data type, and that type has a range of values from -xxx to xxx.
Because of this relationship, GCC stores both IDs and keywords in a single table, which is defined as follows:
#define MAX_HASH_TABLE 1009
tree hash_table[MAX_HASH_TABLE]; /* id hash buckets */
The function get_identifier (text) is defined to operate on this table:
The function looks up the ID for the string passed in. If no such ID exists, it creates a new tree_identifier node and saves it in hash_table. A hash algorithm is used to speed up the search. Hash collisions are hard to avoid, because different strings can produce the same hash value; this is where the chain member of struct tree_common comes in: IDs that collide are linked together through their chain members.
At the end, the function assigns the length and pointer members of the new tree_identifier.
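The lookup-or-create behaviour described above can be sketched roughly as follows. This is a minimal illustration, not GCC's code: struct ident_node, hash_text and get_identifier_sketch are hypothetical names, the node layout is simplified, and the hash function is an arbitrary choice made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HASH_TABLE 1009

/* Simplified stand-in for a tree_identifier node (hypothetical layout). */
struct ident_node {
    struct ident_node *chain;  /* next identifier in the same bucket */
    int length;                /* length of the spelling */
    char *pointer;             /* NUL-terminated spelling */
};

static struct ident_node *hash_table[MAX_HASH_TABLE];

/* Illustrative hash function; an assumption, not the one GCC uses. */
static unsigned hash_text(const char *text, size_t len)
{
    unsigned h = (unsigned)len;
    for (size_t i = 0; i < len; i++)
        h = h * 31u + (unsigned char)text[i];
    return h % MAX_HASH_TABLE;
}

/* Return the unique node for TEXT, creating it on first use. */
struct ident_node *get_identifier_sketch(const char *text)
{
    size_t len = strlen(text);
    unsigned h = hash_text(text, len);

    /* Walk the bucket's chain looking for an existing node. */
    for (struct ident_node *p = hash_table[h]; p != NULL; p = p->chain)
        if ((size_t)p->length == len && memcmp(p->pointer, text, len) == 0)
            return p;

    /* Not found: create a node and push it on the front of the chain. */
    struct ident_node *node = malloc(sizeof *node);
    assert(node != NULL);
    node->pointer = malloc(len + 1);
    assert(node->pointer != NULL);
    memcpy(node->pointer, text, len + 1);
    node->length = (int)len;
    node->chain = hash_table[h];
    hash_table[h] = node;
    return node;
}
```

Because every spelling maps to exactly one node, two IDs can later be compared by pointer instead of by string, which is the main payoff of this design.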
At startup, GCC creates the keywords that represent data types, such as int, short, char and void. This is done in the function init_lex, and the corresponding
tree nodes are stored in:
ridpointers[(int) RID_INT]
ridpointers[(int) RID_CHAR]
ridpointers[(int) RID_VOID]
ridpointers[(int) RID_SHORT]
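As a rough sketch of why registering the keywords up front is useful: once each keyword has a unique ID node, a node can be recognised as a keyword by comparing pointers. The code below assumes a simplified node type and a linear list in place of the hash table; intern, init_lex_sketch and keyword_index are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Subset of the reserved-word indices named in the text. */
enum rid { RID_INT, RID_CHAR, RID_VOID, RID_SHORT, RID_MAX };

/* Hypothetical, simplified identifier node. */
struct ident_node {
    struct ident_node *chain;  /* next node in the list of all IDs */
    const char *pointer;       /* spelling */
    int length;                /* length of the spelling */
};

static struct ident_node *all_idents;           /* stand-in for the hash table */
static struct ident_node *ridpointers[RID_MAX];

/* Intern TEXT: the same spelling always yields the same node. */
struct ident_node *intern(const char *text)
{
    for (struct ident_node *p = all_idents; p != NULL; p = p->chain)
        if (strcmp(p->pointer, text) == 0)
            return p;
    struct ident_node *node = malloc(sizeof *node);
    assert(node != NULL);
    node->pointer = text;  /* callers pass string literals in this sketch */
    node->length = (int)strlen(text);
    node->chain = all_idents;
    all_idents = node;
    return node;
}

/* Register the type keywords once, before any source text is read. */
void init_lex_sketch(void)
{
    ridpointers[RID_INT] = intern("int");
    ridpointers[RID_CHAR] = intern("char");
    ridpointers[RID_VOID] = intern("void");
    ridpointers[RID_SHORT] = intern("short");
}

/* Return the keyword index of NODE, or RID_MAX for an ordinary ID. */
enum rid keyword_index(const struct ident_node *node)
{
    for (int i = 0; i < RID_MAX; i++)
        if (ridpointers[i] == node)
            return (enum rid)i;
    return RID_MAX;
}
```

With this in place, keyword_index (intern ("int")) yields RID_INT, while an ordinary name such as counter yields RID_MAX.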
Once the ID of a keyword has been generated, further attributes are attached to it; this is done in init_decl_processing. To represent a data type, GCC uses the following structure:
struct tree_type
{
  char common[sizeof (struct tree_common)];
  union tree_node *values;
  union tree_node *sep;
  union tree_node *size;
  enum machine_mode mode : 8;
  unsigned char size_unit;
  unsigned char align;
  unsigned char sep_unit;
  union tree_node *pointer_to;
  union tree_node *reference_to;
  int parse_info;
  int symtab_address;
  union tree_node *name;
  union tree_node *max;
  union tree_node *next_variant;
  union tree_node *main_variant;
  union tree_node *basetypes;
  union tree_node *noncopied_parts;
  /* Points to a structure whose details depend on the language in use. */
  struct lang_type *lang_specific;
};
In the node that represents the int type, two members are especially important: sep holds its minimum value and max holds its maximum value.
A tree node with code INTEGER_TYPE is created in the function make_signed_type; it is in fact a struct tree_type node.
In this function, the sep and max members of the new node are each set to an INTEGER_CST node, which is in fact a struct tree_int_cst node:
struct tree_int_cst
{
  char common[sizeof (struct tree_common)];
  long int_cst_low;
  long int_cst_high;
};
As you can see, it has two members of its own: int_cst_low and int_cst_high.
For the minimum value, they are set to: int_cst_low = 0x80000000, int_cst_high = 0xffffffff
For the maximum value, they are set to: int_cst_low = 0x7fffffff, int_cst_high = 0x0
Both nodes are created by the build_int_2 function.
At the end of make_signed_type, the call to layout_type sets the size member of the int type node. This is also a struct tree_int_cst node, but its
int_cst_low value is 4, while int_cst_high is 0.
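The low and high values quoted above are what you get when a signed value is sign-extended to two 32-bit words. The sketch below reproduces them under that assumption; it fixes each word at 32 bits with uint32_t, and int_cst_sketch, split_int_cst, signed_min and signed_max are hypothetical names rather than GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for tree_int_cst, with each word fixed at 32 bits. */
struct int_cst_sketch {
    uint32_t int_cst_low;   /* least significant 32 bits */
    uint32_t int_cst_high;  /* most significant 32 bits */
};

/* Split a 64-bit signed value into its low and high 32-bit words. */
struct int_cst_sketch split_int_cst(int64_t value)
{
    struct int_cst_sketch c;
    uint64_t bits = (uint64_t)value;  /* two's complement bit pattern */
    c.int_cst_low = (uint32_t)(bits & 0xffffffffu);
    c.int_cst_high = (uint32_t)(bits >> 32);
    return c;
}

/* Smallest value of a signed type with PRECISION bits (1 to 32). */
struct int_cst_sketch signed_min(int precision)
{
    assert(precision >= 1 && precision <= 32);
    return split_int_cst(-((int64_t)1 << (precision - 1)));
}

/* Largest value of a signed type with PRECISION bits (1 to 32). */
struct int_cst_sketch signed_max(int precision)
{
    assert(precision >= 1 && precision <= 32);
    return split_int_cst(((int64_t)1 << (precision - 1)) - 1);
}
```

For a precision of 32, signed_min gives low 0x80000000 and high 0xffffffff, and signed_max gives low 0x7fffffff and high 0x0, matching the values in the text.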
With the int type node and its ID in place, the two are then wrapped in a declaration node, which is represented by struct tree_decl:
struct tree_decl
{
  char common[sizeof (struct tree_common)];
  char *filename;
  int linenum;
  union tree_node *size;
  enum machine_mode mode : 8;
  unsigned char size_unit;
  unsigned char align;
  unsigned char voffset_unit;
  union tree_node *name;
  union tree_node *context;
  int offset;
  union tree_node *voffset;
  union tree_node *arguments;
  union tree_node *result;
  union tree_node *initial;
  char *print_name;
  char *assembler_name;
  struct rtx_def *rtl; /* acts as link to register transfer language (RTL) info */
  int frame_size; /* for FUNCTION_DECLs: size of stack frame */
  struct rtx_def *saved_insns; /* for FUNCTION_DECLs: points to insn that constitutes its definition on the permanent obstack. */
  int block_symtab_address;
  /* Points to a structure whose details depend on the language in use. */
  struct lang_decl *lang_specific;
};
As can be seen, this is a large and complex structure. Turning the int type node into an int declaration node generates a struct tree_decl node whose
member name is the ID node for int and whose type is the int type node just created. The int declaration node is then recorded in
global_binding_level, which keeps track of global nodes:
global_binding_level->name points to the int declaration node just created.
init_decl_processing next creates the char, unsigned int and short type nodes in the same way. Each new declaration node becomes the new value of global_binding_level->name
and is linked to the earlier ones through chain.
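The way declarations accumulate in the global binding level amounts to pushing onto the head of a singly linked list. The following is a minimal sketch under that reading; decl_sketch, binding_level_sketch, pushdecl_sketch and count_decls are hypothetical, pared-down names and do not reproduce GCC's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down declaration node. */
struct decl_sketch {
    struct decl_sketch *chain;  /* next older declaration in this scope */
    const char *name;           /* spelling of the declared name */
};

/* Hypothetical binding level holding only the head of the list. */
struct binding_level_sketch {
    struct decl_sketch *name;   /* most recently declared node */
};

/* Push DECL on the front of LEVEL's list and return it. */
struct decl_sketch *pushdecl_sketch(struct binding_level_sketch *level,
                                    struct decl_sketch *decl)
{
    assert(level != NULL && decl != NULL);
    decl->chain = level->name;  /* link to the previous head */
    level->name = decl;         /* the new node becomes the head */
    return decl;
}

/* Count the declarations reachable from LEVEL. */
int count_decls(const struct binding_level_sketch *level)
{
    int n = 0;
    for (const struct decl_sketch *d = level->name; d != NULL; d = d->chain)
        n++;
    return n;
}
```

Pushing int, then char, then unsigned int leaves the head pointing at unsigned int, with char and int reachable through chain, so the list runs from newest to oldest.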
To summarize, GCC uses a hash table to store all IDs, including reserved words. During initialization, GCC generates tree_decl nodes for the built-in data types (int, short, char).
These are recorded in the name member of global_binding_level, which always points to the most recently declared node; earlier nodes are reached through each node's chain member.