4.2.3
In this section we first analyze the semantic check of the most basic kind of expression, the primary expression (primary-expression). The standard C grammar gives the productions for primary-expression shown below. Syntactically, a parenthesized expression ( expression ) has the same status as an identifier ID, a constant, or a string literal.
primary-expression:
ID
constant
string-literal
( expression )
For example, in the expression (a+b)+c, both (a+b) and c are primary expressions and have the same syntactic status. After parsing, however, the abstract syntax tree we generate for (a+b)+c is:
(+ (+ a b) c)
The subtree (+ a b) is an addition. Because + is a binary operator, its semantic check is done in CheckBinaryExpression(), while the subtree c is checked in CheckPrimaryExpression(). In other words, the parentheses leave no node of their own in the tree, so when checking primary expressions we only need to consider identifiers (ID), string literals, and constants, and never ( expression ). Let's first look at how the UCC compiler handles string literals, as shown in Figure 4.2.8.
Figure 4.2.8 String literals
Lines 1 to 11 of hello.c in Figure 4.2.8 contain five string literals. As lines 15 to 20 show, apart from "123456", which initializes the global array buf[], the UCC compiler gives each of the other strings a name: .str0, .str1, .str2, and .str3. The string "123456" needs no name of its own, because its storage is simply the global character array buf, as line 20 shows. The local array buf2[] on line 6, on the other hand, has its storage allocated dynamically at run time, so the UCC compiler names the string "ABCdef" on line 6 .str3. The array buf2[] is initialized at run time by several assembly instructions on lines 29 to 32, whereas the global array buf[] is initialized at compile time, as lines 19 to 20 show.
The strings "123456" on line 3 and "ABCdef" on line 6 initialize character arrays such as buf[] and buf2[], which is a special form of array initialization in C. The semantic check of such strings is done by CheckInitializerInternal() in declchk.c while the array initializer is being checked; CheckPrimaryExpression() in exprchk.c is not invoked. The other strings in hello.c do not initialize character arrays, so the UCC compiler calls CheckPrimaryExpression() to name them. To a C programmer, each of these strings is equivalent to an "anonymous" character array, as lines 15 to 17 of Figure 4.2.8 show.
Next, let's analyze CheckPrimaryExpression(), which performs the semantic check of primary expressions, shown in Figure 4.2.9. For a constant such as 123, the type information was already recorded in its syntax tree node during parsing, so there is nothing left to do during semantic checking, and line 5 of Figure 4.2.9 returns directly. For a string, the UCC compiler gives it a name such as .str1 and appends it to a linked list, so that assembly code like lines 15 to 18 of Figure 4.2.8 can be emitted during code generation; this is done mainly by the AddString() function called on line 9 of Figure 4.2.9. Once named by the compiler, a string has the same syntactic status as an identifier, so line 8 changes the op of the string's syntax tree node to OP_ID. In C we can also take the address of a string literal, as in printf("%p\n", (void *)&"abc"). The string "abc" therefore has a memory address visible to the C programmer, which means a string can be treated as an lvalue, so line 10 sets the node's lvalue field to 1.
Figure 4.2.9 CheckPrimaryExpression()
Lines 13 to 33 of Figure 4.2.9 handle identifiers such as var, whose node op is OP_ID. Here we must consult the symbol table to see whether var has been declared. If it is used without being declared, line 16 reports an error, and line 17 calls AddVariable() to add a variable of type int to the symbol table. This error-recovery strategy, which simply goes along with the mistake, lets subsequent semantic checks continue. If a type name defined by typedef is used as a variable, as in INT32 = 3 in the comment on line 20, line 21 reports an error. An enumeration constant such as RED in the comment on line 23 can be treated as an integer constant, as lines 24 to 26 show. For any other identifier, we copy the type information found in the symbol table into the syntax tree node, as lines 28 to 30 show. A function name is not an lvalue, so line 31 sets the lvalue field to 0 for a function name (and to 1 otherwise). An array name is treated as a non-lvalue only if Adjust() is later called to adjust the type of its node; the lvalue field is then set to 0 inside Adjust(). For example, in the expression arr+1 below, when checking the binary operator + we call Adjust() to change the type of the node for the array name to int *, so that the pointer arithmetic arr+1 can be performed. For &arr, by contrast, arr must be treated as an lvalue so that its address can be taken, so Adjust() is not called when checking the operator &.
int arr[4];
arr+1;
&arr;
The AddString() function called on line 9 of Figure 4.2.9 is shown in Figure 4.2.10. Its main job is to give these strings names, which is done by the FormatName() function on line 442. FormatName() is a C variadic function; a later section will explain how C variadic functions are implemented.
Figure 4.2.10 AddString()
Line 439 of Figure 4.2.10 creates a struct symbol object, and lines 440 to 445 initialize it. Line 441 sets the symbol's kind to SK_String, and TK_STATIC on line 444 indicates that the UCC compiler treats these "nameless" strings as static character arrays. The UCC compiler does not add strings to the symbol table; instead it records them in a singly linked list of struct symbol objects, whose head is held by the global variable Strings in ucl\symbol.c. StringTail on line 447 always points to the end of the list, and lines 447 to 448 perform the insertion.
To compare syntax tree nodes before and after semantic checking more clearly, Figure 4.2.11 shows the identifier node arr, the constant node 3, and the index node arr[3]. The constant node, whose op field is OP_CONST, is unchanged by semantic checking, while the identifier node, whose op field is OP_ID, changes somewhat. During semantic checking we look up the symbol table and fill in the ty field of the arr node, shown on the right of the figure. The type of arr in the symbol table is int [5], but after the type adjustment by Adjust() described earlier, the type of the arr node becomes int *. Its ty field should point to a struct type object; for simplicity, the figure labels it int * directly. After parsing, the node's val field points to the string "arr", but after semantic checking we make val point to the symbol object for the identifier arr, because the contents of that symbol object are needed later when generating intermediate code and assembly code. For the parse-time arr node on the left, we did not draw its isarray and ty fields; both are 0, because every object allocated from the UCC heap is zero-filled when it is created.
Figure 4.2.11 Changes in the syntax tree
Concretely, we call CheckExpression() on the node for arr[3], and CheckExpression() checks the syntax tree in post-order, because the type of the arr[3] node can be known only once the type of the arr node is known. Checking the declaration int arr[5] built the type, which is stored in the symbol table; as mentioned earlier, we will discuss the code in declchk.c when we examine declarations. When checking the expression arr[3], we retrieve the type information of arr from the symbol table and let it propagate from the bottom of the syntax tree upward. The op of the arr node is OP_ID, so it is checked by CheckPrimaryExpression(); as line 31 of Figure 4.2.9 shows, that function first sets the lvalue field of the arr node to 1. When CheckPrimaryExpression() returns to its caller, CheckPostfixExpression(), we call Adjust() to adjust the type of the left subtree arr. This sets the lvalue field of the arr node to 0, meaning the array name arr is not an lvalue, and its type becomes int *, while isarray is set to 1 to record that the arr node had an array type before the adjustment. Since the type of the arr node is int *, the arr[3] node has type int, and its lvalue field is 1, indicating that it is an lvalue.
C Compiler Anatomy, 4.2 Semantic Check: Expression Semantic Check (3), Strings and Identifiers