PL/0 is a subset of Pascal. The PL/0 compiler analyzed here covers three functions: analyzing and processing PL/0 source programs, compiling them into P-code-like code, and interpreting the generated P-code-like code on a virtual machine.
The PL/0 compiler is a single-pass compiler built around the syntax analyzer. Lexical analysis and code generation are independent subroutines that the syntax analyzer calls. The syntax analyzer also provides error reporting and error recovery. If the source program compiles without errors, the P-code-like interpreter is called to interpret and execute the generated P-code-like code.
Analysis of the lexical analysis subroutine:
The lexical analysis subroutine is called getsym. Its job is to read one symbol (token) from the source program and store its information in the global variables sym, id, and num. When the syntax analyzer needs a symbol, it reads these three variables directly. (Note: the syntax analyzer calls getsym to fetch the next symbol, ready for later use, as soon as it has consumed the values of these three variables; it does not wait until a new symbol is actually needed.) getsym obtains characters from the source program by repeatedly calling the getch procedure and assembles them into symbols. getch uses line buffering to improve efficiency.
Lexical analysis process: when getsym is called, it obtains a character from the source program through getch. If the character is a letter, getsym keeps reading letters and digits, assembles them into a word, and looks the word up in the reserved word table. If the word is found, sym is set to the corresponding reserved word type; if not, the word is a user-defined identifier (a variable, constant, or procedure name), so sym is set to ident and the word is stored in id. The reserved word table is searched by binary search to improve efficiency. If the character is a digit, getsym keeps reading digits, assembles them into an integer, sets sym to number, and stores the value in num. If another valid symbol (such as the assignment symbol, greater than, or less than or equal to) is recognized, sym is set to the corresponding type. If an invalid character is encountered, sym is set to nul.
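The scanning steps above can be sketched in Python as follows. This is only an illustration, not the original implementation: the Lexer class, the SYMBOLS table, the "sym" suffix for reserved word types, and the "\0" end marker are all assumptions, and getch here reads straight from a string instead of using a line buffer.

```python
from bisect import bisect_left

# Reserved words, kept sorted so that binary search is valid.
RESERVED = sorted(["begin", "call", "const", "do", "end", "if", "odd",
                   "procedure", "read", "then", "var", "while", "write"])

# Hypothetical type names for operators and punctuation.
SYMBOLS = {"+": "plus", "-": "minus", "*": "times", "/": "slash",
           "(": "lparen", ")": "rparen", "=": "eql", "#": "neq",
           ",": "comma", ".": "period", ";": "semicolon",
           "<": "lss", ">": "gtr", "<=": "leq", ">=": "geq", ":=": "becomes"}


class Lexer:
    def __init__(self, source):
        self.source = source
        self.pos = 0
        self.ch = " "                      # forces a getch on the first getsym
        self.sym = self.id = self.num = None

    def getch(self):
        """Fetch the next character (no line buffering in this sketch)."""
        if self.pos < len(self.source):
            self.ch = self.source[self.pos]
            self.pos += 1
        else:
            self.ch = "\0"                 # assumed end-of-input marker

    def getsym(self):
        while self.ch.isspace():
            self.getch()
        if self.ch.isalpha():
            word = ""
            while self.ch.isalnum():       # letters and digits form one word
                word += self.ch
                self.getch()
            i = bisect_left(RESERVED, word)        # binary search
            if i < len(RESERVED) and RESERVED[i] == word:
                self.sym = word + "sym"
            else:
                self.sym, self.id = "ident", word
        elif self.ch.isdigit():
            value = 0
            while self.ch.isdigit():
                value = value * 10 + int(self.ch)
                self.getch()
            self.sym, self.num = "number", value
        else:
            pair = self.ch + self.source[self.pos:self.pos + 1]
            if len(pair) == 2 and pair in SYMBOLS:  # :=, <= or >=
                self.getch()
                self.getch()
                self.sym = SYMBOLS[pair]
            else:
                self.sym = SYMBOLS.get(self.ch, "nul")  # nul marks invalid input
                self.getch()
```

Each call to getsym leaves the result in sym, id, and num, so a parser built on this sketch would read those attributes and then call getsym again to look one symbol ahead.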
Analysis of the syntax analysis subroutine:
Syntax analysis uses top-down recursive descent. Besides parsing, it generates the corresponding object code according to the meaning of the source program and provides the error handling mechanism. It consists mainly of the block procedure (block), the constant declaration procedure (constdeclaration), the variable declaration procedure (vardeclaration), the statement procedure (statement), the expression procedure (expression), the term procedure (term), the factor procedure (factor), and the condition procedure (condition). These procedures are nested within one another. In addition, there are auxiliary procedures for syntax analysis: error reporting, code generation, the symbol validity test and error recovery procedure (test), the symbol table entry procedure (enter), the symbol table lookup function (position), and the procedure that lists the P-code-like code (listcode).
A complete PL/0 program consists of a block followed by a period. The main program of the compiler therefore calls the block procedure to analyze the block (the block procedure may in turn call itself recursively) and then checks whether the last symbol read is a period. If it is, and no error was found during analysis, the source is a legal PL/0 program and the generated code can be run. Otherwise the source PL/0 program is invalid and an error message is output.
The following describes how the PL/0 compiler handles each syntax unit.
Block processing procedure:
Once syntax analysis starts, the block procedure (block) is called to process the block. Its entry parameters are level 0, symbol table position 0, and an error recovery symbol set consisting of the period and the symbols that can start a declaration or a statement. On entry, block first sets the local data segment allocation pointer dx to 3, reserving three units for the static link SL, the dynamic link DL, and the return address RA used at run time. It then records the current symbol table position in tx0 and generates a JMP instruction that jumps to the start of the block's statement part (for the outermost block, the main program). Because that address is not yet known, the JMP target is set to 0 for now and patched later. The position of this JMP instruction in the code segment is also recorded in the current symbol table entry. After checking that the nesting level does not exceed the permitted maximum, block analyzes the source program. It first checks for a constant declaration; if one is found, it processes the constant definitions and enters the constants into the symbol table. Variable declarations are analyzed in the same way, with dx recording how many units of the local data segment have been allocated. When the reserved word procedure is encountered, the procedure is declared and defined: the declaration enters the procedure name and level into the symbol table, and the definition is handled by calling block recursively, since every procedure is itself a block. Because this is a block nested inside a block, the recursive call passes the current level number plus one. Once the declarations have been processed, the statement part follows.
At this point the code allocation pointer cx points to the start of the statement part, which is exactly where the earlier JMP instruction must jump. The target of that JMP is therefore changed to the current value of cx, using the address recorded earlier. The symbol table entry also records the current code segment address and the size of the local data segment (the value of dx). An INT instruction that allocates dx units is generated as the first instruction of the block. Next, the statement procedure is called to analyze the statements. When that is complete, an OPR instruction with operand 0 is generated to return from the block (for the main program at level 0, this means the program has finished and exits).
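The code layout that block produces (a placeholder JMP, the nested procedures, the patched target, INT, the statements, and OPR 0) can be sketched as below. The function names, the list-based code segment, and the way declarations are reduced to a variable count are assumptions made for illustration; symbol table handling and level checks are left out.

```python
def gen(code, op, level, addr):
    """Append one instruction and return its index in the code segment."""
    code.append([op, level, addr])
    return len(code) - 1


def compile_block(code, n_vars, nested=(), body=lambda code: None):
    """Emit the skeleton of one block.

    n_vars stands in for the variable declarations, nested lists the
    variable counts of directly nested procedures, and body emits the
    code of the statement part.
    """
    dx = 3                            # room for SL, DL and RA
    jmp_at = gen(code, "JMP", 0, 0)   # target not known yet
    dx += n_vars                      # one unit per declared variable
    for inner_vars in nested:         # each procedure is itself a block
        compile_block(code, inner_vars)
    code[jmp_at][2] = len(code)       # backpatch: statements start here
    gen(code, "INT", 0, dx)           # allocate the local data segment
    body(code)
    gen(code, "OPR", 0, 0)            # return from the block


program = []
compile_block(program, n_vars=2, nested=[0])
for index, instruction in enumerate(program):
    print(index, *instruction)
```

The outer JMP skips over the code of the nested procedure, which is why its target can only be filled in after the declarations have been compiled.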
Constant declaration procedure:
Identifiers and their values are read repeatedly in a loop and entered into the symbol table, which records the name and value of each identifier.
Variable declaration procedure:
As with constant declarations, identifiers are read repeatedly in a loop and entered into the symbol table, which records each identifier's name, its level, and its offset address within that level.
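A minimal sketch of what the symbol table holds for constants and variables is shown below. The Entry and SymbolTable classes and the declare helper are hypothetical; in particular, the lookup here returns the entry itself, which may differ from how the original position function reports its result.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    kind: str        # "constant" or "variable" in this sketch
    value: int = 0   # used by constants
    level: int = 0   # used by variables
    addr: int = 0    # offset within the level's data segment


class SymbolTable:
    def __init__(self):
        self.entries = []

    def enter_const(self, name, value):
        self.entries.append(Entry(name, "constant", value=value))

    def enter_var(self, name, level, dx):
        """Record a variable at offset dx and return the next free offset."""
        self.entries.append(Entry(name, "variable", level=level, addr=dx))
        return dx + 1

    def position(self, name):
        """Search backwards so that inner declarations hide outer ones."""
        for entry in reversed(self.entries):
            if entry.name == name:
                return entry
        return None


def declare(table, level, consts, variables):
    """Enter the declarations of one block and return the final dx."""
    dx = 3                                  # SL, DL and RA come first
    for name, value in consts:
        table.enter_const(name, value)
    for name in variables:
        dx = table.enter_var(name, level, dx)
    return dx
```

For example, declaring one constant and the variables x and y at level 0 places x at offset 3 and y at offset 4, and leaves dx at 5.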
Statement processing procedure:
The statement procedure is a nested subroutine that analyzes statements by calling the expression, term, and factor procedures, among others, and by calling itself recursively. It recognizes the assignment statement, the read statement, the write statement, the call statement, the if statement, and the while statement. When it encounters a begin/end statement, it calls itself recursively to analyze the enclosed statements. The corresponding P-code-like instructions are generated during analysis.
Assignment statement processing:
First, obtain the identifier on the left of the assignment symbol, look it up in the symbol table, and confirm that it is a variable name. Then call the expression procedure to analyze the expression on the right of the assignment symbol and generate instructions that leave its value on top of the data stack at run time. Finally, generate a STO instruction based on the location of the left-hand variable found earlier, so that the value on top of the stack is stored into that variable, completing the assignment.
Read statement processing:
Check that the syntax of the read statement is valid (otherwise report an error) and generate the corresponding instructions: first, an OPR instruction with operation 16, which reads an integer from standard input and places it on top of the data stack; second, a STO instruction, which stores the value on top of the stack into the variable named in the parentheses of the read statement.
Write statement processing:
Similar to the read statement. When the syntax is correct, the expression procedure is called in a loop to analyze each expression in the parentheses of the write statement, generating instructions that compute the expression value and leave it on top of the data stack, followed by an OPR instruction with operation 14 that outputs the value. Finally, an OPR instruction with operation 15 is generated to output a newline.
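The instruction sequences for read and write might be emitted as in the following sketch. The helper names are hypothetical. Two further assumptions are made: read accepts a comma-separated list of variables just as write accepts a list of expressions, and the level field of STO holds the difference between the current level and the level at which the variable was declared.

```python
def gen(code, op, level, addr):
    code.append((op, level, addr))


def compile_read(code, targets, current_level):
    """targets: (declaration_level, offset) of each variable in read(...)."""
    for decl_level, offset in targets:
        gen(code, "OPR", 0, 16)                       # read an integer
        gen(code, "STO", current_level - decl_level, offset)


def compile_write(code, expressions):
    """expressions: callables that emit code leaving one value on the stack."""
    for emit_expression in expressions:
        emit_expression(code)
        gen(code, "OPR", 0, 14)                       # output the stack top
    gen(code, "OPR", 0, 15)                           # output a newline


code = []
compile_read(code, [(0, 3)], current_level=0)
compile_write(code, [lambda c: gen(c, "LOD", 0, 3)])  # LOD is assumed here
for instruction in code:
    print(*instruction)
```

Note that OPR 14 is emitted once per expression, while OPR 15 is emitted only once, after the last expression.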
Call statement processing:
Look up the identifier to the right of call in the symbol table to obtain its level and offset address, then generate the corresponding CAL instruction. The context saving needed to call the procedure is done automatically by the P-code-like interpreter when it executes the CAL instruction.
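What the interpreter does when it executes CAL, and later the returning OPR 0, can be sketched roughly as follows. This is an assumption-laden illustration rather than the original interpreter: the stack is a 0-indexed Python list, the registers t (stack top), b (base of the current frame), and p (next instruction) live in a dict, and each frame stores SL, DL, and RA in its first three units.

```python
def base(stack, b, level_diff):
    """Follow the static link level_diff times, starting from frame base b."""
    while level_diff > 0:
        b = stack[b]
        level_diff -= 1
    return b


def do_cal(stack, regs, level_diff, addr):
    """Build the three-unit frame header and jump to the procedure."""
    t, b, p = regs["t"], regs["b"], regs["p"]
    stack[t + 1] = base(stack, b, level_diff)  # SL: frame of the declaring block
    stack[t + 2] = b                           # DL: frame of the caller
    stack[t + 3] = p                           # RA: instruction after the CAL
    regs["b"] = t + 1                          # the new frame becomes current
    regs["p"] = addr                           # continue at the procedure entry


def do_return(stack, regs):
    """Undo do_cal: drop the frame and restore the caller's registers."""
    b = regs["b"]
    regs["t"] = b - 1
    regs["p"] = stack[b + 2]
    regs["b"] = stack[b + 1]
```

The stack top t is left unchanged by do_cal in this sketch; the INT instruction at the start of the called block is what actually claims the frame's units.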
If statement processing:
Following the syntax of the if statement, first call the condition procedure to process the condition of the if statement, so that the resulting truth value is on top of the data stack. Next, record the current code segment allocation position (the position of the JPC instruction about to be generated) and generate the conditional jump instruction JPC (which jumps on 0, that is, false). Since the jump address is not yet known, 0 is filled in. Then call the statement procedure to process the statement or statement block following then. Once it has been processed, the current code segment allocation position is exactly where the JPC instruction should jump, so the target of the recorded JPC instruction is changed to that position.
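The patching of the JPC target can be sketched as follows. The function names and the callables standing in for the condition and statement procedures are hypothetical, and the LIT instructions in the usage lines are placeholders for real generated code.

```python
def gen(code, op, level, addr):
    code.append([op, level, addr])


def compile_if(code, emit_condition, emit_then):
    emit_condition(code)          # leaves the truth value on the stack top
    cx1 = len(code)               # position of the JPC generated next
    gen(code, "JPC", 0, 0)        # target unknown, so 0 for now
    emit_then(code)               # statement after "then"
    code[cx1][2] = len(code)      # patch: jump past the then-part when false


code = []
compile_if(code,
           emit_condition=lambda c: gen(c, "LIT", 0, 1),
           emit_then=lambda c: gen(c, "LIT", 0, 42))
for index, instruction in enumerate(code):
    print(index, *instruction)
```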
Processing of the begin/end statement:
Loop over every statement in the begin/end statement block, analyzing each by calling the statement procedure recursively and generating the corresponding code.
Processing of the while statement:
First, use the variable cx1 to record the current code segment allocation position as the start of the loop. Then process the condition of the while statement, generating code that leaves its result on top of the data stack. Next, use the variable cx2 to record the current position and generate a conditional jump instruction; its target is not yet known, so 0 is filled in. Analyze the statement or statement block after do by calling the statement procedure recursively, generating the corresponding code. Finally, generate an unconditional jump instruction JMP to the position recorded in cx1, and change the target of the conditional jump at cx2 to the current code segment allocation position.
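The roles of cx1 and cx2 can be illustrated with the sketch below. It assumes the conditional jump is JPC, as in the if statement; the function names and the callables standing in for the condition and statement procedures are hypothetical.

```python
def gen(code, op, level, addr):
    code.append([op, level, addr])


def compile_while(code, emit_condition, emit_body):
    cx1 = len(code)               # loop start: the condition is re-evaluated here
    emit_condition(code)
    cx2 = len(code)               # position of the conditional jump
    gen(code, "JPC", 0, 0)        # exit target unknown, so 0 for now
    emit_body(code)               # statement after "do"
    gen(code, "JMP", 0, cx1)      # back to the condition
    code[cx2][2] = len(code)      # patch: leave the loop when the condition fails
```

Two addresses are needed because the loop has two jumps: the backward JMP, whose target is already known when it is generated, and the forward JPC, whose target only becomes known once the body has been compiled.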
Processing of expressions, terms, and factors:
According to the PL/0 syntax, an expression starts with an optional plus or minus sign and consists of one or more terms joined by plus or minus signs. A term consists of one or more factors joined by multiplication or division signs. A factor is an identifier, a number, or a subexpression enclosed in parentheses. A procedure is written for each of these structures, and the procedures call one another recursively to process an expression. Separating terms from factors gives multiplication and division precedence over addition and subtraction. Throughout these calls the fsys parameter is passed along, so that when an error occurs the offending symbols can be skipped and analysis can continue.
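How the three mutually recursive procedures give multiplication and division their precedence can be seen in the sketch below. It is not the original code: tokens are a plain Python list, variables are resolved through a name-to-offset dict with the level ignored, errors raise SyntaxError instead of using fsys-based recovery, and the LIT and LOD instructions and the OPR numbers 1 to 5 (negate, add, subtract, multiply, divide) are assumed.

```python
class ExprCompiler:
    def __init__(self, tokens, addresses):
        self.tokens = list(tokens) + ["."]   # "." acts as an end marker
        self.i = 0
        self.addresses = addresses           # variable name -> offset
        self.code = []

    @property
    def sym(self):
        return self.tokens[self.i]

    def advance(self):
        self.i += 1

    def expression(self):
        sign = None
        if self.sym in ("+", "-"):           # optional leading sign
            sign = self.sym
            self.advance()
        self.term()
        if sign == "-":
            self.code.append(("OPR", 0, 1))  # negate
        while self.sym in ("+", "-"):
            op = self.sym
            self.advance()
            self.term()
            self.code.append(("OPR", 0, 2 if op == "+" else 3))

    def term(self):
        self.factor()
        while self.sym in ("*", "/"):
            op = self.sym
            self.advance()
            self.factor()
            self.code.append(("OPR", 0, 4 if op == "*" else 5))

    def factor(self):
        tok = self.sym
        if isinstance(tok, int):             # number
            self.code.append(("LIT", 0, tok))
            self.advance()
        elif tok == "(":                     # parenthesized subexpression
            self.advance()
            self.expression()
            if self.sym != ")":
                raise SyntaxError("missing right parenthesis")
            self.advance()
        elif tok in self.addresses:          # identifier
            self.code.append(("LOD", 0, self.addresses[tok]))
            self.advance()
        else:
            raise SyntaxError(f"unexpected symbol {tok!r}")


compiler = ExprCompiler(["a", "+", 2, "*", "(", "b", "-", 1, ")"], {"a": 3, "b": 4})
compiler.expression()
for instruction in compiler.code:
    print(*instruction)
```

Because expression only sees whole terms, the multiply instruction for 2 * (b - 1) is emitted before the add instruction that combines the result with a.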
Processing of logical expressions:
First, check whether the condition is a unary logical expression, namely the parity test odd. If so, call the expression procedure to analyze the expression and then generate the instruction that tests for oddness. If not, it must be a binary logical expression: call the expression procedure to analyze the left and right operands in turn, so that their values occupy the top two stack locations, and then generate the logical comparison instruction corresponding to the operator and place it in the code segment.
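The two cases can be sketched as follows. The tuple representation of a condition, the callables standing in for the expression procedure, and the OPR numbers (6 for odd, 8 to 13 for the comparisons) are assumptions made for this illustration; the source does not state the numbering.

```python
# Assumed operation numbers for the comparison operators.
REL_OPR = {"=": 8, "#": 9, "<": 10, ">=": 11, ">": 12, "<=": 13}


def compile_condition(code, cond):
    """cond is ("odd", emit_expr) or (emit_left, relop, emit_right)."""
    if cond[0] == "odd":                       # unary case: parity test
        _, emit_expr = cond
        emit_expr(code)
        code.append(("OPR", 0, 6))
    else:                                      # binary case: comparison
        emit_left, relop, emit_right = cond
        if relop not in REL_OPR:
            raise SyntaxError(f"relational operator expected, got {relop!r}")
        emit_left(code)                        # left value pushed first
        emit_right(code)                       # right value ends up on top
        code.append(("OPR", 0, REL_OPR[relop]))
```

In both cases the generated code leaves a single truth value on top of the stack, which is what the conditional jump of an if or while statement then consumes.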
Symbol validity test and error recovery procedure:
This procedure takes three parameters: s1 and s2 are symbol sets, and n is an error code. It tests whether the current symbol (the value of sym) is in s1. If not, it reports error n by calling the error reporting procedure, then discards the current symbol and keeps reading symbols through the lexical analyzer until one appears that is in s1 or s2.
The procedure is used flexibly in practice, mainly in two ways:
On entry to a syntax unit, call this procedure to check whether the current symbol belongs to the start symbol set of that unit. If not, skip all symbols until one in the start symbol set or the follow symbol set appears.
At the end of a syntax unit, call this procedure to check whether the current symbol belongs to the follow symbol set passed in by the caller of that unit. If not, skip all symbols until one in the follow symbol set or the start symbol set appears.
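The skipping behavior can be sketched as follows. The Parser class, the symbol names, and the error number in the usage lines are hypothetical. One extra assumption is made to keep the sketch safe: once the input is exhausted, getsym keeps returning "period", and "period" is always treated as a stopping symbol so that the loop terminates.

```python
class Parser:
    def __init__(self, symbols):
        self._symbols = iter(symbols)
        self.errors = []
        self.sym = None
        self.getsym()

    def getsym(self):
        # Stand-in for the lexical analyzer; yields "period" at end of input.
        self.sym = next(self._symbols, "period")

    def error(self, n):
        self.errors.append(n)

    def test(self, s1, s2, n):
        """Report error n and skip symbols unless sym is already in s1."""
        if self.sym not in s1:
            self.error(n)
            stop = s1 | s2 | {"period"}    # "period" added for termination
            while self.sym not in stop:
                self.getsym()


FACTOR_START = {"ident", "number", "lparen"}   # start symbols of a factor

parser = Parser(["times", "slash", "ident", "becomes"])
parser.test(FACTOR_START, {"semicolon"}, 24)   # entry check for a factor
print(parser.sym, parser.errors)
```

Here the two stray operators are skipped and analysis resumes at the identifier, with a single error recorded for the whole skipped run.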