The PL/0 lexical analyzer, getsym, is an independent procedure. Its job is to supply words (tokens) to syntax and semantic analysis: it splits the source program, which arrives as a character string, into individual word symbols and passes them on one at a time. For this purpose the PL/0 compiler defines three global variables:
SYM: holds the category of each word, expressed as an internal code;
ID: holds the value of a user-defined identifier, that is, the internal representation of the identifier's character string;
NUM: holds the value of a user-defined number.
There are five kinds of words:
Basic words, also called reserved words, such as begin, end, if, then
Operators, such as + and -
Identifiers: user-defined variable names, constant names, and procedure names
Constants: integers such as 10 and 100
Delimiters
Basic words, operators, and delimiters are intrinsic to the language, whereas identifiers and constants are user-defined words. For an intrinsic word the lexical analyzer only reports the category, which it stores in SYM. For a user-defined word (an identifier or a constant) it reports both the category and the value: the category goes in SYM and the value goes in ID or NUM. All word categories are values of the compiler-defined enumeration type symbol; the names ifsym, thensym, ident, and number used below are elements of symbol.
The lexical analyzer getsym performs the following tasks:
(1) Filter out blanks. In lexical analysis a blank is a necessary separator, but it is of no use to syntax analysis, so it must be filtered out.
(2) Recognize reserved words. The main program defines a one-dimensional array whose elements are character strings, called the reserved word table. Every string of letters and digits that begins with a letter is looked up in this table. If it is found, it is a reserved word and the corresponding category is placed in SYM; if it is not found, it is treated as a user-defined identifier.
(3) Recognize identifiers. For a user-defined identifier, ident is placed in SYM and the identifier itself is placed in ID.
(4) Convert numbers. When a digit string is scanned, the decimal number in string form is converted to its binary representation in the machine; the category number is then placed in SYM and the value is stored in NUM.
(5) Assemble compound words. For an operator made up of two characters, the category is placed in SYM once the operator has been recognized.
(6) Echo the source program: each character is output as it is read in.
Because a word usually consists of one or more characters, a procedure getch is defined inside getsym; it is called whenever lexical analysis needs to fetch a character.
Variables used by getch:
CH: holds the character most recently read; its initial value is a blank;
Line: a one-dimensional array of characters with bounds 1:80, used as a buffer for reading in one line of characters.
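The six tasks above can be illustrated with a minimal scanner sketch in Python. It is not the compiler's own getsym: the function name tokenize, the contents of the three lookup tables, and the case-insensitive treatment of reserved words are assumptions made for the example, and the source echo of task (6) is left out. Each yielded pair plays the role of SYM together with ID or NUM.

```python
# Minimal sketch of a getsym-style scanner. The category names follow the
# text (ident, number, ifsym, thensym); the tables themselves are assumed.
RESERVED = {"begin": "beginsym", "end": "endsym",
            "if": "ifsym", "then": "thensym"}
SINGLE = {"+": "plus", "-": "minus", "*": "times", "/": "slash",
          "(": "lparen", ")": "rparen", ",": "comma", ";": "semicolon",
          ".": "period", "=": "eql", "<": "lss", ">": "gtr"}
DOUBLE = {":=": "becomes", "<=": "leq", ">=": "geq"}


def tokenize(source):
    """Yield (sym, value) pairs; value is the identifier, the number, or None."""
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():                      # (1) filter out blanks
            i += 1
        elif ch.isalpha():                    # (2)/(3) reserved word or identifier
            j = i
            while j < n and source[j].isalnum():
                j += 1
            word = source[i:j]
            sym = RESERVED.get(word.lower())  # assumed: case-insensitive lookup
            yield (sym, None) if sym else ("ident", word)
            i = j
        elif ch.isdigit():                    # (4) digit string -> integer value
            num = 0
            while i < n and source[i].isdigit():
                num = 10 * num + int(source[i])
                i += 1
            yield ("number", num)
        elif source[i:i + 2] in DOUBLE:       # (5) two-character operators
            yield (DOUBLE[source[i:i + 2]], None)
            i += 2
        elif ch in SINGLE:
            yield (SINGLE[ch], None)
            i += 1
        else:
            raise ValueError(f"illegal character {ch!r} at position {i}")
```

For example, `list(tokenize("x := 10"))` gives an ident pair carrying "x", a becomes pair, and a number pair carrying 10.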
Syntax and semantic analysis in the PL/0 compiler
The task of syntax analysis is to decide whether the sequence of word symbols conforms to the given grammar rules. The grammar rules of PL/0 were given earlier; this section follows the grammar as described by the syntax diagrams and gives an intuitive account of the parsing process.
The PL/0 compiler parses with the top-down recursive subroutine (recursive descent) method. Roughly speaking, each nonterminal syntax unit gets its own processing procedure (subroutine). Parsing starts when the first word is read and the syntax diagram for "program" is entered, and then follows the arrows of the diagram it is currently in. When the next item in the diagram is a nonterminal, the parser enters the diagram of that syntax unit. When the item is a terminal, the parser checks whether the word just read matches the terminal in the diagram; if it does, the corresponding semantic routine (the translation routine) is executed and the next word is read so that analysis can continue. At a branch point, the current word is compared in turn with the terminals on each branch; if none of them matches, the parser either enters the next nonterminal syntax unit or reports an error.
If the word sequence of a PL/0 program can be matched step by step in this way all the way to the program terminator ".", the input program is syntactically correct. While a correct program is being parsed, the corresponding semantic translation is carried out and target code is generated.
The parsing process described above is quite intuitive. In practice, building a parser with the recursive subroutine method places certain requirements and restrictions on the grammar; these are discussed in detail in Chapter 5.
In addition, the syntax diagrams of PL/0 show clearly that the procedures for the various nonterminal syntax units call one another. The graph of these calls is also called the PL/0 syntax dependency graph; in it, an arrow pointing to a program unit indicates that a call relationship exists. It is easy to see from the graph that these subroutines are called, directly or indirectly, during parsing.
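The idea of one procedure per nonterminal can be sketched in Python for the expression part of the language. This is an illustrative recognizer only, with no semantic actions. The grammar used (expression, term, factor with the usual operators), the class name Parser, and the helper accepts are assumptions for the example; tokens are plain strings and "." serves as the terminator.

```python
class Parser:
    """Recursive-descent recognizer: one method per nonterminal."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["."]   # "." marks the end of input
        self.pos = 0

    @property
    def sym(self):
        return self.tokens[self.pos]         # the current word symbol

    def advance(self):
        self.pos += 1

    def expect(self, terminal):
        # Terminal in the diagram: the current word must match it.
        if self.sym != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.sym!r}")
        self.advance()

    def expression(self):
        # expression = ["+" | "-"] term { ("+" | "-") term }
        if self.sym in ("+", "-"):
            self.advance()
        self.term()
        while self.sym in ("+", "-"):
            self.advance()
            self.term()

    def term(self):
        # term = factor { ("*" | "/") factor }
        self.factor()
        while self.sym in ("*", "/"):
            self.advance()
            self.factor()

    def factor(self):
        # factor = ident | number | "(" expression ")"
        if self.sym in ("ident", "number"):
            self.advance()
        elif self.sym == "(":
            self.advance()
            self.expression()                # indirect recursion
            self.expect(")")
        else:
            raise SyntaxError(f"a factor cannot start with {self.sym!r}")


def accepts(tokens):
    """Return True if the token list is a complete, well-formed expression."""
    parser = Parser(tokens)
    try:
        parser.expression()
        parser.expect(".")
        return True
    except SyntaxError:
        return False
```

The calls expression, term, factor, and back to expression form a small cycle of the kind a call graph between nonterminal procedures would show.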
Apart from the main program, syntax and semantic analysis is carried out by the procedure block. How block handles the declaration part and the procedure body is described below:
(1) Handling the declaration part
Because PL/0 allows procedure call statements and allows procedure definitions to be nested, each procedure has a declaration part (its header) that defines the constants, variables, and procedure identifiers local to that procedure, also called its local quantities. The local quantities defined by a procedure can be referenced only by the procedure itself and by the procedures defined inside it. For procedures declared side by side at the same level, a procedure declared later can call one declared earlier, but not the other way round.
The task of declaration processing is, for each procedure (including the main program, which can be regarded as the outermost procedure), to build name-table entries for the declared objects, recording each identifier's attributes, its level, and its allocated relative position. The main program is level 0, procedures defined in the main program are level 1, and the level increases with nesting depth; PL/0 allows at most 3 levels of nesting. Different kinds of identifier need different information, and the entries are made by calling the procedure enter.
The name table is a one-dimensional array named table, and tx is the index (pointer) into it. Each element of table is a record. The field lev gives the level, and dx gives the relative storage location allocated to a local variable at this level; dx is incremented by 1 after each variable is declared.
For example, a fragment of the declaration part of a PL/0 procedure is:
CONST a=35, b=49;
VAR c, d, e;
PROCEDURE P;
VAR g;
After the constant, variable, and procedure declarations have been analyzed, the corresponding entries are recorded in table. The adr field of a procedure name is backpatched with the entry address of the procedure body when the target code for that body is generated. If the level of procedure P is lev, then the level of the variable g defined in P is lev+1. The size field records the data space the procedure needs. How to ensure, while building and searching the table, that the local quantities of a procedure cannot be referenced from its enclosing levels is left for the reader to work out.
The name-table index tx and the level lev are value parameters of block; when the main program calls block, both actual parameters are 0. The relative starting position of each procedure's variables is the initial value dx:=3 set inside block.
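A small Python sketch can make the table entries for the fragment above concrete. The field names (level, adr, size) and the procedures enter and position follow the text; the record layout, the Python signatures, and the backwards search order in position are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    kind: str       # "constant", "variable" or "procedure"
    val: int = 0    # constants: the value
    level: int = 0  # variables and procedures: declaration level
    adr: int = 0    # variables: offset dx; procedures: entry address, filled in later
    size: int = 0   # procedures: data space needed


table = []          # the name table; tx corresponds to len(table) - 1


def enter(name, kind, lev, dx, val=0):
    """Record one declared object and return the (possibly advanced) dx."""
    if kind == "constant":
        table.append(Entry(name, kind, val=val))
    elif kind == "variable":
        table.append(Entry(name, kind, level=lev, adr=dx))
        dx += 1                       # dx advances by 1 per declared variable
    else:                             # procedure: adr and size are filled in later
        table.append(Entry(name, kind, level=lev))
    return dx


def position(name):
    """Return the index of name in table, or -1 if it is not declared.

    Assumed search order: newest entry first, so an inner declaration
    hides an outer one with the same name.
    """
    for i in range(len(table) - 1, -1, -1):
        if table[i].name == name:
            return i
    return -1


# The declaration fragment from the text, entered at level 0 with dx = 3.
lev, dx = 0, 3
dx = enter("a", "constant", lev, dx, val=35)
dx = enter("b", "constant", lev, dx, val=49)
for var_name in ("c", "d", "e"):
    dx = enter(var_name, "variable", lev, dx)
dx = enter("P", "procedure", lev, dx)
inner_dx = enter("g", "variable", lev + 1, 3)   # inside P: level lev+1, dx restarts at 3
```

After this runs, c, d, and e have adr 3, 4, and 5 at level 0, and g has adr 3 at level 1.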
(2) Analysis of the procedure body
The procedure body consists of statements. Once the declarations have been processed, the statements of the body are handled: each statement is parsed according to its syntax, and when the syntax is correct, target code implementing the statement is generated. When a reference to an identifier is encountered, the function position is called to search the name table for a valid declaration. If there is one, the information in the table is used to generate code; if the identifier is undeclared, the error handler is called.
Target code structure and code generation in the PL/0 compiler
The target code generated by the PL/0 compiler is the assembly language of a hypothetical stack computer; it can be called P-code-like instruction code. It does not depend on any real machine, and its instruction set is fairly simple. The instruction format is:
F L A
F is the function code. L is the level difference, that is, the difference between the level of the block that references a variable or procedure and the level of the block that declares it. The meaning of A depends on the instruction. Each instruction is explained below.
There are 8 target instructions:
LIT: loads a constant onto the top of the run-time stack; the a field is the constant value.
LOD: loads a variable onto the top of the stack; the a field is the variable's offset from the base address, in the run-time stack, of the procedure that declares it (where its storage was allocated), and the l field is the difference between the referencing level and the declaring level.
STO: stores the top of the stack into a variable's cell; the a and l fields have the same meaning as for LOD.
CAL: calls a procedure; the a field is the entry address of the called procedure's target code, and the l field is the level difference.
INT: allocates a data area in the run-time stack for the called procedure (or the main program); the a field is the number of cells to allocate.
JMP: unconditional jump; the a field is the target address.
JPC: conditional jump; if the Boolean value on top of the stack is not true, jump to the address in the a field, otherwise continue in sequence.
OPR: arithmetic and relational operations; the l field is 0, and the a field selects the operation, as follows:
a=0: return to the call point at the end of a procedure call and free the called procedure's data space on the run-time stack;
a=1: negate the value on top of the stack; the result stays on top of the stack;
a=2~5: apply an arithmetic operation to the second-from-top and top stack values; the result is stored in the second-from-top cell, which becomes the new top;
a=6: test the parity of the top-of-stack value (odd is true, even is false); the result stays on top of the stack;
a=8~13: apply a relational operation to the second-from-top and top stack values; the result is stored in the second-from-top cell;
a=14: output the top-of-stack value to the screen;
a=15: output a line break to the screen;
a=16: read one input value from the command line onto the top of the stack;
Any other value of a is an illegal instruction.
The compiler generates target code while it parses procedure bodies; the declaration part normally generates no target code. As each statement in a body is parsed, the code generation procedure is called to produce target code equivalent in function to the PL/0 statement, until compilation ends normally.
Code generation in PL/0 is done by the procedure gen. It takes three parameters: the function code of the target instruction, the level difference, and the displacement (whose meaning depends on the instruction). The generated code is stored in order in the array code, a one-dimensional array whose elements are records, each holding one target instruction. cx is the index into code; it starts at 0 and increases sequentially. In the resulting code, the target code of inner procedures comes first and the target code of the main program comes last.
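As a rough Python sketch, gen can be modeled as appending one record to code and returning the index it was stored at. The example then fills in a forward jump address once the target is known. The Python signature, the returned index, and the sample instruction sequence are assumptions for illustration, not the compiler's own code.

```python
code = []   # the code array; cx corresponds to len(code)


def gen(f, l, a):
    """Store one instruction (function code, level difference, displacement).

    Returns the index the instruction was stored at, i.e. the old cx.
    """
    code.append([f, l, a])
    return len(code) - 1


# Hypothetical sequence for "if <condition> then x := 1", where x sits at
# offset 3 of the current level and the condition value is loaded directly.
gen("LOD", 0, 3)              # push the condition value
cx1 = gen("JPC", 0, 0)        # jump target not known yet
gen("LIT", 0, 1)
gen("STO", 0, 3)
code[cx1][2] = len(code)      # fill in the address just past the statement
```

The same fill-in-later step is what allows an address such as a procedure's entry point to be recorded only after the code it refers to has been generated.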
Syntax error handling in the PL/0 compiler
A program is rarely right the first time; it usually contains errors of various kinds, generally syntax errors, semantic errors, and run-time errors. Errors have many causes, which makes error handling difficult. Take syntax errors: a compiler that meets an error during parsing should not simply stop. Ideally it would report the exact position and nature of the error and correct it as far as possible so that compilation can continue. That is hard to do for every error, mainly because the compiler cannot fully determine the programmer's intent. For example, when the parentheses in an expression are unbalanced, the compiler cannot tell where the missing one belongs. Sometimes a wrong correction even causes later, correct parts of the program to be reported as errors. So a compiler can only take measures to detect as many errors in the source program as possible, so that they can be fixed.
The PL/0 compiler handles syntax errors in two ways:
(1) For errors that are easy to correct, such as a missing comma or semicolon, it reports the position of the error and corrects it by supplying the missing comma or semicolon;
(2) For errors where the compiler cannot easily decide on a correction, the error is confined to a local syntax unit so that it does not bring down the compilation of the whole program. This means skipping some input word symbols until one is read from which the compiler can resume normal parsing. To this end, a procedure test is called when parsing enters or leaves the handler for a syntax unit; its job is to check whether the current word belongs to the start-symbol set or the follow-symbol set of that syntax unit.
(Table: start-symbol and follow-symbol sets of each nonterminal syntax unit of PL/0)
The procedure test has three parameters, with the following meanings:
S1: the set that the current word symbol should belong to when parsing enters or leaves a syntax unit; it may be the start-symbol set of a syntax unit or a follow-symbol set;
S2: a supplementary set of word symbols from which parsing can recover and continue after an error. When a parsing error occurs, that is, when the current word symbol is not in S1, the compiler skips the following input word symbols until the current word symbol is in S1 or S2, so that compilation can continue;
n: an integer, the error message number.
To make the meaning of S1 and S2 clearer, consider the parser for a factor. At the entry of procedure factor, test is called with S1 set to the start-symbol set of a factor (facbegsys in the program text). S2 is the argument fsys passed in by the caller of each procedure. When the compiler first calls block, fsys is the union of [.], the declaration start symbols, and the statement start symbols. As parsing calls go deeper, fsys grows: ";" and endsym are added when statement is called, "+" and "-" are added when a term is parsed inside expression, and "*" and "/" are added when a factor is parsed inside a term. So on entry to the factor parser, even if the current symbol is not a factor start symbol, the parser reports the error and then only needs to skip symbols until the current word symbol is in fsys or in the factor start-symbol set, at which point parsing can continue. test is called again at the exit of factor, this time with S1 and S2 swapped: the symbols in fsys are the ones allowed on a normal exit, and the factor start symbols are the supplementary symbols from which normal parsing can resume. The PL/0 compiler thus has some flexibility in how it calls test.
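The skipping behavior of test can be sketched in Python as follows. The function is named check_and_skip here to mark it as a stand-in for the compiler's test; the token representation (a list of category names plus an index), the contents of the example sets, and the error number 24 are hypothetical.

```python
def check_and_skip(tokens, pos, s1, s2, n, errors):
    """Sketch of test(S1, S2, n).

    If the current symbol is not in s1, record error number n and skip
    symbols until one in s1 or s2 is found. Returns the new position.
    """
    if pos < len(tokens) and tokens[pos] not in s1:
        errors.append(n)
        stop = s1 | s2
        while pos < len(tokens) and tokens[pos] not in stop:
            pos += 1
    return pos


# Hypothetical sets for the entry of the factor parser.
facbegsys = {"ident", "number", "lparen"}
fsys = {"times", "slash", "plus", "minus", "semicolon", "endsym", "period"}

errors = []
tokens = ["becomes", "comma", "ident", "plus", "number"]

# "becomes" cannot start a factor: report the error, then skip to "ident".
pos = check_and_skip(tokens, 0, facbegsys, fsys, 24, errors)
```

At the exit of factor the same function would be called with the two sets swapped, so that a symbol outside fsys is reported and skipped.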
For semantic errors, such as an identifier that is referenced without being declared, or one whose use is inconsistent with the attributes it was declared with, the compiler only reports the error message and position, and compilation can continue. Run-time errors, such as overflow and out-of-bounds access, can only be detected at run time; because of the functional limits of the PL/0 compiler, it cannot indicate the corresponding position in the source program for errors that occur at run time.
The error messages can also be listed in a table.
Storage allocation during interpretation of the PL/0 target code
When the source program has been parsed and no errors were found, the compiler calls the interpreter, which interprets the target code stored in code, starting at code[0]. Once compilation is finished, the name table that recorded the identifiers of the source program is no longer needed. So storage only has to hold the read-only target program, in the array code, and the run-time data area S. S is a one-dimensional integer array defined by the interpreter. Because the PL/0 target program is written for a hypothetical stack computer, its storage follows the last-in-first-out rule: data space is allocated for each procedure (including the main program) when it is called, and is freed when the procedure ends.