Lex and YACC application methods (8): using a stack to compile syntax
Papaya 20070604
I. Preface
The previous articles in this series focused on applying recursive syntax trees in compiler construction.
This article introduces another implementation method: the stack.
The stack is widely used in low-level systems and is also well suited to processing syntax structures.
This article describes how to construct a stack to carry out syntax analysis.
Important: all sample code in this series was debugged and tested in the single environment listed below.
Lex and YACC source files must be saved in UNIX format. This is similar to shell scripts on Linux and UNIX:
a script saved in DOS format cannot be executed, and bison and lex likewise report errors when they
compile a DOS-format file. The test environment:
Red Hat Linux release 9 (Shrike)
Linux 2.4.20-8
GCC version 3.2.2 20030222
Bison (GNU bison) 1.35
Lex version 2.5.4
Flex version 2.5.4
Note: these articles will inevitably contain errors and omissions. Lex and YACC series: http://blog.csdn.net/liwei_cmg/category/207528.aspx
II. Example
This example provides the following functions:
1. Integer, floating-point, and string types are supported.
2. Variable storage is supported. Variable names can contain multiple characters.
3. The + - * / ( ) = operators are supported for integers and floating-point numbers.
4. String assignment is supported.
5. Printing of integer, floating-point, and string values is supported.
6. Variable values can be printed.
7. The while, if/else, and switch control structures are supported, including nesting.
8. Six comparison operators (> >= < <= != ==) are supported, as is string comparison.
9. The & and | compound comparison operators are supported.
10. Spaces and tabs are ignored.
11. Single-line comments starting with # are supported.
12. Multiple statements can be grouped and nested with {}.
13. Compilation errors are reported with specific details.
14. External variable values can be passed in (integer, floating point, and string).
15. External variables can be read back (integer, floating point, and string).
16. A complete enterprise application model.
III. Full sample code (omitted)
A. stack.l
----------------------------------------------
B. stack.y
----------------------------------------------
C. stack.h
----------------------------------------------
D. stackparser.c
----------------------------------------------
E. public.h
----------------------------------------------
F. main.c
----------------------------------------------
G. mk (build shell script)
----------------------------------------------
bison -d stack.y
lex stack.l
gcc -g -c lex.yy.c stack.tab.c stackparser.c
ar -rl stack.a *.o
gcc -g -o lw main.c stack.a
H. mkclean (cleanup shell script)
----------------------------------------------
rm stack.tab.c
rm stack.tab.h
rm lex.yy.c
rm *.o
rm *.a
rm lw
IV. Ideas
The code listed above is the longest in this series so far.
Clearly, writing a stack-based compiler cannot be done in a single step. Even the current example surely
leaves a lot of room for improvement. When designing a stack-based compiler, it is usually best to start
from the simplest and easiest statements.
A. Simple stack analysis ideas
Let's take a simple example, a = 1 + 2. To evaluate this expression, we first push 1 and 2 onto
the stack and then process the + operation. At that point we pop 1 and 2 off the stack, execute the add,
and push the result 3 back onto the stack. Continuing the analysis, we push a. At the final = operation,
3 and a are popped, the value is assigned, and a is pushed back onto the stack. The operation sequence is as follows:
ID: 0 Act: pushvalue
ID: 1 Act: pushvalue
ID: 2 Act: add
ID: 3 Act: pushvar
ID: 4 Act: assign
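The article's own code is omitted, so the following is only a minimal sketch of how the five-command
sequence above could be executed on a value stack. The names Slot, Cmd, run, and the ACT_ codes are
assumptions made for this illustration, and there is no overflow checking.

```c
/* Hypothetical action codes; the names mirror the listing above. */
enum { ACT_PUSHVALUE, ACT_PUSHVAR, ACT_ADD, ACT_ASSIGN };

/* A stack slot holds either a plain value or a reference to a variable. */
typedef struct {
    int   is_var;   /* 1 if this slot refers to a variable */
    int   var;      /* variable index, valid when is_var is 1 */
    float val;      /* plain value, valid when is_var is 0 */
} Slot;

typedef struct {
    int   act;      /* one of the ACT_ codes */
    float val;      /* operand of ACT_PUSHVALUE */
    int   var;      /* operand of ACT_PUSHVAR */
} Cmd;

static float vars[8];    /* variable storage (illustrative size) */
static Slot  stack[16];  /* value stack (no overflow checks in this sketch) */
static int   top = 0;    /* number of slots currently on the stack */

/* Resolve a slot to a number, reading the variable if needed. */
static float slot_value(Slot s)
{
    return s.is_var ? vars[s.var] : s.val;
}

/* Execute n commands strictly in order. */
static void run(const Cmd *cmd, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        Slot a, b, r = {0, 0, 0.0f};
        switch (cmd[i].act) {
        case ACT_PUSHVALUE:
            r.val = cmd[i].val;
            stack[top++] = r;
            break;
        case ACT_PUSHVAR:
            r.is_var = 1;
            r.var = cmd[i].var;
            stack[top++] = r;
            break;
        case ACT_ADD:
            b = stack[--top];
            a = stack[--top];
            r.val = slot_value(a) + slot_value(b);
            stack[top++] = r;
            break;
        case ACT_ASSIGN:
            a = stack[--top];              /* the variable, pushed last */
            b = stack[--top];              /* the value to store */
            vars[a.var] = slot_value(b);
            stack[top++] = a;              /* leave the variable on the stack */
            break;
        }
    }
}
```

Running the commands pushvalue 1, pushvalue 2, add, pushvar a, assign through run leaves 3 in the
variable and a single slot, the variable itself, on the stack.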
The difficulty in using a stack for compilation lies in abstracting extremely complex syntax structures into
simple push and pop operations. Working this out by hand, backwards from the syntax, is very difficult.
In general, we need to compile the source statements, according to the defined grammar reduction rules,
into an ordered sequence of commands. We then write stack action functions for that instruction sequence,
execute the instructions in order, and call the corresponding action function for each one.
Fortunately, a very powerful body of compiler theory and tooling (lex, YACC) was developed many years ago
to take care of the grammar reduction. We only need to implement the actions attached to the rules.
B. Designing the reduction grammar with lex and YACC
As in the previous examples, g_var stores the variable information gathered during compilation, and g_sbuff
stores the statement being compiled. g_string is added to hold, in one place, all strings that appear in the
compiled statements. The lex and YACC design is similar to the earlier ones, except for some extra statement
markers such as ifx, elsex, and switchx. These markers are used to generate the commands in the correct order.
Lex and YACC store all the resulting commands in g_command. See addcommand.
/* In-memory instruction structure */
typedef struct {
    int itypeaction;   /* type of this instruction */
    int itypeval;      /* type of the instruction's value */
    float fval;        /* integer or floating-point value */
    int Ivar;          /* variable index */
    int istring;       /* string index, or -1 if there is no string */
    int iControl;      /* control information returned by the command */
} Tcommand;
The design of this instruction structure is critical. itypeaction describes the type of the instruction, and
itypeval gives the type of the instruction's value. fval stores integer and floating-point values, and Ivar
stores a variable index. If istring is not -1, it is the index of a string. iControl holds the control
information returned by the command.
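The article only names addcommand without showing it. As a rough sketch of what appending to g_command
might look like, the block below repeats the structure and adds a fixed-size table. The parameter list,
the MAX_COMMAND capacity, and the g_ncommand counter are assumptions for illustration, not the author's code.

```c
#define MAX_COMMAND 256   /* illustrative capacity, not from the article */

typedef struct {
    int   itypeaction;  /* type of this instruction */
    int   itypeval;     /* type of the instruction's value */
    float fval;         /* integer or floating-point value */
    int   Ivar;         /* variable index */
    int   istring;      /* string index, or -1 if there is no string */
    int   iControl;     /* control information returned by the command */
} Tcommand;

static Tcommand g_command[MAX_COMMAND];
static int g_ncommand = 0;   /* hypothetical count of stored commands */

/* Append one command. Returns its index, or -1 if the table is full. */
static int addcommand(int action, int typeval, float fval, int ivar, int istring)
{
    Tcommand *c;

    if (g_ncommand >= MAX_COMMAND)
        return -1;
    c = &g_command[g_ncommand];
    c->itypeaction = action;
    c->itypeval    = typeval;
    c->fval        = fval;
    c->Ivar        = ivar;
    c->istring     = istring;
    c->iControl    = 0;      /* filled in later, at execution time */
    return g_ncommand++;
}
```

A YACC rule action could call such a function once per reduced construct, so that the order of reductions
becomes the order of commands. Returning the index makes it easy for control structures to remember
where a jump should land.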
C. Stack compilation
Once lex and YACC have compiled the commands into g_command, we traverse g_command and call the
relevant action functions to perform the push and pop operations (see the act functions). What is pushed
and popped here are indexes into g_command. All results are stored in g_command as well, which is the
part that is hardest to understand at first.
The elements of the Tcommand struct are relatively independent. fval and istring are mutually exclusive,
Ivar marks a variable index, and iControl is used only for values on the control stack.
At the outset, we need to analyze the various syntax structures as a whole. For example, if branches and
loop statements need to maintain a control state. Because statements can be nested, this control state must
behave like a stack. Each time a new if statement is pushed, the existing stack must be checked: if the
enclosing if is false, the new if is treated as false even when its own condition is true. An else needs
similar handling. A pop is then performed at endif, marking that the if/else has been fully processed.
switch is slightly more complicated than if: we need to record the original value and compare it with each
case. while requires not only a condition but also a jump, which is an important reason why the act action
functions return a value.
This analysis has to be thought through systematically, which also makes later extensions, such as goto, easier.
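The nesting rule described above can be sketched with a small control stack. This is only an illustration
under assumed names (Ctl, ctl_if, ctl_else, ctl_endif, is_active); the article's act_if and act_else are
not shown in the text and may work differently.

```c
#define MAX_DEPTH 32   /* illustrative nesting limit, no overflow checks */

/* One entry per open if statement. */
typedef struct {
    int parent_active;  /* 1 if the enclosing block is executing */
    int cond;           /* result of this if's own condition */
    int active;         /* 1 if commands in the current branch should run */
} Ctl;

static Ctl ctl[MAX_DEPTH];
static int ctl_top = 0;

/* 1 if commands at the current nesting level should execute. */
static int is_active(void)
{
    return ctl_top == 0 ? 1 : ctl[ctl_top - 1].active;
}

/* Entering an if: active only when the enclosing block is active too. */
static void ctl_if(int cond)
{
    Ctl c;
    c.parent_active = is_active();
    c.cond = (cond != 0);
    c.active = c.parent_active && c.cond;
    ctl[ctl_top++] = c;
}

/* Switching to else: invert the condition, still bounded by the parent. */
static void ctl_else(void)
{
    Ctl *c = &ctl[ctl_top - 1];
    c->active = c->parent_active && !c->cond;
}

/* Leaving the if/else: pop the control entry. */
static void ctl_endif(void)
{
    ctl_top--;
}
```

With this layout an inner if whose condition is true stays inactive inside a false outer if, and the
outer else becomes active again once the inner entry has been popped.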
Two stacks are introduced here, stackvalue and stackcontrol. The value stack is used for ordinary sequential
calculation, and the control stack is used for control structures such as if, else, switch, and while. For
details about the control stack, see act_if, act_else, and the other control action functions.
While the command sequence is being executed, each action function reads the control information to decide
whether the command should actually run. Note that each act action function also returns the index of the
next command, which is mainly used for loops, jumps, and similar cases. The default is sequential execution.
In general, the meaning of each Tcommand element must stay independent, and the validity of the data in
g_command must be strictly guaranteed.
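To illustrate why an action function returning the next command index is enough to express loops, here is
a small self-contained sketch of a while loop that sums a counter down to zero. The command set, the
step_ function names, and the counter and total variables are invented for this example and do not come
from the article's code.

```c
/* Hypothetical action codes for a single while loop. */
enum { ACT_WHILE, ACT_BODY, ACT_ENDWHILE };

typedef struct {
    int act;     /* action code */
    int target;  /* jump target used by ACT_WHILE and ACT_ENDWHILE */
} Cmd;

static int counter;  /* loop variable */
static int total;    /* accumulated sum */

/* Each action returns the index of the next command to execute. */
static int step_while(const Cmd *c, int i)
{
    /* Condition true: fall through to the body. False: jump past the loop. */
    return counter > 0 ? i + 1 : c->target;
}

static int step_body(const Cmd *c, int i)
{
    (void)c;
    total += counter;
    counter--;
    return i + 1;          /* default: sequential execution */
}

static int step_endwhile(const Cmd *c, int i)
{
    (void)i;
    return c->target;      /* jump back to re-test the condition */
}

/* Dispatch loop: the returned index drives control flow. */
static void run(const Cmd *cmd, int n)
{
    int i = 0;
    while (i >= 0 && i < n) {
        switch (cmd[i].act) {
        case ACT_WHILE:    i = step_while(&cmd[i], i);    break;
        case ACT_BODY:     i = step_body(&cmd[i], i);     break;
        case ACT_ENDWHILE: i = step_endwhile(&cmd[i], i); break;
        default:           return;
        }
    }
}
```

For the program { {ACT_WHILE, 3}, {ACT_BODY, 0}, {ACT_ENDWHILE, 0} } the loop exits when step_while
returns 3, an index one past the last command. A goto could be added the same way, as one more action
that returns its target.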
D. Passing in values and getting variables
The callback-function approach is used here. A pointer to a pointer is used because strings cause some minor
trouble, and the returned value type lets the caller decide how to handle the result. C compiled with GCC on
Linux and UNIX has no pass-by-reference, so everything is passed through pointers.
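The following sketch shows one possible shape for such a lookup, with the string returned through a
pointer to a pointer and the type returned as the function result. The Var table, the VAL_ codes, and
get_var are hypothetical names for this illustration. The article's real interface is not shown, and
callback registration is left out here.

```c
#include <string.h>

/* Hypothetical value-type codes returned to the caller. */
enum { VAL_NONE, VAL_INT, VAL_FLOAT, VAL_STRING };

typedef struct {
    const char *name;  /* variable name */
    int         type;  /* one of the VAL_ codes */
    float       num;   /* numeric value for VAL_INT and VAL_FLOAT */
    char       *str;   /* string value for VAL_STRING */
} Var;

static char greeting[] = "hello";

static Var vars[] = {
    { "count", VAL_INT,    3.0f, NULL     },
    { "ratio", VAL_FLOAT,  0.5f, NULL     },
    { "msg",   VAL_STRING, 0.0f, greeting }
};

/*
 * Look up a variable by name. Numbers are written through num. For strings,
 * the caller's char pointer is redirected through str, which is why a
 * pointer to a pointer is needed. Returns the value type, or VAL_NONE.
 */
static int get_var(const char *name, float *num, char **str)
{
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        if (strcmp(vars[i].name, name) == 0) {
            if (vars[i].type == VAL_STRING)
                *str = vars[i].str;
            else
                *num = vars[i].num;
            return vars[i].type;
        }
    }
    return VAL_NONE;
}
```

The caller declares a float and a char pointer, passes the address of each, and then switches on the
returned type to decide which of the two outputs is meaningful.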
V. Notes
A. The stack.l and stack.y files must be in UNIX format. This is similar to shell scripts on Linux and UNIX:
a script in DOS format cannot be executed, and bison and lex likewise report errors when compiling
DOS-format files.
B. Segmentation faults are mostly caused by out-of-bounds memory access (emphasized in earlier articles).
Sometimes code with this kind of bug reports no error at all, yet the values in memory are corrupted.
This is generally caused by improper pointer usage.
C. Eliminate all warnings. For example, removing the forward declaration of a function in stack.y only
produces a warning at compile time, but at run time the function's arguments are passed incorrectly, which
is a fundamental error.
D. Stack overflow errors also occur in compiled C/C++ programs. You can use ulimit to view the limits, but
most of these errors are likewise related to out-of-bounds memory access. When an inexplicable error
appears, the first thing to suspect is a memory problem.
E. In stack.l and stack.y, pay attention to the order in which rules are applied and to the shift/reduce order.
F. Patience matters most. Print as much debugging information as you can. Debugging tools are not very
easy to use on complex syntax structures.
VI. Summary
The lex and YACC applications so far run on Unix/Linux platforms and generate C code. Although C++ code
can also use these interfaces, that does not meet the needs of the Windows platform. Starting with the next
article, we will introduce the use of such tools on Windows, along with C/C++/Java code generation.