Top-down syntax analysis and implementation of arithmetic expressions (I)

Source: Internet
Author: User

Anyone who has studied compiler construction knows how top-down syntax analysis of a sentence works. This article draws on Academician Chen Huowang's "Compilation Principles of Advanced Programming Languages". It describes, from the point of view of compiler theory, how to implement a syntax analysis program: by working through a typical example, the arithmetic expression, you can see how to construct a practical parser. It also offers programmers a solution to a practical problem.

This article covers the following topics:
1. Productions of arithmetic expressions;
2. Top-down syntax analysis algorithms and the construction of production functions;
3. Improving the production functions;
4. Error handling in syntax analysis;
5. Implementation of a top-down syntax analysis program.

1. Productions of arithmetic expressions

Here I want to handle five kinds of arithmetic operations: addition, subtraction, multiplication, division, and parentheses. A simple grammar G1 for such expressions has the following productions:
G1:
E -> E + E | E - E | E * E | E / E | (E) | I
G1 does not express operator precedence (parentheses bind more tightly than multiplication and division, which in turn bind more tightly than addition and subtraction), and it is ambiguous: for example, I + I * I has two different leftmost derivations. To make precedence explicit, G1 can be rewritten as follows:
The modified grammar G2:
E -> T + E | T - E | T
T -> F * T | F / T | F
F -> (E) | I
Every arithmetic expression built from addition, subtraction, multiplication, division, and parentheses can be derived from the productions of this grammar, with the precedence reflected in the structure of the derivation. For example, the expression I - I * (I + I), where I is a number or variable identifier, has the following leftmost derivation, which starts from the start symbol E:

E => T - E => F - E => I - E => I - T => I - F * T => I - I * T => I - I * F => I - I * (E) => I - I * (T + E) => I - I * (F + E) => I - I * (I + E) => I - I * (I + T) => I - I * (I + F) => I - I * (I + I)

In this article, we build the syntax analysis program from the productions of grammar G2.

2. Top-down syntax analysis algorithms and the construction of production functions

The derivation of a sentence from the start symbol E down to terminals can be drawn as a syntax tree: the root node (the start symbol) is at the top and the leaf nodes (terminals) are at the bottom. Top-down syntax analysis traverses this tree from top to bottom: each path starts at the root, passes through intermediate nodes (nonterminals other than the start symbol), and ends at a leaf. If we turn each production into a function, the recursive calls among these functions, together with their returns, traverse the whole syntax tree. Grammar G2 has three productions, one for each nonterminal, so we need three functions:
void E_AddSub();   // production function for the nonterminal E
void T_MulDiv();   // production function for the nonterminal T
void F_Number();   // production function for the nonterminal F

Top-down syntax analysis works by scanning an input stream. We need an input character buffer that holds the arithmetic expression string, a character indicator that marks the character currently being analyzed, and an error-handling module. The algorithms below use three global members, ch, advance, and error, with the following meanings:

ch          the character at the current position of the indicator
advance()   moves the indicator to the next character in the input buffer
error()     the error-handling function

From these we can construct a top-down syntax analysis algorithm. First consider the production E -> T + E | T - E | T, which can be split into three productions:
E -> T + E
E -> T - E
E -> T
First, write the analysis function for E -> T + E:

// Listing 1: analysis function for the production E -> T + E
void E_AddSub()
{
    T_MulDiv();          // call T's function to analyze T
    if (ch == '+')       // the current character is '+'
    {
        advance();       // move to the next character
        E_AddSub();      // call E's function to analyze E
    }
    else                 // not '+'
        error();         // handle the error
}

From the function above, you can probably work out the top-down analysis function for E -> T - E: just change '+' to '-' in the condition if (ch == '+'). The function for E -> T is even simpler:

// Listing 2: analysis function for the production E -> T
void E_AddSub()
{
    T_MulDiv();          // call T's function to analyze T
}

As you can see, we can write an analysis function for each production and let the functions call one another to traverse the syntax tree and derive the sentence. Since the three productions E -> T + E, E -> T - E and E -> T combine into E -> T + E | T - E | T, their analysis functions can be combined into one as well. Because of the alternative E -> T, E's function only needs to call T's function; if the next character is then neither '+' nor '-', no error handling is required. The alternatives T + E and T - E are selected with the branch condition if (ch == '+' || ch == '-'). The combined function for E is:

// Listing 3: analysis function for the production E -> T + E | T - E | T
void E_AddSub()
{
    T_MulDiv();                  // call T's function to analyze T
    if (ch == '+' || ch == '-')  // the current character is '+' or '-':
                                 // '+' derives with E -> T + E,
                                 // '-' derives with E -> T - E
    {
        advance();               // move to the next character
        E_AddSub();              // call E's function to analyze E
    }                            // derivation by E -> T + E | T - E is complete
    // If the next character is neither '+' nor '-', the function has
    // derived with E -> T, and no error handling is needed.
}

Similarly, you can easily write the top-down analysis functions for the productions T -> F * T | F / T | F and F -> (E) | I:

// Listing 4: analysis function for the production T -> F * T | F / T | F
void T_MulDiv()
{
    F_Number();                  // call F's function to analyze F
    if (ch == '*' || ch == '/')  // the current character is '*' or '/':
                                 // '*' derives with T -> F * T,
                                 // '/' derives with T -> F / T
    {
        advance();               // move to the next character
        T_MulDiv();              // call T's function to analyze T
    }                            // derivation by T -> F * T | F / T is complete
    // If the next character is neither '*' nor '/', the function has
    // derived with T -> F, and no error handling is needed.
}

// Listing 5: analysis function for the production F -> (E) | I
void F_Number()
{
    if (ch == '(')                   // the current character is '('
    {                                // derive with F -> (E)
        advance();                   // skip '('
        E_AddSub();                  // call E's function to analyze E
        if (ch != ')')               // the next character must be ')'
                                     // so that the parentheses match
            error();                 // otherwise handle the error
        advance();                   // ')' found, the syntax is correct: skip it
        return;
    }
    if (isdigit((unsigned char)ch))  // the current character is a number
    {                                // (isdigit() is declared in <ctype.h>)
        advance();                   // derive with F -> I: skip the number
    }
    else                             // neither a number nor '('
        error();                     // handle the error
    return;
}

Because the derivation of every syntactically valid sentence starts from the start symbol E, the main program starts the analysis by calling E's function:

// Listing 6: main program
int main()
{
    ...
    // initialize the input character buffer and the character indicator
    // call the analysis function of the start symbol E to begin
    // top-down syntax analysis:
    E_AddSub();
    // analysis ends
    ...
    return 0;
}

Functions written this way traverse the syntax tree from top to bottom and thus reproduce the top-down syntax analysis process. However, they do not yet do any useful work, such as executing an arithmetic expression or computing its value; I will address these issues one by one in the following sections.

(To be continued)
