If you reprint this article, please cite the source: http://www.cnblogs.com/dzodzo/archive/2009/12/15/1624225.html
Http://www.fsderno.com/pdf/complier1.pdf (PDF)
Introduction
I am currently taking a course on compiler principles. To fight forgetting, I wrote this article to reinforce my own memory, and I hope it helps you too.
In compiler principles, one task is converting a regular expression to a DFA. One step of that conversion is to compute nullable, firstpos, and lastpos for each node of the regular expression's syntax tree, and followpos for each position. If you do not understand the reasoning behind the computation rules, this is a painful task. This article aims to clarify two questions: why these functions must be computed, and how to compute them.
Prerequisites:
Sentence: given a grammar G[Z], if Z =>+ x (Z derives x in one or more steps) and x is in Vt* (x consists only of terminal symbols), then x is a sentence of the grammar G[Z].
Or node: an internal node labeled with the alternation operator |.
Cat node: an internal node representing the concatenation operator.
Star node: an internal node labeled with the star operator *.
Position: each terminal symbol (each non-empty leaf of the syntax tree) corresponds to a position.
Calculate nullable(n)
With the prerequisites out of the way, let's start with nullable(n), which is the easiest to understand. Why do we need to compute nullable(n)? Because firstpos and lastpos of a node n are computed from the values of nullable. At this point someone may complain, "You haven't explained those yet." Please bear with me: the questions are interlocked, so I am simply deferring that answer.
nullable(n) indicates whether the set of sentences derived from the subtree rooted at n contains the empty string. If it does, nullable(n) = true; otherwise, nullable(n) = false. We now analyze the five kinds of node n:
1. When n is a leaf node labeled with the empty string ε, the only sentence derived from n is the empty string. Therefore, nullable(n) = true.
2. When n is a leaf node labeled with a symbol (for example, id), every sentence derived from n contains that symbol and is non-empty. Therefore, nullable(n) = false.
3. When n is an or node, it must be an internal node. Because of the alternation, n can derive the empty string whenever its left subtree (C1) or its right subtree (C2) can. That is, nullable(n) = nullable(C1) or nullable(C2). nullable(n) = true in three cases: only C1 -> ε, only C2 -> ε, or both. (C1 -> ε means that C1 can derive the empty string ε; the same notation is used below.)
4. When n is a cat node, it must be an internal node. Because of the concatenation, n can derive the empty string only when its left subtree (C1) and its right subtree (C2) can both derive the empty string. That is, nullable(n) = nullable(C1) and nullable(C2). nullable(n) = true in exactly one case: C1 -> ε and C2 -> ε.
5. When n is a star node, it must be an internal node. By the definition of the Kleene closure, the set of sentences derived from n always contains the empty string. Therefore, nullable(n) = true.
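The five rules above can be sketched in Python. This is only an illustration under assumptions that are not part of the original article: the syntax tree is encoded as nested tuples, namely ("eps",), ("leaf", i), ("or", C1, C2), ("cat", C1, C2), and ("star", C1), and the sample expressions are chosen here for demonstration.

```python
def nullable(n):
    """Return True if the subtree n can derive the empty string."""
    kind = n[0]
    if kind == "eps":       # rule 1: leaf labeled with the empty string
        return True
    if kind == "leaf":      # rule 2: leaf holding a symbol at some position
        return False
    if kind == "or":        # rule 3: either child may supply the empty string
        return nullable(n[1]) or nullable(n[2])
    if kind == "cat":       # rule 4: both children must supply it
        return nullable(n[1]) and nullable(n[2])
    if kind == "star":      # rule 5: zero repetitions are always allowed
        return True
    raise ValueError(f"unknown node kind: {kind!r}")

# Hypothetical example trees: (a|b)* and (a|b)*a, with a, b, a at positions 1, 2, 3.
a_or_b_star = ("star", ("or", ("leaf", 1), ("leaf", 2)))
then_a = ("cat", a_or_b_star, ("leaf", 3))

print(nullable(a_or_b_star))  # True
print(nullable(then_a))       # False
```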
Calculate firstpos(n)
With the nullable rules done, let's turn to firstpos(n). Why compute firstpos? Because it prepares for the followpos computation. Don't be discouraged; keep reading and you will see the ultimate reason. firstpos(n) is the set of positions in the subtree rooted at n that correspond to the first symbol of some sentence derived from n. ("Some" matters here: there may be several such sentences, so the result of firstpos is a set of positions.) As with nullable, we describe the rules for firstpos(n) by the five node types.
1. When n is a leaf node labeled with the empty string, there is no first symbol, so firstpos(n) = ∅.
2. When n is a leaf node at position i, n can derive only the terminal at position i, so firstpos(n) = {i}.
3. When n is an or node (internal node), alternation applies. The first-position set of the left subtree, firstpos(C1), is contained in firstpos(n), and the first-position set of the right subtree, firstpos(C2), is also contained in firstpos(n). Therefore, firstpos(n) is the union of the firstpos sets of the two subtrees: firstpos(n) = firstpos(C1) U firstpos(C2).
4. When n is a cat node (internal node), concatenation applies. If the left subtree (C1) cannot derive the empty string, the first symbol of any sentence derived from n must come from firstpos(C1). If the left subtree can derive the empty string, the first symbol may also come from the right subtree (C2).
Therefore:
if (nullable(C1))
    firstpos(n) = firstpos(C1) U firstpos(C2)   // Figure 9
else
    firstpos(n) = firstpos(C1)                  // Figure 10
5. When n is a star node (internal node), n has only one subtree (C1). Whether or not n derives the empty string, the first symbol of any non-empty sentence comes from C1, so firstpos(n) = firstpos(C1).
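The firstpos rules above translate almost line for line into code. The following is a minimal Python sketch, not part of the original article. It assumes a nested-tuple encoding of the syntax tree, ("eps",), ("leaf", i), ("or", C1, C2), ("cat", C1, C2), and ("star", C1), and uses (a|b)*abb with positions 1 to 5 as an example chosen here.

```python
def nullable(n):
    """Return True if the subtree n can derive the empty string."""
    kind = n[0]
    if kind == "leaf":
        return False
    if kind == "or":
        return nullable(n[1]) or nullable(n[2])
    if kind == "cat":
        return nullable(n[1]) and nullable(n[2])
    return True  # "eps" and "star" nodes

def firstpos(n):
    """Positions that can match the first symbol of a sentence derived from n."""
    kind = n[0]
    if kind == "eps":
        return set()                        # no first symbol at all
    if kind == "leaf":
        return {n[1]}                       # only its own position
    if kind == "or":
        return firstpos(n[1]) | firstpos(n[2])
    if kind == "cat":
        if nullable(n[1]):                  # C1 may vanish, so C2 can start the sentence
            return firstpos(n[1]) | firstpos(n[2])
        return firstpos(n[1])
    if kind == "star":
        return firstpos(n[1])
    raise ValueError(f"unknown node kind: {kind!r}")

# Hypothetical example: (a|b)*abb with positions a=1, b=2, a=3, b=4, b=5.
tree = ("star", ("or", ("leaf", 1), ("leaf", 2)))
for i in (3, 4, 5):
    tree = ("cat", tree, ("leaf", i))

print(sorted(firstpos(tree)))  # [1, 2, 3]
```

Position 3 appears in the result because the star subtree is nullable, which is exactly the first branch of the cat rule.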
Calculate lastpos(n)
The rules for lastpos are essentially the same as those for firstpos, except that in the cat-node rule the roles of subtrees C1 and C2 are swapped. To keep the article short, the lastpos rules are not spelled out here.
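Since the lastpos rules are left out, a short Python sketch may help show what "swapping the roles of C1 and C2" means in practice. It is an illustration only, not the author's code, and it assumes a nested-tuple tree encoding: ("eps",), ("leaf", i), ("or", C1, C2), ("cat", C1, C2), and ("star", C1). The sample expressions are chosen here.

```python
def nullable(n):
    """Return True if the subtree n can derive the empty string."""
    kind = n[0]
    if kind == "leaf":
        return False
    if kind == "or":
        return nullable(n[1]) or nullable(n[2])
    if kind == "cat":
        return nullable(n[1]) and nullable(n[2])
    return True  # "eps" and "star" nodes

def lastpos(n):
    """Positions that can match the last symbol of a sentence derived from n."""
    kind = n[0]
    if kind == "eps":
        return set()
    if kind == "leaf":
        return {n[1]}
    if kind == "or":
        return lastpos(n[1]) | lastpos(n[2])
    if kind == "cat":
        if nullable(n[2]):                  # mirror of firstpos: test C2, not C1
            return lastpos(n[1]) | lastpos(n[2])
        return lastpos(n[2])
    if kind == "star":
        return lastpos(n[1])
    raise ValueError(f"unknown node kind: {kind!r}")

# Hypothetical examples: ab* (a=1, b=2) and (a|b)*abb (positions 1 to 5).
a_bstar = ("cat", ("leaf", 1), ("star", ("leaf", 2)))
tree = ("star", ("or", ("leaf", 1), ("leaf", 2)))
for i in (3, 4, 5):
    tree = ("cat", tree, ("leaf", i))

print(sorted(lastpos(a_bstar)))  # [1, 2]
print(sorted(lastpos(tree)))     # [5]
```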
Calculate followpos(i)
If you have read this far, you have cleared most of the obstacles. To finish the job, we now introduce the followpos rules. Before describing them, let's see why followpos is computed at all. Recall from the Introduction that our goal is to convert a regular expression to a DFA. Each DFA state corresponds to a set of positions, and for a position i, followpos(i) is the set of positions that can come immediately after i in some sentence. Therefore, once we know the set of starting positions (firstpos of the root), we can construct the DFA of the regular expression from followpos (for the construction itself, see p. 113 of the Chinese edition of the purple Dragon Book).
Next we analyze the rules for followpos(i). Fortunately, there are only two cases this time, because only a cat node or a star node can make one position of the regular expression follow another. An or node selects just one of its subtrees to derive a sentence, so it creates no following relation between positions in its two subtrees.
1. When n is a cat node, concatenation applies, so every position in firstpos of the right subtree C2 can follow each position in lastpos of the left subtree C1. For example, if lastpos(C1) = {1, 3}, then for each position i in lastpos(C1), firstpos(C2) is contained in followpos(i); that is, followpos(1) and followpos(3) both include firstpos(C2).
2. When n is a star node, the definition of the closure operation tells us that for each position i in lastpos(n), firstpos(n) is contained in followpos(i). Picture a hungry snake so dizzy that it mistakes its own tail for food and bites it: its head now comes right after its tail, just as one repetition of the star's body can begin right after the previous one ends. A simple example makes this concrete: if firstpos(n) = {2} and lastpos(n) = {1, 3}, then position 2 may follow position 1, so 2 is in followpos(1). The same holds for followpos(3).
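The two followpos rules can be applied in a single walk over the syntax tree. Below is a minimal Python sketch of that idea, offered as an illustration and not as the author's implementation. It assumes a nested-tuple tree encoding, ("eps",), ("leaf", i), ("or", C1, C2), ("cat", C1, C2), and ("star", C1), and uses (a|b)*abb with positions 1 to 5 as an example chosen here.

```python
from collections import defaultdict

def nullable(n):
    kind = n[0]
    if kind == "leaf":
        return False
    if kind == "or":
        return nullable(n[1]) or nullable(n[2])
    if kind == "cat":
        return nullable(n[1]) and nullable(n[2])
    return True  # "eps" and "star" nodes

def firstpos(n):
    kind = n[0]
    if kind == "eps":
        return set()
    if kind == "leaf":
        return {n[1]}
    if kind == "star":
        return firstpos(n[1])
    if kind == "or" or nullable(n[1]):      # "or", or "cat" with nullable C1
        return firstpos(n[1]) | firstpos(n[2])
    return firstpos(n[1])                   # "cat" with non-nullable C1

def lastpos(n):
    kind = n[0]
    if kind == "eps":
        return set()
    if kind == "leaf":
        return {n[1]}
    if kind == "star":
        return lastpos(n[1])
    if kind == "or" or nullable(n[2]):      # "or", or "cat" with nullable C2
        return lastpos(n[1]) | lastpos(n[2])
    return lastpos(n[2])                    # "cat" with non-nullable C2

def followpos(root):
    """Map each position i to followpos(i), using only the cat and star rules."""
    follow = defaultdict(set)

    def visit(n):
        for child in n[1:] if n[0] in ("or", "cat", "star") else ():
            visit(child)
        if n[0] == "cat":                   # rule 1: lastpos(C1) -> firstpos(C2)
            for i in lastpos(n[1]):
                follow[i] |= firstpos(n[2])
        elif n[0] == "star":                # rule 2: lastpos(n) -> firstpos(n)
            for i in lastpos(n):
                follow[i] |= firstpos(n)

    visit(root)
    return dict(follow)

# Hypothetical example: (a|b)*abb with positions a=1, b=2, a=3, b=4, b=5.
tree = ("star", ("or", ("leaf", 1), ("leaf", 2)))
for i in (3, 4, 5):
    tree = ("cat", tree, ("leaf", i))

for pos, nxt in sorted(followpos(tree).items()):
    print(pos, sorted(nxt))
```

Sets accumulate with |= because a position can receive followers from several nodes: here positions 1 and 2 get {1, 2} from the star node and {3} from the cat node above it. Position 5 has no entry because nothing follows it in this tree. This sketch recomputes firstpos and lastpos at every node for brevity; caching them per node would avoid the repeated work.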
Summary
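Summary of the calculation rules for nullable(n), firstpos(n), and lastpos(n):
Leaf labeled ε: nullable(n) = true; firstpos(n) = ∅; lastpos(n) = ∅.
Leaf at position i: nullable(n) = false; firstpos(n) = {i}; lastpos(n) = {i}.
Or node with subtrees C1 and C2: nullable(n) = nullable(C1) or nullable(C2); firstpos(n) = firstpos(C1) U firstpos(C2); lastpos(n) = lastpos(C1) U lastpos(C2).
Cat node with subtrees C1 and C2: nullable(n) = nullable(C1) and nullable(C2); firstpos(n) = firstpos(C1) U firstpos(C2) if nullable(C1), otherwise firstpos(C1); lastpos(n) = lastpos(C1) U lastpos(C2) if nullable(C2), otherwise lastpos(C2).
Star node with subtree C1: nullable(n) = true; firstpos(n) = firstpos(C1); lastpos(n) = lastpos(C1).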
Summary of the followpos(i) calculation rules:
① When n is a cat node with left and right subtrees C1 and C2, then for every position i in lastpos(C1), all positions in firstpos(C2) are in followpos(i).
② When n is a star node and i is a position in lastpos(n), all positions in firstpos(n) are in followpos(i).
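Once followpos is known, the DFA construction mentioned earlier can be sketched as well. The article itself defers that construction to the textbook, so the Python code below is only one reading of the standard approach, under several assumptions made here: the regular expression is (a|b)*abb followed by an end marker # at position 6 (the end marker does not appear in the article), the followpos table and the symbol at each position are hard-coded for that example, and build_dfa is a hypothetical helper name.

```python
def build_dfa(start, followpos, symbol_at, end_pos):
    """Build DFA states as sets of positions, starting from firstpos(root)."""
    start = frozenset(start)
    states, worklist, trans = {start}, [start], {}
    while worklist:
        S = worklist.pop()
        # Try every input symbol that labels some position in S.
        for a in sorted({symbol_at[p] for p in S if p in symbol_at}):
            # The next state is the union of followpos(p) over positions p in S labeled a.
            T = frozenset().union(*(followpos[p] for p in S if symbol_at.get(p) == a))
            trans[(S, a)] = T
            if T not in states:
                states.add(T)
                worklist.append(T)
    # A state is accepting if it contains the position of the end marker.
    accepting = {S for S in states if end_pos in S}
    return start, states, trans, accepting

# Assumed inputs for (a|b)*abb# with positions a=1, b=2, a=3, b=4, b=5, #=6.
symbol_at = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b"}
followpos = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}}
start, states, trans, accepting = build_dfa({1, 2, 3}, followpos, symbol_at, end_pos=6)

for (S, a), T in sorted(trans.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(S), a, "->", sorted(T))
```

For this example the sketch reaches four states, {1,2,3}, {1,2,3,4}, {1,2,3,5}, and {1,2,3,6}, and only the last one is accepting because it contains position 6.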
Conclusion
This article was written in a hurry, so it surely contains omissions. I hope you will offer suggestions.