From Lex & YACC to Compilers (Address Book Problem 1)
Using Lex and YACC to Solve the Address Book Problem (1)
Preface
Courses and textbooks on compiler principles offer few good examples of building lexical and syntax analyzers with Lex and YACC. Many tutorials merely mention the two tools in passing when explaining lexical analysis and syntax analysis, and many Chinese university textbooks do not mention them at all. Yet Lex and YACC are useful for more than building compilers. This article shows how to use them on a simple problem: extracting information from call records.
Extracting Address Book Information
A few days ago, a friend asked me how to extract the names and phone numbers from an address book by means of lexical and syntax analysis. I restated the problem as follows:
I have a text file, record.txt, that holds the call records generated by a telephone. The records are laid out in the following format.
--------- 2004.1.10 ----------
Name: jeclee
Tel: 05513606124
--------- 2004.1.11 ----------
Name: Wangan
Tel: 075528979205
...
Now I want to build a database system, and I need to enter the name and phone number of everyone who called me. So I need a way to extract the useful information from the record files that the telephone generates. There are of course many possible solutions, but in this article we will explore how the Lex and YACC tools make it easy to construct a parser for this information.
Getting the Lex and YACC Tools
You may think it is overkill to apply compiler principles to this problem, but with Lex and YACC the complicated processing becomes simple. Lex and YACC are two Unix tools. On Windows you generally need to find them online. I use flex.exe and bison.exe from Cygwin. Bison is the GNU counterpart of YACC, and Cygwin is a Unix-like environment for Windows. You can download Cygwin to get both tools.
Input File for the Lexical Analyzer
I covered regular expressions in earlier articles in this series. For a detailed explanation, please refer to a textbook on compiler principles.
Here I first give some basic regular expression definitions, which appear in almost every lexer input file.
digit       [0-9]
number      {digit}+
letter      [A-Za-z_]
identifier  ({letter}|_)({number}|{letter}|_)*
newline     [\n]|[\r][\n]
whitespace  [ \t]+
The record file from the phone also contains the header line "--------- 2004.1.10 ----------", which we have not considered yet. The regular expression for this header is very simple: it is a combination of "-" characters, numbers, and dots. So its regular expression is easy to write down.
begin       [-]+({number}[.])+[-]+
Here [-] matches the "-" character, number is the integer defined above, and [.] matches a literal dot ".". So ({number}[.])+ represents the date information in the record header. We do not need the date here, so there is no need to extract it separately, and it can be absorbed entirely into one simple regular expression.
All right, now collect these regular expressions into a flex input file named record.l.
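As a rough illustration, a record.l along these lines might look like the sketch below. It is not the author's original file. The field variable, the printed output format, and the standalone main function are assumptions made only so the scanner can run without YACC. The begin pattern is also loosened, as an assumption about the real file format, to allow the spaces around the date and the missing dot after the last number in the sample header.

```lex
%{
/* record.l: a minimal, hypothetical sketch of a flex input file. */
#include <stdio.h>

/* Which field keyword was seen last: 0 = none, 1 = Name, 2 = Tel. */
static int field = 0;
%}

%option noyywrap
%option nounput
%option noinput

digit       [0-9]
number      {digit}+
letter      [A-Za-z_]
identifier  ({letter}|_)({number}|{letter}|_)*
newline     [\n]|[\r][\n]
whitespace  [ \t]+
begin       [-]+{whitespace}?{number}([.]{number})*{whitespace}?[-]+

%%

{begin}       { /* record header: the date is not needed, so discard it */ }
"Name:"       { field = 1; /* the next identifier is a name */ }
"Tel:"        { field = 2; /* the next number is a phone number */ }
{identifier}  { if (field == 1) printf("name=%s\n", yytext); field = 0; }
{number}      { if (field == 2) printf("tel=%s\n", yytext); field = 0; }
{whitespace}  { /* skip blanks and tabs */ }
{newline}     { /* skip line ends */ }
.             { /* ignore any other character, such as the "..." line */ }

%%

/* Standalone driver for illustration; a YACC parser would call yylex() itself. */
int main(void)
{
    yylex();
    return 0;
}
```

Assuming flex and a C compiler are installed, running flex record.l and then cc lex.yy.c -o record should build the scanner. Feeding it the sample records with ./record < record.txt should then print name=jeclee, tel=05513606124, name=Wangan, and tel=075528979205, one per line. Because identifier contains no spaces, this sketch only picks up the first word of a name.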
The telephone records follow a fixed format, so the grammar (YACC) input file is relatively simple. Once lexical analysis is done, our work is almost half finished.
2003-1-13
Author: Tang Liang tangl_99
QQ: 8664220
MSN: tangl_99@hotmail.com
Email: tangl_99@sohu.com
Chengdu, Sichuan University, Computer Science Institute