Well, I admit the title is clickbait again, and I should not have dragged Python in to compare it with C. I have no intention of belittling the C language. All I want to say is that for some problems, Python is much more comfortable to use than C.
The first chapter of Beautiful Code presents a C "regular expression" engine written by Rob Pike, another half-god, half-programmer. (Why "regular expression" is in quotation marks is explained later.) That a few dozen lines of code could amaze the masters needs no elaboration, and I will not repeat the details here. The code is introduced by none other than Kernighan himself, who said, in short: "I was amazed by how compact and elegant this code was..." I figured that if Kernighan admired it this much, it must represent the summit of "compact and elegant" C, and presumably the summit of programming in general. I believed that for a while, until one day I came across this regular expression engine in 14 lines of Python and was amazed all over again.
First, let me tidy up the code. The original post is actually not as concise as it could be: it spends three lines reimplementing a function that already exists in the Python standard library (itertools.chain). Replacing those three lines with an import gives the following code:
from itertools import chain as iconcat

def nil(s):                  # the empty string: consume nothing
    yield s

def seq(l, r):               # l followed by r
    return lambda s: (sr for sl in l(s) for sr in r(sl))

def alt(l, r):               # l or r
    return lambda s: iconcat(l(s), r(s))

def star(e):                 # zero or more repetitions of e
    return lambda s: iconcat(nil(s), seq(e, star(e))(s))

def plus(e):                 # one or more repetitions of e
    return seq(e, star(e))

def char(c):                 # the single character c
    def match(s):
        if s and s[0] == c:
            yield s[1:]
    return match
In terms of line count, this is about the same as the C "regular expression" engine in Beautiful Code. But the Python engine is far ahead in both functionality and simplicity. Let's first look at how to use it, and then at its advantages.
To use this engine, you build a regular expression out of the functions char, nil, seq, alt, star and plus. In BNF, the form is:
exp -> char(c)
     | nil
     | seq(exp, exp)
     | alt(exp, exp)
     | star(exp)
     | plus(exp)
The semantics are:
exp -> char(c) matches the single character c;
exp -> nil matches the empty string;
exp -> seq(exp1, exp2): if exp1 matches s1 and exp2 matches s2, exp matches the concatenation of s1 and s2;
exp -> alt(exp1, exp2): exp matches a string s if s matches either exp1 or exp2;
exp -> star(exp1) matches the empty string, or a concatenation of one or more substrings that each match exp1;
exp -> plus(exp1) matches a concatenation of one or more substrings that each match exp1.
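To make these semantics concrete, here is a small sketch that repeats the engine above (so it runs on its own) and lists what each combinator yields for a few sample strings. Each printed list holds the unmatched remainders of the input; an empty list means no match.

```python
from itertools import chain as iconcat

# The engine from the text, repeated so this block runs on its own.
# Each matcher takes a string and yields the unmatched remainders.
def nil(s):
    yield s

def seq(l, r):
    return lambda s: (sr for sl in l(s) for sr in r(sl))

def alt(l, r):
    return lambda s: iconcat(l(s), r(s))

def star(e):
    return lambda s: iconcat(nil(s), seq(e, star(e))(s))

def plus(e):
    return seq(e, star(e))

def char(c):
    def match(s):
        if s and s[0] == c:
            yield s[1:]
    return match

print(list(nil('abc')))                        # ['abc']: nothing consumed
print(list(char('a')('abc')))                  # ['bc']
print(list(char('a')('xbc')))                  # []: no match
print(list(seq(char('a'), char('b'))('abc')))  # ['c']
# alt can succeed in more than one way, so it may yield several remainders
print(list(alt(char('a'), seq(char('a'), char('b')))('abc')))  # ['bc', 'c']
print(list(star(char('a'))('aab')))            # ['aab', 'ab', 'b']
print(list(plus(char('a'))('aab')))            # ['ab', 'b']
```

Note that star tries the empty match first (via nil), so shorter matches, and therefore longer remainders, come out first.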
In short, both the form and the semantics agree with textbook regular expressions. For example, the regular expression c(a|d)+r is written in this engine as:
e = seq(char('c'),
        seq(plus(alt(char('a'), char('d'))),
            char('r')))
Now let's see how to use it. In Python terms, a constructed regular expression is a function. It takes one argument, the string to match. Calling it returns a sequence (a lazy generator) of strings: each element is what is left of the target string after the regular expression has matched some prefix of it, that is, the part that was not matched. So if the regular expression matches (a prefix of) the target string, the returned sequence is non-empty; otherwise it is empty. Therefore, we can use:
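if next(e(s), None) is not None:
to determine whether the regular expression e matches a prefix of the string s. Writing if e(s): directly would not work, because a generator object is always truthy, even when it yields nothing. To require that e match all of s, check whether the empty string is among the remainders instead:
if '' in e(s):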
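Here is a sketch of both tests on the c(a|d)+r example, again repeating the engine so the block is self-contained. The names matches_prefix and full_match are chosen here for illustration and are not part of the original code; full_match relies on the fact that an empty remainder means the whole string was consumed.

```python
from itertools import chain as iconcat

# The engine from the text, repeated so this block runs on its own.
def nil(s):
    yield s

def seq(l, r):
    return lambda s: (sr for sl in l(s) for sr in r(sl))

def alt(l, r):
    return lambda s: iconcat(l(s), r(s))

def star(e):
    return lambda s: iconcat(nil(s), seq(e, star(e))(s))

def plus(e):
    return seq(e, star(e))

def char(c):
    def match(s):
        if s and s[0] == c:
            yield s[1:]
    return match

def matches_prefix(e, s):
    """True if e matches some prefix of s (at least one remainder exists)."""
    return next(e(s), None) is not None

def full_match(e, s):
    """True if e can consume all of s, i.e. '' is among the remainders."""
    return '' in e(s)

# c(a|d)+r, as constructed in the text
e = seq(char('c'), seq(plus(alt(char('a'), char('d'))), char('r')))

for text in ['car', 'cdr', 'cadr', 'cr', 'cadrx']:
    print(text, matches_prefix(e, text), full_match(e, text))
# car True True
# cdr True True
# cadr True True
# cr False False
# cadrx True False

# Pitfall: a generator is truthy even when it would yield nothing.
print(bool(e('cr')))  # True, although 'cr' does not match
```

Because the results are generated lazily, both helpers stop as soon as they find what they need instead of enumerating every possible match.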
To understand how the engine works, you only need to grasp the point made above: "Calling the function returns a sequence of strings; each one is what remains of the target string after the regular expression has matched part of it, that is, the unmatched part." Every one of nil, char, seq, alt, star and plus satisfies this definition. This is one of the strengths of the Python design: it is simple, easy to understand, easy to implement and easy to get right. The engine in Beautiful Code, by contrast, is quite hard to write. I admit that at my level, even if Rob Pike explained his design to me and asked me to write the code, I could not get it right: loops, pointers and boundary conditions are too error-prone. Code like that only comes from the hands of a master. The Python version is different: its semantics are simple and clear and it is easy to keep under control, so even I could write it correctly.
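The invariant quoted above can be checked mechanically. The sketch below repeats the engine; only_suffixes is a hypothetical helper introduced here. It confirms that every remainder is a suffix of the input, and shows that an ambiguous expression such as a*a yields one remainder per way of matching, which is how the generators explore alternatives without any explicit backtracking code.

```python
from itertools import chain as iconcat

# The engine from the text, repeated so this block runs on its own.
def nil(s):
    yield s

def seq(l, r):
    return lambda s: (sr for sl in l(s) for sr in r(sl))

def alt(l, r):
    return lambda s: iconcat(l(s), r(s))

def star(e):
    return lambda s: iconcat(nil(s), seq(e, star(e))(s))

def plus(e):
    return seq(e, star(e))

def char(c):
    def match(s):
        if s and s[0] == c:
            yield s[1:]
    return match

def only_suffixes(e, s):
    """Check the invariant: every remainder is a suffix of the input."""
    return all(s.endswith(rest) for rest in e(s))

a_star_a = seq(star(char('a')), char('a'))  # a*a
print(list(a_star_a('aaa')))  # ['aa', 'a', '']: one remainder per way to match

cases = [
    (star(char('a')), 'aaab'),
    (a_star_a, 'aaa'),
    (alt(char('x'), nil), 'xy'),
    (plus(alt(char('a'), char('b'))), 'abba'),
]
print(all(only_suffixes(e, s) for e, s in cases))  # True
```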
Although the line counts are comparable, the Python version is more powerful than the C version. First, the C version is not a real regular expression engine: it cannot express "match exp1 or match exp2", which greatly limits its usefulness, whereas the Python version is a standard, textbook regular expression engine. Second, the C version cannot express an expression like (abc)*: its Kleene closure can only apply to a single character, while in the Python version the Kleene closure can apply to any expression. This is why I put "regular expression" in quotation marks at the start.
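Both capabilities the C version lacks, alternation and a Kleene closure over a whole sub-expression, can be exercised directly. In this sketch (engine repeated), word is a hypothetical convenience helper that chains char matchers with seq; it is not part of the original post. One caveat of this simple design: star applied to an expression that can itself match the empty string, such as star(nil), recurses without bound.

```python
from itertools import chain as iconcat

# The engine from the text, repeated so this block runs on its own.
def nil(s):
    yield s

def seq(l, r):
    return lambda s: (sr for sl in l(s) for sr in r(sl))

def alt(l, r):
    return lambda s: iconcat(l(s), r(s))

def star(e):
    return lambda s: iconcat(nil(s), seq(e, star(e))(s))

def char(c):
    def match(s):
        if s and s[0] == c:
            yield s[1:]
    return match

def word(text):
    """Hypothetical helper: match the literal string text, one char at a time."""
    e = nil
    for c in text:
        e = seq(e, char(c))
    return e

def full_match(e, s):
    """True if e can consume all of s."""
    return '' in e(s)

abc_star = star(word('abc'))                     # (abc)*
ab_or_c_star = star(alt(word('ab'), char('c')))  # (ab|c)*

print(full_match(abc_star, 'abcabc'))     # True
print(full_match(abc_star, ''))           # True: zero repetitions
print(full_match(abc_star, 'abcab'))      # False
print(full_match(ab_or_c_star, 'abcab'))  # True
print(full_match(ab_or_c_star, 'abb'))    # False
```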
The Python code also composes better than the C version. To add a feature, the Python user only needs to add a new function, while the C user has to modify existing functions; the two are simply not in the same league. For example, the Python version has no "match any character" operation (the . of ordinary regular expressions), but adding one takes only a few lines of code:
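def any_char(s):             # any single character; named to avoid shadowing the built-in any
    if s:                    # fail on an empty string: there is nothing to consume
        yield s[1:]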
Not only is this simple to implement, it is also easy to test, because none of the existing code is touched.
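A self-contained check of the "match any character" extension, plugged into a.c and a.* without modifying the engine. It uses a function named any_char (a name chosen here to avoid shadowing Python's built-in any) that yields nothing for an empty input, since there is no character to consume.

```python
from itertools import chain as iconcat

# The engine from the text, repeated so this block runs on its own.
def nil(s):
    yield s

def seq(l, r):
    return lambda s: (sr for sl in l(s) for sr in r(sl))

def star(e):
    return lambda s: iconcat(nil(s), seq(e, star(e))(s))

def char(c):
    def match(s):
        if s and s[0] == c:
            yield s[1:]
    return match

def any_char(s):
    # The regex '.': consume one character of any kind; fail on an empty string.
    if s:
        yield s[1:]

def full_match(e, s):
    """True if e can consume all of s."""
    return '' in e(s)

a_dot_c = seq(char('a'), seq(any_char, char('c')))  # a.c
a_dot_star = seq(char('a'), star(any_char))          # a.*

for text in ['abc', 'axc', 'ac', 'abbc']:
    print(text, full_match(a_dot_c, text))
# abc True
# axc True
# ac False
# abbc False
print(full_match(a_dot_star, 'axyz'), full_match(a_dot_star, 'a'))  # True True
```

any_char is just another function with the same string-to-remainders shape, so seq and star accept it as-is.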
Why is Python so much better than C for implementing a simple regular expression engine? Look at the Python features used here that C lacks: generator functions, generator expressions, higher-order functions and string slicing. String slicing is not just a concise way to produce substrings; it is also easy to use because Python manages memory automatically, so the many substrings created during matching never have to be freed by hand.