How to construct complex regular expressions

The original title of this article was "How to construct complex regular expressions", but that felt ambiguous: it could be read as saying that regular expressions are simple and I am teaching people how to make them needlessly complicated. My intention is the opposite: even a complex regular expression is nothing to fear once you find the right way to construct it.

Taking the easy way out

Snopo gave the following text: or and name='Zhangsan' and id=001 or age>20 ' area= ' and like, and asked how to extract the correct SQL query condition from it.

A brief analysis shows that the middle part is what we want; the only trouble is the stray like, or, and and at both ends. Constructing a regular expression that parses any query condition conforming to SQL syntax is fairly complicated, but a specific problem can be handled more simply. A malformed SQL fragment like this one was presumably generated automatically by a program, so only its two ends carry unwanted text. Just strip that text.

So I wrote this regular expression: s/^(?:(?:or|and|like)\s*)+|\s*(?:(?:or|and|like)\s*)+$//mi. It removes every like, or, and and, together with any surrounding whitespace, from both ends of a multiline string; what remains is the wanted condition. (In Perl, add the g modifier so that the substitution is applied at both ends rather than only at the first match.)
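As a rough illustration, the same clean-up can be sketched in Python with re.sub, which replaces every match and therefore handles both ends in one call. The function name strip_dangling_keywords and the sample string are made up for this sketch; they are not from the original exchange.

```python
import re

# A leading run of or/and/like at the start of a line,
# or a trailing run of them at the end of a line.
# Assumption: keywords are matched case-insensitively, like the /i modifier.
DANGLING = re.compile(
    r"^(?:(?:or|and|like)\s*)+|\s*(?:(?:or|and|like)\s*)+$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_dangling_keywords(text):
    """Remove stray or/and/like keywords from both ends of each line."""
    return DANGLING.sub("", text)


if __name__ == "__main__":
    print(strip_dangling_keywords(
        "or and name='Zhangsan' and id=001 or age>20 and like"))
```

Note that the pattern has no word boundaries, so a line that starts with a column such as order would lose its first two letters; adding \b after the keyword group is one possible way to tighten it.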

Divide and conquer

After I sent this answer, Snopo was clearly not satisfied with such a "lazy" approach. He went on to ask: can you write a regular expression for a conditional query statement that follows SQL syntax? (Only the where part is considered; there is no need to handle a complete select.)

Indeed, when the goal is to solve a problem quickly, any means that works will do. When the goal is to learn, though, digging into the problem instead of stopping at a quick fix is the right path. So let's look at how regular expressions can be used to parse SQL query conditions.

The simplest query condition is a plain true-or-false test, such as where 1, where true, or where false. A regular expression for such conditions is direct: /(?:-?\d+|true|false)/i.
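A minimal sketch of this first building block in Python. The names BOOL_CONDITION and is_bool_condition, and the use of fullmatch to reject trailing text, are choices made for this example only.

```python
import re

# Simplest condition: an integer literal or the words true/false.
BOOL_CONDITION = re.compile(r"(?:-?\d+|true|false)", re.IGNORECASE)


def is_bool_condition(text):
    """Return True if text is exactly an integer, 'true' or 'false'."""
    return BOOL_CONDITION.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    for sample in ("1", "TRUE", "false", "age>25"):
        print(sample, is_bool_condition(sample))
```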

A slightly more complex single condition is a comparison between a left side and a right side, for example:

name like 'zhang%', or age>25, or work in ('IT', 'HR', 'R&D')

Simplified, the structure becomes A OP B, where A is the variable, OP is the comparison operator, and B is the value.

A: The simplest A is \w+. In practice a variable may also contain a dot or a backtick, as in `table`.`salary`, so it can be written as /[\w.`]+/. This is a fairly loose description. If stricter checking is required, you can also require the backticks to appear in pairs on both sides (a conditional match).
OP: The relations commonly used in where are =, <>, <, >, >=, <=, between, like, and in. A simple regular expression for them is /(?:[<>=]{1,2}|between|like|in)/i.
B: B can be one of four things: a variable, a number, a string, or a list. For simplicity, arithmetic expressions are not considered here.

◦ Variable: the definition of A applies directly, so it is not repeated here.
◦ Number: defined as /\d+/. Decimals and negative numbers are not considered.

◦ String: either single-quoted or double-quoted, possibly containing escaped quotes. I wrote a quoted-string regular expression that meets this requirement: /(['"])(?:\\['"]|[^\\1])*?\1/. However, because it is only one part of a much larger machine, writing it this way is very risky. First, it uses a backreference; second, the backreference relies on a group number that is global to the whole pattern, so the number changes as soon as the fragment is embedded in a larger expression. I wrote a function that automatically generates a globally unique group name to solve this problem. But let's not go too deep into the details here. The framework comes first and the details later; we should not drown in a sea of details.

◦ List: a list has a form such as (1, 3, 4) or ("IT", "HR", "R&D"): simple items joined by commas, with parentheses on both sides. Let I stand for a single list item, that is, a number or a string. The list then becomes /\(I(?:,I)*?\)/: an opening parenthesis, one I, zero or more further items each preceded by a comma, and a closing parenthesis. Whitespace is ignored for simplicity.
• At this point the regular-expression frame of a single condition can be summarized as S =~ /A OP B/i, where S stands for a single condition.
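The pieces A, OP and B can be assembled roughly as follows. This is only a sketch under a few assumptions: the helper names (quoted, item, value_list, single) are invented here, list items are limited to numbers and quoted strings, and group names are numbered q1, q2, ... rather than lettered.

```python
import re
from itertools import count

_group_ids = count(1)


def quoted():
    """Quoted-string fragment with a unique group name per call (q1, q2, ...)."""
    name = "q%d" % next(_group_ids)
    return r"""(?P<%s>['"])(?:\\['"]|[^'"])*?(?P=%s)""" % (name, name)


A = r"[\w.`]+"                          # variable, e.g. table.salary
OP = r"(?:[<>=]{1,2}|between|like|in)"  # comparison operator
NUMBER = r"\d+"                         # unsigned integer only


def item():
    """A list item I: a number or a quoted string."""
    return r"(?:%s|%s)" % (NUMBER, quoted())


def value_list():
    """A parenthesized, comma-separated list of items."""
    return r"\(\s*%s(?:\s*,\s*%s)*\s*\)" % (item(), item())


def single():
    """S = A OP B, where B is a list, a quoted string, or a variable/number."""
    b = r"(?:%s|%s|%s)" % (value_list(), quoted(), A)
    return r"%s\s*%s\s*%s" % (A, OP, b)


SINGLE = re.compile(single(), re.IGNORECASE)

if __name__ == "__main__":
    for text in ("name like 'zhang%'", "age>25", "work in ('IT', 'HR', 'R&D')"):
        print(text, bool(SINGLE.fullmatch(text)))
```

Each part can be tested on its own before it is combined, which is the point of the divide-and-conquer approach.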
More complex still are compound conditions, which are made of single conditions joined by and or or. Once the single condition is constructed soundly, it can be combined reliably into compound conditions.

Following the example above, with S standing for a single condition, a compound condition C is C =~ /S(?:(?:or|and)S)*?/. At this point a prototype of the condition parser is born. Below is a step-by-step implementation, using Python as the example.
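The following sketch shows why unique group names matter once S is used twice inside C. It is deliberately simplified: values are limited to words, numbers and quoted strings, and all function names are hypothetical.

```python
import re
from itertools import count

_ids = count(1)


def quoted():
    """Quoted string; each call gets its own group name so copies can coexist."""
    name = "quote_%d" % next(_ids)
    return r"""(?P<%s>['"])(?:\\['"]|[^'"])*?(?P=%s)""" % (name, name)


def single():
    """Simplified S: variable, operator, then a word/number or a quoted string."""
    return r"[\w.`]+\s*(?:[<>=]{1,2}|between|like|in)\s*(?:\w+|%s)" % quoted()


def compound():
    """C = S followed by zero or more (and|or) S."""
    return r"%s(?:\s*(?:and|or)\s*%s)*" % (single(), single())


COMPOUND = re.compile(compound(), re.IGNORECASE)


def is_condition(text):
    """Return True if the whole text is a valid compound condition."""
    return COMPOUND.fullmatch(text.strip()) is not None


def reusing_one_fragment_fails():
    """Pasting the same named fragment twice is rejected by the re module."""
    s = single()
    try:
        re.compile(r"%s(?:\s*(?:and|or)\s*%s)*" % (s, s))
    except re.error:
        return True
    return False


if __name__ == "__main__":
    print(is_condition("name like 'zhang%' and age>23 or id=001"))
    print(reusing_one_fragment_fails())
```

Calling single() once per occurrence gives every copy its own group name, whereas inserting one generated string twice makes re.compile raise an error about a redefined group name.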

Python implementation
To repeat: although an implementation is given, please focus on the idea rather than the code.

#!/usr/bin/python
# -*- coding: utf-8 -*-
#
# author: Rex
# blog: http://iregex.org
# filename: test.py
# created: 2010-08-06 17:12

# Generate a quoted-string pattern for '...' and "..." strings,
# allowing \' and \" inside. Every call uses a fresh group name
# (quote_a, quote_b, ...) so that the fragments can be combined safely.
index = 0
def gen_quote_str():
    global index
    index += 1
    char = chr(96 + index)
    return r"""(?P<quote_%s>['"])(?:\\['"]|[^'"])*?(?P=quote_%s)""" % (char, char)


# simple variable
def a():
    return r'[\w.`]+'

# operators
def op():
    return r'(?:[<>=]{1,2}|BETWEEN|LIKE|IN)'


# list item within (,)
# eg: `a`, a.b, "asdfasdf\" aasdf"
def item():
    return r"(?:%s|%s)" % (a(), gen_quote_str())


# a complete list
# eg: ("regex", "is", "good")
def items():
    return r"""\(\s*
        %s
        (?:,\s*%s)* \s*
    \)""" % (item(), item())

# simple comparison
# eg: a=15, b>23
def s():
    return r"""%s \s* %s \s* (?:\w+ | %s | %s)""" % (a(), op(), gen_quote_str(), items())

# complex comparison
# eg: name like 'zhang%' and age>23 and work in ("HR", "IT", 'R&D')
# Python 3.11 and later require global inline flags such as (?ix) to be at
# the very start of the pattern, so call .strip() on the result before
# compiling it there.
def c():
    return r"""
    (?ix) %s
    (?:\s*
        (?:and|or) \s*
        %s \s*
    )*
    """ % (s(), s())

# %-formatting keeps these lines valid in both Python 2 and Python 3
print("A:\t%s" % a())
print("OP:\t%s" % op())
print("ITEM:\t%s" % item())
print("ITEMS:\t%s" % items())
print("S:\t%s" % s())
print("C:\t%s" % c())

The result of running this code on my machine (Ubuntu 10.04, Python 2.6.5) is:

A:	[\w.`]+
OP:	(?:[<>=]{1,2}|BETWEEN|LIKE|IN)
ITEM:	(?:[\w.`]+|(?P<quote_a>['"])(?:\\['"]|[^'"])*?(?P=quote_a))
ITEMS:	\(\s*
        (?:[\w.`]+|(?P<quote_b>['"])(?:\\['"]|[^'"])*?(?P=quote_b))
        (?:,\s*(?:[\w.`]+|(?P<quote_c>['"])(?:\\['"]|[^'"])*?(?P=quote_c)))* \s*
    \)
S:	[\w.`]+ \s* (?:[<>=]{1,2}|BETWEEN|LIKE|IN) \s* (?:\w+ | (?P<quote_d>['"])(?:\\['"]|[^'"])*?(?P=quote_d) | \(\s*
        (?:[\w.`]+|(?P<quote_e>['"])(?:\\['"]|[^'"])*?(?P=quote_e))
        (?:,\s*(?:[\w.`]+|(?P<quote_f>['"])(?:\\['"]|[^'"])*?(?P=quote_f)))* \s*
    \))
C:	
    (?ix) [\w.`]+ \s* (?:[<>=]{1,2}|BETWEEN|LIKE|IN) \s* (?:\w+ | (?P<quote_g>['"])(?:\\['"]|[^'"])*?(?P=quote_g) | \(\s*
        (?:[\w.`]+|(?P<quote_h>['"])(?:\\['"]|[^'"])*?(?P=quote_h))
        (?:,\s*(?:[\w.`]+|(?P<quote_i>['"])(?:\\['"]|[^'"])*?(?P=quote_i)))* \s*
    \))
    (?:\s*
        (?:and|or) \s*
        [\w.`]+ \s* (?:[<>=]{1,2}|BETWEEN|LIKE|IN) \s* (?:\w+ | (?P<quote_j>['"])(?:\\['"]|[^'"])*?(?P=quote_j) | \(\s*
        (?:[\w.`]+|(?P<quote_k>['"])(?:\\['"]|[^'"])*?(?P=quote_k))
        (?:,\s*(?:[\w.`]+|(?P<quote_l>['"])(?:\\['"]|[^'"])*?(?P=quote_l)))* \s*
    \)) \s*
    )*

[Figure: matching effect of the generated pattern]



Arithmetic expressions

I said earlier that "for simplicity, arithmetic expressions are not considered here." Still, parsing arithmetic expressions is a very interesting topic; almost every algorithms book mentions it (converting infix expressions to prefix expressions, and so on). It can, of course, also be described with regular expressions.

The main idea is the following grammar:

expr   -> expr + term | expr - term | term
term   -> term * factor | term / factor | factor
factor -> digit | ( expr )

And the code:

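#!/usr/bin/python
# -*- coding: utf-8 -*-
#
# author: Rex
# blog: http://jb51.net
# filename: math.py
# created: 2010-08-07 00:44

# unsigned integer
integer = r"\d+"

# factor: an integer with an optional fractional part
# (parenthesized sub-expressions from the grammar are not handled)
factor = r"%s (?: \. %s)?" % (integer, integer)

# term: factors joined by * or /
term = r"%s (?: \s* [*/] \s* %s)*" % (factor, factor)

# expr: terms joined by + or -; (?x) enables free-spacing mode
expr = r"(?x) %s (?: \s* [+-] \s* %s)*" % (term, term)

print(expr)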

[Figure: output of the script and its matching effect]
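As a quick check of the idea, the flattened grammar can be compiled and tried on a few inputs. This is a sketch only: the upper-case names and is_arithmetic are invented here, the fractional part of a factor is assumed to be optional, and the free-spacing flag is passed to re.compile instead of being written inline.

```python
import re

INTEGER = r"\d+"
# factor: integer with an optional fractional part (assumed optional here)
FACTOR = r"%s (?: \. %s )?" % (INTEGER, INTEGER)
# term: the left-recursive rule flattened into "factor, then repeated * or / factor"
TERM = r"%s (?: \s* [*/] \s* %s )*" % (FACTOR, FACTOR)
# expr: likewise, "term, then repeated + or - term"
EXPR = r"%s (?: \s* [+-] \s* %s )*" % (TERM, TERM)

ARITH = re.compile(EXPR, re.VERBOSE)


def is_arithmetic(text):
    """Return True if the whole text is a flat arithmetic expression."""
    return ARITH.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("1+2*3", "3.14 * 2 - 10 / 5", "(1+2)*3", "1 +"):
        print(sample, is_arithmetic(sample))
```

Because the patterns of the re module cannot refer to themselves, the rule factor -> ( expr ) is not covered, so parenthesized input is rejected.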


Tips

• If the problem can be solved without a complex regular expression, do not use one.
• If you must write a complex regular expression, follow these guidelines:
◦ From the big picture, first understand the overall structure of the text to be parsed and divide it into small parts.
◦ From the small picture, implement each part, making each one complete and robust on its own so that it does not conflict with the others when placed in the whole.
◦ Assemble the parts properly.
• The benefit of divide and conquer: when one module is wrong and the rest are correct, you can locate the error quickly and eliminate the bug.
• Use capturing parentheses carefully, unless you know what you are doing, what side effects they have, and whether there is a workable remedy. In a short regular expression one or two extra pairs of parentheses are harmless, but in a complex one a single extra pair can be a fatal error.
• Use free-spacing mode as much as possible. It lets you add comments and whitespace freely to improve the readability of the regular expression.
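The last two tips can be illustrated with a small sketch. The patterns and names below are hypothetical examples, not taken from the code above: the first part shows how one extra capturing group silently changes what \1 refers to, and the second shows a free-spacing pattern that uses named groups instead.

```python
import re

# On its own, \1 refers to the opening quote.
QUOTED = r"""(['"]).*?\1"""
standalone = re.compile(QUOTED)

# Embedded after another capturing group, \1 now refers to the key,
# so the fragment no longer means what it meant in isolation.
embedded = re.compile(r"(\w+)\s*=\s*" + QUOTED)

# Free-spacing mode with comments; named groups do not depend on position.
ASSIGNMENT = re.compile(r"""
    (?P<key> \w+ )        # variable name
    \s* = \s*             # assignment operator
    (?P<q> ['"] )         # opening quote, remembered by name
    (?P<value> .*? )      # value, as short as possible
    (?P=q)                # matching closing quote
""", re.VERBOSE)

if __name__ == "__main__":
    print(bool(standalone.fullmatch("'abc'")))
    print(bool(embedded.fullmatch("name='abc'")))
    match = ASSIGNMENT.fullmatch("name = 'zhang'")
    print(match.group("key"), match.group("value"))
```

The embedded pattern rejects name='abc' yet accepts the malformed x='abcx, which is the kind of silent failure the tip about capturing parentheses warns against.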
