Comprehensive Analysis of Linux Regular Expressions (6)

Last Update:2013-12-23 Source: Internet

Author: User

The previous articles should have given us a fairly comprehensive understanding of Linux regular expressions. So how should we put them to use? In this article we look at a tool that makes it easy to test patterns while learning, and then introduce some ideas for writing regular expressions.

A convenient tool for learning Regular Expressions

The best way to learn regular expressions is, of course, to practice. Many tools support regular expressions, but most of them are not very convenient for exercises alone.
Here I recommend a dedicated tool for writing and testing regular expressions: PHPEdit Regular Expression Editor. It is free software, mainly intended for debugging the Perl Compatible Regular Expression (PCRE) functions used by PHP. You can enter a target string and a regular expression and view the matching result in real time. The tool is available from its download page.
The program's interface is very simple, but some of its functions appear to be faulty: only preg_match_all and preg_replace work properly. Also, do not add pattern delimiters in the pattern input box, because the program treats everything in that box as part of the pattern.
Even so, it is good enough as a practice tool.
All the examples in this article can be tested in it: enter the pattern in the top box, write the target string in the middle box, and click "run the regxwp" to see the matching result below.

Ideas for writing regular expressions

A tip for avoiding too many matches
We have already discussed how a carelessly written regular expression can match far more than intended. The question is how to avoid such situations as far as possible. Here is a small trick.
If you find that your pattern matches too much, a good approach is to change your way of thinking. Instead of asking "what should the next part of my pattern match?", ask "what should the next part of my pattern avoid matching?" The negation metacharacter "^" inside a character class makes this easy to express, and it often produces more precise matches.
To illustrate the benefit of this way of thinking, let us first look at an example that has nothing to do with regular expressions. The probability of throwing a 6 with a single throw of a die is 1/6. If you throw the die six times, what is the probability of throwing at least one 6?
Some people might reason like this: the probability for one throw is 1/6, so the probability for six throws is 6 × 1/6, which equals 1. This result is obviously wrong: throwing six times does not guarantee a 6. Approached head-on, the question seems a little difficult.
If we change our thinking, the solution becomes much clearer. We can restate the question: if you throw the die six times, what is the probability that none of the throws is a 6? This is much easier to solve. The probability of not throwing a 6 on a single throw is 5/6, so by the multiplication rule of probability the probability of not throwing a 6 in any of six throws is 5/6 to the power of 6, or about 33%. Subtracting this number from 1 gives the answer we need: about 67%.
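The dice arithmetic can be checked in a few lines. This is only an illustrative sketch in Python, and the variable names are my own:

```python
# Probability that a single throw of a fair die is NOT a six.
p_not_six = 5 / 6

# Probability that six independent throws are all "not six".
p_no_six_in_six = p_not_six ** 6

# Probability of at least one six is the complement of that event.
p_at_least_one_six = 1 - p_no_six_in_six

print(f"{p_no_six_in_six:.4f}")     # 0.3349
print(f"{p_at_least_one_six:.4f}")  # 0.6651
```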
You can think of matching each part of a pattern as a throw of the dice: the way the match probabilities of the parts combine is very similar to the example above.
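As a concrete case of "what should not match next", consider pulling quoted strings out of a line. The sketch below uses Python's re module rather than PHP, and the sample text is invented for illustration:

```python
import re

text = 'say "hello" and then "goodbye" to everyone'

# "What should match next?": any characters, then a closing quote.
# The greedy .* runs on to the LAST quote in the line.
greedy = re.findall(r'".*"', text)

# "What must not match next?": anything except a quote.
# [^"]* cannot cross a closing quote, so each string is matched separately.
negated = re.findall(r'"[^"]*"', text)

print(greedy)   # ['"hello" and then "goodbye"']
print(negated)  # ['"hello"', '"goodbye"']
```

The negated character class states exactly where the match has to stop, so it cannot swallow more than intended.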
How to improve the parsing efficiency of regular expressions
Among regular expressions that match the same content, some patterns are more efficient than others. As a simple example, the character class "[aeiou]" is more efficient than the alternation "(a|e|i|o|u)". In general, it is more efficient to use the simplest, most basic construct that does the job.
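One way to check such a claim on your own engine is to time both forms. The following is a rough sketch in Python with an invented sample text. Timings differ from machine to machine, and some engines may rewrite a single-character alternation into a character class internally, in which case the gap can be small or absent:

```python
import re
import timeit

text = "regular expressions are easier to learn by experiment " * 200

char_class = re.compile(r"[aeiou]")
alternation = re.compile(r"(?:a|e|i|o|u)")

# Both patterns find exactly the same vowels.
same_result = char_class.findall(text) == alternation.findall(text)
print(same_result)  # True

# Timings vary by machine and engine, so no figures are quoted here.
t_class = timeit.timeit(lambda: char_class.findall(text), number=20)
t_alt = timeit.timeit(lambda: alternation.findall(text), number=20)
print(f"character class: {t_class:.4f}s, alternation: {t_alt:.4f}s")
```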
Use nested unlimited repetition with caution: when the target string does not match, parsing can take a considerable amount of time. For example, with the pattern fragment "(a+)*" and the target string "aaaa", the parser can try 33 different ways of matching when the overall match fails, and this number grows dramatically as the string gets longer.
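To see why the number grows so quickly, here is a toy model in Python of a naive backtracking matcher. It is not how any real engine is implemented, and the function name failed_attempts is hypothetical. It counts how many ways the matcher can split a run of "a" characters between the inner "a+" and the outer "*" before giving up, starting from the first character only. The exact figure a real engine reports depends on how it counts and on its internals, so this toy count is not expected to reproduce the number quoted above; the point is the doubling.

```python
from functools import lru_cache


def failed_attempts(n: int) -> int:
    """Count the dead ends a naive matcher hits for (a+)* followed by a
    missing terminator, on a target of n 'a' characters, from position 0."""

    @lru_cache(maxsize=None)
    def from_pos(i: int) -> int:
        # Option 1: stop repeating the group here and look for the
        # terminator, which is absent, so this is one failed attempt.
        total = 1
        # Option 2: let the inner a+ consume j characters, then repeat.
        for j in range(1, n - i + 1):
            total += from_pos(i + j)
        return total

    return from_pos(0)


for n in (4, 8, 16, 25):
    print(n, failed_attempts(n))
```

In this model each extra character doubles the count, so 4 characters give 16 dead ends and 25 characters give more than 33 million.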
Some regular expression tools optimize the matching of certain specific patterns. If you know which optimizations your regular expression tool performs and use patterns that benefit from them, execution can be much faster. For example, PHP optimizes patterns such as /(a+)*b/. When the pattern ends with a literal character, the parser first checks whether that character appears later in the target; if it does not, the match fails immediately and parsing stops. If the pattern is changed to "(a+)*\d", the end is no longer a literal character, so the pattern is processed in the normal way. To see the difference, set the target string to 25 lowercase "a" characters in the tool mentioned earlier and test the two patterns in turn: the former finishes immediately, while the latter takes about one second (on the author's XP 1700+ processor).
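The idea behind this kind of optimization can be imitated by hand when an engine does not apply it. The sketch below is in Python; the helper search_with_literal_check is hypothetical, and it assumes the caller knows that the pattern cannot match unless the given literal is present:

```python
import re
from typing import Optional


def search_with_literal_check(pattern: str, required: str,
                              target: str) -> Optional[re.Match]:
    # Cheap test first: if the literal the pattern needs never occurs
    # in the target, no match is possible, so skip the regex entirely.
    # Assumption: the caller guarantees the pattern requires `required`.
    if required not in target:
        return None
    return re.search(pattern, target)


# 25 'a' characters and no 'b': rejected without running the regex.
print(search_with_literal_check(r"(a+)*b", "b", "a" * 25))  # None

# A target that does contain the literal is matched as usual.
print(search_with_literal_check(r"(a+)*b", "b", "aaab").group())  # aaab
```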
Besides using patterns that the engine optimizes, restructuring a pattern can also improve efficiency considerably. A good example is combining a lookbehind assertion with a once-only subpattern to match characters at the end of the target.
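In PCRE syntax this idea is commonly written with a once-only group, for example "(?>.*)(?<=abcd)" to test whether a line ends in "abcd". Python's re module accepts that syntax only from version 3.11, so the sketch below emulates the once-only group with a capturing lookahead followed by a backreference. The ending "abcd" and the function name are chosen purely for illustration:

```python
import re

# (?=(.*)) captures the rest of the line inside a lookahead, and \1 then
# consumes exactly that text. The engine cannot backtrack into it, which
# imitates a once-only subpattern. (?<=abcd) finally checks the ending.
ENDS_WITH_ABCD = re.compile(r"(?=(.*))\1(?<=abcd)")


def ends_with_abcd(line: str) -> bool:
    # match() anchors at the start, so only one attempt is made.
    return ENDS_WITH_ABCD.match(line) is not None


print(ends_with_abcd("xxxxabcd"))  # True
print(ends_with_abcd("abcdxxxx"))  # False
```

Because the whole line is consumed in one step and the ending is tested once, a non-matching line fails immediately instead of being retried at every possible split point.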
This brings the tutorial to an end. Given the limits of space and of my own knowledge, the articles may contain many omissions, and I ask for your understanding. The most comprehensive introductions to regular expressions are found in Perl-related documentation and books. For more information, see Mastering Regular Expressions by Jeffrey Friedl, which contains many examples. That said, once you understand the basic concepts, I think it is more practical to read carefully the regular expression sections in the documentation of the tools you use most often. I hope you come to understand regular expressions better through practice.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
