Methods for deleting comments with regular expressions in Python

Last Update:2017-01-13 Source: Internet

Author: User



Examples of non-greedy and multiline regular expression matching in Python
Some regular expression tips:


1. Non-greedy matching

>>> import re
>>> re.findall(r"a(\d+?)", "a23b")
['2']
>>> re.findall(r"a(\d+)", "a23b")
['23']

Compare this with the case where the pattern continues after the group. The non-greedy quantifier still has to consume every digit before b can match, so both forms return the same result:

>>> re.findall(r"a(\d+)b", "a23b")
['23']
>>> re.findall(r"a(\d+?)b", "a23b")
['23']

2. To match across multiple lines, add the re.S and re.M flags

re.S: . will match line breaks; by default, . does not match newline characters.

>>> re.findall(r"a(\d+)b.+a(\d+)b", "a23b\na34b")
[]
>>> re.findall(r"a(\d+)b.+a(\d+)b", "a23b\na34b", re.S)
[('23', '34')]

re.M: ^ and $ will match at the start and end of every line; by default, ^ matches only at the start of the string and $ only at its end.

>>> re.findall(r"^a(\d+)b", "a23b\na34b")
['23']
>>> re.findall(r"^a(\d+)b", "a23b\na34b", re.M)
['23', '34']

However, if the pattern contains no ^ anchor,

>>> re.findall(r"a(\d+)b", "a23b\na23b")
['23', '23']

so re.M is clearly not required in that case.
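As a small sketch of how the two flags interact (the variable names below are illustrative and not from the original), re.S and re.M can be combined with |, and a flag can also be written inline at the start of the pattern:

```python
import re

text = "a23b\na34b"

# re.S and re.M are the short names for re.DOTALL and re.MULTILINE.
assert re.S == re.DOTALL and re.M == re.MULTILINE

# Combined flags: ^ and $ anchor on every line, and . may cross the newline.
pairs = re.findall(r"^a(\d+)b$.+^a(\d+)b$", text, re.S | re.M)
print(pairs)  # [('23', '34')]

# The same per-line anchoring, requested inline with (?m).
per_line = re.findall(r"(?m)^a(\d+)b$", text)
print(per_line)  # ['23', '34']
```

Inline flags are convenient when the pattern is stored as a plain string and the call site cannot pass a flags argument.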


As I have said before, the solution to this problem is to use [\s\S] to simulate a dot that also matches line breaks. The original discussion is here: "DIY Universal Wildcard". In this way you can write JavaScript code like the following to remove multi-line comments:

// Remove C-style multi-line comments
function uncomment_multi(str) {
    return str.replace(/\/\*[\s\S]*?\*\//g, "");
}
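The same idea can be sketched in Python. This is a minimal illustration rather than the author's code: the function name mirrors the JavaScript one, and the sample input is made up. It also shows why the quantifier has to be non-greedy.

```python
import re

def uncomment_multi(source: str) -> str:
    """Remove C-style /* ... */ comments, which may span several lines."""
    # [\s\S] matches any character including newlines; *? stops at the
    # first closing */ instead of the last one.
    return re.sub(r"/\*[\s\S]*?\*/", "", source)

code = "int a; /* first\n comment */ int b; /* second */"

stripped = uncomment_multi(code)
print(repr(stripped))  # 'int a;  int b; '

# Equivalent form using re.S so that . itself matches newlines.
same = re.sub(r"/\*.*?\*/", "", code, flags=re.S)

# A greedy quantifier would swallow the code between the two comments.
greedy = re.sub(r"/\*[\s\S]*\*/", "", code)
print(repr(greedy))  # 'int a; '
```

Like the JavaScript version, this sketch does not check whether /* appears inside a string literal.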

JavaScript removal of single-line comments (imperfect)
Single-line comments are not as simple as you might think. If you believe that a str.replace with the pattern //.*$ is enough, then you must be sure that the text to be processed is of the simplest kind, such as:

var pig = "ASE"; // This is a comment.

In practice, this does not work. Lines like the following abound in real-world programs:

var url = "http://iregex.org"; // This is my site.
var url = "//not real comment here http://iregex.org"; // This is my site.
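To see the failure concretely, here is a short Python sketch of the naive approach applied to the sample lines above. The pattern is the simple //.*$ one discussed in the text; the variable names are only illustrative.

```python
import re

# Naive rule: everything from the first // to the end of the line is a comment.
naive = re.compile(r"//.*$", re.M)

simple = 'var pig = "ASE"; // This is a comment.'
with_url = 'var url = "http://iregex.org"; // This is my site.'

ok = naive.sub("", simple)
broken = naive.sub("", with_url)

print(repr(ok))      # 'var pig = "ASE"; '
print(repr(broken))  # 'var url = "http:'
```

The second result loses most of the string literal, because the // inside http:// is found before the real comment.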

I tried to write a JavaScript function that simulates negative lookbehind so that it could handle http://, but the function is not pretty, and it cannot handle a double slash inside quotes. I am really disappointed by how limited JavaScript's regular expression support is, so I turned to Perl to do the job. Here is the JavaScript function I wrote to delete single-line comments:


Perl version: deleting comments, with source code (relatively complete)
Text to be tested
Now that the mighty Perl has been brought out, let's do the job properly. I will use the following relatively complex text to validate my program:

<!DOCTYPE h/tml PUBLIC "-//W3C//DTD XHTML\" 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> sdfasdf//real comment here//"

Careful analysis of the characteristics of single-line comments
Correctly analyzing these characteristics is the prerequisite for writing a reasonable and efficient program. Observation shows that single-line comments have the following characteristics:

1. Double slashes within quotation marks (either single or double quotes) do not count as comments.
2. Quotation marks come in pairs, and a quotation mark escaped by a backslash between the two quotation marks does not count as the terminator. For example, in "Hello\"//world", the //world part cannot be counted as a comment.
3. A run of consecutive characters that are neither quotes nor slashes is not a comment. In addition, a single slash cannot be counted as a comment. Why must the first part exclude slashes as well as quotes? Because [^'"]+ could wrongly match into abcde//real comment "quoted string in comment", we tighten the condition to [^'"/]+. And because text such as abcde/real comment "quoted string in comment" must still be handled, we also need to state explicitly that a single slash is not a comment. The regular expression is [^'"/]|(?<!/)/(?!/).
4. Apart from the cases above, everything from a double slash to the end of the line is a comment. Because we rely on the concept of end of line, the regular expression must state explicitly that ^ and $ match at the start and end of each line. The /m modifier expresses this.
Regular expression implementation
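The four rules above can also be sketched in Python as a single substitution. This is an alternative formulation under stated assumptions, not the Perl program from this article: the names TOKEN and strip_line_comments are hypothetical, and the sketch assumes string literals do not span lines. Quoted strings are matched first and kept, so a // inside them is never seen as a comment; lone slashes match neither alternative and are left alone.

```python
import re

# Either a complete quoted string (rules 1 and 2) or a // comment (rule 4).
TOKEN = re.compile(r"""
    (?P<string>  "(?:\\.|[^"\\\n])*"  |  '(?:\\.|[^'\\\n])*'  )
  | (?P<comment> //[^\n]* )
""", re.X)

def strip_line_comments(source: str) -> str:
    """Delete // comments while leaving quoted strings untouched."""
    def keep_strings(match):
        # Keep string literals as they are; replace comments with nothing.
        return match.group("string") or ""
    return TOKEN.sub(keep_strings, source)

line1 = 'var url = "http://iregex.org"; // This is my site.'
line2 = 'var url = "//not real comment here http://iregex.org"; // This is my site.'

print(repr(strip_line_comments(line1)))
# 'var url = "http://iregex.org"; '
print(repr(strip_line_comments(line2)))
# 'var url = "//not real comment here http://iregex.org"; '
```

Scanning tokens from left to right avoids the need for lookbehind, at the cost of handling one construct per alternative.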

#!/usr/bin/perl -w
# A single-quoted heredoc keeps the backslash in the test text literal.
$str = <<'EOF';
<!DOCTYPE h/tml PUBLIC "-//W3C//DTD XHTML\" 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> sdfasdf//real comment here//"
EOF
#print $str;
if ($str =~ m%
    ^
    (?:
        [^'"/]
        |
        (?<!/)/(?!/)
        |
        (?<quote>['"])
        (?:
            \\\g{quote}
            |
            (?!\g{quote}).
        )*
        \g{quote}
    )*
    (?<comment>//.*)
    $
%xm) {
    print $+{comment};
}

A few additional notes
• The program runs on Perl 5.10 and later, because it uses the named capture (?<quote>['"]), which is a relatively advanced feature. Of course, it can still be done without 5.10 by using numbered captures; they are just less intuitive to read.
• After the match is finished, named captures are saved in the hash %+. They can be accessed easily, as in print $+{comment}.
• The x modifier is specified so that whitespace and line breaks can be added, which makes the layers of the regular expression visible. In fact, it is extremely unwise not to use x mode for complex regular expressions.
• A heredoc is used so that single and double quotes can be written in the string easily. Personally, I find it less convenient than Python's triple quotes.
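For comparison, here is a hedged Python port of the same pattern. It is a sketch, not the author's code: Python writes named groups as (?P<name>...) and backreferences as (?P=name), re.X plays the role of Perl's x modifier, and a raw triple-quoted string stands in for the heredoc. The variable names are illustrative.

```python
import re

# Raw triple-quoted string: quotes and the backslash are kept literally.
text = r'''<!DOCTYPE h/tml PUBLIC "-//W3C//DTD XHTML\" 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> sdfasdf//real comment here//"'''

pattern = re.compile(r"""
    ^
    (?:
        [^'"/]                  # neither a quote nor a slash
      | (?<!/)/(?!/)            # a single slash, not part of //
      | (?P<quote>['"])         # opening quote
        (?:
            \\(?P=quote)        # escaped quote inside the string
          | (?!(?P=quote)).     # any character except the closing quote
        )*
        (?P=quote)              # closing quote
    )*
    (?P<comment>//.*)           # the comment itself
    $
""", re.X | re.M)

match = pattern.search(text)
comment = match.group("comment") if match else None
print(comment)  # //real comment here//"
```

The named group is read back with match.group("comment"), which corresponds to $+{comment} in the Perl version.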

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
