Using regular expressions to delete comments in Python

Source: Internet
Author: User

Examples of non-greedy and multiline regular expression matching in Python.
Some regex tips:


1. Non-greedy flag

>>> re.findall(r"a(\d+?)", "a23b")
['2']
>>> re.findall(r"a(\d+)", "a23b")
['23']

Note the comparison with this case, where the trailing b forces the non-greedy match to extend:

>>> re.findall(r"a(\d+)b", "a23b")
['23']
>>> re.findall(r"a(\d+?)b", "a23b")
['23']

2. If you want to match across multiple lines, add the re.S and re.M flags

re.S: . will match line breaks. By default, . does not match newline characters.

>>> re.findall(r"a(\d+)b.+a(\d+)b", "a23b\na34b")
[]
>>> re.findall(r"a(\d+)b.+a(\d+)b", "a23b\na34b", re.S)
[('23', '34')]

re.M: ^ and $ will match at the start and end of every line. By default, ^ matches only at the start of the string, and $ only at its end (or just before a trailing newline).

>>> re.findall(r"^a(\d+)b", "a23b\na34b")
['23']
>>> re.findall(r"^a(\d+)b", "a23b\na34b", re.M)
['23', '34']

However, if there is no ^ anchor,

>>> re.findall(r"a(\d+)b", "a23b\na23b")
['23', '23']

it is clear that re.M is not required.
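As a recap, the flag behaviour shown in the session above can be collected into one small script. This is only a sketch; the variable names are illustrative and not part of the original examples.

```python
import re

# Greedy vs. non-greedy: \d+? stops after the first digit unless
# something after it forces the match to continue.
greedy = re.findall(r"a(\d+)", "a23b")
lazy = re.findall(r"a(\d+?)", "a23b")
lazy_anchored = re.findall(r"a(\d+?)b", "a23b")

text = "a23b\na34b"

# re.S (DOTALL): "." also matches the newline between the two lines.
without_s = re.findall(r"a(\d+)b.+a(\d+)b", text)
with_s = re.findall(r"a(\d+)b.+a(\d+)b", text, re.S)

# re.M (MULTILINE): "^" matches at the start of every line.
without_m = re.findall(r"^a(\d+)b", text)
with_m = re.findall(r"^a(\d+)b", text, re.M)

print(greedy, lazy, lazy_anchored)
print(without_s, with_s)
print(without_m, with_m)
```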



As I have said before, the way to solve this problem is to use [\s\S] to emulate a dot that also matches line breaks. The original post is "DIY Universal Wildcard". With this technique, you can write JavaScript code like the following to remove multiline comments:

// Remove C-style multi-line comments
function uncomment_multi(str)
{
    return str.replace(/\/\*[\s\S]*?\*\//g, "");
}
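Since the topic here is Python, the same idea can be sketched there as well. The function name uncomment_multi mirrors the JavaScript one, and the sample string is made up for illustration. Like the JavaScript version, this sketch does not protect /* sequences that appear inside string literals.

```python
import re

# [\s\S] matches any character including newlines; *? keeps the match
# non-greedy so each comment ends at the first closing */.
BLOCK_COMMENT = re.compile(r"/\*[\s\S]*?\*/")


def uncomment_multi(source: str) -> str:
    """Remove C-style /* ... */ comments (string literals are not protected)."""
    return BLOCK_COMMENT.sub("", source)


sample = "int a; /* first\ncomment */ int b; /* second */ int c;"
print(uncomment_multi(sample))
```

In Python, r"/\*.*?\*/" with the re.S flag would be an equivalent way to let the dot cross line breaks.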

JavaScript implementation for single-line comments (imperfect)
Single-line comments are not as simple as you might think. If you believe that str.replace(/\/\/.*$/mg, "") is all it takes, then you must make sure that the text to be processed is of the simplest kind, like this:


var pig = "ASE"; // This is a comment.

In practice, this does not work. Examples like the following abound in real-world programs:

var url = "http://iregex.org"; // This is my site.
var url = "//not real comment here http://iregex.org"; // This is my site.
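To make the failure concrete, here is a small Python sketch that applies the naive rule (everything from the first double slash to the end of the line is a comment) to the three sample lines. The names NAIVE, samples and stripped are illustrative only.

```python
import re

# Naive rule: everything from the first // to the end of the line is a comment.
NAIVE = re.compile(r"//.*$", re.M)

samples = [
    'var pig = "ASE"; // This is a comment.',
    'var url = "http://iregex.org"; // This is my site.',
    'var url = "//not real comment here http://iregex.org"; // This is my site.',
]

stripped = [NAIVE.sub("", line) for line in samples]
for before, after in zip(samples, stripped):
    print(repr(before), "->", repr(after))
```

Only the first line survives intact; the other two are cut off inside the string literal.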

I tried to write a JavaScript function that emulates a negative lookbehind so that it could handle http://, but the function is not pretty, and it cannot handle a double slash inside quotes. I am really disappointed by how limited JavaScript's regular expression support is, so I turned to Perl to do the job. First, let's take a look at the JavaScript function I wrote to delete single-line comments:

function uncomment_single(str)
{
    var result;
    var single = new RegExp("//", "ig");
    var start = 0;
    // Parenthesize the assignment so that result holds the match, not a boolean
    while ((result = single.exec(str)) != null)
    {
        var part = str.slice(start, result.index);
        // Emulated negative lookbehind: skip a // that directly follows "http:"
        var negleft = new RegExp("http:$", "i");
        if (!negleft.test(part))
        {
            return str.slice(0, result.index);
        }
        start = result.index + result[0].length - 1;
    }
    return str;
}
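For comparison, Python's re module supports fixed-width negative lookbehind directly, so the same "not preceded by http:" check can be sketched in one pattern. This is an illustration under that assumption, not the author's code, and it shares the limitation described above: a double slash inside quotes is still treated as a comment.

```python
import re

# (?<!http:) is a negative lookbehind: the // must not directly follow "http:".
SINGLE = re.compile(r"(?<!http:)//.*$", re.M | re.I)


def uncomment_single(source: str) -> str:
    """Strip // comments, skipping the // of http:// (quotes are not handled)."""
    return SINGLE.sub("", source)


handled = uncomment_single('var url = "http://iregex.org"; // This is my site.')
broken = uncomment_single(
    'var url = "//not real comment here http://iregex.org"; // This is my site.'
)
print(repr(handled))
print(repr(broken))
```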

Perl version for deleting comments, with source code (relatively complete)
Text to be tested
Well, now that the mighty Perl has been brought out, let's go all the way. I will use the following relatively complex text to validate my program:

<!DOCTYPE h/tml PUBLIC "-//W3C//DTD XHTML\" 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> sdfasdf//real comment here//"

Careful analysis of the characteristics of single-line comments
Analyzing these characteristics correctly is the prerequisite for writing a sound and efficient program. Observation shows that single-line comments have the following characteristics:

1. Double slashes inside quotation marks (single or double) do not count as comments.
2. Quotation marks come in pairs, and a quotation mark escaped by a backslash between the two does not count as a terminator. For example, in "Hello\"//world", the //world part cannot be counted as a comment.
3. A run of consecutive characters that are neither quotes nor slashes is not a comment. In addition, a single slash does not count as a comment. Why must the first part exclude slashes as well as quotes? Because [^'"]+ could wrongly match all of abcde//real comment "quoted string in comment", we tighten the condition to [^'"/]+. And because text containing a lone slash, such as abcde/real comment "quoted string in comment", must still be accepted, we add specifically that a single slash is not a comment. The regular expression is [^'"/]|(?<!/)/(?!/).
4. Apart from the above, everything from a double slash to the end of the line is a comment. Because we rely on the concept of end of line, we must state explicitly in the regular expression that ^ and $ match at the start and end of each line. This is done with the /m modifier.
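The reasoning in point 3 can be checked with a short Python sketch. The helper names naive_prefix and code_prefix are hypothetical; they simply return the leading part of a line that each candidate pattern accepts as "not a comment".

```python
import re

# Rule 3: a character that is neither a quote nor a slash, or a lone slash.
NON_COMMENT_CHAR = r"""[^'"/]|(?<!/)/(?!/)"""


def naive_prefix(line: str) -> str:
    """Leading run accepted by [^'"]*, which wrongly swallows the // as well."""
    return re.match(r"""[^'"]*""", line).group(0)


def code_prefix(line: str) -> str:
    """Leading run accepted by the rule-3 pattern; it stops before a //."""
    return re.match(rf"(?:{NON_COMMENT_CHAR})*", line).group(0)


double = 'abcde//real comment "quoted string in comment"'
single = 'abcde/real comment "quoted string in comment"'

print(repr(naive_prefix(double)))
print(repr(code_prefix(double)))
print(repr(code_prefix(single)))
```

The tightened pattern stops exactly at the double slash, while still passing over a lone slash.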
Regular expression implementation

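#!/usr/bin/perl -w
use 5.010;    # named captures and \g{name} need Perl 5.10 or later

# A non-interpolating heredoc keeps the backslash and the quotes literal
$str = <<'EOF';
<!DOCTYPE h/tml PUBLIC "-//W3C//DTD XHTML\" 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> sdfasdf//real comment here//"
EOF
#print $str;
if ($str =~
m%
^
(?:
    [^'"/]|                 # rule 3: neither a quote nor a slash
    (?<!/)/(?!/)|           # rule 3: a lone slash
    (?<quote>['"])          # rules 1 and 2: opening quote
    (?:\\\g{quote}|         # a backslash-escaped quote inside the string
    (?!\g{quote}).)*        # any character other than the closing quote
    \g{quote}               # the matching closing quote
)*
(?<comment>//.*)            # rule 4: double slash up to the end of the line
$
%xm)
{
    print $+{comment};
}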
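For readers who work in Python, the Perl pattern can be ported fairly directly. The sketch below is an assumption-laden translation rather than the author's code: Python spells named groups (?P<quote>...) and named backreferences (?P=quote), re.VERBOSE plays the role of /x, and re.MULTILINE the role of /m. The test text is written as a raw triple-quoted string so that the backslash and both kinds of quotes stay literal.

```python
import re

COMMENT_RE = re.compile(
    r"""
    ^
    (?:
        [^'"/]                  # rule 3: neither a quote nor a slash
      | (?<!/)/(?!/)            # rule 3: a lone slash
      | (?P<quote>['"])         # rules 1 and 2: opening quote
        (?: \\(?P=quote)        # a backslash-escaped quote
          | (?!(?P=quote)).     # any character other than the closing quote
        )*
        (?P=quote)              # the matching closing quote
    )*
    (?P<comment>//.*)           # rule 4: double slash to end of line
    $
    """,
    re.VERBOSE | re.MULTILINE,
)

text = r'''<!DOCTYPE h/tml PUBLIC "-//W3C//DTD XHTML\" 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> sdfasdf//real comment here//"'''

match = COMMENT_RE.search(text)
comment = match.group("comment") if match else None
print(comment)
```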

A few additional notes
• The program runs successfully under Perl 5.10. It uses the named capture (?<quote>['"]), which is a relatively advanced feature. Of course, it is not impossible to do without 5.10: we can use numbered captures instead, it just looks less intuitive.
• After the match succeeds, the named captures are stored in the hash %+. They can then be retrieved conveniently, as in print $+{comment}.
• The x modifier is specified so that whitespace and line breaks can be added, which gives the regular expression a layered look. In fact, it is extremely unwise not to use x mode for complex regular expressions.
• A heredoc is used so that single and double quotes can be written easily inside the string. Personally, I find it less convenient than Python's triple quotes.
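The numbered-capture alternative mentioned in the first note can be sketched in Python, this time deleting the comments instead of printing them. Everything here is illustrative: the name strip_line_comments is hypothetical, group 1 holds the code before the comment, group 2 holds the opening quote, and, as a deliberate deviation from the Perl pattern, newlines are excluded from the first alternative so that a match never runs across lines.

```python
import re

LINE_COMMENT = re.compile(
    r"""
    ^(                          # group 1: the code before the comment
        (?:
            [^'"/\n]            # ordinary character (newline excluded here)
          | (?<!/)/(?!/)        # lone slash
          | (['"])              # group 2: opening quote
            (?: \\\2 | (?!\2). )*
            \2                  # matching closing quote
        )*
    )
    //.*$                       # the comment itself
    """,
    re.VERBOSE | re.MULTILINE,
)


def strip_line_comments(source: str) -> str:
    """Delete // comments, keeping quoted strings and lone slashes intact."""
    return LINE_COMMENT.sub(r"\1", source)


source = '''var url = "http://iregex.org"; // This is my site.
var s = '//not a comment'; // real comment
var x = 1 / 2;'''

cleaned = strip_line_comments(source)
print(cleaned)
```

The triple-quoted string plays the role of the Perl heredoc: both quote characters can appear in it without escaping.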
