Reading Notes: Mastering Regular Expressions, Chapter 2: Extended Introductory Examples (Part 2)

Source: Internet
Author: User

5. Using lookaround to add commas to a number

The four types of lookaround:
· Positive lookbehind (?<=...): succeeds if the subexpression can match the text to the left
· Negative lookbehind (?<!...): succeeds if the subexpression cannot match the text to the left
· Positive lookahead (?=...): succeeds if the subexpression can match the text to the right
· Negative lookahead (?!...): succeeds if the subexpression cannot match the text to the right
Positive lookahead examines the text from left to right and tries to match the subexpression; if the subexpression can match, the lookahead succeeds. Example: (?=\d) succeeds if the character to the right of the current position is a digit. Lookbehind works the same way but examines the text in the other direction, from right to left.
Note that all four types of lookaround match a position, not characters. For example, in the string "Jeffrey", the expression (?=rey) matches the position that has "rey" on its right, that is, the position between "Jeff" and "rey". If a substitution is applied to a matched position, text is inserted there and nothing is removed.
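The "Jeffrey" example can be tried directly. The following is a minimal sketch; the helper names mark_before_rey and mark_after_jeff and the "|" marker are assumptions made for illustration, not something from the book. It shows that substituting at a lookaround position only inserts text.

```perl
use strict;
use warnings;

# Insert a marker at the position found by a positive lookahead.
# (?=rey) matches a position, not characters, so nothing is removed.
sub mark_before_rey {
    my ($text) = @_;
    (my $copy = $text) =~ s/(?=rey)/|/;
    return $copy;
}

# Positive lookbehind finds the same position by inspecting the left side.
sub mark_after_jeff {
    my ($text) = @_;
    (my $copy = $text) =~ s/(?<=Jeff)/|/;
    return $copy;
}

print mark_before_rey("Jeffrey"), "\n";   # Jeff|rey
print mark_after_jeff("Jeffrey"), "\n";   # Jeff|rey
```

Both calls return a string one character longer than the input, because the match itself has zero length.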
$value =~ s/(?<=\d)(?=(?:\d\d\d)+$)/,/g; matches every position that has a digit on its left and, on its right, a run of digits up to the end of the string whose length is an exact multiple of 3.
Take the number 12345678 as an example: the first match is the position between 2 and 3, giving 12,345678; the second match is the position between 5 and 6, giving 12,345,678.
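The walk-through above can be checked with a short sketch. It assumes the pattern combines a lookbehind for one digit with a lookahead for groups of three digits up to the end of the string; the function name commify is a hypothetical helper used only for illustration.

```perl
use strict;
use warnings;

# Insert commas at every position that has a digit on the left and
# a multiple of three digits between it and the end of the string.
sub commify {
    my ($value) = @_;
    $value =~ s/(?<=\d)(?=(?:\d\d\d)+$)/,/g;
    return $value;
}

print commify("12345678"), "\n";   # 12,345,678
print commify("1234"), "\n";       # 1,234
print commify("123"), "\n";        # 123
```

Because each match is a zero-length position, /g can keep finding further positions in the same run of digits.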
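Compare this with a substitution that uses no lookaround: it captures the leading digits in $1 and the following groups of three digits, up to the end, in $2, and replaces the match with $1,$2 under /g. On its first attempt this expression matches the entire string 12345678, with $1 = 12 and $2 = 345678, so the result is 12,345678. Because the first match already consumed the whole string, there is nothing left for /g to match, and the substitution is finished. This is very different from repeated matching with lookaround, where each match covers only a position.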
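To see the difference in practice, here is a sketch of a consuming substitution. The exact pattern is an assumption chosen to reproduce the behavior described above ($1 = 12, $2 = 345678); it is not necessarily the expression used in the book, and commify_consuming is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical consuming pattern (an assumption, not taken from the book):
# $1 takes as few leading digits as possible, $2 takes the groups of three
# digits that remain up to the end of the string.
sub commify_consuming {
    my ($value) = @_;
    $value =~ s/(\d+?)((?:\d\d\d)+)$/$1,$2/g;
    return $value;
}

print commify_consuming("12345678"), "\n";   # 12,345678
```

Only one comma is inserted: the single match consumed all eight digits, so /g finds nothing more to replace.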

6. Text-to-HTML Conversion

This example shows that regular expressions can be written in ways I had not seen before. The code is as follows:

undef $/;                 # enter "file-slurp" mode
$text = <>;               # read the whole first file named on the command line

$text =~ s/&/&amp;/g;     # make the basic HTML ...
$text =~ s/</&lt;/g;      # ... characters &, < and > ...
$text =~ s/>/&gt;/g;      # ... HTML-safe

$text =~ s/^\s*$/<p>/mg;  # separate paragraphs

# turn email addresses into links
$text =~ s{
    \b
    # capture the address in $1
    (
        \w[-.\w]*                                    # username
        \@
        [-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info)    # hostname
    )
    \b
}{<a href="mailto:$1">$1</a>}gix;

print $text;              # display the HTML text

· $text =~ s/^\s*$/<p>/mg; # separate paragraphs
/g makes the substitution global: every match is replaced, not just the first.
/m turns on enhanced line-anchor mode: ^ and $ switch from string mode to logical-line mode. In string mode a text has only one start (^) and one end ($); in logical-line mode every line has its own start (^) and end ($).
· $text =~ s{regex}{replacement}modifiers
Perl lets you choose your own delimiters. The usual form is s/.../.../, but you can also write s{...}{...} or s|...|...|.
With s{...}{...}, the delimiters are braces rather than "/", so the "/" in </a> can be written directly without escaping.
· The /x modifier
A long regular expression written on a single line is hard to read and to annotate. The /x modifier lets you lay the expression out freely across several lines to improve readability, and comments introduced by # can appear inside the expression. Under /x, whitespace in the expression becomes an "ignore me" metacharacter: it is skipped rather than matched. To match a literal space, escape it. \s still matches whitespace as usual; that does not change.
· Summary
Because HTML contains many tags such as </a>, writing s{...}{...} saves you from escaping "/".
When a regular expression is long, /x lets it span several lines and carry comments.
When logical lines need to be matched, /m makes ^ and $ match the start and end of each logical line.
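The three points in this summary can be tried on small strings. This is a sketch only; the helper names prefix_lines and bold_greeting, the "> " prefix, and the sample phrase are assumptions made for illustration.

```perl
use strict;
use warnings;

# /m: with it, ^ matches at the start of every logical line;
# without it, ^ matches only at the start of the string.
sub prefix_lines {
    my ($text, $use_m) = @_;
    if ($use_m) { $text =~ s/^/> /mg; }
    else        { $text =~ s/^/> /g;  }
    return $text;
}

# s{...}{...} with /x: the pattern is spread out and commented.
sub bold_greeting {
    my ($text) = @_;
    $text =~ s{
        ( hello \  world )   # the backslash keeps this one space literal
    }{<b>$1</b>}gx;          # "/" in </b> needs no escaping with brace delimiters
    return $text;
}

print bold_greeting("say hello world"), "\n";   # say <b>hello world</b>
```

Calling prefix_lines("a\nb", 1) prefixes both lines, while prefix_lines("a\nb", 0) prefixes only the first. In bold_greeting the unescaped spaces in the pattern are ignored, so "helloworld" without a space is not matched.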

7. Handling doubled words

The task is to highlight words that are repeated within a sentence. Only the lines that contain a doubled word are shown, and each of them is prefixed with the name of the file in which it appears.

$/ = ".\n";   # set a special "chunk mode": a chunk ends with a period followed by a newline

while (<>)
{
    next unless s{
        # match a word:
        \b              # start of a word
        ([a-z]+)        # store the word in $1
        # match any amount of whitespace and/or tags after the word
        (               # store what intervenes in $2
            (?:
                \s          # whitespace
              |             # or
                <[^>]+>     # a tag of the form <TAG>
            )+          # at least once
        )
        # match the first word again
        (\1\b)
    }
    # the regular expression is above; the replacement string is below, followed by the modifiers /i, /g and /x
    {\e[7m$1\e[m$2\e[7m$3\e[m}igx;
    s/^(?:[^\e]*\n)+//mg;   # remove all unmarked lines
    s/^/$ARGV: /mg;         # add the file name at the start of each line
    print;
}
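The substitution at the heart of the program can be tried on a plain string. The sketch below is an adaptation, not the book's code: it wraps each doubled word in "*" markers instead of the ANSI escape sequences so the result is easy to inspect, and the function name mark_doubles is hypothetical.

```perl
use strict;
use warnings;

# Mark doubled words, allowing whitespace and tags between the two copies.
sub mark_doubles {
    my ($text) = @_;
    $text =~ s{
        \b ([a-z]+)                # a word, saved in capture group 1
        ( (?: \s | <[^>]+> )+ )    # whitespace and/or tags, capture group 2
        (\1\b)                     # the same word again, capture group 3
    }{*$1*$2*$3*}igx;
    return $text;
}

print mark_doubles("this is is a test"), "\n";        # this *is* *is* a test
print mark_doubles("very <i>very</i> good"), "\n";    # *very* <i>*very*</i> good
print mark_doubles("the theory"), "\n";               # the theory
```

The last call shows why the closing \b matters: without it, the "the" at the start of "theory" would count as a repeat.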

· $/ = ".\n"
$/ is a special variable, the input record separator. Normally a record is a line that ends with a newline (\n). In this chunk mode a record ends at ".\n" instead, so a record (roughly one sentence) may span several lines.
· while (<>) and print
Inside while (<>), each record that is read is assigned to the special default variable $_, which s/.../.../ and print operate on when no other variable is given.
· next unless
next unless EXPR skips the rest of the loop body and moves on to the next record when EXPR is false. Here EXPR is the substitution, which is false when no doubled word is found. It is similar to if (...) {} else continue; in C++.
· {\e[7m$1\e[m$2\e[7m$3\e[m}igx
\e[7m marks the start of highlighted text, and \e[m marks the end.
· s/^(?:[^\e]*\n)+//mg;
The highlighting codes are embedded in the lines that contain doubled words, so any line without an escape character has no doubled word. Matching such lines and replacing them with nothing deletes them.
· s/^/$ARGV: /mg; # add the file name at the start of each line
$ARGV holds the name of the file currently being read.
· Summary
The code is taken directly from the book. I have not tested it, so I cannot guarantee that it runs as is. Still, the code introduces a lot, especially the /m and /x modifiers. Together with the /g and /i modifiers covered earlier, it deepens my understanding of regular expressions.
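Since the program reads from files, two of its steps are easier to check in isolation. The following sketch makes some assumptions for illustration: it reads from an in-memory filehandle instead of a real file, and the helper names read_chunks and drop_unmarked_lines are hypothetical.

```perl
use strict;
use warnings;

# Chunk mode: with $/ set to ".\n", each read returns text up to and
# including a period followed by a newline, possibly spanning lines.
sub read_chunks {
    my ($text) = @_;
    local $/ = ".\n";
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @chunks = <$fh>;
    close $fh;
    return @chunks;
}

# Remove every line that contains no escape character, that is,
# every line in which nothing was highlighted.
sub drop_unmarked_lines {
    my ($chunk) = @_;
    $chunk =~ s/^(?:[^\e]*\n)+//mg;
    return $chunk;
}

my @chunks = read_chunks("One\ntwo.\nThree.\n");
print scalar(@chunks), "\n";   # 2
```

The first chunk is "One\ntwo.\n", which shows a record spanning two lines. Passing a chunk with one highlighted line between two plain lines to drop_unmarked_lines keeps only the highlighted line.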

8. Summary

The second chapter uses examples to present many concrete regular expressions, and it also introduces Perl. I have learned a lot so far: the meaning of metacharacters, escaping, $text =~ m/.../ and $text =~ s/.../.../, the modifiers /i, /g, /m and /x, lookaround, and so on.
