Regular Expressions: Getting Started

Source: Internet
Author: User

I have been programming for seven years, and from what I see around me, quite a few programmers regard regular expressions as something very advanced, even as a gap that cannot be crossed. The reason is simple: they have never spent a couple of hours learning the basics. Yes, it takes only a few hours to learn enough to write the regular expressions you need. I wrote this introductory article for those who have not yet dared to touch regular expressions, and I hope it helps.


What is a regular expression?
A more formal explanation: a regular expression uses a single string to describe and match a set of strings that conform to a certain syntactic rule.
In plainer language: a regular expression is one name for a whole class of strings. Just as the term "Chinese character" collectively covers every Chinese character, a single pattern stands for every string that fits it.
Is that clear now? If not, don't worry. Keep reading, and the simple examples below will help you understand regular expressions.


Why use regular expressions?
Anything a regular expression can do can also be done with ordinary code. So why learn regular expressions? The reasons are simple:
1. Regular expressions can greatly simplify the code and make it easier to write;
2. String handling written with regular expressions is easier to understand;
3. Regular expressions usually run faster than equivalent matching logic written by hand.
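
To make the first point concrete, here is a minimal sketch that performs the same check twice: once with a hand-written loop and once with a regular expression. The function names isLowercaseManual and isLowercaseRegex are made up for this illustration.

```javascript
// Hand-written check: is the string one or more lowercase ASCII letters?
function isLowercaseManual(s) {
  if (s.length === 0) return false;
  for (var i = 0; i < s.length; i++) {
    var c = s.charAt(i);
    if (c < "a" || c > "z") return false;
  }
  return true;
}

// The same check expressed as a regular expression.
function isLowercaseRegex(s) {
  return /^[a-z]+$/.test(s);
}

console.log(isLowercaseManual("abcd"), isLowercaseRegex("abcd"));   // true true
console.log(isLowercaseManual("8ddde"), isLowercaseRegex("8ddde")); // false false
```

The loop takes several lines and an explicit empty-string check; the pattern says the same thing in one line.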


How to use regular expressions
How you use regular expressions depends on the programming language. Let's start with the familiar JavaScript.

var reg = new RegExp("^[a-z]+$"); // can also be written as: var reg = /^[a-z]+$/;

^: marks the beginning of the string
[a-z]: matches any lowercase letter
+: the preceding item appears at least once, with no upper limit
$: marks the end of the string


Application one: reg.test("abcd") // true
The string consists of lowercase letters from beginning to end, so the match succeeds and test returns true.


Application two: reg.test("8ddde") // false
The match fails because the string does not begin with a letter, so test returns false.
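
To see why the ^ and $ anchors matter, the short sketch below compares the anchored pattern with an unanchored one. The variable names anchored and unanchored are chosen only for this illustration.

```javascript
var anchored = /^[a-z]+$/;  // the whole string must be lowercase letters
var unanchored = /[a-z]+/;  // a run of lowercase letters anywhere is enough

console.log(anchored.test("8ddde"));   // false
console.log(unanchored.test("8ddde")); // true, because "ddde" matches
console.log(anchored.test(""));        // false, + requires at least one letter
```

Without the anchors, test only asks whether the pattern occurs somewhere in the string, which is rarely what a validation check wants.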


Now let's see how the same check is written in C# (first import the namespace System.Text.RegularExpressions).
Regex reg = new Regex("^[a-z]+$");
reg.IsMatch("abcd"); // true
reg.IsMatch("8ddde"); // false


The examples above check whether a string follows an expected rule. Next we use a regular expression to extract the parts we need from a larger string.
var str = "Regular expression (Regular Expression) is a logical formula for string manipulation";
var reg = /[a-zA-Z]+/g; // the trailing "g" finds all matches; without "g", only the first match is found
var result = str.match(reg); // result is an array containing every match
// result[0]: Regular
// result[1]: expression
// result[2]: Regular
// result[3]: Expression
// ... and so on for the remaining words, 11 matches in total
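
The difference the "g" flag makes can be sketched as follows. The input string here is shortened for the illustration and is not the one used above.

```javascript
var text = "Regular Expression";

// Without "g": match returns the first match plus details such as its index.
var first = text.match(/[a-zA-Z]+/);
console.log(first[0], first.index); // Regular 0

// With "g": match returns a plain array of every matched substring.
var all = text.match(/[a-zA-Z]+/g);
console.log(all.length, all[0], all[1]); // 2 Regular Expression

// When nothing matches, match returns null rather than an empty array.
var none = "1234".match(/[a-zA-Z]+/g);
console.log(none); // null
```

Because match returns null when nothing is found, code that loops over the result should check for null first.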


And here is how the same thing is done in C#.
string str = "Regular expression (Regular Expression) is a logical formula for string manipulation";
Regex reg = new Regex("[a-zA-Z]+");
MatchCollection result = reg.Matches(str);
foreach (Match m in result) {
    Console.WriteLine(m.Value);
}
Output (the first four of the 11 matches):
Regular
expression
Regular
Expression
...

Metacharacters in regular expressions


To write a regular expression, you need to know which characters are available in an expression and what each one means. A metacharacter stands for a whole class of characters or a position, much as the word "human" stands for people of every kind. All the metacharacters and their descriptions are listed below.

Metacharacter Description
\ Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(".
^ Matches the start position of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after "\n" or "\r".
$ Matches the end position of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before "\n" or "\r".
* Matches the preceding subexpression zero or more times. For example, "zo*" can match "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, "zo+" can match "zo" and "zoo", but cannot match "z". + is equivalent to {1,}.
? Matches the preceding subexpression zero times or once. For example, "do(es)?" can match "do" or "does". ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, "o{2}" cannot match the "o" in "Bob", but can match the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, "o{2,}" cannot match the "o" in "Bob", but can match all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
{n,m} m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there must be no space between the comma and the two numbers.
? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, for the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's.
. Matches any single character except "\n". To match any character including "\n", use a pattern such as "[\s\S]".
(pattern) Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0...$9 properties in JScript. To match the parenthesis characters themselves, use "\(" or "\)".
(?:pattern) Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, "industr(?:y|ies)" is a more compact expression than "industry|industries".
(?=pattern) Positive lookahead. Matches the search string at any position where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, "Windows(?=95|98|NT|2000)" can match "Windows" in "Windows2000", but cannot match "Windows" in "Windows3.1". A lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern) Negative lookahead. Matches the search string at any position where a string that does not match pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, "Windows(?!95|98|NT|2000)" can match "Windows" in "Windows3.1", but cannot match "Windows" in "Windows2000".
(?<=pattern) Positive lookbehind. Similar to positive lookahead, but in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" can match "Windows" in "2000Windows", but cannot match "Windows" in "3.1Windows".
(?<!pattern) Negative lookbehind. Similar to negative lookahead, but in the opposite direction. For example, "(?<!95|98|NT|2000)Windows" can match "Windows" in "3.1Windows", but cannot match "Windows" in "2000Windows".
x|y Matches x or y. For example, "z|food" can match "z" or "food". "(z|f)ood" matches "zood" or "food".
[xyz] Character set. Matches any one of the enclosed characters. For example, "[abc]" can match the "a" in "plain".
[^xyz] Negative character set. Matches any character that is not enclosed. For example, "[^abc]" can match the "p", "l", "i", and "n" in "plain".
[a-z] Character range. Matches any character within the specified range. For example, "[a-z]" can match any lowercase alphabetic character in the range "a" through "z". Note: a hyphen denotes a range only when it appears inside a character group, between two characters. At the beginning of a character group it represents the hyphen itself.
[^a-z] Negative character range. Matches any character that is not in the specified range. For example, "[^a-z]" can match any character that is not in the range "a" through "z".
\b Matches a word boundary, that is, the position between a word character and a non-word character such as a space, or the start or end of the string. For example, "er\b" can match the "er" in "never", but cannot match the "er" in "verb".
\B Matches a non-word boundary. "er\B" can match the "er" in "verb", but cannot match the "er" in "never".
\cx Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal "c" character.
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form feed character. Equivalent to \x0c and \cL.
\n Matches a line feed character. Equivalent to \x0a and \cJ.
\r Matches a carriage return character. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab character. Equivalent to \x09 and \cI.
\v Matches a vertical tab. Equivalent to \x0b and \cK.
\w Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".
\W Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\xn Matches n, where n is a hexadecimal escape value. A hexadecimal escape value must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.
\num Matches num, where num is a positive integer. This is a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and both n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml If n is an octal digit (0-3) and both m and l are octal digits (0-7), matches the octal escape value nml.
\un Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
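
A few of the entries above are easier to grasp when run. The following JavaScript sketch tries out greedy and non-greedy quantifiers, capturing and non-capturing groups, lookahead, backreferences, and word boundaries. The variable names are illustrative, and lookbehind is left out because older JavaScript engines do not support it.

```javascript
// Greedy vs. non-greedy quantifiers
var greedy = "oooo".match(/o+/)[0];
var lazy = "oooo".match(/o+?/)[0];
console.log(greedy, lazy); // oooo o

// Capturing group vs. non-capturing group
var captured = "industries".match(/industr(y|ies)/);
console.log(captured[1]); // ies
var uncaptured = "industries".match(/industr(?:y|ies)/);
console.log(uncaptured.length); // 1, only the whole match is stored

// Lookahead: checks what follows without consuming it
var positive = /Windows(?=95|98|NT|2000)/;
var negative = /Windows(?!95|98|NT|2000)/;
console.log(positive.test("Windows2000")); // true
console.log(positive.test("Windows3.1"));  // false
console.log(negative.test("Windows3.1"));  // true

// Backreference: \1 repeats whatever group 1 captured
var doubled = "hello".match(/(.)\1/)[0];
console.log(doubled); // ll

// Word boundary \b vs. non-boundary \B
console.log(/er\b/.test("never")); // true
console.log(/er\b/.test("verb"));  // false
console.log(/er\B/.test("verb"));  // true
```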

Author: Zhu Huazhen
