Brother Lian Learning Python: regular expression matching rules

Source: Internet
Author: User

Basic pattern matching

Everything starts with the basics. Patterns are the most basic elements of regular expressions: a set of characters that describes the features of a string. A pattern can be simple and consist of an ordinary string, or it can be very complex, often using special characters to represent a range of characters, repetition, or context. For example:

^once

This pattern contains the special character ^, which means the pattern matches only strings that begin with once. For example, it matches the string "once upon a time" but does not match "There once was a man from NewYork". Just as the ^ symbol marks the beginning, the $ symbol matches strings that end with a given pattern.

bucket$

This pattern matches "Who kept all of his cash in a bucket" but does not match "buckets". Used together, ^ and $ indicate an exact match (the string is the same as the pattern). For example:

^bucket$

matches only the string "bucket". If a pattern includes neither ^ nor $, it matches any string that contains the pattern. For example, the pattern

once

matches the string

There once was a man from NewYork
Who kept all of his cash in a bucket.

The letters in the pattern (o-n-c-e) are literal characters, that is, each one stands for the letter itself. Digits work the same way. Some slightly more complex characters, such as punctuation and whitespace characters (spaces, tabs, etc.), require escape sequences. All escape sequences begin with a backslash (\). The escape sequence for a tab is \t, so to check whether a string starts with a tab you can use this pattern:

^\t

Similarly, \n stands for "new line" and \r for carriage return. Other special symbols can be written with a backslash in front: the backslash itself is written \\, a period is written \., and so on.
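The anchor and escape rules above can be tried out with Python's standard re module. This is a minimal sketch: the helper name matches is made up for illustration, and it uses re.search so that a pattern without ^ or $ can match anywhere in the string, as the text describes.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None

limerick_line = "There once was a man from NewYork"

print(matches(r"^once", "once upon a time"))    # True: string begins with "once"
print(matches(r"^once", limerick_line))         # False: "once" is not at the start
print(matches(r"bucket$", "Who kept all of his cash in a bucket"))  # True
print(matches(r"bucket$", "buckets"))           # False: ends with "s"
print(matches(r"^bucket$", "bucket"))           # True: exact match
print(matches(r"once", limerick_line))          # True: no anchors, matches anywhere

# Escape sequences: tab, literal period, literal backslash.
print(matches(r"^\t", "\tindented"))            # True: starts with a tab
print(matches(r"\.", "no period here"))         # False: no literal "."
print(matches(r"\\", "C:\\temp"))               # True: contains a backslash
```

One Python-specific detail worth knowing: $ also matches just before a single trailing newline, so bucket$ would accept "bucket\n" as well. Raw strings (the r"..." prefix) keep Python from interpreting the backslashes before the regular expression engine sees them.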
Character clusters

In Internet programs, regular expressions are often used to validate user input. When a user submits a form, ordinary literal characters are not enough to decide whether the phone number, address, email address, credit card number, and so on are valid. A looser way of describing the pattern is needed, and that is the character cluster. To create a character cluster that represents all vowels, place the vowels inside square brackets:

[AaEeIiOoUu]

This pattern matches any vowel, but only a single character. A hyphen represents a range of characters, for example:

[a-z] // Match all lowercase letters
[A-Z] // Match all uppercase letters
[a-zA-Z] // Match all letters
[0-9] // Match all digits
[0-9\.\-] // Match all digits, periods and minus signs
[ \f\r\t\n] // Match all whitespace characters

Again, each of these represents only one character, which is very important. If you want to match a string consisting of one lowercase letter followed by one digit, such as "z2", "t6" or "g7", but not "ab2", "r2d3", or "b52", use this pattern:

^[a-z][0-9]$

Although [a-z] covers a range of 26 letters, here it can only match a string whose first character is a lowercase letter.

As mentioned earlier, ^ marks the beginning of a string, but it has a second meaning. When ^ is the first character inside square brackets, it means "not" or "exclude", and is often used to rule out certain characters. Continuing the previous example, to require that the first character is not a digit:

^[^0-9][0-9]$

This pattern matches "&5", "g7" and "-2", but does not match "12" or "66". Here are a few more examples of excluding specific characters:

[^a-z] // All characters except lowercase letters
[^\\\/\^] // All characters except (\), (/) and (^)
[^\"\'] // All characters except double quotation marks (") and single quotation marks (')

The special character "." (dot, period) stands for any character except a newline. So the pattern ^.5$ matches any two-character string that ends with the digit 5 and begins with any character other than a newline. The pattern . on its own matches any string except the empty string and strings that contain only newlines.

PHP's regular expressions have some built-in general character clusters, listed below:

Character cluster    Description
[[:alpha:]]          Any letter
[[:digit:]]          Any digit
[[:alnum:]]          Any letter or digit
[[:space:]]          Any whitespace character
[[:upper:]]          Any uppercase letter
[[:lower:]]          Any lowercase letter
[[:punct:]]          Any punctuation mark
[[:xdigit:]]         Any hexadecimal digit, equivalent to [0-9a-fA-F]

Determining repeated occurrences

So far you know how to match a single letter or digit, but more often you will want to match a word or a group of digits. A word consists of several letters, and a group of digits consists of several single digits. Curly braces ({}) following a character or character cluster specify how many times the preceding content is repeated.

Character cluster    Description
^[a-zA-Z_]$          A single letter or underscore
^[[:alpha:]]{3}$     All 3-letter words
^a$                  The letter a
^a{4}$               aaaa
^a{2,4}$             aa, aaa or aaaa
^a{1,3}$             a, aa or aaa
^a{2,}$              Strings made up of two or more a's
^a{2,}               For example aardvark and aaab, but not apple
a{2,}                For example baad and aaa, but not Nantucket
\t{2}                Two tabs
.{2}                 Any two characters

These examples show three different uses of curly braces. A single number {x} means the preceding character or character cluster appears exactly x times. A number followed by a comma {x,} means the preceding content appears x or more times. Two numbers separated by a comma {x,y} mean the preceding content appears at least x times but no more than y times.
We can extend these patterns to longer words or numbers:

^[a-zA-Z0-9_]{1,}$ // All strings that contain one or more letters, digits, or underscores
^[1-9][0-9]{0,}$ // All positive integers
^\-{0,1}[0-9]{1,}$ // All integers
^[-]?[0-9]+\.?[0-9]+$ // All floating-point numbers

The last example is not easy to read, is it? Look at it this way: it begins (^) with an optional minus sign ([-]?), followed by 1 or more digits ([0-9]+), then an optional decimal point (\.?), followed by 1 or more digits ([0-9]+), with nothing else after that ($). A simpler way to write such patterns is described next.

The special character ? is equal to {0,1}. Both mean "0 or 1 of the preceding content", in other words the preceding content is optional. So the example just given can be written as:

^\-?[0-9]{1,}\.?[0-9]{1,}$

The special character * is equal to {0,}. Both mean 0 or more of the preceding content. Finally, the character + is equal to {1,}, which means 1 or more of the preceding content. So the 4 examples above can be written as:

^[a-zA-Z0-9_]+$ // All strings that contain one or more letters, digits, or underscores
^[1-9][0-9]*$ // All positive integers
^\-?[0-9]+$ // All integers
^\-?[0-9]+\.?[0-9]+$ // All floating-point numbers

Of course, this does not technically reduce the complexity of the regular expressions, but it does make them easier to read.
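The equivalence between the curly-brace forms and the ?, * and + shorthands can be checked with a short Python sketch. The dictionary names, the sample strings, and the helper accepted are assumptions made for illustration. The fully braced float pattern is a rewrite of the article's pattern using only {0,1} and {1,}.

```python
import re

# Long forms use only curly braces; short forms use ?, * and +.
LONG_FORMS = {
    "identifier": r"^[a-zA-Z0-9_]{1,}$",
    "positive_int": r"^[1-9][0-9]{0,}$",
    "integer": r"^\-{0,1}[0-9]{1,}$",
    "float": r"^\-{0,1}[0-9]{1,}\.{0,1}[0-9]{1,}$",
}
SHORT_FORMS = {
    "identifier": r"^[a-zA-Z0-9_]+$",
    "positive_int": r"^[1-9][0-9]*$",
    "integer": r"^\-?[0-9]+$",
    "float": r"^\-?[0-9]+\.?[0-9]+$",
}

# Hypothetical sample inputs chosen to exercise each pattern.
SAMPLES = ["abc_123", "42", "-42", "007", "3.14", "-0.5", "5", "1.", "", "a b"]

def accepted(pattern):
    """Return the sample strings that the pattern matches."""
    compiled = re.compile(pattern)
    return [text for text in SAMPLES if compiled.search(text)]

for name in LONG_FORMS:
    short_hits = accepted(SHORT_FORMS[name])
    same = accepted(LONG_FORMS[name]) == short_hits
    print(name, short_hits, "same as long form:", same)
```

Running this shows that each shorthand accepts the same samples as its braced form. It also exposes a limitation of the floating-point pattern as written: because both digit groups require at least one digit, a single-digit string such as "5" is rejected, while "42" is accepted.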

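Character clusters, the dot, and curly-brace counts can be exercised in the same way. The sketch below assumes a made-up helper named keep_matching and uses the example strings from the article. Python's re module does not support the [[:alpha:]] style clusters that the article lists for PHP, so [a-zA-Z] is used here as an ASCII-only stand-in.

```python
import re

def keep_matching(pattern, candidates):
    """Return the candidates in which the pattern is found (hypothetical helper)."""
    compiled = re.compile(pattern)
    return [text for text in candidates if compiled.search(text)]

# One bracket pair stands for exactly one character.
print(keep_matching(r"^[a-z][0-9]$", ["z2", "t6", "g7", "ab2", "r2d3", "b52"]))
# ['z2', 't6', 'g7']

# Inside brackets, a leading ^ excludes the listed characters.
print(keep_matching(r"^[^0-9][0-9]$", ["&5", "g7", "-2", "12", "66"]))
# ['&5', 'g7', '-2']

# The dot matches any single character except a newline.
print(keep_matching(r"^.5$", ["x5", "55", "\n5", "5"]))
# ['x5', '55']

# Curly braces: exact count, bounded range, and open-ended minimum.
print(keep_matching(r"^a{4}$", ["aaa", "aaaa", "aaaaa"]))
# ['aaaa']
print(keep_matching(r"^a{2,4}$", ["a", "aa", "aaa", "aaaa", "aaaaa"]))
# ['aa', 'aaa', 'aaaa']
print(keep_matching(r"^a{2,}", ["aardvark", "aaab", "apple"]))
# ['aardvark', 'aaab']
print(keep_matching(r"a{2,}", ["baad", "aaa", "Nantucket"]))
# ['baad', 'aaa']

# ASCII stand-in for ^[[:alpha:]]{3}$ (all 3-letter words).
print(keep_matching(r"^[a-zA-Z]{3}$", ["cat", "Dog", "ab", "abcd", "a1c"]))
# ['cat', 'Dog']
```

Filtering a list of candidates keeps the accepted and rejected strings side by side, which makes it easy to compare the output against the examples given in the text.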
  
