Summary of regular expressions in Java

Source: Internet
Author: User

1. Basics of regular expressions

1.1 Period Symbol
Suppose you are playing Scrabble and want to find three-letter words that start with "t" and end with "n". If you have an English dictionary in electronic form, you can search all of it with a regular expression. To build the expression, use a wildcard: the period symbol ".". The complete expression is "t.n". It matches "tan", "ten", "tin" and "ton", but it also matches "t#n", "tpn" and even "t n", along with many other meaningless combinations. This is because the period matches any single character, including spaces and tab characters. (In Java, the period does not match line terminators unless the pattern is compiled with Pattern.DOTALL.)
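As a minimal sketch of the period symbol in java.util.regex, the following checks a few strings against "t.n". The class name DotDemo and its helper methods are hypothetical names chosen for this illustration. The second helper shows the effect of the DOTALL flag on a string that contains a newline.

```java
import java.util.regex.Pattern;

// DotDemo is a hypothetical class name used only for this illustration.
class DotDemo {
    // True when the whole input matches "t.n" under Java's default flags.
    static boolean matchesDefault(String input) {
        return Pattern.matches("t.n", input);
    }

    // Same pattern compiled with DOTALL, so "." also matches line terminators.
    static boolean matchesDotAll(String input) {
        return Pattern.compile("t.n", Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"tan", "t#n", "t n", "toon", "t\nn"};
        for (String s : samples) {
            String shown = s.replace("\n", "\\n"); // make the newline visible
            System.out.println(shown + " default=" + matchesDefault(s)
                    + " dotall=" + matchesDotAll(s));
        }
    }
}
```

Note that matching is case-sensitive by default, so "Tin" would not match "t.n" unless Pattern.CASE_INSENSITIVE is used.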

1.2 Square brackets Symbol

To solve the problem that the period matches too broadly, you can list the characters that make sense inside square brackets ("[]"). Only the characters listed inside the brackets take part in the match. In other words, the regular expression "t[aeio]n" matches only "tan", "ten", "tin" and "ton". "toon" does not match, because a bracket expression matches exactly one character.
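The bracket expression can be tried out with a short sketch like the one below. BracketDemo is a hypothetical class name. The pattern is compiled once and reused, which is the usual approach when the same expression is applied to many words.

```java
import java.util.regex.Pattern;

// BracketDemo is a hypothetical class name used only for this illustration.
class BracketDemo {
    // Exactly one of a, e, i, o is allowed between "t" and "n".
    private static final Pattern T_VOWEL_N = Pattern.compile("t[aeio]n");

    static boolean matches(String word) {
        return T_VOWEL_N.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"tan", "ten", "tin", "ton", "tun", "toon"}) {
            System.out.println(w + " -> " + matches(w));
        }
    }
}
```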

1.3 "or" symbol

If, in addition to the words matched above, you also want to match "toon", you can use the "|" operator. The basic meaning of "|" is "or". To match "toon", use the regular expression "t(a|e|i|o|oo)n". Square brackets cannot be used here, because they match only a single character; parentheses "()" are required instead. Parentheses can also be used for grouping.
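A small sketch of the "or" operator follows, assuming a hypothetical class AlternationDemo. Because the alternatives sit inside parentheses, the text they matched can also be read back as group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// AlternationDemo is a hypothetical class name used only for this illustration.
class AlternationDemo {
    private static final Pattern P = Pattern.compile("t(a|e|i|o|oo)n");

    // Returns the text captured by the parenthesized group,
    // or null when the word does not match at all.
    static String middle(String word) {
        Matcher m = P.matcher(word);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"tan", "toon", "tun", "tooon"}) {
            System.out.println(w + " -> " + middle(w));
        }
    }
}
```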

1.4 Symbol for number of matches

The following table shows the syntax for regular expressions:

Table 1.1 Regular Expression syntax

Metacharacter  Description
.       Matches any single character (in Java, line terminators are excluded by default). For example, the regular expression "b.g" matches "big", "bug" and "b g", but does not match "buug".
$       Matches the end of a line. For example, the regular expression "EJB$" matches the end of the string "I like EJB", but does not match the string "EE without EJBs!".
^       Matches the start of a line. For example, the regular expression "^Spring" matches the beginning of the string "Spring is a Java EE framework", but does not match "I use Spring in my project".
*       Matches the preceding character zero or more times. For example, the regular expression "zo*" matches "z" and "zoo"; the regular expression ".*" matches any string.
\       An escape character that makes a metacharacter match as an ordinary character. For example, the regular expression "\$" matches a dollar sign rather than the end of a line, and "\." matches a literal period rather than any character.
[]      Matches any one of the characters in the brackets. For example, the regular expression "b[aui]g" matches "bag", "bug" and "big", but does not match "beg". A hyphen "-" inside the brackets specifies a range of characters: "[0-9]" matches any digit, so "a[0-9]c" matches "a0c", "a1c", "a2c" and so on. Several ranges can be combined, for example "[a-zA-Z]" matches any uppercase or lowercase letter. The metacharacter "^" can also be used here. Unlike the "^" above, it does not mean the start of a line but "exclude": placed directly after the opening bracket, it matches any character outside the specified set. For example, "[^163A-Z]" matches any character except 1, 6, 3 and all uppercase letters.
( )     The expression enclosed in () is defined as a group, and the characters that match it are saved to a temporary area, which is useful when extracting parts of a string.
|       Performs a logical OR between two alternatives. "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
+       Matches the preceding subexpression one or more times. For example, the regular expression "9+" matches 9, 99, 999, and so on.
?       Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". This metacharacter has another use: placed after a quantifier, it makes the match non-greedy.
{n}     Matches exactly n times. For example, "e{2}" does not match the "e" in "bed", but matches the two "e" in "seed".
{n,}    Matches at least n times. For example, "e{2,}" does not match the "e" in "bed", but matches all the "e" in "seeeeeeeed".
{n,m}   Matches at least n times and at most m times. "e{1,3}" matches the first three "e" in "seeeeeeeed".
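Several rows of the table can be checked with a short sketch such as the following. QuantifierDemo and firstMatch are hypothetical names. The helper returns the first substring that a pattern finds in the input, which makes the effect of each quantifier visible. The last call in main uses "e{1,3}?" to illustrate the non-greedy form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// QuantifierDemo is a hypothetical class name used only for this illustration.
class QuantifierDemo {
    // Returns the first substring of input matched by regex, or null if none is found.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String word = "seeeeeeeed";
        System.out.println(firstMatch("zo*", "zoo"));       // zero or more "o"
        System.out.println(firstMatch("9+", "a999b"));      // one or more "9"
        System.out.println(firstMatch("do(es)?", "does"));  // optional group
        System.out.println(firstMatch("e{2}", "bed"));      // no match: only one "e"
        System.out.println(firstMatch("e{2,}", word));      // greedy: takes every "e"
        System.out.println(firstMatch("e{1,3}", word));     // greedy: up to three "e"
        System.out.println(firstMatch("e{1,3}?", word));    // non-greedy: as few as possible
    }
}
```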

Suppose we want to search a text file for United States Social Security numbers. The format of the number is 999-99-9999, which can be matched with the regular expression "[0-9]{3}\-[0-9]{2}\-[0-9]{4}". Inside square brackets a hyphen ("-") has a special meaning: it denotes a range, such as 0 to 9. The hyphens that separate the parts of the number are therefore written with a preceding escape character "\" here; outside square brackets the escape is optional, because a hyphen is an ordinary character there.

If you want the hyphens to be optional, so that both 999-99-9999 and 999999999 count as correctly formatted, add the "?" quantifier after each hyphen.
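The two variants of the Social Security number pattern might look like this in Java. This is a sketch: SsnDemo and its method names are hypothetical, and the hyphens are written unescaped because they appear outside square brackets.

```java
import java.util.regex.Pattern;

// SsnDemo is a hypothetical class name used only for this illustration.
class SsnDemo {
    // Hyphens are required.
    private static final Pattern STRICT =
            Pattern.compile("[0-9]{3}-[0-9]{2}-[0-9]{4}");
    // Each hyphen is optional because it is followed by "?".
    private static final Pattern OPTIONAL_HYPHENS =
            Pattern.compile("[0-9]{3}-?[0-9]{2}-?[0-9]{4}");

    static boolean isStrict(String s) {
        return STRICT.matcher(s).matches();
    }

    static boolean isLenient(String s) {
        return OPTIONAL_HYPHENS.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"999-99-9999", "999999999", "99-999-9999"}) {
            System.out.println(s + " strict=" + isStrict(s) + " lenient=" + isLenient(s));
        }
    }
}
```

One consequence of making each hyphen optional independently is that mixed forms such as 999-999999 are accepted as well.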

One format for American car license plates is four digits followed by two letters. Its regular expression is the digit part "[0-9]{4}" followed by the letter part "[A-Z]{2}".
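A sketch of the license plate check follows, assuming uppercase letters only. PlateDemo is a hypothetical class name. The second helper uses the range "[A-z]" to show why that spelling should be avoided: it spans the ASCII characters between "Z" and "a", so it also accepts symbols such as "_".

```java
import java.util.regex.Pattern;

// PlateDemo is a hypothetical class name used only for this illustration.
class PlateDemo {
    // Four digits followed by two uppercase letters.
    static boolean isPlate(String s) {
        return Pattern.matches("[0-9]{4}[A-Z]{2}", s);
    }

    // Over-broad variant: "[A-z]" also covers [ \ ] ^ _ and the backtick.
    static boolean isPlateLoose(String s) {
        return Pattern.matches("[0-9]{4}[A-z]{2}", s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1234AB", "1234ab", "123AB", "1234__"}) {
            System.out.println(s + " plate=" + isPlate(s) + " loose=" + isPlateLoose(s));
        }
    }
}
```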

1.5 "not" symbol

The "^" symbol is called the "not" symbol. When it is the first character inside square brackets, "^" specifies characters that you do not want to match. For example, the regular expression in Figure 4 matches all words except those that begin with the letter "X".
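The figure's expression is not reproduced here, so the sketch below assumes a pattern of the form "[^X][a-z]+": one character that is not "X", followed by one or more lowercase letters. NegationDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

// NegationDemo is a hypothetical class name used only for this illustration.
class NegationDemo {
    // Assumed pattern: first character is anything except "X",
    // followed by one or more lowercase letters.
    static boolean isWordNotStartingWithX(String word) {
        return Pattern.matches("[^X][a-z]+", word);
    }

    // "[^0-9]" matches any single character that is not a digit.
    static boolean isSingleNonDigit(String s) {
        return Pattern.matches("[^0-9]", s);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Java", "Xenon", "xray"}) {
            System.out.println(w + " -> " + isWordNotStartingWithX(w));
        }
        System.out.println("a -> " + isSingleNonDigit("a"));
        System.out.println("7 -> " + isSingleNonDigit("7"));
    }
}
```

The exclusion is case-sensitive, so a word that starts with a lowercase "x" still matches.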

1.6 Parentheses and whitespace symbols

The "\s" symbol is the whitespace symbol; it matches any whitespace character, including the tab character. If a date string such as "Nov 10,2009" matches, how do you extract the month part? Simply put parentheses around the month to create a group, and then use the regex API (the ORO API, or Matcher.group in java.util.regex) to extract its value.
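As a sketch of group extraction with java.util.regex, the following puts parentheses around the month, the day and the year and returns all three. DateGroupDemo, parts and the exact date layout are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// DateGroupDemo is a hypothetical class name used only for this illustration.
class DateGroupDemo {
    // Assumed layout: month name, whitespace, day, comma, optional whitespace, year.
    private static final Pattern DATE =
            Pattern.compile("([A-Za-z]+)\\s+([0-9]{1,2}),\\s*([0-9]{4})");

    // Returns {month, day, year}, or null when the input is not in the assumed layout.
    static String[] parts(String date) {
        Matcher m = DATE.matcher(date);
        if (!m.matches()) {
            return null;
        }
        // group(0) is the whole match; groups 1..3 are the parenthesized parts.
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] p = parts("Nov 10,2009");
        System.out.println(p[0] + " / " + p[1] + " / " + p[2]);
        System.out.println(parts("Nov10,2009") == null); // missing whitespace
    }
}
```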

1.7 Other symbols

For simplicity, you can use shortcut symbols that are predefined for common regular expressions, as shown below:

\t: tab, equivalent to \u0009
\n: newline, equivalent to \u000A
\d: a digit, equivalent to [0-9]
\D: a non-digit, equivalent to [^0-9]
\s: a whitespace character, such as a newline or tab
\S: a non-whitespace character
\w: a word character, equivalent to [a-zA-Z_0-9]
\W: a non-word character, equivalent to [^\w]

For example, in the earlier Social Security number example, every "[0-9]" can be replaced with "\d". In a Java string literal each backslash must itself be escaped, so "\d" is written as "\\d". Here is a program I put together for reference:
package org.luosijin.test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Regular Expressions
 * @version V5.0
 * @author Rosikin
 * @date 2009-11-9
 */
public class Regex {
    /**
     * @param args
     * @author Rosikin
     * @date 2009-11-9 11:27:28
     */
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("b*g");
        Matcher matcher = pattern.matcher("bbg");
        System.out.println(matcher.matches());
        System.out.println(Pattern.matches("b*g", "bbg"));
        // Verify a postal code
        System.out.println(Pattern.matches("[0-9]{6}", "200038"));
        System.out.println(Pattern.matches("\\d{6}", "200038"));
        // Verify a phone number
        System.out.println(Pattern.matches("[0-9]{3,4}\\-?[0-9]+", "02178989799"));
        getDate("Nov 10,2009");
        charReplace();
        // Verify an ID: the string must consist of 15 or 18 digits.
        // Pattern.matches already requires the whole string to match,
        // so no ^ or $ anchors are needed.
        System.out.println(Pattern.matches("\\d{15}|\\d{18}", "123456789009876"));
        getString("D:/dir1/test.txt");
        getChinese("Welcome to China, 江西奉新, welcome, 你!");
        validateEmail("[email protected]");
    }

    /**
     * Date extraction: extract the month.
     * @param str
     * @author Rosikin
     * @date 2009-11-9 11:56:06
     */
    public static void getDate(String str) {
        String regEx = "([a-zA-Z]+)\\s+[0-9]{1,2},\\s*[0-9]{4}";
        Pattern pattern = Pattern.compile(regEx);
        Matcher matcher = pattern.matcher(str);
        if (!matcher.find()) {
            System.out.println("Date format is wrong!");
            return;
        }
        // Capturing groups are numbered from 1; group(0) is the whole match,
        // so the first group is read with group(1).
        System.out.println(matcher.group(1));
    }

    /**
     * Character substitution: replaces every run of one or more
     * contiguous "a" characters in a string with "A".
     *
     * @author Rosikin
     * @date 2009-11-10 12:06:03
     */
    public static void charReplace() {
        String regex = "a+";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher("okaaaa letmeaseeaaa aa booa");
        String s = matcher.replaceAll("A");
        System.out.println(s);
    }

    /**
     * String extraction: prints the part of a path after the last "/".
     * @param str
     * @author Rosikin
     * @date 2009-11-10 12:20:48
     */
    public static void getString(String str) {
        String regex = ".+/(.+)$";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(str);
        if (!matcher.find()) {
            System.out.println("File path format is incorrect!");
            return;
        }
        System.out.println(matcher.group(1));
    }

    /**
     * Chinese extraction
     * @param str
     * @author Rosikin
     * @date 2009-11-10 12:27:17
     */
    public static void getChinese(String str) {
        // [\u4E00-\u9FFF] covers Chinese characters
        String regex = "[\\u4E00-\\u9FFF]+";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(str);
        StringBuilder sb = new StringBuilder();
        while (matcher.find()) {
            sb.append(matcher.group());
        }
        System.out.println(sb);
    }

    /**
     * Verify email
     * @param email
     * @author Rosikin
     * @date 2009-11-10 12:34:50
     */
    public static void validateEmail(String email) {
        String regex = "[0-9a-zA-Z]+@[0-9a-zA-Z]+\\.[0-9a-zA-Z]+";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(email);
        if (matcher.matches()) {
            System.out.println("This is a legitimate email");
        } else {
            System.out.println("This is not a legitimate email");
        }
    }
}
