Notes on the application of regular expressions in PHP

Source: Internet
Author: User

With the growth of the Internet, web applications have become a bigger part of everyday life. Some users post articles that contain offensive words and pictures, while the country now advocates a civilized network and the removal of vulgar content so that the Internet develops in a healthy direction. Regular expressions provide powerful functions for searching and replacing strings, which makes them well suited to filtering such text.

So today I will give you a detailed explanation of regular expressions, mainly as they are used in PHP.

1. What is a regular expression?


The regular expression was proposed by Stephen Kleene, an American mathematician, in 1956. His paper "Representation of Events in Nerve Nets and Finite Automata" introduced the concept: a regular expression is an expression that describes what he called "the algebra of regular sets", hence the term "regular expression". Later, Ken Thompson, one of the principal creators of Unix, used regular expressions in his research on computational search algorithms, and they became popular throughout the Unix world. Regular expressions can be used on a variety of operating systems, including Unix and HP systems. Because they are powerful and convenient, regular expressions have been introduced into many languages, such as PHP, C#, C, C++, and Java, and are now widely used in many systems.


When to use a regular expression


Regular expressions are concise and flexible expressions for searching and replacing strings. For example, when a webpage contains a form, data such as telephone numbers, birthdays, and email addresses are strings in a specific format, yet users sometimes fill in invalid content. Likewise, if you need to filter the text of articles that users publish in a blog or forum, you need regular expressions.


Basic regular expression syntax


A regular expression is written in the form /pattern/. The part between the "/" delimiters is the pattern to be matched in the target object. You only need to place the pattern you are looking for between the two "/" delimiters.
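
As a small illustration of delimiters in PHP's preg functions, the sketch below matches the same URL prefix twice, once with "/" and once with "#" as the delimiter. The URL and variable names are made up for the example; the point is only that a different delimiter avoids escaping every "/" inside the pattern.

```php
<?php
// Hypothetical subject string used only for this illustration.
$url = 'http://example.com/a/b';

// With "/" as the delimiter, each "/" inside the pattern must be escaped.
$withSlash = preg_match('/^http:\/\/example\.com\//', $url);

// With "#" as the delimiter, the slashes can be written as they are.
$withHash = preg_match('#^http://example\.com/#', $url);

// preg_match() returns 1 when the pattern matches.
var_dump($withSlash, $withHash);
```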


Metacharacters
Regular expressions provide special "metacharacters" that let users define patterns more flexibly. Put simply, a metacharacter is a character with a special meaning; several of them specify how many times the preceding component must appear before the match succeeds. A familiar analogy is the "*" in the file pattern "*.txt", which stands for any string. To find a file whose name really contains "*", you have to escape it, that is, put the escape character "\" in front of it. Common metacharacters include "+", "*", and "?". "+" requires the preceding character to appear one or more times in a row in the target object. "*" requires the preceding character to appear zero or more times in a row. "?" requires the preceding character to appear zero times or once. "^" matches the start of the string, or the start of each line in multiline mode. "$" matches the end of the string, or the end of each line in multiline mode.


Use several typical regular expressions as an example to describe their meanings


(1) /Eo+/. The expression contains the "+" metacharacter, so it matches strings in the target object such as "Eo" or "Eool", in which one or more letters "o" appear consecutively after the letter "E".
(2) /we*/. The expression contains the "*" metacharacter, so it matches strings such as "we", "wee", or "well", in which zero or more letters "e" appear consecutively after the letter "w".
(3) /tel?/. The expression contains the "?" metacharacter, so it matches "tell" or "tech" in the target object, in which zero or one letter "l" appears after "te". The usage of the other main metacharacters is as follows.
\s: matches a single whitespace character, including tab and line break;
\S: matches any character except a whitespace character;
\d: matches a digit from 0 to 9;
\w: matches a letter, digit, or underscore;
\W: matches any character that \w does not match;
.: matches any character except a line break.
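
The examples above can be tried directly with preg_match(). The following is a minimal sketch; the subject strings are chosen only for illustration, and the last pattern simply combines three of the escape sequences from the list.

```php
<?php
// Pairs of [pattern, subject]; the subjects are illustrative only.
$tests = [
    ['/Eo+/',    'Eool'], // "+": one or more "o" after "E"
    ['/Eo+/',    'E'],    // no "o" at all, so no match
    ['/we*/',    'well'], // "*": zero or more "e" after "w"
    ['/we*/',    'w'],    // zero "e" still matches
    ['/tel?/',   'tech'], // "?": zero or one "l" after "te"
    ['/\d\s\w/', '7 x'],  // a digit, a whitespace character, a word character
];

$results = [];
foreach ($tests as [$pattern, $subject]) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    $result = preg_match($pattern, $subject);
    $results[] = $result;
    printf("%-10s %-6s %s\n", $pattern, $subject, $result ? 'match' : 'no match');
}
```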


Quantifiers


In a regular expression, you can enclose several characters in square brackets to form a character class that matches any one of them. In addition to metacharacters, regular expressions support quantifiers. A quantifier specifies how many times a given component of a regular expression must appear before the match succeeds, so it handles the case where you do not know in advance how many characters to match. A quantifier can be written in three ways.
(1) {n}: n is a non-negative integer. Matches exactly n times. For example, 'o{2}' cannot match the 'o' in "Bob", but it matches the two 'o' in "good".
(2) {n,}: n is a non-negative integer. Matches at least n times. For example, 'o{2,}' cannot match the 'o' in "Bob", but it matches all the 'o' in "goooood". 'o{1,}' is equivalent to 'o+'; 'o{0,}' is equivalent to 'o*'.
(3) {n,m}: m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' matches the first three 'o' in "gooooood"; 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
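
The three forms can be checked with a small helper that returns the text actually matched. This is only a sketch: first_match() is a hypothetical helper written for this example, and the bounds {1,3} follow from the "first three" wording in item (3).

```php
<?php
// Hypothetical helper: return the first substring matched by $pattern, or null.
function first_match(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

$a = first_match('/o{2}/', 'Bob');        // null: "Bob" has only one "o"
$b = first_match('/o{2}/', 'good');       // "oo": exactly two
$c = first_match('/o{2,}/', 'goooood');   // "ooooo": at least two, greedy
$d = first_match('/o{1,3}/', 'gooooood'); // "ooo": at most three

var_dump($a, $b, $c, $d);
```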

2. Application of regular expressions in PHP
Regular expressions are widely used in web systems. Once you understand how to write them, you can apply them in many ways, for example to check the format of data, to replace text, and to extract the text you are interested in.


Verify the validity of the email address in a string
The most basic email address format can be viewed as <user name>@<domain name>. There is no uniform standard for user names across service providers. Besides digits and letters, some providers allow "-", some allow ".", some allow both, and some allow other special characters, so the rule can only be decided case by case. Here it is assumed that "-" and "." are allowed, that neither may appear in the first or last position, and that the two may not appear next to each other. In the domain name, only "-" and "." may appear besides digits and letters; they cannot appear in the first or last position, and they cannot be adjacent. We also know from domain names that the last segment is at least two characters long and consists only of letters. Based on this description, an expression that decides whether a string is an email address can be built up step by step:
^: matches the start of the string
([a-z0-9A-Z]+[-|.]?)+: one or more digits or letters, optionally followed by "-" or "."; this combination is repeated one or more times. (Inside square brackets "|" is a literal character, so this class also accepts "|"; write [-.] to exclude it.)
[a-z0-9A-Z]: the user name ends with a digit or letter
@: matches "@"
([a-z0-9A-Z]+: opens a group and matches one or more digits or letters
(-[a-z0-9A-Z]+)?: matches "-" followed by one or more digits or letters, zero times or once
\.: matches "."
)+: the content of the group is matched one or more times
[a-zA-Z]{2,}: matches two or more letters
$: matches the end of the string
By combining the items above, we get a fairly comprehensive pattern for an email address. The regular expression is as follows:
^([a-z0-9A-Z]+[-|.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\.)+[a-zA-Z]{2,}$
The PHP script for the validation is as follows:
<?php
$email = "tcmorningdew@gmail.com"; // the email address to be verified
// Check the format of the email; preg_match() returns 0 if it does not match.
if (!preg_match('/^([a-z0-9A-Z]+[-|.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\.)+[a-zA-Z]{2,}$/', $email)) {
    echo "your email address is incorrect";
} else {
    echo "your email address is correct";
}
?>
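
To see how the pattern behaves on more than one input, it can be wrapped in a small function. The sketch below is illustrative only: is_valid_email() and the sample addresses are hypothetical, and the pattern is the one assembled above, including its assumption that a user name has at least two characters.

```php
<?php
// The pattern assembled in the text; "\." matches a literal dot in the domain.
const EMAIL_PATTERN = '/^([a-z0-9A-Z]+[-|.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\.)+[a-zA-Z]{2,}$/';

// Hypothetical helper: true when $email matches EMAIL_PATTERN.
function is_valid_email(string $email): bool
{
    return preg_match(EMAIL_PATTERN, $email) === 1;
}

// Made-up sample addresses covering the rules described above.
$samples = [
    'tcmorningdew@gmail.com', // plain address
    'a.b-c@mail.example.org', // "." and "-" inside the user name
    '.dot@first.com',         // "." in first position
    'user@host',              // domain without a dot
    'user@-bad.com',          // "-" at the start of the domain
];

$checked = [];
foreach ($samples as $sample) {
    $checked[$sample] = is_valid_email($sample);
    echo $sample, ': ', $checked[$sample] ? 'valid' : 'invalid', "\n";
}
```

PHP also ships a built-in check, filter_var($email, FILTER_VALIDATE_EMAIL), which may be preferable when the exact rules above are not required.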


Use the regular expression function to find and replace text and code
A regular expression is a text pattern that can match a piece of text exactly or describe one or more variants of the text being searched for. For example, how can an indecent word posted on a blog or forum be replaced by means of a regular expression? The following shows how to use a regular expression function to find and replace the corresponding strings.
<?php
$string = "morning dew blog"; // the text to be checked
$s_array = array("/Wow/", "/tmd/", "/Grass Mud/"); // patterns for the indecent words
$r_array = array("oh", "too", "are you?"); // replacement strings
echo 'original: ' . $string;
echo '<br/>';
echo 'after replacement: ' . preg_replace($s_array, $r_array, $string);
// preg_replace(pattern, replacement, string) replaces every part of string
// that matches pattern with replacement.
?>
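
A word list is often easier to maintain than hand-written patterns. The following sketch builds the patterns from a list with preg_quote(); the function name filter_words(), the word list, and the choice of word boundaries and case-insensitive matching are assumptions made for this example, not part of the script above.

```php
<?php
// Hypothetical word list: blocked word => replacement.
$blocked = ['Wow' => 'oh', 'tmd' => 'too', 'Grass Mud' => 'are you?'];

// Hypothetical helper: replace every blocked word in $text.
function filter_words(string $text, array $blocked): string
{
    $patterns = [];
    foreach (array_keys($blocked) as $word) {
        // preg_quote() escapes regex metacharacters in the word ("/" is the delimiter).
        // \b restricts the match to whole words; the "i" modifier ignores case.
        $patterns[] = '/\b' . preg_quote($word, '/') . '\b/i';
    }
    // Patterns and replacements are paired by position.
    return preg_replace($patterns, array_values($blocked), $text);
}

$filtered = filter_words('WOW, this TMD blog', $blocked);
$untouched = filter_words('Wowza', $blocked); // not a whole-word match

echo $filtered, "\n";
```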

Regular expressions have a simple syntax and are powerful, especially for data validation. If validation is written as ordinary program logic, some data may not be recognized by the program. With a regular expression, you define a pattern for the strings you are looking for and then decide whether an input is valid according to whether it, or part of it, matches the pattern, which lets you validate many different kinds of data. The disadvantage of a regular expression is that it can be complicated to write and hard to read. However, once the grammar rules are mastered, writing one becomes easy.

Regular expressions are often used for websites. Below are some explanations and examples for your reference and modification:
"^\d+$" // non-negative integer (positive integer + 0)
"^[0-9]*[1-9][0-9]*$" // positive integer
"^((-\d+)|(0+))$" // non-positive integer (negative integer + 0)
"^-[0-9]*[1-9][0-9]*$" // negative integer
"^-?\d+$" // integer
"^\d+(\.\d+)?$" // non-negative floating point number (positive floating point number + 0)
"^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$" // positive floating point number
"^((-\d+(\.\d+)?)|(0+(\.0+)?))$" // non-positive floating point number (negative floating point number + 0)
"^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$" // negative floating point number
"^(-?\d+)(\.\d+)?$" // floating point number
"^[A-Za-z]+$" // a string consisting of the 26 English letters
"^[A-Z]+$" // a string consisting of the 26 uppercase letters
"^[a-z]+$" // a string consisting of the 26 lowercase letters
"^[A-Za-z0-9]+$" // a string consisting of digits and the 26 English letters
"^\w+$" // a string consisting of digits, the 26 English letters, or underscores
"^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$" // email address
"^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$" // URL
/^(\d{2}|\d{4})-((0([1-9]{1}))|(1[1|2]))-(([0-2]([1-9]{1}))|(3[0|1]))$/ // year-month-day (as written, month 10 and days 10 and 20 are not matched)
/^((0([1-9]{1}))|(1[1|2]))\/(([0-2]([1-9]{1}))|(3[0|1]))\/(\d{2}|\d{4})$/ // month/day/year (same limitation)
"^([\w.-]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$" // email
/^((\+?[0-9]{2,4}-[0-9]{3,4}-)|([0-9]{3,4}-))?([0-9]{7,8})(-[0-9]+)?$/ // phone number
"^(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$" // IP address
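
A few of the numeric patterns above can be combined into a small classifier. This is a sketch only: classify() and the sample values are hypothetical, and just four of the patterns are included to keep it short.

```php
<?php
// A subset of the numeric patterns listed above, keyed by description.
$patterns = [
    'non-negative integer' => '/^\d+$/',
    'positive integer'     => '/^[0-9]*[1-9][0-9]*$/',
    'integer'              => '/^-?\d+$/',
    'floating point'       => '/^(-?\d+)(\.\d+)?$/',
];

// Hypothetical helper: names of all patterns that $value satisfies.
function classify(string $value, array $patterns): array
{
    $names = [];
    foreach ($patterns as $name => $pattern) {
        if (preg_match($pattern, $value) === 1) {
            $names[] = $name;
        }
    }
    return $names;
}

$zero    = classify('0', $patterns);     // not a positive integer
$decimal = classify('-3.14', $patterns); // floating point only
$text    = classify('abc', $patterns);   // matches nothing

print_r($zero);
print_r($decimal);
```

One detail worth knowing when validating input in PHP: by default "$" also matches just before a final line break, so "12\n" passes /^\d+$/. Adding the D modifier (or using \z) makes the pattern match only at the very end of the string.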

Regular expression matching Chinese characters: [\u4e00-\u9fa5] (in PHP, write [\x{4e00}-\x{9fa5}] and add the "u" modifier, because PCRE does not support \u)
Match double-byte characters (including Chinese characters): [^\x00-\xff]
Regular expression matching empty lines: \n[\s| ]*\r
Regular expression matching HTML tags: /<(.*)>.*<\/\1>|<(.*) \/>/
Regular expression matching leading and trailing whitespace: (^\s*)|(\s*$)
Regular expression matching an email address: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Regular expression matching a URL: ^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$
Match a valid account (starts with a letter, 5-16 bytes long, letters, digits, and underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
Match a Chinese phone number: (\d{3}-|\d{4}-)?(\d{8}|\d{7})?
Match a Tencent QQ number: ^[1-9]*[1-9][0-9]*$

Metacharacters and their behavior in the context of the regular expression:

\ marks the next character as a special character, a literal character, a back reference, or an octal escape.
^ matches the start position of the input string. If the Multiline attribute of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
$ matches the end position of the input string. If the Multiline attribute of the RegExp object is set, $ also matches the position before '\n' or '\r'.
* matches the previous subexpression zero or more times.
+ matches the previous subexpression one or more times. + is equivalent to {1,}.
? matches the previous subexpression zero times or once. ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times.
{n,} n is a non-negative integer. Matches at least n times.
{n,m} m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. There must be no space between the comma and the two numbers.
? when this character follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching mode is non-greedy. The non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much of it as possible.
. matches any single character except "\n". To match any character including '\n', use a pattern like '[\s\S]'.
(pattern) matches pattern and captures the match.
(?:pattern) matches pattern but does not capture the match; that is, it is a non-capturing match and is not stored for later use.
(?=pattern) positive lookahead: matches the search string at any position where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use.
(?!pattern) negative lookahead: the opposite of (?=pattern); matches at any position where a string that does not match pattern begins.
x|y matches x or y.
[xyz] character set. Matches any one of the enclosed characters.
[^xyz] negative character set. Matches any character that is not enclosed.
[a-z] character range. Matches any character in the specified range.
[^a-z] negative character range. Matches any character that is not in the specified range.
\b matches a word boundary, that is, the position between a word and a space.
\B matches a non-word boundary.
\cx matches the control character specified by x.
\d matches a digit. It is equivalent to [0-9].
\D matches a non-digit character. It is equivalent to [^0-9].
\f matches a form feed. It is equivalent to \x0c and \cL.
\n matches a line break. It is equivalent to \x0a and \cJ.
\r matches a carriage return. It is equivalent to \x0d and \cM.
\s matches any whitespace character, including space, tab, and form feed. It is equivalent to [ \f\n\r\t\v].
\S matches any non-whitespace character. It is equivalent to [^ \f\n\r\t\v].
\t matches a tab. It is equivalent to \x09 and \cI.
\v matches a vertical tab. It is equivalent to \x0b and \cK.
\w matches any word character, including the underscore. It is equivalent to '[A-Za-z0-9_]'.
\W matches any non-word character. It is equivalent to '[^A-Za-z0-9_]'.
\xn matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long.
\num matches num, where num is a positive integer. It is a back reference to the captured match.
\n identifies an octal escape value or a back reference. If \n is preceded by at least n captured subexpressions, n is a back reference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm identifies an octal escape value or a back reference. If \nm is preceded by at least nm captured subexpressions, nm is a back reference. If \nm is preceded by at least n captured subexpressions, n is a back reference followed by the literal m. If neither condition is met and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml if n is an octal digit (0-3) and m and l are both octal digits (0-7), \nml matches the octal escape value nml.
\un matches n, where n is a Unicode character represented by four hexadecimal digits. (PCRE in PHP does not support \u; use \x{n} with the "u" modifier instead.)
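
Three of the entries above are easier to grasp with a concrete run: greedy versus non-greedy matching, lookahead, and back references. The following is a minimal sketch with made-up subject strings.

```php
<?php
$html = '<b>one</b><b>two</b>';

// Greedy: ".*" takes as much as possible, so the match runs to the last </b>.
preg_match('/<b>.*<\/b>/', $html, $greedy);

// Non-greedy: ".*?" takes as little as possible, so the match stops at the first </b>.
preg_match('/<b>.*?<\/b>/', $html, $lazy);

// Positive lookahead: "Windows" only when followed by "95"; "95" is not part of the match.
preg_match('/Windows(?=95)/', 'Windows95', $ahead);

// Negative lookahead: "Windows" only when NOT followed by "95".
$notAhead = preg_match('/Windows(?!95)/', 'Windows95');

// Back reference: \1 repeats whatever the first group captured (a doubled word here).
preg_match('/\b(\w+) \1\b/', 'this is is a test', $doubled);

var_dump($greedy[0], $lazy[0], $ahead[0], $notAhead, $doubled[0]);
```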

Regular expression matching an http URL: http://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)?

Use regular expressions to restrict text box input in a webpage form:

Allow only Chinese characters: onkeyup="value=value.replace(/[^\u4E00-\u9FA5]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\u4E00-\u9FA5]/g,''))"

Allow only full-width characters: onkeyup="value=value.replace(/[^\uFF00-\uFFFF]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\uFF00-\uFFFF]/g,''))"

Allow only digits: onkeyup="value=value.replace(/[^\d]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\d]/g,''))"

Allow only digits and English letters: onkeyup="value=value.replace(/[\W]/g,'')" onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[\W]/g,''))"

Common regular expressions

Regular expression matching an IP address: /(\d+)\.(\d+)\.(\d+)\.(\d+)/g
SQL statement: ^(select|drop|delete|create|update|insert).*$

1. Non-negative integer: ^\d+$
2. Positive integer: ^[0-9]*[1-9][0-9]*$
3. Non-positive integer: ^((-\d+)|(0+))$
4. Negative integer: ^-[0-9]*[1-9][0-9]*$
5. Integer: ^-?\d+$
6. Non-negative floating point number: ^\d+(\.\d+)?$
7. Positive floating point number: ^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$
8. Non-positive floating point number: ^((-\d+(\.\d+)?)|(0+(\.0+)?))$
9. Negative floating point number: ^(-((regular expression of positive floating point number)))$
10. English string: ^[A-Za-z]+$
11. Uppercase English string: ^[A-Z]+$
12. Lowercase English string: ^[a-z]+$
13. String of English letters and digits: ^[A-Za-z0-9]+$
14. String of English letters, digits, and underscores: ^\w+$
15. Email address: ^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$
16. URL: ^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$
    Or: ^http:\/\/[A-Za-z0-9]+\.[A-Za-z0-9]+[\/=\?%\-&_~`@[\]\':+!]*([^<>\"\"])*$
17. Zip code: ^[1-9]\d{5}$
18. Chinese: ^[\u0391-\uFFE5]+$
19. Phone number: ^((\(\d{2,3}\))|(\d{3}\-))?(\(0\d{2,3}\)|0\d{2,3}-)?[1-9]\d{6,7}(\-\d{1,4})?$
20. Mobile phone number: ^((\(\d{2,3}\))|(\d{3}\-))?13\d{9}$
21. Double-byte characters (including Chinese characters): [^\x00-\xff]
22. Leading and trailing whitespace: (^\s*)|(\s*$) (like the trim function in VBScript)
23. HTML tags: <(.*)>.*<\/\1>|<(.*) \/>
24. Empty lines: \n[\s| ]*\r
25. Extract the links in a text: (h|H)(r|R)(e|E)(f|F) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
26. Extract the email addresses in a text: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
27. Extract the image links in a text: (s|S)(r|R)(c|C) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
28. Extract the IP addresses in a text: (\d+)\.(\d+)\.(\d+)\.(\d+)
29. Extract the Chinese mobile phone numbers in a text: (86)*0*13\d{9}
30. Extract the Chinese landline numbers in a text: (\(\d{3,4}\)|\d{3,4}-|\s)?\d{8}
31. Extract the Chinese phone numbers (mobile and landline) in a text: (\(\d{3,4}\)|\d{3,4}-|\s)?\d{7,14}
32. Extract the Chinese zip codes in a text: [1-9]{1}(\d+){5}
33. Extract the floating point numbers (with decimals) in a text: (-?\d*)\.?\d+
34. Extract any number in a text: (-?\d*)(\.\d+)?
35. IP: (\d+)\.(\d+)\.(\d+)\.(\d+)
36. Telephone area code: /^0\d{2,3}$/
37. Tencent QQ number: ^[1-9]*[1-9][0-9]*$
38. Account (starts with a letter, 5-16 bytes long, letters, digits, and underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
39. Chinese, English letters, digits, and underscores: ^[\u4e00-\u9fa5_a-zA-Z0-9]+$
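
The "extract" patterns in this list are meant to be applied to a longer text, which in PHP is the job of preg_match_all(). The sketch below uses an invented sample text and assumes the four-group form of the IP pattern; note that this pattern does not check that each number is in the range 0-255.

```php
<?php
// Invented sample text containing two email addresses and two IP addresses.
$text = 'Contact admin@example.com or ops@example.org from 192.168.0.1 or 10.0.0.254.';

// preg_match_all() collects every match; index 0 holds the full matches.
preg_match_all('/\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*/', $text, $emails);
preg_match_all('/(\d+)\.(\d+)\.(\d+)\.(\d+)/', $text, $ips);

print_r($emails[0]);
print_r($ips[0]);
```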
