Regular Expression
Regular expressions are useful in many scenarios: verifying that a string matches a given format, finding the matched text, and replacing the matched text.
Many programming languages support regular expressions with similar syntax.
Using a regular expression involves two parts: a pattern and the string to be matched against it.
Writing a regular expression usually means writing a pattern, then checking whether some input strings match that pattern.
Regexp
In Ruby, a pattern is called a Regexp. Anything wrapped in /.../ or %r(...) is a Regexp.
For example:
- /regexp/
- %r(regexp)
The string 'haystack' contains y, so the pattern matches.
- /y/.match('haystack') #=> #<MatchData "y">
The string 'haystack' does not contain needle, so the pattern does not match and nil is returned.
- /needle/.match('haystack') #=> nil
The string 'haystack' contains hay, so the pattern matches.
- /hay/.match('haystack') #=> #<MatchData "hay">
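The three uses mentioned at the start (verify, find, replace) can be sketched with the same haystack string. This is only an illustration: the variable names are made up for the example, and match? assumes Ruby 2.4 or later.

```ruby
text = "haystack"

# 1. Verify: match? returns true or false without building a MatchData.
verified = text.match?(/hay/)        # true

# =~ returns the index of the first match, or nil when nothing matches.
index   = text =~ /stack/            # 3
missing = text =~ /needle/           # nil

# 2. Find: match returns a MatchData describing what was matched.
m      = /y/.match(text)
found  = m[0]                        # "y"
before = m.pre_match                 # "ha"
after  = m.post_match                # "stack"

# 3. Replace: sub replaces the first match and returns a new string.
replaced = text.sub(/hay/, "straw")  # "strawstack"
```

match? is a reasonable choice when only a yes/no answer is needed; match is the one to use when the matched text itself matters.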
Metacharacters and Escapes
The characters (, ), [, ], {, }, ., ?, +, and * are all metacharacters: they have a special meaning inside a pattern. To match one of them literally, put a backslash \ before it so that it is escaped and treated as an ordinary character.
- /1 \+ 2 = 3\?/.match('Does 1 + 2 = 3?') #=> #<MatchData "1 + 2 = 3?">
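To see why the backslashes matter, here is a small sketch comparing escaped and unescaped versions of the same pattern. The variable names are only for illustration.

```ruby
sentence = "Does 1 + 2 = 3?"

# Escaped: + and ? are treated as ordinary characters.
escaped = /1 \+ 2 = 3\?/.match(sentence)

# Unescaped: " +" means "one or more spaces" and "3?" means "an optional 3",
# so the pattern no longer describes the literal text and the match fails.
unescaped = /1 + 2 = 3?/.match(sentence)

# The same applies to the dot: unescaped it matches any character.
dot_any     = /a.c/.match?("abc")    # true, . matches "b"
dot_literal = /a\.c/.match?("abc")   # false, \. only matches a real dot
dot_exact   = /a\.c/.match?("a.c")   # true
```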
Ruby expressions can also be embedded in a pattern by writing them inside #{...}.
- place = "Tokyo"
- /#{place}/.match("Go to Tokyo")
- #=> #<MatchData "Tokyo">
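One thing to keep in mind is that embedded text becomes part of the pattern, so any metacharacters in it keep their special meaning. The sketch below uses a made-up price variable and Ruby's Regexp.escape to show the difference.

```ruby
place = "Tokyo"
greeting = /Go to #{place}/.match("Go to Tokyo")   # matches "Go to Tokyo"

# Hypothetical value that happens to contain a metacharacter (the dot).
price = "3.50"

# Embedded as-is, the dot still means "any character".
loose = /#{price}/.match?("3x50")                   # true

# Regexp.escape adds the backslashes, so the dot is matched literally.
strict_miss = /#{Regexp.escape(price)}/.match?("3x50")   # false
strict_hit  = /#{Regexp.escape(price)}/.match?("3.50")   # true
```

Escaping is usually the safer default when the embedded value comes from user input rather than from the program itself.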
Character Classes
A character class is written between [ and ] and lists the characters that may appear at that position in the match. /[ab]/ means a or b, whereas /ab/ means a followed by b.
- /W[aeiou]rd/.match("Word") #=> #<MatchData "Word">
Two characters joined by a hyphen denote a range: [a-d] is equivalent to [abcd].
A character class can contain several ranges: [a-dx-z] is equivalent to [abcdxyz].
- /[0-9a-f]/.match('9f') #=> #<MatchData "9">
- /[9f]/.match('9f') #=> #<MatchData "9">
A ^ at the start of a character class inverts it: the class matches any character except those listed after the ^.
- /[^a-eg-z]/.match('f') #=> #<MatchData "f">
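A few more cases may help to check the rules for lists, ranges, and negation. This is a small sketch with illustrative variable names.

```ruby
# A class matches exactly one character out of the listed set.
first_vowel = "Word"[/[aeiou]/]              # "o"

# A range and the spelled-out list behave the same.
range_hit = /[a-d]/.match?("c")              # true
list_hit  = /[abcd]/.match?("c")             # true

# Several ranges in one class.
multi_hit  = /[a-dx-z]/.match?("y")          # true
multi_miss = /[a-dx-z]/.match?("m")          # false

# Negation: the first character that is NOT in a-e or g-z.
only_f = /[^a-eg-z]/.match("abcdefg")[0]     # "f"

# scan collects every match, here every non-vowel.
consonants = "haystack".scan(/[^aeiou]/)     # ["h", "y", "s", "t", "c", "k"]
```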
- /./ matches any character except a newline.
- /./m matches any character, including a newline (the m modifier enables multiline mode).
- /\w/ matches a word character, [a-zA-Z0-9_].
- /\W/ matches a non-word character, [^a-zA-Z0-9_].
- /\d/ matches a digit, [0-9].
- /\D/ matches a non-digit, [^0-9].
- /\h/ matches a hexadecimal digit, [0-9a-fA-F].
- /\H/ matches a non-hexadecimal-digit character, [^0-9a-fA-F].
- /\s/ matches a whitespace character, [ \t\r\n\f].
- /\S/ matches a non-whitespace character, [^ \t\r\n\f].
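The shorthand classes can be tried out one by one. The following is a rough sketch on made-up sample strings. Each lowercase shorthand is paired with its uppercase opposite.

```ruby
# \w includes the underscore, \W is everything else.
word_underscore = /\w/.match?("_")          # true
non_word_dash   = /\W/.match?("-")          # true

# \d and \D split a string into digits and non-digits.
digit_runs = "a1b22".scan(/\d+/)            # ["1", "22"]
non_digits = "a1b22".scan(/\D/)             # ["a", "b"]

# \h accepts 0-9, a-f and A-F. The letter g is outside that set.
hex_hit     = /\h/.match?("F")              # true
hex_miss    = /\h/.match?("g")              # false
non_hex_hit = /\H/.match?("g")              # true

# \s matches spaces and tabs alike.
fields = "a b\tc".split(/\s/)               # ["a", "b", "c"]

# The dot skips newlines unless the m modifier is given.
dot_default   = /a.b/.match?("a\nb")        # false
dot_multiline = /a.b/m.match?("a\nb")       # true
```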
Repetition
A quantifier placed after an element specifies how many times that element may repeat.
Repetition is greedy by default: it matches as many occurrences as it can while still allowing the overall match to succeed. Lazy repetition matches as few occurrences as possible.
Adding ? after a quantifier turns it from greedy into lazy.
- /<.+>/.match("<a><b>") #=> #<MatchData "<a><b>">
- /<.+?>/.match("<a><b>") #=> #<MatchData "<a>">
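The difference shows up clearly when collecting all matches. The sketch below also uses a few standard quantifiers that the text above does not list (*, ?, and {n,m}). They are included here only as an illustration, and the variable names are made up.

```ruby
html = "<a><b>"

# Greedy: .+ runs to the last ">" it can reach.
greedy_all = html.scan(/<.+>/)       # ["<a><b>"]

# Lazy: .+? stops at the first ">" that completes a match.
lazy_all = html.scan(/<.+?>/)        # ["<a>", "<b>"]

# Other quantifiers follow the same greedy/lazy rule.
star_greedy  = "aaa"[/a*/]           # "aaa", zero or more, as many as possible
star_lazy    = "aaa"[/a*?/]          # "",    zero or more, as few as possible
range_greedy = "aaaa"[/a{2,3}/]      # "aaa", between 2 and 3, prefers 3
range_lazy   = "aaaa"[/a{2,3}?/]     # "aa",  between 2 and 3, prefers 2

# ? on its own makes the preceding element optional.
optional_u = /colou?r/.match?("color")   # true
```

Lazy matching tends to be the better fit for delimited pieces such as tags, where each match should end at the nearest closing delimiter.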
This article is from the "breakthrough IT architects" blog. Please keep this source link when reposting: http://virusswb.blog.51cto.com/115214/1043505