Ruby regular-expression learning notes

Source: Internet
Author: User
Tags hash modifier modifiers regular expression split

The literal constructor for a Ruby regular expression is a pair of slashes:

//
Try this:

>> //.class
=> Regexp
Pattern matching involves two parts: a regular expression (regexp) and a string. The regular expression expresses a prediction about the string, and the string either satisfies that prediction or does not.

Use the match method to test whether a pattern matches. Try it:

>> puts "Match!" if /abc/.match("The alphabet starts with abc.")
Match!
=> nil
>> puts "Match!" if "The alphabet starts with abc.".match(/abc/)
Match!
=> nil
In addition to the match method, there is the pattern-matching operator =~, which goes between a string and a regular expression:

>> puts "Match!" if /abc/ =~ "The alphabet starts with abc."
Match!
=> nil
>> puts "Match!" if "The alphabet starts with abc." =~ /abc/
Match!
=> nil
When there is no match, both return nil. When there is a match, match and =~ return different things: =~ returns the numeric index of the character where the match starts, and match returns an instance of the MatchData class. Try it:

>> "The alphabet starts with abc" =~ /abc/
=> 25
>> /abc/.match("The alphabet starts with abc")
=> #<MatchData "abc">
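The difference between the two return values can be checked side by side. The following is a minimal sketch; the helper name describe_match is made up for illustration and is not part of Ruby.

```ruby
# Hypothetical helper: reports what =~ and match return for the same input.
def describe_match(pattern, text)
  index = text =~ pattern       # Integer offset where the match starts, or nil
  data  = pattern.match(text)   # MatchData object, or nil
  { index: index, matched: data && data[0] }
end

p describe_match(/abc/, "The alphabet starts with abc")
p describe_match(/xyz/, "The alphabet starts with abc")
```

Because both forms return nil on failure, either one can be used directly as the condition of an if.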
Patterns

What sits between the slashes is not a string; it is your set of predictions about, and constraints on, strings.

Literal characters

A literal character in a regular expression matches itself. For example:

/a/
matches the letter a.

Some characters have special meaning. If you do not want the special meaning, escape the character with a backslash:

/\?/
Wildcard character

The dot (.) matches any character other than a newline.

/h.t/
matches hot, hit, and so on.

Character classes

A character class is written inside a pair of square brackets:

/h[oi]t/
In the regular expression above, the character class means "match o or i". So the pattern matches "hot" and "hit", but not "h!t" or anything else.

The following character class matches the lowercase letters a through z:

/[a-z]/
A ^ at the start of a character class negates it:

/[^a-z]/
A character class can contain several ranges. This one matches a hexadecimal digit:

/[A-Fa-f0-9]/
Match a digit, 0 through 9:

/[0-9]/
0-9 is so common that it has a short form; d stands for digit:

/\d/
Digits, letters, and the underscore; w stands for word:

/\w/
Whitespace, such as spaces, tabs, and line breaks; s stands for space:

/\s/
The uppercase forms are the negations: \D, \W, \S.
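To see the shorthand classes and one negated form at work, here is a small sketch. The helper name char_class_demo and the sample text are assumptions chosen only for illustration.

```ruby
# Hypothetical helper: collects the characters matched by a few shorthand classes.
def char_class_demo(text)
  {
    digits:     text.scan(/\d/),  # every digit
    whitespace: text.scan(/\s/),  # every whitespace character
    non_word:   text.scan(/\W/)   # everything that is not a letter, digit, or underscore
  }
end

p char_class_demo("Room 42, floor_3!")
```

Note that the underscore counts as a word character, so it does not show up under \W.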

Matching

The match operation gives you a yes/no result, and it can be written either way around:

regex.match(string)
string.match(regex)
Submatches

For example, we have a line of text about a person:

Peel,Emma,Mrs.,talented amateur
I want to get the person's last name and title. We know that the fields are separated by commas, and we know their order: last name, first name, title, occupation.

First come some alphabetic characters,
then a comma,
then some more alphabetic characters,
then another comma,
and then Mr. or Mrs.
A pattern that follows these rules:

/[A-Za-z]+,[A-Za-z]+,Mrs?\./
The s? means the s is optional, so Mrs? matches either Mr or Mrs. Try it in irb:

>> /[A-Za-z]+,[A-Za-z]+,Mrs?\./.match("Peel,Emma,Mrs.,talented amateur")
=> #<MatchData "Peel,Emma,Mrs.">
We get a MatchData object. Now, how do we pull out Peel and Mrs.? Use parentheses to group parts of the pattern:

/([A-Za-z]+),[A-Za-z]+,(Mrs?\.)/
Try the match again with this pattern:

>> /([A-Za-z]+),[A-Za-z]+,(Mrs?\.)/.match("Peel,Emma,Mrs.,talented amateur")
=> #<MatchData "Peel,Emma,Mrs." 1:"Peel" 2:"Mrs.">
Use $1 to get what the first group matched and $2 for the second group:

>> puts $1
Peel
=> nil
>> puts $2
Mrs.
=> nil
Match success and failure

When no match is found, the return value is nil. Try it:

>> /a/.match("b")
=> nil
If the match succeeds, a MatchData object is returned, and its Boolean value is true. The object also carries information about the match, such as where the match started, how much of the string it covered, and what each group captured.

To use a MatchData object, you have to store it first. Practice:

string = "My phone number is (123) 555-1234."
phone_re = /\((\d{3})\)\s+(\d{3})-(\d{4})/
m = phone_re.match(string)

unless m
  puts "no match"
  exit
end

print "Entire string: "
puts m.string
print "Match: "
puts m[0]
puts "Three groups:"
3.times do |index|
  puts "#{index + 1}:#{m.captures[index]}"
end
puts "Get the first group's match:"
puts m[1]
The result:

Entire string: My phone number is (123) 555-1234.
Match: (123) 555-1234
Three groups:
1:123
2:555
3:1234
Get the first group's match:
123
Two ways to get the captures

To get what the pattern's groups captured from the MatchData object:

m[1]
m[2]
...
m[0] gets the entire match.

The other way to get the captures is the captures method, which returns an array whose items are the captured substrings.

m[1] == m.captures[0]
m[2] == m.captures[1]
Groups are numbered by the position of their opening parenthesis, from left to right. Let's look at an example:

>> /((a)((b)c))/.match("abc")
=> #<MatchData "abc" 1:"abc" 2:"a" 3:"bc" 4:"b">
Named captures

>> re = /(?<first>\w+)\s+((?<middle>\w\.)\s+)?(?<last>\w+)/
Match:

>> m = re.match("Samuel L. Jackson")
=> #<MatchData "Samuel L. Jackson" first:"Samuel" middle:"L." last:"Jackson">
>> m[:first]
=> "Samuel"
>> m[:last]
=> "Jackson"
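Named captures can also be read all at once as a hash. The sketch below reuses the pattern from the example above and relies on MatchData#named_captures, which is available in Ruby 2.4 and later; the constant NAME_RE and the helper parse_name are made-up names for illustration.

```ruby
NAME_RE = /(?<first>\w+)\s+((?<middle>\w\.)\s+)?(?<last>\w+)/

# Hypothetical helper: returns a hash of the named groups, or nil if nothing matches.
def parse_name(text)
  m = NAME_RE.match(text)
  m && m.named_captures   # keys are strings; groups that did not take part are nil
end

p parse_name("Samuel L. Jackson")
p parse_name("Matt Damon")
```

When the optional middle initial is absent, the "middle" key is still present and its value is nil.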
Additional MatchData information

Continuing the phone number example:

print "Before the match: "
puts m.pre_match

print "After the match: "
puts m.post_match

print "Second capture start character: "
puts m.begin(2)

print "Third capture end character: "
puts m.end(3)
The output is:

Before the match: My phone number is 
After the match: .
Second capture start character: 25
Third capture end character: 33
The begin and end methods return character offsets into the original string, counted from 0; end gives the offset just past the last character of the capture.
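One way to check begin and end is to slice the original string with them and compare the result against the capture itself. This is a minimal sketch using the English phone-number sentence from the example; the helper name capture_by_offsets is an assumption for illustration. For this sentence the second capture starts at offset 25 and the third ends at offset 33.

```ruby
# Hypothetical helper: rebuilds capture n from its begin/end offsets.
def capture_by_offsets(match, n)
  # end(n) is exclusive, so a three-dot range gives exactly the captured text
  match.string[match.begin(n)...match.end(n)]
end

string   = "My phone number is (123) 555-1234."
phone_re = /\((\d{3})\)\s+(\d{3})-(\d{4})/
m = phone_re.match(string)

puts m.begin(2)                # offset of the first "5" of 555
puts m.end(3)                  # offset just past the last digit of 1234
puts capture_by_offsets(m, 2)  # same text as m[2]
```

MatchData#offset(n) returns the same two numbers as a two-element array.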
Quantifiers, anchors, and modifiers

This part covers quantifiers, anchors, and modifiers.

Quantifiers

A quantifier specifies how many times something in the pattern must occur for the match to succeed.

Zero or one

?
Example:

/Mrs?/
The s can appear zero times or one time.

Zero or more

*
Example:

/\d*/
One or more

+
Example:

/\d+/
Greedy quantifiers

The quantifiers * and + are greedy: they match as many characters as they can.

Look at what .+ matches in the following example:

>> string = "abc!def!ghi!"
=> "abc!def!ghi!"
>> match = /.+!/.match(string)
=> #<MatchData "abc!def!ghi!">
>> puts match[0]
abc!def!ghi!
=> nil
You might expect to get the substring "abc!", but we get "abc!def!ghi!". The greedy + consumes every character it can, so the match runs all the way to the last exclamation mark.

We can put a ? after + or * to make them non-greedy. Try again:

>> string = "abc!def!ghi!"
=> "abc!def!ghi!"
>> match = /.+?!/.match(string)
=> #<MatchData "abc!">
>> puts match[0]
abc!
=> nil
Another experiment:

>> /(\d+?)/.match("Digits-R-Us 2345")
=> #<MatchData "2" 1:"2">
>> puts $1
2
=> nil
Now look at this match:

>> /\d+5/.match("Digits-R-Us 2345")
=> #<MatchData "2345">
A greedy quantifier still gives characters back when the rest of the pattern needs them. Try this:

>> /(\d+)(5)/.match("Digits-R-Us 2345")
=> #<MatchData "2345" 1:"234" 2:"5">
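The effect of greediness becomes more visible when the same string is scanned repeatedly. The following sketch uses String#scan with the two patterns from the examples; the helper name exclamation_chunks and its lazy keyword are made up for illustration.

```ruby
# Hypothetical helper: splits text into chunks that each end with "!".
def exclamation_chunks(text, lazy: true)
  pattern = lazy ? /.+?!/ : /.+!/
  text.scan(pattern)
end

p exclamation_chunks("abc!def!ghi!")               # non-greedy: three short chunks
p exclamation_chunks("abc!def!ghi!", lazy: false)  # greedy: one chunk covering everything
```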
Specific numbers of repetitions

Put the repetition count inside {}. The following matches three digits, a hyphen, and then four digits:

/\d{3}-\d{4}/
The count can also be a range. The following matches 1 to 10 digits:

/\d{1,10}/
The first number in the braces is the minimum. With the second number left out, the following matches 3 or more digits:

/\d{3,}/
Limitations of parentheses

>> /([A-Z]){5}/.match("Matt DAMON")
=> #<MatchData "DAMON" 1:"N">
You might expect the group to capture DAMON, but it actually captures N: the group is repeated five times, and only the last repetition is kept. To capture DAMON, put the quantifier inside the parentheses:

>> /([A-Z]{5})/.match("Matt DAMON")
=> #<MatchData "DAMON" 1:"DAMON">
Anchors and assertions

Anchors and assertions are conditions that must be met before character matching can proceed.

^ represents the beginning of a line, and $ represents the end of a line.

A comment in Ruby starts with #, so a pattern that matches a comment line can look like this:

/^\s*#/
The ^ matches at the beginning of the line.

Anchors

^: Beginning of line
$: End of line
\A: Start of string
\z: End of string
\Z: End of string, ignoring a final newline; the pattern /from the earth.\Z/ matches "from the earth.\n"
\b: Word boundary
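The line anchors and the string anchors only behave differently when the text contains newlines. The sketch below compares them on a two-line string; the helper name anchor_offsets and the sample text are assumptions for illustration.

```ruby
# Hypothetical helper: shows where each anchored pattern matches (offset or nil).
def anchor_offsets(text)
  {
    line_start:           text =~ /^second/,   # ^ also matches right after a newline
    string_start:         text =~ /\Asecond/,  # \A matches only at offset 0
    line_end:             text =~ /line$/,     # $ matches before any newline
    string_end:           text =~ /line\z/,    # \z matches only at the very end
    before_final_newline: text =~ /line\Z/     # \Z also matches before a final newline
  }
end

p anchor_offsets("first line\nsecond line\n")
```

For validating a whole string, \A and \z are usually the safer pair, since ^ and $ can match in the middle of multi-line input.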
Lookahead assertions

Suppose you want to match a run of digits that must be followed by a dot, but you do not want the dot included in the match. You can do this:

>> str = "123 456. 789"
=> "123 456. 789"
>> m = /\d+(?=\.)/.match(str)
=> #<MatchData "456">
Lookbehind assertions

I want to match Damon, but only when it is preceded by "Matt ".

Pattern:

/(?<=Matt )Damon/
Try this:

>> /(?<=Matt )Damon/.match("Matt Damon")
=> #<MatchData "Damon">
>> /(?<=Matt )Damon/.match("Matt1 Damon")
=> nil
I want to match Damon, but only when it is not preceded by "Matt ".

Pattern:

/(?<!Matt )Damon/
Try this:

>> /(?<!Matt )Damon/.match("Matt Damon")
=> nil
>> /(?<!Matt )Damon/.match("Matt1 Damon")
=> #<MatchData "Damon">
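Lookahead and lookbehind are handy with scan, because the surrounding context is required but never becomes part of the result. The following is a small sketch; the helper names dollar_amounts and counts_before and the sample sentence are made up for illustration.

```ruby
# Hypothetical helper: digits that come right after a dollar sign.
def dollar_amounts(text)
  # (?<=\$) requires a "$" before the digits but leaves it out of the match
  text.scan(/(?<=\$)\d+/)
end

# Hypothetical helper: digits that are followed by a space and the given noun.
def counts_before(text, noun)
  # (?= noun) requires the noun after the digits but leaves it out of the match
  text.scan(/\d+(?= #{Regexp.escape(noun)})/)
end

sample = "apples $3, pears $12, 7 plums"
p dollar_amounts(sample)
p counts_before(sample, "plums")
```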
Non-capturing groups

Use ?: at the start of a group:

>> str = "abc def ghi"
=> "abc def ghi"
>> m = /(abc) (?:def) (ghi)/.match(str)
=> #<MatchData "abc def ghi" 1:"abc" 2:"ghi">
Conditional matching

The conditional expression (?(1)b|c) means: if group 1 captured something, match b; otherwise match c:

>> re = /(a)?(?(1)b|c)/
=> /(a)?(?(1)b|c)/
>> re.match("ab")
=> #<MatchData "ab" 1:"a">
>> re.match("b")
=> nil
>> re.match("c")
=> #<MatchData "c" 1:nil>
With a named group:

/(?<first>a)?(?(<first>)b|c)/
Modifiers

The i modifier makes the match case-insensitive:

/abc/i
The m modifier turns on multiline mode, in which the dot also matches a newline:

/abc/m
The x modifier changes how the regular expression parser treats whitespace: whitespace in the pattern is ignored unless you escape it, and # starts a comment.

/
\((\d{3})\)  # 3 digits inside literal parens (area code)
\s           # One space character
(\d{3})      # 3 digits (exchange)
-            # Hyphen
(\d{4})      # 4 digits (second part of number)
/x
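An extended-mode pattern is used exactly like any other regular expression. Here is a minimal sketch that stores the commented pattern in a constant and applies it; the names PHONE_X and phone_parts are assumptions for illustration.

```ruby
PHONE_X = /
  \((\d{3})\)  # area code inside literal parentheses
  \s           # one whitespace character
  (\d{3})      # exchange
  -            # hyphen
  (\d{4})      # last four digits
/x

# Hypothetical helper: returns the three captured parts, or nil if there is no match.
def phone_parts(text)
  m = PHONE_X.match(text)
  m ? m.captures : nil
end

p phone_parts("My phone number is (123) 555-1234.")
p phone_parts("no number here")
```

Because unescaped whitespace is ignored under x, a literal space has to be written as \s or as an escaped space.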
Converting between strings and regular expressions

String to regexp

You can use interpolation inside a regular expression:

>> str = "def"
=> "def"
>> /abc#{str}/
=> /abcdef/
If the string contains characters that have special meaning in regular expressions, such as the dot (.), they keep that meaning:

>> str = "a.c"
=> "a.c"
>> re = /#{str}/
=> /a.c/
>> re.match("a.c")
=> #<MatchData "a.c">
>> re.match("abc")
=> #<MatchData "abc">
You can escape these special characters:

>> Regexp.escape("a.c")
=> "a\\.c"
>> Regexp.escape("^abc")
=> "\\^abc"
Try again:

>> str = "a.c"
=> "a.c"
>> re = /#{Regexp.escape(str)}/
=> /a\.c/
>> re.match("a.c")
=> #<MatchData "a.c">
>> re.match("abc")
=> nil
You can also build a regular expression from a string with Regexp.new:

>> Regexp.new('(.*)\s+Black')
=> /(.*)\s+Black/
These work too:

>> Regexp.new('Mr\. David Black')
=> /Mr\. David Black/
>> Regexp.new(Regexp.escape("Mr. David Black"))
=> /Mr\.\ David\ Black/
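Escaping matters most when the text to search for comes from outside the program. The sketch below wraps Regexp.escape in a helper and contrasts it with plain interpolation; the helper name literal_regexp and the sample strings are made up for illustration.

```ruby
# Hypothetical helper: builds a pattern that matches the given text literally.
def literal_regexp(text)
  /#{Regexp.escape(text)}/
end

query = "1+1=2?"

# Plain interpolation treats + and ? as quantifiers, so the literal text is not found.
p(/#{query}/.match("is 1+1=2? yes"))

# With escaping, the same text is found as written.
p literal_regexp(query).match("is 1+1=2? yes")
```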
Regexp to string

A regular expression can represent itself as a string:

>> puts /abc/
(?-mix:abc)
With inspect:

>> /abc/.inspect
=> "/abc/"
Methods that use regular expressions

Some Ruby methods accept a regular expression as an argument or use one in a block.

For example, to find the items in an array that are longer than 10 characters and contain a digit:

array.find_all {|e| e.size > 10 and /\d/.match(e) }
String#scan

Find all the digits in a string:

>> "testing 1 2 3 testing 4 5 6".scan(/\d/)
=> ["1", "2", "3", "4", "5", "6"]
With groups:

>> str = "Leopold Auer was the teacher of Jascha Heifetz."
=> "Leopold Auer was the teacher of Jascha Heifetz."
>> violinists = str.scan(/([A-Z]\w+)\s+([A-Z]\w+)/)
=> [["Leopold", "Auer"], ["Jascha", "Heifetz"]]
You can use the result like this:

violinists.each do |fname,lname|
  puts "#{lname}'s first name was #{fname}."
end
The output is:

Auer's first name was Leopold.
Heifetz's first name was Jascha.
Combining the two steps:

str.scan(/([A-Z]\w+)\s+([A-Z]\w+)/) do |fname, lname|
  puts "#{lname}'s first name was #{fname}."
end
Another experiment:

"one two three".scan(/\w+/) {|n| puts "Next number: #{n}" }
The output is:

Next number: one
Next number: two
Next number: three
If you provide a code block, scan does not store the results in an array; it passes each result to the block and then discards it. That means you can scan a long string without worrying much about memory.
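The block form lends itself to running totals, where only an accumulator is kept and no array of matches is built. A minimal sketch, with the helper name sum_numbers made up for illustration:

```ruby
# Hypothetical helper: adds up every run of digits found in the text.
def sum_numbers(text)
  total = 0
  # Each match is handed to the block; no array of matches is collected.
  text.scan(/\d+/) { |digits| total += digits.to_i }
  total
end

p sum_numbers("testing 1 2 3 testing 4 5 6")
```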

StringScanner

StringScanner, from the strscan extension, provides tools for scanning and examining strings. It keeps a position pointer that you can move through the string.

>> require 'strscan'
=> true
>> ss = StringScanner.new("Testing string scanning")
=> #<StringScanner 0/23 @ "Testi...">
>> ss.scan_until(/ing/)
=> "Testing"
>> ss.pos
=> 7
>> ss.peek(7)
=> " string"
>> ss.unscan
=> #<StringScanner 0/23 @ "Testi...">
>> ss.pos
=> 0
>> ss.skip(/Test/)
=> 4
>> ss.rest
=> "ing string scanning"
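A typical use of the moving pointer is a small tokenizer that consumes the input piece by piece. The following is only a sketch under simple assumptions (non-negative integers and four arithmetic operators); the method name tokenize and the token format are made up for illustration.

```ruby
require 'strscan'

# Hypothetical tokenizer: turns "12 + 30*4" into [type, value] pairs.
def tokenize(text)
  ss = StringScanner.new(text)
  tokens = []
  until ss.eos?
    if ss.skip(/\s+/)
      next                                  # whitespace is consumed and ignored
    elsif (number = ss.scan(/\d+/))
      tokens << [:number, number.to_i]
    elsif (op = ss.scan(/[-+*\/]/))
      tokens << [:op, op]
    else
      raise ArgumentError, "unexpected character at position #{ss.pos}"
    end
  end
  tokens
end

p tokenize("12 + 30*4")
```

scan and skip only match at the current position, which is what lets the loop walk through the string without re-reading earlier parts.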
String#split

split breaks a string into substrings and returns them in an array. The delimiter can be a regular expression or plain text.

Try this:

>> "Ruby".split(//)
=> ["R", "u", "b", "y"]
split is handy for converting the contents of a text-based configuration file into a Ruby data structure.

>> line = "first_name=matt;last_name=damon;country=usa"
=> "first_name=matt;last_name=damon;country=usa"
>> record = line.split(/=|;/)
=> ["first_name", "matt", "last_name", "damon", "country", "usa"]
As a hash:

>> data = []
=> []
>> record = Hash[*line.split(/=|;/)]
=> {"first_name"=>"matt", "last_name"=>"damon", "country"=>"usa"}
>> data.push(record)
=> [{"first_name"=>"matt", "last_name"=>"damon", "country"=>"usa"}]
The second argument to split limits the number of items returned:

>> "a,b,c,d,e".split(/,/,3)
=> ["a", "b", "c,d,e"]
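The limit argument can also make the configuration-line conversion more robust when a value itself contains an equals sign. This is a sketch of an alternative to Hash[*...], assuming every field has the form key=value; the helper name parse_record is made up for illustration.

```ruby
# Hypothetical helper: turns "k1=v1;k2=v2" into a hash.
def parse_record(line)
  line.split(";")
      .map { |pair| pair.split("=", 2) }  # limit 2 keeps any later "=" inside the value
      .to_h
end

p parse_record("first_name=matt;last_name=damon;country=usa")
p parse_record("note=a=b;lang=ruby")
```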
sub/sub! and gsub/gsub!

sub and gsub return a modified copy of a string; the sub! and gsub! variants modify the string in place. gsub replaces every match in the string, while sub replaces at most one.

sub

>> "hit hit".sub(/i/, "o")
=> "hot hit"
With a code block:

>> "hit".sub(/i/) {|s| s.upcase }
=> "hIt"
gsub

>> "hit hit".gsub(/i/, "o")
=> "hot hot"
Using captures in the replacement

>> "oHt".sub(/([a-z])([A-Z])/, '\2\1')
=> "Hot"
>> "double every word".gsub(/\b(\w+)/, '\1 \1')
=> "double double every every word word"
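gsub accepts a code block just as sub does, and the block runs once per match. A minimal sketch that capitalizes the first letter of each word; the helper name title_case is an assumption for illustration.

```ruby
# Hypothetical helper: upcases the first lowercase letter at each word boundary.
def title_case(text)
  text.gsub(/\b[a-z]/) { |letter| letter.upcase }
end

p title_case("double every word")
```

The original string is left unchanged; gsub! would be the in-place variant.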
=== and grep

===

All Ruby objects understand the === message. If a class does not override it, it behaves the same as ==. If a class does override it, it takes on a new meaning.

For regular expressions, === tests for a match.

puts "Match!" if re.match(string)
puts "Match!" if string =~ re
puts "Match!" if re === string
A case statement compares with ===, so regular expressions can be used in when clauses. Try this:

print "Continue? (y/n) "
answer = gets
case answer
when /^y/i
  puts "Great!"
when /^n/i
  puts "Bye!"
  exit
else
  puts "Huh?"
end
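The same case/when technique works without reading from the keyboard, which makes it easier to experiment with. The sketch below classifies short strings; the method name classify and the category symbols are made up for illustration.

```ruby
# Hypothetical helper: picks a category for a token using Regexp#=== via case/when.
def classify(token)
  case token
  when /\A\d+\z/      then :integer
  when /\A\d+\.\d+\z/ then :float
  when /\A\w+\z/      then :word
  else                     :other
  end
end

%w[42 3.14 ruby ?!].each { |t| puts "#{t}: #{classify(t)}" }
```

The when clauses are tried in order, so the more specific patterns need to come before the more general ones.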
grep

>> ["USA", "UK", "France", "Germany"].grep(/[a-z]/)
=> ["France", "Germany"]
select can do the same:

["USA", "UK", "France", "Germany"].select {|c| /[a-z]/ === c }
With a code block:

>> ["USA", "UK", "France", "Germany"].grep(/[a-z]/) {|c| c.upcase }
=> ["FRANCE", "GERMANY"]
