Ruby Regex Study Notes

Last updated: 2017-01-13 Source: Internet

Uploaded by: User

The literal constructor for a Ruby regular expression:

//
Try it:

>> //.class
=> Regexp
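Pattern matching has two parts: a regular expression (regexp) and a string. The regexp states a prediction about the string, and the string either satisfies that prediction or it does not.

To check whether there is a match, use the match method. An experiment:

>> puts "Match!" if /abc/.match("the alphabet starts with abc.")
Match!
=> nil
>> puts "Match!" if "the alphabet starts with abc.".match(/abc/)
Match!
=> nil
Besides the match method, there is the pattern-matching operator =~ , which goes between the string and the regexp:

>> puts "Match!" if /abc/ =~ "the alphabet starts with abc."
Match!
=> nil
>> puts "Match!" if "the alphabet starts with abc." =~ /abc/
Match!
=> nil
When there is no match, both return nil. When there is a match, match and =~ return different things: =~ returns the numeric index of the character where the match starts, while match returns an instance of the MatchData class. An experiment: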

>> "the alphabet starts with abc" =~ /abc/
=> 25
>> /abc/.match("the alphabet starts with abc")
=> #<MatchData "abc">
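The difference in return values can be checked side by side. The following is a minimal sketch; match_results is a hypothetical helper name, not something from the notes, and match? is an extra that assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper: collect what each matching style returns
# for the same regexp and string.
def match_results(regex, text)
  {
    operator: text =~ regex,       # Integer index of the match start, or nil
    match:    regex.match(text),   # MatchData object, or nil
    boolean:  regex.match?(text)   # true or false (assumes Ruby 2.4+)
  }
end

hit = match_results(/abc/, "the alphabet starts with abc")
puts hit[:operator]         # 25
puts hit[:match][0]         # abc
puts hit[:match].begin(0)   # 25, the same position =~ reported
puts hit[:boolean]          # true

miss = match_results(/xyz/, "the alphabet starts with abc")
p miss[:operator]           # nil
p miss[:match]              # nil
p miss[:boolean]            # false
```

If only a yes/no answer is needed, match? is probably the lightest choice, since it does not build a MatchData object.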
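Matching patterns

What goes between the // is not a string; it is a set of predictions and constraints you place on a string.

Literal characters

A literal character in a regexp matches itself. For example:

/a/
matches the letter a.

Some characters have special meanings. If you don't want a character to carry its special meaning, escape it with a backslash (\):

/\?/
The wildcard character

. matches any single character except a newline.

/h.t/
matches hot, hit, ...

Character classes

A character class is written inside a pair of square brackets:

/h[oi]t/
In the regexp above, the character class means "match o or i". So this pattern matches "hot" and "hit", but not "h!t" or anything else.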

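The following character class matches a lowercase letter from a to z:

/[a-z]/
Inside a character class, a leading ^ means negation:

/[^a-z]/
To match a hexadecimal digit, you may need several ranges in one character class:

/[A-Fa-f0-9]/
To match a digit 0 - 9:

/[0-9]/
0 - 9 is so common that it has a shorthand; d stands for digit:

/\d/
Digits, letters, and the underscore; w stands for word:

/\w/
Whitespace, such as a space, a tab, or a newline; s stands for space:

/\s/
The uppercase forms are the negations: \D, \W, \S.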
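To see how the shorthand classes and their negations divide up characters, here is a small sketch. The helper names shorthand_classes and hex_digits? are made up for illustration.

```ruby
# Hypothetical helper: list the shorthand classes that a single
# character belongs to.
def shorthand_classes(char)
  {
    '\d' => /\d/, '\w' => /\w/, '\s' => /\s/,
    '\D' => /\D/, '\W' => /\W/, '\S' => /\S/
  }.select { |_name, regex| regex.match?(char) }.keys
end

# Hypothetical helper: true when the whole string is hexadecimal digits.
def hex_digits?(text)
  text.match?(/\A[A-Fa-f0-9]+\z/)
end

puts shorthand_classes("7").join(" ")   # \d \w \S
puts shorthand_classes("_").join(" ")   # \w \D \S
puts shorthand_classes(" ").join(" ")   # \s \D \W
puts hex_digits?("c0ffee")              # true
puts hex_digits?("coffee")              # false
```

Every character falls into exactly one of each pair, for example \d or \D, which is why each line prints three names.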

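Matching

Match operations that give you a yes/no result:

regex.match(string)
string.match(regex)
Submatches

Say we have a line of text about a person:

Peel,Emma,Mrs.,talented amateur
I want the person's last name and title. We know the fields are separated by commas, and we know their order: last name, first name, title, occupation.

First some alphabetic characters,
then a comma,
then some more alphabetic characters,
then another comma,
then Mr. or Mrs.
A pattern that matches a string of this shape:

/[A-Za-z]+,[A-Za-z]+,Mrs?\./
s? means the s is optional, so Mrs? matches either Mr or Mrs. An experiment in irb: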

>> /[A-Za-z]+,[A-Za-z]+,Mrs?\./.match("Peel,Emma,Mrs.,talented amateur")
=> #<MatchData "Peel,Emma,Mrs.">
We get a MatchData object. Now what if we want Peel and Mrs.? Use parentheses to group parts of the pattern:

/([A-Za-z]+),[A-Za-z]+,(Mrs?\.)/
Try this pattern:

>> /([A-Za-z]+),[A-Za-z]+,(Mrs?\.)/.match("Peel,Emma,Mrs.,talented amateur")
=> #<MatchData "Peel,Emma,Mrs." 1:"Peel" 2:"Mrs.">
$1 gives the match for the first group, and $2 the match for the second group:

>> puts $1
Peel
=> nil
>> puts $2
Mrs.
=> nil
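Match success and failure

If no match is found, the return value is nil. Try it:

>> /a/.match("b")
=> nil
If the match succeeds, a MatchData object is returned, and it is truthy in a boolean context. It also carries information about the match: where it started, how much of the string it covered, what the groups captured, and so on.

To use the MatchData, store it in a variable first. An exercise: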

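string = "我的電話號碼是 (123) 555-1234."   # "My phone number is (123) 555-1234."
phone_re = /\((\d{3})\)\s+(\d{3})-(\d{4})/   # \( and \) match literal parentheses
m = phone_re.match(string)

unless m
  puts "No match"
  exit
end

print "The whole string: "
puts m.string
print "The match: "
puts m[0]
puts "The three groups:"
3.times do |index|
  puts "#{index + 1}: #{m.captures[index]}"
end
puts "The content of the first captured group:"
puts m[1]
The result is:

The whole string: 我的電話號碼是 (123) 555-1234.
The match: (123) 555-1234
The three groups:
1: 123
2: 555
3: 1234
The content of the first captured group:
123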
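Two ways to get the captures

To get what the pattern's groups captured from a MatchData object:

m[1]
m[2]
...
m[0] gives the entire match.

The other way to get the captured content is the captures method. It returns an array whose elements are the captured substrings.

m[1] == m.captures[0]
m[2] == m.captures[1]
Another example, which shows that groups are numbered by the order of their opening parentheses: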

>> /((a)((b)c))/.match("abc")
=> #<MatchData "abc" 1:"abc" 2:"a" 3:"bc" 4:"b">
Named captures

>> re = /(?<first>\w+)\s+((?<middle>\w\.)\s+)?(?<last>\w+)/
Match:

>> m = re.match("Samuel L. Jackson")
=> #<MatchData "Samuel L. Jackson" first:"Samuel" middle:"L." last:"Jackson">
>> m[:first]
=> "Samuel"
>> m[:last]
=> "Jackson"
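Other MatchData information

Continuing the earlier phone number example:

print "The part before the match: "
puts m.pre_match

print "The part after the match: "
puts m.post_match

print "Start character of the second capture: "
puts m.begin(2)

print "End character of the third capture: "
puts m.end(3)
The output is:

The part before the match: 我的電話號碼是
The part after the match: .
Start character of the second capture: 14
End character of the third capture: 22
begin and end return character offsets (not byte offsets) into the original string; end is the offset just past the last character of the capture.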
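One way to check begin and end is to slice the original string with them and compare the result with the captures. This is a sketch under the assumption that the source file is UTF-8 (Ruby's default); PHONE_RE and capture_slices are hypothetical names.

```ruby
PHONE_RE = /\((\d{3})\)\s+(\d{3})-(\d{4})/

# Hypothetical helper: cut every capture back out of the original
# string using begin and end.
def capture_slices(match)
  (1...match.size).map do |i|
    match.string[match.begin(i)...match.end(i)]
  end
end

m = PHONE_RE.match("我的電話號碼是 (123) 555-1234.")
p m.begin(2)               # 14
p m.end(3)                 # 22
p m.offset(2)              # [14, 17], begin and end as a pair
p capture_slices(m)        # ["123", "555", "1234"]
p m.pre_match.size         # 8 characters before the match
p m.pre_match.bytesize     # 22 bytes, so the offsets above count characters
```

Because the slices equal m.captures, the offsets are character positions and end points one past the last matched character.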
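Quantifiers, Anchors, Modifiers

This part covers quantifiers, anchors, and modifiers.

Quantifiers

A quantifier specifies how many times something must occur in a match.

Zero or one

?
Example:

/Mrs?/
The s may occur zero times or once.

Zero or more

*
Example:

/\d*/
One or more

+
Example:

/\d+/
Greedy quantifiers

The * and + quantifiers are both greedy, meaning they match as many characters as they can.

Look at what .+ matches in the following example: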

>> string = "abc!def!ghi!"
=> "abc!def!ghi!"
>> match = /.+!/.match(string)
=> #<MatchData "abc!def!ghi!">
>> puts match[0]
abc!def!ghi!
=> nil
You might expect the substring "abc!", but we get "abc!def!ghi!". The + quantifier greedily consumes every character it can, stopping only at the last ! .

Adding a ? after + or * makes them non-greedy (lazy). Try again:

>> string = "abc!def!ghi!"
=> "abc!def!ghi!"
>> match = /.+?!/.match(string)
=> #<MatchData "abc!">
>> puts match[0]
abc!
=> nil
Another experiment:

>> /(\d+?)/.match("Digits-R-Us 2345")
=> #<MatchData "2" 1:"2">
>> puts $1
2
=> nil
Another match:

>> /\d+5/.match("Digits-R-Us 2345")
=> #<MatchData "2345">
Now try it with groups:

>> /(\d+)(5)/.match("Digits-R-Us 2345")
=> #<MatchData "2345" 1:"234" 2:"5">
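The greedy and lazy behaviours can be compared on one string. A minimal sketch; first_match is a hypothetical helper, and the negated character class is an alternative technique that the notes do not cover.

```ruby
# Hypothetical helper: return the text of the first match, or nil.
def first_match(regex, text)
  m = regex.match(text)
  m && m[0]
end

text = "abc!def!ghi!"
p first_match(/.+!/, text)      # "abc!def!ghi!"  greedy: runs to the last !
p first_match(/.+?!/, text)     # "abc!"          lazy: stops at the first !
p first_match(/[^!]+!/, text)   # "abc!"          negated class instead of laziness
p first_match(/.*?/, text)      # ""              lazy * is satisfied by nothing
p text.scan(/.+?!/)             # ["abc!", "def!", "ghi!"]
```

The last line shows why laziness is often what you want with scan: each match ends at the nearest delimiter.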
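Specific numbers of repetitions

Put the count inside {}. The following matches three digits, a hyphen, then four digits:

/\d{3}-\d{4}/
The count can also be a range. The following matches 1 to 10 digits:

/\d{1,10}/
The first number in the braces is the minimum. The following matches 3 or more digits:

/\d{3,}/
Limitations on parentheses

>> /([A-Z]){5}/.match("Matt DAMON")
=> #<MatchData "DAMON" 1:"N">
You might expect the group to capture DAMON, but it captures only N: the whole match is DAMON, yet a repeated group keeps only its last repetition. To capture DAMON, put the quantifier inside the parentheses: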

>> /([A-Z]{5})/.match("Matt DAMON")
=> #<MatchData "DAMON" 1:"DAMON">
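A short sketch that puts the two forms next to each other; match_and_captures is a hypothetical helper name.

```ruby
# Hypothetical helper: return [whole match, captures] or nil.
def match_and_captures(regex, text)
  m = regex.match(text)
  m && [m[0], m.captures]
end

p match_and_captures(/([A-Z]){5}/, "Matt DAMON")   # ["DAMON", ["N"]]
p match_and_captures(/([A-Z]{5})/, "Matt DAMON")   # ["DAMON", ["DAMON"]]

# If every repetition is wanted separately, one option is to match
# the run first and then split it into characters.
p "Matt DAMON"[/[A-Z]{5}/].chars                   # ["D", "A", "M", "O", "N"]
```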
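Anchors and assertions

Anchors and assertions state conditions that must hold at a position before character matching proceeds; they do not consume characters themselves.

^ means the beginning of a line, and $ the end of a line.

Comments in Ruby start with a # sign. A pattern that matches a comment line could look like this:

/^\s*#/
^ matches at the very beginning of a line.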

anchors

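^: beginning of a line
$: end of a line
\A: beginning of the string
\z: end of the string
\Z: end of the string, ignoring one final newline. Pattern: /from the earth.\Z/ matches "from the earth.\n"
\b: word boundary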
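The difference between line anchors and string anchors matters most when validating input. The following sketch uses made-up helper names and a made-up input string to show it.

```ruby
# Hypothetical validators: the same digit check with different anchors.
def digits_line_anchored?(input)
  input.match?(/^\d+$/)      # true if ANY line consists of digits
end

def digits_string_anchored?(input)
  input.match?(/\A\d+\z/)    # true only if the WHOLE string is digits
end

def digits_allowing_final_newline?(input)
  input.match?(/\A\d+\Z/)    # like \z, but tolerates one trailing newline
end

tricky = "42\nrm -rf"
p digits_line_anchored?(tricky)             # true
p digits_string_anchored?(tricky)           # false
p digits_string_anchored?("42")             # true
p digits_string_anchored?("42\n")           # false
p digits_allowing_final_newline?("42\n")    # true

p "from the earth.\n".match?(/from the earth.\Z/)   # true
p "from the earth\n".match?(/from the earth.\Z/)    # false, . does not match "\n"
p "!!!word***"[/\b\w+\b/]                           # "word"
```

For whole-string checks, \A and \z are usually the safer pair, since ^ and $ are satisfied by any single line inside the string.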
lookahead assertions

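Suppose you want to match a run of digits that must be followed by a period, but you don't want the period included in the match. You can do this: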

>> str = "123 456. 789"
=> "123 456. 789"
>> m = /\d+(?=\.)/.match(str)
=> #<MatchData "456">
lookbehind assertions

I want to match Damon, but only when it is preceded by "Matt ".

Pattern:

/(?<=Matt )Damon/
Try it:

>> /(?<=Matt )Damon/.match("Matt Damon")
=> #<MatchData "Damon">
>> /(?<=Matt )Damon/.match("Matt1 Damon")
=> nil
I want to match Damon, but only when it is not preceded by "Matt ".

Pattern:

/(?<!Matt )Damon/
Try it:

>> /(?<!Matt )Damon/.match("Matt Damon")
=> nil
>> /(?<!Matt )Damon/.match("Matt1 Damon")
=> #<MatchData "Damon">
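The four lookaround forms can be tried together on one string. This is only a sketch: the sample text and helper names are invented, and the negative lookahead (?!...) is not covered in the notes above, so treat it as an addition.

```ruby
# Hypothetical helpers, one per lookaround form.
def after_dollar(text)
  text.scan(/(?<=\$)\d+/)       # positive lookbehind: digits preceded by $
end

def before_percent(text)
  text.scan(/\d+(?=%)/)         # positive lookahead: digits followed by %
end

def not_before_percent(text)
  text.scan(/\b\d+\b(?!%)/)     # negative lookahead: whole numbers not followed by %
end

def not_after_dollar(text)
  text.scan(/(?<!\$)\b\d+/)     # negative lookbehind: numbers not preceded by $
end

text = "cost $40, tax 8%, tip $5"
p after_dollar(text)         # ["40", "5"]
p before_percent(text)       # ["8"]
p not_before_percent(text)   # ["40", "5"]
p not_after_dollar(text)     # ["8"]
```

In every case the $ or % is checked but never becomes part of the match.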
Non-capturing groups

Use ?: at the start of a group:

>> str = "abc def ghi"
=> "abc def ghi"
>> m = /(abc) (?:def) (ghi)/.match(str)
=> #<MatchData "abc def ghi" 1:"abc" 2:"ghi">
Conditional matching

The conditional expression (?(1)b|c) means: if group 1 ($1) captured something, match b; otherwise match c:

>> re = /(a)?(?(1)b|c)/
=> /(a)?(?(1)b|c)/
>> re.match("ab")
=> #<MatchData "ab" 1:"a">
>> re.match("b")
=> nil
>> re.match("c")
=> #<MatchData "c" 1:nil>
With a named group:

/(?<first>a)?(?(<first>)b|c)/
modifiers

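The i modifier makes the match case-insensitive:

/abc/i
The m modifier is multiline mode; in Ruby it makes . match a newline as well:

/abc/m
The x modifier changes how the regexp parser treats whitespace: whitespace inside the regexp is ignored unless you escape it with \ . This lets you spread a pattern over several lines and comment it:

/
\((\d{3})\)  # 3 digits inside literal parens (area code)
   \s         # One space character
(\d{3})      # 3 digits (exchange)
    -         # Hyphen
(\d{4})      # 4 digits (second part of number)
/x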
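Here is a small runnable sketch of the three modifiers. The constant name PHONE_X_RE and the sample strings are assumptions made for the example.

```ruby
# The commented phone pattern, stored in a constant so it can be used.
PHONE_X_RE = /
  \((\d{3})\)   # 3 digits inside literal parens (area code)
  \s            # one whitespace character
  (\d{3})       # 3 digits (exchange)
  -             # hyphen
  (\d{4})       # 4 digits (second part of number)
/x

p PHONE_X_RE.match("(123) 555-1234").captures   # ["123", "555", "1234"]

p "ABC" =~ /abc/i     # 0    i: case is ignored
p "a\nb" =~ /a.b/     # nil  . does not match a newline by default
p "a\nb" =~ /a.b/m    # 0    m: . matches a newline too
p "a b" =~ /a\ b/x    # 0    x: an escaped space still counts
p "a b" =~ /a b/x     # nil  x: the bare space is ignored, so this is ab
```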
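Converting between strings and regexps

string-to-regexp

Using interpolation inside a regexp:

>> str = "def"
=> "def"
>> /abc#{str}/
=> /abcdef/
If the string contains characters that have a special meaning in a regexp, such as the dot (.), they keep that meaning, so a.c also matches abc: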

>> str = "a.c"
=> "a.c"
>> re = /#{str}/
=> /a.c/
>> re.match("a.c")
=> #<MatchData "a.c">
>> re.match("abc")
=> #<MatchData "abc">
You can escape these special characters:

>> Regexp.escape("a.c")
=> "a\\.c"
>> Regexp.escape("^abc")
=> "\\^abc"
Now try again:

>> str = "a.c"
=> "a.c"
>> re = /#{Regexp.escape(str)}/
=> /a\.c/
>> re.match("a.c")
=> #<MatchData "a.c">
>> re.match("abc")
=> nil
You can also build a regexp from a string with Regexp.new:

>> Regexp.new('(.*)\s+Black')
=> /(.*)\s+Black/
These work too:

>> Regexp.new('Mr\. David Black')
=> /Mr\. David Black/
>> Regexp.new(Regexp.escape("Mr. David Black"))
=> /Mr\.\ David\ Black/
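When the text to match comes from outside the program, escaping is what keeps it literal. The sketch below uses two hypothetical helpers; Regexp.union is a standard method that escapes plain strings for you, although the notes do not mention it.

```ruby
# Hypothetical helper: a regexp that matches any of the given strings
# literally. Regexp.union escapes metacharacters in String arguments.
def literal_matcher(*words)
  Regexp.union(words)
end

# Hypothetical helper: match one literal word, case-insensitively,
# only at word boundaries.
def whole_word(word)
  /\b#{Regexp.escape(word)}\b/i
end

re = literal_matcher("a.c", "1+1")
p re.match?("x a.c y")                   # true
p re.match?("1+1=2")                     # true
p re.match?("abc")                       # false, the dot is literal
p re.match?("11")                        # false, the plus is literal

p whole_word("a.c").match?("see A.C!")   # true
p whole_word("a.c").match?("abc")        # false
```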
regexp-to-string

A regexp can represent itself as a string:

>> puts /abc/
(?-mix:abc)
inspect:

>> /abc/.inspect
=> "/abc/"
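Methods that use regexps

Some Ruby methods accept a regexp as an argument.

For example, to find every element of an array that is longer than 10 characters and contains a digit: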

array.find_all {|e| e.size > 10 and /\d/.match(e) }
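A runnable version of that one-liner might look like the following. The method name and the sample array are made up, and match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical wrapper around the find_all call from the notes.
def long_with_digit(array)
  array.find_all { |e| e.size > 10 && e.match?(/\d/) }
end

items = ["short1", "a much longer string", "order number 42", "12345678901"]
p long_with_digit(items)   # ["order number 42", "12345678901"]
```

Using && instead of and avoids surprises with operator precedence if the expression is later assigned to a variable.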
String#scan

Find all the digits in a string:

>> "testing 1 2 3 testing 4 5 6".scan(/\d/)
=> ["1", "2", "3", "4", "5", "6"]
With groups, each result is an array of the captures:

>> str = "Leopold Auer was the teacher of Jascha Heifetz."
=> "Leopold Auer was the teacher of Jascha Heifetz."
>> violinists = str.scan(/([A-Z]\w+)\s+([A-Z]\w+)/)
=> [["Leopold", "Auer"], ["Jascha", "Heifetz"]]
It can be used like this:

violinists.each do |fname,lname|
  puts "#{lname}'s first name was #{fname}."
end
The output is:

Auer's first name was Leopold.
Heifetz's first name was Jascha.
Combined into one step:

str.scan(/([A-Z]\w+)\s+([A-Z]\w+)/) do |fname, lname|
  puts "#{lname}'s first name was #{fname}."
end
Another experiment:

"one two three".scan(/\w+/) {|n| puts "Next number: #{n}" }
The output is:

Next number: one
Next number: two
Next number: three
If you supply a code block, scan does not store the results in an array. It yields each result to the block and then discards it. That means you can scan a very long string without worrying much about memory.
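Inside the block, the MatchData for the current result is also available, which is handy when positions are needed. A sketch with a hypothetical helper name:

```ruby
# Hypothetical helper: pair every word with the character offset
# where it starts. Regexp.last_match is the MatchData of the match
# that scan is currently yielding.
def word_positions(text)
  positions = []
  text.scan(/\w+/) do |word|
    positions << [word, Regexp.last_match.begin(0)]
  end
  positions
end

p word_positions("one two three")
# [["one", 0], ["two", 4], ["three", 8]]
```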

StringScanner

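StringScanner lives in the strscan extension. It provides tools for scanning and examining a string while keeping track of a position (the scan pointer) that you can move forward and back.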

>> require 'strscan'
=> true
>> ss = StringScanner.new("Testing string scanning")
=> #<StringScanner 0/23 @ "Testi...">
>> ss.scan_until(/ing/)
=> "Testing"
>> ss.pos
=> 7
>> ss.peek(7)
=> " string"
>> ss.unscan
=> #<StringScanner 0/23 @ "Testi...">
>> ss.pos
=> 0
>> ss.skip(/Test/)
=> 4
>> ss.rest
=> "ing string scanning"
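A typical use of StringScanner is a small tokenizer. The following is a sketch only: the token format and the tokenize method are invented for illustration, while scan, skip, eos?, pos and peek are the scanner's own methods.

```ruby
require 'strscan'

# Hypothetical tokenizer for simple arithmetic expressions.
def tokenize(source)
  ss = StringScanner.new(source)
  tokens = []
  until ss.eos?
    if ss.skip(/\s+/)
      next                                   # whitespace: advance and move on
    elsif (number = ss.scan(/\d+/))
      tokens << [:number, number.to_i]
    elsif (op = ss.scan(/[-+*\/()]/))
      tokens << [:op, op]
    else
      raise ArgumentError,
            "unexpected #{ss.peek(1).inspect} at position #{ss.pos}"
    end
  end
  tokens
end

p tokenize("12 + (3*4)")
# [[:number, 12], [:op, "+"], [:op, "("], [:number, 3], [:op, "*"], [:number, 4], [:op, ")"]]
```

Because scan and skip only match at the current position, each branch either consumes input or falls through, so the loop cannot get stuck on an unknown character.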
String#split

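split breaks a string into substrings and returns them in an array. The separator can be a regexp or plain text.

Try it:

>> "Ruby".split(//)
=> ["R", "u", "b", "y"]
Converting a line from a text-based configuration file into a Ruby data structure: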

>> line = "first_name=matt;last_name=damon;country=usa"
=> "first_name=matt;last_name=damon;country=usa"
>> record = line.split(/=|;/)
=> ["first_name", "matt", "last_name", "damon", "country", "usa"]
As a hash:

>> data = []
=> []
>> record = Hash[*line.split(/=|;/)]
=> {"first_name"=>"matt", "last_name"=>"damon", "country"=>"usa"}
>> data.push(record)
=> [{"first_name"=>"matt", "last_name"=>"damon", "country"=>"usa"}]
The second argument to split limits the number of elements returned:

>> "a,b,c,d,e".split(/,/,3)
=> ["a", "b", "c,d,e"]
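The limit argument is useful when a value may itself contain the separator. A sketch, assuming records shaped like the line above; parse_record and the motto field are hypothetical.

```ruby
# Hypothetical parser: split records on ; and each pair on the FIRST =
# only, so a value may contain further = signs.
def parse_record(line)
  line.split(";").map { |pair| pair.split("=", 2) }.to_h
end

record = parse_record("first_name=matt;last_name=damon;motto=a=b")
puts record["first_name"]   # matt
puts record["last_name"]    # damon
puts record["motto"]        # a=b
```

Splitting on /=|;/ in one pass, as in the earlier example, would break the motto value into two pieces.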
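sub/sub! and gsub/gsub!

sub and gsub change the content of a string. gsub replaces every match in the string; sub replaces at most one. Both return a new string, while sub! and gsub! modify the receiver in place.

sub

>> "hit hit".sub(/i/,"o")
=> "hot hit"
With a code block: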

>> "hit".sub(/i/) {|s| s.upcase}
=> "hIt"
gsub

>> "hit hit".gsub(/i/,"o")
=> "hot hot"
Captures in the replacement

>> "oHt".sub(/([a-z])([A-Z])/, '\2\1')
=> "Hot"
>> "double every word".gsub(/\b(\w+)/, '\1 \1')
=> "double double every every word word"
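A few more replacement forms, sketched with made-up helper names: a block, named back-references with \k<name>, and a Hash that maps each match to its replacement.

```ruby
# Hypothetical helpers showing different kinds of replacement.
def capitalize_words(text)
  text.gsub(/\b\w/) { |char| char.upcase }   # the block gets each match
end

def swap_names(text)
  text.sub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')
end

def expand_codes(text)
  names = { "usa" => "United States", "uk" => "United Kingdom" }
  text.gsub(/\b(?:usa|uk)\b/, names)         # the Hash is looked up per match
end

puts capitalize_words("double every word")   # Double Every Word
puts swap_names("Emma Peel")                 # Peel, Emma
puts expand_codes("usa and uk")              # United States and United Kingdom

# The bang versions return nil when nothing was replaced.
p "hit".dup.gsub!(/z/, "o")                  # nil
```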
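=== and grep

===

Every Ruby object responds to the === message. Unless a class overrides it, it behaves the same as == . A class that overrides === gives it a new meaning.

For a regexp, === tests for a match.

puts "Match!" if re.match(string)
puts "Match!" if string =~ re
puts "Match!" if re === string
Ruby's case statement uses === , so a regexp can appear in a when clause. Try it: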

print "Continue? (y/n) "
answer = gets
case answer
when /^y/i
  puts "Great!"
when /^n/i
  puts "Bye!"
  exit
else
  puts "Huh?"
end
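The same case logic is easier to reuse and test when it returns a value instead of printing. A sketch; classify_answer and the numeric branch are additions for illustration.

```ruby
# Hypothetical helper: classify a line of user input.
def classify_answer(answer)
  case answer
  when /^y/i then :yes
  when /^n/i then :no
  when /\A(\d+)\s*\z/ then $1.to_i   # $1 is set because when called ===
  else :unknown
  end
end

p classify_answer("Yes\n")    # :yes
p classify_answer("no")       # :no
p classify_answer("42\n")     # 42
p classify_answer("maybe")    # :unknown
```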
grep

>> ["USA", "UK", "France", "Germany"].grep(/[a-z]/)
=> ["France", "Germany"]
select works too:

["USA", "UK", "France", "Germany"].select {|c| /[a-z]/ === c }
With a code block:

>> ["USA", "UK", "France", "Germany"].grep(/[a-z]/) {|c| c.upcase }
=> ["FRANCE", "GERMANY"]
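Since grep is built on === , it also has an inverse and works with things other than regexps. A sketch; partition_by is a hypothetical helper, and grep_v assumes Ruby 2.3 or later.

```ruby
# Hypothetical helper: split items into those that match and those
# that do not.
def partition_by(pattern, items)
  [items.grep(pattern), items.grep_v(pattern)]
end

countries = ["USA", "UK", "France", "Germany"]
matching, rest = partition_by(/[a-z]/, countries)
p matching   # ["France", "Germany"]
p rest       # ["USA", "UK"]

# Anything that defines === can be the pattern, such as a class or a range.
p [1, "two", 3.0, :four].grep(Numeric)   # [1, 3.0]
p (1..10).grep(3..5)                     # [3, 4, 5]
```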

This article was originally written in Chinese and published on aliyun.com.
