Most of Ruby's built-in types are similar to those in other programming languages: strings, integers, floats, arrays, and so on. However, only scripting languages such as Ruby, Perl, and awk provide built-in support for regular expressions. Regular expressions may look cryptic, but they are a powerful tool for text processing.
A regular expression is a way of specifying a pattern of characters to be matched in a string. In Ruby, you typically create a regular expression by writing the pattern between two slashes: /pattern/.
And, Ruby being Ruby, regular expressions are objects and can be manipulated like any other object.
For example, the following regular expression is a pattern that matches a string containing the text Perl or the text Python:
<!--more-->
/Perl|Python/
Between the slashes are the two strings we want to match, separated by a pipe character (|). The pipe means "either the thing on the left or the thing on the right", which in this pattern is either Perl or Python.
You can also use parentheses in a pattern, just as you would in an arithmetic expression, so this pattern can also be written as /P(erl|ython)/.
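This small sketch checks that plain alternation and the parenthesized group accept the same strings by running both patterns over a few sample sentences. The sentences are made up for illustration.

```ruby
# Alternation: "Perl" or "Python"
alternation = /Perl|Python/
# Grouping: a "P" followed by either "erl" or "ython"
grouped = /P(erl|ython)/

samples = ["I like Perl", "I like Python", "I like Ruby"]
samples.each do |text|
  # =~ returns the index where the match starts, or nil if nothing matches
  puts "#{text}: #{(text =~ alternation).inspect} / #{(text =~ grouped).inspect}"
end
# Prints:
# I like Perl: 7 / 7
# I like Python: 7 / 7
# I like Ruby: nil / nil
```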
You can also specify repetition in patterns. For example, /ab+c/ matches a string containing an a followed by one or more b's, followed by a c. Change the plus to an asterisk, and /ab*c/ matches one a, followed by zero or more b's, followed by a c.
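One quick way to see the difference between + and * is to try both patterns on the same strings. This is a minimal sketch that assumes Ruby 2.4 or later, where Regexp#match? returns true or false.

```ruby
one_or_more  = /ab+c/  # a, then one or more b, then c
zero_or_more = /ab*c/  # a, then zero or more b, then c

%w[ac abc abbbc].each do |s|
  puts "#{s}: ab+c => #{one_or_more.match?(s)}, ab*c => #{zero_or_more.match?(s)}"
end
# Prints:
# ac: ab+c => false, ab*c => true
# abc: ab+c => true, ab*c => true
# abbbc: ab+c => true, ab*c => true
```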
You can also match one of a group of characters within a pattern. Common examples are the character classes \s, which matches a whitespace character (space, tab, newline, and so on); \d, which matches any digit; and \w, which matches any character that may appear in a typical word. A period (.) matches (almost) any character.
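This sketch applies each character class to a made-up sample string with String#scan, which returns every non-overlapping match as an array.

```ruby
text = "Order 66\tshipped."

digits = text.scan(/\d/)    # every single digit
spaces = text.scan(/\s/)    # the space and the tab
words  = text.scan(/\w+/)   # runs of word characters
p digits   #-> ["6", "6"]
p spaces   #-> [" ", "\t"]
p words    #-> ["Order", "66", "shipped"]

# Without the m modifier, the dot matches any character except a newline
p "a\nb".scan(/./)  #-> ["a", "b"]
```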
We can combine all of these to build some useful regular expressions:
/\d\d:\d\d:\d\d/      # a time such as 12:34:56
/Perl.*Python/        # Perl, zero or more other chars, then Python
/Perl Python/         # Perl, a space, and Python
/Perl *Python/        # Perl, zero or more spaces, and Python
/Perl +Python/        # Perl, one or more spaces, and Python
/Perl\s+Python/       # Perl, whitespace characters, then Python
/Ruby (Perl|Python)/  # Ruby, a space, and either Perl or Python
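As a rough sanity check, each pattern above can be paired with a sample string it should match. The sample strings are invented for illustration, and Regexp#match? assumes Ruby 2.4 or later.

```ruby
patterns = {
  /\d\d:\d\d:\d\d/     => "The time is 12:34:56",
  /Perl.*Python/       => "Perl and then Python",
  /Perl Python/        => "Perl Python",
  /Perl *Python/       => "PerlPython",        # zero spaces is allowed by *
  /Perl +Python/       => "Perl   Python",
  /Perl\s+Python/      => "Perl\t\nPython",    # tab and newline count as whitespace
  /Ruby (Perl|Python)/ => "Ruby Python",
}

patterns.each do |re, sample|
  puts "#{re.inspect} matches #{sample.inspect}: #{re.match?(sample)}"
end

# By contrast, + needs at least one space, so this does not match
p(/Perl +Python/.match?("PerlPython"))  #-> false
```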
Patterns are not much use if you can't do anything with them. The match operator =~ matches a string against a regular expression. If the pattern is found, =~ returns the position where the match starts; otherwise it returns nil. This means you can use regular expressions as the condition in if and while statements. For example, the following code fragment prints a message if a string contains the text Perl or Python:
puts "Scripting language mentioned: #{line}" if line =~ /Perl|Python/
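In a real script, line would typically come from a file. This sketch uses a hypothetical array of lines instead. It also uses $~, the MatchData of the last successful match, to show what matched and where.

```ruby
# Hypothetical input; a real script might read these lines from a file
lines = ["I write Perl at work", "Python at home", "and Go on weekends"]

mentioned = []
lines.each do |line|
  if line =~ /Perl|Python/
    # $~ holds the MatchData from the last successful match
    puts "Scripting language mentioned: #{line} (#{$~[0]} at index #{$~.begin(0)})"
    mentioned << line
  end
end
# Prints:
# Scripting language mentioned: I write Perl at work (Perl at index 8)
# Scripting language mentioned: Python at home (Python at index 0)
```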
You can also replace the parts of a string matched by a pattern. For example, the following replaces every occurrence of Perl or Python with Ruby:
line.gsub(/Perl|Python/, 'Ruby')
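gsub returns a new string and leaves the original untouched. Its sibling sub replaces only the first match. This short sketch contrasts the two on an invented sentence.

```ruby
line = "Perl is fun, Python is fun"

first_only = line.sub(/Perl|Python/, 'Ruby')   # replaces only the first match
every      = line.gsub(/Perl|Python/, 'Ruby')  # replaces every match

puts first_only  #-> Ruby is fun, Python is fun
puts every       #-> Ruby is fun, Ruby is fun
puts line        #-> Perl is fun, Python is fun  (unchanged)
```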
Here is an example from ihower's Ruby on Rails practical guide that uses a regular expression to extract the parts of a mobile phone number:
phone = "139-1234-5678"
if phone =~ /(\d{3})-(\d{4})-(\d{4})/
  start_with = $1
  mid_num    = $2
  end_as     = $3
end
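The same idea can be packaged as a method. split_phone is a hypothetical helper and was not part of the original example. It also adds the \A and \z anchors, which are covered under Anchors below, so that the whole string must have the 3-4-4 shape.

```ruby
# split_phone is a hypothetical helper name used only for this sketch
def split_phone(phone)
  # \A and \z anchor the pattern to the whole string, so extra digits are rejected
  if phone =~ /\A(\d{3})-(\d{4})-(\d{4})\z/
    [$1, $2, $3]   # text captured by groups 1, 2 and 3
  end              # an if without else evaluates to nil when there is no match
end

p split_phone("139-1234-5678")   #-> ["139", "1234", "5678"]
p split_phone("139-12345-678")   #-> nil
```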
General Rules
- /a/ matches the character a.
- /\?/ matches the special character ?. Special characters must be escaped with a backslash to be matched literally; they include ^, $, ?, ., /, \, [, ], {, }, (, ), |, +, and *.
- . matches any character except a newline; for example, /a./ matches ab and ac.
- /[ab]c/ matches ac and bc. Square brackets match any one character from the set or range they contain, for example /[a-z]/ or /[a-zA-Z0-9]/.
- /[^a-zA-Z0-9]/ matches any character that is not in the range.
- /[\d]/ matches any digit.
- /[\w]/ matches any letter, digit, or underscore (_).
- /[\s]/ matches whitespace characters, including space, tab, and newline.
- /[\D]/, /[\W]/, and /[\S]/ are the negations of the three classes above.
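Here are a few of these rules tried on invented sample strings. Note the contrast in the last two lines: \W treats _ as a word character, while the explicit set [^a-zA-Z0-9] does not.

```ruby
p "a?b".match?(/\?/)             #-> true  (escaped ? is a literal question mark)
p "ab ac ad".scan(/a./)          #-> ["ab", "ac", "ad"]  (. is any one character)
p "ac bc cc".scan(/[ab]c/)       #-> ["ac", "bc"]  (one character from the set)
p "x-9_Y!".scan(/[^a-zA-Z0-9]/)  #-> ["-", "_", "!"]  (^ inside [] negates the set)
p "x-9_Y!".scan(/\W/)            #-> ["-", "!"]  (\W is "not \w", and \w includes _)
```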
Advanced Rules
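- ? means 0 or 1 occurrences of the preceding item. /Mrs?\.?/ matches "Mr", "Mrs", "Mr.", and "Mrs.".
- * means 0 or more occurrences of the preceding item. /Hello*/ matches "Hell" followed by zero or more o's, so it finds a match in "Hell", "Hello", and "HelloJack".
- + means 1 or more occurrences of the preceding item. /a.+c/ matches "abc", "abbdrec", and so on.
- /\d{3}/ matches exactly 3 digits.
- /\d{1,10}/ matches 1 to 10 digits.
- /\d{3,}/ matches 3 or more digits.
- /([A-Z]\d){5}/ matches an uppercase letter followed by a digit, repeated five times (for example A1B2C3D4E5). The parentheses group the two items so that {5} applies to both.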
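The quantifiers can be tested the same way. The \A and \z anchors in the first pattern, described under Anchors below, force the whole title to match. The sample data is made up.

```ruby
titles = %w[Mr Mrs Mr. Mrs. Ms]
polite = titles.select { |t| t.match?(/\AMrs?\.?\z/) }
p polite                                    #-> ["Mr", "Mrs", "Mr.", "Mrs."]

digits = "a1b22c333d4444"
p digits.scan(/\d{3}/)                      #-> ["333", "444"]
p digits.scan(/\d{3,}/)                     #-> ["333", "4444"]
p "12345678901234".scan(/\d{1,10}/)         #-> ["1234567890", "1234"]

# The group makes {5} repeat "uppercase letter + digit" as a unit
p "A1B2C3D4E5".match?(/([A-Z]\d){5}/)       #-> true
p "A1B2C3".match?(/([A-Z]\d){5}/)           #-> false
```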
Regular expression operations
Both String and Regexp support two matching methods, =~ and match:
puts "I can say my name" =~ /name/   #-> 13
a = /name/.match("I can say my name, my name I can say")   #-> a is a MatchData object
puts a[0]   #-> name
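This small sketch shows the difference in return values, including the nil case. MatchData#pre_match, which returns the text before the match, is used here only as an extra illustration. String#match would work just as well as Regexp#match.

```ruby
str = "I can say my name"

pos = str =~ /name/      # Integer index of the first match
md  = /name/.match(str)  # MatchData object (str.match(/name/) is equivalent)

p pos            #-> 13
p md.class       #-> MatchData
p md[0]          #-> "name"
p md.pre_match   #-> "I can say my "

# Both forms return nil when there is no match
p("no match here" =~ /name/)      #-> nil
p(/name/.match("no match here"))  #-> nil
```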
As you can see, on a successful match =~ returns the position of the match, while match returns a MatchData object. If there is no match, both return nil. A MatchData object also gives access to the text captured by each subexpression (group), as the following example shows:
b1 = /[a-zA-Z]+,[a-zA-Z]+,Mrs?\./.match("Jack,Wang,Mrs., nice person")
puts b1[0]   #-> Jack,Wang,Mrs.
b2 = /(([a-zA-Z]+),([a-zA-Z]+)),Mrs?\./.match("Jack,Wang,Mrs., nice person")
puts b2[0]   #-> Jack,Wang,Mrs.
puts b2[1]   #-> Jack,Wang
puts b2[2]   #-> Jack
puts b2[3]   #-> Wang
m[0] returns the string matched by the whole expression, and m[1], m[2], and so on return the text captured by each group; equivalently, m[n] == m.captures[n - 1] for n >= 1.
Ruby also automatically sets the numbered global variables $1, $2, and so on: $1 holds the text matched by the first group (pair of parentheses) in the regular expression, $2 the text matched by the second, and so on. Groups are numbered by the position of their opening parenthesis, counting from the left, so an outer group is numbered before the groups nested inside it.
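This sketch redefines the b2 pattern from above so it runs on its own, then checks the group numbering through MatchData#captures and through $1 to $3.

```ruby
pattern = /(([a-zA-Z]+),([a-zA-Z]+)),Mrs?\./
b2 = pattern.match("Jack,Wang,Mrs., nice person")

# Groups are numbered by their opening parenthesis, left to right
p b2.captures                  #-> ["Jack,Wang", "Jack", "Wang"]
p b2[1] == b2.captures[0]      #-> true

dollars = nil
if "Jack,Wang,Mrs., nice person" =~ pattern
  dollars = [$1, $2, $3]       # the same three captures via the numbered globals
end
p dollars                      #-> ["Jack,Wang", "Jack", "Wang"]
```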
Greedy quantifiers and non-greedy quantifiers
The quantifiers * (zero or more) and + (one or more) are greedy: they match as many characters as possible. Adding a ? after * or + makes the quantifier non-greedy, so it matches as few characters as possible.
The following code matches one or more characters followed by an exclamation point:
test_str = "abcd!efg!"
match = /.+!/.match(test_str)
puts match[0]         #-> abcd!efg!
limit_match = /.+?!/.match(test_str)
puts limit_match[0]   #-> abcd!
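String#[] with a Regexp argument returns just the matched text, or nil if there is no match, which keeps the comparison short. The HTML-like string in the last lines is an invented example of where non-greedy matching usually matters.

```ruby
test_str = "abcd!efg!"

greedy = test_str[/.+!/]    # .+ grabs as much as it can, then backs off to the last "!"
lazy   = test_str[/.+?!/]   # .+? stops at the first "!"
p greedy   #-> "abcd!efg!"
p lazy     #-> "abcd!"

html = "<b>x</b><b>y</b>"
p html.scan(/<b>.*<\/b>/)   #-> ["<b>x</b><b>y</b>"]
p html.scan(/<b>.*?<\/b>/)  #-> ["<b>x</b>", "<b>y</b>"]
```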
Anchors
An anchor matches a position in the string rather than a character; the match can only continue if the condition holds at that position:
- ^ beginning of a line
- $ end of a line
- \A beginning of the string
- \z end of the string
- \Z end of the string, or just before a final newline
- \b word boundary
c = /\b\w+\b/.match("!! Stephen** ")
puts c[0]   #-> Stephen
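The difference between line anchors and string anchors shows up as soon as a string contains newlines. This sketch uses an invented two-line string.

```ruby
text = "first line\nsecond line\n"

p text.scan(/^\w+/)        #-> ["first", "second"]  (^ matches at the start of each line)
p text.match?(/\Afirst/)   #-> true   (\A only matches at the very start of the string)
p text.match?(/\Asecond/)  #-> false
p text.match?(/line\z/)    #-> false  (the string ends with "\n", not "line")
p text.match?(/line\Z/)    #-> true   (\Z also allows one final newline)
p text.match?(/line$/)     #-> true   ($ matches at the end of any line)
p "cat concatenate".scan(/\bcat\b/)  #-> ["cat"]  (the "cat" inside a word is skipped)
```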
Lookahead Assertions
A lookahead assertion checks what follows the current position without making it part of the match.
Positive lookahead assertion (?=...)
Suppose we want to match a sequence of digits that is followed by a dot, without including the dot in the match:
test_str = "123 456 789. 012 "
m = /\d+(?=\.)/.match(test_str)
puts m[0]   #-> 789
Negative lookahead assertion (?!...)
In the example above, if /\d+(?=\.)/ is changed to /\d+(?!\.)/, puts m[0] prints 123, the first run of digits that is not followed by a dot.
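Both lookahead examples can be checked directly. The scan lines are worth a look. With the negative lookahead, scan still finds 78, because \d+ backtracks from 789 to 78 so that the next character (9) is not a dot. Also excluding digits from what may follow, as in (?![\d.]), rules that out.

```ruby
test_str = "123 456 789. 012 "

p test_str[/\d+(?=\.)/]        #-> "789"  (digits followed by a dot, dot excluded)
p test_str[/\d+(?!\.)/]        #-> "123"  (first digits not followed by a dot)

# With scan, backtracking lets \d+ give up the 9 so that "78" still matches
p test_str.scan(/\d+(?!\.)/)     #-> ["123", "456", "78", "012"]
# Also excluding digits from what may follow rules that out
p test_str.scan(/\d+(?![\d.])/)  #-> ["123", "456", "012"]
```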
Modifiers
Modifiers are written after the closing slash of a regular expression:
1. i makes the regular expression case-insensitive.
For example, /abc/i matches abc, ABC, Abc, and so on.
2. m lets the dot wildcard match any character, including newlines; without it, . does not match a newline.
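This sketch applies both modifiers to made-up strings. String#[] returns nil when there is no match, which makes the effect of m easy to see.

```ruby
p "ABC Abc abc".scan(/abc/i)   #-> ["ABC", "Abc", "abc"]

text = "start\nend"
p text[/start.end/]    #-> nil  (. does not match "\n" by default)
p text[/start.end/m]   #-> "start\nend"  (with m, . matches "\n" too)
```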
Conversion between strings and regular expressions
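Interpolating a string into a regular expression. Regexp.escape escapes any special characters in the string, so the dot below matches only a literal dot: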
test_str = "a.c"
re = /#{Regexp.escape(test_str)}/
puts re.match("a.c")[0]   #-> a.c
test = re.match("abc")
p test                    #-> nil
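Here is why the escaping matters. Without Regexp.escape, the interpolated dot stays a wildcard, so the pattern also matches abc. This short comparison uses the same sample strings.

```ruby
user_input = "a.c"

loose  = /#{user_input}/                 # "." stays a wildcard
strict = /#{Regexp.escape(user_input)}/  # "." becomes a literal dot

p loose.match?("abc")      #-> true
p strict.match?("abc")     #-> false
p strict.match?("a.c")     #-> true
p Regexp.escape("1+1=2?")  #-> "1\\+1=2\\?"
```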
Converting a regular expression to a string
puts(/abc/.inspect)   #-> /abc/
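Besides inspect, Regexp#source and Regexp#to_s give other string forms. Regexp.new works in the other direction and builds a regular expression from a string. Here is a sketch.

```ruby
re = /ab+c/i

puts re.inspect   #-> /ab+c/i       (literal form)
puts re.source    #-> ab+c          (pattern text only)
puts re.to_s      #-> (?i-mx:ab+c)  (form that can be embedded in another pattern)

# The other direction: build a Regexp from a string plus option flags
built = Regexp.new("ab+c", Regexp::IGNORECASE)
p built.match?("ABBC")                           #-> true
p built.source == re.source && built.casefold?   #-> true
```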
Common uses of regular expressions:
- as conditions in if, while, and similar statements;
- with methods such as gsub and grep;
- with methods such as find_all and scan.
For example, p "Test 1 2 and Test 3 4".scan(/\d/) prints ["1", "2", "3", "4"].
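This sketch shows these uses on invented data. Enumerable#grep matches each element with ===, which for a Regexp means a pattern match. When the pattern has groups, scan returns arrays of captures instead of whole matches.

```ruby
sentence = "Test 1 2 and Test 3 4"
words = %w[apple banana cherry avocado]

p sentence.scan(/\d/)                   #-> ["1", "2", "3", "4"]
p sentence.scan(/Test (\d) (\d)/)       #-> [["1", "2"], ["3", "4"]]
p words.grep(/\Aa/)                     #-> ["apple", "avocado"]
p words.find_all { |w| w =~ /an/ }      #-> ["banana"]
p "2024-01-15".gsub(/-/, "/")           #-> "2024/01/15"
```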