The following examples illustrate the use of regular expressions to handle problems that are common in some jobs.
1.
Regexp_substr
The REGEXP_SUBSTR function uses regular expressions to specify the start and end points of the return string, and to return the same string as the VARCHAR2 or CLOB data in the source_string character set.
Grammar:
--1.regexp_substr the same as the SUBSTR function, returns the intercepted substring
Regexp_substr (srcstr, pattern [, position [, occurrence [, match_option]]]
Note:
Srcstr
SOURCE string
Pattern
Regular expression Style
Position
Start matching character position
Occurrence
Number of matches occurred
Match_option
Matching Options (case-sensitive)
1.1 Intercepting a substring from a string
SELECT regexp_substr (' 1psn/231_3253/abc ', ' [: alnum:]]+ ') from dual;
Output:1psn
[: alnum:]]+ represents a match of 1 or more alphabetic or numeric characters.
SELECT regexp_substr (' 1psn/231_3253/abc ', ' [: alnum:]]+ ', 1, 2) from dual;
output:231
There are two more parameters than the previous example
1
Indicates that the start of the first character of the source string finds a match
2
The string that represents the 2nd-time match (the default is "1", as in the previous example)
Select Regexp_substr (' @@/231_3253/abc ', ' @*[[:alnum:]]+ ') from dual;
output:231
@* means match 0 or more @
[: alnum:]]+ to match 1 or more alphabetic or numeric characters
Note: Pay attention to the difference between "+" and "*"
Select Regexp_substr (' 1@/231_3253/abc ', ' @+[[:alnum:]]* ') from dual;
Output: @
@+ means match 1 or more @
[: alnum:]]* to match 0 or more alphabetic or numeric characters
Select Regexp_substr (' 1@/231_3253/abc ', ' @+[[:alnum:]]+ ') from dual;
Output:null
@+ means match 1 or more @
[: alnum:]]+ to match 1 or more alphabetic or numeric characters
Select Regexp_substr (' @1psn/231_3253/abc125 ', ' [:d igit:]]+$ ') from dual;
output:125
[[:d igit:]]+$ represents characters that match the end of 1 or more digits
Select Regexp_substr (' @1psn/231_3253/abc ', ' [^[:d igit:]]+$ ') from dual;
Output:/ABC
[^[:d igit:]]+$ to match 1 or more characters that are not the end of a number
Select Regexp_substr (' Tom_Kyte@oracle.com ', ' [^@]+ ') from dual;
Output:tom_kyte
[^@]+ to match 1 or more characters that are not "@"
Select Regexp_substr (' 1psn/231_3253/abc ', ' [: alnum:]]* ', 1,2)
from dual;
Output:null
[: alnum:]]* to match 0 or more letters or numeric characters
Note: Because it matches 0 or more, the 2nd match here is "/" (matched 0 times) instead of "231", so the result is "Null"
1.2 Match repeat Occurrence
Find 2 consecutive lowercase letters
SELECT regexp_substr (' Republicc of AFRICAA ', ' ([A-z]) \1 ', 1, 1, ' I ')
from dual;
Output:cc
([A-z])
Indicates lowercase letter A-Z
\1
Indicates the number of consecutive occurrences of the preceding character
1
Represents the start of a match from the 1th character of the source string
1
1th occurrence of characters that match the result
I
Indicates case sensitivity
1.3 Other matching styles
Find web address information
SELECT regexp_substr (' Go to http://www.oracle.com/products and click on database ', ' http://' ([: alnum:]]+\.?) {3,4}/? ') Result
From dual
Output:http://www.oracle.com
which
http://
Represents a matching string "http://"
([: alnum:]]+\.?) Matches 1 or more letters or numeric characters, followed by 0 or 1 times comma characters
{3,4}
Matches the preceding characters at least 3 times, up to 4 times
/?
Indicates that a backslash character is matched 0 or 1 times
Extracts the third value in a CSV string
SELECT regexp_substr (' 1101,yokohama,japan,1.5.105 ', ' [^,]+ ', 1, 3) as output
from dual;
Output:japan
which
[^,]+
Matches 1 or more characters that are not commas
1
Indicates that the start of the first character of the source string finds a match
3
The string that represents the 3rd time to match
Note: This is usually used to implement the string of the biography line
--The biography line of the string
SELECT regexp_substr (' 1101,yokohama,japan,1.5.105 ', ' [^,]+ ', 1, level) as output
From dual
CONNECT by Level <= length (' 1101,yokohama,japan,1.5.105 ')-
Length (REPLACE (' 1101,yokohama,japan,1.5.105 ', ', ')) + 1;
Output:1101
Yokohama
Japan
1.5.105
This loops through the level to intercept the matched string.
In the following example, find out if the source string contains kid, kids, or kidding these three strings
SELECT case
When Regexp_like (' Why does a kid enjoy kidding and kids only? ')
' Kid (s|ding) *,
' I ') THEN
' Match Found '
ELSE
' No Match Found '
End as output
from dual;
Output:match Found
which
Kid
Represents a String kid
(s|ding) *
Matches 0 or more times the character "s" or "ding"
I
Indicates case insensitive
2.
Regexp_instr
The REGEXP_INSTR function uses regular expressions to return the start and end points of the search pattern. The syntax for REGEXP_INSTR is shown below. REGEXP_INSTR Returns an integer that indicates the start or end of the search pattern and returns 0 if no matching value is found.
Grammar:
--2.regexp_instr is the same as the InStr function, returns the string position
Regexp_instr (srcstr, pattern [, position [, occurrence [, return_option [, Match_option]]]
Like Regexp_substr, it also has variable pattern, position (starting position), occurrence and match_parameter; Here is a brief introduction to the role of the new parameter return_option, It allows the user to tell Oracle what to return when the pattern appears.
As shown in the following example:
--If return_option is 0, Oracle returns the position where the first character appears. This is the default value and is the same as InStr
SELECT regexp_instr (' Abc1def ',
' [[:d igit:]] ' output
from dual;
Output:4
-If Return_option is 1, Oracle returns the position of the next character following the occurrence of the searched character.
-for example, the following query returns the position after the first digit found in the string:
SELECT regexp_instr (' Abc1def ',
' [[:d igit:]] ', 1,1,1 output
from dual;
Output:5
Oracle Regular Expressions (regularexpression)
Metacharacters |
Character meaning |
Give an example to explain |
^ |
Matches the start position of the string, which is used in [], at which point it indicates that the character set is not accepted. |
^a: Matches a string starting with a [^a]: matches a string that does not contain a |
- |
When used in the A-M expression range; When used in the first character Hyphen string, such as [-ABC] |
|
$ |
Match character End position |
' a$ ': matches a string ending with a |
. |
Matches any single character except the newline character N. |
|
? |
Match the preceding subexpression 0 times or once |
TR (y (ing)?) : can represent try or trying |
* |
Match the preceding subexpression 0 or more times |
|
+ |
Matches the preceding subexpression one or more times |
|
() |
Marks the start and end position of a subexpression |
A (b) * can match Ab,abb,abbb,youabb (c|d) match C or D |
[] |
Mark a Bracket expression |
[CD] Matching select C or D is equivalent to (c|d). The match is given, [^CD] is a single character that matches the c,d. [A-z] denotes all lowercase letters |
{M,n} |
m= < occurrences <=n, ' {m} ' indicates that M times, ' {m,} ' indicates that at least m times appear. |
|
| |
The expression or relationship. Indicates a choice between two items |
|
Character clusters |
Character meaning |
|
[[: Alpha:]] |
Any letter. |
|
[[:d Igit:]] |
[[:d Igit:]] any number. |
|
[[: Alnum:]] |
Any letter and numbers |
|
[[: Space:]] |
Any white character. |
|
[[: Upper:]] |
Any uppercase letters. |
|
[[: Lower:]] |
Any lowercase letter. |
|
[[:p UNCT:]] |
Any punctuation. |
|
[[: Xdigit:]] |
Any number in the 16 binary equivalent to [0-9a-fa-f]. |
|
Oracle supports built-in functions for regular expressions
Name |
Grammar |
Note |
Regexp_like |
Regexp_like (Source_string, Pattern [, Match_parameter] ) |
Source_string: SOURCE string Pattern: Regular expressions Match_parameter: Match pattern (I: case-insensitive; c: case-sensitive; N: Allows the use of an operator that can match any string; M: Takes X as a string that contains multiple lines. |
Regexp_replace |
Regexp_replace (Source_string, Pattern [, Replace_string] [, Position] [, Occurtence] [, Match_parameter] ) |
Replace_string: String used for substitution Position: Start the search starting position Occurtence Specifies the substitution of the nth occurrence of the string Other ibid. |
Regexp_substr |
Regexp_substr (Source_string, pattern [, Position [, Occurrence [, Match_parameter]]] ) |
Position: Specifies the exact position in the string, which defaults to 1 Occurrence: Specifies which string should match, relative to other strings, during source string matching. For example Select Regexp_substr (' The ZIP code 80831 is for Falcon, CO ', ' [[: Alpha:]]{3,} ', 1, 3] from dual; The result is code rather than the or zip. |
Regexp_instr |
Regexp_instr (Source_string, Pattern [, Start_position [, Occurrence [, Return_option [, Match_parameter]]]] ) |
Start_position: Start Search Location Occurrence: The nth occurrence pattern, the default is 1 Return_option: Starting position of 0:pattern 1:pattern Next character start position Default is 0 |
Regexp_count |
Regexp_count (Source_string, Pattern [[, Start_position] [, Match_parameter]]) |
11g a new function that represents the number of occurrences of pattern in the original string Start_position: Where to start the search |
Oracle supports the example of regular expression functions.
regexp_substr: Used to extract part of a string.
Regexp_substr (' filed, second filed, Thirdfiled ', ', [^,]*, ']
Pattern is ', [^,]*, ' means, start comma, middle empty one, consecutive 0 or more non-comma characters, last or comma, so, return is ', second filed, '.
-------------------------------------------------------------------------------
Regexp_instr: The starting position used for pattern.
If you have a set of addresses, return to the location of the ZIP code
Regexp_instr (' Jone smith,10045 Berry lane,sanjoseph,ca
91234-1234 ', ' [[:d igit:]]{5} ([-[[:d igit:]]{4}])
Obviously the result position is 91234-1234 corresponding to the starting position.
-------------------------------------------------------------------------------
regexp_replace: To replace pattern with a string that needs to be replaced, the function is more flexible than the traditional replace function. For example:
' Jone Smith ' has three spaces, and ' Jone Smith ' has 2 spaces, to change the middle space to a space, use replace to use 2 times, but with regular expressions, we can design the pattern as
' () {2,} ' so
Regexp_replace (MC, ' () {2,} ', ')
-------------------------------------------------------------------------------
Regexp_like is the enhanced version of like, which can contain _ and% wildcard characters, and is used in where conditions.
The regular expression attribute---the back reference
The content that matches the expression is cached in the buffer and is numbered from the left-right to/digit (digit 1-9). And the subexpression is expressed in parentheses.
1. Application in Regexp_replace:
The string ' AA bb cc ' is written as ' cc BB AA ' can be designed as follows:
Regexp_replace (' AA bbcc ', ' () (. *) (. *) ', ' \3 \2 \1 ')
(. *) represents any combination of strings, three such combinations are separated by spaces, match the source string, and \1,\2,\3 stores the values of three-word expressions, so that we can use this to achieve the goal.
2. Application in REGEXP_SUBSTR:
Find repeated alphanumeric values separated by spaces
Regexp_substr (' The finaltest is the Implention ', ' ([[: alnum:]]+) ([[: space:]]+] \1 ')
Returns the result to be is. ([[: alnum:]]+) ([[: space:]]+] Returns a lot of strings, but adding a \1 means returning two identical strings consecutively, so that the duplicate string is found.
Supplements
The regular expression of Oracle
This article abstracts:
The 10g regular expression improves SQL flexibility.
Unrelated blank detection, or the decomposition of multiple regular components of the string, and so on.
10g four new functions that support regular expressions are: Regexp_like, Regexp_instr, Regexp_substr, and Regexp_replace.
Regular expressions replace the old percent semicolon (%) and wildcard character (_) characters.
Matches the start position of the input string, used in a bracket expression, which indicates that the character set is not accepted.
Match the preceding subexpression 0 times or once.
Currently, regular expressions have been widely used in many software applications, including *nix (Linux, UNIX, etc.), HP and other operating systems, Php,c#,java and other development environments.
Oracle 10g Regular expressions improve SQL flexibility. Effectively solves the data validity, the repetition word recognition, the independent blank detection, or decomposes several regular composition
strings, and so on.
The four new functions of Oracle 10g support regular expressions are: Regexp_like, Regexp_instr, Regexp_substr, and Regexp_replace.
They use POSIX regular expressions instead of the old percent semicolon (%) and wildcard character (_) characters.
Special characters:
' ^ ' matches the start position of the input string, used in a bracket expression, which indicates that the character set is not accepted.
' $ ' matches the end position of the input string. If the Multiline property of the RegExp object is set, then $ also matches ' n ' or ' R '.
'. ' matches any single character except the newline character N.
'? ' matches the preceding subexpression 0 times or once.
' * ' matches the preceding subexpression 0 or more times.
' + ' matches the preceding subexpression one or more times.
' () ' marks the beginning and end position of a subexpression.
' [] ' marks a bracket expression.
' {m,n} ' an exact number of occurrences,m=< occurrences <=n, ' {m} ' indicates that M times, ' {m,} ' indicates that at least m times appear.
' | ' indicates a choice between the two items. Example ' ^ ([a-z]+|[ 0-9]+) $ ' represents a string of all lowercase letters or numbers combined.
num matches num, where num is a positive integer. A reference to the match that was obtained.
A useful feature of regular expressions is that you can save a subexpression for later use, known as backreferencing. Allows for complex substitution capabilities
such as adjusting a pattern to a new position or indicating the position of the substituted character or word. The matched subexpression is stored in a temporary buffer.
The buffer is numbered from left to right, accessed by digital notation. The following example lists the name AA BB cc into
CC, BB, aa.
Select regexp_replace (' AA bb cc ', ' (. *) (. *) (. *) (. *) (. *) ', ' 3, 2, 1 ') from dual;
Regexp_replace (' Ellenhildismit
CC, BB, AA
' Escape character.
Character clusters:
[[: Alpha:]] any letter.
[[:d Igit:]] any number.
[[: Alnum:]] any letter or number.
[[: Space:]] any white character.
[[: Upper:]] any uppercase letter.
[[: Lower:]] any lowercase letter.
[[:p UNCT:]] any punctuation.
[[: Xdigit:]] Any number in 16, equivalent to [0-9a-fa-f].
Operation Precedence for various operators
Escape character
(), (?:), (? =), [] parentheses and square brackets
*, +,?, {n}, {n,}, {n,m} qualifier
^, $, anymetacharacter position and order
| "or" action
--Test data
CREATE TABLE Test (MC VARCHAR2 (60));
INSERT into test values (' 112233445566778899 ');
INSERT into test values (' 22113344 5566778899 ');
INSERT into test values (' 33112244 5566778899 ');
INSERT into test values (' 44112233 5566 778899 ');
INSERT into test values (' 5511 2233 4466778899 ');
INSERT into test values (' 661122334455778899 ');
INSERT into test values (' 771122334455668899 ');
INSERT into test values (' 881122334455667799 ');
INSERT into test values (' 991122334455667788 ');
INSERT into test values (' Aabbccddee ');
INSERT into test values (' Bbaaaccddee ');
INSERT into test values (' Ccabbddee ');
INSERT into test values (' Ddaabbccee ');
INSERT into test values (' EEAABBCCDD ');
INSERT into test values (' ab123 ');
INSERT into test values (' 123xy ');
INSERT into test values (' 007ab ');
INSERT into test values (' Abcxy ');
INSERT into test values (' The final test being ' how to find duplicate words. ');
Commit
First, Regexp_like
SELECT * FROM Test where regexp_like (MC, ' ^a{1,3} ');
SELECT * FROM Test where regexp_like (MC, ' a{1,3} ');
SELECT * FROM Test where regexp_like (MC, ' ^a.*e$ ');
SELECT * FROM Test where regexp_like (MC, ' ^[[:lower:]]|[ [:d Igit:]];
SELECT * FROM Test where regexp_like (MC, ' ^[[:lower:] ');
Select MC from Test Where regexp_like (MC, ' [^[:d igit:]] ');
Select MC from Test Where regexp_like (MC, ' ^[^[:d igit:]] ');
Second, Regexp_instr
Select Regexp_instr (MC, ' [:d igit:]]$ ') from test;
Select Regexp_instr (MC, ' [:d igit:]]+$ ') from test;
Select regexp_instr (' The price is $. ', ' $[[:d igit:]]+ ') from DUAL;
Select regexp_instr (' Onetwothree ', ' [^[[:lower:]]] ') from DUAL;
Select regexp_instr (',,,,, ', ' [^,]* ') from DUAL;
Select regexp_instr (',,,,, ', ' [^,] ') from DUAL;
Third, Regexp_substr
SELECT Regexp_substr (MC, ' [a-z]+ ') from test;
SELECT Regexp_substr (MC, ' [0-9]+ ') from test;
SELECT regexp_substr (' aababcde ', ' ^a.*b ') from DUAL;
Four, Regexp_replace
Select regexp_replace (' Joe Smith ', ' () {2,} ', ', ') as rx_replace from dual;
Select regexp_replace (' AA bb cc ', ' (. *) (. *) (. *) (. *) (. *) ', ' 3, 2, 1 ') from dual;
Sql> select * from test;
ID MC
-------------------- ------------------------------------------------------------
A AAAAA
A AAAAA
b bbbbb
Sql> SELECT * FROM Test where regexp_like (ID, ' B ', ' I '); --No data case sensitive
ID MC
-------------------- ------------------------------------------------------------
b bbbbb
#End