Oracle Regular expressions use the introduction of regular expressions with powerful, convenient and efficient text processing capabilities. Ability to add, delete, analyze, overlay, insert, and reshape various types of text and data. Oracle supports regular expressions starting from 10g below are some examples of how regular expressions are used to handle some of the common problems in work. 1. The Regexp_substr regexp_substr function uses a regular expression to specify the starting and ending points of the returned string, returning the same string as the VARCHAR2 or CLOB data in the source_string character set. Syntax:--1.REGEXP_SUBSTR is the same as the SUBSTR function, returning the truncated substring regexp_substr (srcstr, pattern [, position [, occurrence [, match_option]]) Note: srcstr source string
Pattern Regular Expression Style
Position start Match character position
Occurrence number of matches occurred
Match_option Matching options (case sensitive)
1.1 Intercept substrings from a string SELECT regexp_substr (' 1psn/231_3253/abc ', ' [[: alnum:]]+ ') from dual; OUTPUT:1PSN [[: alnum:]]+ indicates a match of 1 or more alphanumeric characters. SELECT regexp_substr (' 1psn/231_3253/abc ', ' [[: alnum:]]+ ', 1, 2) from dual; output:231 compared to the above example, two more parameter 1 means that the first character from the source string to find a match 2 means the 2nd match to the string (the default is "1", as in the example above) Select Regexp_substr (' @@/231_3253/abc ', ' @*[[:alnum:]]+ ') from dual; output:231 @* indicates a match of 0 or more @ [: alnum:]]+ to match 1 or more letters or numeric characters Note: Note the difference between "+" and "*" Select Regexp_substr (' [email Protected]/231_3253/abc ', ' @+[[:alnum:]]* ') from dual; Output: @ @+ indicates a match of 1 or more @ [: alnum:]]* to match 0 or more alphabetic or numeric characters select Regexp_substr (' [email protected]/231_3253/abc ', ' @ +[[:alnum:]]+ ') from dual; Output:null @+ represents a match of 1 or more @ [[: alnum:]]+ indicates a match of 1 or more alphabetic or numeric characters Select Regexp_substr (' @1psn/231_3253/abc125 ', ' [[:d igit:]]+ $ ') from dual; output:125 [[:d igit:]]+$ represents a character that matches 1 or more digits at the end of select Regexp_substr (' @1psn/231_3253/abc ', ' [^[:d igit:]]+$ ') from dual; Output:/abc [^[:d igit:]]+$ that matches 1 or more characters that are not the end of a number select Regexp_substr (' [email protected] ',' [^@]+ ') from dual; Output:tom_kyte [^@]+] matches 1 or more characters that are not "@" Select Regexp_substr (' 1psn/231_3253/abc ', ' [[: alnum:]]* ', ') from dual; Output:null [[: alnum:]]* means matching 0 or more letters or numeric characters note: because it matches 0 or more, the 2nd time here matches "/" (0 matches) instead of "231", so the result is "Null" 1.2 Matching recurring occurrences find a continuous 2 lowercase letters SELECT regexp_substr (' Republicc of AFRICAA ', ' ([A-z]) \1 ', 1, 1, ' I ') from dual; OUTPUT:CC ([A-z]) represents a lowercase letter A-Z \1 indicates that the number of consecutive times that match the preceding character 1 indicates that the 1th character from the source string starts to match 1 1th occurrences of the match result of the character I represents case-sensitive 1.3 Other matching Styles find Web page address information SELECT regexp_substr (' Go to http://www.oracle.com/products and click on Database ', ' http://([Alnum:]]+\.?] {3,4}/? ') RESULT from dual; output:http://www.oracle.com Where:/HTTP indicates that the matching string "http://" ([[Alnum:]]+\.?) matches 1 or more letters or numeric characters, followed by a 0 or 1-time comma character {3,4} Indicates that the preceding characters are matched at least 3 times, up to 4 times/? Represents a backslash character that matches 0 or 1 times
Extract the third value in a CSV string SELECT regexp_substr (' 1101,yokohama,japan,1.5.105 ', ' [^,]+ ', 1, 3) as output from dual; Output:japan where: [^,]+ means a match of 1 or more characters that are not a comma 1 means that the first character of the source string is searched to match 3 for the 3rd match to the string note: This is typically used to implement a string of biographies lines-the biographies line of the string SELECT regexp _substr (' 1101,yokohama,japan,1.5.105 ', ' [^,]+ ', 1, level] as output from dual CONNECT by level <= length ' 1101,yokohama , japan,1.5.105 ')-Length (REPLACE (' 1101,yokohama,japan,1.5.105 ', ', ')) + 1; Output:1101
Yokohama Japan 1.5.105 This loop intercepts the matched string by level.
In the following example, find out if the source string contains the three strings of kid, kids, or kidding, SELECT case when Regexp_like (' Why does a kid enjoy kidding with kids only? ', ' Kid (s|ding) * ', ' I ') then ' match Found ' ELSE ' No match Found ' END as Utput
from dual; Output:match Found Where: Kid means string Kid (s|ding) * means match 0 or more characters "s" or "ding" I means case insensitive 2. The REGEXP_INSTR regexp_instr function uses regular expressions to return the start and end points of the search pattern. The syntax for REGEXP_INSTR is as follows. REGEXP_INSTR Returns an integer that indicates the starting or ending position of the search pattern, and returns 0 if no matching value is found. Syntax:--2.REGEXP_INSTR is the same as the INSTR function, returning the string position regexp_instr (srcstr, pattern [, position [, occurrence [, return_option [, M Atch_option]]) As with REGEXP_SUBSTR, it also has variable pattern, position (starting position), occurrence, and match_parameter; Here is the main introduction to the new parameters Return_ option, which allows the user to tell Oracle what content to return when the pattern appears. As shown in the following example:--if return_option is 0, Oracle returns where the first character appears. This is the default value, with the same function as InStr SELECT regexp_instr (' abc1def ', ' [[:d igit:]] ') output from dual; Output:4--If return_option is 1, Oracle returns the position of the next character after the searched character appears. -for example, the following query returns the position after the first number found in the string: SELECT regexp_instr (' abc1def ', ' [[:d igit:]] ', 1,1,1) output from dual; output:5 oracle Regular Expression (regularexpression)
Metacharacters |
Character meaning |
Examples Show |
^ |
The starting position of the matching string, which is used in [], at which point it indicates that the character set is not accepted. |
^a: Matches a string starting with a [^a]: Match A string that does not contain a |
- |
When used in a-m to denote range; When used on the first character, represents a Even strings, such as [-ABC] |
|
$ |
Match character End position |
' a$ ': matches a string ending with a |
. |
Matches any single character other than the newline character N. |
|
? |
Match the preceding subexpression 0 or one time |
TR (y (ing)?) : Can indicate try or trying |
* |
Match previous sub-expression 0 or more times |
|
+ |
Match the preceding subexpression one or more times |
|
() |
Mark the start and end positions of a sub-expression |
A (b) * can match Ab,abb,abbb,youabb (c|d) match C or D |
[] |
Mark an expression in brackets |
[CD] Match select C or D is equivalent to (c|d). Matches the single-character, [^CD] is a single character outside the match c,d. [A-z] denotes all lowercase letters |
{M,n} |
m= < occurrences <=n, ' {m} ' indicates the presence of M times, ' {m,} ' indicating at least m times. |
|
| |
An expression or a relationship. Indicate a choice between two items |
|
Character families |
Character meaning |
|
[[: Alpha:]] |
Any letter. |
|
[[:d Igit:]] |
[[:d Igit:]] any number. |
|
[[: Alnum:]] |
Any alphanumeric |
|
[[: Space:]] |
Any whitespace characters. |
|
[[: Upper:]] |
Any uppercase letters. |
|
[[: Lower:]] |
Any lowercase letters. |
|
[[:p UNCT:]] |
Any punctuation. |
|
[[: Xdigit:]] |
Any 16-digit number, equivalent to [0-9a-fa-f]. |
|
Oracle built-in functions that support regular expressions
Name |
Grammar |
Note |
Regexp_like |
Regexp_like (Source_string, Pattern [, Match_parameter] ) |
Source_string: SOURCE string Pattern: Regular expressions Match_parameter: Match mode (i: Case-insensitive, C: case-sensitive, N: Allows the use of an operator that can match any string; M: Takes X as a string containing multiple lines. |
Regexp_replace |
Regexp_replace (Source_string, Pattern [, Replace_string] [, Position] [, Occurtence] [, Match_parameter] ) |
Replace_string: The string to replace Position: Starting position for Start search Occurtence Specifies the substitution of the nth occurrence string Other ibid. |
Regexp_substr |
Regexp_substr (Source_string, pattern [, Position [, Occurrence [, Match_parameter]] ) |
Position: Specifies the exact position in the string, which defaults to 1 Occurrence: Specifies which string should match relative to other strings during source string matching. For example Select Regexp_substr (' The ZIP code 80831 is for Falcon, CO ', ' [[: Alpha:]]{3,} ', 1, 3) from dual; The result is code rather than the or zip. |
Regexp_instr |
Regexp_instr (Source_string, Pattern [, Start_position [, Occurrence [, Return_option [, Match_parameter]]] ) |
Start_position: Start Search Location Occurrence: Nth occurrence of pattern, default = 1 Return_option: The starting position of the 0:pattern 1:pattern Next character start position Default is 0 |
Regexp_count |
Regexp_count (Source_string, Pattern [[, Start_position] [, Match_parameter]]) |
11g new function representing the number of occurrences of pattern in the original string Start_position: Where to start the search |
Oracle supports regular expression functions as an example.
regexp_substr: Used to extract a part of a string.
Regexp_substr (' first filed, second filed, Thirdfiled ', ', [^,]*, ')
Pattern is ', [^,]*, ' is represented by, start with a comma, a space in the middle, 0 or more non-comma characters in a row, and finally a comma, so that the return is ', second filed, '.
-------------------------------------------------------------------------------
Regexp_instr: The starting position for the pattern.
If there is a set of addresses, return to the location of the zip
Regexp_instr (' Jone smith,10045 Berry lane,sanjoseph,ca
91234-1234 ', ' [[:d igit:]]{5} ([-[[:d igit:]]{4}])? $ ')
Obviously the result position is the starting position of the 91234-1234 corresponding.
-------------------------------------------------------------------------------
regexp_replace: Used to replace the pattern with a string that needs to be replaced, the function is more flexible than the traditional replace function. For example:
' Jone Smith ' has three spaces, and ' Jone Smith ' has 2 spaces, to change the middle space to a space, use replace to use 2 times, instead of regular expressions, we can design the pattern as
' () {2,} ' this
Regexp_replace (MC, ' () {2,} ', ')
-------------------------------------------------------------------------------
Regexp_like is an enhanced version of like that can contain _ and% wildcard characters, which are used in the Where condition.
Regular expression attributes---back reference
The/digit (digit is 1-9) is numbered from left to right by caching the contents of the self-expression match into the buffer. The subexpression is expressed in parentheses.
1. Application in Regexp_replace:
The string ' AA bb cc ' is written in ' CC BB AA ' and can be designed as follows:
Regexp_replace (' AA bbcc ', ' (. *) (. *) (. *) ', ' \3 \2 \1 ')
(. *) represents any combination of strings, three such combinations are separated by a space, match the source string, \1,\2,\3 store the value of three words, respectively, so that we can use this to achieve the purpose.
2. Application in REGEXP_SUBSTR:
Find repeated alphanumeric values separated by spaces
Regexp_substr (' The finaltest is the Implention ', ' ([[: alnum:]]+) ([[: space:]]+] \1 ')
The returned result is an IS. ([[: alnum:]]+) ([[: space:]]+) returns many strings, but adding a \1 means returning two identical strings in a row, so the duplicate string is found.
Add
Regular Expressions for Oracle
Abstract: 10g regular expressions improve SQL flexibility. Unrelated whitespace detection, or decomposition of multiple regular strings, and so on.
10g the four new functions that support regular expressions are: Regexp_like, Regexp_instr, Regexp_substr, and Regexp_replace.
The regular expression replaces the old percent percent (%) and the wildcard character (_) characters. Matches the starting position of the input string, which is used in a square bracket expression, which indicates that the character set is not accepted. Matches the preceding subexpression 0 or one time. At present, the regular expression has been widely used in many software, including *nix (Linux, UNIX, etc.), HP and other operating systems, Php,c#,java and other development environments.
Oracle 10g Regular expressions improve SQL flexibility. Effectively solves the problem of data validity, repetition of word recognition, irrelevant blank detection, or decomposition of multiple regular strings.
The four new functions of the Oracle 10g support regular expressions are: Regexp_like, Regexp_instr, Regexp_substr, and Regexp_replace. They use POSIX regular expressions instead of the old percent (%) and wildcard character (_) characters.
Special character: ' ^ ' matches the starting position of the input string and is used in a square bracket expression, which indicates that the character set is not accepted. ' $ ' matches the end position of the input string. If the Multiline property of the RegExp object is set, then $ also matches ' n ' or ' R '. '. ' matches any single character except for line break N. '? ' matches the preceding subexpression 0 or one time. ' * ' matches the preceding subexpression 0 or more times. ' + ' matches the preceding sub-expression one or more times. ' () ' marks the start and end position of a subexpression. ' [] ' marks a bracket expression. ' {m,n} ' an exact occurrence of the frequency range,m=< occurrences <=n, ' {m} ' indicates the presence of M times, ' {m,} ' indicates at least m times. ' | ' indicates a choice between the two items. Example ' ^ ([a-z]+|[ 0-9]+) $ ' represents all lowercase letters or numbers combined into a string. num matches num, where num is a positive integer. A reference to the obtained match. A useful feature of regular expressions is the ability to save sub-expressions for later use, known as backreferencing. Allows for complex substitution capabilities such as adjusting a pattern to a new position or indicating the position of a substituted character or word. The matched subexpression is stored in a temporary buffer, and the buffer is numbered from left to right and accessed by a number symbol. The following example lists the names of the AA bb cc into CC, BB, aa. Select regexp_replace (' AA bb cc ', ' (. *) (. *) (. *) ', ' 3, 2, 1 ') from dual; Regexp_replace (' Ellenhildismit cc, BB, AA ' escape character.
Character families: [[: Alpha:]] any letter. [[:d Igit:]] any number. [[: Alnum:]] Any letters and numbers. [[: Space:]] any whitespace character. [[: Upper:]] any uppercase letters. [[: Lower:]] any lowercase letter. [[:p UNCT:]] any punctuation. [[: Xdigit:]] Any 16 binary number, equivalent to [0-9a-fa-f].
Operation Priority escape character for various operators (), (?:), (? =), [] parentheses and brackets *, +,?, {n}, {n,}, {n,m} qualifier ^, $, anymetacharacter position and order | "or" action
--Test Data CREATE TABLE test (MC VARCHAR2 (60));
INSERT into test values (' 112233445566778899 '); INSERT into test values (' 22113344 5566778899 '); INSERT into test values (' 33112244 5566778899 '); INSERT into test values (' 44112233 5566 778899 '); INSERT into test values (' 5511 2233 4466778899 '); INSERT into test values (' 661122334455778899 '); INSERT into test values (' 771122334455668899 '); INSERT into test values (' 881122334455667799 '); INSERT into test values (' 991122334455667788 '); INSERT into test values (' Aabbccddee '); INSERT into test values (' Bbaaaccddee '); INSERT into test values (' Ccabbddee '); INSERT into test values (' Ddaabbccee '); INSERT into test values (' EEAABBCCDD '); INSERT into test values (' ab123 '); INSERT into test values (' 123xy '); INSERT into test values (' 007ab '); INSERT into test values (' Abcxy '); INSERT into test values (' The final test was is-is-what to find duplicate words. ');
Commit
First, Regexp_like
SELECT * FROM Test where regexp_like (MC, ' ^a{1,3} '); SELECT * FROM Test where regexp_like (MC, ' a{1,3} '); SELECT * FROM Test where regexp_like (MC, ' ^a.*e$ '); SELECT * FROM Test where regexp_like (MC, ' ^[[:lower:]]|[ [:d Igit:]]; SELECT * FROM Test where regexp_like (MC, ' ^[[:lower:]] '); Select MC from Test Where regexp_like (MC, ' [^[:d igit:] '); Select MC from Test Where regexp_like (MC, ' ^[^[:d igit:] ');
Second, Regexp_instr
Select Regexp_instr (MC, ' [[:d igit:]]$ ') from test; Select Regexp_instr (MC, ' [[:d igit:]]+$ ') from test; Select regexp_instr (' The Price is ", ' $[[:d igit:]]+ ') from DUAL; Select regexp_instr (' Onetwothree ', ' [^[[:lower:]] ') from DUAL; Select regexp_instr (',,,,, ', ' [^,]* ') from DUAL; Select regexp_instr (',,,,, ', ' [^,] ') from DUAL;
Third, Regexp_substr
SELECT Regexp_substr (MC, ' [a-z]+ ') from test; SELECT Regexp_substr (MC, ' [0-9]+ ') from test; SELECT regexp_substr (' aababcde ', ' ^a.*b ') from DUAL;
Iv. Regexp_replace
Select regexp_replace (' Joe Smith ', ' () {2,} ', ', ') as rx_replace from dual; Select regexp_replace (' AA bb cc ', ' (. *) (. *) (. *) ', ' 3, 2, 1 ') from dual;
Sql> select * from test;
ID MC--------------------------------------------------------------------------------a AAAAA a AAAAA
b bbbbb
Sql> SELECT * FROM Test where regexp_like (ID, ' B ', ' I '); --No data case sensitivity
ID MC--------------------------------------------------------------------------------
b bbbbb
------------------------------------
Methods of matching Oracle regular expressions to Chinese
Method 1:select Regexp_replace (' ABC Autumn def ', '[' | | Chr (128) | | -' | | Chr (255) | | '] ', '-') from dual;Method 2 (replace Chinese characters and "\"): SELECT regexp_replace (ASCIISTR (' ABC Autumn-' "" ",.。[email protected]#$%^&^* () _+= def, \ '), ' \\[[:alnum:]]{4} ', ' X ') from DUAL; Test Example: Select Regexp_replace (' ABC Autumn-' "" ",.。[email protected]#$%^&^* () _+= def, \ ', ' [' | | Chr (128) | | -' | | Chr (255) | | '] ', ' X ') from dual UNION ALL SELECT Regexp_replace (ASCIISTR (' ABC Autumn-' "",.。[email protected]#$%^&^* () _+= def, \ '), ' \\[[:alnum:]]{4} ', ' X ') from DUAL;
The Regexp_substr,regexp_instr,regexp_replace,regexp_like of the Oracle regular expression