Introduction to Oracle Regular Expressions

Source: Internet
Author: User

The following examples illustrate how to use regular expressions to solve common problems at work.
1. REGEXP_SUBSTR
The REGEXP_SUBSTR function searches a string for a regular expression pattern and returns the matching substring, as VARCHAR2 or CLOB data in the same character set as source_string.
Syntax:
-- REGEXP_SUBSTR works like the SUBSTR function: it returns the extracted substring.
REGEXP_SUBSTR (srcstr, pattern [, position [, occurrence [, match_option]]])
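Parameters:
srcstr
Source string

pattern
Regular expression pattern

position
Character position at which matching starts

occurrence
Which occurrence of the match to return

match_option
Matching options (for example, 'i' for case-insensitive or 'c' for case-sensitive matching)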

1.1 Extract a substring from a string
SELECT regexp_substr('1psn/231_3253/abc', '[[:alnum:]]+') FROM dual;
Output: 1psn
[[:alnum:]]+ matches one or more letters or digits.
SELECT regexp_substr('1psn/231_3253/abc', '[[:alnum:]]+', 1, 2) FROM dual;
Output: 231
Compared with the preceding example, two more parameters are provided.
1
Searches for matches starting from the first character of the source string.
2
Returns the 2nd match (the default value is 1, as in the preceding example).
SELECT regexp_substr('@/231_3253/abc', '@*[[:alnum:]]+') FROM dual;
Output: 231
@* matches zero or more @ characters.
[[:alnum:]]+ matches one or more letters or digits.
Note: Pay attention to the difference between "+" and "*".
SELECT regexp_substr('1@/231_3253/abc', '@+[[:alnum:]]*') FROM dual;
Output: @
@+ matches one or more @ characters.
[[:alnum:]]* matches zero or more letters or digits.
SELECT regexp_substr('1@/231_3253/abc', '@+[[:alnum:]]+') FROM dual;
Output: NULL
@+ matches one or more @ characters.
[[:alnum:]]+ matches one or more letters or digits. Here the @ is followed by "/", so nothing matches.
SELECT regexp_substr('@1PSN/231_3253/ABc125', '[[:digit:]]+$') FROM dual;
Output: 125
[[:digit:]]+$ matches one or more digits at the end of the string.
SELECT regexp_substr('@1PSN/231_3253/abc', '[^[:digit:]]+$') FROM dual;
Output: /abc
[^[:digit:]]+$ matches one or more non-digit characters at the end of the string.
SELECT regexp_substr('Tom_Kyte@oracle.com', '[^@]+') FROM dual;
Output: Tom_Kyte
[^@]+ matches one or more characters that are not "@".
SELECT regexp_substr('1psn/231_3253/abc', '[[:alnum:]]*', 1, 2)
FROM dual;
Output: NULL
[[:alnum:]]* matches zero or more letters or digits.
Note: Because * also accepts zero characters, the 2nd match here is the empty string found at "/" rather than "231", so the result is NULL.
1.2 Matching repeated characters
Search for two identical consecutive letters
SELECT regexp_substr('republicc Of Africaa', '([a-z])\1', 1, 1, 'i')
FROM dual;
Output: cc
([a-z])
A lowercase letter a-z, captured as subexpression 1
\1
Back reference: matches the same character that subexpression 1 just matched
1
Searches for matches starting from the first character of the source string.
1
Returns the 1st match
i
Case-insensitive matching
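To illustrate how the occurrence argument interacts with a back reference, the following sketch asks for the 2nd doubled letter in the same string. It assumes only the built-in REGEXP_SUBSTR function and the DUAL table used throughout this article; the column alias is illustrative.

```sql
-- Occurrence 2 skips the first doubled letter ("cc") and returns the next one.
-- The 'i' option makes the [a-z] range match uppercase letters as well.
SELECT regexp_substr('republicc Of Africaa', '([a-z])\1', 1, 2, 'i') AS second_pair
FROM dual;
-- Expected result: aa
```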
1.3 Other matching patterns
Search for a web address
SELECT regexp_substr('go to http://www.oracle.com/products and click on database', 'http://([[:alnum:]]+\.?){3,4}/?') RESULT
FROM dual;
Output: http://www.oracle.com/
Where:
http://
Matches the literal string "http://"
([[:alnum:]]+\.?)
Matches one or more letters or digits, followed by zero or one period
{3,4}
Matches the preceding subexpression at least three times and at most four times
/?
Matches a slash character zero or one time


Extract the third value from a CSV string
SELECT regexp_substr('1101,Yokohama,Japan,1.5.105', '[^,]+', 1, 3) AS output
FROM dual;
Output: Japan
Where:
[^,]+
Matches one or more characters that are not a comma
1
Searches for matches starting from the first character of the source string.
3
Returns the 3rd match.
Note: This technique is commonly used to split a delimited string into rows.
-- Split a string into rows
SELECT regexp_substr('1101,Yokohama,Japan,1.5.105', '[^,]+', 1, LEVEL) AS output
FROM dual
CONNECT BY LEVEL <= length('1101,Yokohama,Japan,1.5.105') -
                    length(REPLACE('1101,Yokohama,Japan,1.5.105', ',')) + 1;
Output:
1101
Yokohama
Japan
1.5.105
Here, LEVEL is used to step through the matches one at a time.
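On Oracle 11g or later, the number of rows could also be derived with REGEXP_COUNT instead of the LENGTH/REPLACE arithmetic. This is a sketch of an alternative, not the approach used above; the column aliases are illustrative.

```sql
-- Count the runs of non-comma characters to decide how many rows to generate,
-- then return the LEVEL-th run on each generated row.
SELECT LEVEL AS item_no,
       regexp_substr('1101,Yokohama,Japan,1.5.105', '[^,]+', 1, LEVEL) AS item
FROM dual
CONNECT BY LEVEL <= regexp_count('1101,Yokohama,Japan,1.5.105', '[^,]+');
```

Note that '[^,]+' skips empty fields, so with this variant a string such as 'a,,b' produces two rows rather than three.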


In the following example, check whether the source string contains kid, kids, or kidding.
SELECT CASE
         WHEN regexp_like('Why does a kid enjoy kidding with kids only?',
                          'kid(s|ding)*',
                          'i') THEN
           'Match Found'
         ELSE
           'No Match Found'
       END AS output
FROM dual;
Output: Match Found
Where:
kid
Matches the literal string kid
(s|ding)*
Matches zero or more occurrences of "s" or "ding"
i
Case-insensitive matching
2. REGEXP_INSTR
The REGEXP_INSTR function searches a string for a regular expression pattern and returns an integer indicating the start or end position of the match. If no match is found, it returns 0.
Syntax:
-- REGEXP_INSTR works like the INSTR function: it returns a position in the string.
REGEXP_INSTR (srcstr, pattern [, position [, occurrence [, return_option [, match_option]]]])
Like REGEXP_SUBSTR, it takes the pattern, position (start position), occurrence, and match_option parameters. The new parameter here is return_option, which tells Oracle which position to return when the pattern is found.
The following is an example:
-- If return_option is 0, Oracle returns the position of the first character of the match. This is the default, and it behaves like INSTR.
SELECT regexp_instr('abc1def',
                    '[[:digit:]]') output
FROM dual;
Output: 4
-- If return_option is 1, Oracle returns the position of the character that follows the match.
-- For example, the following query returns the position after the first digit found in the string:
SELECT regexp_instr('abc1def',
                    '[[:digit:]]', 1, 1, 1) output
FROM dual;
Output: 5
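Because return_option 0 and 1 give the positions of the first character of the match and of the character after it, subtracting the two yields the length of the match. A minimal sketch, with an illustrative input string and column aliases:

```sql
-- return_option 0 = start of the match, 1 = first position after the match.
SELECT regexp_instr('abc123def', '[[:digit:]]+', 1, 1, 0) AS match_start,
       regexp_instr('abc123def', '[[:digit:]]+', 1, 1, 1) AS after_match,
       regexp_instr('abc123def', '[[:digit:]]+', 1, 1, 1)
         - regexp_instr('abc123def', '[[:digit:]]+', 1, 1, 0) AS match_length
FROM dual;
-- Expected result: 4, 7, 3
```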

Oracle regular expressions

Metacharacters (character, meaning, example):

'^' Matches the start of the string; inside [], it negates the character set. Example: '^a' matches a string starting with a; '[^a]' matches any character other than a.
'-' Inside [], indicates a range, as in a-m; as the first character of the set it is a literal hyphen, as in [-abc].
'$' Matches the end of the string. Example: 'a$' matches a string ending with a.
'.' Matches any single character except the newline character.
'?' Matches the preceding subexpression zero or one time. Example: 'tr(y(ing)?)' matches try or trying.
'*' Matches the preceding subexpression zero or more times.
'+' Matches the preceding subexpression one or more times.
'()' Marks the start and end of a subexpression. Example: 'a(b)*' matches a, ab, abb, abbb; '(c|d)' matches c or d.
'[]' Marks a bracket expression. Example: '[cd]' matches c or d and is equivalent to '(c|d)'; it matches a single character. '[^cd]' matches any single character other than c and d. '[a-z]' matches any lowercase letter.
'{m,n}' m <= number of occurrences <= n; '{m}' means exactly m occurrences and '{m,}' means at least m occurrences.
'|' Alternation: matches either of the two alternatives.

 

Character classes (class, meaning):

'[[:alpha:]]' Any letter.
'[[:digit:]]' Any digit.
'[[:alnum:]]' Any letter or digit.
'[[:space:]]' Any white space character.
'[[:upper:]]' Any uppercase letter.
'[[:lower:]]' Any lowercase letter.
'[[:punct:]]' Any punctuation character.
'[[:xdigit:]]' Any hexadecimal digit; equivalent to [0-9a-fA-F].
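The character classes can be combined with the metacharacters listed earlier. The following sketch shows three typical uses; the input strings and column aliases are made up for illustration.

```sql
-- [^[:digit:]] removes everything that is not a digit,
-- [[:alpha:]]+ returns the first run of letters,
-- [[:space:]]+ collapses each run of white space into one space.
SELECT regexp_replace('Tel: (555) 010-1234', '[^[:digit:]]', '') AS digits_only,
       regexp_substr('order_42 shipped', '[[:alpha:]]+')         AS first_word,
       regexp_replace('a  b   c', '[[:space:]]+', ' ')           AS single_spaced
FROM dual;
-- Expected result: 5550101234, order, a b c
```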

 

Built-in Oracle functions that support regular expressions (name, syntax, remarks):

REGEXP_LIKE
Syntax: REGEXP_LIKE (source_string, pattern [, match_parameter])
source_string: Source string
pattern: Regular expression
match_parameter: Matching mode ('i': case-insensitive; 'c': case-sensitive; 'n': allows the period (.) to match the newline character; 'm': treats the source string as multiple lines).

REGEXP_REPLACE
Syntax: REGEXP_REPLACE (source_string, pattern [, replace_string [, position [, occurrence [, match_parameter]]]])
replace_string: Replacement string
position: Position at which the search starts
occurrence: Which occurrence to replace; n replaces the nth occurrence, and the default, 0, replaces all occurrences
Others are the same as above.

REGEXP_SUBSTR
Syntax: REGEXP_SUBSTR (source_string, pattern [, position [, occurrence [, match_parameter]]])
position: Position in the string at which the search starts. The default value is 1.
occurrence: Which occurrence of the pattern to return. For example:
SELECT regexp_substr('The zip code 80831 is for falcon, co', '[[:alpha:]]{3,}', 1, 3)
FROM dual;
The result is code rather than The or zip.

REGEXP_INSTR
Syntax: REGEXP_INSTR (source_string, pattern [, start_position [, occurrence [, return_option [, match_parameter]]]])
start_position: Position at which the search starts
occurrence: The nth occurrence of pattern. The default value is 1.
return_option: 0 returns the position of the first character of the match; 1 returns the position of the character that follows the match. The default value is 0.

REGEXP_COUNT
Syntax: REGEXP_COUNT (source_string, pattern [, start_position [, match_parameter]])
New in 11g; returns the number of times pattern occurs in the source string.
start_position: Position at which the search starts
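REGEXP_COUNT is the only function in this list without an example, so here is a minimal sketch (Oracle 11g or later). It also shows the effect of the 'c' and 'i' match parameters; the second input string and the column aliases are illustrative.

```sql
-- Count commas, then count "kid" with case-sensitive and case-insensitive matching.
SELECT regexp_count('1101,Yokohama,Japan,1.5.105', ',')  AS commas,
       regexp_count('Kid kids KIDDING', 'kid', 1, 'c')   AS case_sensitive,
       regexp_count('Kid kids KIDDING', 'kid', 1, 'i')   AS case_insensitive
FROM dual;
-- Expected result: 3, 1, 3
```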

 

Regular expression functions supported by Oracle:

REGEXP_SUBSTR: extracts part of a string.

REGEXP_SUBSTR ('first field, second field, third field', ', [^,]*,')

The pattern is ', [^,]*,': it starts with a comma followed by a space, continues with zero or more non-comma characters, and ends with a comma. The return value is therefore ', second field,'.

-------------------------------------------------------------------------------

REGEXP_INSTR: returns the position at which the pattern starts.

Given an address, return the position of the zip code:

REGEXP_INSTR ('Jone Smith, 10045 Berry Lane, SanJoseph, CA 91234-1234', '[[:digit:]]{5}(-[[:digit:]]{4})?$')

The result is the starting position of 91234-1234. The $ anchor is what keeps the earlier five-digit number 10045 from matching.
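As a runnable sketch of the zip code search, the same pattern can be passed to both REGEXP_INSTR and REGEXP_SUBSTR to get the position and the matched text together. The column aliases are illustrative.

```sql
-- Five digits, optionally followed by a hyphen and four digits, anchored at the end.
SELECT regexp_instr('Jone Smith, 10045 Berry Lane, SanJoseph, CA 91234-1234',
                    '[[:digit:]]{5}(-[[:digit:]]{4})?$') AS zip_position,
       regexp_substr('Jone Smith, 10045 Berry Lane, SanJoseph, CA 91234-1234',
                     '[[:digit:]]{5}(-[[:digit:]]{4})?$') AS zip_code
FROM dual;
-- Expected result: 45, 91234-1234
```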

-------------------------------------------------------------------------------

REGEXP_REPLACE: replaces the text matched by the pattern with a replacement string. This function is more flexible than the traditional REPLACE function. For example:

'Jone   smith' contains three spaces, while 'Jone  smith' contains two. To collapse the spaces in the middle into a single space with REPLACE, you would have to apply it more than once; with a regular expression, a single pattern

'( ){2,}' covers both cases:

REGEXP_REPLACE (mc, '( ){2,}', ' ')
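The call above refers to a column named mc; as a self-contained sketch, the same replacement can be tried on literal strings through DUAL. The column aliases are illustrative.

```sql
-- Any run of two or more spaces is replaced by a single space.
SELECT regexp_replace('Jone   smith', '( ){2,}', ' ') AS three_spaces,
       regexp_replace('Jone  smith',  '( ){2,}', ' ') AS two_spaces
FROM dual;
-- Expected result: Jone smith, Jone smith
```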

-------------------------------------------------------------------------------

REGEXP_LIKE is an enhanced version of LIKE and is used in the WHERE condition. It matches against a regular expression rather than the _ and % wildcards.

Regular expression feature: back references

The content matched by each subexpression is saved in a buffer. The buffers are numbered from left to right and are referenced with \digit (digit is 1-9). A subexpression is delimited by parentheses.

1. Use in REGEXP_REPLACE:

To change the string 'aa bb cc' into 'cc bb aa':

REGEXP_REPLACE ('aa bb cc', '(.*) (.*) (.*)', '\3 \2 \1')

(.*) represents any run of characters. Three of them separated by spaces match the source string, and \1, \2, and \3 hold the values of the three subexpressions, which can then be written back in a different order.

2. Use in REGEXP_SUBSTR:

Search for a repeated alphanumeric word separated by white space

REGEXP_SUBSTR ('the final test is is the implementation', '([[:alnum:]]+)([[:space:]]+)\1')

The returned result is 'is is'. On its own, ([[:alnum:]]+)([[:space:]]+) matches many strings, but the trailing \1 requires the same text to appear twice in a row, which is how repeated words are found.
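One caveat with this pattern is that it is not anchored to whole words, so the tail of one word followed by the next word can also count as a repeat. The sketch below contrasts it with a stricter variant that requires white space or a string boundary on both sides; the stricter pattern is an assumption of this example, not part of the original article.

```sql
-- loose_match: "this is" ends in "is" followed by "is", so the loose pattern matches.
-- anchored_match: the word must start after ^ or white space and the repeat
-- must end before white space or $, so only whole repeated words qualify.
SELECT regexp_substr('this is fine',
                     '([[:alnum:]]+)([[:space:]]+)\1') AS loose_match,
       regexp_substr('this is fine',
                     '(^|[[:space:]])([[:alnum:]]+)[[:space:]]+\2([[:space:]]|$)') AS anchored_match
FROM dual;
-- loose_match is expected to be "is is"; anchored_match is expected to be NULL.
```

Note that the stricter variant returns the surrounding white space as part of the match when it does find a repeated word.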

Supplement

Oracle Regular Expression

At present, regular expressions are widely used in many kinds of software, including operating systems such as *nix (Linux, Unix, and so on) and HP, and development environments such as PHP, C#, and Java.

Oracle 10g regular expressions make SQL more flexible. They effectively address problems such as data validation, duplicate word identification, extraneous white space detection, and parsing strings that are made up of multiple parts.

Oracle 10g supports four new regular expression functions: REGEXP_LIKE, REGEXP_INSTR, REGEXP_SUBSTR, and REGEXP_REPLACE.
They use POSIX regular expressions instead of the old percent (%) and underscore (_) wildcard characters.

Special characters:
'^' Matches the start of the input string; inside a bracket expression, it negates the character set.
'$' Matches the end of the input string. If the 'm' match parameter is specified, $ also matches the end of each line.
'.' Matches any single character except the newline character.
'?' Matches the preceding subexpression zero or one time.
'*' Matches the preceding subexpression zero or more times.
'+' Matches the preceding subexpression one or more times.
'()' Marks the start and end of a subexpression.
'[]' Marks a bracket expression.
'{m,n}' Specifies the number of occurrences: m <= number of occurrences <= n; '{m}' means exactly m occurrences and '{m,}' means at least m occurrences.
'|' Indicates a choice between two items. For example, '^([a-z]+|[0-9]+)$' matches a string that consists entirely of lowercase letters or entirely of digits.
'\num' Matches the num-th subexpression, where num is a positive integer. It is a back reference to an earlier match.
A useful feature of regular expressions is the ability to save a subexpression for later reuse, which is called backreferencing. It makes complex replacements possible,
for example moving a pattern to a new position or marking the character or word to be replaced. Each matched subexpression is stored in a temporary buffer;
the buffers are numbered from left to right and accessed with the \digit notation. The following example changes the string aa bb cc
into cc, bb, aa.
SELECT REGEXP_REPLACE('aa bb cc', '(.*) (.*) (.*)', '\3, \2, \1') FROM dual;
Output: cc, bb, aa
'\' Escape character.
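To show what the escape character does, the following sketch escapes two metacharacters, $ and the period, so that they are matched literally. The input strings are taken from examples elsewhere in this article; the column aliases are illustrative.

```sql
-- \$ matches a literal dollar sign instead of the end of the string;
-- \. matches a literal period instead of any character.
SELECT regexp_instr('The price is $400.', '\$[[:digit:]]+')   AS dollar_pos,
       regexp_substr('1.5.105', '[[:digit:]]+\.[[:digit:]]+') AS first_pair
FROM dual;
-- Expected result: 14, 1.5
```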

Character classes:
[[:alpha:]] Any letter.
[[:digit:]] Any digit.
[[:alnum:]] Any letter or digit.
[[:space:]] Any white space character.
[[:upper:]] Any uppercase letter.
[[:lower:]] Any lowercase letter.
[[:punct:]] Any punctuation character.
[[:xdigit:]] Any hexadecimal digit; equivalent to [0-9a-fA-F].

Operator precedence (highest to lowest)
\ Escape character
(), (?:), (?=), [] Parentheses and brackets (the (?:) and (?=) forms come from other regular expression dialects and are not supported by Oracle)
*, +, ?, {n}, {n,}, {n,m} Quantifiers
^, $, any metacharacter Anchors and sequences
| Alternation ("or")
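Because alternation has the lowest precedence, an anchor binds to only one side of an unparenthesized |. The following sketch makes the difference visible; the test string and column aliases are illustrative.

```sql
-- '^a|b$' reads as (^a) OR (b$): "starts with a" or "ends with b".
-- '^(a|b)$' reads as: the whole string is exactly a or exactly b.
SELECT CASE WHEN regexp_like('ab', '^a|b$')   THEN 'match' ELSE 'no match' END AS unparenthesized,
       CASE WHEN regexp_like('ab', '^(a|b)$') THEN 'match' ELSE 'no match' END AS parenthesized
FROM dual;
-- Expected result: match, no match
```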

-- Test data
CREATE TABLE test (mc VARCHAR2(60));

INSERT INTO test VALUES ('20140901');
INSERT INTO test VALUES ('2017 22113344');
INSERT INTO test VALUES ('2017 33112244');
INSERT INTO test VALUES ('2014 44112233 5566 778899');
INSERT INTO test VALUES ('2014 5511 2233 4466778899');
INSERT INTO test VALUES ('20140901');
INSERT INTO test VALUES ('20140901');
INSERT INTO test VALUES ('20140901');
INSERT INTO test VALUES ('20140901');
INSERT INTO test VALUES ('aabbccddee');
INSERT INTO test VALUES ('bbaaaccddee');
INSERT INTO test VALUES ('ccabbddee');
INSERT INTO test VALUES ('ddaabbccee');
INSERT INTO test VALUES ('eeaabbccdd');
INSERT INTO test VALUES ('ab123');
INSERT INTO test VALUES ('123xy');
INSERT INTO test VALUES ('007ab');
INSERT INTO test VALUES ('abcxy');
INSERT INTO test VALUES ('the final test is how to find duplicate words.');

COMMIT;

1. REGEXP_LIKE

SELECT * FROM test WHERE regexp_like(mc, '^a{1,3}');
SELECT * FROM test WHERE regexp_like(mc, 'a{1,3}');
SELECT * FROM test WHERE regexp_like(mc, '^a.*e$');
SELECT * FROM test WHERE regexp_like(mc, '^[[:lower:]]|[[:digit:]]');
SELECT * FROM test WHERE regexp_like(mc, '^[[:lower:]]');
SELECT mc FROM test WHERE regexp_like(mc, '[^[:digit:]]');
SELECT mc FROM test WHERE regexp_like(mc, '^[^[:digit:]]');
2. REGEXP_INSTR

SELECT regexp_instr(mc, '[[:digit:]]$') FROM test;
SELECT regexp_instr(mc, '[[:digit:]]+$') FROM test;
SELECT regexp_instr('The price is $400.', '\$[[:digit:]]+') FROM dual;
SELECT regexp_instr('onetwothree', '[^[:lower:]]') FROM dual;
SELECT regexp_instr(',', '[^,]*') FROM dual;
SELECT regexp_instr(',', '[^,]') FROM dual;

3. REGEXP_SUBSTR

SELECT regexp_substr(mc, '[a-z]+') FROM test;
SELECT regexp_substr(mc, '[0-9]+') FROM test;
SELECT regexp_substr('ababcde', '^a.*b') FROM dual;

4. REGEXP_REPLACE

SELECT regexp_replace('Joe   Smith', '( ){2,}', ',') AS rx_replace FROM dual;
SELECT regexp_replace('aa bb cc', '(.*) (.*) (.*)', '\3, \2, \1') FROM dual;

The following session uses a version of the test table that has ID and MC columns.

SQL> select * from test;

ID MC
--------------------------------------------------------------------------------
A AAAAA
a aaaaa
B BBBBB
b bbbbb

SQL> select * from test where regexp_like (id, 'b', 'i'); -- case insensitive

ID MC
--------------------------------------------------------------------------------
B BBBBB
b bbbbb
