Back reference A useful feature of regular expressions is the ability to store the subexpression for later reuse, which is also referred to as a back reference (outlined in table 10). It allows for complex substitution functions, such as swapping patterns in new locations or displaying repeated occurrences of words or letters. The matching part of the subexpression is saved in the temporary buffer. The buffer is numbered from left to right and accessed using the \digit symbol, where digit is a number between 1 and 9 that matches the digit subexpression, which is displayed with a set of parentheses. The following example shows the conversion of the name Ellen Hildi Smith to Smith, Ellen Hildi by reference to each subexpression by number. SELECT Regexp_replace ( ' Ellen hildi Smith ', ' (. *) (. *) (. *) ', ' \3, \1 \2 ') From dual Regexp_replace (' EL ------------------ Smith, Ellen Hildi The SQL statement shows the three separate subexpression enclosed in parentheses. Each individual subexpression contains a matching meta character (.) and follows the * metacharacters, indicating that any character (other than a newline character) must match 0 or more times. A space separates each subexpression, and a space must also match. Parentheses create a subexpression that gets the value, and can be referenced with \digit. The first subexpression is assigned to \1, the second \2, and so on. These back references are used in the last parameter of the function (\3, \1 \2), which effectively returns the replacement substring and arranges them in the desired format (including commas and spaces). Table 11 details the various components of the regular expression. A back reference is useful for replacing, formatting, and substituting values, and you can use them to find adjacent values. The following example shows the use of the REGEP_SUBSTR function to find any repeated alphanumeric values that are separated by spaces. The displayed results give a substring that identifies the repeated occurrence of the word is. SELECT Regexp_substr ( ' The final test is the implementation ', ([[: alnum:]]+) ([[: space:]]+) \1 ') as substr From dual SUBSTR ------ IS is Matching parameter options You may have noticed that the regular expression operators and functions contain an optional matching parameter. This parameter controls whether case sensitive, line break matching, and retention of multiple lines of input. The practical application of regular expressions Not only can you use regular expressions in queues, you can also use regular expressions anywhere you use SQL operators or functions, such as in Pl/sql languages. You can write triggers that take advantage of regular expression functionality to validate, generate, or extract values. The next example shows how you can apply the Regexp_like operator to data validation in a column check constraint. It verifies the correct social Security number format when inserting or updating. Social Security numbers in formats such as 123-45-6789 and 123456789 for this column constraint is an acceptable value. Valid data must start with three digits, followed by a hyphen, plus two digits and a hyphen, and finally four digits. The other expression allows only 9 consecutive digits. The vertical bar symbol (|) separates the options. ALTER TABLE Students ADD CONSTRAINT Stud_ssn_ck CHECK (Regexp_like (SSN, ' ^ ([[:d igit:]]{3}-[[:d igit:]]{2}-[[:d igit:]]{4}| [[:d Igit:]] {9}) $')) The characters at the beginning or end of the ^ and $ indicate are not acceptable. Make sure that your regular expression is not divided into multiple lines or contains any unnecessary spaces unless you want the format to be so and match accordingly. Table 12 illustrates the various components of the regular expression sample. Compare regular expressions to existing functionality Regular expressions have several advantages over common like operators and INSTR, SUBSTR, and REPLACE functions. These traditional SQL functions are not convenient for pattern matching. Only the like operator matches by using the% and _ characters but like does not support the repetition of expressions, complex substitutions, character ranges, word characters and POSIX character classes, and so on. In addition, the new regular expression function allows you to detect repeated occurrences of word and pattern swapping. The examples here give you an overview of the regular expression areas and how you can use them in your applications. To actually enrich your toolkit Because regular expressions help solve complex problems, they are very powerful. Some of the features of regular expressions are difficult to emulate with traditional SQL functions. When you understand the underlying building blocks of this slightly cryptic language, regular expressions will become an integral part of your toolkit (not only in SQL but also in other programming locales). In order for your patterns to be correct, while attempts and errors are sometimes necessary, the simplicity and strength of regular expressions are unquestionable. Alice Rischert (ar280@yahoo.com) is the chairman of the database application development and design direction of the computer Technology and application Department of Columbia University. She wrote the Oracle SQL Interactive manual version 2nd (Prentice hall,2002) and the upcoming Oracle SQL sample (Prentice hall,2003). Rischert has over 15 years of experience as a database designer, DBA, and project director in Fortune 100 companies, and she has been using Oracle products since Oracle version 5. Table 1: Locating meta characters
Metacharacters |
Description |
^ |
To position an expression at the beginning of a line |
$ |
Position an expression to the end of a line |
Table 2: quantifiers or repeat Operators
Quantifiers |
Description |
* |
Match 0 or more times |
? |
Match 0 times or 1 times |
+ |
Match 1 or more times |
{m} |
Exactly match m times |
{m,} |
Match at least m times |
{m, n} |
Match at least m times but not more than n times |
Table 3: Predefined POSIX character classes
Character class |
Description |
[: Alpha:] |
Alphabetic characters |
[: Lower:] |
Lowercase alphabetic characters |
[: Upper:] |
Uppercase characters |
[:d Igit:] |
Digital |
[: Alnum:] |
Alpha-Numeric characters |
[: Space:] |
White-space characters (prohibit printing), such as carriage returns, line breaks, vertical tabs, and page breaks |
[:p UNCT:] |
Punctuation characters |
[: Cntrl:] |
control characters (prohibit printing) |
[:p rint:] |
printable characters |
Table 4: Substitution matching and grouping of expressions
Metacharacters |
Description |
| |
Replace |
Separate substitution options, usually used with the group operator () |
( ) |
Group |
Group the subexpression into a substitution unit, quantifier unit, or back reference unit (see the "Back Reference" section) |
[Char] |
Character List |
Represents a character characters; most of the metacharacters in a list of characters (except character classes, ^ and-metacharacters) are understood as text |
Table 5:regexp_like operator
Grammar |
Description |
Regexp_like (source_string, pattern [, Match_parameter]) |
Source_string supports character data types (char, VARCHAR2, CLOB, NCHAR, NVARCHAR2, and NCLOB, but does not include LONG). The pattern parameter is another name for the regular expression. Match_parameter allows optional parameters such as handling line breaks, retaining multiline formatting, and providing control over case sensitivity. |
Table 6:regexp_instr function
Grammar |
Description |
Regexp_instr (source_string, pattern [, Start_position [, Occurrence [, Return_option [, Match_parameter]]] |
The function looks for pattern and returns the first position of the schema. You can optionally specify the start_position you want to start your search. The occurrence parameter defaults to 1 unless you specify that you want to find a pattern that appears next. The default value of Return_option is 0, which returns the starting position of the pattern, and a value of 1 returns the starting position of the next character that matches the criteria. |
Table 7: Description of 5 digit plus 4-bit zip code expression
Grammar |
Description |
|
Blanks that must match |
[:d Igit:] |
POSIX Digital Classes |
] |
End of character list |
{5} |
Word characters just repeat 5 times |
( |
The beginning of a child expression |
- |
A literal hyphen, because it is not a character list within the range metacharacters |
[ |
The beginning of a character list |
[:d Igit:] |
POSIX [:d igit:] Class |
[ |
The beginning of a character list |
] |
End of character list |
{4} |
Word characters just repeat 4 times |
) |
End parenthesis, ending subexpression |
? |
? Quantifiers match the grouped subexpression 0 or 1 times to make the 4-bit code optional |
$ |
Locates Metacharacters, indicating end of line |
Table 8:regexp_substr function
Grammar |
Description |
Regexp_substr (source_string, pattern [, position [, occurrence [, Match_parameter]]] |
regexp_substr function returns a substring of the matching pattern. |
Table 9:regexp_replace function
Grammar |
Description |
Regexp_replace (source_string, pattern [, replace_string [, Position [, occurrence, [match_parameter]]] |
The function replaces the matching pattern with a specified replace_string, allowing for complex "search and replace" operations. |
Table 10: Back reference meta characters
Metacharacters |
Description |
\digit |
Back slash |
followed by a number between 1 and 9, the backslash matches the first digit subexpression enclosed in parentheses. (Note: A backslash has another meaning in a regular expression that, depending on the context, may also represent an Escape character.) |
Table 11: Description of pattern Exchange regular expressions
Regular expression Items |
Description |
( |
The beginning of the first subexpression |
. |
Match any single character except for line breaks |
* |
Repeat the operator that matches the previous. Metacharacters 0 to N times |
) |
The end of the first subexpression; match results in \1 In this example, the result is Ellen. ) |
|
Whitespace that must exist |
( |
The beginning of the second subexpression |
. |
Match any single character except the line feed |
* |
Repeat the operator that matches the previous. Metacharacters 0 to N times |
) |
The end of the second subexpression; match results in \2 In this example, the result is hildi. ) |
|
Blank |
( |
The beginning of the third subexpression tr> |
. |
Match any single character except for line breaks |
* |
Repeat the operator that matches the previous. Metacharacters 0 to N times |
) |
The end of the third subexpression; match results in \3 (In this case, the result is Smith.) ) |
Table 12: Social Security numbers description of regular expressions
Regular expression Items |
Description |
^ |
The beginning character (the regular expression cannot have any leading characters until it matches.) ) |
( |
Start the subexpression and list the | Meta-character-separated replaceable options |
[ |
The beginning of a character list |
[:d Igit:] |
POSIX Digital Classes |
] |
End of character list |
{3} |
Word characters just repeat 3 times |
- |
Hyphens |
[ |
The beginning of a character list |
[:d Igit:] |
POSIX Digital Classes |
] |
End of character list |
{2} |
Word characters just repeat 2 times |
- |
Another hyphen |
[ |
The beginning of a character list |
[:d Igit:] |
POSIX Digital Classes |
] |
End of character list |
{4} |
Word characters just repeat 4 times |
| |
Replaces the metacharacters; ends the first option and begins the next alternate expression |
[ |
The beginning of a character list |
[:d Igit:] |
POSIX Digital Classes |
] |
End of character list |
{9} |
Word characters just repeat 9 times |
) |
Ends the parentheses, ending the child expression group for substitution |
$ |
Locates metacharacters, indicating line endings; no extra characters to fit the pattern |
|