Oracle 10g: PL/SQL Regular Expressions Manual

Source: Internet
Author: User

A new feature of Oracle Database 10g greatly improves your ability to search and manipulate character data. This feature is regular expressions, a notation for describing text patterns. It has long been available in many programming languages and a large number of UNIX utilities.

Oracle implements regular expressions as a set of SQL functions and a WHERE clause operator. If you are not familiar with regular expressions, this article gives you a first look at this new, extremely powerful, yet seemingly cryptic feature. Readers who already know regular expressions can learn how to apply them in the Oracle SQL environment.

What is a regular expression?
A regular expression is composed of one or more character literals and/or metacharacters. In its simplest form, a regular expression consists only of character literals, such as the regular expression cat. It is read as the letter c followed by the letters a and t, and it matches strings such as cat, location, and catalog. Metacharacters provide algorithms that determine how Oracle processes the characters that make up a regular expression. Once you understand the meaning of the metacharacters, you will see that regular expressions are very powerful for finding and replacing specific text data.

Validating data, identifying duplicate words, detecting extraneous spaces, and parsing strings are just some of the many uses of regular expressions. You can use them to verify the format of phone numbers, zip codes, email addresses, social security numbers, IP addresses, file names, and path names. In addition, you can find patterns such as HTML tags, numbers, dates, or anything else that fits a pattern in any text data and replace them with other patterns.

Using regular expressions in Oracle Database 10g
You can use the newly introduced Oracle SQL REGEXP_LIKE operator and the REGEXP_INSTR, REGEXP_SUBSTR, and REGEXP_REPLACE functions to harness the power of regular expressions. You will see how these new features supplement the existing LIKE operator and the INSTR, SUBSTR, and REPLACE functions. In fact, they resemble their existing counterparts but add powerful pattern matching capabilities. The data searched can be a simple string or a large volume of text stored in a character column of the database. Regular expressions let you search, replace, and validate data in ways you may not have thought of before, with a high degree of flexibility.

Basic examples of Regular Expressions
Before using this new feature, you need to understand the meaning of some metacharacters. The period (.) matches any character in a regular expression (except the newline character). For example, the regular expression a.b matches a string containing the letter a, followed by any other single character (except newline), followed by the letter b. The strings axb, xaybx, and abba are all matches, because this pattern is buried in each string. If you want to match exactly a three-letter string that begins with a and ends with b, you must anchor the regular expression. The caret (^) metacharacter indicates the start of a line, while the dollar sign ($) indicates the end of a line (see Table 1 in the appendix). Therefore, the regular expression ^a.b$ matches the strings aab, abb, or axb. Compare this with the familiar pattern matching of the LIKE operator, where you would write a_b and the underscore (_) is the single-character wildcard.
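
The difference between an unanchored pattern, an anchored pattern, and the LIKE wildcard can be checked side by side. The query below is a minimal sketch: the sample strings are made up for illustration, and the flags in the comments assume the default case-sensitive matching.

```sql
-- Compare a.b (unanchored), ^a.b$ (anchored), and LIKE 'a_b' on sample strings.
WITH samples AS (
  SELECT 'axb'   AS s FROM dual UNION ALL
  SELECT 'xaybx' AS s FROM dual UNION ALL
  SELECT 'abba'  AS s FROM dual UNION ALL
  SELECT 'ab'    AS s FROM dual
)
SELECT s,
       CASE WHEN REGEXP_LIKE(s, 'a.b')   THEN 'Y' ELSE 'N' END AS unanchored,
       CASE WHEN REGEXP_LIKE(s, '^a.b$') THEN 'Y' ELSE 'N' END AS anchored,
       CASE WHEN s LIKE 'a_b'            THEN 'Y' ELSE 'N' END AS like_a_b
FROM samples;
-- axb   -> Y, Y, Y
-- xaybx -> Y, N, N
-- abba  -> Y, N, N
-- ab    -> N, N, N
```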

By default, a single character or character list in a regular expression matches only once. To indicate multiple occurrences of a character, you use a quantifier, also known as a repetition operator. If you want a pattern that starts with the letter a and ends with the letter b, your regular expression looks like this: ^a.*b$. The * metacharacter repeats the preceding metacharacter (.) zero, one, or more times. The equivalent pattern for the LIKE operator is a%b, in which the percent sign (%) indicates zero, one, or multiple occurrences of any character.

Table 2 provides the complete list of repetition operators. Note that it contains special repetition options that allow more flexibility than the existing LIKE wildcards. If you enclose an expression in parentheses, you effectively create a subexpression that can be repeated a certain number of times. For example, the regular expression b(an)*a matches ba, bana, banana, yourbananasplit, and so on.
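
A repeated subexpression can be tried out directly against the strings mentioned above. This is a small sketch rather than part of the original article; the non-matching sample 'bn' is an added assumption to show a row being filtered out.

```sql
-- b(an)*a: the letter b, then zero or more repetitions of "an", then the letter a.
WITH samples AS (
  SELECT 'ba'              AS s FROM dual UNION ALL
  SELECT 'bana'            AS s FROM dual UNION ALL
  SELECT 'banana'          AS s FROM dual UNION ALL
  SELECT 'yourbananasplit' AS s FROM dual UNION ALL
  SELECT 'bn'              AS s FROM dual
)
SELECT s,
       REGEXP_SUBSTR(s, 'b(an)*a') AS matched_part  -- the portion that matched
FROM samples
WHERE REGEXP_LIKE(s, 'b(an)*a');
-- 'bn' is not returned: there is no letter a after the b.
-- For 'yourbananasplit' the matched part is 'banana', because * repeats as often as it can.
```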

Oracle's regular expression implementation supports the POSIX (Portable Operating System Interface) character classes listed in Table 3. This means you can be very specific about the type of character you are looking for. Suppose you had to write a LIKE condition that finds only non-alphabetic characters: the resulting WHERE clause could easily become very complicated.

A POSIX character class must be enclosed in a character list, indicated by square brackets ([]). For example, the regular expression [[:lower:]] matches a single lowercase letter, while [[:lower:]]{5} matches five consecutive lowercase letters.
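
As a quick illustration of a character class combined with a quantifier, the sketch below extracts the first run of five consecutive lowercase letters. The input string is invented for this example.

```sql
-- 'rder' has only four lowercase letters in a row, so the first match is 'hello'.
SELECT REGEXP_SUBSTR('Order ABC-hello-42', '[[:lower:]]{5}') AS five_lower
FROM dual;
-- FIVE_LOWER: hello
```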

In addition to POSIX character classes, you can place individual characters in a character list. For example, the regular expression ^ab[cd]ef$ matches the strings abcef and abdef: either c or d must be chosen.

Most metacharacters inside a character list are treated as literals, with the exception of the caret (^) and the hyphen (-). Regular expressions can seem complicated because some metacharacters have multiple meanings depending on context. The ^ is one such metacharacter. If you use it as the first character of a character list, it negates the list. Therefore, [^[:digit:]] looks for a pattern consisting of any non-numeric character, whereas ^[[:digit:]] looks for a pattern that starts with a digit. The hyphen (-) indicates a range: the regular expression [a-m] matches any letter from a to m. However, when the hyphen is the first character in a character list (as in [-afg]), it is a literal hyphen.
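
The two meanings of the caret are easy to confuse, so it may help to see them next to each other. The following is a minimal sketch with made-up sample values.

```sql
-- [^[:digit:]] : the caret inside the list negates it (any non-digit character).
-- ^[[:digit:]] : the caret outside the list anchors the match to the start of the line.
WITH samples AS (
  SELECT '12345' AS s FROM dual UNION ALL
  SELECT '12a45' AS s FROM dual UNION ALL
  SELECT 'a2345' AS s FROM dual
)
SELECT s,
       CASE WHEN REGEXP_LIKE(s, '[^[:digit:]]') THEN 'Y' ELSE 'N' END AS has_non_digit,
       CASE WHEN REGEXP_LIKE(s, '^[[:digit:]]') THEN 'Y' ELSE 'N' END AS starts_with_digit
FROM samples;
-- 12345 -> N, Y
-- 12a45 -> Y, Y
-- a2345 -> Y, N
```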

In an earlier example, parentheses were used to create a subexpression. Parentheses also let you specify alternatives, which are separated by the vertical bar (|) alternation metacharacter.

For example, the regular expression t(a|e|i)n allows three possible characters between the letters t and n. Matching patterns include words such as tan, ten, tin, and Pakistan, but not teen, mountain, or tune. The regular expression t(a|e|i)n can also be expressed as the character list t[aei]n. Table 4 summarizes these metacharacters. Although more metacharacters exist, this concise overview is enough to understand the regular expressions used in this article.
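
The equivalence of the alternation and the character list can be verified with a short query. This is an illustrative sketch that uses the words from the paragraph above as sample data.

```sql
-- Both columns should agree for every row: t(a|e|i)n and t[aei]n describe the same pattern.
WITH samples AS (
  SELECT 'tan'      AS s FROM dual UNION ALL
  SELECT 'Pakistan' AS s FROM dual UNION ALL
  SELECT 'teen'     AS s FROM dual UNION ALL
  SELECT 'mountain' AS s FROM dual UNION ALL
  SELECT 'tune'     AS s FROM dual
)
SELECT s,
       CASE WHEN REGEXP_LIKE(s, 't(a|e|i)n') THEN 'Y' ELSE 'N' END AS alternation,
       CASE WHEN REGEXP_LIKE(s, 't[aei]n')   THEN 'Y' ELSE 'N' END AS char_list
FROM samples;
-- tan and Pakistan -> Y, Y; teen, mountain, and tune -> N, N
```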

The REGEXP_LIKE operator
The REGEXP_LIKE operator introduces the regular expression functionality as used in the Oracle database. Table 5 lists the REGEXP_LIKE syntax.

The WHERE clause of the following SQL query uses the REGEXP_LIKE operator to search the ZIP column for a pattern that satisfies the regular expression [^[:digit:]]. It retrieves the rows of the ZIPCODE table whose ZIP column value contains any non-numeric character.

SELECT zip FROM zipcode WHERE REGEXP_LIKE(zip, '[^[:digit:]]')


This regular expression consists only of metacharacters. More specifically, it is the POSIX character class digit delimited by colons and square brackets. The second set of square brackets (as in [^[:digit:]]) encloses the character list. As mentioned above, this is required because a POSIX character class can be used only inside a character list.

The REGEXP_INSTR function
The REGEXP_INSTR function returns the starting position of a pattern, so it works much like the INSTR function. Table 6 shows the syntax of the new function. The main difference between the two is that REGEXP_INSTR lets you specify a pattern instead of a specific search string, which makes it more versatile. The following example uses REGEXP_INSTR to return the starting position of the five-digit ZIP code pattern within the string Joe Smith, 10045 Berry Lane, San Joseph, CA 91234. If the regular expression were written as [[:digit:]]{5}, you would get the starting position of the house number instead of the ZIP code, because 10045 is the first occurrence of five consecutive digits. Therefore, you must anchor the expression to the end of the line, as indicated by the $ metacharacter; the function then returns the starting position of the ZIP code regardless of how many digits the house number has.

SELECT REGEXP_INSTR('Joe Smith, 10045 Berry Lane, San Joseph, CA 91234',
'[[:digit:]]{5}$') AS rx_instr FROM dual
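
To make the effect of the anchor concrete, the sketch below runs the pattern with and without $ on the same address, and also shows the return_option argument described in Table 6. The column aliases are made up for this illustration.

```sql
-- Same source string, three variations of the five-digit pattern.
SELECT REGEXP_INSTR(addr, '[[:digit:]]{5}')          AS first_five_digits,  -- house number
       REGEXP_INSTR(addr, '[[:digit:]]{5}$')         AS zip_at_end,         -- ZIP code
       REGEXP_INSTR(addr, '[[:digit:]]{5}', 1, 1, 1) AS after_first_match   -- return_option = 1
FROM (SELECT 'Joe Smith, 10045 Berry Lane, San Joseph, CA 91234' AS addr FROM dual);
-- FIRST_FIVE_DIGITS = 12, ZIP_AT_END = 45, AFTER_FIRST_MATCH = 17
```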


Writing more complex patterns
Let's extend the ZIP code pattern from the previous example to include an optional four-digit extension. Your pattern may now look like this: ' [[:digit:]]{5}(-[[:digit:]]{4})?$', where the leading space must also be matched. If your source string ends with either a 5-digit ZIP code or a 5-digit + 4 ZIP code, the function returns the starting position of the pattern.

SELECT REGEXP_INSTR('Joe Smith, 10045 Berry Lane, San Joseph, CA 91234-1234',
' [[:digit:]]{5}(-[[:digit:]]{4})?$') AS starts_at FROM dual


In this example, the parenthesized subexpression (-[[:digit:]]{4}) is repeated zero or one time, as indicated by the ? repetition operator. Achieving the same result with traditional SQL functions would be a challenge even for a SQL expert. To better explain the different components of this regular expression, Table 7 describes the individual literals and metacharacters.
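
The optional subexpression can also be observed with REGEXP_SUBSTR, which returns the matched text rather than its position. This is a hedged variation on the article's example: it omits any leading space from the pattern so that only the ZIP code itself is returned, and the two address strings are sample data.

```sql
-- The (-[[:digit:]]{4})? group matches zero or one time, so both ZIP formats are extracted.
WITH addresses AS (
  SELECT 'Joe Smith, 10045 Berry Lane, San Joseph, CA 91234'      AS addr FROM dual UNION ALL
  SELECT 'Joe Smith, 10045 Berry Lane, San Joseph, CA 91234-1234' AS addr FROM dual
)
SELECT addr,
       REGEXP_SUBSTR(addr, '[[:digit:]]{5}(-[[:digit:]]{4})?$') AS zip
FROM addresses;
-- zip is '91234' for the first row and '91234-1234' for the second
```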

The REGEXP_SUBSTR function
Much like the SUBSTR function, the REGEXP_SUBSTR function extracts part of a string. Table 8 shows the syntax of the new function. In the following example, the string that matches the pattern , [^,]*, is returned. This regular expression searches for a comma followed by a space, then for zero or more characters that are not commas, as indicated by [^,]*, and finally for another comma. The pattern looks somewhat like a search within a comma-separated values string.

SELECT REGEXP_SUBSTR('first field, second field, third field', ', [^,]*,') FROM dual

, second field,
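
If only the field itself is wanted, without the surrounding commas, the position and occurrence arguments listed in Table 8 offer another route. The sketch below is an alternative formulation, not the article's own query.

```sql
-- [^,]+ matches one run of non-comma characters; occurrence 2 selects the second run.
-- TRIM removes the space that follows the first comma.
SELECT TRIM(REGEXP_SUBSTR('first field, second field, third field',
                          '[^,]+', 1, 2)) AS second_field
FROM dual;
-- SECOND_FIELD: second field
```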

The REGEXP_REPLACE function
Let's first take a look at the traditional REPLACE SQL function, which substitutes one string for another. Assume your data has extraneous spaces in the text and you want to replace them with a single space. With the REPLACE function, you need to list exactly how many spaces you want to replace. However, the number of extra spaces may not be the same everywhere in the text. The following example has three spaces between Joe and Smith. The REPLACE function's arguments specify that two spaces are to be replaced with one space. In this case, the result leaves an extra space where there were three spaces in the original string between Joe and Smith.

SELECT REPLACE('Joe   Smith', '  ', ' ') AS replace FROM dual

Joe  Smith

The REGEXP_REPLACE function takes substitution a step further. Its syntax is listed in Table 9. The following query replaces any two or more spaces with a single space. The ( ) subexpression contains a single space, which can be repeated two or more times, as indicated by {2,}.

SELECT REGEXP_REPLACE('Joe   Smith', '( ){2,}', ' ') AS RX_REPLACE FROM dual

Joe Smith
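
The same idea extends to other kinds of white space by using a POSIX character class in place of the literal space. The following is a sketch under the assumption that tabs may be mixed in with the spaces; the input string is built with CHR(9) purely for illustration.

```sql
-- [[:space:]]{2,} matches any run of two or more white-space characters (spaces, tabs, ...).
SELECT REGEXP_REPLACE('Joe' || CHR(9) || ' Smith', '[[:space:]]{2,}', ' ') AS rx_replace
FROM dual;
-- RX_REPLACE: Joe Smith
```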

Backreferences
A useful feature of regular expressions is the ability to store subexpressions for later reuse. This is known as backreferencing (outlined in Table 10). It allows sophisticated replace capabilities, such as swapping patterns into new positions or finding repeated words or letters. The matched part of a subexpression is saved in a temporary buffer. The buffers are numbered from left to right and accessed with the \digit notation, where digit is a number between 1 and 9 that refers to the digit-th subexpression, as indicated by its parentheses.

The following example shows how to change the name Ellen Hildi Smith to Smith, Ellen Hildi by referencing each subexpression by number.

SELECT REGEXP_REPLACE(
'Ellen Hildi Smith',
'(.*) (.*) (.*)', '\3, \1 \2')
FROM dual

Smith, Ellen Hildi

The SQL statement shows three separate subexpressions enclosed in parentheses. Each subexpression consists of the match-any metacharacter (.) followed by the * metacharacter, indicating that any character (except newline) must be matched zero or more times. Spaces separate the subexpressions, and these spaces must be matched as well. The parentheses create subexpressions that capture the values, which can then be referenced with \digit. The first subexpression is assigned \1, the second \2, and so on. These backreferences are used in the last argument of the function (\3, \1 \2), which effectively returns the replacement substrings and arranges them in the desired format (including the comma and spaces). Table 11 details the components of this regular expression.
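
The same swapping technique applies to any pattern with parenthesized parts. As a further sketch, not taken from the article, the query below rearranges a year-month-day string into day/month/year order; the date value is an arbitrary example.

```sql
-- \1 = year, \2 = month, \3 = day; the replacement string reorders them.
SELECT REGEXP_REPLACE('2004-12-31',
                      '([[:digit:]]{4})-([[:digit:]]{2})-([[:digit:]]{2})',
                      '\3/\2/\1') AS day_month_year
FROM dual;
-- DAY_MONTH_YEAR: 31/12/2004
```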

Backreferences are very useful for replacing, formatting, and substituting values, and you can also use them to find adjacent occurrences of a value. The following example uses the REGEXP_SUBSTR function to find any duplicate alphanumeric value separated by white space. The result is the substring that identifies the duplicate word is.

SELECT REGEXP_SUBSTR(
'The final test is is the implementation',
'([[:alnum:]]+)([[:space:]]+)\1') AS substr
FROM dual
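
Once a duplicate has been found, the same pattern can be handed to REGEXP_REPLACE to remove it. This is a minimal sketch; note that the pattern has no notion of word boundaries, so text such as "this is" would also be treated as a repeated "is", and real data may need a stricter pattern.

```sql
-- Replace "<word><white space><same word>" with a single copy of the word (\1).
SELECT REGEXP_REPLACE('The final test is is the implementation',
                      '([[:alnum:]]+)([[:space:]]+)\1',
                      '\1') AS deduplicated
FROM dual;
-- DEDUPLICATED: The final test is the implementation
```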


Match parameter options
You may have noticed that the regular expression operator and functions accept an optional match parameter. This parameter controls case sensitivity, whether the period matches the newline character, and how multiline input is handled.
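
The effect of the match parameter is easiest to see by running the same pattern with and without it. The sketch below uses the values 'i' (case-insensitive), 'n' (the period also matches newline), and 'm' (treat the source as multiple lines); the first column assumes the default session settings, under which matching is case-sensitive.

```sql
SELECT CASE WHEN REGEXP_LIKE('ORACLE', 'oracle')      THEN 'Y' ELSE 'N' END AS case_sensitive,
       CASE WHEN REGEXP_LIKE('ORACLE', 'oracle', 'i') THEN 'Y' ELSE 'N' END AS case_insensitive,
       -- CHR(10) is the newline character
       CASE WHEN REGEXP_LIKE('a' || CHR(10) || 'b', 'a.b')      THEN 'Y' ELSE 'N' END AS dot_default,
       CASE WHEN REGEXP_LIKE('a' || CHR(10) || 'b', 'a.b', 'n') THEN 'Y' ELSE 'N' END AS dot_with_n,
       -- with 'm', ^ and $ also match at the start and end of each line
       REGEXP_SUBSTR('line one' || CHR(10) || 'line two',
                     '^line two$', 1, 1, 'm') AS multiline_anchor
FROM dual;
-- N, Y, N, Y, line two
```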

Practical applications of regular expressions
You can use regular expressions not only in queries but anywhere a SQL operator or function can be used (for example, in the PL/SQL language). You can write triggers that use regular expression functions to validate, generate, or extract values.
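
In PL/SQL, REGEXP_LIKE can be used directly as a Boolean condition. The anonymous block below is a small sketch; the variable name, the phone number, and its three-digit, hyphen, four-digit format are assumptions made up for this illustration, and SET SERVEROUTPUT ON is a SQL*Plus command needed only to display the message.

```sql
SET SERVEROUTPUT ON

DECLARE
  v_phone VARCHAR2(20) := '555-1234';  -- hypothetical sample value
BEGIN
  -- Three digits, a hyphen, and four digits, anchored at both ends.
  IF REGEXP_LIKE(v_phone, '^[[:digit:]]{3}-[[:digit:]]{4}$') THEN
    DBMS_OUTPUT.PUT_LINE(v_phone || ' has the expected format');
  ELSE
    DBMS_OUTPUT.PUT_LINE(v_phone || ' does not have the expected format');
  END IF;
END;
/
```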

The following example demonstrates how you can use the REGEXP_LIKE operator in a column check constraint for data validation. It verifies that a Social Security number has the correct format when it is inserted or updated. Social Security numbers in formats such as 123-45-6789 and 123456789 are acceptable for this column constraint. Valid data must start with three digits followed by a hyphen, then two digits and another hyphen, and finally four digits. The alternative expression allows only 9 consecutive digits. The vertical bar (|) separates the alternatives.

ALTER TABLE students
ADD CONSTRAINT stud_ssn_ck CHECK
(REGEXP_LIKE(ssn,
'^([[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}|[[:digit:]]{9})$'))

Leading or trailing characters are not accepted, as indicated by ^ and $. Make sure your regular expression is not split across multiple lines and does not contain any unnecessary spaces, unless you want the format to be matched accordingly. Table 12 describes the components of this regular expression example.
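
To see such a constraint at work without touching real data, it can be tried on a scratch table. Everything named here (the table students_demo, the column ssn, the constraint name) is a hypothetical stand-in chosen for this sketch.

```sql
-- Hypothetical scratch table with the same kind of check constraint.
CREATE TABLE students_demo (
  ssn VARCHAR2(11),
  CONSTRAINT stud_demo_ssn_ck CHECK (
    REGEXP_LIKE(ssn, '^([[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}|[[:digit:]]{9})$')
  )
);

INSERT INTO students_demo (ssn) VALUES ('123-45-6789');  -- accepted
INSERT INTO students_demo (ssn) VALUES ('123456789');    -- accepted
INSERT INTO students_demo (ssn) VALUES ('123-456789');   -- rejected with ORA-02290 (check constraint violated)
INSERT INTO students_demo (ssn) VALUES ('x123456789');   -- rejected: leading character before the digits

DROP TABLE students_demo;
```

A NULL value also passes a check constraint, so a NOT NULL constraint would be needed in addition if the column must always be filled.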

Comparing regular expressions with existing functions
Regular expressions have several advantages over the familiar LIKE operator and the INSTR, SUBSTR, and REPLACE functions. These traditional SQL functions have no facility for pattern matching. Only the LIKE operator performs matching, through the % and _ characters, but it does not support repetition of expressions, complex alternation, character ranges, character lists, POSIX character classes, and so on. In addition, the new regular expression functions let you detect repeated words and swap patterns. The examples here give an overview of what regular expressions can do and how you can use them in your applications.

Enrich your toolkit
Regular expressions are very powerful because they help solve complex problems. Some of what regular expressions do is difficult to imitate with traditional SQL functions. Once you understand the basic building blocks of this slightly cryptic language, regular expressions will become an indispensable part of your toolkit (not only in SQL but also in other programming languages). Although trial and error is sometimes necessary to get your patterns right, the conciseness and power of regular expressions are unquestionable.

Appendix
Table 1: Anchoring metacharacters
^    Anchors the expression to the start of a line
$    Anchors the expression to the end of a line

Table 2: Quantifiers, or repetition operators
*        Match 0 or more times
?        Match 0 or 1 time
+        Match 1 or more times
{m}      Match exactly m times
{m,}     Match at least m times
{m,n}    Match at least m times but no more than n times

Table 3: Predefined POSIX character classes
[:alpha:]    Alphabetic characters
[:lower:]    Lowercase alphabetic characters
[:upper:]    Uppercase alphabetic characters
[:digit:]    Numeric digits
[:alnum:]    Alphanumeric characters
[:space:]    Space characters (nonprinting), such as carriage return, newline, vertical tab, and form feed
[:punct:]    Punctuation characters
[:cntrl:]    Control characters (nonprinting)
[:print:]    Printable characters

Table 4: Alternate matching and grouping metacharacters
|         Alternation       Separates alternatives; often used with the grouping operator ()
( )       Group             Groups a subexpression into a unit for alternation, for quantifiers, or for backreferencing (see "Backreferences")
[char]    Character list    Indicates a character list; most metacharacters inside a character list are treated as literals, except character classes and the ^ and - metacharacters

Table 5: REGEXP_LIKE operator syntax
REGEXP_LIKE(source_string, pattern
  [, match_parameter])

source_string supports the character data types (CHAR, VARCHAR2, CLOB, NCHAR, NVARCHAR2, and NCLOB, but not LONG). The pattern parameter is another name for the regular expression. match_parameter allows optional settings such as handling the newline character, retaining multiline formatting, and controlling case sensitivity.

Table 6: REGEXP_INSTR function syntax
REGEXP_INSTR(source_string, pattern
  [, start_position
  [, occurrence
  [, return_option
  [, match_parameter]]]])

This function looks for a pattern and returns the position of its first occurrence. Optionally, you can specify the start_position at which the search begins. The occurrence parameter defaults to 1 unless you indicate that you are looking for a subsequent occurrence. The default value of return_option is 0, which returns the starting position of the pattern; a value of 1 returns the position of the character that follows the match.

Table 7: Explanation of the 5-digit + 4 ZIP code expression
(space)      A space that must be matched
[            Start of the character list
[:digit:]    POSIX numeric digit class
]            End of the character list
{5}          Repeat the character list exactly five times
(            Start of the subexpression
-            A literal hyphen, because it is not a range metacharacter inside a character list
[            Start of the character list
[:digit:]    POSIX [:digit:] class
]            End of the character list
{4}          Repeat the character list exactly four times
)            Closing parenthesis, ending the subexpression
?            The ? quantifier matches the grouped subexpression 0 or 1 time, making the four-digit code optional
$            Anchoring metacharacter indicating the end of the line

Table 8: REGEXP_SUBSTR function syntax
REGEXP_SUBSTR(source_string, pattern
  [, position [, occurrence
  [, match_parameter]]])

The REGEXP_SUBSTR function returns the substring that matches the pattern.

Table 9: REGEXP_REPLACE function syntax
REGEXP_REPLACE(source_string, pattern
  [, replace_string [, position
  [, occurrence [, match_parameter]]]])

This function replaces the matching pattern with the specified replace_string, allowing complex search-and-replace operations.

Table 10: Backreference metacharacter
\digit    A backslash followed by a digit between 1 and 9; it matches the digit-th preceding subexpression enclosed in parentheses.
(Note: The backslash has another meaning in regular expressions; depending on the context, it may also be the escape character.)

Table 11: Explanation of the pattern-swap regular expression
(          Start of the first subexpression
.          Match any single character except newline
*          Repetition operator: match the preceding . metacharacter 0 to n times
)          End of the first subexpression; the match is captured in \1 (in this example, Ellen)
(space)    A space that must be matched
(          Start of the second subexpression
.          Match any single character except newline
*          Repetition operator: match the preceding . metacharacter 0 to n times
)          End of the second subexpression; the match is captured in \2 (in this example, Hildi)
(space)    A space that must be matched
(          Start of the third subexpression
.          Match any single character except newline
*          Repetition operator: match the preceding . metacharacter 0 to n times
)          End of the third subexpression; the match is captured in \3 (in this example, Smith)

Table 12: Explanation of the Social Security number regular expression
^            Start-of-line anchor (no characters may precede the match)
(            Start of the subexpression that lists the alternatives separated by the | metacharacter
[            Start of the character list
[:digit:]    POSIX numeric digit class
]            End of the character list
{3}          Repeat the character list exactly three times
-            A hyphen
[            Start of the character list
[:digit:]    POSIX numeric digit class
]            End of the character list
{2}          Repeat the character list exactly two times
-            Another hyphen
[            Start of the character list
[:digit:]    POSIX numeric digit class
]            End of the character list
{4}          Repeat the character list exactly four times
|            Alternation metacharacter; ends the first alternative and starts the next one
[            Start of the character list
[:digit:]    POSIX numeric digit class
]            End of the character list
{9}          Repeat the character list exactly nine times
)            Closing parenthesis, ending the group of alternatives
$            Anchoring metacharacter indicating the end of the line; no extra characters may follow the pattern
