Oracle 10g:pl/sql Regular expression (regular expression) manual _oracle

Source: Internet
Author: User
Tags alphabetic character character classes html tags numeric list of character classes posix printable characters alphanumeric characters

Oracle's formal expression is implemented in the form of various SQL functions and a WHERE clause operator. If you are unfamiliar with regular expressions, this article will give you an idea of this new, incredibly powerful, yet seemingly mysterious feature. Readers who are already familiar with regular expressions can learn how to apply this functionality in an Oracle SQL language environment.

What is a regular expression?
A regular expression consists of one or more character literal and/or meta characters. In the simplest form, regular expressions consist only of character words, such as regular expression cat. It was read as the letter C, followed by the letters A and T, which matched strings such as cat, location and catalog. Metacharacters provides an algorithm to determine how Oracle handles characters that make up a regular expression. When you understand the meaning of various meta characters, you will realize that regular expressions are very powerful for finding and replacing specific text data.

Validating data, recognizing the presence of duplicate keywords, detecting unnecessary spaces, or parsing strings are just a subset of the many applications of regular expressions. You can use them to verify the format of phone numbers, postal codes, e-mail addresses, social Security numbers, IP addresses, file names, and path names. In addition, you can look for patterns such as HTML tags, numbers, dates, or anything that conforms to any pattern in any text data, replacing them with other patterns.

Use regular expressions with Oracle Database 10g
You can use the latest introduced Oracle SQL regexp_like operators and Regexp_instr, REGEXP_SUBSTR, and Regexp_replace functions to play the role of regular expressions. You will appreciate how this new feature complements the LIKE operator and the INSTR, SUBSTR, and REPLACE functions. In fact, they are similar to the existing operators, but now add a powerful pattern-matching function. The data being searched can be either a simple string or a large amount of text stored in the database character columns. Regular expressions allow you to search, replace, and validate data in a way that you have never thought of before, and provide a high degree of flexibility.

Basic examples of regular expressions
Before using this new feature, you need to understand the meaning of some meta characters. a period (.) matches any character in a regular expression (except for line breaks). For example, a regular expression a.b matches a string that first contains the letter A, followed by any of the other individual characters (except for line breaks), followed by the letter B. String Axb, Xaybx, and ABBA are all matched, because this pattern is hidden in the string. If you want to accurately match a three-letter string that starts with a and ends with B, you must position the regular expression. The caret (^) metacharacters indicate the beginning of a line, while the dollar sign ($) indicates the end of a line (see table 1: The schedule is at page 4th). Therefore, the regular expression ^a.b$ matches the string aab, ABB, or AXB. This is compared to the similar pattern matching a_b provided by the LIKE operator, where (_) is the word wildcard wildcard.

By default, a single character or list of characters in a regular expression matches only once. To indicate a character that appears multiple times in a regular expression, you can use a quantifier, which is also called a repeating operator. If you want to get a matching pattern that starts with the letter A and ends with the letter B, your regular expression looks like this: ^a.*b$. * Metacharacters Repeat the preceding metacharacters (.) to indicate a match 0 times, one time, or more. The equivalent mode of the LIKE operator is A%b, which uses the percent sign (%) to indicate that any character appears 0 times, one time, or more than once.

Table 2 shows a complete list of repeat operators. Note that it contains special duplicate options that enable greater flexibility than the existing like wildcard characters. If you enclose an expression in parentheses, this effectively creates a subexpression that can repeat a certain number of times. For example, regular expression B (an) *a matches BA, Bana, Banana, yourbananasplit, etc.

Oracle's regular expression implementations support POSIX (Portable operating system Interface) character classes, as listed in table 3. This means that the type of character you are looking for can be very special. Suppose you want to write a like condition that finds only non-alphanumeric characters-the WHERE clause as a result might inadvertently become very complex.

The POSIX character class must be contained in a list of characters indicated by square brackets ([]). For example, the regular expression [[: Lower:]] matches a lowercase alphabetic character, while [[: Lower:]]{5} matches five consecutive lowercase alphabetic characters.

In addition to the POSIX character class, you can place individual characters in a list of characters. For example, regular expressions ^ab[cd]ef$ match strings ABCEF and Abdef. C or D must be selected.

Most metacharacters in a list of characters are considered text, except for the caret (^) and hyphen (-) characters. Regular expressions can look complex because some metacharacters have multiple meanings that vary with context. ^ is such a meta character. If you use it as the first character of a word characters, it represents the characters of a word. Therefore, [^[:d igit:]] looks for a pattern that contains any non-numeric characters, and ^[[:d Igit:] To find a matching pattern that starts with a number. A hyphen (-) indicates a range in which the regular expression [a-m] matches any letter between the letter A and the letter M. However, if it is the first character in a character line (as in [-AFG]), it represents a hyphen.

An example earlier describes the use of parentheses to create a subexpression that allows you to enter alternate options by entering replacement metacharacters, which are separated by vertical bars (|).

For example, regular expression T (a|e|i) n allows three possible characters to be replaced between letters T and N. Matching patterns include words such as Tan, ten, tin, and Pakistan, but do not include teen, mountain, or tune. As an alternative, regular expression T (a|e|i) n can also be represented as a list of characters t[aei]n. The meta characters are summarized in table 4. Although there are more meta characters, this concise overview is sufficient to understand the formal expressions used in this article.

Regexp_like operator
The regexp_like operator introduces you to regular expression functionality when used in an Oracle database. Table 5 lists the syntax for Regexp_like.

The following SQL query's WHERE clause shows the regexp_like operator, which searches the ZIP column for a pattern that satisfies the regular expression [^[:d igit:]. It retrieves those ZIP column values in the ZIPCODE table that contain any non-numeric character rows.

SELECT zip from ZipCode WHERE regexp_like (Zip, ' [^[:d igit:]] ')

Zip
-----
Ab123
123xy
007ab
Abcxy

The example of this formal expression is composed only of metacharacters, and more specifically POSIX character class digit separated by colons and brackets. The second set of square brackets (as shown in [^[:d Igit:]) includes a list of character classes. As mentioned earlier, you need to do this because you can only use the POSIX character classes to build a list of characters.

Regexp_instr function
This function returns the starting position of a pattern, so its function is very similar to the INSTR function. The syntax for the new REGEXP_INSTR function is given in table 6. The main difference between the two functions is that regexp_instr lets you specify a pattern rather than a specific search string; So it provides more functionality. The following example uses REGEXP_INSTR to return a string to the starting position of the five-bit ZIP code mode in CA 91234, Joe Smith, 10045 Berry Lane, San Joseph. If the regular expression is written as [[:d igit:]]{5}, you will get the starting position of the number instead of the ZIP code, since 10045 is the first time that five consecutive digits appear. Therefore, you must position the expression at the end of the line, as shown in the $ metacharacters, which displays the starting position of the postal code, regardless of the number of numbers.

SELECT regexp_instr (' Joe Smith, 10045 Berry Lane, San Joseph, CA 91234 ',
' [[:d igit:]]{5}$ '] as rx_instr from dual

Rx_instr
----------
45


Writing more complex patterns
Let's expand on the ZIP code pattern in the previous example to include an optional four-bit digital pattern. Your pattern may now look like this: [[:d igit:]]{5} (-[[:d igit:]]{4})? $. If your source string ends with a 5-bit postal code or a 5-bit + 4-bit postal code, you will be able to display the starting position of the pattern.

SELECT regexp_instr (' Joe Smith, 10045 Berry Lane, San Joseph, CA 91234-1234 ',
' [[:d igit:]]{5} (-[[:d igit:]]{4})? $ ') as Starts_at from dual

Starts_at
----------
44

In this example, the subexpression in parentheses (-[[:d igit:]]{4}) is pressed? The repeat operator's instructions are repeated 0 times or once. In addition, attempting to achieve the same results with traditional SQL functions is even a challenge for SQL experts. To better illustrate the different components of this regular expression example, table 7 contains a description of a single literal and metacharacters.

Regexp_substr function
The REGEXP_SUBSTR function, similar to the SUBSTR function, is used to extract part of a string. Table 8 shows the syntax for this new function. In the following example, the matching pattern [^,]* string will be returned. The regular expression searches for a comma followed by a space, and then, as instructed by [^,]*, searches for 0 or more characters that are not commas, and finally finds another comma. This pattern looks a bit like a comma-delimited string of values.

SELECT regexp_substr (' The ' field, second field, third field ', ', [^,]*, ') from dual

Regexp_substr (' FIR
------------------
, second field,


Regexp_replace function
Let's first look at the traditional replace SQL function, which replaces a string with another string. Suppose your data has unnecessary spaces in the body, and you want to replace them with a single space. With the Replace function, you need to accurately list how many spaces you want to replace. However, the number of extra spaces may not be the same throughout the body. The following example has three spaces between Joe and Smith. The parameters of the Replace function specify that you want to replace two spaces with a single space. In this case, the result leaves an extra space between Joe and Smith in the original string.

SELECT replace (' Joe Smith ', ', ', ') as REPLACE from dual

REPLACE
---------
Joe Smith

The Regexp_replace function advances the substitution function a step further, and its syntax is listed in Table 9. The following query replaces any two or more spaces with a single space. () A subexpression contains a single space that can be repeated two or more times as instructed by {2.}
SELECT regexp_replace (' Joe Smith ', ' () {2,} ', ') as Rx_replace from dual

Rx_replace
----------
Joe Smith

A useful feature of a back-referencing regular expression is the ability to store the subexpression for later reuse, which is also referred to as a back reference (outlined in table 10). It allows for complex substitution functions, such as swapping patterns in new locations or displaying repeated occurrences of words or letters. The matching part of the subexpression is saved in the temporary buffer. The buffer is numbered from left to right and accessed using the \digit symbol, where digit is a number between 1 and 9 that matches the digit subexpression, which is displayed with a set of parentheses.

The following example shows the conversion of the name Ellen Hildi Smith to Smith, Ellen Hildi by reference to each subexpression by number.

SELECT Regexp_replace (
' Ellen hildi Smith ',
' (. *) (. *) (. *) ', ' \3, \1 \2 ')
From dual

Regexp_replace (' EL
------------------
Smith, Ellen Hildi

The SQL statement shows the three separate subexpression enclosed in parentheses. Each individual subexpression contains a matching meta character (.) and follows the * metacharacters, indicating that any character (other than a newline character) must match 0 or more times. A space separates each subexpression, and a space must also match. Parentheses create a subexpression that gets the value, and can be referenced with \digit. The first subexpression is assigned to \1, the second \2, and so on. These back references are used in the last parameter of the function (\3, \1 \2), which effectively returns the replacement substring and arranges them in the desired format (including commas and spaces). Table 11 details the various components of the regular expression.

A back reference is useful for replacing, formatting, and substituting values, and you can use them to find adjacent values. The following example shows the use of the REGEP_SUBSTR function to find any repeated alphanumeric values that are separated by spaces. The displayed results give a substring that identifies the repeated occurrence of the word is.
SELECT Regexp_substr (
' The final test is the implementation ',
' ([[: alnum:]]+) ([[: space:]]+) \1 ') as substr
From dual

SUBSTR
------
IS is


Matching parameter options
You may have noticed that the regular expression operators and functions contain an optional matching parameter. This parameter controls whether case sensitive, line break matching, and retention of multiple lines of input.

The practical application of regular expressions
Not only can you use regular expressions in queues, you can also use regular expressions anywhere you use SQL operators or functions, such as in Pl/sql languages. You can write triggers that take advantage of regular expression functionality to validate, generate, or extract values.

The next example shows how you can apply the Regexp_like operator to data validation in a column check constraint. It verifies the correct social Security number format when inserting or updating. Social Security numbers in formats such as 123-45-6789 and 123456789 for this column constraint is an acceptable value. Valid data must start with three digits, followed by a hyphen, plus two digits and a hyphen, and finally four digits. The other expression allows only 9 consecutive digits. The vertical bar symbol (|) separates the options.

ALTER TABLE Students
ADD CONSTRAINT Stud_ssn_ck CHECK
(Regexp_like (SSN,
' ^ ([[:d igit:]]{3}-[[:d igit:]]{2}-[[:d igit:]]{4}| [[:d Igit:]] {9}) $'))

The characters at the beginning or end of the ^ and $ indicate are not acceptable. Make sure that your regular expression is not divided into multiple lines or contains any unnecessary spaces unless you want the format to be so and match accordingly. Table 12 illustrates the various components of the regular expression sample.

Compare regular expressions to existing functionality
Regular expressions have several advantages over common like operators and INSTR, SUBSTR, and REPLACE functions. These traditional SQL functions are not convenient for pattern matching. Only the like operator matches by using the% and _ characters but like does not support the repetition of expressions, complex substitutions, character ranges, word characters and POSIX character classes, and so on. In addition, the new regular expression function allows you to detect repeated occurrences of word and pattern swapping. The examples here give you an overview of the regular expression areas and how you can use them in your applications.

To actually enrich your toolkit
Because regular expressions help solve complex problems, they are very powerful. Some of the features of regular expressions are difficult to emulate with traditional SQL functions. When you understand the underlying building blocks of this slightly cryptic language, regular expressions will become an integral part of your toolkit (not only in SQL but also in other programming locales). In order for your patterns to be correct, while attempts and errors are sometimes necessary, the simplicity and strength of regular expressions are unquestionable.
Schedule See next page:
Table 1: Locating meta characters
Metacharacters description
^ to position an expression at the beginning of a line
$ to position an expression at the end of a line

Table 2: Description of quantifiers or repeated operator quantifiers
* Match 0 or more times
? Match 0 times or 1 times
+ match 1 or more times
{m} exactly matches m times
{m,} matches at least m times
{m, n} matches at least m times but not more than n times


Table 3: Predefined POSIX character class descriptions
[: Alpha:] alphabetic characters
[: Lower:] Lowercase alphabetic characters
[: Upper:] Capital Letter Characters
[:d Igit:] Number
[: Alnum:] alpha-numeric characters
[: space:] whitespace characters (prohibit printing), such as carriage returns, line breaks, vertical tabs, and page breaks
[:p UNCT:] punctuation character
[: Cntrl:] control character (prohibit printing)
[:p rint:] printable characters


Table 4: Substitution matching and grouping meta character descriptions for expressions
| Replace the separator substitution option, which is usually used with the group operator ()
() groups Group A subexpression into a replacement unit, quantifier unit, or a back reference unit (see the "Back Reference" section)
The [Char] character characters represents a word characters; most of the metacharacters in a list of characters (except character classes, ^ and-metacharacters) are interpreted as text


Table 5:regexp_like operator Syntax description
Regexp_like (source_string, pattern
[, Match_parameter]) Source_string supports character data types (char, VARCHAR2, CLOB, NCHAR, NVARCHAR2, and NCLOB, but does not include LONG). The pattern parameter is another name for the regular expression. Match_parameter allows optional parameters such as handling line breaks, retaining multiline formatting, and providing control over case sensitivity.


Table 6:regexp_instr function Syntax description
Regexp_instr (source_string, pattern
[, Start_position
[, Occurrence
[, Return_option
[, Match_parameter]]] The function looks for pattern and returns the first position of the schema. You can optionally specify the start_position you want to start your search. The occurrence parameter defaults to 1 unless you specify that you want to find a pattern that appears next. The default value of Return_option is 0, which returns the starting position of the pattern, and a value of 1 returns the starting position of the next character that matches the criteria.


Table 7: Description of 5 digit plus 4-bit zip code expression
Blanks that must match
[:d Igit:] POSIX Digital Classes
] The end of the word characters
{5} character characters just repeated 5 times
(The beginning of a child expression
-a literal hyphen, because it is not a character list within the range metacharacters
[Beginning of character characters
[:d Igit:] POSIX [:d igit:] Class
[Beginning of character characters
] The end of the word characters
{4} character characters just repeated 4 times
) ends the parentheses, ending the subexpression
? ? Quantifiers match the grouped subexpression 0 or 1 times to make the 4-bit code optional
$ positional metacharacters, indicating end of line


table 8:regexp_substr function Syntax Description
Regexp_substr (source_string, pattern
[, position [, occurrence
[, Match_parameter]]] regexp_substr function returns a substring of the matching pattern.


table 9:regexp_replace function Syntax Description
Regexp_replace (source_string, pattern
[, replace_string [, Position
[, occurrence, [match_parameter]]] The function replaces the matching pattern with a specified replace_string, allowing for complex "search and replace" operations.


Table 10: metacharacters character description of the back reference meta Word
The \digit backslash is followed by a number from 1 to 9, and the backslash matches the first digit subexpression enclosed in parentheses.
(Note: A backslash has another meaning in a regular expression that, depending on the context, may also represent an Escape character.)


Table 11: pattern-swapping regular expression description of the regular expression project description
(beginning of the first subexpression  
.) matches any single character, except a newline character, with a
* repeat operator, matching the. metacharacters 0 To the end of the first subexpression of n-th
), and the matching result is obtained in \1
(in this case, the result is Ellen.)
  White space  
that must exist (the beginning of the second subexpression  
. Matches any single character  
* Repeat operator other than a newline character, matching the. metacharacters 0 to N times
) second subexpression The result of the match is obtained in \2
(in this case, the result is hildi.)
  Blank  
(the beginning of the third subexpression  
. Matches any single character  
* Repeat operator, except for line breaks), matching the end of the third subexpression of the. metacharacters 0 to n times
; The result is obtained in \3
(in this case, the result is Smith.) )


Table 12: Social Security numbers Description of regular expressions regular expression item Description
^ start character (regular expression cannot have any leading characters until it matches.)
(Start the subexpression and list the replaceable options separated by | metacharacters
[the beginning of the character characters
[:d igit:] POSIX number class  
] characters end  
{3} Word characters just repeat 3 times &NB Sp
-hyphen  
[beginning of character characters  
[:d igit:] POSIX number class
] characters end  
{2} Word characters exactly repeat 2  
-another hyphen & nbsp
[beginning of character characters  
[:d igit:] POSIX Numeric class  
] characters end  
{4} Word characters exactly repeat 4 times  
| replace metacharacters; end the first option and start the next An alternate expression
[beginning of character characters  
[:d igit:] POSIX Numeric class
] characters end of
{9} Word characters exactly repeat 9  
) closing parenthesis, ending the expression group for substitution $ locates the metacharacters, indicating the end of the line; no additional characters can conform to the pattern  

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.