Use regular expressions to write better SQL statements


What is a regular expression?

A regular expression is composed of one or more character literals and/or metacharacters. In its simplest form, a regular expression consists only of character literals, such as the regular expression cat. It is read as the letter c followed by the letters a and t, and this pattern matches strings such as cat, location, and catalog. Metacharacters provide the rules that determine how Oracle processes the characters that make up a regular expression. Once you understand the meanings of the metacharacters, you will see how powerful regular expressions are for searching and replacing specific text data.

Validating data, identifying duplicate words, detecting extraneous white space, and parsing strings are just some of the many uses of regular expressions. You can use them to verify the format of phone numbers, ZIP codes, email addresses, Social Security numbers, IP addresses, file names, and path names. In addition, you can find patterns such as HTML tags, numbers, dates, or anything else that fits a pattern within any text data, and replace them with other patterns.

Regular Expressions in Oracle Database 10g

You can harness the power of regular expressions with the newly introduced Oracle SQL REGEXP_LIKE operator and the REGEXP_INSTR, REGEXP_SUBSTR, and REGEXP_REPLACE functions. You will see how this new functionality supplements the existing LIKE operator and the INSTR, SUBSTR, and REPLACE functions. In fact, the new functions are similar to the existing ones but add powerful pattern-matching capabilities. The searched data can be a simple string or a large volume of text stored in a database character column. Regular expressions let you search, replace, and validate data in ways you may never have thought of before, with a high degree of flexibility.

Basic examples of Regular Expressions

Before using this new functionality, you need to understand the meaning of some of the metacharacters. The period (.) matches any single character in a regular expression (except the newline character). For example, the regular expression a.b matches a string containing the letter a, followed by any other single character (except newline), followed by the letter b. The strings axb, xaybx, and abba all match because this pattern is hidden somewhere in each string. If you want to match exactly a three-letter string that starts with a and ends with b, you must anchor the regular expression. The caret (^) metacharacter indicates the start of a line, while the dollar sign ($) indicates the end of a line (see Table 1). Therefore, the regular expression ^a.b$ matches the strings aab, abb, and axb. For comparison, the LIKE operator expresses a similar pattern as a_b, where the underscore (_) is the single-character wildcard.
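
To make the effect of anchoring concrete, here is a small sketch that runs the unanchored and the anchored pattern against a few sample strings. The inline list of values and the column names (val, unanchored, anchored) are made up for illustration; no existing table is assumed.

```sql
-- Sample values are supplied inline; they are illustrative only.
SELECT val,
       CASE WHEN REGEXP_LIKE(val, 'a.b')   THEN 'Y' ELSE 'N' END AS unanchored,
       CASE WHEN REGEXP_LIKE(val, '^a.b$') THEN 'Y' ELSE 'N' END AS anchored
  FROM (SELECT 'axb'   AS val FROM dual UNION ALL
        SELECT 'xaybx' AS val FROM dual UNION ALL
        SELECT 'abba'  AS val FROM dual UNION ALL
        SELECT 'ab'    AS val FROM dual);
-- axb   : unanchored Y, anchored Y
-- xaybx : unanchored Y, anchored N
-- abba  : unanchored Y, anchored N
-- ab    : unanchored N, anchored N  (no character between a and b)
```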

By default, an individual character or character list in a regular expression matches only once. To indicate multiple occurrences of a character within the regular expression, you apply a quantifier, also called a repetition operator. If you want a pattern that starts with the letter a and ends with the letter b, your regular expression looks like this: ^a.*b$. The * metacharacter repeats the preceding metacharacter (.) zero, one, or more times. The equivalent pattern with the LIKE operator is a%b, where the percent sign (%) indicates zero, one, or multiple occurrences of any character.

Table 2 shows the full list of repetition operators. Notice that it contains special repetition options that allow more flexibility than the existing LIKE wildcard characters. If you enclose an expression in parentheses, you effectively create a subexpression that can be repeated a certain number of times. For example, the regular expression b(an)*a matches ba, bana, banana, yourbananasplit, and so on.
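
The following sketch puts the LIKE wildcard and the regular expression quantifiers side by side. The sample strings and column aliases are assumptions chosen for illustration.

```sql
-- Compare LIKE 'a%b' with its regular expression equivalent, and try a repeated subexpression.
SELECT val,
       CASE WHEN val LIKE 'a%b'              THEN 'Y' ELSE 'N' END AS like_a_b,
       CASE WHEN REGEXP_LIKE(val, '^a.*b$')  THEN 'Y' ELSE 'N' END AS rx_a_b,
       CASE WHEN REGEXP_LIKE(val, 'b(an)*a') THEN 'Y' ELSE 'N' END AS rx_banana
  FROM (SELECT 'ab'              AS val FROM dual UNION ALL
        SELECT 'a123b'           AS val FROM dual UNION ALL
        SELECT 'banana'          AS val FROM dual UNION ALL
        SELECT 'yourbananasplit' AS val FROM dual);
-- ab              : like_a_b Y, rx_a_b Y, rx_banana N
-- a123b           : like_a_b Y, rx_a_b Y, rx_banana N
-- banana          : like_a_b N, rx_a_b N, rx_banana Y
-- yourbananasplit : like_a_b N, rx_a_b N, rx_banana Y
```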

The Oracle regular expression implementation supports the POSIX (Portable Operating System Interface) character classes listed in Table 3. This means that you can be very specific about the type of character you are looking for. Imagine writing a LIKE condition that looks only for nonalphabetic characters: the resulting WHERE clause could easily become very complicated.

A POSIX character class must be enclosed in a character list, which is indicated by square brackets ([]). For example, the regular expression [[:lower:]] matches a single lowercase letter, while [[:lower:]]{5} matches five consecutive lowercase letters.
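
As a quick illustration of character classes combined with quantifiers, the sketch below extracts the first run of five lowercase letters and the first run of uppercase letters from a sample string. The string and the aliases are invented for this example; REGEXP_SUBSTR is described later in this article.

```sql
SELECT REGEXP_SUBSTR('ABC hello World', '[[:lower:]]{5}') AS five_lower,      -- hello
       REGEXP_SUBSTR('ABC hello World', '[[:upper:]]+')   AS first_upper_run  -- ABC
  FROM dual;
```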

In addition to POSIX character classes, you can place individual characters in a character list. For example, the regular expression ^ab[cd]ef$ matches the strings abcef and abdef: either c or d must be chosen.

Most metacharacters in a character list are treated as literals, with the exception of the caret (^) and the hyphen (-). Regular expressions can seem complicated because some metacharacters have multiple meanings that depend on the context. The ^ is such a metacharacter. If you use it as the first character of a character list, it negates the list. Therefore, [^[:digit:]] looks for a pattern containing any non-numeric character, whereas ^[[:digit:]] looks for a pattern that starts with a digit. The hyphen (-) indicates a range; the regular expression [a-m] matches any letter from a through m. However, if the hyphen is the first character in a character list (as in [-afg]), it is a literal hyphen.
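
The context-dependent meanings of ^ and - can be checked with a small sketch such as the following. The sample values and column aliases are assumptions made for illustration.

```sql
SELECT val,
       CASE WHEN REGEXP_LIKE(val, '[^[:digit:]]') THEN 'Y' ELSE 'N' END AS has_nondigit,
       CASE WHEN REGEXP_LIKE(val, '^[[:digit:]]') THEN 'Y' ELSE 'N' END AS starts_with_digit,
       CASE WHEN REGEXP_LIKE(val, '[-afg]')       THEN 'Y' ELSE 'N' END AS has_hyphen_a_f_or_g
  FROM (SELECT '12345' AS val FROM dual UNION ALL
        SELECT '9-xyz' AS val FROM dual UNION ALL
        SELECT 'abc'   AS val FROM dual);
-- 12345 : has_nondigit N, starts_with_digit Y, has_hyphen_a_f_or_g N
-- 9-xyz : has_nondigit Y, starts_with_digit Y, has_hyphen_a_f_or_g Y  (literal hyphen)
-- abc   : has_nondigit Y, starts_with_digit N, has_hyphen_a_f_or_g Y  (the letter a)
```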

In a previous example, parentheses were used to create a subexpression. Parentheses also let you specify alternatives, which are separated by the alternation metacharacter, the vertical bar (|).

For example, the regular expression t(a|e|i)n allows three possible characters between the letters t and n. Matching patterns include words such as tan, ten, tin, and Pakistan, but not teen, mountain, or tune. The regular expression t(a|e|i)n can also be expressed as the character list t[aei]n. Table 4 summarizes these metacharacters. Although many more metacharacters exist, this concise overview is enough to understand the regular expressions used in this article.
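
A brief sketch showing that the alternation and the character list give the same answers for the words mentioned above. The inline values and aliases are illustrative assumptions.

```sql
SELECT val,
       CASE WHEN REGEXP_LIKE(val, 't(a|e|i)n') THEN 'Y' ELSE 'N' END AS alternation,
       CASE WHEN REGEXP_LIKE(val, 't[aei]n')   THEN 'Y' ELSE 'N' END AS char_list
  FROM (SELECT 'tan'      AS val FROM dual UNION ALL
        SELECT 'Pakistan' AS val FROM dual UNION ALL
        SELECT 'teen'     AS val FROM dual UNION ALL
        SELECT 'mountain' AS val FROM dual);
-- tan      : alternation Y, char_list Y
-- Pakistan : alternation Y, char_list Y  (contains tan)
-- teen     : alternation N, char_list N
-- mountain : alternation N, char_list N
```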

REGEXP_LIKE Operator

The REGEXP_LIKE operator introduces you to regular expression functionality in Oracle Database. Table 5 lists the REGEXP_LIKE syntax.

The WHERE clause of the following SQL query shows the REGEXP_LIKE operator, which searches the zip column for a pattern that satisfies the regular expression [^[:digit:]]. It retrieves the rows of the zipcode table whose zip column value contains any non-numeric character.

SELECT zip
  FROM zipcode
 WHERE REGEXP_LIKE(zip, '[^[:digit:]]');

ZIP
-----
ab123
123xy
007ab
abcxy

This example of a regular expression consists only of metacharacters; more specifically, it uses the POSIX character class digit, delimited by colons and square brackets. The second set of brackets (as in [^[:digit:]]) encloses a character list. As mentioned earlier, this is required because a POSIX character class can be used only to construct a character list.

REGEXP_INSTR Function

This function returns the starting position of a pattern, so it works much like the familiar INSTR function. The syntax of the new REGEXP_INSTR function is shown in Table 6. The main difference between the two functions is that REGEXP_INSTR lets you specify a pattern instead of a specific search string, which makes it more versatile. The following example uses REGEXP_INSTR to return the starting position of the five-digit ZIP code pattern within the string Joe Smith, 10045 Berry Lane, San Joseph, CA 91234. If the regular expression is written as [[:digit:]]{5}, you get the starting position of the house number instead of the ZIP code, because 10045 is the first occurrence of five consecutive digits. Therefore, you must anchor the expression to the end of the line, as indicated by $, so that the function returns the starting position of the ZIP code regardless of the number of digits in the house number.

SELECT REGEXP_INSTR('Joe Smith, 10045 Berry Lane, San Joseph, CA 91234',
                    '[[:digit:]]{5}$') AS rx_instr
  FROM dual;

  RX_INSTR
----------
        45
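
The optional parameters listed in Table 6 offer another way to reach the ZIP code. The sketch below uses the unanchored pattern with the start_position, occurrence, and return_option parameters; the alias addr and the column names are made up for illustration.

```sql
SELECT REGEXP_INSTR(addr, '[[:digit:]]{5}')          AS first_hit,    -- 12: the house number
       REGEXP_INSTR(addr, '[[:digit:]]{5}', 1, 2)    AS second_hit,   -- 45: second occurrence, the ZIP code
       REGEXP_INSTR(addr, '[[:digit:]]{5}', 1, 1, 1) AS after_first   -- 17: position following the first match
  FROM (SELECT 'Joe Smith, 10045 Berry Lane, San Joseph, CA 91234' AS addr
          FROM dual);
```

Asking for the second occurrence only works when the address is known to contain exactly one earlier five-digit run, so anchoring with $ remains the more robust choice.
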

Writing More Complex Patterns

Let's expand the ZIP code pattern of the previous example to include an optional four-digit extension. Your pattern may now look like this: [[:digit:]]{5}(-[[:digit:]]{4})?$. If your source string ends in either the 5-digit ZIP code or the 5-digit + 4 ZIP code format, you will be able to show the starting position of the pattern.

SELECT REGEXP_INSTR('Joe Smith, 10045 Berry Lane, San Joseph, CA 91234-1234',
                    ' [[:digit:]]{5}(-[[:digit:]]{4})?$') AS starts_at
  FROM dual;

 STARTS_AT
----------
        44

In this example, the parenthesized subexpression (-[[:digit:]]{4}) is repeated zero or one time, as indicated by the ? repetition operator. Note that the pattern in the query begins with a required blank (see Table 7), so the position returned, 44, is that of the space preceding the ZIP code. Attempting to achieve the same result with traditional SQL functions would be a challenge even for a SQL expert. To better explain the different components of this regular expression example, Table 7 describes the individual literals and metacharacters.
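
To check that the optional subexpression really handles both ZIP code formats, a sketch like the following runs the pattern with and without the leading blank against both forms of the address. The alias addr and the column names are illustrative assumptions.

```sql
SELECT addr,
       REGEXP_INSTR(addr, '[[:digit:]]{5}(-[[:digit:]]{4})?$')  AS zip_start,
       REGEXP_INSTR(addr, ' [[:digit:]]{5}(-[[:digit:]]{4})?$') AS blank_start
  FROM (SELECT 'Joe Smith, 10045 Berry Lane, San Joseph, CA 91234'      AS addr FROM dual UNION ALL
        SELECT 'Joe Smith, 10045 Berry Lane, San Joseph, CA 91234-1234' AS addr FROM dual);
-- Both rows: zip_start = 45 (first digit of the ZIP code),
--            blank_start = 44 (the blank in front of it)
```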

REGEXP_SUBSTR Function

Much like the SUBSTR function, the REGEXP_SUBSTR function extracts part of a string. Table 8 shows the syntax of the new function. In the following example, the string that matches the pattern , [^,]*, is returned. The regular expression searches for a comma followed by a space, then zero or more characters that are not commas, as indicated by [^,]*, and finally another comma. The pattern looks somewhat like a comma-separated values string.

SELECT REGEXP_SUBSTR('first field, second field , third field',
                     ', [^,]*,')
  FROM dual;

REGEXP_SUBSTR('FIR
------------------
, second field ,
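
If the goal is to pull out one item of the list without the surrounding commas, the position and occurrence parameters from Table 8 can be used with a simpler pattern. This is a sketch under that assumption; the square brackets are concatenated only to make the leading and trailing blanks of the result visible.

```sql
-- [^,]+ matches a run of non-comma characters; occurrence 2 selects the second such run.
SELECT '[' || REGEXP_SUBSTR('first field, second field , third field',
                            '[^,]+', 1, 2) || ']' AS second_item
  FROM dual;
-- second_item = [ second field ]
```
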

REGEXP_REPLACE Function

Let's first look at the traditional REPLACE SQL function, which substitutes one string for another. Assume that your data has extraneous spaces in the text and you want to replace them with a single space. With the REPLACE function, you need to state exactly how many spaces you want to replace. However, the number of extra spaces may not be the same everywhere in the text. The following example has three spaces between Joe and Smith. The parameters of the REPLACE function specify that two spaces are replaced with one space. In this case, the result still leaves an extra space where there were three spaces between Joe and Smith in the original string.

SELECT REPLACE('Joe   Smith', '  ', ' ') AS replace
  FROM dual;

REPLACE
---------
Joe  Smith

The REGEXP_REPLACE function takes the substitution a step further. Its syntax is listed in Table 9. The following query replaces any two or more spaces with a single space. The ( ) subexpression contains a single space, which can be repeated two or more times, as indicated by {2,}.

SELECT REGEXP_REPLACE('Joe   Smith', '( ){2,}', ' ') AS rx_replace
  FROM dual;

RX_REPLACE
----------
Joe Smith
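
The same idea handles runs of different lengths in one pass. In this sketch the sample string is invented, and the quantifier is applied directly to the blank rather than to a parenthesized subexpression, which is equivalent here.

```sql
-- The string contains runs of three, four, and two blanks.
SELECT REGEXP_REPLACE('Joe   Smith    from  Oracle', ' {2,}', ' ') AS squeezed
  FROM dual;
-- squeezed = Joe Smith from Oracle
```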

Backreferences

A useful feature of regular expressions is the ability to store subexpressions for later reuse; this is also referred to as backreferencing (outlined in Table 10). It allows sophisticated replace capabilities, such as swapping patterns into new positions or detecting repeated words or letters. The matched part of a subexpression is saved in a temporary buffer. The buffers are numbered from left to right and accessed with the \digit notation, where digit is a number between 1 and 9 that refers to the digit-th subexpression, as indicated by a set of parentheses.

The following example shows how to transform the name Ellen Hildi Smith into Smith, Ellen Hildi by referring to each subexpression by number.

SELECT REGEXP_REPLACE(
         'Ellen Hildi Smith',
         '(.*) (.*) (.*)', '\3, \1 \2')
  FROM dual;

REGEXP_REPLACE('EL
------------------
Smith, Ellen Hildi

The SQL statement shows three individual subexpressions enclosed in parentheses. Each subexpression consists of the match-any-character metacharacter (.) followed by the * metacharacter, indicating that any character (except newline) must be matched zero or more times. A blank separates each subexpression and must be matched as well. The parentheses create subexpressions that capture the values and can be referenced with \digit. The first subexpression is assigned \1, the second \2, and so on. These backreferences are used in the last parameter of the function ('\3, \1 \2'), which effectively returns the captured substrings and arranges them in the desired format (including the comma and the blanks). Table 11 details the individual components of the regular expression.
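
Backreferences are equally handy for reformatting values. As a further sketch, the following query regroups a ten-digit string into a phone-number layout; the input value and the target format are assumptions chosen for illustration.

```sql
-- Three subexpressions capture 3, 3, and 4 digits; the replacement string
-- rearranges them and adds literal parentheses, a blank, and a hyphen.
SELECT REGEXP_REPLACE('5551234567',
                      '([[:digit:]]{3})([[:digit:]]{3})([[:digit:]]{4})',
                      '(\1) \2-\3') AS formatted
  FROM dual;
-- formatted = (555) 123-4567
```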

Backreferences are valuable for replacing, formatting, and substituting values, and you can also use them to find adjacent occurrences of values. The following example uses the REGEXP_SUBSTR function to find any duplicate occurrence of an alphanumeric value separated by white space. The result shows the substring that identifies the repeated word is.

SELECT REGEXP_SUBSTR(
         'The final test is is the implementation',
         '([[:alnum:]]+)([[:space:]]+)\1') AS substr
  FROM dual;

SUBSTR
------
is is
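
Once the duplicate can be found, the same pattern can be used to remove it. The sketch below passes the pattern to REGEXP_REPLACE and keeps only the first captured copy of the word.

```sql
SELECT REGEXP_REPLACE('The final test is is the implementation',
                      '([[:alnum:]]+)([[:space:]]+)\1', '\1') AS deduped
  FROM dual;
-- deduped = The final test is the implementation
```

Be aware that this pattern is not restricted to whole words: in other text it would also match the is is hidden inside a phrase such as this is, so it should be tested against representative data before it is used for cleanup.
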

Match Parameter Option

You may have noticed that the regular expression operator and functions take an optional match parameter. This parameter controls case sensitivity, whether the period matches the newline character, and how multi-line input is treated.
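
The article does not list the individual match parameter values. In Oracle's documentation they are single letters: 'i' for case-insensitive matching, 'c' for case-sensitive matching, 'n' to let the period match the newline character, and 'm' to treat the source string as multiple lines. The sketch below shows the first two; the sample values and aliases are made up.

```sql
SELECT CASE WHEN REGEXP_LIKE('CAT', 'cat', 'i') THEN 'Y' ELSE 'N' END AS ignore_case,    -- Y
       CASE WHEN REGEXP_LIKE('CAT', 'cat', 'c') THEN 'Y' ELSE 'N' END AS case_sensitive  -- N
  FROM dual;
```

When the match parameter is omitted, case sensitivity follows the session's NLS_SORT setting, so stating it explicitly is the safer choice when the behavior matters.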

Practical application of Regular Expressions

You can use regular expressions not only in queries but anywhere a SQL operator or function is allowed, for example in the PL/SQL language. You can write triggers that take advantage of the regular expression functions to validate, generate, or extract values.
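
As a minimal sketch of regular expressions in PL/SQL, the anonymous block below validates a value with REGEXP_LIKE. The variable name v_zip, its value, and the messages are hypothetical; in SQL*Plus, SET SERVEROUTPUT ON is needed to see the output.

```sql
DECLARE
  v_zip VARCHAR2(10) := '91234-1234';  -- hypothetical input value
BEGIN
  -- Accept a 5-digit ZIP code with an optional 4-digit extension.
  IF REGEXP_LIKE(v_zip, '^[[:digit:]]{5}(-[[:digit:]]{4})?$') THEN
    DBMS_OUTPUT.PUT_LINE(v_zip || ' is a valid ZIP code');
  ELSE
    DBMS_OUTPUT.PUT_LINE(v_zip || ' is not a valid ZIP code');
  END IF;
END;
/
```

The same condition could be placed in a trigger body to reject or flag invalid values at insert or update time.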

The following example demonstrates how you can use the REGEXP_LIKE operator in a column check constraint for data validation. It verifies the format of a Social Security number when a row is inserted or updated. Social Security numbers in the formats 123-45-6789 and 123456789 are acceptable to this column constraint. Valid data must start with three digits followed by a hyphen, then two more digits and a hyphen, and finally four digits. The alternative expression allows only nine consecutive digits. The vertical bar symbol (|) separates the alternatives.

ALTER TABLE students
  ADD CONSTRAINT stud_ssn_ck CHECK
  (REGEXP_LIKE(ssn,
   '^([[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}|[[:digit:]]{9})$'));

Leading or trailing characters are not accepted, as indicated by ^ and $. Make sure that your regular expression is not split across multiple lines and does not contain any unnecessary blanks, unless you want the format to be matched that way. Table 12 describes the individual components of this regular expression example.
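
Before adding such a constraint, it can be useful to try the pattern against a few candidate values. The following sketch does so with inline sample data; the values and the aliases are invented for illustration and do not come from the students table.

```sql
SELECT ssn,
       CASE WHEN REGEXP_LIKE(ssn,
              '^([[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}|[[:digit:]]{9})$')
            THEN 'accepted' ELSE 'rejected' END AS result
  FROM (SELECT '123-45-6789' AS ssn FROM dual UNION ALL
        SELECT '123456789'   AS ssn FROM dual UNION ALL
        SELECT '123-456789'  AS ssn FROM dual UNION ALL
        SELECT 'x123456789'  AS ssn FROM dual);
-- 123-45-6789 : accepted
-- 123456789   : accepted
-- 123-456789  : rejected  (mixes the two formats)
-- x123456789  : rejected  (leading character not allowed by ^)
```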

Compare regular expressions with existing functions

Regular expressions have several advantages over the familiar LIKE operator and the INSTR, SUBSTR, and REPLACE functions. These traditional SQL functions have no facility for pattern matching. Only the LIKE operator performs matching, through the % and _ wildcard characters, but LIKE does not support repetition of expressions, complex alternation, ranges of characters, character lists, POSIX character classes, and so on. In addition, the new regular expression functions allow you to detect duplicate words and to swap patterns. The examples here give you a tour of the regular expression landscape and show how you can use regular expressions in your applications.

Enrich your toolkit

Regular expressions are very powerful because they help solve complex problems. Some of their functionality is difficult to emulate with traditional SQL functions. Once you understand the basic building blocks of this slightly cryptic language, regular expressions will become an indispensable part of your toolkit, not only in SQL but in other programming languages as well. Although trial and error is sometimes necessary to get your individual patterns right, the elegance and power of regular expressions are indisputable.

Alice Rischert (Ar280@yahoo.com) is the chair of the Database Application Development and Design track in the Computer Technology and Applications program at Columbia University. She is the author of Oracle SQL Interactive Workbook, 2nd edition (Prentice Hall, 2002) and the forthcoming Oracle SQL by Example (Prentice Hall, 2003). Rischert has over 15 years of experience as a database designer, DBA, and project leader in Fortune 100 companies, and has been using Oracle products since Oracle version 5.

Table 1: Anchoring metacharacters

Metacharacter  Description
^              Anchors the expression to the start of a line
$              Anchors the expression to the end of a line

Table 2: Quantifiers, or repetition operators

Quantifier  Description
*           Matches 0 or more times
?           Matches 0 or 1 time
+           Matches 1 or more times
{m}         Matches exactly m times
{m,}        Matches at least m times
{m,n}       Matches at least m times but no more than n times

Table 3: Predefined POSIX character classes

Character class  Description
[:alpha:]        Alphabetic characters
[:lower:]        Lowercase alphabetic characters
[:upper:]        Uppercase alphabetic characters
[:digit:]        Numeric digits
[:alnum:]        Alphanumeric characters
[:space:]        Space characters (nonprinting), such as carriage return, newline, vertical tab, and form feed
[:punct:]        Punctuation characters
[:cntrl:]        Control characters (nonprinting)
[:print:]        Printable characters

Table 4: Alternate matching and grouping of expressions

Metacharacter  Operator name   Description
|              Alternation     Separates alternatives; often used with the grouping operator ()
()             Group           Groups a subexpression into a unit for alternation, for quantifiers, or for backreferencing (see the backreference section)
[char]         Character list  Indicates a character list; most metacharacters inside a character list are treated as literals, with the exception of character classes and the ^ and - metacharacters

Table 5: REGEXP_LIKE operator

Syntax:
REGEXP_LIKE(source_string, pattern
            [, match_parameter])

Description:
source_string supports character datatypes (CHAR, VARCHAR2, CLOB, NCHAR, NVARCHAR2, and NCLOB, but not LONG). The pattern parameter is another name for the regular expression. match_parameter allows optional settings such as handling the newline character, retaining multi-line formatting, and controlling case sensitivity.

Table 6: REGEXP_INSTR function

Syntax:
REGEXP_INSTR(source_string, pattern
             [, start_position
             [, occurrence
             [, return_option
             [, match_parameter]]]])

Description:
This function looks for a pattern and returns the first position of the pattern. Optionally, you can specify the start_position at which the search begins. The occurrence parameter defaults to 1 unless you indicate that you are looking for a subsequent occurrence. The default value of return_option is 0, which returns the starting position of the pattern; a value of 1 returns the position of the character that follows the match.

Table 7: Explanation of the 5-digit + 4 ZIP code expression

Syntax     Description
(blank)    Blank space that must be present
[          Start of the character list
[:digit:]  POSIX numeric digit class
]          End of the character list
{5}        Repeat exactly five occurrences of the character list
(          Start of the subexpression
-          A literal hyphen, because it is not a range metacharacter inside a character list
[          Start of the character list
[:digit:]  POSIX [:digit:] class
]          End of the character list
{4}        Repeat exactly four occurrences of the character list
)          Closing parenthesis, ending the subexpression
?          The ? quantifier matches the grouped subexpression 0 or 1 time, making the four-digit code optional
$          Anchoring metacharacter, indicating the end of the line

Table 8: REGEXP_SUBSTR function

Syntax:
REGEXP_SUBSTR(source_string, pattern
              [, position [, occurrence
              [, match_parameter]]])

Description:
The REGEXP_SUBSTR function returns the substring that matches the pattern.

Table 9: REGEXP_REPLACE function

Syntax:
REGEXP_REPLACE(source_string, pattern
               [, replace_string [, position
               [, occurrence [, match_parameter]]]])

Description:
This function replaces the matching pattern with the specified replace_string, allowing complex search-and-replace operations.

Table 10: Backreference metacharacter

Metacharacter  Operator name  Description
\digit         Backslash      Followed by a digit between 1 and 9, the backslash matches the preceding digit-th parenthesized subexpression. (Note: The backslash has another meaning in regular expressions; depending on the context, it can also be the escape character.)

Table 11: Explanation of the pattern-swap regular expression

Regular expression item  Description
(                        Start of the first subexpression
.                        Matches any single character except newline
*                        Repetition operator; matches the preceding . zero to n times
)                        End of the first subexpression; the result of the match is captured in \1 (in this example, Ellen)
(blank)                  Blank space that must be present
(                        Start of the second subexpression
.                        Matches any single character except newline
*                        Repetition operator; matches the preceding . zero to n times
)                        End of the second subexpression; the result of the match is captured in \2 (in this example, Hildi)
(blank)                  Blank space that must be present
(                        Start of the third subexpression
.                        Matches any single character except newline
*                        Repetition operator; matches the preceding . zero to n times
)                        End of the third subexpression; the result of the match is captured in \3 (in this example, Smith)

Table 12: Explanation of the Social Security number regular expression

Regular expression item  Description
^                        Start-of-line anchor (the regular expression cannot have any leading characters before the match)
(                        Starts the subexpression and lists the alternatives separated by the | metacharacter
[                        Start of the character list
[:digit:]                POSIX numeric digit class
]                        End of the character list
{3}                      Repeat exactly three occurrences of the character list
-                        A hyphen
[                        Start of the character list
[:digit:]                POSIX numeric digit class
]                        End of the character list
{2}                      Repeat exactly two occurrences of the character list
-                        Another hyphen
[                        Start of the character list
[:digit:]                POSIX numeric digit class
]                        End of the character list
{4}                      Repeat exactly four occurrences of the character list
|                        Alternation metacharacter; ends the first alternative and starts the next one
[                        Start of the character list
[:digit:]                POSIX numeric digit class
]                        End of the character list
{9}                      Repeat exactly nine occurrences of the character list
)                        Closing parenthesis, ending the subexpression group used for the alternation
$                        Anchoring metacharacter, indicating the end of the line; no extra characters can follow the pattern
