How to implement regular expressions in DB2

Source: Internet
Author: User
Tags perl regular expression expression engine

What is a regular expression?

Regular expressions are used to find and replace patterns in strings. The regular expression is defined in a certain syntax, and the regular expression engine uses this syntax and compares it to a string. The engine returns an indication whether the string matches the syntax, that is, whether the string contains substrings that can be derived from the syntax. In addition, the engine is able to return matching substrings. The term mode (pattern) is used to denote the syntax.

The most basic pattern consists of only a single letter. When compared to this pattern, the string containing the letter is a "match". For example, if the pattern is "a", then the string "ABCD" is a match, and the string "xyz" is not. The powerful function of regular expressions comes from predefined operators (also known as metacharacters), which can represent patterns in a small amount of space. Depending on the dialect and supported features, different meta characters can be used. Typically, some of the available characters are as follows:

| ― Two options

[]― Group

*― multiple occurrences (also matches 0 occurrences)

+― multiple occurrences (at least one time)

? -Random occurrences

\\\\― back Slash

Different systems implement various extensions of common regular expressions. The regular expressions used in the programming language Perl support further abbreviations. The libraries used in this article implement these extensions. Here is an excerpt of some of the abbreviations that can be used in the Perl regular expression language:

\\s― any whitespace character

\\w― any alphanumeric character

\\d― any numeric character

Another more advanced example is the pattern "[a-z]* = ([0-9]|0x00);". The string that matches this pattern contains a substring consisting of several uppercase letters, followed by a space, an equal sign, another space, and then a number or string "0x00". The last character of the substring must be a semicolon. With Perl, this pattern can be expressed as "\\w* = (\\d|0x00);". "NM = 0x00;" and "X = 7;" is two strings that can match the pattern. But the string "Z = 123;" Cannot match, because 123 is made up of three numbers.

String matching in DB2 (DB2 certified DB2 Training)

In addition to Extender, DB2 also allows several functions and operators for text comparisons. But those functions and operators have limitations on the functionality used for pattern matching, or they can complicate queries that might use them. Here is a brief excerpt of several available features:

= or predicate: Compares two strings per character to be equal.

Like predicate: Uses the wildcard character's basic pattern match.

LOCATE function: Finds substrings in a string.

Although you can also use the SQL operator to represent patterns "[a-z]* = ([0-9]|0x00);", that can be cumbersome. For example, the predicate used in the WHERE clause of the following SELECT statement matches the part after the equal sign in the string "str", as shown in Listing 1:

Listing 1. Using the like matching pattern

SELECT Str

From strtable

WHERE (str like '% = 0;% ' or str like '% = 1;% ' or str like '% = 2;% ')

Or str like '% = 3;% ' or str like '% = 4;% ' or str like '% = 5;% '

Or str like '% = 7;% ' or str like '% = 7;% ' or str like '% = 8;% '

Or str like '% = 9;% ' OR str like '% = 0x00;% ')

This increases the complexity of predicates that can match the "[a-z]*" sub-pattern, which can be done using a function that iterates through the entire string and makes a character-by-word comparison, but you find that using built-in functionality is lengthy and complex.

Sample Scenario

Let's define the following list (listing 2) and insert a few lines:

Listing 2. Create our sample Table

CREATE TABLE strtable (C1 INTEGER, str VARCHAR (500));

INSERT into Strtable VALUES (1, ' some text; '),

(2, ' variable = 1234; '),

(3, ' var2 = ' ' string variable '; '),

(4, ' xyz = '),

(5, ' myVar = 0x00; '),

(6, ' # comment '),

(7, ' ABC = def ');

This list and its data are used for all of the following examples.

SELECT * from Strtable;

C1 STR

----------- ------------------------------

1 some text;

2 variable = 1234;

3 var2 = ' string variable ';

4 xyz =

5 MyVar = 0x00;

6 # Comment

7 ABC = def

7 record (s) selected.

Implementing Pattern matching functions

You can use the extensible mechanism of DB2 to work with UDFs within SQL statements to significantly improve this situation. By defining a UDF named Regex1 (which takes patterns and strings as input parameters), the WHERE clause in Listing 1 can now be written as shown in Listing 3:

Listing 3. Using the Regex UDF to simplify pattern matching

SELECT Str

From strtable

WHERE regex1 (' \\w* = (\\d|0x00); ', str) = 1

In this example, a regular expression with a Perl extension is used to match the complete pattern, not just the partial pattern of the like predicate given in Listing 1. As you can see, it is much easier to use a function to write predicates for the pattern than to use the like predicate to represent the same semantics.

Implement UDF

In my example implementation, I chose the existing pattern matching engine named PCRE (Perl-compatible regular expression, perl-compatible regular expression). The engine provides a C API for handling patterns and performing matching. The "missing part" between the engine and the SQL language used in the query is a UDF. The UDF consists of two parts:

Creates (or registers) the function's CREATE FUNCTION statement in the database.

The body of the function that implements the wrapper for the C API call for the regular expression matching engine

Listing 4 shows the SQL statement that was used to create the function.

Listing 4. Registering the REGEX1 function

CREATE FUNCTION regex1 (Pattern VARCHAR (2048), String CLOB (10M))

RETURNS INTEGER

Specific regexsimple

EXTERNAL NAME ' Regexudf!regexpsimple '

LANGUAGE C

PARAMETER STYLE Db2sql

Deterministic

Not FENCED

RETURNS null on NULL INPUT

NO SQL

NO EXTERNAL ACTION

Allow PARALLEL;

Note: See DB2 SQL Reference for the detailed meaning of all clauses. You can modify the length of the parameter to suit your needs. I have shown here that some values do not have any intention of recommending their use.

The second part consists of a small section of C code, which implements the UDF entry point. During query execution, DB2 calls this entry point for each row to match the pattern. The example in Listing 5 lists the code. Refer to the documentation for the Pcre library for a description of the pcre_* functions and macros. For the compilation of C code and the construction of shared libraries, refer to DB2 Application Development Guide.

Listing 5. Implementing the C code of the rege1x UDF entry point

#include

#include

void Regexpsimple (

Input parameters

Sqludf_varchar *pattern, Sqludf_clob *str,

Output

Sqludf_integer *match,

NULL indicators

Sqludf_nullind *pattern_ind, Sqludf_nullind *str_ind,

Sqludf_nullind *match_ind,

Sqludf_trail_args)

{

Pcre *re = NULL;

const char *error = NULL;

int erroffset = 0;

int rc = 0;

We assume successful return

*match_ind = 0;

Compile the pattern to its internal representation

Re = pcre_compile (pattern, 0/* Default options */, &error,

&erroffset, NULL);

if (re = = NULL) {

snprintf (SQLUDF_MSGTX, "Regexp compilation failed at"

"Offset%d:%s\\n", Erroffset, error);

strcpy (Sqludf_state, "38900");

(*pcre_free) (re);

Return

}

Match the string againts the pattern

rc = Pcre_exec (Re, NULL, Str->data, str->length, 0,

0/* Default options */, NULL, 0);

Switch (RC) {

Case Pcre_error_nomatch:

*match = 0;

Break

Case Pcre_error_badoption:

snprintf (SQLUDF_MSGTX, "an unrecognized bit is set in the"

"Options argument");

strcpy (Sqludf_state, "38901");

Break

Case Pcre_error_nomemory:

snprintf (SQLUDF_MSGTX, "Not enough memory available.");

strcpy (Sqludf_state, "38902");

Break

Default

if (RC < 0) {

snprintf (SQLUDF_MSGTX, "A regexp match Error"

"Occured:%d", RC);

strcpy (Sqludf_state, "38903");

}

else {

*match = 1;

}

Break

}

Cleanup

(*pcre_free) (re);

Return

}

Usage examples

The following query attempts to find all the strings that contain the comment text from table strtable. Comments begin with "#", so the pattern is "#" followed by non-empty text.

SELECT C1, str

From strtable

WHERE regex1 (' #\\s*\\w+ ', str) = 1;

The result contains only rows with C1 = 6.

C1 STR

----------- -------------------------

6 # comment;

1 record (s) selected.

In the second example, we tried to find the string in the form of an assignment, which is "text = text". To further narrow down the range, we only look for those values that are assigned to the right end. The hexadecimal notation is treated as a valid numeric value.

SELECT C1, str

From strtable

WHERE regex1 (' \\w+\\s*=\\s* (\\d+|0x\\d\\d) ', str) = 1;

Except for two rows with a C1 of 2 or 5, the other row does not contain the value assignment and therefore does not appear in the result:

C1 STR

----------- -------------------------

2 variable = 1234;

5 MyVar = 0x00;

2 record (s) selected.

How to implement regular expressions in DB2

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.