SQL Server Character Handling problems similar to regular expressions,

Source: Internet
Author: User

SQL Server Character Handling problems similar to regular expressions,

SQL Serve provides simple character fuzzy match functions, such as like and patindex. However, it is not enough for some character processing scenarios. The following problems are encountered:

1. How many times does one character/string appear?

2. Position of the nth occurrence of the same character

3. Multiple consecutive identical characters are merged into one character

4. Whether it is a valid IP address, ID card number, mobile phone number, etc.

1. How many times does one character/string appear?

Replace the same character with an empty string to calculate

declare @text varchar(1000)declare @str varchar(10)set @text = 'ABCBDBE'set @str = 'B'select len(@text) - len(replace(@text,@str,''))

The same string is still replaced. Because it contains multiple characters, Division is required after method 1 is replaced. If method 2 is replaced with one character, no replacement is required.

-- Method 1 declare @ text varchar (1000) declare @ str varchar (10) set @ text = 'abbbcbbbdbbbe' set @ str = 'bbb' select (len (@ text) -len (replace (@ text, @ str, '')/len (@ str) -- method 2 declare @ text varchar (1000) declare @ str varchar (10) set @ text = 'abbbcbbbdbbbe' set @ str = 'bbb' select len (replace (@ text, @ str, @ str + '_')-len (@ text)

2. Position of the nth occurrence of the same character/string

The character location function of SQL SERVER is CHARINDEX:

CHARINDEX ( expressionToFind , expressionToSearch [ , start_location ] )

You can start searching from the specified location, but cannot obtain the nth occurrence location. You need to write an SQL statement to supplement it. There are several ideas as follows:

1. User-Defined Function, adding a count for charindex each time in the loop until it is N

if object_id('NthChar','FN') is not null  drop function NthcharGOcreate function NthChar(@source_string as nvarchar(4000), @sub_string  as nvarchar(1024),@nth      as int) returns int as begin   declare @postion int   declare @count  int   set @postion = CHARINDEX(@sub_string, @source_string)   set @count = 0   while @postion > 0   begin     set @count = @count + 1     if @count = @nth     begin       break     end    set @postion = CHARINDEX(@sub_string, @source_string, @postion + 1)   End   return @postion end GO--select dbo.NthChar('abcabc','abc',2)--4

2. Use CTE to operate the entire table field. Add a count for charindex each time in recursion until it is N

If object_id ('tempdb .. # t') is not null drop table # Tcreate table # T (source_string nvarchar (4000) insert into # T values (n' ') insert into # T values (n' ') declare @ sub_string nvarchar (1024) declare @ nth intset @ sub_string = n' our 'set @ nth = 2; with T (source_string, starts, pos, nth) as (select source_string, 1, charindex (@ sub_string, source_string), 1 from # t union all select source_string, pos + 1, charindex (@ sub_string, source_string, pos + 1), nth + 1 from T where pos> 0) select source_string, pos, nthfrom Twhere pos <> 0 and nth = @ nthorder by source_string, starts -- source_string pos nth -- We 3 2

3. Use a digital table to create a charindex at different starting points.

-- Numbers/tally tableIF EXISTS (select * from dbo. sysobjects where id = object_id (n' [dbo]. [Numbers] ') and OBJECTPROPERTY (id, N 'isusertable') = 1) drop table dbo. numbers -- ==== Create and populate the Tally table on the fly select top 1000000 IDENTITY (int,) AS number INTO dbo. numbers FROM master. dbo. syscolumns sc1, master. dbo. syscolumns sc2 -- ==== Add a Primary Key to maximize performance alter table dbo. numbers add constraint PK_numbers_number primary key clustered (number) -- ==== Allow the general public to use it grant select on dbo. numbers to public -- create the number table once. You do not need TO create DECLARE @ source_string nvarchar (4000), @ sub_string nvarchar (1024 ), @ nth intSET @ source_string = 'abcabcvvvvabc' SET @ sub_string = 'abc' SET @ nth = 2; with t as (SELECT ROW_NUMBER () OVER (order by number) AS nth, number AS [Position In String] FROM dbo. numbers n WHERE n. number <= LEN (@ source_string) and charindex (@ sub_string, @ source_string, n. number)-number = 0 ---- OR -- and substring (@ source_string, number, LEN (@ sub_string) = @ sub_string) SELECT * from t where nth = @ nth

4. Use cross apply with charindex. This method is applicable when N is small. because the number of cross apply operations increases with N, the statement must be modified accordingly.

declare @T table(source_string nvarchar(4000))insert into @T values('abcabc'),('abcabcvvvvabc')declare @sub_string nvarchar(1024)set @sub_string = 'abc'select source_string,    p1.pos as no1,    p2.pos as no2,    p3.pos as no3from @Tcross apply (select (charindex(@sub_string, source_string))) as P1(Pos)cross apply (select (charindex(@sub_string, source_string, P1.Pos+1))) as P2(Pos)cross apply (select (charindex(@sub_string, source_string, P2.Pos+1))) as P3(Pos)

5. There are built-in functions in SSIS, but the T-SQL does not

--FINDSTRING in SQL Server 2005 SSISFINDSTRING([yourColumn], "|", 2),--TOKEN in SQL Server 2012 SSISTOKEN(Col1,"|",3)

Note: It is not difficult to find that these methods are similar to the logic of string splitting, but they are positioning and intercepting. If you want to obtain one or more characters of the nth character, with the N position, you can use the substring to intercept it;

3. Multiple consecutive identical characters are merged into one character

The most common solution is to combine multiple consecutive spaces into one space. There are two solutions:

1. It is easy to think of using multiple replace

However, the number of replace requests is unknown, so you have to repeat it multiple times.

-- Replace two consecutive spaces with one, and then loop until charindex does not check two consecutive spaces declare @ str varchar (100) set @ str = 'abc abc kljlk kljkl 'while (charindex ('', @ str)> 0) begin select @ str = replace (@ str ,'','') endselect @ str

2. Split the string by Space

After each split string trim or replace is connected with a space, it is a bit cumbersome. If you do not write a code example, you can refer to "position where the nth occurrence occurs" for how to split the string ";

4. Whether it is a valid IP address, ID card number, mobile phone number, etc.

Strings like IP, ID card, mobile phone number, and so on often have their own specific rules. It can be determined by using the substring to perform bitwise OR segment-by-segment operations, but the SQL statement method often has poor performance, we recommend that you try the regular expression function. For more information, see.

5. Regular Expression Functions

1. Oracle

Starting from 10 Gb, you can use regular expressions in queries. It is implemented by some functions that support regular expressions:

Oracle 10 gregexp_likeregexp_replaceregexp_regexp_substroracle 11g (new) REGEXP_COUNT

Oracle uses the REGEXP function to handle the above issues:

(1) the number of occurrences of the same character/string

select length(regexp_replace('123-345-566', '[^-]', '')) from dual;select REGEXP_COUNT('123-345-566', '-') from dual; --Oracle 11g 

(2) Position of the nth occurrence of the same character/string

ORACLE instr can directly find the location without regular expressions:

instr('source_string','sub_string' [,n][,m])

N indicates that the search starts from the nth character. The default value is 1, and m indicates the appearance of the nth character. The default value is 1.

select instr('abcdefghijkabc','abc', 1, 2) position from dual; 

(3) multiple consecutive identical characters are merged into one character

select regexp_replace(trim('agc f  f '),'\s+',' ') from dual; 

(4) whether it is a valid IP address, ID card number, mobile phone number, etc.

-- Whether it is valid ipwith ipas (SELECT '10. limit 30.40 'ip_address FROM dual union allselect '. b. c. d 'ip_address FROM dual union allselect '256. 123.0.254 'ip_address FROM dual union allselect '255. limit limit 255 'ip_address FROM dual) SELECT * from ipwhere REGEXP_LIKE (ip_address, '^ ([0-9] {1} | [0-9] [0-9] | 1 [0-9] [0-9] | 2 [0-9- 4] [0-9] | 25 [0-5]) \.) {3} ([0-9] {1} | [0-9] [0-9] | 1 [0-9] [0-9] | 2 [0-9- 4] [0-9] | 25 [0-5]) $ '); -- whether it is a valid ID card or mobile phone number.

2. SQL Server

Currently, the latest version of SQL Server 2017 does not support the REGEXP function. You need to use the common CLR extension. The following shows how to implement REG_REPLACE for CLR:

-- 1. Enable clr exec sp_configure 'show advanced options', '1' GORECONFIGUREGOEXEC sp_configure 'clr enabled', '1' GORECONFIGUREGOEXEC sp_configure 'show advanced options', '0'; GO

2. Create an Assembly

-- 3. create function [dbo]. [regex_replace] (@ input [nvarchar] (4000), @ pattern [nvarchar] (4000), @ replacement [nvarchar] (4000) RETURNS [nvarchar] (4000) with execute as caller, returns null on null inputas external name [RegexUtility]. [RegexUtility]. [RegexReplaceDefault] GO--4. use regex_replace to replace multiple spaces with one space. select dbo. regex_replace ('agc f F', '\ s + ','');

Note: Through CLR to achieve more REGEXP functions, if there is a high-level language development capabilities, you can develop themselves; or directly use some open source contributions, such as: http://devnambi.com/2016/sql-server-regex/

Summary:

1. Non-regular SQL statements are applicable to different databases;

2. Regular Expression rules (pattern) in different development languages, there are many syntaxes are the same, usually follow the rules of tools such as perl or linux shell sed;

3. In terms of performance, General SQL judgment> REGEXP function> Custom SQL function.

Summary

The above section describes the Character Handling problem of SqlServer similar to regular expressions. I hope it will be helpful to you. If you have any questions, please leave a message and I will reply to you in a timely manner. Thank you very much for your support for the help House website!

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.