SQL Server截取字串和處理中文技巧

來源:互聯網
上載者:User

一 環境介紹

SQL  ServerPRINT @@VERSIONMicrosoftSQLServer2012-11.0.2100.60(X64)Feb10201219:39:15Copyright(c)MicrosoftCorporationEnterpriseEdition:Core-basedLicensing(64-bit)onWindowsNT6.1(Build7601:ServicePack1)作業系統------------------System Information------------------Operating System: Windows 7 Ultimate 64-bit (6.1, Build 7601) Service Pack 1 (7601.win7sp1_gdr.130828-1532)System Model: Aspire E1-471GProcessor: Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz (4 CPUs), ~2.6GHzMemory: 4096MB RAM


二 實現功能從一大堆有包含中文字元和編號的字串中過濾出編號。


三 實現類比
首先,我們準備測試資料,注意,這裡的資料全部都是類比資料,無實際含義。語句如下:
CREATE TABLE #temp(   name VARCHAR(80));INSERT INTO #tempVALUES     ('五道口店3059');INSERT INTO #tempVALUES     ('五羊邨店3060');INSERT INTO #tempVALUES     ('楊家屯店3061');INSERT INTO #tempVALUES     ('十裡堤店3062');INSERT INTO #tempVALUES     ('中關村店3063');INSERT INTO #tempVALUES     ('麗秀店3064');INSERT INTO #tempVALUES     ('石門店3065');INSERT INTO #tempVALUES     ('黃村店3066');INSERT INTO #tempVALUES     ('東圃店3067');INSERT INTO #tempVALUES     ('天河店3068');INSERT INTO #tempVALUES     ('人民路廣場3069');INSERT INTO #tempVALUES     ('社區中心3070');INSERT INTO #tempVALUES     ('珠海市3071');INSERT INTO #tempVALUES     ('麗都3072');INSERT INTO #tempVALUES     ('曉月3073');INSERT INTO #tempVALUES     ('舊區3074');INSERT INTO #tempVALUES     ('新城3075');INSERT INTO #tempVALUES     ('水井溝3076'); 


然後,我們觀察資料,發現這些資料都有規律,編號是數字,佔4個字元。數字前麵包含店、場、心、市、都、月、區、城、溝共9個字元。
我們試著採用SQL Server內建的函數Substring、Charindex、Rtrim、Ltrim過濾掉出現次數最多(店)的字串。
語句如下:
SELECT Rtrim(Ltrim(Substring(name, Charindex('店', name) + 1, Len(name)))) AS nameINTO   #t1FROM   #temp 

以下是這幾個函數的使用說明:


Substring

Returns the part of a character expression that starts at the specified position and has the specified length. The position parameter and the length parameter must evaluate to integers.


Syntax

SUBSTRING(character_expression, position, length)


Arguments

character_expression

Is a character expression from which to extract characters.

position

Is an integer that specifies where the substring begins.

length

Is an integer that specifies the length of the substring as number of characters.


Result Types

DT_WSTR


Charindex
Searches an expression for another expression and returns its starting position if found.

Syntax

CHARINDEX ( expressionToFind ,expressionToSearch [ , start_location ] ) 


Arguments
expressionToFind
Is a character expression that contains the sequence to be found. expressionToFind is limited to 8000 characters.
expressionToSearch
Is a character expression to be searched.
start_location
Is an integer or bigint expression at which the search starts. If start_location is not specified, is a negative number, or is 0, the search starts at the beginning of expressionToSearch.

Return Types
bigint if expressionToSearch is of the varchar(max), nvarchar(max), or varbinary(max) data types; otherwise, int.

Rtrim
Returns a character expression after removing trailing spaces.

RTRIM does not remove white space characters such as the tab or line feed characters. Unicode provides code points for many different types of spaces, but this function recognizes only the Unicode code point 0x0020. When double-byte character set (DBCS) strings are converted to Unicode they may include space characters other than 0x0020 and the function cannot remove such spaces. To remove all kinds of spaces, you can use the Microsoft Visual Basic .NET RTrim method in a script run from the Script component.

Syntax
RTRIM(character expression)
              
Arguments
character_expression
Is a character expression from which to remove spaces.

Result Types
DT_WSTR

Ltrim
Returns a character expression after removing leading spaces.

LTRIM does not remove white-space characters such as the tab or line feed characters. Unicode provides code points for many different types of spaces, but this function recognizes only the Unicode code point 0x0020. When double-byte character set (DBCS) strings are converted to Unicode they may include space characters other than 0x0020 and the function cannot remove such spaces. To remove all kinds of spaces, you can use the Microsoft Visual Basic .NET LTrim method in a script run from the Script component.

Syntax
LTRIM(character expression)
              
Arguments
character_expression
Is a character expression from which to remove spaces.

Result Types
DT_WSTR

好了,我們查看處理完後的結果,可以看到包含店的字串已經全部過濾出編號。
SELECT * FROM #t13059306030613062306330643065306630673068人民路廣場3069社區中心3070珠海市3071麗都3072曉月3073舊區3074新城3075水井溝3076


接著我們依次處理包含場、心、市、都、月、區、城、溝的字串,語句和處理結果如下:
SELECT *FROM   #t1WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 人民路廣場3069社區中心3070珠海市3071麗都3072曉月3073舊區3074新城3075水井溝3076SELECT Rtrim(Ltrim(Substring(name, Charindex('場', name) + 1, Len(name)))) AS nameINTO   #t2FROM   #t1SELECT *FROM   #t2WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 社區中心3070珠海市3071麗都3072曉月3073舊區3074新城3075水井溝3076SELECT Rtrim(Ltrim(Substring(name, Charindex('心', name) + 1, Len(name)))) AS nameINTO   #t3FROM   #t2SELECT *FROM   #t3WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 珠海市3071麗都3072曉月3073舊區3074新城3075水井溝3076SELECT Rtrim(Ltrim(Substring(name, Charindex('市', name) + 1, Len(name)))) AS nameINTO   #t4FROM   #t3SELECT *FROM   #t4WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 麗都3072曉月3073舊區3074新城3075水井溝3076SELECT Rtrim(Ltrim(Substring(name, Charindex('都', name) + 1, Len(name)))) AS nameINTO   #t5FROM   #t4SELECT *FROM   #t5WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 曉月3073舊區3074新城3075水井溝3076SELECT Rtrim(Ltrim(Substring(name, Charindex('月', name) + 1, Len(name)))) AS nameINTO   #t6FROM   #t5SELECT *FROM   #t6WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 舊區3074新城3075水井溝3076SELECT Rtrim(Ltrim(Substring(name, Charindex('區', name) + 1, Len(name)))) AS nameINTO   #t7FROM   #t6SELECT *FROM   #t7WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 新城3075水井溝3076SELECT Rtrim(Ltrim(Substring(name, Charindex('城', name) + 1, Len(name)))) AS nameINTO   #t8FROM   #t7SELECT *FROM   #t8WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 水井溝3076SELECT Rtrim(Ltrim(Substring(name, Charindex('溝', name) + 1, Len(name)))) AS nameINTO   #t9FROM   #t8SELECT *FROM   #t9WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN --無記錄


這是最終的處理結果,過濾出編號後,我就可以利用這些編號和資料庫表進行關聯,獲得想要的資料。
SELECT *INTO   #resultFROM   #t9SELECT *FROM   #result name305930603061306230633064306530663067306830693070307130723073307430753076SELECT s.xxx,       s.xxxFROM   xx s       JOIN #result r         ON s.xxx = r.nameWHERE  s.xxx = 0; 


四 總結本文過濾編號實際上核心代碼就兩個,第一個是利用SQL Server的內建函數過濾出指定編號,語句如下:
SELECT Rtrim(Ltrim(Substring(name, Charindex('店', name) + 1, Len(name)))) AS nameINTO   #t1FROM   #temp 


第二個是判斷是否包含中文,語句如下:
SELECT *FROM   #t1WHERE  name LIKE N'%[一-龥]%' COLLATE Chinese_PRC_BIN 


在工作中,發現和總結這些小技巧會讓你的工作事半功倍。

Good Luck!


聯繫我們

該頁面正文內容均來源於網絡整理,並不代表阿里雲官方的觀點,該頁面所提到的產品和服務也與阿里云無關,如果該頁面內容對您造成了困擾,歡迎寫郵件給我們,收到郵件我們將在5個工作日內處理。

如果您發現本社區中有涉嫌抄襲的內容,歡迎發送郵件至: info-contact@alibabacloud.com 進行舉報並提供相關證據,工作人員會在 5 個工作天內聯絡您,一經查實,本站將立刻刪除涉嫌侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.