轉自:atgc的部落格
http://atgc.itpub.net/category/22412/38862
寫一函數,準確地判斷欄位是否含有漢字或者提取漢字等
從表裡提取漢字, 需要考慮字元集, 不同的字元集漢字的編碼有所不同
這裡以GB2312為例, 寫一函數準確地從表裡提取簡體漢字.
假設資料庫字元集編碼是GB2312, 環境變數(註冊表或其它)的字元集也是GB2312編碼
並且儲存到表裡的漢字也都是GB2312編碼的
那麼也就是漢字是雙位元組的,且簡體漢字的編碼範圍是
B0A1 - F7FE
換算成10進位就是
B0 A1 F7 FE
176,161 - 247,254
我們先看一下asciistr函數的定義
Non-ASCII characters are converted to the form xxxx, where xxxx represents a UTF-16 code unit.
但是這並不表示以 "" 開始的字元就是漢字了
舉例如下
SQL> select * from test;<br />NAME<br />--------------------<br />,啊OO10哈<br />你好aa<br />大家好aa/<br />☆大海123<br />★ABC
create or replace function get_chinese(p_name in varchar2,<br /> p_chinese in varchar2)<br /> return varchar2 as<br /> v_code varchar2(30000) := '';<br /> v_chinese varchar2(4000) := '';<br /> v_non_chinese varchar2(4000) := '';<br /> v_comma pls_integer;<br /> v_code_q pls_integer;<br /> v_code_w pls_integer;<br />begin<br /> if p_name is not null then<br /> select replace(substrb(dump(p_name, 1010),<br /> instrb(dump(p_name, 1010), 'ZHS16GBK:')),<br /> 'ZHS16GBK: ',<br /> '')<br /> into v_code<br /> from dual<br /> where rownum = 1;<br /> for i in 1 .. length(p_name) loop<br /> if lengthb(substr(p_name, i, 1)) = 2 then<br /> v_comma := instrb(v_code, ',');<br /> v_code_q := to_number(substrb(v_code, 1, v_comma - 1));<br /> v_code_w := to_number(substrb(v_code,<br /> v_comma + 1,<br /> abs(instrb(v_code, ',', 1, 2) -<br /> v_comma - 1)));<br /> if v_code_q >= 176 and v_code_q <= 247 and v_code_w >= 161 and<br /> v_code_w <= 254 then<br /> v_chinese := v_chinese || substr(p_name, i, 1);<br /> else<br /> v_non_chinese := v_non_chinese || substr(p_name, i, 1);<br /> end if;<br /> v_code := ltrim(v_code, '1234567890');<br /> v_code := ltrim(v_code, ',');<br /> else<br /> v_non_chinese := v_non_chinese || substr(p_name, i, 1);<br /> end if;<br /> v_code := ltrim(v_code, '1234567890');<br /> v_code := ltrim(v_code, ',');<br /> end loop;<br /> if p_chinese = '1' then<br /> return v_chinese;<br /> else<br /> return v_non_chinese;<br /> end if;<br /> else<br /> return '';<br /> end if;<br />end;
這裡第5條記錄有一個實心的五角星
然後用asciistr函數轉換一下試試
SQL> select name,asciistr(name) from test;<br />NAME ASCIISTR(NAME)<br />-------------------- ----------------------<br />,啊OO10哈 ,554AOO1054C8<br />你好aa 4F60597Daa<br />大家好aa/ 59275BB6597Daa/<br />☆大海123 260659276D77123<br />★ABC 2605ABC
我們看到最後一條記錄的實心五角星也是 ""開頭的
此時我們就不能用asciistr(欄位)是否存在 "" 來判斷是否含有漢字了.
我的函數如下,基本思路是判斷字元的編碼是否在GB2312規定的漢字編碼範圍之內
create or replace function get_chinese(p_name in varchar2) return varchar2 as<br /> v_code varchar2(30000) := '';<br /> v_chinese varchar2(4000) := '';<br /> v_comma pls_integer;<br /> v_code_q pls_integer;<br /> v_code_w pls_integer;<br />begin<br /> if p_name is not null then<br /> select replace(substrb(dump(p_name, 1010),<br /> instrb(dump(p_name, 1010), 'ZHS16GBK:')),<br /> 'ZHS16GBK: ',<br /> '')<br /> into v_code<br /> from dual<br /> where rownum = 1;<br /> for i in 1 .. length(p_name) loop<br /> if lengthb(substr(p_name, i, 1)) = 2 then<br /> v_comma := instrb(v_code, ',');<br /> v_code_q := to_number(substrb(v_code, 1, v_comma - 1));<br /> v_code_w := to_number(substrb(v_code,<br /> v_comma + 1,<br /> abs(instrb(v_code, ',', 1, 2) -<br /> v_comma - 1)));<br /> if v_code_q >= 176 and v_code_q <= 247 and v_code_w >= 161 and<br /> v_code_w <= 254 then<br /> v_chinese := v_chinese || substr(p_name, i, 1);<br /> end if;<br /> v_code := ltrim(v_code, '1234567890');<br /> v_code := ltrim(v_code, ',');<br /> end if;<br /> v_code := ltrim(v_code, '1234567890');<br /> v_code := ltrim(v_code, ',');<br /> end loop;<br /> return v_chinese;<br /> else<br /> return '';<br /> end if;<br />end;
好,現在來執行一些語句
SQL> select * from test;<br />NAME<br />--------------------<br />,啊OO10哈<br />你好aa<br />大家好aa/<br />☆大海123<br />★ABC
5 rows selected.
1. 列出有漢字的記錄
SQL> select name from test where length(get_chinese(name))>0;<br />NAME<br />--------------------<br />,啊OO10哈<br />你好aa<br />大家好aa/<br />☆大海123<br />4 rows selected.
2. 列出有漢字的記錄,並且只列出漢字
SQL> select get_chinese(name) from test where length(get_chinese(name))>0;<br />GET_CHINESE(NAME)<br />---------------------------------------------------------------------------<br />啊哈<br />你好<br />大家好<br />大海<br />4 rows selected.
需要說明的是GB2312共有6763個漢字,即72*94-5=6763
我這裡是計算72*94,沒有減去那5個,那五個是空的。等查到了再減去
============
改寫這個函數,可以提取非漢字或者漢字
該函數有兩個參數,第一個表示要提取的字串,第二個是1,表示提取漢字,是非1,表示提取非漢字
create or replace function get_chinese<br />(<br />p_name in varchar2,<br />p_chinese in varchar2<br />) return varchar2<br />as<br />v_code varchar2(30000) := '';<br />v_chinese varchar2(4000) := '';<br />v_non_chinese varchar2(4000) := '';<br />v_comma pls_integer;<br />v_code_q pls_integer;<br />v_code_w pls_integer;<br />begin<br />if p_name is not null then<br />select replace(substrb(dump(p_name,1010),instrb(dump(p_name,1010),'ZHS16GBK:')),'ZHS16GBK: ','') into v_code from dual where rownum=1;<br />for i in 1..length(p_name) loop<br />if lengthb(substr(p_name,i,1))=2 then<br />v_comma := instrb(v_code,',');<br />v_code_q := to_number(substrb(v_code,1,v_comma-1));<br />v_code_w := to_number(substrb(v_code,v_comma+1,abs(instrb(v_code,',',1,2)-v_comma-1)));<br />if v_code_q>=176 and v_code_q<=247 and v_code_w>=161 and v_code_w<=254 then<br />v_chinese := v_chinese||substr(p_name,i,1);<br />else<br />v_non_chinese := v_non_chinese||substr(p_name,i,1);<br />end if;<br />v_code := ltrim(v_code,'1234567890');<br />v_code := ltrim(v_code,',');<br />else<br />v_non_chinese := v_non_chinese||substr(p_name,i,1);<br />end if;<br />v_code := ltrim(v_code,'1234567890');<br />v_code := ltrim(v_code,',');<br />end loop;<br />if p_chinese = '1' then<br />return v_chinese;<br />else<br />return v_non_chinese;<br />end if;<br />else<br />return '';<br />end if;<br />end;<br />/
.--------------------------------------------------------------------------------
SQL> select * from a;<br />NAME<br />--------------------<br />我們啊、<br />他(艾呀)是★們<br />他的啊@<br />SQL> select get_chinese(name,1) from a;<br />GET_CHINESE(NAME,1)<br />-----------------------------------------<br />我們啊<br />他艾呀是們<br />他的啊<br />SQL> select get_chinese(name,0) from a;<br />GET_CHINESE(NAME,0)<br />-----------------------------------------<br />、<br />()★<br />@<br />SQL>