Reprinted from: http://blog.csdn.net/atgc/article/details/2036799
Thanks to atgc, this problem has been studied for a day and finally solved !!
When extracting Chinese characters from the table, you need to consider character sets. Different character sets have different Chinese characters encoded.
Taking gb2312 as an example, write a function to accurately extract simplified Chinese characters from the table.
Assume that the database character set encoding is gb2312, and the environment variable (registry or other) Character Set is also gb2312 encoding.
The Chinese characters saved to the table are also gb2312 encoded.
That is, the Chinese character is dubyte, And the encoding range of simplified Chinese characters is
B0a1-f7fe
Convert to the 10th hexadecimal format
B0 A1 F7 fe
176,161-247,254
Let's take a look at the definition of the asciistr function.
Non-ASCII characters are converted to the form/xxxx, where XXXX represents a UTF-16 code unit.
However, this does not indicate that the character starting with "/" is a Chinese character.
Example:
SQL> select * from test;
Name
--------------------
Ah oo10 ha
Hello AA
Hello everyone, AA/
☆Sea 123
★ABC
The first record contains a solid pentagram.
Use the asciistr function to convert the data.
SQL> select name, asciistr (name) from test;
Name asciistr (name)
------------------------------------------
Ah oo10,/554aoo10/54c8
Hello AA/4f60/597daa
Hello everyone, AA // 5927/5 bb6/597daa/
☆Sea 123/2606/5927/6 d77123
★ABC/2605abc
We can see that the solid pentagram in the last record starts "/".
In this case, we cannot use asciistr (field) to determine whether there is.
My function is as follows. The basic idea is to determine whether the character encoding is within the Chinese character encoding range specified by gb2312.
create or replace function get_chinese(p_name in varchar2) return varchar2as v_code varchar2(30000) := ''; v_chinese varchar2(4000) := ''; v_comma pls_integer; v_code_q pls_integer; v_code_w pls_integer;begin if p_name is not null then select replace(substrb(dump(p_name,1010),instrb(dump(p_name,1010),'ZHS16GBK:')),'ZHS16GBK: ','') into v_code from dual where rownum=1; for i in 1..length(p_name) loop if lengthb(substr(p_name,i,1))=2 then v_comma := instrb(v_code,','); v_code_q := to_number(substrb(v_code,1,v_comma-1)); v_code_w := to_number(substrb(v_code,v_comma+1,abs(instrb(v_code,',',1,2)-v_comma-1))); if v_code_q>=176 and v_code_q<=247 and v_code_w>=161 and v_code_w<=254 then v_chinese := v_chinese||substr(p_name,i,1); end if; v_code := ltrim(v_code,'1234567890'); v_code := ltrim(v_code,','); end if; v_code := ltrim(v_code,'1234567890'); v_code := ltrim(v_code,','); end loop; return v_chinese; else return ''; end if;end;
Okay. Run some statements now.
SQL> select * from test;
Name
--------------------
Ah oo10 ha
Hello AA
Hello everyone, AA/
☆Sea 123
★ABC
5 rows selected.
1. List records with Chinese characters
SQL> select name from test where length (get_chinese (name)> 0;
Name
--------------------
Ah oo10 ha
Hello AA
Hello everyone, AA/
☆Sea 123
4 rows selected.
2. List records with Chinese characters and only list Chinese Characters
SQL> select get_chinese (name) from test where length (get_chinese (name)> 0;
Get_chinese (name)
---------------------------------------------------------------------------
Aha
Hi!
Hello everyone
Sea
4 rows selected.
It must be noted that gb2312 has a total of 6763 Chinese characters, I .e. 72*94-5 = 6763
Here I calculate 72*94, without subtracting the five, and the five are empty. And then subtract
================
Rewrite this function to extract non-Chinese or Chinese Characters
This function has two parameters. The first parameter indicates the string to be extracted, the second parameter is 1, indicating that the Chinese character is extracted, and the non-1 parameter indicates that the non-Chinese character is extracted.
create or replace function get_chinese( p_name in varchar2, p_chinese in varchar2) return varchar2as v_code varchar2(30000) := ''; v_chinese varchar2(4000) := ''; v_non_chinese varchar2(4000) := ''; v_comma pls_integer; v_code_q pls_integer; v_code_w pls_integer;begin if p_name is not null then select replace(substrb(dump(p_name,1010),instrb(dump(p_name,1010),'ZHS16GBK:')),'ZHS16GBK: ','') into v_code from dual where rownum=1; for i in 1..length(p_name) loop if lengthb(substr(p_name,i,1))=2 then v_comma := instrb(v_code,','); v_code_q := to_number(substrb(v_code,1,v_comma-1)); v_code_w := to_number(substrb(v_code,v_comma+1,abs(instrb(v_code,',',1,2)-v_comma-1))); if v_code_q>=176 and v_code_q<=247 and v_code_w>=161 and v_code_w<=254 then v_chinese := v_chinese||substr(p_name,i,1); else v_non_chinese := v_non_chinese||substr(p_name,i,1); end if; v_code := ltrim(v_code,'1234567890'); v_code := ltrim(v_code,','); else v_non_chinese := v_non_chinese||substr(p_name,i,1); end if; v_code := ltrim(v_code,'1234567890'); v_code := ltrim(v_code,','); end loop; if p_chinese = '1' then return v_chinese; else return v_non_chinese; end if; else return ''; end if;end;
SQL> select * From;
Name
--------------------
We,
He (AI) is★Are
His/ah @
SQL> select get_chinese (name, 1) from;
Get_chinese (name, 1)
-----------------------------------------
We
He Aiah, you are.
His.
SQL> select get_chinese (name, 0) from;
Get_chinese (name, 0)
-----------------------------------------
,
()★
/@
SQL>
But later, I found that this function could not scan very clearly, as shown below: (but common words can be scanned out)
If you need to filter these distinctive words, you can:
The to_single_byte function can be used to forcibly convert the full-width to a half-width before computation. If the characters are in Chinese, the system will recognize that the dual-byte cannot be distinguished:
Select * From zk. cm_customer where lengthb (to_single_byte (cust_name)-lengthc (to_single_byte (cust_name)> = 2;