What happened to the Oracle conversion character set operation?

Last Update:2018-09-01 Source: Internet

Author: User

Tags printable characters

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The database current character set is Al32utf8, if you intend to change the character set to ZHS16GBK, execute the following command:

"ALTER DATABASE national CHARACTER SET internal_use ZHS16GBK"

Can you achieve the desired goal?

Let's take a look at what happens at the database level by executing the above command.

In a database with a character set of Al32utf8, create a table that inserts some Chinese characters and some printable characters into two fields, respectively.

1. Create a table

CREATE TABLE TEST

(

ID number,--record No.

Name VARCHAR2 (,--) Inserts a randomly generated kanji string. The method of generation is shown in annex I.

Key VARCHAR2 (60) – Insert a randomly generated printable string

)

2. Insert 50,000 records

Declare

Leng number;

Begin

for x in 1: 5

Loop

for y in 1: 10000

Loop

leng:=ceil(Dbms_random.value (0,ten));

INSERT INTO test Values (Y,gen_hanzi (Leng), dbms_random.string (' P ', Leng));

End loop;

commit;

End loop;

End;

3. Randomly view some record values, and the encoding under that character set

Query the UTF8 code of Chinese character "Lei",

It is known that the database is currently encoding Chinese characters using UTF8 .

4. Database execution Conversion Character set command

After conversion, query the character set of the current database:

At the data dictionary level, the character set of the database has been Al32utf8 converted to ZHS16GBK.

The data dictionary level has been adjusted, so does the character encoding be adjusted according to the new character set?

As you can see, all dump values have not changed-that is, at the disk storage level, the information has not changed. Because the store is still using UTF8 encoding, but decoding on the use of GBK decoding, so, the Chinese characters are all decoded error (that is, not get the correct Chinese character symbol). For printable characters, the decoding error does not occur because the characters have the same encoded values in two different character sets.

From the above, if the target character set is not a superset of the original character set, because the original encoding has not changed, there is a high likelihood of decoding errors resulting in garbled behavior. On top of that, we see the encoded values of the relevant Chinese characters under the UTF8 character set, so how are these characters encoded under the GBK character set?

Insert the records for this table into a database that uses the ZHS16GBK character set to view the record values and the encoded values:

As you can see, the GBK character set is 2 bytes encoded for Chinese characters, while UTF8 is 3 bytes encoded for Chinese characters.

We have another problem--after modifying the character set, there is no change to the character encoding stored before the change, what about the characters stored later?

6. Inserting Chinese characters and printable characters into the database after modifying the character set

Insert 100 records to distinguish it from previous records, id>10000.

Declare

Leng number;

Begin

for y in 10000. 10100

Loop

leng:=ceil(Dbms_random.value (0,ten));

INSERT INTO test Values (Y,gen_hanzi (Leng), dbms_random.string (' P ', Leng));

End loop;

commit;

End;

View the inserted value, along with its encoding:

From the 7th record of the encoding length, you can initially determine its use of GBK encoded values. Let's check it out:

Conclusion:

1. The command that performs character set conversion modifies the data dictionary

2. The command to perform character set conversion is not re-encoded with the new character set for previously stored characters. Because if the target character set is not a superset of the original character set, there may be garbled characters (especially kanji) after the conversion.

3. After the command for character set conversion is executed, the characters entered later are encoded with the new character set.

Attachments--"stored procedures for generating random-length kanji strings"

Generate stored procedures for random-length kanji strings:

Create or replace function Gen_hanzi (max_length number) return VARCHAR2 as

Leng number;

Hanzi varchar2(1 char);

Hanzis varchar2(4000): =";

Begin

Leng:=ceil (Dbms_random.value (0, max_length));

for x in 1: Leng

Loop hanzi:=unistr (' \ '| | Trim(To_char (Ceil (Dbms_random.value (19968,40869)),' XXXX ' ));