Oracle Character Set garbled nature Verification

Source: Internet
Author: User

I have been wondering why the character set of the database is consistent with the character set of the client, but when the database is inserted into the table, it becomes garbled. Today I saw a predecessor in the group explaining this problem, therefore, an experiment was followed to verify the results and found the mysteries:

1) if the character set of the database is UTF8, Oracle will not directly insert any conversions into the data.
2) If the character set of the database is ZHS16GBK, Oracle will extract the string sent from the client by UTF8 based on the internal MAP and convert it to ZHS16GBK.
3) If you specify that NLS_LANG is utf8, but the entered code is zhs16gbk, Oracle will not perform any conversion and directly store the character encoding of ZHS16GBK into the database. -- This is garbage-in -- garbage-out

4) if the database character is AL32UTF8 and you specify NLS_LANG as ZHS16GBK, but you actually enter the UTF8 character, then, oracle will convert the UTF8 character you entered into the database as the ZHS16GBK character to UTF8. in this case, garbled characters may occur.

5) The previous client character set must be consistent with the server character set or the superset will not be garbled. This conclusion is one-sided. In this experiment, the relationship between GBK and utf8 is not a superset, however, it is displayed as normal after being saved.


Conclusion:

1 .) database character set (set during creation, but do not update props $ later) 2 .) client Character Set NLS_LANG (the environment variable echo $ NLS_LANG you set on the database machine) 3 .) the personal tool is connected to the server, and the character set by the tool (such as putty/securecrt and various SSH clients) ensures that the client Character Set NLS_LANG is consistent with the character set displayed by the personal tool, and this character set can be converted to the database character set.

[Oracle @ hxy ~] $ Sqlplus/as sysdba

SQL * Plus: Release 10.2.0.1.0-Production on Wed Mar 26 10:53:59 2014

Copyright (c) 1982,200 5, Oracle. All rights reserved.


Connected:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0-64bit Production
With the Partitioning, OLAP and Data Mining options

SQL> col parameter for a30
SQL> col value for a30

The statement for searching the data character set is as follows:
SQL> select * from nls_database_parameters;

PARAMETER VALUE
------------------------------------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET ZHS16GBK
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH. MI. SSXFF AM

PARAMETER VALUE
------------------------------------------------------------
NLS_TIMESTAMP_FORMAT DD-MON-RR HH. MI. SSXFF AM
NLS_TIME_TZ_FORMAT HH. MI. SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH. MI. SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_RDBMS_VERSION 10.2.0.1.0

20 rows selected.

SQL> exit


Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0-64 bit Production
With the Partitioning, OLAP and Data Mining options

Tutorial 1: database character set, client character set, and personal tool character set are consistent

1) set NLS_LANG to ZHS16GBK

[Oracle @ hxy ~] $ Export NLS_LANG = AMERICAN_AMERICA.ZHS16GBK

2) set the code of the personal tool to ZHS16GBK.

3) connect to the database and insert data
[Oracle @ hxy ~] $ Sqlplus/as sysdba

SQL> insert into t2 values ('zhs16gbk', 'zhs16gbk', 'China ');

SQL> select * from t2;

NLS_LANG INPUT_CHARSET C1
------------------------------------------------------------
ZHS16GBK ZHS16GBK China

SQL> select c1, dump (c1, 16) from t2;

C1 DUMP (C1, 16)
----------------------------------------------------------------------------------------------------
Chinese Typ = 1 Len = 4: d6, d0, b9, fa ZHS16GBK encoding is 2 bits

At this time, the encoding is normal. If the character set of the database is ZHS16GBK, Oracle will not directly insert any conversion into the data.

4) set the coding of personal tools to UTF8, and then insert data into the database.

SQL> insert into t2 values ('zhs16gbk', 'utf8', 'China ');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from t2;

NLS_LANG INPUT_CHARSET C1
------------------------------------------------------------
ZHS16GBK ZHS16GBK? Why? Garbled characters are displayed here.
The data inserted after ZHS16GBK UTF8 China is displayed normally

SQL> select c1, input_charset, dump (c1, 16) from t2;

C1 INPUT_CHARSET DUMP (C1, 16)
----------------------------------------------------------------------------------------------------
? Why? ZHS16GBK Typ = 1 Len = 4: d6, d0, b9, fa

UTF8 Typ = 1 Len = 6: e4, b8, ad, e5, 9b, bd

Use the dump function to check whether the stored encoding length has changed to three UTF-8 characters.

There is no error in database storage, but iterm2 interprets the UTF8 code according to GB2312 and plays it on the screen. Obviously, there is a problem with the encoding length.

It can be concluded that if the character set of the database is ZHS16GBK, Oracle will extract the string sent from the client by UTF8 based on the internal MAP and convert it to ZHS16GBK, therefore, the displayed result is normal, but the encoding of the stored data has changed.

Experiment 2.
~~~~~~~~~~~~~

A) set the character set of the personal tool to GB2312.
B) Set NLS_LANG = american_america.AL32UTF8.


[Oracle @ hxy ~] $ Export NLS_LANG = american_america.AL32UTF8

[Oracle @ hxy ~] $ Sqlplus/as sysdba

SQL * Plus: Release 10.2.0.1.0-Production on Wed Mar 26 10:57:12 2014

Copyright (c) 1982,200 5, Oracle. All rights reserved.


Connected:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0-64bit Production
With the Partitioning, OLAP and Data Mining options

SQL> desc t2
Name Null? Type
-----------------------------------------------------------------------------
NLS_LANG VARCHAR2 (20)
INPUT_CHARSET VARCHAR2 (20)
C1 VARCHAR2 (20)

SQL> insert into t2 values ('utf8', 'zhs16gbk', 'China ');

SQL> select c1, dump (c1, 16) from t2;

SQL> insert into t2 values ('utf8', 'zhs16gbk, 'China ');

SQL> commit;

Commit complete.


SQL> select * from t2;

NLS_LANG INPUT_CHARSET C1
Certificate ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
UTF8 ZHS16GBK limit ??
ZHS16GBK ZHS16GBK
ZHS16GBK UTF8 cannot exceed limit?
All become garbled.

SQL> select c1, input_charset, dump (c1, 16) from t2;

C1 INPUT_CHARSET DUMP (C1, 16)
---------------------------------------------------------------------------------------------------------------------------------------------------
Why ?? ZHS16GBK Typ = 1 Len = 4: a3, bf, 3f, 3f

Juan Xiao ZHS16GBK Typ = 1 Len = 4: d6, d0, b9, fa

Why does it happen? UTF8 Typ = 1 Len = 6: e4, b8, ad, e5, 9b, bd
The above marked yellow encoding is obviously incorrect. This is called garbage-in -- garbage-out, which is the most deceptive setting.

Modify the character set of the personal tool back to the same settings as NLS_LANG --- UTF8 will cause problems.

SQL> select c1, input_charset, dump (c1, 16) from t2;

C1 INPUT_CHARSET DUMP (C1, 16)
-----------------------------------------------------------------------------------------------------------------------------------
??? <= ZHS16GBK Typ = 1 Len = 4: a3, bf, 3f, 3f

ZHS16GBK Typ = 1 Len = 4: d6, d0, b9, fa

Juan UTF8 Typ = 1 Len = 6: e4, b8, ad, e5, 9b, bd
This encoding cannot be properly displayed and garbled, Which is deceptive. It is often prone to errors in daily work, but it is difficult to find problems. Be careful with this encoding.

Experiment 3.

Personal tool: UTF8

NLS_LANG: american_america.UTF8

SQL> insert into t2 values ('utf8', 'utf8', 'China ');

SQL & gt; set line 200
SQL> select c1, input_charset, dump (c1, 16) from t2;

C1 INPUT_CHARSET DUMP (C1, 16)
--------------------------------------------------------------------------------------------------------------
??? ZHS16GBK Typ = 1 Len = 4: a3, bf, 3f, 3f

UTF8 Typ = 1 Len = 4: d6, d0, b9, fa

ZHS16GBK Typ = 1 Len = 4: d6, d0, b9, fa
Juan UTF8 Typ = 1 Len = 6: e4, b8, ad, e5, 9b, bd

We can see that as long as the character set of the personal tool is consistent with that of the nls_lang, and the database character set and client character set can be converted to each other, no garbled characters will appear,

No garbled characters are not the client Character Set mentioned earlier and must be consistent with the database character set.

3. About the character set of export/import.

A) The setting of NLS_LANG during export determines the character set in the local DMP file.
B) Character Set conversion during import is divided into three steps:

B .1 set the character set for reading DMP files. Generally, there are 2 ~ 3 bytes. you can modify the character set by changing the values of these two bytes before 10 Gb. however, after 10G and 11G, the character set still exists elsewhere and is basically not modified.
B .2 converts the characters in the DMP file to the character set by NLS_LANG during import.
B .3 convert characters from NLS_LANG to database word sets during import.

$ Export NLS_LANG = AMERICAN_AMERICA.ZHS16GBK

$ Exp \ "/as sysdba \" file = demo. dmp tables = t2;

Export: Release 11.2.0.4.0-Production on Sun Mar 23 19:50:24 2014

Copyright (c) 1982,201 1, Oracle and/or its affiliates. All rights reserved.


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0-64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set
Server uses AL32UTF8 character set (possible charset conversion)

About to export specified tables via Conventional Path...
.. Exporting table T2 3 rows exported


$ Cat demo. dmp | od-x | head


0000000 0303 4554 5058 524f 3a54 3156 2e31 3230 <<== 0354
0000020 302e 0a30 5344 5359 520a 4154 4c42 5345
0000040 running a 3931 0a32 0a30 3237 0354 A 030a
0000060 0769 00d0 0001 0000 0000 0000 0000
0000100 2020 2020 2020 2020 2020 2020 2020
*
0000140 2020 2020 2020 2020 7553 206e 614d 2072
0000160 3332 3120 3a39 3035 323a 2035 3032 3431
0000200 6564 6f6d 642e 706d 0000 0000 0000
0000220 0000 0000 0000 0000 0000 0000 0000


SQL> select nls_charset_name (to_number ('20140901', 'xxxxx') from dual;

NLS_CHARSET_NAME (TO_NUMBER ('20140901', 'xxxxx'
----------------------------------------
ZHS16GBK

/*
Select to_char (nls_charset_id ('zhs16gbk'), 'xxx') from dual;
In the command status of vi:
: %! Xxd -- convert the current text to hexadecimal format.
: %! Od -- convert the current text to hexadecimal format.
: %! Xxd-c 12 -- convert the current text to a hexadecimal format and display 12 bytes per line.
: %! Xxd-r -- convert the current file back to the text format.
*/

If you use an SQL script, pay attention to the script encoding. /**/The block is the hexadecimal code command for viewing the file with vi.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.