Oracle Character Set Conversion

Last Update:2016-09-08 Source: Internet

Author: User

Tags sqlplus

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

These days in the work encountered a character garbled problem, found in the cmd window sqlplus directly update a Chinese and use @ Call a file for the same update, the storage results are different. For a moment of confusion, it is not clear how Oracle handles each character set. Through some data and experimental summaries, the system learns about the character sets in Oracle.

I. Basics of character sets:
On the network has been a lot of users on the character set has been studied, personally think there are several good sites can refer to
http://blog.csdn.net/tianlesoftware/article/details/4915223
Http://www.itstreets.com/post/34.html
http://www.oraclefans.cn/forum/showblog.jsp?rootid=3303

Two. Conversion of the Oracle character set

Here are some experiments to learn about Oracle's handling of various character sets
There are 3 concepts involved: The character set of the database server, the character set of the client and the character set of the operating system
In the experimental environment: the character set of the database server is Al32utf8

1. When the character set of the client and server side is consistent
Experiment one: The client character set is also Al32utf8, and the operating system's character set is GB2312
[Email protected]>update c_language set languagename= ' china ' where id = ' ZH-CN ';

1 row updated.

[Email Protected]>commit;

Commit complete.

[Email Protected]>select dump (languagename, 1016) from c_language where id = ' ZH-CN ';

DUMP (languagename,1016)
-----------------------------------------------------------------------------------------

Typ=1 len=4 CHARACTERSET=AL32UTF8:D6,D0,B9,FA

[Email protected]>

Using the DUMP function to view the storage in the database in 16, you can see the "China" 2 words, the memory encoding in the database is: D6,D0,B9,FA

First build a file T1.txt, write "China" The 2 words, in GB2312 (codepage for 936) format to save.
Using the type command in the cmd window, you can see that the text is displayed correctly at this time in the cmd window.

V:\>chcp
Active Code page:936

V:\>type E:\t1.txt
China
V:\>

Then using the Winhex tool to view the T1.txt text, you can see that its 16 binary encoding is: D6,D0,B9,FA

Experiment two: Use @ in the Sqlplus environment to invoke a SQL file, the contents of the file is the same sentence as the experimental SQL, the file format is UTF8
The character set of the server and client is still Al32utf8, and the operating system's codepage is 936

[Email Protected]>select dump (languagename, 1016) from c_language where id = ' ZH-CN ';

DUMP (languagename,1016)
---------------------------------------------------------------------------------------

Typ=1 len=6 CHARACTERSET=AL32UTF8:E4,B8,AD,E5,9B,BD

[Email protected]>

Also build a text file T2.txt, saved in UTF8 format. Then using Winhex view, you can see its 16 binary encoding: E4, B8, AD, E5, 9B, BD

with these 2 experiments, it is clear that Oracle does not perform storage conversions when the character sets of the client and database server are consistent. As in the cmd window,
"China", with the default GB2312 encoding of the operating system, and "China", which is encoded in UFT8 format, will be stored directly on the Oracle server.

However, we will find that the memory encoded in UTF8 format, the result of query in CMD is garbled. As follows:

[Email Protected]>host chcp
Active Code page:936

[Email protected]>col languagename format A30
[Email Protected]>select * from c_language where id= ' ZH-CN ';

ID LanguageName
-------------------- ------------------------------
ZH-CN Juan Ricoh
[Email protected]>

This is because the codepage of my machine cmd window is 936 by default, and the character set of the client and server side is the same, Oracle will not do any conversion of the saved character encoding
Directly back, but the database inside the code is stored by UTF8, and the cmd window character encoding is GB2312, so is the display garbled.

We can change the cmd window codepage to UTF8 (codepage 65001), then we can see the normal "China" 2 words. As follows:
（How to modify codepage, you can refer to: http://xiangqinghu1988.blog.163.com/blog/static/58822991201222232456746/）

Active Code page:65001

[Email Protected]>select * from c_language where id = ' ZH-CN ';

ID LanguageName
-------------------- ------------------------------
ZH-CN China

[Email protected]>

therefore, when Oracle does not perform character conversions, that is, when the character set of the client and server side is consistent, if garbled, It is inconsistent to indicate the client's environment encoding (such as the encoding of the CMD window) and the character encoding of the Oracle Database Store (the cmd window is updated directly with Sqlplus and calls to an UTF8 file format). If you want to view, you need to modify the client's environment encoding. For example, change the codepage of CMD.

2. When the character set on the client and server side is inconsistent
Experiment three: Setting the client's character set to ZHS16GBK, and updating the record directly inside the Sqlplus as follows:

V:\>set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

V:\>sqlplus [email protected]

Sql*plus:release 11.2.0.1.0 Production on Thu Mar 22 15:29:43 2012

Copyright (c) 1982, Oracle. All rights reserved.

Enter Password:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0-64bit Production
With the partitioning, OLAP, Data Mining and Real application testing options

[Email protected]>update c_language set languagename= ' china ' where id = ' ZH-CN ';

1 row updated.

[Email protected]>commit;

Commit complete.

[Email protected]>col languagename format A30
[Email Protected]>select * from c_language where id = ' ZH-CN ';

ID LanguageName
---------------------------------------- ------------------------------
ZH-CN China

[Email Protected]>select dump (languagename, 1016) from c_language where id = ' ZH-CN ';

DUMP (languagename,1016)
-----------------------------------------------------------------------------------------

Typ=1 len=6 CHARACTERSET=AL32UTF8:E4,B8,AD,E5,9B,BD

[Email protected]>

The results are interesting, the database is no longer stored in the experiment as the D6,D0,B9,FA, but the very clever into the UTF8 format of the storage. If we open another one
Sqlplus window, its client character set is Al32utf8, at this time to look at this record as follows, found garbled

V:\>sqlplus [email protected]

Sql*plus:release 11.2.0.1.0 Production on Thu Mar 22 15:36:56 2012

Copyright (c) 1982, Oracle. All rights reserved.

Enter Password:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0-64bit Production
With the partitioning, OLAP, Data Mining and Real application testing options

[Email protected]>col languagename format A30
[Email Protected]>select * from c_language where id = ' ZH-CN ';

ID LanguageName
-------------------- ------------------------------
ZH-CN Juan Ricoh

[Email protected]>

If we set the CMD codepage to UTF8 format, we can see that it is normal to display (at this point the client and the server-side character set are required to match)

Active Code page:65001
[Email Protected]>select * from c_language where id = ' ZH-CN ';

ID LanguageName
-------------------- ------------------------------
ZH-CN China

[Email protected]>

Note that the result of this experiment is quite coincidental, if the character set of the Sqlplus client is not set to ZHS16GBK, but set to other character sets, then the data stored in Oracle is not necessarily UTF8 encoded.
in fact, if the character set of the client and the service side is inconsistent, the Oracle layer will convert the character encoding, how to convert it is not clear to me, looking for a master in-depth study.

In general, if you encounter garbled characters in practice, you need to know the relationship between the Oracle client character set, the server-side character set, and the operating system character set.
1. If the character set on the client and server side is consistent, Oracle does not encode the conversion. Stored directly in the encoding format associated with the storage operating system. At the time of the query, it is not converted, but instead directly returns the encoding stored in the database
2. If the character set on the client and server side is inconsistent, Oracle encodes the stored characters at the bottom. When the query returns, it is also converted once.

Oracle Character Set Conversion

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More