Thoroughly understand the Oracle Character Set

Source: Internet
Author: User
The following is a detailed analysis of the character set in Oracle. For more information, see Characterset. It is a collection of all abstract characters supported by the system. A character is a general term for all types of texts and symbols, including Chinese characters, punctuation marks, graphical symbols, and numbers. Common Character sets include ASCII and ZHS16GB23.

The following is a detailed analysis of the Character set in Oracle. For more information, see Character set. It is a collection of all abstract characters supported by the system. A character is a general term for all types of texts and symbols, including Chinese characters, punctuation marks, graphical symbols, and numbers. Common Character sets include ASCII and ZHS16GB23.

The following is a detailed analysis of the character set in Oracle. For more information, see

Basic Concepts
Character set: a collection of all abstract characters supported by the system. A character is a general term for all types of texts and symbols, including Chinese characters, punctuation marks, graphical symbols, and numbers. Common Character sets include ASCII, ZHS16GB231280, and ZHS16GBK.

Character Encoding (Character Encoding): a set of rules that can be used for a set of natural language characters (such as the alphabet or syllable table ), it is paired with another set (such as computer code. That is, the correspondence between the Symbol Set and the digital system is established. Similar to character sets, common character encodings include ASCii, ZHS16GBK, ZHT16BIG5, and ZHS32GB18030.
Character sets are actually character sets. character encoding means how to convert these characters into bytes for saving, reading, and transmitting.

Unicode: contains almost all the characters available to humans. It is constantly increasing every year and can be seen as a general character set. It unifies all the characters in the world and encodes them in a unified manner, so that character incompatibility and character conversion will no longer occur.
It has three encoding methods:
1. UTF-32 encoding: fixed to use 4 bytes to represent a character, there is a space utilization efficiency problem.
2. UTF-16 encoding: two bytes are used for the more than 60000 characters that are commonly used, and four bytes are used for the rest.
3. UTF-8 encoding: compatible with ASCII encoding. Two bytes are used for Latin and Greek. Other common characters including Chinese characters use three bytes. The remaining rarely used characters use four bytes.

Basic Principles of Oracle Character Set
Before understanding the basic principles of the Oracle character set, you must first distinguish the following three concepts:
1. Oracle Database Server Character Set: Which character encoding is used to store characters in Oracle? You can use the following statement to check the settings of the database character set.

The Code is as follows:


SQL> select * from v $ nls_parameters where parameter = 'nls _ CHARACTERSET ';
PARAMETER VALUE
-----------------------------------------------
NLS_CHARACTERSET AL32UTF8


2. client operating system character set: the character encoding used by the client operating system to store characters.
For Windows, you can use the chcp command to obtain the code page ):

The Code is as follows:


C: Usersxianzhu> chcp
Active code page: 936


According to the code page, find the character set corresponding to the official Microsoft documentation "National Language Support (NLS) API Reference.
For Linux, set the character set in/etc/sysconfig/i18n:

The Code is as follows:


LANG = "zh_CN.GB2312" (specify the character set of the current operating system)
SUPPORTED = "zh_CN.GB2312" (specify the character set SUPPORTED by the current operating system)
SYSFONT = "lat0-sun16" (specifies the font of the current operating system)


3. Client NLS_LANG parameter: this parameter is used to indicate to Oracle the character set of the client operating system.
With the above three basic concepts, I will explain the basic principles of Oracle Character Set conversion:
1. Set the client's NLS_LANG to the character set of the client's Operating System
2. If the database character set is equal to NLS_LANG, no conversion is made when the database and client transmit characters.
3. If they are not the same, they need to be converted between different character sets. Only when the character set of the client operating system is a subset of the database character set can the conversion be correct. Otherwise, garbled characters may occur.
Analysis of several common situations
Let's take a look at an example first, and then look at the nature of the phenomenon. We will analyze this example.
This example is as follows:

The Code is as follows:


1. database character set is Unicode (UTF-8 encoding)
Our database version is 10.2.0.4.0, and the database character set is:
SQL> select * from v $ nls_parameters where parameter = 'nls _ CHARACTERSET ';
PARAMETER VALUE
----------------------------------------------------------------------
NLS_CHARACTERSET AL32UTF8
2. The client operating system character set is code page 936 (character set is ZHS16GBK)
You can use chcp to obtain the windows code page)
C: Documents and Settingsa105024Desktop> chcp
Active code page: 936
3. Create a test table
SQL> create table test (id number, var varchar2 (30 ));
Table created.
4. insert data
Here we start two sessions in the same operating system. The NLS_LANG of session1 is set to the same as the database character set (that is, AL32UTF8 ):
C: Documents and Settingsa105024Desktop> set nls_lang = Simplified Chinese_China.AL32UTF8
Connect to the database and insert a piece of data:
Session_1> insert into test values (1, 'China ');
1 row created.
Session_1> commit;
Commit complete.
Set the NLS_LANG of session2 to the same as the client operating system (ZHS16GBK ):
C: Documents and Settingsa105024Desktop> set nls_lang = Simplified Chinese_China.ZHS16GBK
Connect to the database and insert a piece of data:
Session_2> insert into test values (2, 'China ');
1 row created.
Session_2> commit;
Commit complete.
5. Execute the query
Execute the query on session 1:
Session_1> select * from test;
ID VAR
-------------------------------
1 China
2 Juan
Execute the query on session 2:
Session_2> select * from test;
ID VAR
------------------------------
1 ???
2 China

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.