ORACLE Character Set Basics

Last Update:2014-05-25 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Concept description

ORACLE databases are divided into national character set and database character set. Both of them must be set during database creation. The National Character Set is mainly used for field data of the NCHAR, NVARCHAR, and NCLOB types. The database character set is widely used for field data of the CHAR, VARCHAR, CLOB, and LONG types;

The Character Set Name of ORACLE generally consists of the following parts: language or region, bit digits indicating a character, standard Character Set Name (optional, S or C, indicating the server or client ). The UTF8 and UTFE character sets in ORACLE do not comply with this rule, and the others are basically in this format. NLS_LANG = <Language >_< Territory>. <Clients Characterset>

Set nls_lang = AMERICAN_AMERICA.UTF8

Set nls_lang = SIMPLIFIED CHINESE_AMERICA.UTF8

NLS (National Language Support) National Language Support. NLS is a very powerful feature of databases. It controls many aspects of data, such as how data is stored. Generally, it controls the following two aspects:

Encoding of persistent storage of text data on disks

Transparently converts data from one character set to another.

Assume that you use the WE8ISO8859P1 character set in the database to store 8-bit data, but some of your customers use a 7-bit character set. For example, the US7ASCII Character Set conversion process usually modifies the data, however, you often map a large character set (in this example, the 8-bit Character Set) to a smaller character set (in this example, the 7-bit character set ). This is a lossy conversion (lossy conversion), and the character will be modified, because: a smaller character set cannot represent every character in a larger character set. However, this conversion must occur. This is also the cause of garbled code. If the database stores data in a single-byte character set, but the customer (such as a Java application, because the Java language uses Unicode) wants the data to be represented in multiple bytes, the conversion must be performed, only in this way can the customer application use the data.

ORACLE supports the following Unicode character sets. The following lists the names of character sets, the corresponding database version range, and the Unicode version used.

AL24UTFFSS: ORACLE is the first character set that supports Unicode. It is used since version 7.2, But it supports Unicode version 1.1. Therefore, it is not supported since 9i.

UTF8: it is a UTF-8-encoded character set used by ORACLE from ORACLE8, from ORACLE8.0 to ORACLE8.16, Unicode version is 2.1, while ORACLE817 to 10g, the Unicode standard is 3.0

UTFE: the Unicode Character Set of the database on the EBCDIC platform. Therefore, it belongs to the character set used by the dedicated system. Other attributes are basically the same as UTF8.

AL32UTF8: A UTF-8-encoded character set that was used from ORACLE9. Compared with UTF8, it adopts Unicode version updates and uses the Unicode 4.01 standard in 10g versions, UTF8 adopts the Unicode 3.0 standard in 10 Gb for compatibility considerations.

AL16UTF16: ORACLE's first character set that adopts UTF-16 encoding, which was used from ORACLE9 and is used as the default National Character Set and cannot be used as the character set of the database. This is because the character set of the Data Base determines the encoding method of the SQL and PL/SQL source code, for UTF-16, which uses a fixed two-byte to represent English letters, it is indeed not suitable for use as the character set of the database. Currently, ORACLE uses the database Character Set Based on ASCII or EBCDID as the subset encoding scheme.

For US7ASCII, the region is US, and seven bits are used to represent a character. The standard Character Set Name Is ASCII.

For the Chinese Character Set ZHS16GBK, Which is simplified Chinese (ZHT is traditional Chinese), a character must be 16 bits, and the standard Character Set Name Is GBK. While ZHS16CGB231280 represents Simplified Chinese, a character requires 16 bits, the standard Character Set Name Is GB231280, which belongs to the GB2312-80 standard we mentioned earlier in 1981. Although we say that the GBK encoding standard is an extension of the GB2312 encoding standard, the relationship between the database character set ZHS16GBK and ZHS16CGB231280 is not a strict superset and subset, it is mainly because the numbers of some Chinese characters are different in the two character sets, so they may encounter problems during character set conversion.

View character set parameters

1: View NLS_CHARACTERSET: Character Set, NLS_NCHAR_CHARACTERSET: National Character Set

Instance Character Set Environment

SELECT * FROM NLS_INSTANCE_PARAMETERS

It mainly involves the values of NLS_LANGUAGE and NLS_TERRITORY. NLS_INSTANCE_PARAMETERS is derived from v $ parameter. Note: "NLS_INSTANCE_PARAMETERS indicates the character set setting of the client. It can be a parameter file, an environmental variable, or a registry. Remember that it represents the character set environment of the instance.

Database Available Character Set parameter settings

SELECT * from v $ NLS_VALID_VALUES

Database Server Character Set

SELECT * FROM NLS_DATABASE_PARAMETERS

NLS_DATABASE_PARAMETERS is derived from props $, which indicates the character set of the database.

Client Character Set Environment

SELECT * from v $ NLS_PARAMETERS;

Select userenv ('language') from dual;

USERENV and V $ NLS_PARAMETERS indicate the current character set environment. If you run the command on the client, it indicates the client Character Set environment.

Session Character Set Environment

SELECT * FROM NLS_SESSION_PARAMETERS;

It is derived from v $ nls_parameters, which indicates the SESSION's own settings. It may be the SESSION's environment variable or the alter session is completed. If the SESSION has no special settings, it will be consistent with V $ NLS_PARAMETERS.

2: view the client character set (NLS_LANG) Method

If the system is a LINUX or UNIX platform, you can also run the following command to view it (provided that the NLS_LANG must be set; otherwise, a null value is found)

[Etl @ m1 ~] $ Env | grep NLS_LANG

NLS_LANG = AMERICAN_AMERICA.ZHS16GBK

[Etl @ m1 ~] $ Echo $ NLS_LANG

AMERICAN_AMERICA.ZHS16GBK

If the system is a WINDOWS platform, run the following command:

1: Enter regedit to enter the Registry. In the HKEY_LOCAL_MACHINE \ SOFTWARE \ ORACLE \ KEY_OraDb11g_home1 \ directory (the last item is related to the Instance name and database version), locate the NLS_LANG option, and double-click it, you can see the corresponding value.

2: echo % NLS_LANG %. If NLS_LANG is not set, you cannot use this command to see the relevant information.

3: Set the NLS_LANG Method

Windows:

3.1

Set NLS_LANG = SIMPLIFIED CHINESE_CHINA.ZHS16GBK

3.2 you can modify the registry key value for permanent settings

HKEY_LOCAL_MACHINE/SOFTWARE/ORACLE/KEY_XXXX_home1/NLS_LANG

UNIX & LINUX

3.3

Export NLS_LANG = AMERICAN_AMERICA.UTF8

3.4 you can edit the bash_profile file for permanent settings

Vi. bash_profile

Export NLS_LANG = "SIMPLIFIED CHINESE_CHINA.ZHS16GBK"

The character set of the client must be the same as that of the server to correctly display non-Ascii characters of the database. If multiple settings exist,Priority: SQL Function> Alter session> environment variable> registry> parameter file The character set must be consistent, but the language settings can be different. We recommend that you use English for language settings. If the character set is zhs16gbk, The nls_lang can be American_America.zhs16gbk.

Modify database Character Set

The database character set cannot be changed in principle after it is created. Therefore, it is very important to consider which character set to use at the beginning of preliminary planning and installation. For database servers, incorrect Character Set modification may lead to many unpredictable consequences, which may seriously affect the normal operation of the database, therefore, before modification, check whether the two character sets have the relationship between Subsets and supersets. Generally, we do not recommend that you modify the character set of the ORACLE database SERVER unless you have.

You can modify the character set settings of a database in two ways.

1. You usually need to export the database data, recreate the database, and then import the database data for conversion.

2. the alter database character set statement is used to modify the character set. However, the character set that can be modified after the DATABASE is created is restricted. Only when the new character set is the current character set, the character set of the DATABASE can be modified, for example, UTF8 is a superset of US7ASCII. You can use alter database character set UTF8 to modify the character set of a DATABASE.

In particular, the two most commonly used character sets ZHS16GBK and ZHS16CGB231280 do not have a subset or superset relationship. Therefore, in theory, mutual conversion between these two character sets is not supported.

For more information about database subset-superset pairs, see the official documentation, for example, ORACLE 10 GBHttp://docs.oracle.com/cd/B19306_01/server.102/b14225/applocaledata.htm

Table A-11 Subset-Superset Pairs

Subset	Superset
AR8ADOS710	AR8ADOS710T
AR8ADOS720	AR8ADOS720T
AR8ADOS720T	AR8ADOS720
AR8APTEC715	AR8APTEC715T
AR8ARABICMACT	AR8ARABICMAC
AR8ISO8859P6	AR8ASMO708PLUS
AR8ISO8859P6	AR8ASMO8X
AR8MUSSAD768	AR8MUSSAD768T
AR8MUSSAD768T	AR8MUSSAD768
AR8NAFITHA711	AR8NAFITHA711T
AR8NAFITHA721	AR8NAFITHA721T
AR8SAKHR707	AR8SAKHR707T
AR8SAKHR707T	AR8SAKHR707
BLT8CP921	BLT8ISO8859P13
BLT8CP921	LT8MSWIN921
D7DEC	D7SIEMENS9780X
D7SIEMENS9780X	D7DEC
DK7SIEMENS9780X	N7SIEMENS9780X
I7DEC	I7SIEMENS9780X
I7SIEMENS9780X	IW8EBCDIC424
IW8EBCDIC424	IW8EBCDIC1086
KO16KSC5601	KO16MSWIN949
LT8MSWIN921	BLT8ISO8859P13
LT8MSWIN921	BLT8CP921
N7SIEMENS9780X	DK7SIEMENS9780X
US7ASCII	See Table A-12, "US7ASCII Supersets ".
UTF8	AL32UTF8
WE8DEC	TR8DEC
WE8DEC	WE8NCR4970
WE8ISO8859P1	WE8MSWIN1252
WE8ISO8859P9	TR8MSWIN1254
WE8NCR4970	TR8DEC
WE8NCR4970	WE8DEC
WE8PC850	WE8PC858

   1: SQL>CONN / AS SYSDBA;

2:

   3: SQL>SHUTDOWN IMMEDIATE;

4:

   5: SQL>STARTUP MOUNT;

6:

   7: SQL>ALTER SYSTEM ENABLE RESTRICTED SESSION;

8:

   9: SQL>ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;

10:

  11: SQL>ALTER SYSTEM SET AQ_TM_PROCESSES=0;

12:

  13: SQL>ALTER DATABASE OPEN;

-- From subset to parent set

   1: SQL>ALTER DATABASE CHARACTER SET ZHS16GBK;

2:

3: -- if it is from parent set to subset, use the INTERNAL_USE parameter to skip over subset detection.

4:

   5: SQL>ALTER DATABASE NATIONAL CHARACTER SET UTF8;

6:

   7: --SQL>ALTER DATABASE NATIONAL CHARACTER SET INTERNAL_USE UTF8;

8:

   9: SQL>SHUTDOWN IMMEDIATE;

10:

  11: SQL>STARTUP;

Alter database national character set UTF8;

There may be prompts like ORA-12717: Cannot alter database national character set when NCLOB data exists. You can solve the problem by using alter database national character set INTERNAL_USE UTF8.

References:

Http://blog.csdn.net/jkl_123/article/details/6157379

Http://blog.csdn.net/tianlesoftware/Article/details/4915223

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

ORACLE Character Set Basics

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

ORACLE Character Set Basics

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support