ORACLE Character Set Basics

Source: Internet
Author: User

Concept description

ORACLE databases are divided into national character set and database character set. Both of them must be set during database creation. The National Character Set is mainly used for field data of the NCHAR, NVARCHAR, and NCLOB types. The database character set is widely used for field data of the CHAR, VARCHAR, CLOB, and LONG types;

The Character Set Name of ORACLE generally consists of the following parts: language or region, bit digits indicating a character, standard Character Set Name (optional, S or C, indicating the server or client ). The UTF8 and UTFE character sets in ORACLE do not comply with this rule, and the others are basically in this format. NLS_LANG = <Language >_< Territory>. <Clients Characterset>

Set nls_lang = AMERICAN_AMERICA.UTF8

Set nls_lang = SIMPLIFIED CHINESE_AMERICA.UTF8

NLS (National Language Support) National Language Support. NLS is a very powerful feature of databases. It controls many aspects of data, such as how data is stored. Generally, it controls the following two aspects:

Encoding of persistent storage of text data on disks

Transparently converts data from one character set to another.

Assume that you use the WE8ISO8859P1 character set in the database to store 8-bit data, but some of your customers use a 7-bit character set. For example, the US7ASCII Character Set conversion process usually modifies the data, however, you often map a large character set (in this example, the 8-bit Character Set) to a smaller character set (in this example, the 7-bit character set ). This is a lossy conversion (lossy conversion), and the character will be modified, because: a smaller character set cannot represent every character in a larger character set. However, this conversion must occur. This is also the cause of garbled code. If the database stores data in a single-byte character set, but the customer (such as a Java application, because the Java language uses Unicode) wants the data to be represented in multiple bytes, the conversion must be performed, only in this way can the customer application use the data.

ORACLE supports the following Unicode character sets. The following lists the names of character sets, the corresponding database version range, and the Unicode version used.

AL24UTFFSS: ORACLE is the first character set that supports Unicode. It is used since version 7.2, But it supports Unicode version 1.1. Therefore, it is not supported since 9i.

UTF8: it is a UTF-8-encoded character set used by ORACLE from ORACLE8, from ORACLE8.0 to ORACLE8.16, Unicode version is 2.1, while ORACLE817 to 10g, the Unicode standard is 3.0

UTFE: the Unicode Character Set of the database on the EBCDIC platform. Therefore, it belongs to the character set used by the dedicated system. Other attributes are basically the same as UTF8.

AL32UTF8: A UTF-8-encoded character set that was used from ORACLE9. Compared with UTF8, it adopts Unicode version updates and uses the Unicode 4.01 standard in 10g versions, UTF8 adopts the Unicode 3.0 standard in 10 Gb for compatibility considerations.

AL16UTF16: ORACLE's first character set that adopts UTF-16 encoding, which was used from ORACLE9 and is used as the default National Character Set and cannot be used as the character set of the database. This is because the character set of the Data Base determines the encoding method of the SQL and PL/SQL source code, for UTF-16, which uses a fixed two-byte to represent English letters, it is indeed not suitable for use as the character set of the database. Currently, ORACLE uses the database Character Set Based on ASCII or EBCDID as the subset encoding scheme.

For US7ASCII, the region is US, and seven bits are used to represent a character. The standard Character Set Name Is ASCII.

For the Chinese Character Set ZHS16GBK, Which is simplified Chinese (ZHT is traditional Chinese), a character must be 16 bits, and the standard Character Set Name Is GBK. While ZHS16CGB231280 represents Simplified Chinese, a character requires 16 bits, the standard Character Set Name Is GB231280, which belongs to the GB2312-80 standard we mentioned earlier in 1981. Although we say that the GBK encoding standard is an extension of the GB2312 encoding standard, the relationship between the database character set ZHS16GBK and ZHS16CGB231280 is not a strict superset and subset, it is mainly because the numbers of some Chinese characters are different in the two character sets, so they may encounter problems during character set conversion.

View character set parameters

1: View NLS_CHARACTERSET: Character Set, NLS_NCHAR_CHARACTERSET: National Character Set

Instance Character Set Environment

SELECT * FROM NLS_INSTANCE_PARAMETERS

It mainly involves the values of NLS_LANGUAGE and NLS_TERRITORY. NLS_INSTANCE_PARAMETERS is derived from v $ parameter. Note: "NLS_INSTANCE_PARAMETERS indicates the character set setting of the client. It can be a parameter file, an environmental variable, or a registry. Remember that it represents the character set environment of the instance.

Database Available Character Set parameter settings

SELECT * from v $ NLS_VALID_VALUES

Database Server Character Set

SELECT * FROM NLS_DATABASE_PARAMETERS

 

NLS_DATABASE_PARAMETERS is derived from props $, which indicates the character set of the database.

Client Character Set Environment

SELECT * from v $ NLS_PARAMETERS;

Select userenv ('language') from dual;

USERENV and V $ NLS_PARAMETERS indicate the current character set environment. If you run the command on the client, it indicates the client Character Set environment.

Session Character Set Environment

SELECT * FROM NLS_SESSION_PARAMETERS;

It is derived from v $ nls_parameters, which indicates the SESSION's own settings. It may be the SESSION's environment variable or the alter session is completed. If the SESSION has no special settings, it will be consistent with V $ NLS_PARAMETERS.

2: view the client character set (NLS_LANG) Method

If the system is a LINUX or UNIX platform, you can also run the following command to view it (provided that the NLS_LANG must be set; otherwise, a null value is found)

[Etl @ m1 ~] $ Env | grep NLS_LANG

NLS_LANG = AMERICAN_AMERICA.ZHS16GBK

[Etl @ m1 ~] $ Echo $ NLS_LANG

AMERICAN_AMERICA.ZHS16GBK

If the system is a WINDOWS platform, run the following command:

1: Enter regedit to enter the Registry. In the HKEY_LOCAL_MACHINE \ SOFTWARE \ ORACLE \ KEY_OraDb11g_home1 \ directory (the last item is related to the Instance name and database version), locate the NLS_LANG option, and double-click it, you can see the corresponding value.

2: echo % NLS_LANG %. If NLS_LANG is not set, you cannot use this command to see the relevant information.

3: Set the NLS_LANG Method

Windows:

3.1

Set NLS_LANG = SIMPLIFIED CHINESE_CHINA.ZHS16GBK

Set NLS_LANG = SIMPLIFIED CHINESE_CHINA.ZHS16GBK

3.2 you can modify the registry key value for permanent settings

HKEY_LOCAL_MACHINE/SOFTWARE/ORACLE/KEY_XXXX_home1/NLS_LANG

UNIX & LINUX

3.3

Export NLS_LANG = AMERICAN_AMERICA.UTF8

3.4 you can edit the bash_profile file for permanent settings

Vi. bash_profile

Export NLS_LANG = "SIMPLIFIED CHINESE_CHINA.ZHS16GBK"

The character set of the client must be the same as that of the server to correctly display non-Ascii characters of the database. If multiple settings exist,Priority: SQL Function> Alter session> environment variable> registry> parameter file The character set must be consistent, but the language settings can be different. We recommend that you use English for language settings. If the character set is zhs16gbk, The nls_lang can be American_America.zhs16gbk.

Modify database Character Set

The database character set cannot be changed in principle after it is created. Therefore, it is very important to consider which character set to use at the beginning of preliminary planning and installation. For database servers, incorrect Character Set modification may lead to many unpredictable consequences, which may seriously affect the normal operation of the database, therefore, before modification, check whether the two character sets have the relationship between Subsets and supersets. Generally, we do not recommend that you modify the character set of the ORACLE database SERVER unless you have.

You can modify the character set settings of a database in two ways.

1. You usually need to export the database data, recreate the database, and then import the database data for conversion.

2. the alter database character set statement is used to modify the character set. However, the character set that can be modified after the DATABASE is created is restricted. Only when the new character set is the current character set, the character set of the DATABASE can be modified, for example, UTF8 is a superset of US7ASCII. You can use alter database character set UTF8 to modify the character set of a DATABASE.

In particular, the two most commonly used character sets ZHS16GBK and ZHS16CGB231280 do not have a subset or superset relationship. Therefore, in theory, mutual conversion between these two character sets is not supported.

For more information about database subset-superset pairs, see the official documentation, for example, ORACLE 10 GBHttp://docs.oracle.com/cd/B19306_01/server.102/b14225/applocaledata.htm

Table A-11 Subset-Superset Pairs

Subset

Superset

AR8ADOS710

AR8ADOS710T

AR8ADOS720

AR8ADOS720T

AR8ADOS720T

AR8ADOS720

AR8APTEC715

AR8APTEC715T

AR8ARABICMACT

AR8ARABICMAC

AR8ISO8859P6

AR8ASMO708PLUS

AR8ISO8859P6

AR8ASMO8X

AR8MUSSAD768

AR8MUSSAD768T

AR8MUSSAD768T

AR8MUSSAD768

AR8NAFITHA711

AR8NAFITHA711T

AR8NAFITHA721

AR8NAFITHA721T

AR8SAKHR707

AR8SAKHR707T

AR8SAKHR707T

AR8SAKHR707

BLT8CP921

BLT8ISO8859P13

BLT8CP921

LT8MSWIN921

D7DEC

D7SIEMENS9780X

D7SIEMENS9780X

D7DEC

DK7SIEMENS9780X

N7SIEMENS9780X

I7DEC

I7SIEMENS9780X

I7SIEMENS9780X

IW8EBCDIC424

IW8EBCDIC424

IW8EBCDIC1086

KO16KSC5601

KO16MSWIN949

LT8MSWIN921

BLT8ISO8859P13

LT8MSWIN921

BLT8CP921

N7SIEMENS9780X

DK7SIEMENS9780X

US7ASCII

See Table A-12, "US7ASCII Supersets ".

UTF8

AL32UTF8

WE8DEC

TR8DEC

WE8DEC

WE8NCR4970

WE8ISO8859P1

WE8MSWIN1252

WE8ISO8859P9

TR8MSWIN1254

WE8NCR4970

TR8DEC

WE8NCR4970

WE8DEC

WE8PC850

WE8PC858

   1: SQL>CONN / AS SYSDBA;
   2:  
   3: SQL>SHUTDOWN IMMEDIATE;
   4:  
   5: SQL>STARTUP MOUNT;
   6:  
   7: SQL>ALTER SYSTEM ENABLE RESTRICTED SESSION;
   8:  
   9: SQL>ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
  10:  
  11: SQL>ALTER SYSTEM SET AQ_TM_PROCESSES=0;
  12:  
  13: SQL>ALTER DATABASE OPEN;

-- From subset to parent set

   1: SQL>ALTER DATABASE CHARACTER SET ZHS16GBK;
   2:  
3: -- if it is from parent set to subset, use the INTERNAL_USE parameter to skip over subset detection.
   4:  
   5: SQL>ALTER DATABASE NATIONAL CHARACTER SET UTF8;
   6:  
   7: --SQL>ALTER DATABASE NATIONAL CHARACTER SET INTERNAL_USE UTF8;
   8:  
   9: SQL>SHUTDOWN IMMEDIATE;
  10:  
  11: SQL>STARTUP;

Alter database national character set UTF8;

There may be prompts like ORA-12717: Cannot alter database national character set when NCLOB data exists. You can solve the problem by using alter database national character set INTERNAL_USE UTF8.

References:

Http://blog.csdn.net/jkl_123/article/details/6157379

Http://blog.csdn.net/tianlesoftware/Article/details/4915223

 

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.