Concept description
ORACLE databases are divided into national character set and database character set. Both of them must be set during database creation. The National Character Set is mainly used for field data of the NCHAR, NVARCHAR, and NCLOB types. The database character set is widely used for field data of the CHAR, VARCHAR, CLOB, and LONG types;
The Character Set Name of ORACLE generally consists of the following parts: language or region, bit digits indicating a character, standard Character Set Name (optional, S or C, indicating the server or client ). The UTF8 and UTFE character sets in ORACLE do not comply with this rule, and the others are basically in this format. NLS_LANG = <Language >_< Territory>. <Clients Characterset>
Set nls_lang = AMERICAN_AMERICA.UTF8
Set nls_lang = SIMPLIFIED CHINESE_AMERICA.UTF8
NLS (National Language Support) National Language Support. NLS is a very powerful feature of databases. It controls many aspects of data, such as how data is stored. Generally, it controls the following two aspects:
Encoding of persistent storage of text data on disks
Transparently converts data from one character set to another.
Assume that you use the WE8ISO8859P1 character set in the database to store 8-bit data, but some of your customers use a 7-bit character set. For example, the US7ASCII Character Set conversion process usually modifies the data, however, you often map a large character set (in this example, the 8-bit Character Set) to a smaller character set (in this example, the 7-bit character set ). This is a lossy conversion (lossy conversion), and the character will be modified, because: a smaller character set cannot represent every character in a larger character set. However, this conversion must occur. This is also the cause of garbled code. If the database stores data in a single-byte character set, but the customer (such as a Java application, because the Java language uses Unicode) wants the data to be represented in multiple bytes, the conversion must be performed, only in this way can the customer application use the data.
ORACLE supports the following Unicode character sets. The following lists the names of character sets, the corresponding database version range, and the Unicode version used.
AL24UTFFSS: ORACLE is the first character set that supports Unicode. It is used since version 7.2, But it supports Unicode version 1.1. Therefore, it is not supported since 9i.
UTF8: it is a UTF-8-encoded character set used by ORACLE from ORACLE8, from ORACLE8.0 to ORACLE8.16, Unicode version is 2.1, while ORACLE817 to 10g, the Unicode standard is 3.0
UTFE: the Unicode Character Set of the database on the EBCDIC platform. Therefore, it belongs to the character set used by the dedicated system. Other attributes are basically the same as UTF8.
AL32UTF8: A UTF-8-encoded character set that was used from ORACLE9. Compared with UTF8, it adopts Unicode version updates and uses the Unicode 4.01 standard in 10g versions, UTF8 adopts the Unicode 3.0 standard in 10 Gb for compatibility considerations.
AL16UTF16: ORACLE's first character set that adopts UTF-16 encoding, which was used from ORACLE9 and is used as the default National Character Set and cannot be used as the character set of the database. This is because the character set of the Data Base determines the encoding method of the SQL and PL/SQL source code, for UTF-16, which uses a fixed two-byte to represent English letters, it is indeed not suitable for use as the character set of the database. Currently, ORACLE uses the database Character Set Based on ASCII or EBCDID as the subset encoding scheme.
For US7ASCII, the region is US, and seven bits are used to represent a character. The standard Character Set Name Is ASCII.
For the Chinese Character Set ZHS16GBK, Which is simplified Chinese (ZHT is traditional Chinese), a character must be 16 bits, and the standard Character Set Name Is GBK. While ZHS16CGB231280 represents Simplified Chinese, a character requires 16 bits, the standard Character Set Name Is GB231280, which belongs to the GB2312-80 standard we mentioned earlier in 1981. Although we say that the GBK encoding standard is an extension of the GB2312 encoding standard, the relationship between the database character set ZHS16GBK and ZHS16CGB231280 is not a strict superset and subset, it is mainly because the numbers of some Chinese characters are different in the two character sets, so they may encounter problems during character set conversion.
View character set parameters
1: View NLS_CHARACTERSET: Character Set, NLS_NCHAR_CHARACTERSET: National Character Set
Instance Character Set Environment
SELECT * FROM NLS_INSTANCE_PARAMETERS
It mainly involves the values of NLS_LANGUAGE and NLS_TERRITORY. NLS_INSTANCE_PARAMETERS is derived from v $ parameter. Note: "NLS_INSTANCE_PARAMETERS indicates the character set setting of the client. It can be a parameter file, an environmental variable, or a registry. Remember that it represents the character set environment of the instance.
Database Available Character Set parameter settings
SELECT * from v $ NLS_VALID_VALUES
Database Server Character Set
SELECT * FROM NLS_DATABASE_PARAMETERS
NLS_DATABASE_PARAMETERS is derived from props $, which indicates the character set of the database.
Client Character Set Environment
SELECT * from v $ NLS_PARAMETERS;
Select userenv ('language') from dual;
USERENV and V $ NLS_PARAMETERS indicate the current character set environment. If you run the command on the client, it indicates the client Character Set environment.
Session Character Set Environment
SELECT * FROM NLS_SESSION_PARAMETERS;
It is derived from v $ nls_parameters, which indicates the SESSION's own settings. It may be the SESSION's environment variable or the alter session is completed. If the SESSION has no special settings, it will be consistent with V $ NLS_PARAMETERS.
2: view the client character set (NLS_LANG) Method
If the system is a LINUX or UNIX platform, you can also run the following command to view it (provided that the NLS_LANG must be set; otherwise, a null value is found)
[Etl @ m1 ~] $ Env | grep NLS_LANG
NLS_LANG = AMERICAN_AMERICA.ZHS16GBK
[Etl @ m1 ~] $ Echo $ NLS_LANG
AMERICAN_AMERICA.ZHS16GBK
If the system is a WINDOWS platform, run the following command:
1: Enter regedit to enter the Registry. In the HKEY_LOCAL_MACHINE \ SOFTWARE \ ORACLE \ KEY_OraDb11g_home1 \ directory (the last item is related to the Instance name and database version), locate the NLS_LANG option, and double-click it, you can see the corresponding value.
2: echo % NLS_LANG %. If NLS_LANG is not set, you cannot use this command to see the relevant information.
3: Set the NLS_LANG Method
Windows:
3.1
Set NLS_LANG = SIMPLIFIED CHINESE_CHINA.ZHS16GBK
Set NLS_LANG = SIMPLIFIED CHINESE_CHINA.ZHS16GBK
3.2 you can modify the registry key value for permanent settings
HKEY_LOCAL_MACHINE/SOFTWARE/ORACLE/KEY_XXXX_home1/NLS_LANG
UNIX & LINUX
3.3
Export NLS_LANG = AMERICAN_AMERICA.UTF8
3.4 you can edit the bash_profile file for permanent settings
Vi. bash_profile
Export NLS_LANG = "SIMPLIFIED CHINESE_CHINA.ZHS16GBK"
The character set of the client must be the same as that of the server to correctly display non-Ascii characters of the database. If multiple settings exist,Priority: SQL Function> Alter session> environment variable> registry> parameter file The character set must be consistent, but the language settings can be different. We recommend that you use English for language settings. If the character set is zhs16gbk, The nls_lang can be American_America.zhs16gbk.
Modify database Character Set
The database character set cannot be changed in principle after it is created. Therefore, it is very important to consider which character set to use at the beginning of preliminary planning and installation. For database servers, incorrect Character Set modification may lead to many unpredictable consequences, which may seriously affect the normal operation of the database, therefore, before modification, check whether the two character sets have the relationship between Subsets and supersets. Generally, we do not recommend that you modify the character set of the ORACLE database SERVER unless you have.
You can modify the character set settings of a database in two ways.
1. You usually need to export the database data, recreate the database, and then import the database data for conversion.
2. the alter database character set statement is used to modify the character set. However, the character set that can be modified after the DATABASE is created is restricted. Only when the new character set is the current character set, the character set of the DATABASE can be modified, for example, UTF8 is a superset of US7ASCII. You can use alter database character set UTF8 to modify the character set of a DATABASE.
In particular, the two most commonly used character sets ZHS16GBK and ZHS16CGB231280 do not have a subset or superset relationship. Therefore, in theory, mutual conversion between these two character sets is not supported.
For more information about database subset-superset pairs, see the official documentation, for example, ORACLE 10 GBHttp://docs.oracle.com/cd/B19306_01/server.102/b14225/applocaledata.htm
Table A-11 Subset-Superset Pairs
Subset |
Superset |
AR8ADOS710 |
AR8ADOS710T |
AR8ADOS720 |
AR8ADOS720T |
AR8ADOS720T |
AR8ADOS720 |
AR8APTEC715 |
AR8APTEC715T |
AR8ARABICMACT |
AR8ARABICMAC |
AR8ISO8859P6 |
AR8ASMO708PLUS |
AR8ISO8859P6 |
AR8ASMO8X |
AR8MUSSAD768 |
AR8MUSSAD768T |
AR8MUSSAD768T |
AR8MUSSAD768 |
AR8NAFITHA711 |
AR8NAFITHA711T |
AR8NAFITHA721 |
AR8NAFITHA721T |
AR8SAKHR707 |
AR8SAKHR707T |
AR8SAKHR707T |
AR8SAKHR707 |
BLT8CP921 |
BLT8ISO8859P13 |
BLT8CP921 |
LT8MSWIN921 |
D7DEC |
D7SIEMENS9780X |
D7SIEMENS9780X |
D7DEC |
DK7SIEMENS9780X |
N7SIEMENS9780X |
I7DEC |
I7SIEMENS9780X |
I7SIEMENS9780X |
IW8EBCDIC424 |
IW8EBCDIC424 |
IW8EBCDIC1086 |
KO16KSC5601 |
KO16MSWIN949 |
LT8MSWIN921 |
BLT8ISO8859P13 |
LT8MSWIN921 |
BLT8CP921 |
N7SIEMENS9780X |
DK7SIEMENS9780X |
US7ASCII |
See Table A-12, "US7ASCII Supersets ". |
UTF8 |
AL32UTF8 |
WE8DEC |
TR8DEC |
WE8DEC |
WE8NCR4970 |
WE8ISO8859P1 |
WE8MSWIN1252 |
WE8ISO8859P9 |
TR8MSWIN1254 |
WE8NCR4970 |
TR8DEC |
WE8NCR4970 |
WE8DEC |
WE8PC850 |
WE8PC858 |
1: SQL>CONN / AS SYSDBA;
2:
3: SQL>SHUTDOWN IMMEDIATE;
4:
5: SQL>STARTUP MOUNT;
6:
7: SQL>ALTER SYSTEM ENABLE RESTRICTED SESSION;
8:
9: SQL>ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
10:
11: SQL>ALTER SYSTEM SET AQ_TM_PROCESSES=0;
12:
13: SQL>ALTER DATABASE OPEN;
-- From subset to parent set
1: SQL>ALTER DATABASE CHARACTER SET ZHS16GBK;
2:
3: -- if it is from parent set to subset, use the INTERNAL_USE parameter to skip over subset detection.
4:
5: SQL>ALTER DATABASE NATIONAL CHARACTER SET UTF8;
6:
7: --SQL>ALTER DATABASE NATIONAL CHARACTER SET INTERNAL_USE UTF8;
8:
9: SQL>SHUTDOWN IMMEDIATE;
10:
11: SQL>STARTUP;
Alter database national character set UTF8;
There may be prompts like ORA-12717: Cannot alter database national character set when NCLOB data exists. You can solve the problem by using alter database national character set INTERNAL_USE UTF8.
References:
Http://blog.csdn.net/jkl_123/article/details/6157379
Http://blog.csdn.net/tianlesoftware/Article/details/4915223