One. What is the Oracle character set
An Oracle character set is a collection of symbols used to interpret byte data. Character sets differ in size and can contain one another. Oracle's National Language Support (NLS) architecture lets you store, process, and retrieve data in local languages. It allows database tools, error messages, sort order, date, time, currency, number, and calendar conventions to adapt automatically to the local language and platform.
The most important parameter affecting the Oracle database character set is NLS_LANG.
Its format is: NLS_LANG = language_territory.charset
It has three components (language, territory, and character set), each of which controls a subset of NLS behavior.
Where:
Language: specifies the language of server messages, that is, whether prompts appear in Chinese or English
Territory: specifies the server's date and number formats
Charset: specifies the character set
For example: AMERICAN_AMERICA.ZHS16GBK
From the composition of NLS_LANG we can see that the part that really affects the database character set is the third one.
So two databases can import and export data to each other as long as the third part, the character set, is the same; the first two parts only affect things such as whether messages appear in Chinese or English.
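The three-part format can be illustrated with a small sketch. The helper name parse_nls_lang is hypothetical (it is not an Oracle API), and the sketch assumes all three parts are present in the value.

```python
def parse_nls_lang(value):
    """Split an NLS_LANG value into (language, territory, charset).

    Hypothetical helper for illustration; assumes all three parts are present.
    """
    # The character set is everything after the last dot.
    head, dot, charset = value.rpartition(".")
    if not dot or not charset:
        raise ValueError("missing '.charset' part in %r" % value)
    # The language may contain spaces, so split on the first underscore.
    language, underscore, territory = head.partition("_")
    if not underscore:
        raise ValueError("missing 'language_territory' part in %r" % value)
    return language, territory, charset


print(parse_nls_lang("AMERICAN_AMERICA.ZHS16GBK"))
print(parse_nls_lang("SIMPLIFIED CHINESE_CHINA.ZHS16GBK"))
```

Both example values end in the same third part, ZHS16GBK, which is the part that matters for exchanging data.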
Two. Character set basics
2.1 Character set
In essence, a character set assigns distinct numeric codes to a specific set of symbols according to some character encoding scheme. The earliest encoding scheme supported by Oracle databases is US7ASCII.
Oracle character set names follow this convention:
<language><bit size><encoding>
For example, ZHS16GBK denotes a 16-bit (two-byte) Simplified Chinese character set in GBK encoding.
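As a rough sketch of the naming convention, the following splits a name into its three parts with a regular expression. The function name split_charset_name is an assumption made for illustration, and names that do not follow the convention (UTF8, for example) simply return None.

```python
import re

# <language><bit size><encoding>: letters, then digits, then the encoding name.
_NAME_PATTERN = re.compile(r"^([A-Z]+?)(\d+)([A-Z].*)$")


def split_charset_name(name):
    """Return (language, bits, encoding), or None if the name does not fit."""
    match = _NAME_PATTERN.match(name.upper())
    if match is None:
        return None
    language, bits, encoding = match.groups()
    return language, int(bits), encoding


for name in ("ZHS16GBK", "WE8ISO8859P1", "US7ASCII", "AL32UTF8", "UTF8"):
    print(name, split_charset_name(name))
```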
2.2 Character encoding schemes
2.2.1 Single-byte encoding
(1) Single-byte 7-bit character sets can define 128 characters; the most commonly used is US7ASCII
(2) Single-byte 8-bit character sets can define 256 characters and suit most European countries
Example: WE8ISO8859P1 (Western Europe, 8-bit, ISO 8859 Part 1 encoding)
2.2.2 Multibyte encoding
(1) Variable-length multibyte encoding
Some characters are represented by a single byte and others by two or more bytes. Variable-length multibyte encodings are often used to support Asian languages such as Japanese, Chinese, and Hindi.
For example: AL32UTF8 (where AL stands for "all", meaning all languages), ZHS16CGB231280
(2) Fixed-length multibyte encoding
Every character is encoded with the same fixed number of bytes. Currently the only fixed-length multibyte encoding Oracle supports is AL16UTF16, and it can be used only as the national character set.
2.2.3 Unicode encoding
Unicode is a single encoding scheme covering all known characters in use worldwide; that is, Unicode gives every character a unique code. UTF-16 is the 16-bit encoding of Unicode. Oracle treats it as a fixed-length multibyte encoding that uses 2 bytes per Unicode character (supplementary characters take a 4-byte surrogate pair), and AL16UTF16 is the UTF-16 encoded character set.
UTF-8 is the 8-bit encoding of Unicode, a variable-length multibyte encoding that represents a Unicode character in 1, 2, or 3 bytes (4 bytes for supplementary characters in AL32UTF8). AL32UTF8 and UTF8 are UTF-8 encoded character sets, and UTFE is the corresponding character set for EBCDIC platforms.
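The byte counts described above can be checked with a short sketch. It uses Python's built-in codecs as stand-ins for the Oracle character sets; the mapping in CODECS is an assumption made for illustration, since the Python codecs are not guaranteed to match Oracle's definitions byte for byte.

```python
# Assumed stand-ins: Python codec names approximating Oracle character sets.
CODECS = {
    "US7ASCII": "ascii",
    "WE8ISO8859P1": "latin-1",
    "ZHS16GBK": "gbk",
    "AL32UTF8": "utf-8",
    "AL16UTF16": "utf-16-be",
}


def encoded_length(char, codec):
    """Return the number of bytes used for char, or None if it cannot be encoded."""
    try:
        return len(char.encode(codec))
    except UnicodeEncodeError:
        return None


for char in ("A", "\u00e9", "\u4e2d"):  # 'A', e with acute accent, Chinese 'zhong'
    for oracle_name, codec in CODECS.items():
        print(repr(char), oracle_name, encoded_length(char, codec))
```

A Chinese character takes 2 bytes in the GBK stand-in, 3 bytes in UTF-8, and cannot be encoded at all in the single-byte character sets.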
2.3 Character set supersets
When the encoded values of one character set (character set A) include all the encoded values of another character set (character set B), and the same encoded value represents the same character in both, character set A is a superset of character set B, or equivalently character set B is a subset of character set A.
The official Oracle8i and Oracle9i documentation includes a table of subset-superset pairs; for example, WE8ISO8859P1 is a subset of WE8MSWIN1252. Because US7ASCII is the earliest Oracle database encoding, many character sets are supersets of US7ASCII, for example WE8ISO8859P1, ZHS16CGB231280, and ZHS16GBK.
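The superset definition can be tested mechanically for single-byte codes. The sketch below is an illustration only: it uses Python codecs (ascii, latin-1, gb2312, gbk) as assumed stand-ins for the Oracle character sets, and the helper name is hypothetical.

```python
def is_superset_for_codes(superset_codec, subset_codec, codes):
    """Check that every single-byte code decodes to the same character in both codecs."""
    for code in codes:
        raw = bytes([code])
        try:
            if raw.decode(superset_codec) != raw.decode(subset_codec):
                return False
        except UnicodeDecodeError:
            # One of the codecs does not define this code at all.
            return False
    return True


ascii_codes = range(128)  # the 7-bit range defined by US7ASCII
for codec in ("latin-1", "gb2312", "gbk"):
    print(codec, is_superset_for_codes(codec, "ascii", ascii_codes))
```

The relationship is not symmetric: the 7-bit codec cannot decode the codes 128 to 255 that an 8-bit character set defines.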
2.4 Database character set (Oracle server-side character set)
The database character set is specified when the database is created and normally cannot be changed afterwards. When you create a database, you can specify a character set (CHARACTER SET) and a national character set (NATIONAL CHARACTER SET).
2.4.1 Character set
(1) Used to store CHAR, VARCHAR2, CLOB, LONG, and similar data types
(2) Used for identifiers such as table names, column names, and PL/SQL variables
(3) Used to store SQL and PL/SQL program units
2.4.2 National character set
(1) Used to store NCHAR, NVARCHAR2, NCLOB, and similar data types
(2) The national character set is essentially an additional character set chosen for Oracle, mainly to enhance its character processing capability: the NCHAR data types support the fixed-length multibyte encodings used in Asia, while the database character set does not. The national character set was redefined in Oracle9i and can only be one of the Unicode encodings AL16UTF16 and UTF8; the default is AL16UTF16.
2.4.3 Querying character set parameters
You can query the following data dictionary tables or views to see the character set settings:
NLS_DATABASE_PARAMETERS, PROPS$, V$NLS_PARAMETERS
In the query results, NLS_CHARACTERSET is the character set and NLS_NCHAR_CHARACTERSET is the national character set.
2.4.4 Modifying the database character set
As stated above, the database character set in principle cannot be changed after creation. There are, however, two ways to do it.
1. Export the database data, rebuild the database with the new character set, and then import the data, letting the conversion happen on the way.
2. Use the ALTER DATABASE CHARACTER SET statement. After the database is created this works only if the new character set is a superset of the current one. For example, UTF8 is a superset of US7ASCII, so the statement ALTER DATABASE CHARACTER SET UTF8 can be used.
2.5 Client character set (NLS_LANG parameter)
2.5.1 Meaning of the client character set
The client character set defines how the client encodes character data: any character data sent from or to the client is encoded in the client's character set. A client here is any application that connects directly to the database, such as SQL*Plus or exp/imp. The client character set is set through the NLS_LANG parameter.
2.5.2 NLS_LANG parameter format
NLS_LANG=<language>_<territory>.<client character set>
Language: language of Oracle messages, sort order, and day and month names
Territory: default date, number, currency, and other formats
Client character set: the character set the client uses
For example: NLS_LANG=AMERICAN_AMERICA.US7ASCII
AMERICAN is the language, AMERICA is the territory, and US7ASCII is the client character set
2.5.3 Setting the client character set
1) UNIX environment
$ NLS_LANG="SIMPLIFIED CHINESE"_CHINA.ZHS16GBK
$ export NLS_LANG
To make the setting persistent, edit the Oracle user's profile file
2) Windows environment
Edit the registry
regedit.exe -> HKEY_LOCAL_MACHINE -> SOFTWARE -> ORACLE -> (key of the Oracle home) -> NLS_LANG
2.5.4 Querying NLS parameters
Oracle provides a number of NLS parameters for adapting the database and the user's machine to local formats, for example NLS_LANGUAGE, NLS_DATE_FORMAT, and NLS_CALENDAR. They can be viewed by querying the following data dictionary and V$ views.
NLS_DATABASE_PARAMETERS: shows the database's current NLS parameter values, including the database character set
NLS_SESSION_PARAMETERS: shows the parameters set by NLS_LANG or changed by ALTER SESSION (it does not include the client character set from NLS_LANG)
NLS_INSTANCE_PARAMETERS: shows the parameters defined in the parameter file init<SID>.ora
V$NLS_PARAMETERS: shows the current NLS parameter values
2.5.5 Modifying NLS parameters
NLS parameters can be modified in the following ways:
(1) Edit the initialization parameter file used at instance startup
(2) Change the NLS_LANG environment variable
(3) Use the ALTER SESSION statement to change them for the current Oracle session
(4) Use certain SQL functions
NLS priority: SQL function > ALTER SESSION > environment variable or registry > parameter file > database default parameters
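The priority order can be pictured as a lookup through layered settings. This is only a sketch of the rule as stated here; the source names in PRIORITY and the function effective_value are hypothetical labels, not Oracle identifiers.

```python
# Highest priority first, following the order given in the text.
PRIORITY = [
    "sql_function",
    "alter_session",
    "environment",       # environment variable or registry
    "parameter_file",
    "database_default",
]


def effective_value(settings):
    """Return (source, value) for the highest-priority source that sets a value."""
    for source in PRIORITY:
        value = settings.get(source)
        if value is not None:
            return source, value
    return None, None


# Example: a date format set in the parameter file and overridden in the session.
date_format = {
    "database_default": "DD-MON-RR",
    "parameter_file": "DD-MM-YYYY",
    "alter_session": "YYYY-MM-DD",
}
print(effective_value(date_format))
```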
Three. exp/imp and character sets
3.1 exp/imp
Export and Import are a pair of tools for reading and writing Oracle data. Export writes data from an Oracle database to operating system files, and Import reads the data from those files into an Oracle database. When data is migrated with exp/imp, four character sets are involved on the way from the source database to the target database. If they are not all the same, character set conversion takes place.
EXP
 _________________      _______________________________      ________________________
| exp export file | <- | environment variable NLS_LANG | <- | database character set |
 -----------------      -------------------------------      ------------------------
IMP
 _________________      _______________________________      ________________________
| imp import file | -> | environment variable NLS_LANG | -> | database character set |
 -----------------      -------------------------------      ------------------------
The four character sets are:
(1) Source database character set
(2) User session character set during export (set through NLS_LANG)
(3) User session character set during import (set through NLS_LANG)
(4) Target database character set
3.2 Conversion during export
During export, if the source database character set differs from the export session character set, a character set conversion takes place, and the ID of the export session character set is stored in a few bytes at the head of the export file. Data can be lost in this conversion.
Example: suppose the source database uses ZHS16GBK and the export session uses US7ASCII. ZHS16GBK is a 16-bit character set and US7ASCII a 7-bit one, so during conversion the Chinese characters have no equivalent in US7ASCII. Every Chinese character is lost and becomes "??", and the resulting DMP file has already lost data.
Therefore, to export the source data correctly, the export session character set should be the same as the source database character set or a superset of it.
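The data loss in this example can be reproduced outside Oracle. The sketch below is an analogy, not what exp does internally: it assumes Python's gbk and ascii codecs as stand-ins for ZHS16GBK and US7ASCII, and uses "?" as the replacement character.

```python
def convert(data, source_codec, target_codec):
    """Re-encode bytes, replacing characters the target cannot represent with '?'."""
    text = data.decode(source_codec)
    return text.encode(target_codec, errors="replace")


stored = "abc\u4e2d\u6587".encode("gbk")  # 'abc' followed by two Chinese characters

lossy = convert(stored, "gbk", "ascii")   # session character set too small
safe = convert(stored, "gbk", "utf-8")    # session character set is large enough

print(lossy)                  # the Chinese characters have become question marks
print(safe.decode("utf-8"))   # the text survives intact
```

Once the question marks are written to the file, no later conversion can bring the original characters back.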
3.3 Conversion during import
(1) Determine the character set of the exporting environment
The character set of the export file is obtained by reading the export file header.
(2) Determine the character set of the import session, that is, the NLS_LANG environment variable used by the import session
(3) imp reads the export file
It reads the character set ID from the export file and compares it with the NLS_LANG of the import process.
(4) If the export file character set is the same as the import session character set, no conversion is needed at this step; if they differ, the data must be converted to the character set of the import session. As you can see, two character set conversions can occur while importing data into the database:
First: between the character set of the import file and the character set of the import session. If this conversion cannot be completed correctly, the import into the target database cannot complete.
Second: between the import session character set and the target database character set.
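Putting the export and import steps together, the whole path passes through four character sets. The following sketch simulates that chain with Python codecs as assumed stand-ins (gbk for ZHS16GBK, and so on); the function migrate is hypothetical and only models the conversions, not the tools themselves.

```python
def migrate(text, source_db, exp_session, imp_session, target_db):
    """Pass text through the four character sets, converting only where they differ."""
    chain = [source_db, exp_session, imp_session, target_db]
    data = text.encode(chain[0], errors="replace")
    for current, following in zip(chain, chain[1:]):
        if current != following:
            # A conversion happens at this link; unmappable characters become '?'.
            data = data.decode(current).encode(following, errors="replace")
    return data.decode(target_db)


sample = "\u4e2d\u6587"  # two Chinese characters
print(migrate(sample, "gbk", "gbk", "gbk", "gbk"))      # no conversion at all
print(migrate(sample, "gbk", "utf-8", "utf-8", "gbk"))  # converted, but lossless
print(migrate(sample, "gbk", "ascii", "gbk", "gbk"))    # lost during export
```

A single link that cannot represent the data is enough to damage it, wherever in the chain that link sits.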
Four. Viewing the database character set
Three character sets are involved:
1. The Oracle server-side character set;
2. The Oracle client-side character set;
3. The character set of the DMP file.
When importing data, all three must be consistent for the import to work correctly.
4.1 Querying the Oracle server-side character set
There are many ways to find the Oracle server-side character set; a fairly direct one is the following query:
SQL> select userenv('language') from dual;
USERENV('LANGUAGE')
----------------------------------------------------
SIMPLIFIED CHINESE_CHINA.ZHS16GBK
SQL> select userenv('language') from dual;
AMERICAN_AMERICA.ZHS16GBK
4.2 Querying the character set of a DMP file
A DMP file exported with the Oracle exp tool also contains character set information: its 2nd and 3rd bytes record the character set. If the DMP file is small, say a few MB or a few tens of MB, you can open it in UltraEdit in hexadecimal mode, read the 2nd and 3rd bytes, for example 0354, and then look up the corresponding character set with the following SQL:
SQL> select nls_charset_name(to_number('0354','xxxx')) from dual;
ZHS16GBK
If the DMP file is large, say over 2 GB (which is the most common case), a text editor opens it very slowly or not at all, and you can use the following command instead (on a UNIX host):
cat exp.dmp | od -x | head -1 | awk '{print $2 $3}' | cut -c 3-6
The corresponding character set can then be obtained with the SQL above.
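Reading the two bytes does not require loading the whole file. Here is a minimal sketch that reads only the first three bytes; the function names are hypothetical, and the header layout is taken from the description above rather than from any Oracle specification.

```python
import os
import tempfile


def dmp_charset_hex(path):
    """Return the 2nd and 3rd bytes of the file as a 4-digit hex string."""
    with open(path, "rb") as dump:
        header = dump.read(3)  # only the first three bytes are needed
    if len(header) < 3:
        raise ValueError("file is too short to contain a character set ID")
    return header[1:3].hex()


def dmp_charset_id(path):
    """Return the character set ID as an integer."""
    return int(dmp_charset_hex(path), 16)


# Demonstration on a stand-in file, not a real export.
with tempfile.NamedTemporaryFile(delete=False) as demo:
    demo.write(bytes([0x03, 0x03, 0x54, 0x0A]))
print(dmp_charset_hex(demo.name), dmp_charset_id(demo.name))
os.remove(demo.name)
```

The hex string can be passed to the nls_charset_name query shown above to obtain the character set name.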
4.3 Querying the Oracle client-side character set
On Windows, it is the NLS_LANG value of the corresponding Oracle home in the registry. You can also set it yourself in a DOS window,
for example: set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
This affects only the environment variables of that window.
On UNIX, it is the NLS_LANG environment variable.
$ echo $NLS_LANG
AMERICAN_AMERICA.ZHS16GBK
If the check shows that the server-side and client-side character sets differ, change the client-side one to match the server.
Supplement:
(1) Database server character set
select * from nls_database_parameters
It comes from PROPS$ and represents the character set of the database.
(2) Client character set environment
select * from nls_instance_parameters
It comes from V$PARAMETER and represents the client character set settings, which may come from the parameter file, environment variables, or the registry.
(3) Session character set environment
select * from nls_session_parameters
It comes from V$NLS_PARAMETERS and represents the session's own settings, which may come from the session's environment variables or from ALTER SESSION. If the session has no settings of its own, it matches NLS_INSTANCE_PARAMETERS.
(4) The client character set must be the same as the server's for the database's non-ASCII characters to display correctly.
If several settings exist, the NLS priority is: SQL function > ALTER SESSION > environment variable or registry > parameter file > database default parameters
The character sets must match, but the language settings may differ, and English is recommended for the language. For example, if the character set is ZHS16GBK, NLS_LANG can be AMERICAN_AMERICA.ZHS16GBK.
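The consistency rule, same character set but possibly different language and territory, can be expressed as a small check. The helper names below are hypothetical, and the inputs are assumed to be a client NLS_LANG value and the result of the userenv('language') query.

```python
def charset_of(nls_value):
    """Return the character set part (after the last dot), upper-cased."""
    return nls_value.rpartition(".")[2].strip().upper()


def charsets_match(client_nls_lang, server_language):
    """Compare only the character set; language and territory may differ."""
    return charset_of(client_nls_lang) == charset_of(server_language)


server = "SIMPLIFIED CHINESE_CHINA.ZHS16GBK"
print(charsets_match("AMERICAN_AMERICA.ZHS16GBK", server))  # same character set
print(charsets_match("AMERICAN_AMERICA.US7ASCII", server))  # different character set
```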
Five. Modifying the Oracle character set
As stated above, the database character set in principle cannot be changed after creation, so it is important to decide which character set to use at design and installation time. On the database server, changing the character set incorrectly can have many unpredictable consequences and seriously affect normal operation, so before changing it be sure to confirm that the two character sets are in a subset/superset relationship. In general we do not recommend changing the server-side character set of an Oracle database unless there is no alternative. In particular, our two most commonly used character sets, ZHS16GBK and ZHS16CGB231280, are not in a subset/superset relationship, so converting between them is in theory not supported.
However, there are two ways to modify the character set.
1. Export the database data, rebuild the database with the new character set, and then import the data, letting the conversion happen on the way.
2. Use the ALTER DATABASE CHARACTER SET statement. After the database is created this works only if the new character set is a superset of the current one. For example, UTF8 is a superset of US7ASCII, so the statement ALTER DATABASE CHARACTER SET UTF8 can be used.
5.1 Modifying the server-side character set (not recommended)
1. Shut down the database
SQL> SHUTDOWN IMMEDIATE
2. Start up to the mount state
SQL> STARTUP MOUNT;
SQL> ALTER SYSTEM ENABLE RESTRICTED SESSION;
SQL> ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
SQL> ALTER SYSTEM SET AQ_TM_PROCESSES=0;
SQL> ALTER DATABASE OPEN;
-- Without INTERNAL_USE, the new character set must be a superset of the current one
SQL> ALTER DATABASE CHARACTER SET ZHS16GBK;
-- Note: per section 2.4.2, from Oracle9i on the national character set can only be AL16UTF16 or UTF8
SQL> ALTER DATABASE NATIONAL CHARACTER SET ZHS16GBK;
-- If the new character set is not a superset, the INTERNAL_USE keyword skips the subset/superset check
SQL> ALTER DATABASE CHARACTER SET INTERNAL_USE AL32UTF8;
SQL> ALTER DATABASE NATIONAL CHARACTER SET INTERNAL_USE AL32UTF8;
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP
Note: if the database contains no large objects, converting the character set this way causes no problems in use. (Remember that the character set you specify must be one Oracle supports, otherwise the database cannot start.) The steps above are then sufficient.
If the message 'ORA-12717: Cannot ALTER DATABASE NATIONAL CHARACTER SET when NCLOB data exists' appears,
there are two ways to solve the problem:
1. Use the INTERNAL_USE keyword to change the setting,
2. Re-create the database; since re-creating is rather complicated, INTERNAL_USE is the suggested route
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT EXCLUSIVE;
SQL> ALTER SYSTEM ENABLE RESTRICTED SESSION;
SQL> ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
SQL> ALTER SYSTEM SET AQ_TM_PROCESSES=0;
SQL> ALTER DATABASE OPEN;
SQL> ALTER DATABASE NATIONAL CHARACTER SET INTERNAL_USE UTF8;
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP;
After these steps, the national character set has been changed.
5.2 Modifying the DMP file character set
As mentioned above, the 2nd and 3rd bytes of the DMP file record the character set, so changing those bytes can 'fool' Oracle's check. In theory this is valid only when going from a subset to a superset, but in practice it often works even without a subset/superset relationship; commonly used character sets such as US7ASCII, WE8ISO8859P1, ZHS16CGB231280, and ZHS16GBK can basically be changed into one another. Because only the DMP file is changed, the impact is small.
There are many ways to do it; the simplest is to edit the 2nd and 3rd bytes of the DMP file directly in UltraEdit.
For example, to change the DMP file character set to ZHS16GBK, use the following SQL to find the hexadecimal code of that character set:
SQL> select to_char(nls_charset_id('ZHS16GBK'), 'xxxx') from dual;
0354
Then change the 2nd and 3rd bytes of the DMP file to 0354.
If the DMP file is too large to open in UltraEdit, you need to do it with a program.
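One possible shape for such a program is sketched below. It overwrites the two bytes in place without reading the rest of the file; the function name set_dmp_charset is hypothetical, and it is advisable to run it against a copy of the DMP file rather than the original.

```python
import os
import tempfile


def set_dmp_charset(path, charset_hex):
    """Overwrite the 2nd and 3rd bytes of the file; return the previous value as hex."""
    new_bytes = bytes.fromhex(charset_hex)
    if len(new_bytes) != 2:
        raise ValueError("character set ID must be exactly 4 hex digits")
    with open(path, "r+b") as dump:  # read/write, no truncation
        dump.seek(1)                 # offset 1 is the 2nd byte
        old_bytes = dump.read(2)
        if len(old_bytes) != 2:
            raise ValueError("file is too short to contain a character set ID")
        dump.seek(1)
        dump.write(new_bytes)
    return old_bytes.hex()


# Demonstration on a stand-in file, not a real export.
with tempfile.NamedTemporaryFile(delete=False) as demo:
    demo.write(bytes([0x03, 0x00, 0x01, 0x0A]))
previous = set_dmp_charset(demo.name, "0354")
print("previous ID:", previous)
os.remove(demo.name)
```

Because only two bytes are written, the running time does not depend on the size of the file.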
5.3 Setting the client character set
1) UNIX environment
$ NLS_LANG="SIMPLIFIED CHINESE"_CHINA.ZHS16GBK
$ export NLS_LANG
To make the setting persistent, edit the Oracle user's profile file
2) Windows environment
Edit the registry
regedit.exe -> HKEY_LOCAL_MACHINE -> SOFTWARE -> ORACLE -> (key of the Oracle home) -> NLS_LANG
Or set it in a command window:
set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
Viewing and modifying the Oracle character set