Character Set problems during Oracle dmp import/export and Database Operations

Last Update:2018-06-04 Source: Internet

Author: User

Original article title: Oracle _ character set (association between database and client Character Set)

// Time: 2013-07-07

// Author: shm

// Description: This article mainly records the Character Set Problem of the Oracle database, and also involves the association between the CentOS or Windows character set as the server operating system and the Oracle character set.

I am writing about Oracle character sets for two reasons. First, at work I needed to restore a DMP file into a database, and the DMP file's character set was US7ASCII. Second, I have been learning CentOS: I can install Oracle on it successfully, but sorting out issues such as Chinese versus English system character sets is a bit of a headache. So I went back to Guoqiang's book and watched Shangguan's videos, and finally wrote this article as a set of notes.

Character sets in Oracle databases are not a big problem, but they are a headache, for three reasons. First, the character set of an Oracle database cannot simply be changed once it has been specified during installation. Second, character set problems involve access between the server and the client. Third, Oracle database migration is also closely tied to character sets.

To sort out Oracle character set issues, we first need to be clear about the architecture in which the database operates, the character set settings within that architecture, and how those settings relate to each other.

[Figure: client/server architecture, with the character set settings labeled (1) to (4)]

In this figure, we separate the server from the client to illustrate the problem. The client uses an application such as sqlplus or PL/SQL to connect to the server.

The server has two character sets: the server operating system character set (4) and the server database character set (1).

The client has one character set: the client operating system character set (2).

The client has one parameter: the operating system environment variable NLS_LANG (3).

Of these three character sets and one parameter, one has no effect on how the architecture behaves: the server operating system character set (4). It will therefore not appear in the rest of this discussion.

Why is the server operating system character set irrelevant? Because the Oracle database itself validates and transcodes characters in its exchanges with the client, without any help from the operating system of the server it runs on. Here is an example of how that works.

For example, suppose there is a table in the Oracle database, created with the following statement:

CREATE TABLE test (name VARCHAR2(10));

To illustrate, assume an environment in which the character set of the Oracle database on the server is UTF8, the client operating system character set is ZHS16GBK, and the character set part of the client's NLS_LANG parameter is also ZHS16GBK.

Then the following statement is issued from a client application (such as sqlplus):

INSERT INTO test (name) VALUES ('China');

First, there is the string "China". The client operating system encodes it using ZHS16GBK, producing, say, the code "167219" (a made-up value used only for illustration), and hands it to the sqlplus program, which sends it to the Oracle database.

Next, the Oracle database receives the code "167219". It does not simply throw it into the database. Instead it asks the client: "What encoding is the code you gave me in?" The client answers: "See the NLS_LANG parameter for the encoding." The Oracle database sees that NLS_LANG specifies ZHS16GBK, which differs from its own encoding, UTF8. This is where the Oracle database shows its strength: it carries not one but many encoding tables, and it can transcode between them. The Oracle database is like a translator who speaks several languages. In this case, Oracle takes the code "167219", looks it up in the ZHS16GBK encoding table indicated by NLS_LANG to find the corresponding character "China", and then looks up "China" in the UTF8 encoding table to find its code there, say "3224678" (a made-up value).

Finally, the transcoded value "3224678" is stored in the Oracle database.
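The insert path described above can be sketched in Python. This is only an illustration under assumptions: the string is taken to be the two-character Chinese word 中国 ("China"), Python's `gbk` and `utf-8` codecs stand in for Oracle's ZHS16GBK and UTF8 encoding tables, and the function name `oracle_store` is hypothetical. Real byte values take the place of the made-up codes used in the text.

```python
# Illustrative only: Python codecs stand in for Oracle's encoding tables.
CLIENT_CHARSET = "gbk"    # stands in for ZHS16GBK (client OS and NLS_LANG)
DB_CHARSET = "utf-8"      # stands in for the database character set UTF8


def oracle_store(client_bytes, nls_lang_charset, db_charset):
    """Hypothetical model of what the database does with incoming bytes."""
    if nls_lang_charset == db_charset:
        # Same character set on both sides: bytes are stored as received.
        return client_bytes
    # Different character sets: look the bytes up in the client's table,
    # then re-encode the characters with the database's table.
    text = client_bytes.decode(nls_lang_charset)
    return text.encode(db_charset)


# The client operating system encodes the string before sqlplus sends it.
sent = "中国".encode(CLIENT_CHARSET)
stored = oracle_store(sent, CLIENT_CHARSET, DB_CHARSET)

print(sent.hex())    # d6d0b9fa
print(stored.hex())  # e4b8ade59bbd
```

The same two characters occupy four bytes on the client side and six bytes once stored, which is why the conversion step cannot be skipped when the two character sets differ.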

To illustrate further, we can execute another statement:

SELECT name FROM test;

First, the Oracle database retrieves the code "3224678" from storage.

Next, the Oracle database does not hand the code straight to the sqlplus program. It asks another question: "I have retrieved this code, and it is in UTF8. sqlplus, what encoding do you want?" sqlplus answers just as promptly: "See the NLS_LANG parameter for the encoding." The Oracle database sees that ZHS16GBK differs from UTF8. So it first consults the UTF8 encoding table to find the character "China" corresponding to the code "3224678", then consults the ZHS16GBK encoding table to find the code "167219" corresponding to "China", and hands the final code "167219" to the sqlplus program.

Finally, sqlplus passes the code straight to the client operating system. The operating system has only the ZHS16GBK encoding table; it does not ask what encoding the code is in, but simply looks up "167219" in the ZHS16GBK table and displays the result to the application: "China".

The above is a complete walk-through: data is encoded on the client, transcoded by the Oracle database on the way in, then retrieved, transcoded again, and returned to the client.
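A minimal sketch of the full round trip, again using Python codecs as stand-ins for Oracle's encoding tables and assuming the string is 中国 ("China"). The helper `transcode` and the second, UTF-8 client are hypothetical additions for illustration; they show that each client receives the data in the encoding its own NLS_LANG names.

```python
def transcode(data, source, target):
    """Convert bytes between encodings; a no-op when both are the same."""
    if source == target:
        return data
    return data.decode(source).encode(target)


DB = "utf-8"          # stands in for the database character set UTF8
GBK_CLIENT = "gbk"    # stands in for a client whose NLS_LANG says ZHS16GBK
UTF8_CLIENT = "utf-8"  # a second, hypothetical client using UTF-8

original = "中国"

# INSERT: client encodes, database transcodes at its boundary and stores.
in_database = transcode(original.encode(GBK_CLIENT), GBK_CLIENT, DB)

# SELECT from the GBK client: database transcodes back before sending.
to_gbk_client = transcode(in_database, DB, GBK_CLIENT)
# SELECT from the UTF-8 client: encodings match, so bytes pass through.
to_utf8_client = transcode(in_database, DB, UTF8_CLIENT)

print(to_gbk_client.decode(GBK_CLIENT) == original)    # True
print(to_utf8_client.decode(UTF8_CLIENT) == original)  # True
print(to_gbk_client == to_utf8_client)                 # False
```

Both clients display the same characters even though the bytes they receive differ.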

From the above process, we can draw the following conclusions:

1. Three things play a role in Oracle database access: the client operating system character set, the client operating system parameter NLS_LANG, and the server database character set.

2. The server operating system character set has no effect on the Oracle database.

3. The client operating system has only one encoding table, which corresponds to the client character set.

4. An Oracle database has only one database character set, and it is fixed once the database has been created.

5. There is only one encoding format for strings stored in the Oracle database. It is the encoding format corresponding to the character set of the database.

6. The Oracle database has many encoding tables. When storing data, it converts from other encodings into the encoding specified by the database character set; when retrieving data, it converts from the database character set's encoding into the other encoding.

7. Transcoding in the entire architecture happens only at the boundary of the Oracle database, and nowhere else.

8. Oracle compares the client operating system parameter NLS_LANG with its own character set to determine whether transcoding is required.
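The comparison in conclusion 8 can be sketched as follows. An NLS_LANG value has the form LANGUAGE_TERRITORY.CHARSET, and only the part after the dot names the character set. The function names below are hypothetical, and the sketch models only the decision, not Oracle's actual implementation.

```python
def nls_lang_charset(nls_lang):
    """Return the character set part of an NLS_LANG value.

    NLS_LANG has the form LANGUAGE_TERRITORY.CHARSET; the character set
    is whatever follows the last dot.
    """
    _, dot, charset = nls_lang.rpartition(".")
    if not dot or not charset:
        raise ValueError("no character set in NLS_LANG value: %r" % nls_lang)
    return charset.upper()


def needs_transcoding(nls_lang, db_charset):
    """True when the client and database character sets differ."""
    return nls_lang_charset(nls_lang) != db_charset.upper()


print(needs_transcoding("SIMPLIFIED CHINESE_CHINA.ZHS16GBK", "UTF8"))  # True
print(needs_transcoding("AMERICAN_AMERICA.UTF8", "UTF8"))              # False
```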

The most important conclusion is:

9. How do you choose the character set for an Oracle database? There is only one principle: the character set must cover every character the database may need to store during its lifetime. For Chinese data, ZHS16GBK is the usual choice. If you want broader coverage, choose AL32UTF8.

10. What character set should the server operating system use? This has nothing to do with the database character set. It depends only on the operating system administrator, who can choose whatever they prefer.

11. What character set should the client operating system use? I am a Chinese user running a Chinese operating system, so I chose ZHS16GBK, and I recommend ZHS16GBK to all Chinese users.

12. How should the client operating system parameter NLS_LANG be set? There is only one correct setting: its character set must match the client operating system character set. Otherwise, something will go wrong.

The most important sentence:

The safest configuration, and the one least prone to character set errors, is to give the database character set, the client operating system character set, and the client NLS_LANG parameter the same setting.

In addition, here are some notes on the character sets involved in the EXP and IMP processes.
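To see what "something goes wrong" can look like, here is a small sketch of one misconfiguration: the client operating system really encodes in GBK, but NLS_LANG claims UTF-8, the same as the database. It assumes the string 中国 ("China"), uses Python codecs as stand-ins for Oracle's encoding tables, and the helper `transcode` is hypothetical.

```python
def transcode(data, source, target):
    """Convert bytes between encodings; a no-op when both are the same."""
    if source == target:
        return data
    return data.decode(source).encode(target)


text = "中国"
DB = "utf-8"

# The client OS encodes in GBK, but NLS_LANG wrongly claims UTF-8.
real_client_bytes = text.encode("gbk")
claimed_charset = "utf-8"

# Claimed charset equals the database charset, so nothing is converted
# and the GBK bytes are stored as if they were UTF-8.
stored = transcode(real_client_bytes, claimed_charset, DB)

# The misconfigured client reads its own bytes back and sees no problem.
seen_by_same_client = transcode(stored, DB, claimed_charset).decode("gbk")

# A correctly configured UTF-8 client sees damaged text.
seen_by_utf8_client = stored.decode("utf-8", errors="replace")

print(seen_by_same_client == text)  # True
print(seen_by_utf8_client == text)  # False
```

The misconfigured client sees nothing wrong, because it reads back the very bytes it wrote. The damage only shows up when a correctly configured client, or an export, touches the data.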

With EXP, the character set of the Oracle database and the client operating system parameter NLS_LANG are what matter; the operating system character sets of the server and the client play no role. If NLS_LANG matches the database character set, the data is exported directly without transcoding, and the character set of the exported file is the same as both. If NLS_LANG differs from the database character set, the Oracle database transcodes the data from the database character set's encoding into the encoding specified by the client's NLS_LANG. In short, the character set of the exported file is always the one specified by NLS_LANG on the exporting client.

With IMP, two character sets matter: the one recorded in the second and third bytes of the DMP file, and the character set of the Oracle database. If the two are identical, no transcoding is needed. If they differ, the data is converted into the encoding specified by the database character set.

Finally, here are several problems I ran into.
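The EXP rule can be modeled in a few lines. This is a simplified sketch, not how the export utility is implemented: `export_rows` is a hypothetical function, rows are plain byte strings, and Python codecs stand in for Oracle's encoding tables.

```python
def export_rows(db_rows, db_charset, nls_lang_charset):
    """Hypothetical model of EXP.

    Returns (character set of the dump, exported rows). The dump always
    ends up in the character set named by the exporting client's NLS_LANG.
    """
    if nls_lang_charset == db_charset:
        # No transcoding: rows are written exactly as stored.
        return nls_lang_charset, list(db_rows)
    converted = [
        row.decode(db_charset).encode(nls_lang_charset) for row in db_rows
    ]
    return nls_lang_charset, converted


rows_in_db = ["中国".encode("utf-8")]

same_cs, same_rows = export_rows(rows_in_db, "utf-8", "utf-8")
gbk_cs, gbk_rows = export_rows(rows_in_db, "utf-8", "gbk")

print(same_cs, same_rows[0].hex())  # utf-8 e4b8ade59bbd
print(gbk_cs, gbk_rows[0].hex())    # gbk d6d0b9fa
```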
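Reading the character set recorded in a DMP file can be sketched as below. The byte positions come from the text above. The big-endian byte order and the small ID-to-name table are assumptions based on commonly cited Oracle character set IDs and should be verified, for example with Oracle's `NLS_CHARSET_NAME` function. The sample header is synthetic, not taken from a real dump.

```python
import io

# Commonly cited Oracle character set IDs (an assumption; verify them
# against your own database before relying on this table).
KNOWN_CHARSET_IDS = {
    1: "US7ASCII",
    852: "ZHS16GBK",
    871: "UTF8",
    873: "AL32UTF8",
}


def dmp_charset_id(stream):
    """Read the character set ID from bytes 2 and 3 of a DMP stream."""
    header = stream.read(3)
    if len(header) < 3:
        raise ValueError("file too short to contain a character set ID")
    # Bytes 2 and 3 (indexes 1 and 2), read as a big-endian integer.
    return int.from_bytes(header[1:3], "big")


# Synthetic header: bytes 2 and 3 are 0x03 0x54, that is 852.
sample = io.BytesIO(bytes([0x03, 0x03, 0x54]) + b"synthetic payload")
charset_id = dmp_charset_id(sample)

print(charset_id, KNOWN_CHARSET_IDS.get(charset_id, "unknown"))  # 852 ZHS16GBK
```

For a real file, the same function could be called on a stream opened with `open(path, "rb")`.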

1. Some time ago I tested installing Oracle 11gR2 on CentOS. I set the CentOS character set to "zh_CN.UTF-8" and installed Chinese fonts, and I did get the result I wanted: the Oracle database I installed used the Chinese character set ZHS16GBK. Why? Because by default the Oracle database character set is derived from the operating system's character set, and I accepted that default. So nothing went wrong.

But now I know that the CentOS character set on the server has no effect on the Oracle database. If I were choosing again, I would pick en_US.UTF-8, or even en_US.US7ASCII. Why? Because Chinese is hard to display correctly in the shell, so it is more convenient to manage CentOS in English, and doing so has no effect on the database. There is no need to worry about it.

2. When installing an Oracle database that will hold Chinese data on an English operating system, take your time when choosing the database character set, and deliberately select ZHS16GBK or AL32UTF8.

3. The DMP file uses the US7ASCII character set. How can it be imported into a database whose character set is ZHS16GBK? Step 1: install a database with the US7ASCII character set (such as 9i). Step 2: import the DMP file into that database. Step 3: set the client operating system parameter NLS_LANG to ZHS16GBK and export the data. Step 4: import the newly exported DMP file into the ZHS16GBK database. In theory this works, but it still needs to be tested.

4. As noted earlier, the database character set cannot be changed after the database is installed. But what if your manager insists that it must be? For example, if the original character set is ZHS16GBK and it has to become UTF8, is there a way? Yes, although it may not succeed in every case. The concept of a strict superset is relevant here, but it is not covered in this article. The approach is: set NLS_LANG on the exporting client to UTF8 and export, so that the data is transcoded from ZHS16GBK to UTF8; then drop the ZHS16GBK database, create a new UTF8 database, and import the dump.
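One reason such a migration "may not be all successful" is that Chinese characters take more bytes in UTF-8 than in GBK, so values that fit a column before the conversion may not fit after it. The sketch below illustrates this, assuming the `VARCHAR2(10)` column from the earlier example uses byte length semantics. The sample value and the helper name are hypothetical.

```python
COLUMN_LIMIT_BYTES = 10  # VARCHAR2(10), assumed to be measured in bytes


def byte_length(text, charset):
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(charset))


name = "中华人民"  # four Chinese characters, used only as sample data

gbk_len = byte_length(name, "gbk")     # 2 bytes per character
utf8_len = byte_length(name, "utf-8")  # 3 bytes per character

print(gbk_len, gbk_len <= COLUMN_LIMIT_BYTES)    # 8 True
print(utf8_len, utf8_len <= COLUMN_LIMIT_BYTES)  # 12 False
```

Checking byte lengths like this before exporting can show which columns would need to be widened in the new database.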

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
