Character set issues in Oracle DMP import/export, database operations, and more
Preface: While tracking down garbled text after importing a DMP file, I went down a long path of reading about Oracle character sets. There are countless articles on the subject online, but this one is my favorite: it is illustrated, its examples are apt and easy to follow, and it is exactly what I needed.
Reposted from: http://blog.163.com/jiankun_liu/blog/static/1863927762013698175289/
Original title: Oracle character set issues (how the database and client character sets relate)
Time: 2013-07-07
Author: SHM
Description: This article records character set issues in Oracle databases. It also covers how the character set of the server operating system (CentOS or Windows) relates to the Oracle character set.
I am raising the topic of Oracle's character set for two reasons. First, at work I ran into a DMP file that had to be restored into a database, and the DMP file's character set was US7ASCII. Second, I have been learning CentOS, and although I can now install Oracle on it successfully, questions such as Chinese versus English system character sets kept causing trouble. So I leafed through Gai Guoqiang's book again, watched some videos, and finally gained a little experience, which I am writing down here as notes.
The character set of an Oracle database is not a big problem, but it is a headache, for three reasons: first, once the database is installed its character set generally cannot be changed; second, the character set affects how clients access the server; third, character sets matter a great deal when migrating an Oracle database.
To clarify the issues around the Oracle character set, we first need to sort out the architecture in which the database runs and how the character sets and settings fit into that architecture.
Let's start with a diagram:
In this diagram, to illustrate the problem, we separate the server from the client. The client connects to the server through an application such as SQL*Plus or a PL/SQL tool.
The server has two character sets: the server operating system character set (4) and the server-side database character set (1).
The client has one character set: the client operating system character set (2).
The client has one parameter: the operating system parameter NLS_LANG (3).
Of these three character sets and one parameter, one character set has no effect on how the architecture operates: the server operating system character set (4). It will not appear again in our discussion.
Why is the server operating system character set irrelevant? Because when Oracle stores and retrieves characters and exchanges them with the client, character set recognition and transcoding are done by the Oracle database itself, with no help from the server on which the database resides. The following example shows exactly what happens.
For example, there is a table in the Oracle database that is created with the following statement:
CREATE TABLE test (name VARCHAR2(10));
To illustrate the problem, assume an environment in which the server-side Oracle database character set is UTF8, the client operating system character set is ZHS16GBK, and the character set in the client NLS_LANG parameter is ZHS16GBK.
Then send a command like this from a client application such as SQL*Plus:
INSERT INTO test (name) VALUES ('中国');
First, the client operating system encodes the string "中国" ("China") with ZHS16GBK, producing a code, say "167219", and hands it to SQL*Plus, which sends it to the Oracle database.
Next, when the Oracle database receives the code "167219", it does not simply store it and call it done. It asks the client operating system: "What encoding is the code you gave me in?" The client operating system answers: "For the encoding, please refer to the NLS_LANG parameter." The Oracle database looks and sees NLS_LANG = 'ZHS16GBK', which differs from its own encoding, UTF8, so it puts its special skill to work. What skill? The Oracle database holds not one code table but many, and it can translate from one to another. It is like an interpreter who speaks several languages, which is quite impressive. In this case, Oracle looks up "167219" in the ZHS16GBK code table named by NLS_LANG and finds the characters "中国", then looks up "中国" in the UTF8 code table and finds the corresponding code, say "3224678".
Finally, the transcoded value "3224678" is stored in the Oracle database.
To illustrate further, let's execute another statement:
SELECT name FROM test;
First, the Oracle database fetches the code "3224678" from storage.
Next, the Oracle database does not hand the code straight to SQL*Plus. It asks: "I have fetched the code, and it is UTF8-encoded. SQL*Plus, what encoding do you want?" SQL*Plus promptly replies: "Refer to the encoding in the NLS_LANG parameter again." The Oracle database sees that ZHS16GBK differs from UTF8, so it first looks up "3224678" in the UTF8 code table and finds the characters "中国" ("China"), then looks up "中国" in the ZHS16GBK code table and finds the code "167219", and finally hands "167219" to SQL*Plus.
Finally, SQL*Plus passes the code straight to the client operating system. The operating system has only the ZHS16GBK code table, so it does not ask what the encoding is; it simply looks up "167219" in the ZHS16GBK code table and gives the resulting characters to the application to display. The result shown is "中国".
That completes the walkthrough: data is encoded on the client, transcoded by the Oracle database and stored, then fetched, transcoded back, and displayed on the client.
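The round trip above can be imitated in a few lines of Python. This is only a sketch of the idea, not Oracle's implementation: Python's `gbk` and `utf-8` codecs stand in for ZHS16GBK and UTF8, the helper name `transcode` is made up for illustration, and real byte values take the place of the placeholder codes "167219" and "3224678" used in the text.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one character set to another.

    Skips the conversion when both sides name the same character set,
    which mirrors the comparison against NLS_LANG described in the text.
    """
    if source.lower() == target.lower():
        return data
    return data.decode(source).encode(target)


CLIENT_CHARSET = "gbk"      # stand-in for NLS_LANG = ZHS16GBK (assumption)
DATABASE_CHARSET = "utf-8"  # stand-in for a UTF8 database (assumption)

# INSERT: the client hands over GBK bytes, the database stores UTF-8 bytes.
sent = "中国".encode(CLIENT_CHARSET)
stored = transcode(sent, CLIENT_CHARSET, DATABASE_CHARSET)

# SELECT: the database converts back before returning data to the client.
returned = transcode(stored, DATABASE_CHARSET, CLIENT_CHARSET)
shown = returned.decode(CLIENT_CHARSET)

print(sent.hex(), stored.hex(), shown)
```

Running it prints `d6d0b9fa e4b8ade59bbd 中国`: four bytes on the client side, six bytes in storage, and the original text again after the return trip.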
From this process we can draw the following conclusions:
1. Three things matter for Oracle database access: the client operating system character set, the client operating system parameter NLS_LANG, and the server-side database character set.
2. The server operating system character set plays no role for an Oracle database.
3. The client operating system has only one code table, the one for the client character set.
4. An Oracle database has exactly one character set, which is fixed and generally does not change.
5. Strings stored in an Oracle database have only one encoding, the one that corresponds to the database character set.
6. The Oracle database holds many code tables. When data is stored, it converts other encodings to the one specified by the database character set; when data is retrieved, it converts from the database character set to the other encoding.
7. Transcoding in the whole architecture occurs only at the Oracle database boundary, nowhere else.
8. Oracle decides whether transcoding is needed by comparing the client operating system parameter NLS_LANG with its own character set.
The most important conclusions follow:
9. How should the Oracle database character set be chosen? There is only one principle: the character set must contain every character the database may need to store while it is running. As Chinese users we usually choose ZHS16GBK; to be a little safer, choose AL32UTF8.
10. Which character set should the server operating system use? This character set has nothing to do with the database character set. It matters only to the operating system administrator, so the principle is that the system administrator chooses whatever they prefer.
11. Which character set should the client operating system use? I am Chinese and use a Chinese operating system, so I choose ZHS16GBK. Chinese users are advised to choose ZHS16GBK.
12. How should the client operating system parameter NLS_LANG be set? There is only one way: the same as the client operating system character set. Otherwise there will be problems.
The most important point of all:
The best setup, and the one least prone to character set errors, is to use the same setting in three places: the database character set, the client character set, and the client operating system parameter NLS_LANG.
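To see why a mismatch causes trouble, consider a client whose operating system really produces GBK bytes while NLS_LANG claims UTF8, the same as the database. By the rule that Oracle compares NLS_LANG with its own character set, the database sees matching settings and stores the bytes untouched. The Python sketch below imitates that case under those assumptions; the codec names are stand-ins for the Oracle character sets and the function name `store` is hypothetical.

```python
def store(data: bytes, declared: str, database: str) -> bytes:
    """Imitate the database boundary: convert only when the declared
    client character set differs from the database character set."""
    if declared == database:
        return data  # settings match, so the bytes are stored as-is
    return data.decode(declared).encode(database)


actual_client_encoding = "gbk"  # what the client OS really produces
database_charset = "utf-8"      # stand-in for the database character set

raw = "中国".encode(actual_client_encoding)

# Correct setting: NLS_LANG matches the client OS, so conversion happens.
good = store(raw, declared="gbk", database=database_charset)

# Wrong setting: NLS_LANG claims UTF-8, so GBK bytes are stored unconverted.
bad = store(raw, declared="utf-8", database=database_charset)

good_text = good.decode(database_charset)
bad_text = bad.decode(database_charset, errors="replace")

print(good_text)  # readable text
print(bad_text)   # garbled text containing replacement characters
```

The stored bytes in the second case are still GBK, so any reader that trusts the database character set gets garbled output.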
Character sets also come into play during exp and imp, so here are some notes on that.
exp: Two things matter here, the Oracle database character set and the client operating system parameter NLS_LANG; the server and client operating system character sets play no role. If NLS_LANG is the same as the database character set, data is exported directly without transcoding, and the exported file has the same character set as both. If NLS_LANG differs from the database character set, the Oracle database converts the data from the database character set to the encoding specified by NLS_LANG. In a word: the character set of the exported file is always the one specified by the exporting client's NLS_LANG.
imp: Again two things matter. One is the DMP file, whose second and third bytes specify its character set; the other is the character set of the Oracle database. If the two are identical, no transcoding takes place; if they differ, the data is converted to the encoding specified by the database character set.
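Since the second and third bytes of a DMP file carry the character set, they can be inspected before running imp. The Python sketch below reads them as a big-endian number and looks it up in a small table. The IDs listed are the ones commonly reported for these character sets; treat them as assumptions and confirm them on your own system, for example with `SELECT NLS_CHARSET_NAME(852) FROM dual;`. The function names and the sample header bytes are hypothetical.

```python
# Character set IDs as commonly reported (assumption); verify them
# with NLS_CHARSET_NAME on your own database before relying on them.
KNOWN_CHARSETS = {
    0x0001: "US7ASCII",
    0x0354: "ZHS16GBK",
    0x0367: "UTF8",
    0x0369: "AL32UTF8",
}


def dmp_charset(header: bytes) -> str:
    """Return the character set named by bytes 2 and 3 of a DMP header."""
    if len(header) < 3:
        raise ValueError("need at least the first three bytes of the file")
    charset_id = int.from_bytes(header[1:3], byteorder="big")
    return KNOWN_CHARSETS.get(charset_id, f"unknown (id {charset_id})")


def dmp_charset_of_file(path: str) -> str:
    """Read only the first three bytes of the file at `path`."""
    with open(path, "rb") as dump:
        return dmp_charset(dump.read(3))


# Header bytes made up for the demo; a real file would be read from disk.
print(dmp_charset(bytes([0x03, 0x03, 0x54])))
```

For a real export, `dmp_charset_of_file` would be called with the path to the DMP file; only the first three bytes are read, so it is safe to run on very large dumps.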
Finally, a few questions I want to record.
1. Some time ago I tested Oracle 11gR2 on CentOS. When I set the CentOS character set to "zh_CN.UTF-8" and installed Chinese fonts, I did get the result I wanted: the Oracle database I installed used the Chinese character set ZHS16GBK. Why? Because by default the Oracle database character set is derived from the operating system character set, and I accepted the default. So nothing went wrong.
But now I know that the CentOS character set on the server has no effect on the Oracle database. If I could go back and choose again, I would choose en_US.UTF-8, or even en_US.US7ASCII. Why? Because displaying Chinese correctly in the shell is a real nuisance, so managing CentOS in English is more convenient and has no impact on the database. I will leave it at that.
2. When installing an Oracle database with a Chinese character set on an English operating system, take care to select ZHS16GBK or AL32UTF8 as the database character set.
3. The DMP file uses the US7ASCII character set. How can it be imported into a ZHS16GBK database? Step 1: install a database with the US7ASCII character set (for example, 9i). Step 2: import the DMP file into that database. Step 3: set the exporting client's operating system parameter NLS_LANG to ZHS16GBK, then export. Step 4: import the newly exported DMP file into the ZHS16GBK database. This should work in theory, but it still needs to be tested experimentally.
4. As said earlier, the character set of a database generally cannot be changed once the database is installed. But what if the boss insists? For example, the original character set is ZHS16GBK and it must become UTF8. Is there a way? The answer is yes, though not in every case; this depends on the concept of a strict superset, which this article does not cover. The approach is to set the exporting client's operating system parameter NLS_LANG to UTF8 and export, at which point the data is converted from ZHS16GBK to UTF8. Then drop the ZHS16GBK database, create a new UTF8 database, and import the file.
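The superset idea mentioned above can be illustrated outside Oracle. In this Python sketch, which is an analogy rather than Oracle's own check, the `gbk` and `utf-8` codecs stand in for ZHS16GBK and a Unicode database character set, and the helper `fits` is a made-up name. Text that fits in GBK always fits in UTF-8, but the reverse does not hold, which is why converting toward the larger character set is the safe direction.

```python
def fits(text: str, charset: str) -> bool:
    """Report whether every character of `text` can be encoded in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


chinese = "中国"
emoji = "\U0001F600"  # a character outside the GBK repertoire

print(fits(chinese, "gbk"), fits(chinese, "utf-8"))  # both succeed
print(fits(emoji, "gbk"), fits(emoji, "utf-8"))      # only UTF-8 succeeds

# Moving GBK data to UTF-8 keeps the text but changes the byte length,
# so byte-sized columns such as VARCHAR2(10) may need more room.
print(len(chinese.encode("gbk")), len(chinese.encode("utf-8")))
```

A check like this, run over a sample of the data before exporting, gives an early hint of whether the target character set can hold everything.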