Many people have asked me about Oracle character sets. When data is migrated between databases or exchanged between systems, a character set mismatch can cause the migration to fail or leave the data in the database garbled.
Here I summarize what I know about Oracle character sets, in the hope that it helps you in your future work.
1. What is the Oracle character set?
An Oracle character set is a collection of symbols used to interpret byte data. Character sets differ in size, and some contain others as subsets. Oracle's National Language Support (NLS) architecture lets you store, process, and retrieve data in a local language. It makes database tools, error messages, sort order, dates, times, currency, numbers, and calendars adapt automatically to the local language and platform.
The most important parameter affecting character set handling in Oracle is NLS_LANG. Its format is as follows:
NLS_LANG = language_territory.charset
It has three components (language, territory, and character set), each of which controls a subset of the NLS features:
language specifies the language of server messages, territory specifies the server's date and number formats, and charset specifies the character set. For example: AMERICAN_AMERICA.ZHS16GBK
From the composition of NLS_LANG, we can see that only the third part actually affects the character set. Therefore, if the third part is the same for two databases, data can be imported and exported between them. The first two parts only determine things such as whether messages are displayed in Chinese or English.
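The three-part structure can be made concrete with a small parser. This is a minimal Python sketch, not part of any Oracle tool: the names NlsLang and parse_nls_lang are hypothetical, and it assumes the value follows the language_territory.charset layout shown above, with any part possibly missing.

```python
from typing import NamedTuple, Optional


class NlsLang(NamedTuple):
    language: Optional[str]
    territory: Optional[str]
    charset: Optional[str]


def parse_nls_lang(value: str) -> NlsLang:
    """Split a language_territory.charset string into its three parts."""
    # Everything after the first "." is the character set.
    head, _, charset = value.strip().partition(".")
    # Everything before the first "_" is the language, the rest the territory.
    language, _, territory = head.partition("_")
    # Missing parts are reported as None instead of empty strings.
    return NlsLang(language or None, territory or None, charset or None)


parts = parse_nls_lang("AMERICAN_AMERICA.ZHS16GBK")
print(parts.charset)  # ZHS16GBK
```

Only the charset field matters when deciding whether two settings are compatible for import and export; the other two fields affect messages and formats.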
2. How to query Oracle character sets
Many people have run into data import failures caused by differing character sets. Three character sets are involved: the character set of the Oracle server, the character set of the Oracle client, and the character set of the dmp file. During an import, all three must be consistent for the data to be imported correctly.
1. Query the character set of the Oracle server
There are many ways to find the character set of the Oracle server. The most direct query is: SQL> select userenv('language') from dual;
The result looks like this:
- AMERICAN_AMERICA.ZHS16GBK
2. Query the character set of the dmp file
A dmp file exported with Oracle's exp tool also contains character set information: the 2nd and 3rd bytes of the file record its character set. If the dmp file is not large, for example a few MB or a few dozen MB, you can open it in UltraEdit (in hexadecimal mode) and read the 2nd and 3rd bytes, for example 0354. Then use the following SQL statement to find the corresponding character set:
- SQL> select nls_charset_name(to_number('0354','xxxx')) from dual;
- ZHS16GBK
If the dmp file is large, for example 2 GB or more (which is the most common case), a text editor opens it very slowly or not at all. In that case you can use the following command on a unix host:
- od -An -tx1 -N3 exp.dmp | awk '{print $2 $3}'
This reads only the first three bytes of the file, one byte at a time, so the result does not depend on the byte order of the host. Then use the preceding SQL statement to obtain the corresponding Oracle character set.
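The same lookup can be done from a script, which works regardless of file size because only the first three bytes are read. This is a minimal Python sketch under the assumption stated above, that the ID is stored in the 2nd and 3rd bytes with the high byte first; the function names are hypothetical.

```python
def read_dmp_charset_id(path):
    """Return the character set ID stored in bytes 2-3 of an exp dump file."""
    with open(path, "rb") as f:
        header = f.read(3)  # only the first three bytes are needed
    if len(header) < 3:
        raise ValueError("file is too short to be an exp dump")
    # Bytes 2 and 3 (counting from 1) hold the ID, high byte first, e.g. 03 54.
    return int.from_bytes(header[1:3], "big")


def format_charset_id(charset_id):
    """Format the ID as four hex digits, ready for to_number('....','xxxx')."""
    return format(charset_id, "04x")
```

For a ZHS16GBK dump, format_charset_id(read_dmp_charset_id("exp.dmp")) would give 0354, which can be passed to the nls_charset_name query shown earlier.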
3. Query the character set of the Oracle client
This is relatively simple. On Windows, it is the NLS_LANG value for the corresponding Oracle home in the registry. You can also set it in a DOS window, for example:
- set nls_lang=AMERICAN_AMERICA.ZHS16GBK
In this way, only the environment variables in this window are affected.
On unix platforms, the environment variable NLS_LANG is used.
- $ echo $NLS_LANG
- AMERICAN_AMERICA.ZHS16GBK
If the check shows that the server and client character sets are inconsistent, change the client setting to match the server's character set.
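Once the three values have been collected, the consistency check itself is mechanical. The following Python sketch is one hypothetical way to do it: the function names are illustrative, the server value is assumed to be the output of the userenv('language') query, and the dmp character set is assumed to have been resolved to a name already.

```python
import os


def charset_of(nls_lang):
    """Return the upper-cased charset part of an NLS_LANG-style string, or None."""
    _, dot, charset = nls_lang.strip().partition(".")
    return charset.upper() if dot and charset else None


def find_mismatches(server_lang, dmp_charset, client_lang=None):
    """Return the sources whose character set differs from the server's."""
    if client_lang is None:
        # Fall back to the client's environment variable.
        client_lang = os.environ.get("NLS_LANG", "")
    seen = {
        "server": charset_of(server_lang),
        "client": charset_of(client_lang),
        "dmp file": dmp_charset.upper(),
    }
    reference = seen["server"]
    return {name: cs for name, cs in seen.items() if cs != reference}
```

An empty result means all three agree. A result such as {"client": "US7ASCII"} points at the setting to change before running the import.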
3. Modify the character set of Oracle
As mentioned above, Oracle character sets can contain one another. For example, US7ASCII is a subset of ZHS16GBK, so going from US7ASCII to ZHS16GBK causes no data interpretation problems or data loss. UTF8 should be the largest of all the character sets because it is based on Unicode. It is a variable-width encoding in which a Chinese character typically takes three bytes, compared with two in ZHS16GBK, so it occupies more space.
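The subset relationship and the storage cost can be illustrated with Python's built-in codecs. This is only an approximation: the codec names ascii, gbk, and utf-8 are assumed here as stand-ins for US7ASCII, ZHS16GBK, and UTF8, and Oracle's own implementations may differ in detail.

```python
def encoded_length(text, codec):
    """Number of bytes needed to store text in the given codec."""
    return len(text.encode(codec))


def representable(text, codec):
    """True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Subset to superset: ASCII bytes mean the same thing when read as GBK.
ascii_bytes = b"ORACLE 123"
same_text = ascii_bytes.decode("ascii") == ascii_bytes.decode("gbk")

# Superset to subset loses data: a Chinese character has no ASCII form.
fits_in_ascii = representable("中", "ascii")

# Storage cost of one Chinese character in each encoding.
gbk_bytes = encoded_length("中", "gbk")
utf8_bytes = encoded_length("中", "utf-8")
```

Here same_text is True and fits_in_ascii is False, while the character takes 2 bytes in gbk and 3 bytes in utf-8.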
Once a database has been created, its character set in principle cannot be changed. It is therefore important to decide which character set to use at design and installation time. According to Oracle's official documentation, character set conversion is supported from a subset to a superset, but not the other way around. If two character sets have no subset and superset relationship, Oracle does not support converting between them.
On a database server, an incorrect character set change can have many unpredictable consequences and may seriously affect the normal operation of the database. Before making the change, you must therefore confirm that the two character sets are related as subset and superset. In general, we do not recommend changing the character set of the Oracle database server unless you have no other choice.
In particular, the two most commonly used character sets, ZHS16GBK and ZHS16CGB231280, do not have a subset and superset relationship, so in theory conversion between them is not supported.
1. Modify the server character set (not recommended)
Before Oracle 8, you could change the database character set by directly modifying the data dictionary table props$. From Oracle 8 onward, however, at least three system tables record the database character set, so modifying only props$ is incomplete and may have serious consequences. The correct procedure is as follows:
- $sqlplus /nolog
- SQL>conn / as sysdba;
If the database server has been started, run the shutdown immediate command to shut down the database server, and then run the following command:
- SQL>STARTUP MOUNT;
- SQL>ALTER SYSTEM ENABLE RESTRICTED SESSION;
- SQL>ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
- SQL>ALTER SYSTEM SET AQ_TM_PROCESSES=0;
- SQL>ALTER DATABASE OPEN;
- SQL>ALTER DATABASE CHARACTER SET ZHS16GBK;
- SQL>ALTER DATABASE NATIONAL CHARACTER SET ZHS16GBK;
- SQL>SHUTDOWN IMMEDIATE;
- SQL>STARTUP
2. Modify the dmp file Character Set
As mentioned above, the 2nd and 3rd bytes of the dmp file record the character set information. You can therefore modify these two bytes directly to 'cheat' Oracle's check. In theory this is safe only when going from a subset to a superset, but in many cases it also works without such a relationship. For commonly used character sets such as US7ASCII, WE8ISO8859P1, ZHS16CGB231280, and ZHS16GBK, the change can generally be made. Because only the dmp file is changed, the impact is small.
There are many ways to make the change. The simplest is to edit the 2nd and 3rd bytes of the dmp file directly in UltraEdit. For example, to change the character set of the dmp file to ZHS16GBK, use the following SQL statement to find the hexadecimal code for that character set:
- SQL> select to_char(nls_charset_id('ZHS16GBK'), 'xxxx') from dual;
- 0354
Then change the 2nd and 3rd bytes of the dmp file to 03 54.
If the dmp file is too large to open in UltraEdit, you need to do it with a program. Some people on the internet have written the conversion program as a Java stored procedure (the advantage is good portability, the drawback is that it is relatively cumbersome). I have tested it successfully on Windows. Note that the Oracle database must have the JVM option installed. If you are interested, you can study the program code.
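The same two-byte change can also be made outside the database. The following is a minimal Python sketch, not the Java stored procedure mentioned above. The function name set_dmp_charset_id is hypothetical, and the sketch assumes the ID sits in the 2nd and 3rd bytes with the high byte first, as described earlier. Work on a copy of the dump file, since the file is modified in place.

```python
def set_dmp_charset_id(path, new_id):
    """Overwrite bytes 2-3 of a dump file with new_id and return the old ID."""
    if not 0 <= new_id <= 0xFFFF:
        raise ValueError("character set ID must fit in two bytes")
    # "r+b" updates the file in place without reading it into memory,
    # so the size of the dump does not matter.
    with open(path, "r+b") as f:
        f.seek(1)  # offset 1 is the 2nd byte of the file
        old = f.read(2)
        if len(old) < 2:
            raise ValueError("file is too short to be an exp dump")
        f.seek(1)
        f.write(new_id.to_bytes(2, "big"))
    return int.from_bytes(old, "big")
```

For example, set_dmp_charset_id("exp_copy.dmp", 0x0354) would mark the copy as ZHS16GBK and return the ID it carried before, which is worth recording in case the change has to be undone.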