How to change the Oracle character set to avoid garbled characters

Source: Internet
Author: User

The most common Chinese Character Set ZHS16GBK (GBK 16-bit Simplified Chinese) supports traditional Chinese and stores one Chinese Character in length. The UTF8 character set is stored in multiple bytes. One Chinese character (simplified or traditional) is sometimes stored in three characters in length.
Oracle supports Character Set changes, but UTF8 is the largest character set in Oracle, that is, UTF8 is the strict superset of ZHS16GBK.
Oracle allows the conversion from a subset to a superset, but does not allow the conversion from a subset to a subset. Generally, we recommend that you use dbca to delete the original database, re-create the database, select the correct character set, and import the data to the backup.
My solution is to back up data first, then forcibly convert the character set from UTF8 to ZHS16GBK, and then import the backup data. If not, create a new database, set the character set ZHS16GBK, and import the backup data. If this is not the case, you can change the character set from ZHS16GBK to UTF8 (which is safe), import the backup data, and restore it to the original state. In this way, the trouble of re-building a database may be avoided.

1. Back up data of all users in the database
Log On As an oracle user and run the following command:
# Export NLS_LANG = "SIMPLIFIED CHINESE_CHINA.UTF8"
It must be consistent with the database server, so that there will be no character conversion during exp export, and the original data will be backed up.
2. Evaluate the risk of converting UTF8 to ZHS16GBK
Before conversion, use the csscan tool of Oracle to scan the database and evaluate the possible damage to data before and after Character Set conversion. If the evaluation is poor, you must give up.
First, install a set of tables and processes for CSMIG users. Log on to UNIX as an oracle user,
# Sqlplus "/as sysdab"
SQL> @ $ ORACLE_HOME/rdbms/admin/csminst. SQL
SQL> exit
# $ ORACLE_HOME \ bin \ csscan-help
You can better understand how to use csscan.
# $ ORACLE_HOME/bin/csscan system/Sunday user = mmsc FROMCHAR = UTF8 TOCHAR = ZHS16GBK ARRAY = 102400 PROCESS = 3> csscan. log
The preceding command is used to scan all data in the User: mmsc, and change the conversion from UTF8 to ZHS16GBK. Then, three files are obtained: scan.txt, scan. out, And scan. err.
View scan. out, scan. err. We can see that all data under the mmsc user can be converted, and there is no conversion of "predictional", so you can be more assured.
3. Change the database character set to ZHS16GBK.
As mentioned above, the command "Alter Database Characeter Set XXXX" is used to convert from superset to subset, which is not allowed in Oracle. However, this command provides the following command methods:
Alter Database Character Set INTERNAL_CONVERT/INTERNAL_USE XXXX
This is a non-public Oracle command. "When using this command, Oracle will skip the check of all subsets and supersets and perform forced conversion between any character sets. Therefore, you must be very careful when using this command, you must be aware of the risks brought by this operation ".
Log on to UNIX as an oracle user,
# Sqlplus "/as sysdba"
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT;
SQL> alter session set SQL _TRACE = TRUE;
SQL> ALTER SYSTEM ENABLE RESTRICTED SESSION;
SQL> ALTER SYSTEM SET JOB_QUEUE_PROCESSES = 0;
SQL> ALTER SYSTEM SET AQ_TM_PROCESSES = 0;
SQL> ALTER DATABASE OPEN;
SQL> ALTER DATABASE CHARACTER SET ZHS16GBK;
// If the "INTERNAL_USE" parameter is not used, an error is prompted:
// ERROR at line 1:
/ORA-12712: new character set must be a superset of old character set
SQL> alter session set SQL _TRACE = FALSE;
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP;
Check whether the character set of the database has been changed.
SQL> select value $ from props $ where name = 'nls _ CHARACTERSET ';
VALUE $
-----------------
ZHS16GBK
Check whether the Simplified Chinese and Traditional Chinese in the database are normal and no garbled characters are displayed.
SQL> select spid, spname, spshortname from spinfovisual_hk
......
Unfortunately, I saw a bunch of garbled characters, which proves that Oracle does not support Character Set changes from superset to subset. At that time, I was very nervous and was afraid of failure, so that I could restore to the original state.
However, according to the previous verification, it is OK to import the backup under UTF8 to ZHS16GBK, so continue to try again.
4. Import backup user data
Log on to UNIX as an oracle user and delete the user mmsc in the database first:
# Sqlplus "/as sysdba"
SQL> drop user mmsc cascade;
SQL> exit
Run createuser. SQL to generate the mmsc user.
Then, use the original backup file to import it to the mmsc User:
Note: Set NLS_LANG to be consistent with the current database: ZHS16GBK. In this way, the NLS_LANG of the user session during export is UTF8, which is consistent with the original database character set. The current value is ZHS16GBK, which is consistent with the database character set at this time. In this way, the characters will be converted during the import.
# Export NLS_LANG = "SIMPLIFIED CHINESE_CHINA.ZHS16GBK"
# Imp mmsc/mmsc @ mdspdb file = DSMPD113_user_mmsc.dmp ignore = y fromuser = mmsc touser = mmsc
Check the simplified and traditional Chinese characters in the database. Haha, no garbled characters are displayed. Everything is normal.
Then the verification proves that one Chinese Character occupies only two characters in length. Problem solved!

The most common Chinese Character Set ZHS16GBK (GBK 16-bit Simplified Chinese) supports traditional Chinese and stores one Chinese Character in length. The UTF8 character set is stored in multiple bytes. One Chinese character (simplified or traditional) is sometimes stored in three characters in length.
Oracle supports Character Set changes, but UTF8 is the largest character set in Oracle, that is, UTF8 is the strict superset of ZHS16GBK.
Oracle allows the conversion from a subset to a superset, but does not allow the conversion from a subset to a subset. Generally, we recommend that you use dbca to delete the original database, re-create the database, select the correct character set, and import the data to the backup.
My solution is to back up data first, then forcibly convert the character set from UTF8 to ZHS16GBK, and then import the backup data. If not, create a new database, set the character set ZHS16GBK, and import the backup data. If this is not the case, you can change the character set from ZHS16GBK to UTF8 (which is safe), import the backup data, and restore it to the original state. In this way, the trouble of re-building a database may be avoided.

1. Back up data of all users in the database
Log On As an oracle user and run the following command:
# Export NLS_LANG = "SIMPLIFIED CHINESE_CHINA.UTF8"
It must be consistent with the database server, so that there will be no character conversion during exp export, and the original data will be backed up.
2. Evaluate the risk of converting UTF8 to ZHS16GBK
Before conversion, use the csscan tool of Oracle to scan the database and evaluate the possible damage to data before and after Character Set conversion. If the evaluation is poor, you must give up.
First, install a set of tables and processes for CSMIG users. Log on to UNIX as an oracle user,
# Sqlplus "/as sysdab"
SQL> @ $ ORACLE_HOME/rdbms/admin/csminst. SQL
SQL> exit
# $ ORACLE_HOME \ bin \ csscan-help
You can better understand how to use csscan.
# $ ORACLE_HOME/bin/csscan system/Sunday user = mmsc FROMCHAR = UTF8 TOCHAR = ZHS16GBK ARRAY = 102400 PROCESS = 3> csscan. log
The preceding command is used to scan all data in the User: mmsc, and change the conversion from UTF8 to ZHS16GBK. Then, three files are obtained: scan.txt, scan. out, And scan. err.
View scan. out, scan. err. We can see that all data under the mmsc user can be converted, and there is no conversion of "predictional", so you can be more assured.
3. Change the database character set to ZHS16GBK.
As mentioned above, the command "Alter Database Characeter Set XXXX" is used to convert from superset to subset, which is not allowed in Oracle. However, this command provides the following command methods:
Alter Database Character Set INTERNAL_CONVERT/INTERNAL_USE XXXX
This is a non-public Oracle command. "When using this command, Oracle will skip the check of all subsets and supersets and perform forced conversion between any character sets. Therefore, you must be very careful when using this command, you must be aware of the risks brought by this operation ".
Log on to UNIX as an oracle user,
# Sqlplus "/as sysdba"
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT;
SQL> alter session set SQL _TRACE = TRUE;
SQL> ALTER SYSTEM ENABLE RESTRICTED SESSION;
SQL> ALTER SYSTEM SET JOB_QUEUE_PROCESSES = 0;
SQL> ALTER SYSTEM SET AQ_TM_PROCESSES = 0;
SQL> ALTER DATABASE OPEN;
SQL> ALTER DATABASE CHARACTER SET ZHS16GBK;
// If the "INTERNAL_USE" parameter is not used, an error is prompted:
// ERROR at line 1:
/ORA-12712: new character set must be a superset of old character set
SQL> alter session set SQL _TRACE = FALSE;
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP;
Check whether the character set of the database has been changed.
SQL> select value $ from props $ where name = 'nls _ CHARACTERSET ';
VALUE $
-----------------
ZHS16GBK
Check whether the Simplified Chinese and Traditional Chinese in the database are normal and no garbled characters are displayed.
SQL> select spid, spname, spshortname from spinfovisual_hk
......
Unfortunately, I saw a bunch of garbled characters, which proves that Oracle does not support Character Set changes from superset to subset. At that time, I was very nervous and was afraid of failure, so that I could restore to the original state.
However, according to the previous verification, it is OK to import the backup under UTF8 to ZHS16GBK, so continue to try again.
4. Import backup user data
Log on to UNIX as an oracle user and delete the user mmsc in the database first:
# Sqlplus "/as sysdba"
SQL> drop user mmsc cascade;
SQL> exit
Run createuser. SQL to generate the mmsc user.
Then, use the original backup file to import it to the mmsc User:
Note: Set NLS_LANG to be consistent with the current database: ZHS16GBK. In this way, the NLS_LANG of the user session during export is UTF8, which is consistent with the original database character set. The current value is ZHS16GBK, which is consistent with the database character set at this time. In this way, the characters will be converted during the import.
# Export NLS_LANG = "SIMPLIFIED CHINESE_CHINA.ZHS16GBK"
# Imp mmsc/mmsc @ mdspdb file = DSMPD113_user_mmsc.dmp ignore = y fromuser = mmsc touser = mmsc
Check the simplified and traditional Chinese characters in the database. Haha, no garbled characters are displayed. Everything is normal.
Then the verification proves that one Chinese Character occupies only two characters in length. Problem solved!

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.