Character Set Problems During Oracle Data Export and Import

Http://www2.ccw.com.cn/02/0224/c/0224c05_1.asp

Wang Quanhai, Northeastern University, Qinhuangdao Branch

Every Oracle database user is familiar with the Export and Import commands, because they are the tools we use most often for data backup and recovery. However, character set problems that arise while using these two commands often cause Oracle users needless trouble and avoidable data loss. This article summarizes the rules Oracle follows for character set conversion during export and import, along with the precautions to take when using the two commands.

Reasons for character set conversion

Looking at the export and import processes, we can see that four character sets are involved. Any inconsistency among them is what causes Oracle to convert between character sets:

  • Source database character set;
  • User session character set during the export process;
  • User session character set during import;
  • The character set of the target database.

If these four character sets are not identical, Oracle may convert between them during export and import, as follows:

  1. During export, if the source database character set differs from the export user session character set, a conversion takes place, and the ID number of the export user session character set is stored in a few bytes of the header of the exported binary DMP file. Data may be lost during this conversion.

    Example 1:
    Suppose the source database uses ZHS16GBK and the export user session uses US7ASCII. ZHS16GBK is a double-byte character set and US7ASCII is a 7-bit single-byte character set, so Chinese characters have no equivalent in US7ASCII. During the conversion every Chinese character is lost and replaced with "?", which means the resulting DMP file has already lost data.

    Example 2:
    Suppose the source database uses ZHS16GBK and the export user session uses ZHS16CGB231280. ZHS16GBK is a superset of ZHS16CGB231280, so most characters convert correctly; only the characters that fall outside ZHS16CGB231280 are replaced with "?". Conversely, if the source database uses ZHS16CGB231280 and the export user session uses ZHS16GBK, every character converts correctly.
  2. During import, the conversion between the import user session character set and the target database character set follows the same rules as the export case in the opposite direction, so it is not described again here.
  3. The exported DMP file records the export user session character set. During import, data is converted from the DMP file character set (that is, the export user session character set) to the import user session character set. If this conversion cannot be completed correctly, the import into the target database cannot be completed.
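
The two export examples can be imitated outside Oracle with a short Python sketch. It only approximates the behavior: the Python codecs "gbk", "gb2312", and "ascii" are assumed stand-ins for ZHS16GBK, ZHS16CGB231280, and US7ASCII, and the helper name `convert` is hypothetical.

```python
# Python codec names stand in for Oracle character sets (an approximation):
#   ZHS16GBK -> "gbk", ZHS16CGB231280 -> "gb2312", US7ASCII -> "ascii"

def convert(text: str, target_codec: str) -> str:
    """Convert text to target_codec, replacing unmappable characters with '?'."""
    return text.encode(target_codec, errors="replace").decode(target_codec)

# Example 1: GBK data exported through a US7ASCII session.
# Every Chinese character is replaced; the ASCII part survives.
print(convert("Oracle 东北大学", "ascii"))   # Oracle ????

# Example 2: GBK data exported through a GB2312 session.
# 镕 exists in GBK but not in GB2312, so only that character is lost.
print(convert("朱镕基", "gb2312"))           # 朱?基

# Reverse direction: GB2312 data fits entirely inside GBK.
print(convert("东北大学", "gbk"))            # 东北大学
```

Once a character has become "?", no later conversion can bring it back, which is why the loss in the DMP file is permanent.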

Correctly convert character sets

In general we do not want any character conversion to happen when using Oracle's Export and Import, but sometimes a conversion is exactly what we need. For example, suppose ZHS16CGB231280 was selected when the Oracle database was installed. Because this is a small Chinese character set, some Chinese characters cannot be represented correctly, and the data has to be converted to the ZHS16GBK character set.

To make sure that the Oracle character set is either not converted at all or converted correctly during export and import, we recommend checking the following before you start: whether the source database character set matches the export user session character set, whether the source database character set matches the target database character set, and whether the target database character set matches the import user session character set. If all four character sets are the same, no character set conversion takes place during export or import.
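
The pre-flight check described above can be written down as a small helper. This is a minimal sketch: the function name `conversion_points` and the stage labels are hypothetical, and the four character set names are assumed to have been collected by hand beforehand.

```python
def conversion_points(source_db, export_session, import_session, target_db):
    """Return the export/import stages at which a character set conversion would occur."""
    stages = [
        ("source database -> export session", source_db, export_session),
        ("DMP file -> import session", export_session, import_session),
        ("import session -> target database", import_session, target_db),
    ]
    # A stage converts only when the character sets on its two sides differ.
    return [name for name, a, b in stages if a.upper() != b.upper()]

# All four character sets agree: no conversion anywhere.
print(conversion_points("ZHS16GBK", "ZHS16GBK", "ZHS16GBK", "ZHS16GBK"))  # []

# The export session differs from everything else: two stages convert.
print(conversion_points("ZHS16GBK", "US7ASCII", "ZHS16GBK", "ZHS16GBK"))
```

An empty list corresponds to the case in which all four character sets are consistent.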

You can use the following methods to check the database character set:

  1. Look in the initxxxx.ora file;
  2. Query it with an SQL statement:
    SELECT name, value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET';

The export and import user session character sets are determined by NLS_LANG. On Windows you can view or modify NLS_LANG in the registry; on Unix systems you can view or modify the user's NLS_LANG environment variable.
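
An NLS_LANG value has the form language_territory.charset, for example AMERICAN_AMERICA.ZHS16GBK, so the session character set is the part after the dot. The sketch below reads it from the environment; the helper name `session_charset` is hypothetical, and on Windows the value may live in the registry rather than in the environment, which this sketch does not cover.

```python
import os

def session_charset(nls_lang):
    """Return the character set part of an NLS_LANG value, or None if it has none."""
    if not nls_lang or "." not in nls_lang:
        return None
    # The character set is everything after the last dot.
    return nls_lang.rsplit(".", 1)[1].upper() or None

print(session_charset("AMERICAN_AMERICA.ZHS16GBK"))  # ZHS16GBK

# Character set of the current session, if NLS_LANG is set in the environment.
print(session_charset(os.environ.get("NLS_LANG")))
```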

Note that the character set of an Oracle database is normally fixed when the database is created. Once user data has been stored, the character set should not be changed: the data was stored using that character set, and after switching to another one the original data can no longer be represented correctly. If you do need to change the character set, you can follow these steps:

  1. Back up the database and delete the original data (a physical backup can be used; if you use Export, make sure that no character set conversion occurs or that no data is lost);
  2. As the INTERNAL user, update the character set in the sys.props$ table, where dest.charset stands for the name of the new character set:
    UPDATE sys.props$ SET value$ = 'dest.charset' WHERE name = 'NLS_CHARACTERSET';
    COMMIT;
  3. Restart the database;
  4. Restore the data.

The following character set conversions are feasible:

  1. Converting from a subset to its superset is feasible, for example ZHS16CGB231280 to ZHS16GBK. Converting from a superset to a subset loses part of the data.
  2. A double-byte character set whose data contains only English characters can also be converted to a single-byte character set. For example, ZHS16GBK data that is English only converts correctly to US7ASCII.
  3. Single-byte character sets with the same encoding range can be converted to each other.

Note that "no data loss" here means that after data is converted from character set A to character set B, either it can be converted back from B to A, or character set B can correctly represent the data that came from character set A.
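
The round-trip meaning of "no data loss" can be tested directly. The sketch below is an illustration under assumptions: Python codecs "gbk", "gb2312", and "ascii" stand in for ZHS16GBK, ZHS16CGB231280, and US7ASCII, and the function name `survives_round_trip` is hypothetical.

```python
def survives_round_trip(text, codec_a, codec_b):
    """True if text stored in codec_a can be converted to codec_b and back unchanged."""
    raw_a = text.encode(codec_a)                      # the data as stored in character set A
    in_b = raw_a.decode(codec_a).encode(codec_b, errors="replace")
    back = in_b.decode(codec_b).encode(codec_a, errors="replace")
    return back == raw_a

# Subset to superset: nothing is lost.
print(survives_round_trip("东北大学", "gb2312", "gbk"))   # True

# Superset to subset: 镕 is not in GB2312 and becomes "?".
print(survives_round_trip("朱镕基", "gbk", "gb2312"))     # False

# Double-byte character set holding English-only data, converted to 7-bit ASCII.
print(survives_round_trip("Oracle", "gbk", "ascii"))      # True
```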

Effect of character set on programs

Character sets can be divided into single-byte and multi-byte character sets according to the number of bytes needed to represent one character. Single-byte character sets are further divided into 7-bit and 8-bit character sets. US7ASCII is a single-byte 7-bit character set, and the single-byte 8-bit character sets include the ISO-conforming WE8ISO8859P1 and others. Multi-byte encodings are divided into fixed-width (width of 2 bytes or more) and variable-width encodings. Among the multi-byte character sets, ZHS16GBK, ZHS16CGB231280, and JA16SJIS use two bytes to represent a character and are also known as double-byte character sets.

How many characters is a Chinese character, and how many is an English letter? We know that a Chinese character occupies two bytes, but how many characters it counts as depends on the database character set. If the database uses the single-byte US7ASCII, one Chinese character counts as two characters; if it uses the double-byte ZHS16GBK, one Chinese character counts as one character. Oracle's SUBSTR function can be used to demonstrate this.

When US7ASCII is used:

SELECT SUBSTR('东北大学', 1, 2) FROM dual;

The statement returns '东', the first Chinese character of 东北大学 (Northeastern University), because the two "characters" requested are only two bytes.

When the ZHS16GBK character set is used:

SELECT SUBSTR('东北大学', 1, 2) FROM dual;

The statement returns '东北', the first two Chinese characters.
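
The byte-versus-character counting behind the SUBSTR comparison can be imitated in Python. This is a rough simulation, not Oracle's implementation: the helper `substr` and its `single_byte_db` flag are hypothetical, and the sample string 东北大学 ("Northeastern University", four Chinese characters) is assumed to be the text used in the statements above.

```python
def substr(text, start, length, single_byte_db):
    """Mimic SUBSTR(text, start, length) for text whose bytes are GBK-encoded.

    A single-byte database treats every byte as one character;
    a double-byte (GBK) database counts whole characters.
    """
    if single_byte_db:
        raw = text.encode("gbk")                       # two bytes per Chinese character
        piece = raw[start - 1:start - 1 + length]      # cut by byte position
        return piece.decode("gbk", errors="replace")
    return text[start - 1:start - 1 + length]          # cut by character position

print(substr("东北大学", 1, 2, single_byte_db=True))    # 东
print(substr("东北大学", 1, 2, single_byte_db=False))   # 东北
```

With an odd length, the single-byte case would cut a Chinese character in half, which is one way such data ends up garbled.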



Select the appropriate database character set

When selecting a database character set, consider the following:

1. Languages supported by the database

When selecting a character set for a database, you will often find that several character sets satisfy your current language requirements. For Simplified Chinese, for example, both ZHS16GBK and ZHS16CGB231280 are available. Which one should you choose? The future requirements of the system should be taken into account. If you know that the database will later be extended to support other languages, choosing a character set with a wider range is the better idea.

2. Interaction between system resources and applications

The selected database character set should connect seamlessly with the operating system and the applications. If the selected character set is not available on the operating system, the system has to convert characters between the two, and some characters may be lost in the process. When converting from character set A to character set B, each character in A must have an equivalent in B; otherwise it is replaced with "?". In this sense, two character sets with the same encoding range can be converted to each other.

Character set conversion may also affect system performance. Therefore, make sure that the client and the server use the same character set, which avoids conversion and improves performance.

3. System performance requirements

The choice of database character set has some impact on database performance. For the best performance, the selected character set should avoid character conversion and should offer the most efficient encoding for the expected language. In general, a single-byte character set performs better than a multi-byte character set and needs less storage space.
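
The storage side of this trade-off is easy to measure. The sketch below counts the bytes a string occupies under different encodings; it assumes the Python codecs "latin-1" and "gbk" as stand-ins for WE8ISO8859P1 and ZHS16GBK, and the helper name `stored_bytes` is hypothetical.

```python
def stored_bytes(text, codec):
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(codec))

print(stored_bytes("Oracle", "latin-1"))   # 6: one byte per character
print(stored_bytes("Oracle", "gbk"))       # 6: ASCII letters stay one byte in GBK
print(stored_bytes("东北大学", "gbk"))      # 8: two bytes per Chinese character
```

A column sized in bytes therefore holds only half as many Chinese characters as English letters in a double-byte character set.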

4. Other restrictions

When selecting a character set for the database, consult the documentation for your Oracle version to check Oracle's restrictions on particular character sets. For example, in Oracle 8.1.5 the following character sets cannot be used: JA16EUCFIXED, ZHS16GBKFIXED, JA16DBCSFIXED, KO16DBCSFIXED, ZHS16DBCSFIXED, JA16SJISFIXED, and ZHT32TRISFIXED.

To sum up, a correct understanding of the Oracle Character Set conversion process can help us avoid unnecessary troubles and data loss. The rational use of the Oracle Character Set conversion process can also help us to correctly convert from one character set to another to meet our various application needs.
