Considerations for Oracle Character Set Conversion

Source: Internet
Author: User

Oracle database users are no strangers to the export and import commands, since these are the tools most often used for data backup and recovery. However, character set problems that arise while using these two commands often cause Oracle users unnecessary trouble and data loss. This article summarizes the Oracle character set conversion rules that apply during export and import, and the precautions to take when using these two commands.

Reasons for Character Set Conversion

 

The export and import processes involve four character sets, and any inconsistency among them is precisely what causes Oracle to perform character set conversion:

* Source database character set;

* User session character set during the export process;

* User session character set during import;

* Target database character set.

If these four character sets are inconsistent, Oracle character set conversion may occur during export and import, as follows.

During export, if the source database character set differs from the export user session character set, character set conversion occurs, and the ID number of the export user session character set is stored in a few bytes in the header of the exported binary DMP file. Data may be lost during this conversion.

Example 1: the source database uses ZHS16GBK and the export user session uses US7ASCII. Because ZHS16GBK is a 16-bit (double-byte) character set and US7ASCII is a 7-bit character set, Chinese characters have no equivalent in US7ASCII, so every Chinese character is lost and replaced with "?". In other words, the DMP file produced by this conversion has already lost data.

Example 2: the source database uses ZHS16GBK and the export user session uses ZHS16CGB231280. Because ZHS16GBK is a superset of ZHS16CGB231280, most characters are converted correctly, and only the characters that fall outside ZHS16CGB231280 are replaced with "?". Conversely, if the source database uses ZHS16CGB231280 and the export user session uses ZHS16GBK, every character is converted correctly.

During import, the conversion from the import user session character set to the target database character set is the reverse of the export process and is not described again here.

The exported DMP file records the export user session character set. During import, data is first converted from the DMP file character set (that is, the export user session character set) to the import user session character set. If this conversion cannot be completed correctly, the import into the target database cannot be completed.
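The two examples can be imitated outside Oracle with a minimal Python sketch. It assumes that Python's `gbk`, `gb2312`, and `ascii` codecs are close enough stand-ins for ZHS16GBK, ZHS16CGB231280, and US7ASCII; the `convert` helper is a hypothetical name, and Oracle's own conversion routines may differ in detail.

```python
def convert(text, source, target):
    """Imitate a conversion from the `source` character set to `target`."""
    stored = text.encode(source)      # bytes as held in the source character set
    decoded = stored.decode(source)
    # errors="replace" substitutes "?" for each character the target lacks.
    return decoded.encode(target, errors="replace").decode(target)


# Example 1: double-byte data exported through a 7-bit session.
print(convert("东北大学", "gbk", "ascii"))    # ????

# Example 2: superset to subset loses only the characters outside the subset.
# The traditional character 學 exists in GBK but not in GB2312.
print(convert("东北大學", "gbk", "gb2312"))   # 东北大?

# Subset to superset converts every character.
print(convert("东北大学", "gb2312", "gbk"))   # 东北大学
```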

Converting Character Sets Correctly

In general, we do not want any character set conversion during Oracle export and import, but sometimes conversion is necessary. For example, suppose ZHS16CGB231280 was selected when the Oracle database was installed. Because this is a small Chinese character set, some Chinese characters cannot be represented correctly. Solving this problem requires moving to ZHS16GBK, which means character set conversion is required.

To ensure that no character set conversion occurs during export and import, or that conversion happens correctly, check the following before you start: whether the source database character set matches the export user session character set, whether the source database character set matches the target database character set, and whether the target database character set matches the import user session character set. If all four character sets are the same, no character set conversion takes place during export or import.

You can use the following methods to check the database character set:

View the data in the initxxxx.ora file;

Use an SQL statement: SELECT name, value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET';

The export and import user session character sets can be viewed or modified through NLS_LANG in the registry on Windows. On Unix systems, view or modify the user's NLS_LANG environment variable.
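The consistency check described above can be sketched in Python. This is only an illustration: it assumes the usual NLS_LANG layout of LANGUAGE_TERRITORY.CHARSET, and the helper names `charset_from_nls_lang` and `find_mismatches` are hypothetical, not part of any Oracle tool.

```python
def charset_from_nls_lang(nls_lang):
    """Return the character set part of an NLS_LANG value, or None if absent."""
    # The part after the last dot is the session character set.
    _, dot, charset = nls_lang.strip().rpartition(".")
    if not dot or not charset:
        return None
    return charset.upper()


def find_mismatches(source_db, export_nls_lang, import_nls_lang, target_db):
    """List the adjacent stages of the export/import chain whose character sets differ."""
    chain = [
        ("source database", source_db.upper()),
        ("export session", charset_from_nls_lang(export_nls_lang)),
        ("import session", charset_from_nls_lang(import_nls_lang)),
        ("target database", target_db.upper()),
    ]
    return [
        (left_name, right_name)
        for (left_name, left), (right_name, right) in zip(chain, chain[1:])
        if left != right
    ]


# An export session set to US7ASCII breaks the chain in two places.
print(find_mismatches(
    "ZHS16GBK",
    "AMERICAN_AMERICA.US7ASCII",
    "AMERICAN_AMERICA.ZHS16GBK",
    "ZHS16GBK",
))
```

An empty list means that all four character sets agree, so no conversion should be needed.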

Note that the character set of an Oracle database is normally fixed when the database is created. Once user data has been stored, do not change the character set: the data is stored in that character set, and after a change to another character set the original data can no longer be represented correctly. If you must change the character set, you can perform the following steps:

Back up the database and delete the original data (a physical backup can be used; if you use export, make sure that no character set conversion occurs or that no data is lost);

Use the internal user to update the character set in the sys.props$ table (here 'dest.charset' stands for the name of the target character set):

UPDATE sys.props$ SET value$ = 'dest.charset'
WHERE name = 'NLS_CHARACTERSET'; COMMIT;

Restart the database;

Restore data.

The conversion between the following character sets is feasible:

Converting a character set to its superset is feasible, for example ZHS16CGB231280 to ZHS16GBK. Converting a superset to its subset loses part of the data.

A double-byte character set whose data contains only English characters can also be converted to a single-byte character set. For example, ZHS16GBK data that is English only can be correctly converted to US7ASCII.

Single-byte character sets with the same encoding range can be converted to each other.

Note what "no data loss" means here: after data is converted from character set A to character set B, it can be converted back from B to A unchanged, or, put another way, B can correctly represent the data converted from A.
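This round-trip definition of "no data loss" can be tested with a short Python sketch. As before, the Python codecs `gb2312`, `gbk`, and `ascii` are assumed stand-ins for the Oracle character sets, and `round_trips` is a hypothetical helper name.

```python
def round_trips(text, a, b):
    """True if text stored in character set a survives a -> b -> a unchanged."""
    try:
        original = text.encode(a)
        # Strict encoding: raises if b has no equivalent for some character.
        in_b = original.decode(a).encode(b)
    except UnicodeEncodeError:
        return False
    return in_b.decode(b).encode(a) == original


print(round_trips("东北大学", "gb2312", "gbk"))    # subset to superset: True
print(round_trips("东北大學", "gbk", "gb2312"))    # superset to subset: False
print(round_trips("Oracle export", "gbk", "ascii"))  # English-only data: True
```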

Impact of Character Sets on Programs

Character sets can be divided into single-byte and multi-byte character sets according to the number of bytes needed for one character. Single-byte character sets are further divided into 7-bit and 8-bit character sets. US7ASCII is a single-byte 7-bit character set, and WE8ISO8859P1, which complies with the ISO 8859-1 standard, is a single-byte 8-bit character set. Multi-byte encodings are divided into fixed-width encodings (with a width of two bytes or more) and variable-width encodings. Among the multi-byte character sets, ZHS16GBK, ZHS16CGB231280, and JA16SJIS use two bytes to represent a single character and are also known as double-byte character sets.

How many characters is a Chinese character, and how many is an English letter? A Chinese character occupies two bytes, but how many characters it counts as depends on the database character set. If the database uses the single-byte character set US7ASCII, one Chinese character counts as two characters. If the database uses the double-byte character set ZHS16GBK, one Chinese character counts as one character. The Oracle SUBSTR function can be used to demonstrate this.

When US7ASCII is used:

SELECT SUBSTR('东北大学', 1, 2) FROM dual;

The statement returns '东', the first character of 东北大学 (Northeastern University).

When ZHS16GBK is used:

SELECT SUBSTR('东北大学', 1, 2) FROM dual;

The statement returns '东北'.
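The difference between the two SUBSTR results comes down to whether a position counts bytes or characters. A minimal Python sketch of that idea follows; it assumes the `gbk` codec as a stand-in for ZHS16GBK, and the two helper functions are hypothetical illustrations rather than how Oracle implements SUBSTR.

```python
def substr_single_byte(data, start, length):
    """Treat every byte as one character, as a single-byte character set does."""
    return data[start - 1:start - 1 + length]


def substr_double_byte(data, start, length, encoding="gbk"):
    """Count positions in whole characters of the given multi-byte encoding."""
    chars = data.decode(encoding)
    return chars[start - 1:start - 1 + length].encode(encoding)


raw = "东北大学".encode("gbk")     # 4 Chinese characters stored as 8 bytes
print(len(raw))                                     # 8
print(substr_single_byte(raw, 1, 2).decode("gbk"))  # 东
print(substr_double_byte(raw, 1, 2).decode("gbk"))  # 东北
```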

Selecting an Appropriate Database Character Set

When selecting a database character set, consider the following:

1. Languages Supported by the Database

When selecting a character set for a database, you will often find that several character sets meet your current language requirements. For example, both ZHS16GBK and ZHS16CGB231280 are available for Simplified Chinese. Which one should you choose? Take the future requirements of the system into account. If you know that the database will later be extended to support other languages, choosing a character set with wider coverage is the better idea.

2. Interaction Between System Resources and Applications

The selected database character set should allow the operating system and applications to work together seamlessly. If the selected character set is not one the operating system supports, the system has to convert between the two, and some characters may be lost in that conversion. When data is converted from character set A to character set B, each character in A must have an equivalent character in B; otherwise it is replaced with "?". In this sense, two character sets with the same encoding range can be converted to each other.

Character set conversion can also affect system performance. Therefore, make sure that the client and server use the same character set, which avoids conversion and improves performance.

3. System Performance Requirements

Different database character sets affect database performance to some degree. For the best performance, the selected database character set should avoid character conversion and should be the most efficient encoding for the expected language. Generally, a single-byte character set performs better than a multi-byte character set and requires less space.
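The space argument can be made concrete by measuring how many bytes the same text needs in different encodings. This is a rough sketch only: Python's `latin-1` codec corresponds to ISO 8859-1 (the standard behind WE8ISO8859P1), while `utf-8` and `utf-16-le` are used here as assumed examples of multi-byte encodings that the article itself does not discuss. The `stored_size` helper is a hypothetical name.

```python
def stored_size(text, encodings=("latin-1", "utf-8", "utf-16-le")):
    """Return the number of bytes needed to store text in each encoding."""
    return {name: len(text.encode(name)) for name in encodings}


# Western European text: 11 characters, three of them accented.
sizes = stored_size("café résumé")
for name, size in sizes.items():
    print(name, size)
```

For this sample the single-byte encoding needs 11 bytes, against 14 and 22 bytes for the two multi-byte encodings.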

4. Other Restrictions

When selecting a character set for the database, consult the documentation for the corresponding Oracle version to check for restrictions on particular character sets. For example, in Oracle 8.1.5 the following character sets cannot be used: JA16EUCFIXED, ZHS16GBKFIXED, JA16DBCSFIXED, KO16DBCSFIXED, ZHS16DBCSFIXED, JA16SJISFIXED, and ZHT32TRISFIXED.

To sum up, a correct understanding of the Oracle character set conversion process helps avoid unnecessary trouble and data loss, and using that process sensibly makes it possible to convert correctly from one character set to another to meet various application needs.
