Oracle database character set problem summary

Source: Internet
Author: User
Tags: ultraedit
Summary of Oracle database character set issues
When migrating data between databases or exchanging data with other systems, differences in character sets often cause the migration to fail or turn the data in the database into garbled text. Below is a brief summary of what you need to know about Oracle character sets.

I. What is the Oracle character set

An Oracle character set is a collection of symbols, each represented by byte data; character sets differ in size and can contain one another. Oracle's National Language Support (NLS) architecture lets you store, process, and retrieve data in localized languages. It makes database tools, error messages, sort orders, dates, times, currencies, numbers, and calendars adapt automatically to the local language and platform.

The most important parameter that affects the Oracle database character set is the NLS_LANG parameter. Its format is as follows:

NLS_LANG = language_territory.charset

It has three components (language, territory, and character set), and each component controls a subset of the NLS features:

language specifies the language of server messages, territory specifies the server's date and number formats, and charset specifies the character set. For example: AMERICAN_AMERICA.ZHS16GBK

From the composition of NLS_LANG we can see that the part that really affects the database character set is the third one. So data can be exported and imported between two databases as long as the third part is the same; the first two parts only affect things such as whether prompt messages appear in Chinese or English.
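The three-part format above can be illustrated with a short Python sketch. The names NlsLang, parse_nls_lang, and same_charset are hypothetical helpers invented for this example; they are not part of any Oracle tool. The sketch assumes the value always has the form language_territory.charset.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    """Hypothetical container for the three NLS_LANG components."""
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split 'language_territory.charset' into its three components."""
    # The character set is everything after the last dot.
    head, dot, charset = value.strip().rpartition(".")
    # Language and territory are separated by the first underscore;
    # either one may contain spaces (for example SIMPLIFIED CHINESE).
    language, underscore, territory = head.partition("_")
    if not (dot and underscore and language and territory and charset):
        raise ValueError(f"not in language_territory.charset form: {value!r}")
    return NlsLang(language.strip(), territory.strip(), charset.strip().upper())


def same_charset(a: str, b: str) -> bool:
    """True when two NLS_LANG values agree on the third (charset) part."""
    return parse_nls_lang(a).charset == parse_nls_lang(b).charset
```

For instance, same_charset("AMERICAN_AMERICA.ZHS16GBK", "SIMPLIFIED CHINESE_CHINA.ZHS16GBK") is true even though the language and territory differ, which matches the point made above.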

II. How to query Oracle's character set

Many people have seen a data import fail because of differing character sets. Three character sets are involved: the character set of the Oracle server, the character set of the Oracle client, and the character set of the DMP file. For an import to work correctly, all three must be consistent.

1. Query the Oracle server-side character set

There are several ways to find the character set on the Oracle server side; the most intuitive is the following query:

SQL> select userenv('language') from dual;

The result looks similar to the following: AMERICAN_AMERICA.ZHS16GBK

2. Query the character set of a DMP file

A DMP file exported with the Oracle exp tool also contains character set information: the 2nd and 3rd bytes of the DMP file record its character set. If the DMP file is small, such as only a few MB or a few dozen MB, you can open it with UltraEdit in hexadecimal mode, look at the 2nd and 3rd bytes, such as 0354, and then use the following SQL to identify the corresponding character set:

SQL> select nls_charset_name(to_number('0354','xxxx')) from dual;

ZHS16GBK

If the DMP file is large, such as more than 2 GB (which is the most common case), opening it with a text editor is slow or may not work at all. In that case you can use the following command (on a UNIX host):

cat exp.dmp | od -x | head -1 | awk '{print $2 $3}' | cut -c 3-6

You can then use the above SQL to get its corresponding character set.
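The same check can be sketched in Python, which reads only the first three bytes and so works regardless of file size. Reading the bytes directly also avoids depending on how od groups bytes into words on a given host. The function name dmp_charset_hex is a hypothetical helper for this example, and the sketch assumes the header layout described above (character set ID in bytes 2 and 3).

```python
def dmp_charset_hex(path: str) -> str:
    """Return bytes 2 and 3 of an exp dump file as four hex digits."""
    with open(path, "rb") as f:
        header = f.read(3)  # only the first three bytes are needed
    if len(header) < 3:
        raise ValueError("file is too short to contain a character set ID")
    # Byte 1 is skipped; bytes 2 and 3 hold the character set ID.
    return header[1:3].hex().upper()
```

The returned string, for example 0354, can be passed to the nls_charset_name query shown earlier; int("0354", 16) gives the numeric ID 852.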

3. Query the Oracle client-side character set

This is relatively simple. On Windows, it is the NLS_LANG value under the corresponding Oracle home key in the registry. You can also set it yourself in a DOS window, for example:

set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

This only affects the environment variables within that window.

On UNIX, it is the environment variable NLS_LANG.

$ echo $NLS_LANG

AMERICAN_AMERICA.ZHS16GBK

If the check shows that the client-side character set differs from the server side, change the client setting so that it uses the same character set as the server.
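As a rough illustration of that adjustment, the Python sketch below builds a client NLS_LANG value that keeps the client's language and territory but adopts the server's character set. The function name aligned_nls_lang is hypothetical, and the server value is assumed to be the string returned by the userenv('language') query shown earlier.

```python
def aligned_nls_lang(client_value: str, server_value: str) -> str:
    """Return the client NLS_LANG with the server's character set."""
    client_prefix, client_dot, _ = client_value.strip().rpartition(".")
    _, server_dot, server_charset = server_value.strip().rpartition(".")
    if not (client_dot and client_prefix):
        raise ValueError(f"client value has no charset part: {client_value!r}")
    if not (server_dot and server_charset):
        raise ValueError(f"server value has no charset part: {server_value!r}")
    # Keep language_territory from the client, take charset from the server.
    return f"{client_prefix}.{server_charset.upper()}"
```

The result is the value you would then assign to NLS_LANG with set (Windows) or export (UNIX) before running the import.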

III. Modifying the Oracle character set

As mentioned above, Oracle character sets can contain one another. For example, US7ASCII is a subset of ZHS16GBK, so going from US7ASCII to ZHS16GBK causes no data interpretation problems and no data loss. UTF8 should be the largest of all the character sets because it is based on Unicode; it stores each character in a variable number of bytes (a Chinese character typically takes three bytes, versus two in ZHS16GBK) and therefore consumes more storage space.
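The subset relationship and the storage difference can be seen with a small Python sketch. It uses Python's built-in ascii, gbk, and utf-8 codecs as stand-ins for US7ASCII, ZHS16GBK, and UTF8; that mapping is an assumption made for illustration, since Oracle's own conversion tables are not involved here. The helper names can_store and stored_size are hypothetical.

```python
def can_store(text: str, codec: str) -> bool:
    """True when every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def stored_size(text: str, codec: str) -> int:
    """Number of bytes text occupies when encoded with the given codec."""
    return len(text.encode(codec))


# ASCII text has identical bytes in both codecs, so moving it from the
# subset to the superset changes nothing.
assert "Oracle".encode("ascii") == "Oracle".encode("gbk")

# Going the other way loses data: Chinese text does not fit in ASCII.
assert can_store("中文", "gbk") and not can_store("中文", "ascii")

# Two Chinese characters: 2 bytes each in GBK, 3 bytes each in UTF-8.
assert stored_size("中文", "gbk") == 4
assert stored_size("中文", "utf-8") == 6
```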

Once a database has been created, its character set is in principle immutable. It is therefore important to decide which character set to use at design and installation time. According to Oracle's official documentation, character set conversion is supported from a subset to a superset, not the other way around. If two character sets have no subset-superset relationship at all, conversion between them is not supported by Oracle. On a database server, changing the character set incorrectly can have many unpredictable consequences and may seriously affect the normal operation of the database, so confirm that the two character sets really are a subset and a superset before making the change. In general, we do not recommend modifying the character set on the server side of an Oracle database except as a last resort. In particular, there is no subset-superset relationship between the two most commonly used character sets, ZHS16GBK and ZHS16CGB231280, so conversion between them is theoretically not supported.

1. Modify the server-side character set (not recommended)

Before Oracle 8, you could change the character set of a database by directly modifying the props$ data dictionary table. However, from Oracle 8 onward, at least three system tables record the database character set, so changing only the props$ table is incomplete and can cause serious consequences. The correct way to modify it is as follows:

$ sqlplus /nolog

SQL> conn / as sysdba

If the database server is already started, first execute the shutdown immediate command to close it, and then execute the following commands:

SQL> STARTUP MOUNT;
SQL> ALTER SYSTEM ENABLE RESTRICTED SESSION;
SQL> ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
SQL> ALTER SYSTEM SET AQ_TM_PROCESSES=0;
SQL> ALTER DATABASE OPEN;
SQL> ALTER DATABASE CHARACTER SET ZHS16GBK;
SQL> ALTER DATABASE NATIONAL CHARACTER SET ZHS16GBK;
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP

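The older approach of editing props$ directly, which as noted above is incomplete on Oracle 8 and later, goes as follows:

1) Log in to Oracle as the SYS user.
2) View the current character set: SQL> select * from props$;
3) Modify the character set: SQL> update props$ set value$='new character set' where name='NLS_CHARACTERSET';
4) SQL> commit;
5) Restart the database: SQL> shutdown immediate; then SQL> startup;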

2. Modify the DMP file character set

As stated above, the 2nd and 3rd bytes of a DMP file record its character set, so directly modifying those two bytes can 'cheat' Oracle's check. In theory this is only valid when going from a subset to a superset, but in many cases it also works without any subset-superset relationship; the character sets we commonly use, such as US7ASCII, WE8ISO8859P1, ZHS16CGB231280, and ZHS16GBK, can basically all be changed this way. Because only the DMP file is changed, the impact is small.

There are many ways to make the change; the simplest is to edit the 2nd and 3rd bytes of the DMP file directly with UltraEdit. For example, to change the character set of the DMP file to ZHS16GBK, use the following SQL to find the hexadecimal code for that character set:

SQL> select to_char(nls_charset_id('ZHS16GBK'), 'XXXX') from dual;

0354

Then change bytes 2 and 3 of the DMP file to 0354.

If the DMP file is large and cannot be opened with UltraEdit, the change has to be made programmatically. Someone online has written a conversion program using Java stored procedures (the advantage of Java stored procedures is good portability; the disadvantage is that they are more cumbersome). I tested it successfully under Windows. However, it requires the JVM option to be installed in the Oracle database. Interested readers can study the program code.
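Since only two header bytes change, a small script can also do the job without loading the file. The following Python sketch is not the Java stored procedure mentioned above; it is a separate, minimal illustration, and the function name set_dmp_charset is hypothetical. It assumes the header layout described in this article and should be run on a copy of the DMP file.

```python
def set_dmp_charset(path: str, charset_hex: str) -> str:
    """Overwrite bytes 2 and 3 of a dump file; return the old value as hex.

    charset_hex is the four-digit code from nls_charset_id, e.g. '0354'.
    """
    new_bytes = bytes.fromhex(charset_hex)
    if len(new_bytes) != 2:
        raise ValueError("charset_hex must be exactly four hex digits")
    # 'r+b' opens for in-place update without truncating the file.
    with open(path, "r+b") as f:
        f.seek(1)                # skip byte 1
        old_bytes = f.read(2)    # current character set ID
        if len(old_bytes) != 2:
            raise ValueError("file is too short to contain a character set ID")
        f.seek(1)
        f.write(new_bytes)       # patch only these two bytes
    return old_bytes.hex().upper()
```

Calling set_dmp_charset("exp_copy.dmp", "0354") (the file name is only an example) would mark the copy as ZHS16GBK and return the previous code, so the change can be reverted if needed.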
