Oracle character set related issues

Source: Internet
Author: User

Compiled from online sources plus experiments.

Character set
The most important parameter affecting the Oracle database character set is NLS_LANG.
Its format is as follows: NLS_LANG = language_territory.charset
The NLS_LANG parameter consists of the following parts:
NLS_LANG=<language>_<territory>.<client character set>
The meaning of each part is as follows:
LANGUAGE specifies:
- the language Oracle uses for messages
- how month and day names are displayed in dates
TERRITORY specifies:
- currency and number formats
- regional conventions for calculating weeks and dates
From the composition of NLS_LANG we can see that the part that really affects the database character set is the third part.
So two databases can import and export data to each other as long as the third part, the character set, is the same. The first two parts only affect whether messages appear in Chinese or English.
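The NLS_LANG layout described above can be sketched as a small parser. This is only an illustration in Python: the names `NlsLang`, `parse_nls_lang`, and `same_charset` are made up for this example, and the sketch assumes all three parts are present in the value.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split '<language>_<territory>.<charset>' into its three parts."""
    locale_part, dot, charset = value.strip().partition(".")
    language, underscore, territory = locale_part.partition("_")
    # Assumption of this sketch: every part must be present.
    if not (dot and underscore and language and territory and charset):
        raise ValueError(f"expected language_territory.charset, got {value!r}")
    return NlsLang(language.upper(), territory.upper(), charset.upper())


def same_charset(a: str, b: str) -> bool:
    """True when two NLS_LANG values agree on the third part (the character set)."""
    return parse_nls_lang(a).charset == parse_nls_lang(b).charset


print(parse_nls_lang("AMERICAN_AMERICA.ZHS16GBK").charset)  # ZHS16GBK
print(same_charset("AMERICAN_AMERICA.ZHS16GBK",
                   "SIMPLIFIED CHINESE_CHINA.ZHS16GBK"))    # True
```

The second call shows the point made in the text: the language and territory differ, but the character set part is the same.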
A character set is, in essence, a set of specific symbols that are each assigned a distinct numeric code according to some character encoding scheme. The earliest encoding scheme supported by Oracle databases is US7ASCII.
Oracle character set names follow this convention:
<language><bit size><encoding>
For example: ZHS16GBK denotes the GBK encoding, a 16-bit (two-byte) Simplified Chinese character set.

EXP/IMP and character sets
EXP/IMP
Export and Import are a pair of tools for reading and writing Oracle data. Export writes data from an Oracle database to operating system files, and Import reads data from those files into an Oracle database. When data is migrated with EXP/IMP, four character sets are involved on the way from the source database to the target database. If these four character sets are not consistent, character set conversions occur.
EXP
 _________________      _______________________________      ________________________
| exp export file | <- | environment variable NLS_LANG | <- | database character set |
 -----------------      -------------------------------      ------------------------
IMP
 _________________      _______________________________      ________________________
| imp import file | -> | environment variable NLS_LANG | -> | database character set |
 -----------------      -------------------------------      ------------------------


The four character sets are:
(1) The source database character set
(2) The user session character set during export (set through NLS_LANG)
(3) The user session character set during import (set through NLS_LANG)
(4) The target database character set
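As a rough illustration of the four links listed above, the following Python sketch reports which hops in the chain would involve a conversion. The function name `conversions` is hypothetical, and the model is a simplification: it treats two character sets as consistent only when their names are identical.

```python
from typing import List, Tuple


def conversions(source_db: str, export_session: str,
                import_session: str, target_db: str) -> List[Tuple[str, str]]:
    """Return the (from, to) pairs where adjacent character sets differ."""
    chain = [name.upper() for name in
             (source_db, export_session, import_session, target_db)]
    # The export file carries the export session character set, so the
    # middle hop is "export file -> import session".
    return [(a, b) for a, b in zip(chain, chain[1:]) if a != b]


print(conversions("ZHS16GBK", "ZHS16GBK", "ZHS16GBK", "ZHS16GBK"))  # []
print(conversions("ZHS16GBK", "US7ASCII", "ZHS16GBK", "ZHS16GBK"))
# [('ZHS16GBK', 'US7ASCII'), ('US7ASCII', 'ZHS16GBK')]
```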

5.2 Conversion process during export
During export, if the source database character set differs from the export user session character set, a character set conversion occurs, and the ID of the export user session character set is stored in a few bytes of the export file header. Data may be lost during this conversion.

Example: if the source database uses ZHS16GBK and the export user session uses US7ASCII, then, because ZHS16GBK is a 16-bit character set and US7ASCII is a 7-bit character set, Chinese characters have no equivalent in US7ASCII. All Chinese characters are therefore lost and turned into "??", so the resulting DMP file already contains the data loss.
Therefore, to export the source data correctly, the user session character set during export should be the same as the source database character set or a superset of it.
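The kind of loss described in the example above can be reproduced outside Oracle. The sketch below uses Python's `gbk` and `ascii` codecs as stand-ins for ZHS16GBK and US7ASCII; this is an assumption made for illustration, and Oracle's own conversion routines are not involved. The function name `export_through_session` is hypothetical.

```python
def export_through_session(text: str, db_charset: str, session_charset: str) -> str:
    """Simulate storing text in db_charset and exporting it via session_charset."""
    stored = text.encode(db_charset)      # bytes as held by the source database
    decoded = stored.decode(db_charset)
    # Characters with no equivalent in the session character set become "?"
    exported = decoded.encode(session_charset, errors="replace")
    return exported.decode(session_charset)


original = "数据库 test"
print(export_through_session(original, "gbk", "ascii"))            # ??? test
print(export_through_session(original, "gbk", "gbk") == original)  # True
```

Once the characters have been replaced by "?", nothing in the exported bytes allows the original text to be recovered, which is why the loss is already present in the export file.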

5.3 Conversion process during import
(1) Determine the character set environment of the exporting database
The character set of the export file can be obtained by reading the export file header.
(2) Determine the character set of the import session, that is, the NLS_LANG environment variable used by the import session.
(3) IMP reads the export file
It reads the character set ID from the export file and compares it with the NLS_LANG of the import process.
(4) If the export file character set and the import session character set are the same, no conversion is needed in this step. If they differ, the data must be converted into the character set used by the import session.

As you can see, two character set conversions can occur while data is imported into the database:

First: the conversion between the import file character set and the character set used by the import session. If this conversion cannot be completed correctly, the import into the target database cannot be completed.
Second: the conversion between the import session character set and the database character set.

How to query the database character set
SQL> select userenv('language') from dual;

USERENV('LANGUAGE')
--------------------------------------------------------------------------------
AMERICAN_AMERICA.ZHS16GBK

Viewing the character set of a DMP file (EXP)
SQL> select nls_charset_name(to_number('0354','xxxx')) from dual;

NLS_CHARSET_NAME(TO_NUMBER('0354','XXXX'))
--------------------------------------------------------------------------------
ZHS16GBK

If the DMP file is large, for example more than 2 GB (which is also the most common case), opening it in a text editor is very slow or impossible. In that case you can use the following command (on a UNIX host):
# cat exp.dmp | od -x | head -1 | awk '{print $2 $3}' | cut -c 3-6
0345
Note: od -x prints 16-bit words in host byte order, so on a little-endian host the digits can appear in a different order than the bytes in the file; compare this output with the 0354 used in the query above.

Querying the Oracle client character set
On the Windows platform, the client character set is determined by the NLS_LANG value of the corresponding Oracle home in the registry. You can also set it yourself in a DOS window,
for example: set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

The most common problem concerns PL/SQL Developer and SQL*Plus. One approach is to create a startup script:
@echo off
set NLS_LANG=<database-side character set>
start plsqldevelop.exe

On a Linux client, it is recommended to set it in the environment variables:
NLS_LANG=AMERICAN_AMERICA.ZHS16GBK; export NLS_LANG

Query the database character set
SQL> set line 200
SQL> col value format a50
SQL> col parameters format a20
SQL> select * from nls_database_parameters;

PARAMETER                      VALUE
------------------------------ --------------------------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               ZHS16GBK
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM

PARAMETER                      VALUE
------------------------------ --------------------------------------------------
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_RDBMS_VERSION              11.2.0.4.0

20 rows selected.

SQL> select 'Test' from dual;

'Test'
------
Test

SQL>

FAQ
IMP import produces garbled text: modifying the DMP file character set
As mentioned above, the 2nd and 3rd bytes of the DMP file record the character set information, so changing the content of those bytes can "deceive" Oracle's check. In theory this is only valid when going from a subset to a superset, but in many cases it also works where no subset/superset relationship exists. Some commonly used character sets, such as US7ASCII, WE8ISO8859P1, ZHS16CGB231280, and ZHS16GBK, can basically be changed this way. Because only the DMP file is changed, the impact is small.
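A minimal sketch of reading and rewriting those two bytes is shown below, in Python. It relies only on what the text states (the ID sits in the 2nd and 3rd bytes) plus one assumption: that the two bytes are read most significant byte first, which is what the hex value 0354 used with NLS_CHARSET_NAME suggests. The function names and the sample header bytes are hypothetical, and the sample is not taken from a real DMP file.

```python
def read_charset_id(header: bytes) -> int:
    """Return the character set ID held in the 2nd and 3rd bytes of the header."""
    if len(header) < 3:
        raise ValueError("need at least 3 header bytes")
    # Assumption: big-endian, so bytes 03 54 read as 0x0354.
    return int.from_bytes(header[1:3], byteorder="big")


def with_charset_id(header: bytes, charset_id: int) -> bytes:
    """Return a copy of the header whose 2nd and 3rd bytes hold charset_id."""
    if len(header) < 3:
        raise ValueError("need at least 3 header bytes")
    return header[:1] + charset_id.to_bytes(2, byteorder="big") + header[3:]


# Synthetic header for illustration only.
sample = bytes([0x03, 0x03, 0x54]) + b"EXPORT"
print(f"{read_charset_id(sample):04x}")    # 0354

# 0x0001 is an arbitrary illustrative value for the new ID.
patched = with_charset_id(sample, 0x0001)
print(f"{read_charset_id(patched):04x}")   # 0001
```

For a real file, the first three bytes could be obtained with `open("exp.dmp", "rb").read(3)`. Reading individual bytes this way does not depend on the byte order of the host. Work on a copy of the DMP file, since the change only hides the mismatch from the check and does not convert any data.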
Summary: modifying NLS parameters
You can modify the NLS parameters in the following ways:
(1) Modify the initialization parameter file used when the instance starts
(2) Modify the environment variable NLS_LANG
(3) Use the ALTER SESSION statement to modify them for the Oracle session
(4) Use certain SQL functions
The client character set must be consistent with the server in order to display the database's non-ASCII characters correctly.
If multiple settings exist, the NLS priority is: SQL function > ALTER SESSION > environment variable or registry > parameter file > database default parameters
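The priority order just stated can be pictured as a simple lookup that returns the first level at which a value is set. The Python sketch below is only a model of that ordering: the level names and the function `effective_setting` are invented for the example and do not correspond to any Oracle API.

```python
from typing import Mapping, Optional, Tuple

# Highest priority first, following the order given in the text.
PRIORITY = (
    "sql_function",
    "alter_session",
    "environment_or_registry",
    "parameter_file",
    "database_default",
)


def effective_setting(settings: Mapping[str, Optional[str]]) -> Tuple[str, str]:
    """Return (level, value) for the highest-priority level that has a value."""
    for level in PRIORITY:
        value = settings.get(level)
        if value is not None:
            return level, value
    raise LookupError("no NLS setting found at any level")


print(effective_setting({
    "parameter_file": "DD-MON-RR",
    "alter_session": "YYYY-MM-DD",
}))  # ('alter_session', 'YYYY-MM-DD')
```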
The character sets must be consistent, but the language settings can differ; English is recommended for the language setting. If the character set is ZHS16GBK, then NLS_LANG can be AMERICAN_AMERICA.ZHS16GBK.

Modifying the server-side character set
In principle, the database character set cannot be changed after creation, so it is important to decide which character set to use at design and installation time. On a database server, an incorrect character set change can have many unpredictable consequences that seriously affect the normal operation of the database, so be sure to verify that a subset/superset relationship exists between the two character sets before changing anything. In general, we do not recommend modifying the server-side character set of an Oracle database unless it is a last resort. In particular, there is no subset/superset relationship between the two character sets we use most often, ZHS16GBK and ZHS16CGB231280, so converting between them is theoretically unsupported. That said, there are 2 ways to change the character set.
1. The usual approach is to export the database data, rebuild the database, and then import the data again.
2. Change the character set with the ALTER DATABASE CHARACTER SET statement. There are restrictions on changing the character set after the database is created: the database character set can be changed only if the new character set is a superset of the current one. For example, UTF8 is a superset of US7ASCII, so the database character set can be changed with ALTER DATABASE CHARACTER SET UTF8.

Start the database in MOUNT mode:
SQL> STARTUP MOUNT;
SQL> ALTER SYSTEM ENABLE RESTRICTED SESSION;
SQL> ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
SQL> ALTER SYSTEM SET AQ_TM_PROCESSES=0;
SQL> ALTER DATABASE OPEN;
-- This form works when the new character set is a superset of the current one
SQL> ALTER DATABASE CHARACTER SET ZHS16GBK;
SQL> ALTER DATABASE NATIONAL CHARACTER SET AL16UTF16;
-- If it is not a superset, the INTERNAL_USE keyword skips the subset/superset check
SQL> ALTER DATABASE CHARACTER SET INTERNAL_USE AL32UTF8;
SQL> ALTER DATABASE NATIONAL CHARACTER SET INTERNAL_USE AL16UTF16;
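
The subset/superset requirement discussed above can be illustrated with a small check: one encoding is a binary superset of another when every character the smaller one can encode is stored as exactly the same bytes in the larger one. The Python sketch below uses Python's own codecs (`ascii`, `latin-1`, `utf-8`) as stand-ins. They are not Oracle's character set definitions, the function name `is_binary_superset` is hypothetical, and by default only code points below 0x10000 are examined.

```python
from typing import Iterable


def is_binary_superset(subset: str, superset: str,
                       code_points: Iterable[int] = range(0x10000)) -> bool:
    """True if every character encodable in `subset` has identical bytes in `superset`."""
    for cp in code_points:
        ch = chr(cp)
        try:
            sub_bytes = ch.encode(subset)
        except UnicodeEncodeError:
            continue  # character is not part of the smaller set
        try:
            if ch.encode(superset) != sub_bytes:
                return False  # same character, different stored bytes
        except UnicodeEncodeError:
            return False      # character is missing from the larger set
    return True


print(is_binary_superset("ascii", "utf-8"))    # True
print(is_binary_superset("latin-1", "utf-8"))  # False
```

The second result shows why a repertoire that contains the same characters is not enough: characters above 0x7F exist in both encodings but are stored as different bytes, so existing data would be misread if only the character set label were changed.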
