View the Character Set of an Export DMP File

Source: Internet
Author: User

Starting with 10g, Oracle officially introduced Data Pump, a new generation of logical backup and restore tools intended to replace the long-established Exp/Imp tools. Yet nearly 10 years later, many users, especially developers, still rely on Exp/Imp, and in many development environments it remains the usual way to move data.

When using Exp/Imp, character sets and version compatibility are the two key difficulties. The character set determines whether the exported data can be imported correctly into the target system or ends up garbled. Version compatibility involves four versions: the source database version, the target database version, the import client version, and the export client version. My Oracle Support (MOS) describes how these relate in a series of compatibility matrices, but few people can remember them.

This article shows how to determine the character set of a DMP file produced by Export (exp), and looks at some common problems with character set usage.

1. Environment Introduction

Endianness is a fundamental low-level property of an operating system and its hardware. Whenever data is transmitted or devices communicate, the source and target must agree on a byte-order convention. The core question is the order in which units of information (bits, bytes, words, and double words) are transmitted. If the two sides use different, incompatible conventions, the data cannot be exchanged and decoded correctly.

In the Oracle world, endianness comes up most often in connection with transportable tablespaces. Transportable tablespaces are generally regarded as a fast data migration strategy, since they are essentially a direct copy of the data files. However, a tablespace cannot be transported directly between operating systems with different endianness.

In practice, there are two common byte orders: Big-Endian and Little-Endian. A Big-Endian system stores the most significant byte of a multi-byte value first (at the lowest address); this order is common on Unix platforms. Little-Endian systems use the opposite order; typical examples are Intel/AMD x86 and Alpha.
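
To make the difference concrete, here is a small Python sketch (not part of the original tests) that packs the same 16-bit value in both byte orders with the standard `struct` module. The value 0x0369 is chosen only because it is the AL32UTF8 ID that appears later in this article.

```python
import struct
import sys

value = 0x0369  # any 16-bit value works; 0x0369 is the AL32UTF8 ID seen later

big = struct.pack(">H", value)     # Big-Endian: most significant byte first
little = struct.pack("<H", value)  # Little-Endian: least significant byte first

print("host byte order:", sys.byteorder)
print("big-endian bytes:   ", big.hex())     # 0369
print("little-endian bytes:", little.hex())  # 6903

# Reading Big-Endian bytes as if they were Little-Endian gives a different number.
wrong = struct.unpack("<H", big)[0]
print(hex(wrong))  # 0x6903
```

The same two bytes mean 0x0369 to one side and 0x6903 to the other, which is why both ends of a transfer must agree on byte order.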

The layout of a DMP file differs between Big-Endian and Little-Endian environments, but Oracle's Export/Import tools handle both byte orders. This is why data can easily be exchanged between, for example, Unix and Linux systems.

By reading the DMP file directly, we can determine which character set was used when it was exported. The following tests demonstrate this.

2. Big-Endian DMP analysis experiment

We start with the Big-Endian case, using a database on a Unix (AIX) system.

[oracle@MISDB:~]$ uname -a
AIX misdb 1 6 00F7FCC94C00

First, without setting any environment variables, we observe how Export behaves by default.

[oracle@MISDB:~]$ exp \"/ as sysdba\" owner=scott file=scott_test.dmp

Export: Release 11.2.0.3.0 - Production on Wed Jul 1 18:25:15 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in US7ASCII character set and UTF8 NCHAR character set
server uses AL32UTF8 character set (possible charset conversion)

About to export specified users ...
. exporting pre-schema procedural objects and actions
. exporting foreign function library names for user SCOTT
. exporting PUBLIC type synonyms
. exporting private type synonyms
. exporting object type definitions for user SCOTT
About to export SCOTT's objects ...
. exporting database links
. exporting sequence numbers
. exporting cluster definitions
. about to export SCOTT's tables via Conventional Path ...
. . exporting table BONUS 0 rows exported
EXP-00091: Exporting questionable statistics.
. . exporting table DEPT 4 rows exported
EXP-00091: Exporting questionable statistics.
EXP-00091: Exporting questionable statistics.
(... omitted for space ...)
. exporting statistics
Export terminated successfully with warnings.

Note two parts of this output. The first is the EXP-00091 warnings, which, as explained in an earlier article by the author, are mainly caused by not setting the NLS_LANG environment variable. The second is the US7ASCII character set reported by Export.

This is a characteristic of the Export tool: if the NLS_LANG environment variable is not set before it runs, Export uses Oracle's default character set, US7ASCII, for the export. Data stored in the database's own character set (here AL32UTF8) must then be converted to US7ASCII before it is written to the file.
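
The rule can be modelled with a small Python helper. `export_charset` is hypothetical and purely illustrative, not an Oracle API. It assumes NLS_LANG has the usual `LANGUAGE_TERRITORY.CHARSET` form, applies the US7ASCII default described above when the variable is unset, and (as a further assumption) treats a value with no character set part the same way.

```python
import os
from typing import Mapping, Optional

# Export's default when NLS_LANG is not set, as observed in the test above.
DEFAULT_CLIENT_CHARSET = "US7ASCII"


def export_charset(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: the character set Export would write the DMP in.

    Assumes NLS_LANG looks like LANGUAGE_TERRITORY.CHARSET, for example
    AMERICAN_AMERICA.AL32UTF8; only the part after the last '.' is used.
    """
    if env is None:
        env = os.environ
    nls_lang = env.get("NLS_LANG", "").strip()
    if "." in nls_lang:
        charset = nls_lang.rsplit(".", 1)[1]
        if charset:
            return charset.upper()
    # NLS_LANG unset, or set without a character set part (assumed to fall back too)
    return DEFAULT_CLIENT_CHARSET


print(export_charset({}))                                      # US7ASCII
print(export_charset({"NLS_LANG": "AMERICAN_AMERICA.AL32UTF8"}))  # AL32UTF8
```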

This carries a potential risk: if NLS_LANG is not set, the exported data is written in US7ASCII. If the database stores characters outside the US7ASCII repertoire (for example, data in WE8DEC or AL32UTF8), they are forcibly converted to US7ASCII, and every character that US7ASCII cannot represent is corrupted into garbled output.
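
Python's ASCII codec can illustrate the effect. This is only an analogy, not Oracle's conversion routine: here `errors="replace"` turns every unmappable character into `?`, while Oracle's own replacement characters and mapping rules may differ.

```python
# Analogy only: Python's ASCII codec stands in for a conversion to US7ASCII.
original = "Müller café 中文"

# Characters outside 7-bit ASCII cannot be represented and are replaced.
as_ascii = original.encode("ascii", errors="replace")
print(as_ascii)  # b'M?ller caf? ??'

# Converting back does not restore the lost characters.
restored = as_ascii.decode("ascii")
print(restored == original)  # False
```

Once the characters have been replaced in the export file, no import setting can recover them, which is why NLS_LANG must be set before exporting.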

Next, we check the character set recorded in the DMP file.

[oracle@MISDB:~]$ ls -l | grep dmp

-Rw-r -- 1 oracle oinstall 20480 Jul 01 scott_test.dmp

[oracle@MISDB:~]$ cat scott_test.dmp | od -x | head
0000000 0300 0145 5850 4f52 543a 5631 312e 3032
0000020 2e30 300a 4453 5953 0a52 5553 4552 530a
0000040 3430 3936 0a30 0a37 320a 300a 0001 0369
0000060 0367 0001 0000 0000 0000 0000 000e 0020
0000100 2020 2020 2020 2020 2020 2020 2020
*
0000140 2020 2020 2020 2020 2057 6564 204a 756c
0000160 2031 2031 383a 3235 3a31 3520 3230
0000200 7363 6f74 745f 7465 7374 2e64 6d70 0000
0000220 0000 0000 0000 0000 0000 0000 0000
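
Because `od -x` prints 16-bit words in the host's native byte order, a dump taken on a Big-Endian host such as this AIX server can be turned back into raw file bytes by simply concatenating the hex words. The sketch below uses a hypothetical helper, `od_words_to_bytes` (not a standard tool); on a Little-Endian host, each word's two bytes would have to be swapped, which the `big_endian_host` flag models.

```python
def od_words_to_bytes(line: str, big_endian_host: bool = True) -> bytes:
    """Hypothetical helper: turn one `od -x` output line back into raw bytes.

    The first field is the octal offset; the remaining fields are 16-bit
    words printed in the host's native byte order.
    """
    words = line.split()[1:]  # drop the offset column
    out = bytearray()
    for word in words:
        pair = bytes.fromhex(word)
        # On a Little-Endian host od shows each word byte-swapped, so undo that.
        out += pair if big_endian_host else pair[::-1]
    return bytes(out)


first_line = "0000000 0300 0145 5850 4f52 543a 5631 312e 3032"
raw = od_words_to_bytes(first_line)
print(raw)  # b'\x03\x00\x01EXPORT:V11.02'
```

Decoded this way, the first line shows the leading byte 0x03, then the bytes 00 01, followed by the readable text `EXPORT:V11.02`.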

The character set ID is stored in the second and third bytes of the DMP file header. Note that this byte order is the one observed on a Big-Endian operating system. The first 16-bit word of the header is usually 03xx: 03 is the first byte, and xx (which can be any value) is the high-order byte of the character set ID.
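
The same check can be done by reading the file's raw bytes directly, which avoids `od`'s host-dependent word display. This is a minimal sketch assuming, as in the AIX dumps in this article, that the ID sits at byte offsets 1 and 2 in Big-Endian order; Little-Endian exports are not covered. `charset_id_from_header` and `read_dmp_charset_id` are hypothetical names.

```python
def charset_id_from_header(header: bytes) -> int:
    """Return the character set ID stored in bytes 2-3 of a DMP header.

    Assumption: Big-Endian layout, as observed in the AIX dumps in this article.
    """
    if len(header) < 3:
        raise ValueError("DMP header too short")
    return int.from_bytes(header[1:3], "big")


def read_dmp_charset_id(path: str) -> int:
    """Hypothetical helper: read the first three bytes of a DMP file and decode the ID."""
    with open(path, "rb") as f:
        return charset_id_from_header(f.read(3))


# First four header bytes from the two exports in this article:
print(format(charset_id_from_header(bytes.fromhex("03000145")), "#06x"))  # 0x0001
print(format(charset_id_from_header(bytes.fromhex("03036945")), "#06x"))  # 0x0369
```

If the layout assumption holds, `read_dmp_charset_id("scott_test.dmp")` should return 1 for the first export.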

In this dump the second and third bytes are 00 and 01, i.e. ID 0x0001. The ID can be resolved to a character set name in Oracle:

SQL> select nls_charset_id(value) nls_charset_id, value
from v$nls_valid_values
where parameter = 'CHARACTERSET'
order by nls_charset_id(value);

NLS_CHARSET_ID VALUE
-------------- ------------------------------
             1 US7ASCII
             2 WE8DEC
(... omitted for space ...)
          1865 ZHT16BIG5FIXED
          2000 AL16UTF16

247 rows selected.

US7ASCII has ID 1 (0x0001), which confirms that the export file was written in US7ASCII. Next, we set the NLS_LANG environment variable and repeat the test.

[oracle@MISDB:~]$ export NLS_LANG=AMERICAN_AMERICA.AL32UTF8

[oracle@MISDB:~]$ exp \"/ as sysdba\" owner=scott file=scott_test_Set.dmp

Export: Release 11.2.0.3.0 - Production on Wed Jul 1 18:29:43 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in AL32UTF8 character set and UTF8 NCHAR character set

About to export specified users ...
(... omitted for space ...)
Export terminated successfully without warnings.

View the file header information.

[oracle@MISDB:~]$ cat scott_test_Set.dmp | od -x | head
0000000 0303 6945 5850 4f52 543a 5631 312e 3032
0000020 2e30 300a 4453 5953 0a52 5553 4552 530a
0000040 3430 3936 0a30 0a37 320a 300a 0369 0369
0000060 0367 0001 0000 0000 0000 0000 0012
0000100 2020 2020 2020 2020 2020 2020 2020
*
0000140 2020 2020 2020 2020 2057 6564 204a 756c
0000160 2031 2031 383a 3239 3a34 3320 3230
0000200 7363 6f74 745f 7465 7374 5f53 6574 2e64
0000220 6d70 0000 0000 0000 0000 0000 0000

This time the second and third bytes are 03 and 69, i.e. 0x0369, which corresponds to AL32UTF8.

The IDs of the most commonly used character sets are listed below:

Name             ID
---------------  ------
US7ASCII         0x0001
WE8DEC           0x0002
WE8ISO8859P1     0x001f
EE8ISO8859P2     0x0020
SE8ISO8859P3     0x0021
NEE8ISO8859P4    0x0022
CL8ISO8859P5     0x0023
AR8ISO8859P6     0x0024
EL8ISO8859P7     0x0025
IW8ISO8859P8     0x0026
WE8ISO8859P9     0x0027
WE8ISO8859P15    0x002e
TH8TISASCII      0x0029
US8PC437         0x0004
WE8ROMAN8        0x0005
WE8PC850         0x000a
EE8PC852         0x0096
RU8PC855         0x009b
TR8PC857         0x009c
WE8PC858         0x001c
WE8PC860         0x00a0
IS8PC861         0x00a1
N8PC865          0x00be
RU8PC866         0x0098
EE8MSWIN1250     0x00aa
CL8MSWIN1251     0x00ab
WE8MSWIN1252     0x00b2
EL8MSWIN1253     0x00ae
TR8MSWIN1254     0x00b1
IW8MSWIN1255     0x00af
AR8MSWIN1256     0x0230
BLT8MSWIN1257    0x00b3
ZHT16MSWIN950    0x0363
ZHS16GBK         0x0354
ZHT16HKSCS       0x0364
JA16EUC          0x033e
JA16SJIS         0x0340
ZHT16BIG5        0x0361
AL24UTFFSS       0x0366
UTF8             0x0367
AL32UTF8         0x0369
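
The file header shows hexadecimal IDs, while `NLS_CHARSET_ID` in the SQL query earlier reports decimal values, so a small lookup helps connect the two. This Python sketch hard-codes only a subset of the table above (extend it as needed); `charset_name` is a hypothetical helper. Inside the database, Oracle's built-in SQL function `NLS_CHARSET_NAME` performs the same ID-to-name mapping.

```python
# Subset of the table above: DMP header ID (hex) -> character set name.
CHARSET_IDS = {
    0x0001: "US7ASCII",
    0x0002: "WE8DEC",
    0x001F: "WE8ISO8859P1",
    0x00B2: "WE8MSWIN1252",
    0x0354: "ZHS16GBK",
    0x0361: "ZHT16BIG5",
    0x0367: "UTF8",
    0x0369: "AL32UTF8",
}


def charset_name(charset_id: int) -> str:
    """Hypothetical helper: map a header ID to its name, or flag it as unknown."""
    return CHARSET_IDS.get(charset_id, f"unknown (0x{charset_id:04x} = {charset_id})")


# The header shows hex; NLS_CHARSET_ID in SQL*Plus shows decimal.
print(int("0369", 16))                # 873
print(charset_name(0x0369))           # AL32UTF8
print(charset_name(int("0354", 16)))  # ZHS16GBK
```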

Note: once NLS_LANG explicitly specifies a character set, the DMP file is written in that character set.

Next, let's look at what happens on a Little-Endian system.
