Viewing and Modifying the Oracle Character Set


This article covers the following topics: how to view the Oracle character set, how to modify character set settings, and common problems with the Oracle UTF8 character set and the character set used by Oracle exp.

One. What is the Oracle character set

An Oracle character set is a collection of symbols used to interpret byte data. Character sets differ in size and can contain one another. Oracle's National Language Support (NLS) architecture lets you store, process, and retrieve data in localized languages. It makes database tools, error messages, sort orders, dates, times, currencies, numbers, and calendars adapt automatically to the local language and platform.

The most important parameter affecting the Oracle database character set is NLS_LANG. It has the form NLS_LANG = language_territory.charset and consists of three parts:

Language: Specifies the language of server messages, that is, whether prompts appear in Chinese or English.

Territory: Specifies the date and number format of the server.

Charset: Specifies the character set.

For example: AMERICAN_AMERICA.ZHS16GBK

From the composition of NLS_LANG we can see that only the third part actually affects the database character set.

So as long as the third part is the same, data can be exported and imported between two databases; the first part only affects whether prompts appear in Chinese or English.

How to view the database version

SELECT * FROM v$version returns the version information, the core version information, the word size (32-bit or 64-bit), and so on. On Linux/UNIX platforms you can also check the word size with file, for example: file $ORACLE_HOME/bin/oracle

Two. View the database character set

Database server character set: select * from nls_database_parameters. It is derived from props$ and shows the character set of the database.

Client character set environment: select * from nls_instance_parameters. It is derived from v$parameter and shows the instance-level NLS parameters from the parameter file; the client itself is configured through the NLS_LANG environment variable or the registry.

Session character set environment: select * from nls_session_parameters. It is derived from v$nls_parameters and shows the session's own settings, which may come from the session's environment variables or from ALTER SESSION. If the session has no settings of its own, it matches nls_instance_parameters.

The client's character set must match the server's in order to display the database's non-ASCII characters correctly. If several settings exist, the priority is: ALTER SESSION > environment variable > registry > parameter file.

The character sets must match, but the language settings may differ; English is recommended for the language setting. For example, if the character set is ZHS16GBK, NLS_LANG can be AMERICAN_AMERICA.ZHS16GBK.

Three character sets are involved:

1. The Oracle server-side character set;

2. The Oracle client-side character set;

3. The character set of the DMP file.

When importing data, all three character sets must be consistent for the import to succeed.

2.1 Querying Oracle Server-side character sets

There are several ways to find the character set on the Oracle server side. The most direct is the following query:

SQL> select userenv('language') from dual;

USERENV('LANGUAGE')

----------------------------------------------------

SIMPLIFIED CHINESE_CHINA.ZHS16GBK

SQL> select userenv('language') from dual;

AMERICAN_AMERICA.ZHS16GBK

2.2 How to query the character set of a DMP file

A DMP file exported with the Oracle exp tool also contains character set information: the 2nd and 3rd bytes of the DMP file record its character set. If the DMP file is small, say a few MB or a few dozen MB, you can open it in UltraEdit in hexadecimal mode, look at the 2nd and 3rd bytes (for example 0354), and then use the following SQL to find the corresponding character set:

SQL> select nls_charset_name(to_number('0354','xxxx')) from dual;

ZHS16GBK

If the DMP file is large, for example more than 2 GB (the most common case), a text editor opens it very slowly or cannot open it at all. In that case use the following command (on a UNIX host):

cat exp.dmp | od -x | head -1 | awk '{print $2 $3}' | cut -c 3-6

You can then use the SQL above to get the corresponding character set.
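The same check can be sketched in Python, which reads only the first three bytes and therefore works for files of any size. This is a minimal illustration, not an Oracle tool: the function name dmp_charset_id_hex and the sample header bytes are assumptions made for the demonstration, and a real exp.dmp path would be passed instead of the temporary file.

```python
import os
import tempfile


def dmp_charset_id_hex(path):
    """Return the 2nd and 3rd bytes of a file as a 4-digit hex string."""
    with open(path, "rb") as f:
        header = f.read(3)  # only the first three bytes are needed
    if len(header) < 3:
        raise ValueError("file is too short to contain a character set ID")
    return header[1:3].hex()


# Demonstration on a made-up header (hypothetical content, not a real dump).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x03\x03\x54EXPORT:V10.02.01\n")
    sample_path = tmp.name

sample_id = dmp_charset_id_hex(sample_path)
os.remove(sample_path)
print(sample_id)  # prints 0354
```

The resulting string can be passed to the nls_charset_name query shown above.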

2.3 Querying Oracle Client-side character sets

On Windows, it is the NLS_LANG value under the corresponding Oracle home in the registry. You can also set it yourself in a DOS window,

For example: set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

This only affects the environment variables within that window.

On UNIX, it is the environment variable NLS_LANG.

$ echo $NLS_LANG

AMERICAN_AMERICA.ZHS16GBK

If the check shows that the server-side and client-side character sets are inconsistent, change the client to the same character set as the server.

Additional notes:

(1). Database server character set

select * from nls_database_parameters

Derived from props$; it shows the character set of the database.

(2). Client character set environment

select * from nls_instance_parameters

Derived from v$parameter; it shows the instance-level NLS parameters from the parameter file, while the client itself is configured through an environment variable or the registry.

(3). Session character set environment

select * from nls_session_parameters

Derived from v$nls_parameters; it shows the session's own settings, which may come from the session's environment variables or from ALTER SESSION, and matches nls_instance_parameters if the session has no settings of its own.

(4). The client's character set must match the server's in order to display the database's non-ASCII characters correctly.

NLS priority if several settings exist: SQL function > ALTER SESSION > environment variable or registry > parameter file > database default parameters

Three. Modifying Oracle's character set

In 8i and later versions the character set can be changed with ALTER DATABASE, but only from a subset to a superset. Modifying the props$ table directly is not recommended, because it can cause serious errors.

STARTUP NOMOUNT;
ALTER DATABASE MOUNT EXCLUSIVE;
ALTER SYSTEM ENABLE RESTRICTED SESSION;
ALTER SYSTEM SET job_queue_processes=0;
ALTER DATABASE OPEN;
ALTER DATABASE CHARACTER SET ZHS16GBK;

As a rule, the database character set cannot be changed once the database has been created, so it is important to decide which character set to use at design and installation time. On a database server, a wrong character set change can have many unpredictable consequences and may seriously affect the normal operation of the database, so before making any change, confirm that the two character sets are in a subset-superset relationship. In general, we do not recommend changing the server-side character set of an Oracle database except as a last resort. In particular, the two most commonly used Chinese character sets, ZHS16GBK and ZHS16CGB231280, have no subset-superset relationship, so conversion between them is theoretically not supported.

However, there are two ways to change the character set.

1. Usually you export the database data, rebuild the database, and then import the data, which converts it.

2. Use the ALTER DATABASE CHARACTER SET statement. Changing the character set after the database has been created is restricted: it is only possible when the new character set is a superset of the current one. For example, UTF8 is a superset of US7ASCII, so the database character set can be changed with ALTER DATABASE CHARACTER SET UTF8.

3.1 Modifying the server-side character set (not recommended)

1. Shut down the database

SQL> SHUTDOWN IMMEDIATE

2. Start up to the mount state

SQL> STARTUP MOUNT;
SQL> ALTER SYSTEM ENABLE RESTRICTED SESSION;
SQL> ALTER SYSTEM SET job_queue_processes=0;
SQL> ALTER SYSTEM SET aq_tm_processes=0;
SQL> ALTER DATABASE OPEN;

-- This form works when the new character set is a superset of the current one
SQL> ALTER DATABASE CHARACTER SET ZHS16GBK;
SQL> ALTER DATABASE NATIONAL CHARACTER SET AL16UTF16;

-- Otherwise the INTERNAL_USE keyword is needed to skip the subset/superset check (existing data is not converted)
SQL> ALTER DATABASE CHARACTER SET INTERNAL_USE AL32UTF8;
SQL> ALTER DATABASE NATIONAL CHARACTER SET INTERNAL_USE AL16UTF16;

SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP

Note: If the database has no large objects, the conversion has no side effects in use (remember that the character set must be one that Oracle supports, otherwise the database cannot start); just follow the procedure above.

If you get an error such as 'ORA-12717: Cannot ALTER DATABASE NATIONAL CHARACTER SET when NCLOB data exists',

there are two ways to solve the problem:

1. Use the INTERNAL_USE keyword to change the national character set,

2. Re-create the database. Re-creation is somewhat complicated, so use INTERNAL_USE.

SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT EXCLUSIVE;
SQL> ALTER SYSTEM ENABLE RESTRICTED SESSION;
SQL> ALTER SYSTEM SET job_queue_processes=0;
SQL> ALTER SYSTEM SET aq_tm_processes=0;
SQL> ALTER DATABASE OPEN;
SQL> ALTER DATABASE NATIONAL CHARACTER SET INTERNAL_USE UTF8;
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP;

If you follow the procedure above, the national character set is changed without problems.

3.2 Modifying the DMP file character set

As stated above, the 2nd and 3rd bytes of the DMP file record the character set information, so directly changing those two bytes can 'cheat' Oracle's check. In theory this is only valid from a subset to a superset, but in many cases it also works without a subset-superset relationship; commonly used character sets such as US7ASCII, WE8ISO8859P1, ZHS16CGB231280, and ZHS16GBK can basically be changed into one another. Because only the DMP file is changed, the impact is small.

There are many ways to make the change; the simplest is to edit the 2nd and 3rd bytes of the DMP file directly in UltraEdit.

For example, to change the character set of the DMP file to ZHS16GBK, use the following SQL to find the hexadecimal code of that character set:

SQL> select to_char(nls_charset_id('ZHS16GBK'), 'xxxx') from dual;

0354

Then change the 2nd and 3rd bytes of the DMP file to 0354.

If the DMP file is too large to open in UltraEdit, the change has to be made programmatically.
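One possible way to make the change programmatically is a short Python script that overwrites the two bytes in place without loading the file into memory. This is only a sketch under assumptions: the helper name set_dmp_charset_id and the demo file content are hypothetical, and it should be tried on a copy of a real dump first.

```python
import os
import tempfile


def set_dmp_charset_id(path, charset_id):
    """Overwrite bytes 2 and 3 of the file with charset_id (big-endian)."""
    if not 0 <= charset_id <= 0xFFFF:
        raise ValueError("character set ID must fit in two bytes")
    with open(path, "r+b") as f:  # update in place, no truncation
        f.seek(1)                 # offset 1 is the 2nd byte
        f.write(charset_id.to_bytes(2, "big"))


# Demonstration on a made-up file (hypothetical header, not a real dump).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x03\x00\x01EXPORT:V10.02.01\n")  # starts with ID 0x0001
    demo_path = tmp.name

set_dmp_charset_id(demo_path, 0x0354)  # 0x0354 is the ZHS16GBK ID quoted above
with open(demo_path, "rb") as f:
    patched = f.read()
os.remove(demo_path)
print(patched[:3].hex())  # prints 030354
```

Only two bytes are written, so the rest of the file, including the exported data, is left untouched.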

3.3 Client character set setting method
1) UNIX environment
$ NLS_LANG="Simplified Chinese"_china.zhs16gbk
$ export NLS_LANG
(Edit the oracle user's profile file to make this permanent.)
2) Windows environment
Edit the registry:
regedit.exe -> HKEY_LOCAL_MACHINE -> SOFTWARE -> ORACLE-HOME

Or set it in a command window:

set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

Four. Background on character sets

4.1 Character set
In essence, a character set assigns a distinct numeric code to each symbol in a specific group of symbols, according to some character encoding scheme. The earliest encoding scheme supported by Oracle databases is US7ASCII.
Oracle character set names follow this naming rule:
<language><bit size><encoding>
For example: ZHS16GBK denotes a 16-bit (two-byte) Simplified Chinese character set using the GBK encoding
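To make the naming rule concrete, here is a small Python sketch that splits a name into its three parts. The function parse_charset_name and the regular expression are illustrative assumptions, not an Oracle API, and names that do not follow the three-part pattern (such as UTF8) simply return None.

```python
import re
from typing import NamedTuple, Optional


class CharsetName(NamedTuple):
    language: str
    bits: int
    encoding: str


def parse_charset_name(name: str) -> Optional[CharsetName]:
    """Split a name such as ZHS16GBK into (language, bits, encoding)."""
    m = re.fullmatch(r"([A-Z]+)(\d+)([A-Z0-9]+)", name.upper())
    if m is None:
        return None  # name does not follow the three-part pattern
    return CharsetName(m.group(1), int(m.group(2)), m.group(3))


for example in ("ZHS16GBK", "WE8ISO8859P1", "AL32UTF8", "US7ASCII"):
    print(example, parse_charset_name(example))
```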

4.2 Character encoding schemes

4.2.1 Single-byte encodings
(1) Single-byte 7-bit character sets can define 128 characters; the most commonly used one is US7ASCII
(2) Single-byte 8-bit character sets can define 256 characters and suit most European countries
For example: WE8ISO8859P1 (Western European, 8-bit, ISO standard 8859 Part 1 encoding)

4.2.2 Multibyte encodings
(1) Variable-length multibyte encodings
Some characters are represented in one byte and others in two or more bytes. Variable-length multibyte encodings are often used to support Asian languages, such as Japanese, Chinese, and Hindi.
For example: AL32UTF8 (where AL stands for ALL, meaning it applies to all languages), ZHS16CGB231280
(2) Fixed-length multibyte encodings
Every character is encoded in a fixed number of bytes. The only fixed-length multibyte encoding currently supported by Oracle is AL16UTF16, and it is used only for the national character set

4.2.3 Unicode encodings
Unicode is a single encoding scheme that covers all known characters currently in use worldwide; that is, Unicode provides a unique code for every character. UTF-16 is the 16-bit encoding of Unicode. Oracle treats it as a fixed-length multibyte encoding that represents a Unicode character in 2 bytes, and AL16UTF16 is the UTF-16 encoded character set.
UTF-8 is the 8-bit encoding of Unicode, a variable-length multibyte encoding that represents a Unicode character in 1 to 4 bytes. AL32UTF8, UTF8, and UTFE are UTF-8 encoded character sets.
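The byte counts can be checked with Python's built-in codecs, used here as stand-ins for the Oracle character sets (an assumption: "utf-8" for AL32UTF8 and "utf-16-be" for AL16UTF16). The sample characters are arbitrary choices for illustration.

```python
samples = {
    "A": "Latin letter",
    "\u00e9": "accented Latin letter",
    "\u4e2d": "Chinese character",
    "\U0001F600": "character outside the Basic Multilingual Plane",
}

# Number of bytes each character needs in the two Unicode encodings.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
utf16_lengths = {ch: len(ch.encode("utf-16-be")) for ch in samples}

for ch, note in samples.items():
    print(f"{note}: UTF-8 {utf8_lengths[ch]} bytes, UTF-16 {utf16_lengths[ch]} bytes")
```

The last sample shows the one case where UTF-16 is not two bytes: characters outside the Basic Multilingual Plane are stored as a pair of 2-byte units.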

4.3 Character set supersets
When the encoded values of one character set (character set A) include all the encoded values of another character set (character set B), and the same encoded value represents the same character in both, character set A is called a superset of character set B, and character set B a subset of character set A.
The official documentation for Oracle8i and Oracle9i contains a table of subset-superset pairs; for example, WE8ISO8859P1 is a subset of WE8MSWIN1252. Because US7ASCII is the earliest Oracle database encoding, many character sets are supersets of US7ASCII, for example WE8ISO8859P1, ZHS16CGB231280, and ZHS16GBK.
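The definition can be tried out with Python codecs as rough stand-ins for the Oracle character sets (an assumption: "ascii" for US7ASCII, "latin-1" for WE8ISO8859P1, "gbk" for ZHS16GBK). The sketch below checks only the one-byte codes of the candidate subset, which is enough for the US7ASCII case; the function name is hypothetical.

```python
def is_single_byte_superset(superset_codec, subset_codec):
    """Check the superset rule for all one-byte codes of subset_codec."""
    for value in range(256):
        raw = bytes([value])
        try:
            expected = raw.decode(subset_codec)
        except UnicodeDecodeError:
            continue  # this byte is not a code in the subset
        try:
            if raw.decode(superset_codec) != expected:
                return False  # same code, different character
        except UnicodeDecodeError:
            return False      # code missing from the candidate superset
    return True


ascii_in_gbk = is_single_byte_superset("gbk", "ascii")
ascii_in_latin1 = is_single_byte_superset("latin-1", "ascii")
latin1_in_ascii = is_single_byte_superset("ascii", "latin-1")
print(ascii_in_gbk, ascii_in_latin1, latin1_in_ascii)  # True True False
```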

4.4 Database character set (Oracle server-side character set)
The database character set is specified when the database is created and normally cannot be changed afterwards. When creating a database you can specify the character set (CHARACTER SET) and the national character set (NATIONAL CHARACTER SET).

4.4.1 Character set
(1) Used to store data of types such as CHAR, VARCHAR2, CLOB, and LONG
(2) Used for identifiers such as table names, column names, and PL/SQL variables
(3) Used to store SQL and PL/SQL program units

4.4.2 National character set
(1) Used to store data of types such as NCHAR, NVARCHAR2, and NCLOB
(2) The national character set is essentially an additional character set chosen for the database, mainly to strengthen Oracle's character handling: the NCHAR data types can support the fixed-length multibyte encodings used in Asia, whereas the database character set cannot. The national character set was redefined in Oracle9i and can only be one of the Unicode encodings AL16UTF16 and UTF8; the default is AL16UTF16

4.4.3 Querying character set parameters
You can query the following data dictionary tables or views to see the character set settings:
NLS_DATABASE_PARAMETERS, PROPS$, V$NLS_PARAMETERS
In the query results, NLS_CHARACTERSET is the character set and NLS_NCHAR_CHARACTERSET is the national character set

4.4.4 Modifying the database character set
As stated above, the database character set cannot in principle be changed after the database has been created. There are, however, two methods available.

1. If you need to change the character set, you typically export the database data, rebuild the database, and then import the data, which converts it.

2. Use the ALTER DATABASE CHARACTER SET statement. Changing the character set after the database has been created is restricted: it is only possible when the new character set is a superset of the current one. For example, UTF8 is a superset of US7ASCII, so the database character set can be changed with ALTER DATABASE CHARACTER SET UTF8.

4.5 Client character set (NLS_LANG parameter)

4.5.1 Meaning of the client character set
The client character set defines how client character data is encoded. Any character data sent from or to the client uses the encoding of the client-defined character set. A client here is any application that connects directly to the database, such as SQL*Plus or exp/imp. The client character set is set through the NLS_LANG parameter.

4.5.2 NLS_LANG parameter format
NLS_LANG=<language>_<territory>.<client character set>
Language: the language of Oracle messages, sorting, and day and month names
Territory: the default date, number, and currency formats
Client character set: the character set the client uses
For example: NLS_LANG=AMERICAN_AMERICA.US7ASCII
AMERICAN is the language, AMERICA is the territory, and US7ASCII is the client character set
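As an illustration of the format, the following Python sketch splits a full NLS_LANG value into its three parts and compares only the character set part of two values. The helper names parse_nls_lang and same_charset are made up for this example, and the sketch assumes a complete three-part value.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split 'LANGUAGE_TERRITORY.CHARSET' into its three parts."""
    head, dot, charset = value.strip().rpartition(".")
    language, underscore, territory = head.partition("_")
    if not (dot and underscore and language and territory and charset):
        raise ValueError(f"not a full NLS_LANG value: {value!r}")
    return NlsLang(language.upper(), territory.upper(), charset.upper())


def same_charset(a: str, b: str) -> bool:
    """True when two NLS_LANG values agree on the character set part."""
    return parse_nls_lang(a).charset == parse_nls_lang(b).charset


print(parse_nls_lang("AMERICAN_AMERICA.US7ASCII"))
print(same_charset("AMERICAN_AMERICA.ZHS16GBK", "SIMPLIFIED CHINESE_CHINA.ZHS16GBK"))
```

The second call returns True because the two values differ only in language and territory, which is the situation described in section One.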

4.5.3 Client character set setting method
1) UNIX environment
$ NLS_LANG="Simplified Chinese"_china.zhs16gbk
$ export NLS_LANG
(Edit the oracle user's profile file to make this permanent.)
2) Windows environment
Edit the registry:
regedit.exe -> HKEY_LOCAL_MACHINE -> SOFTWARE -> ORACLE-HOME

4.5.4 Querying NLS parameters
Oracle provides a number of NLS parameters for customizing the database and the user's machine to local conventions, such as NLS_LANGUAGE, NLS_DATE_FORMAT, and NLS_CALENDAR. They can be viewed by querying the following data dictionary views or V$ views.
NLS_DATABASE_PARAMETERS: shows the current NLS parameter values of the database, including the database character set
NLS_SESSION_PARAMETERS: shows the parameters set by NLS_LANG, or their values after an ALTER SESSION change (excluding the client character set set by NLS_LANG)
NLS_INSTANCE_PARAMETERS: shows the parameters defined in the parameter file init<SID>.ora

V$NLS_PARAMETERS: shows the current NLS parameter values of the database

4.5.5 Modifying NLS parameters
NLS parameters can be modified in the following ways:
(1) Modify the initialization parameter file used when the instance starts
(2) Modify the environment variable NLS_LANG
(3) Use the ALTER SESSION statement
(4) Use certain SQL functions
NLS priority: SQL function > ALTER SESSION > environment variable or registry > parameter file > database default parameters
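The priority chain can be pictured as a simple lookup that returns the first level at which a value is defined. The sketch below is a toy model in Python, not how Oracle implements it; the level names in PRIORITY and the function effective_value are labels invented for this illustration.

```python
PRIORITY = (
    "sql_function",
    "alter_session",
    "environment_or_registry",
    "parameter_file",
    "database_default",
)


def effective_value(settings):
    """Return (source, value) from the highest-priority source that is set."""
    for source in PRIORITY:
        value = settings.get(source)
        if value is not None:
            return source, value
    raise LookupError("no NLS value defined at any level")


# Hypothetical NLS_DATE_FORMAT settings at three of the five levels.
date_format = {
    "alter_session": "YYYY-MM-DD",
    "parameter_file": "DD-MON-RR",
    "database_default": "DD-MON-RR",
}
print(effective_value(date_format))  # ('alter_session', 'YYYY-MM-DD')
```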

Five. Exp/imp and character sets

5.1 Exp/imp
Export and Import are a pair of tools for reading and writing Oracle data. Export writes data from an Oracle database to operating system files, and Import reads the data in those files back into an Oracle database. When data is migrated with exp/imp, four character sets are involved on the way from the source database to the target database. If these four character sets are not the same, character set conversion takes place.

EXP
| exp export file | <- | environment variable NLS_LANG | <- | database character set |

IMP
| imp import file | -> | environment variable NLS_LANG | -> | database character set |

The four character sets are:
(1) The source database character set
(2) The user session character set during export (set through NLS_LANG)
(3) The user session character set during import (set through NLS_LANG)
(4) The target database character set

5.2 Conversion during export
During export, character set conversion occurs if the source database character set differs from the export user session character set, and the ID of the export user session character set is stored in a few bytes of the export file header. Data may be lost during this conversion.

Example: Suppose the source database uses ZHS16GBK and the export user session uses US7ASCII. Because ZHS16GBK is a 16-bit character set and US7ASCII is a 7-bit character set, Chinese characters have no equivalent in US7ASCII during the conversion, so all Chinese characters are lost and turn into "??". The DMP file produced by the conversion has therefore already lost data.
Therefore, to export the source database data correctly, the user session character set during export should be equal to the source database character set or a superset of it
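The loss described in this example can be reproduced with Python codecs standing in for the Oracle character sets (an assumption: "gbk" for ZHS16GBK, "ascii" for US7ASCII, "utf-8" for a Unicode session character set). The convert helper and the sample text are made up for the illustration.

```python
def convert(text, target_codec):
    """Encode text for the target character set; unmappable characters become '?'."""
    return text.encode(target_codec, errors="replace")


source_text = "\u4e2d\u6587 data"    # two Chinese characters plus ASCII text
stored = source_text.encode("gbk")   # bytes as held in the source database

lossy = convert(stored.decode("gbk"), "ascii")     # session set too small
lossless = convert(stored.decode("gbk"), "utf-8")  # session set covers the data

print(lossy)                                     # b'?? data'
print(lossless.decode("utf-8") == source_text)   # True
```

Once the characters have been replaced by question marks, no later conversion can bring them back, which is why the session character set has to cover the source data.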

5.3 Conversion during import
(1) Determine the character set environment of the exporting database
The character set of the export file can be obtained by reading the export file header
(2) Determine the character set of the import session, that is, the NLS_LANG environment variable used by the import session
(3) imp reads the export file
It reads the character set ID of the export file and compares it with the NLS_LANG of the import process
(4) If the export file character set and the import session character set are the same, no conversion is needed in this step; if they differ, the data must be converted to the character set used by the import session. So up to two character set conversions take place while data is imported into the database

First: the conversion between the export file character set and the character set used by the import session. If this conversion cannot be completed correctly, the import into the target database cannot be completed.
Second: the conversion between the import session character set and the database character set.
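The two conversions can be modelled as a small pipeline. This is a simplified Python sketch with codecs as stand-ins for the Oracle character sets; the function import_pipeline is hypothetical and uses '?' replacement to mimic lost characters, whereas real imp behaviour on a failed conversion may differ.

```python
def import_pipeline(file_bytes, file_codec, session_codec, database_codec):
    """Apply the two import-time conversions and return the bytes stored in the database."""
    # First conversion: export file character set -> import session character set.
    text = file_bytes.decode(file_codec)
    session_bytes = text.encode(session_codec, errors="replace")
    # Second conversion: import session character set -> database character set.
    session_text = session_bytes.decode(session_codec)
    return session_text.encode(database_codec, errors="replace")


original = "\u6570\u636e"        # two Chinese characters
dump = original.encode("gbk")    # export file written in a GBK session

kept = import_pipeline(dump, "gbk", "utf-8", "gbk")  # session covers the data
lost = import_pipeline(dump, "gbk", "ascii", "gbk")  # session cannot hold it

print(kept == dump)  # True
print(lost)          # b'??'
```

Even though the file and the target database use the same character set in both calls, a too-narrow import session in the middle is enough to damage the data.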

 
