View and modify Oracle Character Set

Source: Internet
Author: User

1. What is the Oracle character set?
An Oracle character set is a collection of symbols, each mapped to a numeric code, that tells Oracle how to interpret byte data. Character sets differ in size, and one character set can contain another (a superset/subset relationship). Oracle's National Language Support (NLS) architecture lets you store, process, and retrieve data in a local language. It makes database tools, error messages, sort order, dates, times, currency, numbers, and calendars adapt automatically to the local language and platform.
The parameter that most affects how character data is handled between a client and an Oracle database is NLS_LANG.
The format is as follows: NLS_LANG = language_territory.charset
It has three components (language, territory, and character set), each of which controls a subset of the NLS features.
Where:
Language: specifies the language of server messages, that is, whether prompts appear in Chinese, English, and so on.
Territory: specifies the date and number formats.
Charset: specifies the character set.
For example: AMERICAN_AMERICA.ZHS16GBK
Of the three components of NLS_LANG, only the third actually affects the character encoding.
Therefore, if two databases use the same character set (the third component), data can be exported from one and imported into the other without conversion. The first two components only affect things such as whether messages appear in Chinese or English.
2. Character set basics
2.1 Character Set
In essence, a character set assigns a numeric code to each symbol in a specific collection of symbols, according to a character encoding scheme. The earliest encoding scheme supported by Oracle Database is US7ASCII.
Oracle character set names follow this pattern:
<language><bit size><encoding>
For example, ZHS16GBK denotes a Simplified Chinese (ZHS) character set that uses up to 16 bits (two bytes) per character and the GBK encoding.
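The naming pattern can be illustrated with a minimal Python sketch that splits a name into its three parts. The function name `parse_charset_name` and the regular expression are assumptions made for illustration; they are not part of Oracle. Names that do not follow the pattern (for example UTF8) simply return None.

```python
import re
from typing import NamedTuple, Optional


class CharsetName(NamedTuple):
    language: str   # e.g. "ZHS"
    bits: int       # e.g. 16
    encoding: str   # e.g. "GBK"


# <language><bit size><encoding>: letters, then digits, then the rest.
_NAME_PATTERN = re.compile(r"^([A-Z]+)(\d+)([A-Z0-9]+)$")


def parse_charset_name(name: str) -> Optional[CharsetName]:
    """Split a character set name into its parts (hypothetical helper).

    Returns None when the name does not match the pattern.
    """
    match = _NAME_PATTERN.match(name.strip().upper())
    if match is None:
        return None
    return CharsetName(match.group(1), int(match.group(2)), match.group(3))


if __name__ == "__main__":
    for name in ("ZHS16GBK", "US7ASCII", "WE8ISO8859P1", "UTF8"):
        print(name, "->", parse_charset_name(name))
```

The leading letters are matched greedily, so the first run of digits is always taken as the bit size, and any later digits (as in ISO8859P1) stay in the encoding part.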
2.2 Character encoding schemes
2.2.1 Single-byte encoding
(1) Single-byte 7-bit character sets, which can define up to 128 characters. The most common is US7ASCII.
(2) Single-byte 8-bit character sets, which can define up to 256 characters and are suitable for most European languages.
Example: WE8ISO8859P1 (Western European, 8-bit, ISO 8859 Part 1 encoding)
2.2.2 Multi-byte encoding
(1) Variable-length multi-byte encoding
Some characters are represented by one byte, others by two or more bytes. Variable-length multi-byte encoding is commonly used for Asian languages such as Japanese, Chinese, and Hindi.
Examples: AL32UTF8 (where AL stands for ALL, meaning it applies to all languages) and ZHS16CGB231280
(2) Fixed-length multi-byte encoding
Each character is encoded with the same fixed number of bytes. Currently, the only fixed-length multi-byte encoding supported by Oracle is AL16UTF16, which can be used only as a national character set.
2.2.3 Unicode encoding
Unicode is a single encoding scheme that covers all known characters in use around the world; that is, Unicode assigns a unique code to each character. UTF-16 is the 16-bit Unicode encoding. Oracle treats it as a fixed-length multi-byte encoding in which 2 bytes represent one Unicode character (characters outside the Basic Multilingual Plane take two 2-byte units). AL16UTF16 is the UTF-16 character set.
UTF-8 is the 8-bit Unicode encoding. It is a variable-length multi-byte encoding that uses 1 to 4 bytes per character in AL32UTF8 (the older UTF8 character set uses 1 to 3 bytes for characters in the Basic Multilingual Plane). AL32UTF8 and UTF8 are UTF-8 character sets; UTFE is the corresponding Unicode character set for EBCDIC platforms.
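The difference between single-byte, variable-length, and fixed-length encodings is easy to see by counting bytes. The sketch below uses Python codecs as stand-ins for Oracle character sets. The mapping in `CODEC_FOR` is an assumption made for illustration (Python's codecs are close to, but not guaranteed identical to, Oracle's implementations), and `encoded_length` is a hypothetical helper.

```python
# Python codecs used as approximate stand-ins for Oracle character sets.
# This mapping is an assumption for illustration only.
CODEC_FOR = {
    "US7ASCII": "ascii",
    "WE8ISO8859P1": "latin-1",
    "ZHS16GBK": "gbk",
    "AL32UTF8": "utf-8",
    "AL16UTF16": "utf-16-be",  # UTF-16 national character set, no byte-order mark
}


def encoded_length(text: str, oracle_charset: str) -> int:
    """Return how many bytes `text` occupies in the given character set."""
    return len(text.encode(CODEC_FOR[oracle_charset]))


if __name__ == "__main__":
    samples = {
        "LATIN CAPITAL A": "A",
        "E WITH ACUTE": "\u00e9",
        "CJK CHARACTER U+4E2D": "\u4e2d",
    }
    for label, char in samples.items():
        for charset in ("ZHS16GBK", "AL32UTF8", "AL16UTF16"):
            try:
                print(label, charset, encoded_length(char, charset), "byte(s)")
            except UnicodeEncodeError:
                print(label, charset, "not representable")
```

In the variable-length sets (gbk, utf-8) an ASCII letter takes one byte and a Chinese character takes two or three, while in the fixed-length stand-in every character in the Basic Multilingual Plane takes exactly two bytes.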
2.3 Character set supersets
When the encoded values of one character set (character set A) include all the encoded values of another character set (character set B), and each encoded value shared by the two represents the same character, character set A is a superset of character set B, and character set B is a subset of character set A.
The official Oracle8i and Oracle9i documentation provides a table of subset-superset pairs. For example, WE8ISO8859P1 is a subset of WE8MSWIN1252. Because US7ASCII is the earliest Oracle Database encoding, many character sets are supersets of it; for example, WE8ISO8859P1, ZHS16CGB231280, and ZHS16GBK are all supersets of US7ASCII.
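The superset definition can be checked mechanically for simple cases. The sketch below is an illustration under stated assumptions: it uses Python codecs (ascii, latin-1, gbk) as approximate stand-ins for US7ASCII, WE8ISO8859P1, and ZHS16GBK, and it only compares single-byte codes, which is sufficient when the candidate subset is a single-byte character set. The function name is hypothetical, and Oracle's documented subset-superset table remains the authoritative source.

```python
def is_single_byte_superset(superset_codec: str, subset_codec: str) -> bool:
    """Check that every single-byte code defined in `subset_codec`
    decodes to the same character in `superset_codec`.
    """
    for value in range(256):
        raw = bytes([value])
        try:
            sub_char = raw.decode(subset_codec)
        except UnicodeDecodeError:
            continue  # this code is not defined in the subset; nothing to compare
        try:
            sup_char = raw.decode(superset_codec)
        except UnicodeDecodeError:
            return False  # defined in the subset but missing from the superset
        if sup_char != sub_char:
            return False  # same code, different character
    return True


if __name__ == "__main__":
    print(is_single_byte_superset("latin-1", "ascii"))  # latin-1 over ascii
    print(is_single_byte_superset("gbk", "ascii"))      # gbk over ascii
    print(is_single_byte_superset("ascii", "latin-1"))  # reverse direction
```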
2.4 Database character set (Oracle server character set)
The database character set is specified when the database is created and, in general, cannot be changed afterwards. When creating a database, you can specify both the character set and the national character set.
2.4.1 Character Set
(1) used to store CHAR, VARCHAR2, CLOB, LONG, and other data types
(2) used to mark table names, column names, PL/SQL variables, etc.
(3) used to store SQL and PL/SQL program units
2.4.2 National character set
(1) used to store NCHAR, NVARCHAR2, NCLOB, and other data types
(2) The national character set is essentially an additional character set chosen for the database. It mainly extends Oracle's character-handling capability: for example, the NCHAR data types can use fixed-length multi-byte encodings for Asian languages, which the database character set cannot. The national character set was redefined in Oracle9i and can only be AL16UTF16 or UTF8, both Unicode encodings. The default is AL16UTF16.
2.4.3 Querying character set parameters
You can query the following data dictionary tables and views to see the character set settings:
NLS_DATABASE_PARAMETERS, PROPS$, V$NLS_PARAMETERS
In the query results, NLS_CHARACTERSET is the database character set and NLS_NCHAR_CHARACTERSET is the national character set.
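As a rough sketch, the two values can be read from NLS_DATABASE_PARAMETERS (columns PARAMETER and VALUE) through any Python DB-API cursor. The helper `fetch_charsets` is hypothetical, and how the cursor is obtained (driver, connection details) is deliberately left out. `FakeCursor` exists only so the sketch runs without a database; the rows it returns are made-up sample values, not real query output.

```python
CHARSET_QUERY = (
    "SELECT parameter, value FROM nls_database_parameters "
    "WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET')"
)


def fetch_charsets(cursor) -> dict:
    """Return {'NLS_CHARACTERSET': ..., 'NLS_NCHAR_CHARACTERSET': ...}.

    `cursor` is assumed to be a DB-API cursor connected to an Oracle database.
    """
    cursor.execute(CHARSET_QUERY)
    return {parameter: value for parameter, value in cursor.fetchall()}


class FakeCursor:
    """Stand-in cursor for demonstration; its rows are illustrative only."""

    def execute(self, sql):
        self.last_sql = sql

    def fetchall(self):
        return [
            ("NLS_CHARACTERSET", "ZHS16GBK"),
            ("NLS_NCHAR_CHARACTERSET", "AL16UTF16"),
        ]


if __name__ == "__main__":
    settings = fetch_charsets(FakeCursor())
    print("database character set:", settings["NLS_CHARACTERSET"])
    print("national character set:", settings["NLS_NCHAR_CHARACTERSET"])
```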
2.4.4 Modifying the database character set
As mentioned above, the database character set cannot, in principle, be changed after the database is created. However, there are two workable methods.
1. Export the data, re-create the database with the new character set, and then import the data so that it is converted. This is the usual approach.
2. Use the ALTER DATABASE CHARACTER SET statement. This works only in limited cases: the character set can be changed only when the new character set is a superset of the current one. For example, because UTF8 is a superset of US7ASCII, you can use ALTER DATABASE CHARACTER SET UTF8 to change a US7ASCII database to UTF8.
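The superset restriction on the second method can be expressed as a simple pre-check. The sketch below is only an illustration: `KNOWN_SUPERSETS` contains just the pairs mentioned in this article, not Oracle's full subset-superset table, and both function names are hypothetical. It builds the statement text but does not run it.

```python
# Subset -> supersets, limited to the pairs named in this article.
# Oracle's documentation holds the complete, authoritative table.
KNOWN_SUPERSETS = {
    "US7ASCII": {"WE8ISO8859P1", "ZHS16CGB231280", "ZHS16GBK", "UTF8"},
    "WE8ISO8859P1": {"WE8MSWIN1252"},
}


def can_alter_character_set(current: str, new: str) -> bool:
    """True when `new` is listed as a superset of `current`."""
    return new.upper() in KNOWN_SUPERSETS.get(current.upper(), set())


def alter_statement(current: str, new: str) -> str:
    """Build the ALTER statement, refusing pairs that fail the check."""
    if not can_alter_character_set(current, new):
        raise ValueError(
            f"{new} is not a known superset of {current}; "
            "use export, re-create, and import instead"
        )
    return f"ALTER DATABASE CHARACTER SET {new.upper()}"


if __name__ == "__main__":
    print(alter_statement("US7ASCII", "UTF8"))
    print(can_alter_character_set("ZHS16GBK", "US7ASCII"))
```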
2.5 Client character set (NLS_LANG parameter)
2.5.1 What the client character set means
The client character set defines how character data is encoded on the client. All character data sent from or to the client is encoded in the client character set. A client is any application that connects directly to the database, for example sqlplus or exp/imp. The client character set is set through the NLS_LANG parameter.
2.5.2 NLS_LANG parameter format
NLS_LANG = <language>_<territory>.<client character set>
Language: determines the language of Oracle messages, the sort order, and day and month names
Territory: determines the default date, number, currency, and other formats
Client character set: specifies the character set that the client uses
Example: NLS_LANG = AMERICAN_AMERICA.US7ASCII
AMERICAN is the language, AMERICA is the territory, and US7ASCII is the client character set.
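A minimal sketch of splitting an NLS_LANG value into its three components follows. The helper `parse_nls_lang` is hypothetical, and it assumes all three components are present; it splits at the first "." and then at the first "_", so a language name containing a space (such as SIMPLIFIED CHINESE) is kept intact.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split '<language>_<territory>.<charset>' into its parts.

    Raises ValueError unless all three components are present
    (a simplifying assumption of this sketch).
    """
    prefix, dot, charset = value.strip().partition(".")
    language, underscore, territory = prefix.partition("_")
    if not (dot and underscore and language and territory and charset):
        raise ValueError(f"expected language_territory.charset, got {value!r}")
    return NlsLang(language, territory, charset)


if __name__ == "__main__":
    print(parse_nls_lang("AMERICAN_AMERICA.US7ASCII"))
    print(parse_nls_lang("SIMPLIFIED CHINESE_CHINA.ZHS16GBK"))
```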
2.5.3 Setting the client character set
1) UNIX
$ NLS_LANG="SIMPLIFIED CHINESE_CHINA.ZHS16GBK"
$ export NLS_LANG
To make the setting permanent, add these lines to the profile file of the oracle user.
2) Windows
Edit the registry and set the NLS_LANG value:
Regedit.exe --- HKEY_LOCAL_MACHINE --- SOFTWARE --- ORACLE-HOME
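Because NLS_LANG is an ordinary environment variable, a script can also set it for a single child process rather than for the whole login session. The sketch below assumes that behavior and uses a child Python process as a stand-in for a client tool such as sqlplus or exp, so that it runs without an Oracle installation; `run_with_nls_lang` is a hypothetical helper.

```python
import os
import subprocess
import sys


def run_with_nls_lang(command, nls_lang):
    """Run `command` with NLS_LANG set only in the child's environment."""
    env = dict(os.environ, NLS_LANG=nls_lang)  # copy, then override one variable
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


# Stand-in for a client tool: a child process that prints the NLS_LANG it sees.
ECHO_NLS_LANG = [sys.executable, "-c", "import os; print(os.environ['NLS_LANG'])"]

if __name__ == "__main__":
    result = run_with_nls_lang(ECHO_NLS_LANG, "SIMPLIFIED CHINESE_CHINA.ZHS16GBK")
    print("child saw:", result.stdout.strip())
```

The parent process's own environment is left unchanged, which is convenient when different exports or imports in one script need different client character sets.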
2.5.4 Querying NLS parameters
Oracle provides a number of NLS parameters, such as NLS_LANGUAGE, NLS_DATE_FORMAT, and NLS_CALENDAR, for adapting the database and the user's machine to local formats. You can query them through the following data dictionary views and V$ view.
NLS_DATABASE_PARAMETERS: displays the NLS parameter values of the database, including the database character set
NLS_SESSION_PARAMETERS: displays the parameters set by NLS_LANG or changed by ALTER SESSION (it does not include the client character set set by NLS_LANG)
NLS_INSTANCE_PARAMETERS: displays the parameters defined in the parameter file init<SID>.ora
V$NLS_PARAMETERS: displays the current values of the NLS parameters
2.5.5 Modifying NLS parameters
You can modify NLS parameters in the following ways:
(1) modify the initialization parameter file used at instance startup
(2) modify the NLS_LANG environment variable
(3) use the ALTER SESSION statement
(4) pass NLS parameters to certain SQL functions
NLS priority: SQL function > ALTER SESSION > environment variable or registry > parameter file > database default parameters
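The priority order can be modeled as a lookup that returns the first level at which a parameter is set. This is a sketch of the rule as stated here, not of Oracle's internal implementation; the level names in `PRIORITY`, the helper `effective_value`, and the sample date formats are all illustrative assumptions.

```python
# Highest priority first, following the order given in the text.
PRIORITY = (
    "sql_function",
    "alter_session",
    "environment",       # NLS_LANG environment variable or registry
    "parameter_file",
    "database_default",
)


def effective_value(settings: dict):
    """Return (level, value) for the highest-priority level that is set.

    `settings` maps level names from PRIORITY to a value or None.
    """
    for level in PRIORITY:
        value = settings.get(level)
        if value is not None:
            return level, value
    raise KeyError("parameter is not set at any level")


if __name__ == "__main__":
    # Illustrative values for a date-format parameter.
    nls_date_format = {
        "alter_session": "YYYY-MM-DD",
        "parameter_file": "DD-MON-RR",
        "database_default": "DD-MON-RR",
    }
    print(effective_value(nls_date_format))
```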
3. EXP/IMP and character sets
3.1 EXP/IMP
Export and Import are a pair of Oracle tools for writing and reading data. Export writes data from an Oracle database to an operating system file, and Import reads data from such a file into an Oracle database. Because exp/imp is used to migrate data, four character sets are involved on the way from the source database to the target database. Wherever two adjacent ones differ, a character set conversion takes place.
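EXP
_______________________________________________________________________________________
| dump file | <- | environment variable NLS_LANG | <- | source database character set |
---------------------------------------------------------------------------------------
IMP
_______________________________________________________________________________________
| dump file | -> | environment variable NLS_LANG | -> | target database character set |
---------------------------------------------------------------------------------------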
The four character sets are
(1) source database Character Set
(2) user session character set during the Export process (set through NLS_LANG)
(3) user session character set during Import (set through NLS_LANG)
(4) target database Character Set
3.2 Export conversion process
During export, if the source database character set differs from the export session character set, the data is converted, and the ID of the export session character set is stored in a few bytes in the header of the export file. Data may be lost during this conversion.
For example, suppose the source database uses ZHS16GBK and the export session uses US7ASCII. ZHS16GBK is a 16-bit character set and US7ASCII is a 7-bit one, so Chinese characters have no equivalent in US7ASCII; they are all lost and replaced with "?". The resulting dump file has therefore already lost data.
Therefore, to export the source data correctly, the export session character set should be the same as the source database character set or a superset of it.
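The loss described above can be reproduced with ordinary codec conversion. The sketch below is a simulation under stated assumptions: Python's gbk, ascii, and utf-8 codecs stand in for ZHS16GBK, US7ASCII, and a Unicode superset, and `errors="replace"` mimics the substitution of "?" for characters that have no equivalent. `export_convert` is a hypothetical helper, not how exp is implemented.

```python
def export_convert(text: str, source_codec: str, session_codec: str) -> bytes:
    """Simulate export: data stored in the source database character set
    is converted to the export session character set.
    Characters with no equivalent become '?'.
    """
    stored = text.encode(source_codec)           # bytes as held in the database
    return stored.decode(source_codec).encode(session_codec, errors="replace")


if __name__ == "__main__":
    data = "\u4e2d\u6587abc"  # two Chinese characters followed by "abc"

    lossy = export_convert(data, "gbk", "ascii")
    print(lossy)  # Chinese characters replaced by question marks

    safe = export_convert(data, "gbk", "gbk")
    print(safe.decode("gbk") == data)  # same character set: nothing lost
```

Once the "?" bytes are written to the dump file, the original characters cannot be recovered by any later conversion, which is why the export session character set matters.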
3.3 Import conversion process
(1) Determine the character set of the export file.
You can obtain the character set of the export file by reading its header.
(2) Determine the character set of the import session, that is, the NLS_LANG environment variable used by the import session.
(3) IMP reads the export file.
It reads the character set ID from the export file and compares it with the NLS_LANG of the import session.
(4) If the export file character set is the same as the import session character set, no conversion is needed at this step. If they differ, the data must be converted to the import session character set. As a result, up to two character set conversions can occur while importing data into the database.
First: conversion from the character set of the export file to the character set of the import session. If this conversion cannot be completed correctly, the import into the target database cannot complete.
Second: conversion from the import session character set to the target database character set.
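The two conversions on import can be sketched as a small pipeline. As before, this is a simulation that assumes Python codecs (gbk, utf-8, ascii) as stand-ins for Oracle character sets and "?" substitution for unconvertible characters; `convert` and `simulate_import` are hypothetical helpers and do not reflect how imp is implemented.

```python
def convert(data: bytes, from_codec: str, to_codec: str) -> bytes:
    """Convert bytes between character sets; skip work when they match."""
    if from_codec == to_codec:
        return data
    return data.decode(from_codec).encode(to_codec, errors="replace")


def simulate_import(dump: bytes, dump_codec: str,
                    session_codec: str, target_codec: str) -> bytes:
    """Apply the two conversions described for import."""
    # First conversion: export file character set -> import session character set.
    in_session = convert(dump, dump_codec, session_codec)
    # Second conversion: import session character set -> target database character set.
    return convert(in_session, session_codec, target_codec)


if __name__ == "__main__":
    dump = "\u4e2d\u6587".encode("gbk")  # dump file written in the gbk stand-in

    # Session character set is a superset: data arrives intact.
    print(simulate_import(dump, "gbk", "utf-8", "gbk") == dump)

    # Session character set cannot hold the data: it is lost in step one.
    print(simulate_import(dump, "gbk", "ascii", "gbk"))
```

Even though the dump file and the target database use the same character set in the second call, the narrow session character set in the middle destroys the data, so every link in the chain has to be able to represent the characters being moved.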
