1. What is the Oracle character set?
The Oracle character set is a collection of symbols used to interpret byte data. Character sets differ in size, and one can contain another (a superset-subset relationship). Oracle's National Language Support (NLS) architecture lets you store, process, and retrieve data in a localized language. It makes database tools, error messages, sort order, date, time, currency, number, and calendar conventions adapt automatically to the local language and platform.
The most important parameter that affects the character set of Oracle databases is the NLS_LANG parameter.
The format is as follows: NLS_LANG = language_territory.charset
It has three components (language, territory, and character set), each of which controls a subset of the NLS features:
Language: Specifies the language of server messages, which determines whether prompts appear in, for example, Chinese or English.
Territory: Specifies the date and number formats of the server.
Charset: Specifies the character set.
For example, AMERICAN_AMERICA.ZHS16GBK
From the composition of NLS_LANG, we can see that only the third part actually determines the character set.
Therefore, if the third part is the same for two databases, data can be imported and exported between them. The first two parts only affect things such as whether prompts appear in Chinese or English.
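As an illustration of the format above, here is a minimal Python sketch that splits an NLS_LANG value into its three parts and compares only the character set part. The names parse_nls_lang and same_charset are hypothetical helpers, not part of any Oracle tool, and the sketch assumes all three components are present.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split 'language_territory.charset' into its three parts.

    Assumption: all three components are present in the value.
    """
    # The charset follows the last dot.
    head, dot, charset = value.strip().rpartition(".")
    if not dot or not charset:
        raise ValueError(f"missing charset in {value!r}")
    # The language may contain spaces, so split on the first underscore.
    language, underscore, territory = head.partition("_")
    if not underscore or not language or not territory:
        raise ValueError(f"missing language or territory in {value!r}")
    return NlsLang(language.upper(), territory.upper(), charset.upper())


def same_charset(a: str, b: str) -> bool:
    """True when two NLS_LANG values agree on the third (charset) part."""
    return parse_nls_lang(a).charset == parse_nls_lang(b).charset


print(parse_nls_lang("AMERICAN_AMERICA.ZHS16GBK"))
print(same_charset("AMERICAN_AMERICA.ZHS16GBK",
                   "SIMPLIFIED CHINESE_CHINA.ZHS16GBK"))
```

The two settings in the last call differ in language and territory but share a character set, which is the case the text describes as safe for import and export.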
2. Knowledge about character sets:
2.1 Character Set
In essence, a character set assigns a specific set of symbols to a set of numeric codes according to a character encoding scheme. The earliest encoding scheme supported by Oracle Database is US7ASCII.
Oracle character set names follow this naming rule:
<Language><bit size><encoding>
That is, a language abbreviation, followed by the number of bits used to represent a character, followed by the encoding name.
For example, ZHS16GBK is a 16-bit (two-byte) Simplified Chinese character set that uses the GBK encoding.
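The naming rule (language, bit size, encoding) can be sketched with a regular expression. This is only an illustration: split_charset_name is a hypothetical helper, and some Oracle names (UTF8, UTFE) do not follow the pattern, in which case the function returns None.

```python
import re
from typing import Optional, Tuple

# Letters (language), digits (bit size), then the encoding name,
# which must start with a letter.
_NAME = re.compile(r"^([A-Z]+)(\d+)([A-Z].*)$")


def split_charset_name(name: str) -> Optional[Tuple[str, int, str]]:
    """Return (language, bit_size, encoding), or None if the name does not fit."""
    match = _NAME.match(name.upper())
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)


for name in ("ZHS16GBK", "US7ASCII", "WE8ISO8859P1", "UTFE"):
    print(name, "->", split_charset_name(name))
```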
2.2 character encoding scheme
2.2.1 single-byte encoding
(1) Single-byte 7-bit character sets, which can define 128 characters. The most common one is US7ASCII.
(2) Single-byte 8-bit character sets, which can define 256 characters and are suitable for most European languages.
Example: WE8ISO8859P1 (Western Europe, 8-bit, ISO standard 8859P1 encoding)
2.2.2 multi-byte encoding
(1) variable-length multi-byte encoding
Some characters are represented by one byte; others are represented by two or more bytes. Variable-length multi-byte encoding is commonly used for Asian languages, such as Japanese, Chinese, and Hindi.
For example, AL32UTF8 (where AL stands for ALL, meaning it applies to all languages) and ZHS16CGB231280.
(2) fixed length multi-byte encoding
Each character is encoded with the same fixed number of bytes. Currently, the only fixed-length multi-byte encoding supported by Oracle is AL16UTF16, which is used only for the national character set.
2.2.3 unicode encoding
Unicode is a single encoding scheme that covers all known characters currently used around the world; that is, Unicode provides a unique code for each character. UTF-16 is the 16-bit Unicode encoding. Oracle treats it as a fixed-length multi-byte encoding in which 2 bytes represent one Unicode character (supplementary characters take a 4-byte surrogate pair). AL16UTF16 is the UTF-16 character set.
UTF-8 is the 8-bit Unicode encoding, a variable-length multi-byte encoding that uses 1, 2, or 3 bytes to represent most Unicode characters (4 bytes for supplementary characters in AL32UTF8). AL32UTF8 and UTF8 are UTF-8 character sets; UTFE is the equivalent for EBCDIC platforms.
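To make the difference between single-byte, variable-length, and fixed-length encodings concrete, the sketch below counts the bytes needed for one Latin letter and one Chinese character. It uses Python's built-in codecs (ascii, latin-1, gbk, utf-8, utf-16-be) as stand-ins for the corresponding Oracle character sets; that mapping is an assumption for illustration, not an exact equivalence.

```python
from typing import Optional

SAMPLES = ["A", "中"]
CODECS = ["ascii", "latin-1", "gbk", "utf-8", "utf-16-be"]


def byte_length(text: str, codec: str) -> Optional[int]:
    """Number of bytes needed to encode text, or None if the codec cannot."""
    try:
        return len(text.encode(codec))
    except UnicodeEncodeError:
        return None


for ch in SAMPLES:
    for codec in CODECS:
        print(f"{ch!r} in {codec}: {byte_length(ch, codec)}")
```

The single-byte codecs cannot represent the Chinese character at all, the variable-length codecs use one byte for "A" and more for "中", and the fixed-length 16-bit codec uses two bytes for both.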
2.3 Character set supersets
When character set A contains every encoded value of character set B, and each shared encoded value represents the same character in both, character set A is a superset of character set B, and character set B is a subset of character set A.
The official Oracle8i and Oracle9i documentation provides a table of subset-superset pairs. For example, WE8ISO8859P1 is a subset of WE8MSWIN1252. Because US7ASCII is the earliest Oracle Database encoding format, many character sets are supersets of US7ASCII. For example, WE8ISO8859P1, ZHS16CGB231280, and ZHS16GBK are US7ASCII supersets.
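The superset definition can be checked mechanically for the 7-bit ASCII range: every code from 0 to 127 must decode to the same character in the candidate set. The sketch below does this with Python codecs standing in for Oracle character sets (an assumption; is_ascii_superset is a hypothetical helper and does not consult Oracle's own tables).

```python
def is_ascii_superset(codec: str) -> bool:
    """True if every 7-bit ASCII code means the same character in codec."""
    for value in range(128):
        raw = bytes([value])
        try:
            if raw.decode(codec) != raw.decode("ascii"):
                return False  # same code, different character
        except UnicodeDecodeError:
            return False  # the code is not valid on its own in this codec
    return True


for codec in ("latin-1", "gbk", "utf-8", "utf-16-be", "cp037"):
    print(codec, is_ascii_superset(codec))
```

The 16-bit and EBCDIC codecs fail the check, because the same byte values are either incomplete or mapped to different characters.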
2.4 Database character set (Oracle server character set)
The database character set is specified when the database is created and, in principle, cannot be changed afterwards. When creating a database, you can specify both the character set and the national character set.
2.4.1 Character Set
(1) Used to store CHAR, VARCHAR2, CLOB, LONG, and other such data types
(2) Used for identifiers such as table names, column names, and PL/SQL variables
(3) Used to store SQL and PL/SQL program units
2.4.2 National Character Set:
(1) Used to store NCHAR, NVARCHAR2, NCLOB, and other such data types
(2) The national character set is essentially an additional character set selected for Oracle, mainly to enhance its character processing capability: the NCHAR data types can use the fixed-length multi-byte encodings needed for Asian languages, which the database character set cannot. The national character set was redefined in Oracle9i and can only be AL16UTF16 or UTF8, both Unicode encodings. The default is AL16UTF16.
2.4.3 query character set parameters
You can query the following data dictionaries or views to view Character Set settings.
NLS_DATABASE_PARAMETERS, PROPS$, V$NLS_PARAMETERS
In the query results, NLS_CHARACTERSET indicates the character set, and NLS_NCHAR_CHARACTERSET indicates the national character set.
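A hedged sketch of how such a query result might be read from a program: the SQL text targets NLS_DATABASE_PARAMETERS and its PARAMETER and VALUE columns, while charset_settings and the sample rows are hypothetical. No database connection is made here; the rows would normally come from whatever database driver you use.

```python
from typing import Iterable, Optional, Tuple

# Query text only; running it requires a real Oracle connection.
CHARSET_QUERY = (
    "SELECT parameter, value FROM nls_database_parameters "
    "WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET')"
)


def charset_settings(
    rows: Iterable[Tuple[str, str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (character set, national character set) from (parameter, value) rows."""
    found = {name: value for name, value in rows}
    return found.get("NLS_CHARACTERSET"), found.get("NLS_NCHAR_CHARACTERSET")


# Hypothetical rows, shaped like the result of CHARSET_QUERY.
sample_rows = [
    ("NLS_CHARACTERSET", "ZHS16GBK"),
    ("NLS_NCHAR_CHARACTERSET", "UTF8"),
]
print(charset_settings(sample_rows))
```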
2.4.4 modifying database character sets
As mentioned above, the database character set cannot, in principle, be changed after the database is created. However, there are two feasible methods.
1. Export the database data, recreate the database with the new character set, and then import the data so that it is converted. This is the usual approach.
2. Use the ALTER DATABASE CHARACTER SET statement. This is restricted: the character set can be modified only when the new character set is a superset of the current one. For example, because UTF8 is a superset of US7ASCII, you can use ALTER DATABASE CHARACTER SET UTF8 to change a US7ASCII database.
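The choice between the two methods can be sketched as a small decision function. The superset table below contains only the pairs named in this article, and migration_plan is a hypothetical helper; a real check should rely on Oracle's documented subset-superset table.

```python
# Subset -> known supersets, limited to the pairs mentioned in this article.
# The table is not transitive and is not a complete list.
KNOWN_SUPERSETS = {
    "US7ASCII": {"WE8ISO8859P1", "ZHS16CGB231280", "ZHS16GBK", "UTF8"},
    "WE8ISO8859P1": {"WE8MSWIN1252"},
}


def migration_plan(current: str, new: str) -> str:
    """Pick a method for moving a database from current to new."""
    current, new = current.upper(), new.upper()
    if current == new:
        return "no change needed"
    if new in KNOWN_SUPERSETS.get(current, set()):
        # The new set is a superset, so the statement-based method applies.
        return f"ALTER DATABASE CHARACTER SET {new}"
    return "export the data, recreate the database, import the data"


print(migration_plan("US7ASCII", "UTF8"))
print(migration_plan("ZHS16GBK", "UTF8"))
```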
2.5 client character set (NLS_LANG parameter)
2.5.1 client Character Set meaning
The client character set defines how client character data is encoded. Any character data sent from or to the client is encoded in the character set defined for the client. A client is any application that connects directly to the database, for example SQL*Plus or exp/imp. The client character set is set through the NLS_LANG parameter.
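The following sketch shows why the declared client character set matters. When client and database character sets differ, the bytes have to be converted between them, and that only works if the bytes are interpreted with the encoding they were actually written in. Python codecs stand in for Oracle character sets here (an assumption), and convert is a hypothetical helper, not Oracle's conversion routine.

```python
def convert(data: bytes, client_codec: str, database_codec: str) -> bytes:
    """Re-encode client bytes into the database encoding."""
    return data.decode(client_codec).encode(database_codec)


text = "中文"
sent = text.encode("gbk")                  # bytes as a GBK client produces them
stored = convert(sent, "gbk", "utf-8")     # correct declaration: text survives
misread = sent.decode("latin-1")           # wrong declaration: text is garbled

print(len(sent), len(stored))
print(stored.decode("utf-8") == text)
print(misread == text)
```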
2.5.2 NLS_LANG parameter format
NLS_LANG = language_territory.client character set
Language: Specifies the language of Oracle messages, the sort order, and day and month names
Territory: Specifies the default date, number, currency, and other formats
Client character set: Specifies the character set that the Client will use
Example: NLS_LANG = AMERICAN_AMERICA.US7ASCII
AMERICAN is the language, AMERICA is the region, and US7ASCII is the client Character Set
2.5.3 client Character Set setting method
1) UNIX environment
$ NLS_LANG="simplified chinese"_china.zhs16gbk
$ export NLS_LANG
These lines can also be added to the profile file of the oracle user.
2) Windows
Edit the registry:
regedit.exe: HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOME
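When a client program is launched from a script, the same setting can be passed through the child process environment instead of a profile file or the registry. A minimal sketch, with format_nls_lang and env_with_nls_lang as hypothetical helpers; the client program itself is not started here.

```python
import os
from typing import Dict, Mapping, Optional


def format_nls_lang(language: str, territory: str, charset: str) -> str:
    """Build a value of the form language_territory.charset."""
    return f"{language}_{territory}.{charset}"


def env_with_nls_lang(
    value: str, base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with NLS_LANG set.

    The current process environment is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env["NLS_LANG"] = value
    return env


value = format_nls_lang("SIMPLIFIED CHINESE", "CHINA", "ZHS16GBK")
child_env = env_with_nls_lang(value)
print(child_env["NLS_LANG"])
# child_env could then be passed as env=child_env to subprocess.run(...)
# when starting a client tool.
```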
2.5.4 NLS parameter query
Oracle provides a number of NLS parameters, such as NLS_LANGUAGE, NLS_DATE_FORMAT, and NLS_CALENDAR, for adapting the database and client machines to local formats. You can query them through the following data dictionary views and V$ views.
NLS_DATABASE_PARAMETERS: Displays the current NLS parameter values of the database, including the database character set
NLS_SESSION_PARAMETERS: Displays the parameters set by NLS_LANG or changed by ALTER SESSION (excluding the client character set set by NLS_LANG)
NLS_INSTANCE_PARAMETERS: Displays the parameters defined in the parameter file initSID.ora
V$NLS_PARAMETERS: Displays the current NLS parameter values of the database
2.5.5 modify NLS Parameters
You can modify the NLS parameters using the following methods:
(1) modify the initialization parameter file used for instance startup
(2) modify the environment variable NLS_LANG
(3) Use the ALTER SESSION statement
(4) Use certain SQL functions
NLS parameter precedence, from highest to lowest: SQL function > ALTER SESSION > environment variable or registry > parameter file > database default parameters
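The precedence order listed above can be modeled as a lookup that walks the sources from highest to lowest priority and returns the first one that defines the parameter. The source names, effective_value, and the sample settings are all hypothetical and only illustrate the ordering.

```python
from typing import Dict, Optional, Tuple

# Highest priority first, following the order given above.
PRECEDENCE = [
    "sql_function",
    "alter_session",
    "environment",       # environment variable or registry
    "parameter_file",
    "database_default",
]


def effective_value(
    parameter: str, settings: Dict[str, Dict[str, str]]
) -> Optional[Tuple[str, str]]:
    """Return (source, value) from the highest-priority source that sets parameter."""
    for source in PRECEDENCE:
        value = settings.get(source, {}).get(parameter)
        if value is not None:
            return source, value
    return None


# Hypothetical settings for illustration only.
settings = {
    "database_default": {"NLS_DATE_FORMAT": "DD-MON-RR"},
    "parameter_file": {"NLS_DATE_FORMAT": "YYYY-MM-DD"},
    "alter_session": {"NLS_DATE_FORMAT": "DD/MM/YYYY"},
}
print(effective_value("NLS_DATE_FORMAT", settings))
```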