Colleagues often ask about character sets in Oracle databases, for example when migrating data between databases or exchanging data with other systems. Migration failures and garbled data are often caused by mismatched character sets. Below I summarize some knowledge about Oracle character sets, in the hope that it helps in your future work.
I. What is an Oracle character set?
An Oracle character set is a collection of symbols, together with the encoding used to interpret byte data. Character sets differ in size, and some are supersets of others. Oracle's National Language Support (NLS) architecture lets you store, process, and retrieve data in a local language. It makes database tools, error messages, sort order, and date, time, currency, number, and calendar conventions adapt automatically to the local language and platform.
The parameter that most affects how character sets behave in an Oracle environment is NLS_LANG. Its format is as follows:
NLS_LANG = language_territory.charset
It has three components (language, territory, and character set), each of which controls a subset of the NLS features:
language specifies the language of server messages, territory specifies the date and number formats, and charset specifies the character set. For example: AMERICAN_AMERICA.ZHS16GBK
From the composition of NLS_LANG, we can see that the part that really matters for the character set is the third one. Therefore, if the third part is the same for two databases, data can be imported and exported between them. The first two parts only affect things such as whether prompts appear in Chinese or English.
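To make the three-part structure concrete, here is a minimal Python sketch that splits an NLS_LANG value and compares only the charset component. The names `NlsLang`, `parse_nls_lang`, and `same_charset` are hypothetical helpers invented for this illustration, and the sketch assumes a fully specified language_territory.charset value.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split a fully specified language_territory.charset string."""
    prefix, dot, charset = value.strip().partition(".")
    language, underscore, territory = prefix.partition("_")
    if not (dot and underscore and language and territory and charset):
        raise ValueError(f"expected language_territory.charset, got {value!r}")
    return NlsLang(language, territory, charset)


def same_charset(a: str, b: str) -> bool:
    """Compare only the third component, ignoring case."""
    return parse_nls_lang(a).charset.upper() == parse_nls_lang(b).charset.upper()


print(parse_nls_lang("AMERICAN_AMERICA.ZHS16GBK"))
print(same_charset("AMERICAN_AMERICA.ZHS16GBK",
                   "SIMPLIFIED CHINESE_CHINA.ZHS16GBK"))
```

The second call compares two settings that differ in language and territory but share the same character set, which is the case the paragraph above describes as safe for import and export.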
II. How to query Oracle character sets
Many people have run into data import failures caused by mismatched character sets. Three character sets are involved: that of the Oracle server, that of the Oracle client, and that of the dmp file. For data to be imported correctly, all three must be consistent.
1. Querying the character set of the Oracle server
There are many ways to find the Oracle server's character set. The most direct is the following query:
SQL> select userenv('language') from dual;
There are several ways to change the character set recorded in a dmp file. The simplest is to use UltraEdit to modify the 2nd and 3rd bytes of the dmp file directly. For example, to change the dmp file's character set to ZHS16GBK, use the following SQL statement to find the hexadecimal code for that character set:
SQL> select to_char(nls_charset_id('ZHS16GBK'), '0xxx') from dual;
0354
Then change the 2nd and 3rd bytes of the dmp file to 03 54.
If the dmp file is too large to open in UltraEdit, you need to do this programmatically. Some people online have written the conversion program as a Java stored procedure (the advantage is good portability; the disadvantage is that it is relatively cumbersome). I tested this successfully on Windows, but the Oracle database must have the JVM option installed. If you are interested, you can study the program code.
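As an alternative to the Java stored procedure mentioned above (whose code is not reproduced here), the same two-byte edit can be sketched in a few lines of Python. This is an illustration under assumptions, not the original program: `set_dmp_charset_id` is a hypothetical name, the ID is written high byte first to match the "0354" shown above, and the demonstration runs on a throwaway file rather than a real dmp file. Back up any real file before patching it.

```python
import os
import tempfile


def set_dmp_charset_id(path: str, charset_id: int) -> bytes:
    """Overwrite the 2nd and 3rd bytes of the file; return the old two bytes."""
    new_bytes = charset_id.to_bytes(2, "big")  # 0x0354 -> b"\x03\x54"
    with open(path, "r+b") as f:  # r+b edits in place, whatever the file size
        f.seek(1)  # offset 1 is the 2nd byte
        old_bytes = f.read(2)
        if len(old_bytes) != 2:
            raise ValueError("file is too short to hold a character set ID")
        f.seek(1)
        f.write(new_bytes)
    return old_bytes


# Demonstration on a placeholder file, not a real export.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x03\x00\x01" + b"placeholder body")
    demo_path = tmp.name

previous = set_dmp_charset_id(demo_path, 0x0354)
with open(demo_path, "rb") as f:
    header = f.read(3)
os.remove(demo_path)
print(previous.hex(), "->", header[1:3].hex())
```

Because the file is opened in place and only two bytes are touched, the size of the dmp file does not matter, which is the limitation that rules out an editor.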
First: master the basic concepts of character sets.
Some may think this step is unnecessary, but in fact it is precisely an unclear understanding of the basic concepts that leads to many problems and questions.
The first concept is the character set itself.
Electronic computers were originally used for scientific computation (hence the name "computer"), but as technology developed they came to be used for other applications. This requires a computer to process not only numeric values but also other information such as text and special symbols. Because the computer itself can directly process only numeric information, text and symbols must be encoded as numbers. The earliest character set is the familiar ASCII, which uses seven binary bits to represent 128 characters. Later, as the needs of different countries and organizations emerged, many more character sets appeared, for example the ISO 8859 series for Western European characters, and GB2312-80 and GBK for Chinese characters.
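The idea of encoding text as numbers can be seen directly with Python's built-in codecs. This is only an illustration of the concept, using Python's `ascii` and `gbk` codecs and the two-character word 东北 ("Northeast") as sample text. It says nothing about Oracle's internals.

```python
# ASCII assigns each character a number below 128, so 7 bits are enough.
for ch in "Az?":
    code = ord(ch)
    print(ch, code, format(code, "07b"))

# GBK stores each of these Chinese characters as two bytes.
text = "东北"
gbk_bytes = text.encode("gbk")
print(len(text), "characters ->", len(gbk_bytes), "bytes:", gbk_bytes.hex())

# The same characters have no representation in 7-bit ASCII.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("encodable as ASCII:", ascii_ok)
```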
When creating a database, we need to choose the database character set and the national character set (specified by the CHARACTER SET and NATIONAL CHARACTER SET clauses of CREATE DATABASE). To decide, we must know what data will be stored in the database. If you only need to store English text, you can use US7ASCII as the character set. If you want to store Chinese, you need a character set that supports Chinese (such as ZHS16GBK). If you need to store multilingual text, choose UTF8.
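One rough way to reason about this choice is to test sample data against candidate encodings. The sketch below does that in Python. The mapping from Oracle character set names to Python codecs is an assumption made for illustration (the Python codecs are approximate stand-ins, not Oracle's own tables), and `charsets_that_fit` is a hypothetical helper.

```python
from typing import Iterable, List

# Assumed, approximate mapping from Oracle character set names to Python codecs.
CODEC_FOR = {"US7ASCII": "ascii", "ZHS16GBK": "gbk", "UTF8": "utf-8"}


def charsets_that_fit(samples: Iterable[str]) -> List[str]:
    """Return the candidate character sets able to encode every sample."""
    samples = list(samples)
    fits = []
    for name, codec in CODEC_FOR.items():
        try:
            for sample in samples:
                sample.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one sample does not fit this character set
        fits.append(name)
    return fits


print(charsets_that_fit(["hello"]))
print(charsets_that_fit(["hello", "东北"]))
print(charsets_that_fit(["hello", "东北", "العربية"]))
```

English-only samples fit all three candidates, adding Chinese rules out US7ASCII, and adding Arabic leaves only UTF8, which mirrors the three cases in the paragraph above.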
Experiment result analysis 3
Quote:
--------------------------------------------------------------------------------
Originally posted by tellin
Insert data using ZHS16GBK
SQL> INSERT INTO TEST VALUES ('东北');
1 row created.
SQL> SELECT * FROM TEST;
R1
--------------------
6+11
??
SQL> EXIT
--------------------------------------------------------------------------------
When the client character set is set to ZHS16GBK and "东北" (Northeast) is inserted into the database, Oracle sees that the database character set is US7ASCII, which differs from the client's, so conversion is required. However, the characters "东北" in ZHS16GBK have no counterparts in US7ASCII, so Oracle stores a uniform "replacement character" in the database instead. Here it is "?", whose code is 63 (00111111). The input information is therefore actually lost: however the character set settings are changed afterwards (as the following experiment results show), the second row returned by the SELECT statement is always two "?" characters (note: 2, not 4).
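The loss described above can be imitated with Python's codecs, used here only as a stand-in for Oracle's conversion and taking the inserted word to be the two-character 东北 ("Northeast"). The last two lines are a side observation that the source does not state: clearing the high bit of the GBK bytes happens to reproduce the first row of the output above, which may be how that row arose.

```python
text = "东北"

# Converting to a 7-bit character set with substitution yields one "?"
# per unconvertible character, not one per byte.
stored = text.encode("ascii", errors="replace")
print(stored, list(stored))

# The substitution is irreversible: decoding the stored bytes under
# either character set still gives two question marks.
print(stored.decode("ascii"), stored.decode("gbk"))

# Side observation (an assumption about the experiment, not from the source):
# the GBK bytes with their high bit cleared read as plain ASCII text.
raw = text.encode("gbk")
print(bytes(b & 0x7F for b in raw).decode("ascii"))
```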
Quote:
--------------------------------------------------------------------------------
Change the client character set to US7ASCII
D:\> SET NLS_LANG=AMERICAN_AMERICA.US7ASCII
D:\> SQLPLUS "/as sysdba"
The data inserted with ZHS16GBK cannot be displayed, but the data inserted with US7ASCII can.
SQL> SELECT * FROM TEST;
R1
----------
东北
??
Change the server character set to ZHS16GBK
SQL> update props$ set value$ = 'ZHS16GBK' WHERE;
1 row updated.
SQL> COMMIT;
Change the client character set to ZHS16GBK
D:\> SET NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
D:\> SQLPLUS "/as sysdba"
The data inserted earlier with US7ASCII can be displayed, but the data inserted with ZHS16GBK cannot. This shows that the data inserted with ZHS16GBK was stored garbled.
SQL> SELECT * FROM TEST;
R1
--------------------
东北
??
--------------------------------------------------------------------------------
Note that modifying the database character set with "update props$ set value$ = 'ZHS16GBK' WHERE;" is not standard practice and may cause problems. It appears here only because the original article quotes another user's experiment results.
Experiment result analysis 4
Quote:
--------------------------------------------------------------------------------
SQL> INSERT INTO TEST VALUES ('东北');
1 row created.
After so much analysis, the conclusion is actually simple. To reduce character set errors and trouble, we need to stick to two basic principles:
On the database side: choose the character set you need (specified by the CHARACTER SET and NATIONAL CHARACTER SET clauses of CREATE DATABASE);
On the client side: set the character set that the operating system actually uses (through the NLS_LANG environment variable).