A preliminary study of the character set problem (VI): how garbled characters are generated


Original link: http://www.eygle.com/special/NLS_CHARACTER_SET_06.htm




This article was originally published in the ITPUB technology series book "Oracle Database DBA topic Technology Pristine". Reproduction without permission is prohibited.

Finally, let us discuss how garbled characters are generated.

A real-world environment typically involves three character set settings:

First: the client application character set

Second: the client NLS_LANG parameter setting

Third: the server-side database character set

Which characters a client application (such as sqlplus, cmd, or notepad) can display depends on the client operating system. Whatever characters the client can display, we can also enter in the application.
Whether those characters are then stored correctly in the database depends closely on the other two character set settings.

During transmission, the client NLS_LANG is mainly used to decide whether conversion is needed.

If NLS_LANG is the same as the database character set, no conversion is performed and the characters are inserted into the database as they are.

If the two differ, a conversion takes place, with two possible outcomes:
If a mapping exists, the character is mapped to the corresponding binary encoding (after this mapping, the character represented may have changed) and passed to the database.
If no mapping exists, a substitution character is passed instead (on many platforms this is "?").

When the database character set differs from the client NLS_LANG, the database further processes the characters that have already been converted through NLS_LANG:
"?" characters (those that had no mapping) are stored in the database as "?" directly; all other characters are converted between the NLS_LANG character set and the database character set.
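The conversion rule described above can be modelled roughly as follows. This is a minimal sketch in Python, not Oracle's actual implementation: the function name `send_to_database` is hypothetical, Python codec names stand in for Oracle character set names, and Python's `errors="replace"` handler stands in for Oracle's substitution character. Real Oracle conversions differ in detail.

```python
def send_to_database(client_bytes: bytes, nls_lang: str, db_charset: str) -> bytes:
    """Toy model of the client-to-database path (hypothetical helper)."""
    if nls_lang == db_charset:
        # Same character set on both sides: the bytes pass through
        # unchanged and are not validated.
        return client_bytes
    # Different character sets: interpret the bytes the way NLS_LANG
    # claims they are encoded, then re-encode for the database.
    # Anything without a mapping becomes a substitution character.
    text = client_bytes.decode(nls_lang, errors="replace")
    return text.encode(db_charset, errors="replace")


# A mapping exists: Latin-1 byte A4 is re-encoded as C2 A4 in UTF-8.
print(send_to_database(b"\xa4", "latin-1", "utf-8").hex())   # c2a4
# No mapping exists: byte E9 (e with acute accent) has no ASCII equivalent.
print(send_to_database(b"\xe9", "latin-1", "ascii"))          # b'?'
# Identical settings: the byte is stored as is, valid or not.
print(send_to_database(b"\xa4", "utf-8", "utf-8").hex())      # a4
```

The three calls correspond to the three cases in the text: a mapping exists, no mapping exists, and no conversion is attempted at all.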


Let us now look at the most common character set configurations and how they produce garbled characters:

1. NLS_LANG differs from the database character set

In this case there are two possibilities:
The character entered by the client has no corresponding character in the NLS_LANG character set and cannot be converted, so NLS_LANG replaces the unmapped character with a substitution character (this conversion is performed in TTC). In many character sets the substitution character is "?".
The character entered by the client corresponds to a different character in the NLS_LANG character set. It is converted, passed to the database, and stored, but the original data is lost and the character in the database no longer represents what the client entered. This process is not reversible, which is why data that was entered on the client with a normal encoding so often comes back from a query as unknown characters.


The figure above illustrates this process. The client enters the euro symbol (€) in the WE8ISO8859P15 character set, and the client NLS_LANG differs from the database character set.
In the first conversion, the client-side encoding of € is A4, but in the NLS_LANG character set A4 corresponds to '¤', so this step produces a wrong mapping. Because the database character set also differs from the NLS_LANG setting, a further conversion takes place and the encoding stored in the database becomes C2A4. That is a correct conversion from the NLS_LANG character set, but the data the client entered has already been damaged or lost.
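The euro example can be reproduced byte for byte with Python's codecs. This is only a sketch under stated assumptions: the text does not name the NLS_LANG and database character sets used in the figure, so ISO 8859-1 (for NLS_LANG) and UTF-8 (for the database) are assumed here because they match the '¤' and C2A4 values quoted above.

```python
# Client application uses ISO 8859-15 (WE8ISO8859P15): the euro sign is byte A4.
client_bytes = "\u20ac".encode("iso8859_15")
print(client_bytes.hex())                        # a4

# Assumed NLS_LANG character set: ISO 8859-1, where A4 is the
# generic currency sign (U+00A4), not the euro sign (U+20AC).
seen_by_nls_lang = client_bytes.decode("iso8859_1")
print(f"U+{ord(seen_by_nls_lang):04X}")          # U+00A4

# Assumed database character set: UTF-8. The currency sign is stored as C2 A4.
stored = seen_by_nls_lang.encode("utf-8")
print(stored.hex())                              # c2a4

# Reading the row back yields the currency sign, so the euro sign is gone.
print(stored.decode("utf-8") == "\u20ac")        # False
```

Each step is a valid conversion on its own; the damage comes from the first step interpreting the byte under the wrong character set.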

We can run a simple test with character sets we are familiar with.

Test environment:

Client application character set: Chinese GB18030

NLS_LANG character set: US7ASCII

Database character set: ZHS16GBK



C:\>set NLS_LANG=AMERICAN_AMERICA.US7ASCII

C:\>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Tue Nov 4 01:19:57 2003

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> insert into test values ('测试');

1 row created.

SQL> select name,dump(name) from test;

NAME       DUMP(NAME)
---------- ----------------------------------------
2bJT       Typ=1 Len=4: 50,98,74,84

The query returns garbled characters. Converting these byte values to binary gives:
110010 1100010 1001010 1010100

Padded to 8 bits:
00110010 01100010 01001010 01010100

With the first bit of each byte changed to 1:
10110010 11100010 11001010 11010100

Now let us look at the correct storage:

C:\>set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

C:\>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Tue Nov 4 01:40:18 2003

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> insert into test values ('测试');

1 row created.

SQL> col dump(name) for a30
SQL> select name,dump(name) from test;

NAME       DUMP(NAME)
---------- ------------------------------
测试       Typ=1 Len=4: 178,226,202,212

1 row selected.


Converting this result to binary gives:
10110010 11100010 11001010 11010100

This is exactly what we obtained from the garbled result of the first test after setting the first bit of each byte to 1.

The test shows that the US7ASCII conversion strips the leading 1 bit from every byte of the Chinese characters, so the original data is lost and garbled characters result. The effect of the NLS_LANG conversion is plain to see!
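The bit arithmetic of this test can be checked with a few lines of Python. This is a sketch, assuming the inserted value was the two-character Chinese word 测试, whose GBK encoding matches the DUMP output 178,226,202,212; Python's `gbk` codec stands in for ZHS16GBK.

```python
original = "测试"                       # assumed test value (see lead-in)
gbk_bytes = original.encode("gbk")      # bytes as stored with the correct NLS_LANG
print(list(gbk_bytes))                  # [178, 226, 202, 212]

# Clearing the high bit of every byte reproduces the garbled row
# obtained with NLS_LANG set to US7ASCII.
stripped = bytes(b & 0x7F for b in gbk_bytes)
print(list(stripped), stripped.decode("ascii"))   # [50, 98, 74, 84] 2bJT

# Setting the high bit again gives back the GBK bytes, but only because
# every byte of this particular value originally had its high bit set.
restored = bytes(b | 0x80 for b in stripped)
print(restored == gbk_bytes)            # True
```

In general the stripped bit cannot be recovered, because a stored 7-bit byte may have started out with or without the high bit.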




2. NLS_LANG is the same as the database character set
In this case the database does not convert the encoding passed by the client at all (which improves performance) and stores it directly. This leads to the same kind of problem as above:
if the characters passed by the client are valid in the database character set, they are stored correctly; if not, they are replaced with the replacement character "?", and that is how garbled characters are produced.




As shown in the figure above, when NLS_LANG and the database character set are both UTF8, the client's encoding of the euro symbol, A4, is inserted into the database without any conversion. In a UTF8 database, however, A4 represents an illegal character.
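That a lone A4 byte is not valid UTF-8 can be confirmed with Python's strict UTF-8 decoder. This is a small sketch; Python's `utf-8` codec is used here as a stand-in for the database character set and is not Oracle's own validation.

```python
# A lone A4 byte is a continuation byte with no lead byte before it,
# so a strict UTF-8 decoder rejects it.
try:
    b"\xa4".decode("utf-8")
    valid = True
except UnicodeDecodeError as exc:
    valid = False
    print(exc.reason)                     # invalid start byte

# For comparison, the euro sign in UTF-8 takes three bytes.
euro_utf8 = "\u20ac".encode("utf-8").hex()
print(euro_utf8)                          # e282ac
```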



Let us look at a simple test.

Test environment:

Client application character set: Chinese GB18030

Client NLS_LANG: US7ASCII

Database character set: US7ASCII
We know that in this case the database does not convert the stored data at all. In the following test we see that Chinese is displayed correctly even though the character set is US7ASCII.




C:\>set NLS_LANG=AMERICAN_AMERICA.US7ASCII

C:\>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Tue Nov 4 01:02:04 2003

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> insert into test values ('测试');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from test;

NAME
----------
测试

1 row selected.

SQL> col dump(name) for a30
SQL> select name,dump(name) from test;

NAME       DUMP(NAME)
---------- ------------------------------
测试       Typ=1 Len=4: 178,226,202,212

1 row selected.

SQL> select * from nls_database_parameters;

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               US7ASCII
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_RDBMS_VERSION              9.2.0.4.0

20 rows selected.

SQL>
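Why the Chinese text survives in a US7ASCII database can be sketched in Python as well. The sketch assumes the inserted value was 测试 (consistent with the DUMP output above) and models the pass-through as a plain byte copy; the Latin-1 reader at the end is a hypothetical second client added only for contrast.

```python
client_bytes = "测试".encode("gb18030")   # what the GB18030 client sends
stored = client_bytes                      # NLS_LANG equals the database charset: no conversion
print(list(stored))                        # [178, 226, 202, 212]

# The same client reads the bytes back and interprets them as GB18030
# again, so the text appears intact.
print(stored.decode("gb18030") == "测试")  # True

# A hypothetical client whose application character set is Latin-1
# would interpret the very same bytes as four unrelated characters.
garbled = stored.decode("latin-1")
print([f"U+{ord(c):04X}" for c in garbled])
# ['U+00B2', 'U+00E2', 'U+00CA', 'U+00D4']
```

The display is correct only because the writer and the reader happen to agree on GB18030; the database itself has no record of which encoding the bytes are in.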



Conclusion:

For DBAs there is one very important principle: never put your database in a dangerous situation!

This requires us to take a valid backup before doing anything that might change the structure of the database. Many DBAs have learned this lesson the hard way after skipping the backup.

