Understanding of Oracle Character encoding

Source: Internet
Author: User
Tags sqlplus

Original link: http://blog.csdn.net/dbanote/article/details/9158367

First, query the service-side character setSelectUserenv'language') fromdual; USERENV ('LANGUAGE')    ----------------------------------------------------simplified Chinese_china.    ZHS16GBK Two, the client Nls_lang parameter (that is, the parameter of Sqlplus) This parameter is used to indicate to Oracle the character set of the client operating system (Sqlplus client). Select *  fromV$nls_parameterswhereParameter='Nls_characterset'; PARAMETER VALUE------------------------------ ------------------------------nls_characterset ZHS16GBK Setting up client Nls_lang C:\Users\85243>SetNls_lang=Simplified CHINESE_CHINA.ZHS16GBK C:\Users\85243>SetNls_lang=Simplified Chinese_china.al32utf8 Three, client operating system character set C:\Users\85243>chcp Activity code page:936936 represents the GBK encoding 65001 for the UTF-8

With the above 3 basic concepts, let me explain the basic principles of Oracle character set translation:
1. Set the Nls_lang of the client as the character set of the client operating system
2. If the database character set equals Nls_lang, the database and the client transmit the characters without any conversion
3. If they are not equal, you need to convert between different character sets, only the client operating system character set is a subset of the database character set on the basis of the correct conversion, otherwise garbled.

analysis of several common situations1. The database character set is Unicode (UTF-8 encoding) Our database version is ten.2.0.4.0, the database character set is: SQL> Select *  fromV$nls_parameterswhereParameter='Nls_characterset'; PARAMETER VALUE---------------------------------------- ------------------------------  Nls_characterset Al32utf82The client operating system character set is code page 936 (character set ZHS16GBK) you can use CHCP to get the code page for Windows C:\Documents andSettings\a105024\desktop>chcp Active code page:936                    3. Create a test table SQL> Create TableTest (ID Number,var varchar2( -)); Tablecreated. 4. Insert data here in the same operating system boot two Session,session1 Nls_lang is set to the same as the database character set (that is, Al32utf8): C:\Documents andSettings\a105024\desktop>SetNls_lang=simplified Chinese_china.al32utf8 Connect to the database and insert a piece of data: session_1>Insert  intoTestValues(1,'China'); 1row created. Session_1>Commit; CommitComplete . The Session2 Nls_lang is set to the same as the client operating system (i.e. ZHS16GBK): C:\Documents andSettings\a105024\desktop>SetNls_lang=simplified CHINESE_CHINA.ZHS16GBK Connect to the database and insert a piece of data: session_2>Insert  intoTestValues(2,'China'); 1row created. Session_2>Commit; CommitComplete . 5the execution query executes the query on session 1: session_1>Select *  fromtest; IDVAR          ---------- ---------------------                   1China2Trickle Ricoh Executing query on session 2: Session_2>Select *  fromtest; IDVAR          ---------- --------------------                   1??? 2China The above example looks very strange, session1 and 2 can display their own inserted string normally, and do not display the other side of the string inserted correctly. In order to figure out, we first need to know how these two strings are stored in the database. We can use the dump function to get the characters encoded in the database: SQL> SelectIdDump(var,1016) fromtest; IDDUMP(VAR,1016)      -- ------------------------------------------------------------       1Typ=1 Len=4CharacterSet=Al32utf8:d6,d0,b9,fa2Typ=1 Len=6CharacterSet=AL32UTF8:E4,B8,AD,E5,9B,BD According to Al32utf8 's code, the correct encoding for the word "China" is (both 3 bytes): Medium--E4,b8,adCountry--E5,9B,BDTherefore session 1 inserted string in the database encoding is wrong, session 2 is correct.    Scene 1:session 1 Insert, Session 1 query, store errors in the database, but display correctly. Insert process: "China" in the client operating system character Set ZHS16GBK encoding is "D6,D0,B9,FA", because the Nls_lang and the database character set is the same, the database side of the client passed the character encoding without any conversion directly into the database, Therefore, the encoding stored in the database is also "D6,D0,B9,FA", the reading process: the database side reads the encoding is "D6,D0,B9,FA", because the Nls_lang and the database character set is the same, the client to the database end passed the character encoding does not make any conversion direct display, the encoding "D6, D0,b9,fa "In the client operating system character set ZHS16GBK the corresponding kanji is" China ".    Scene 2:session 1 Insert, Session 2 query, store errors in the database, display also error.    The insertion process, like scenario 1, is no longer described here. Read process: The database-side reads the encoding is "D6,D0,B9,FA", due to the Nls_lang and database character set, the client to the database side passed the character encoding conversion, the database side character set Al32utf8 "D6,d0,b9,fa"    The corresponding encoding cannot be found in the client operating system character set ZHS16GBK, so it has to be replaced.    Scene 3:session 2 inserted, session 1 query, stored correctly in database, but display error. Insert process: "China" in the client operating system character Set ZHS16GBK encoding is "D6,D0,B9,FA", due to Nls_lang and the database character set, Oracle will be a character encoding conversion, that is, the character set ZHS16GBK "Chinese" encoding "    D6,d0,b9,fa "translated to the character set" Al32utf8 "in" China "code" E4,B8,AD,E5,9B,BD ". Read process: The database side reads the encoding is "E4,B8,AD,E5,9B,BD", because the Nls_lang and the database character set is the same, the client to the database side passes over the character encoding does not make any conversion direct display, the code "E4,B8,AD,E5,9B,BD" In the client operating system character set ZHS16GBK corresponds to the Chinese character "Juan Ricoh" (originally 2 characters, now becomes 3 characters, because ZHS16GBK characters are encoded in 2 bytes). Scene 4:session 2 Insert, Session 2 query, inThe database is stored correctly, and the display is correct.    The insertion process is similar to Scenario 3. Read process: The database-side read encoding is "E4,B8,AD,E5,9B,BD", due to the Nls_lang and database character set, the client to the database-side transmission of character encoding, the database-side character set Al32utf8 "China" two word encoding "E4,b8,ad , E5,9B,BD "translated into the client operating system character set ZHS16GBK" China "in the Code" D6,D0,B9,FA ", and normal display.

Original:
http://blog.csdn.net/dbanote/article/details/9158367

Understanding of Oracle Character encoding

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.