經常碰到SQLPLUS展現亂碼的問題,字元集和相關的定義都有說明但是很少有能把這些關係說的很簡單易懂的。
在此之前我們需要搞清楚三個概念,作業系統字元集,用戶端字元集,Oracle字元集:
作業系統字元集:對應的參數是LANG,這個參數應該是Oracle資料庫的超集,如果作業系統不支援,那麼我們的資料就會亂碼。這裡的作業系統指的是用戶端的作業系統。伺服器端的作業系統不會影響資料的存取。
資料庫字元集:NLS_CHARACTERSET,可以在nls_database_parameters中查看當前資料庫的字元集,安裝資料庫的時候選擇,一般不修改,不過在新的字元集是現有字元集的嚴格超集的情況下可以改,其他情況下修改可能導致資料庫異常。例如將UTF8字元集修改為AL32UTF8
關於子集超集的映射關係,見如下Oracle官網的文檔的Binary Subset-Superset Pairs。
http://docs.oracle.com/database/121/NLSPG/applocaledata.htm#NLSPG591
用戶端字元集:對應的參數是NLS_LANG,如果用戶端未設定,此時則取的是安裝時資料庫的預設參數
為了協助理解,我畫了一張圖如下,圖中標紅部分如果一致表示資料的儲存方式一致,即如果LANG、NLS_LANG、NLS_CHARACTERSET的編碼是一致的如UTF8,那麼資料的傳輸過程中不會異常,字元亂碼只是顯示問題。
1、作業系統字元集
linux下首先locale 查看字元集
[oracle@oddpc ~]$ localeLANG=en_US.UTF-8LC_CTYPE="en_US.UTF-8"LC_NUMERIC="en_US.UTF-8"LC_TIME="en_US.UTF-8"LC_COLLATE="en_US.UTF-8"LC_MONETARY="en_US.UTF-8"LC_MESSAGES="en_US.UTF-8"LC_PAPER="en_US.UTF-8"LC_NAME="en_US.UTF-8"LC_ADDRESS="en_US.UTF-8"LC_TELEPHONE="en_US.UTF-8"LC_MEASUREMENT="en_US.UTF-8"LC_IDENTIFICATION="en_US.UTF-8"LC_ALL=[oracle@oddpc ~]$ echo $LANGen_US.UTF-8
2、該主機並未安裝中文支援包,設定LANG後可以效果如下,顯然無路如何調整NLS_LANG在這台機器上都無法展現中文
[oracle@evenpc ~]$ export LANG=zh_CN.utf8[oracle@evenpc ~]$ date2016? 10? 13? ??? 15:17:01 CST
3、安裝中文支援包,使用yum -y groupinstall chinese-support 可以安裝中文支援包,安裝過程略過,安裝完畢後可以正常顯示中文
[oracle@oddpc ~]$ export LANG=zh_CN.utf8[oracle@oddpc ~]$ date2016年 10月 13日 星期四 15:14:19 CST
4、接下來就是展現測試,我安裝了兩個資料庫執行個體PROD1和PROD5,PROD1 的字元集是WE8MSWIN1252,PROD5的字元集是AL32UTF8
預設情況下NLS_LANG是空的,此時NLS_LANG取預設安裝時的值,PROD1是AMRICAN,PROD5是SIMPLIFIED CHINESE
[oracle@oddpc ~]$ echo $NLS_LANG[oracle@oddpc ~]$
SQL> show parameter lang NAME TYPE VALUE------------------------------------ ----------- ------------------------------nls_date_language stringnls_language string AMERICAN
SQL> select sysdate from dual;SYSDATE---------13-OCT-16
PROD5
SQL> show parameter lang NAME TYPE VALUE------------------------------------ ----------- ------------------------------nls_date_language stringnls_language string SIMPLIFIED CHINESE
SQL> select sysdate from dual;SYSDATE----------13-10?-16
5、PROD5 發生亂碼,PROD1英文正常,設定下NLS_LANG參數
PROD1 的結果如下,可以看到提示資訊已經變成中文,但是由於字元集非UTF8中文字元存入後將亂碼
[oracle@oddpc ~]$ export NLS_LANG="SIMPLIFIED CHINESE_CHINA.UTF8"[oracle@oddpc ~]$ sqlplus / as sysdbaSQL*Plus: Release 11.2.0.3.0 Production on 星期四 10月 13 15:42:46 2016Copyright (c) 1982, 2011, Oracle. All rights reserved.串連到: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> show parameter langNAME TYPE VALUE------------------------------------ --------------------------------- ------------------------------nls_date_language string SIMPLIFIED CHINESEnls_language string SIMPLIFIED CHINESESQL> show parameter db_nameNAME TYPE VALUE------------------------------------ --------------------------------- ------------------------------db_name string PROD1SQL> show parameter langNAME TYPE VALUE------------------------------------ --------------------------------- ------------------------------nls_date_language string SIMPLIFIED CHINESEnls_language string SIMPLIFIED CHINESESQL> select sysdate from dual;SYSDATE------------13-10? -16
SQL> select * from nls_database_parameters;PARAMETER VALUE---------------------------------------- ----------------------------------------NLS_LANGUAGE AMERICANNLS_TERRITORY AMERICANLS_CURRENCY $NLS_ISO_CURRENCY AMERICANLS_NUMERIC_CHARACTERS .,NLS_CHARACTERSET WE8MSWIN1252NLS_CALENDAR GREGORIANNLS_DATE_FORMAT DD-MON-RRNLS_DATE_LANGUAGE AMERICANNLS_SORT BINARYNLS_TIME_FORMAT HH.MI.SSXFF AMPARAMETER VALUE---------------------------------------- ----------------------------------------NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AMNLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZRNLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZRNLS_DUAL_CURRENCY $NLS_COMP BINARYNLS_LENGTH_SEMANTICS BYTENLS_NCHAR_CONV_EXCP FALSENLS_NCHAR_CHARACTERSET AL16UTF16NLS_RDBMS_VERSION 11.2.0.3.0已選擇20行。
PROD5的結果如下,此時PROD5顯示正常
[oracle@oddpc ~]$ export NLS_LANG="SIMPLIFIED CHINESE_CHINA.UTF8"[oracle@oddpc ~]$ sqlplus / as sysdbaSQL*Plus: Release 11.2.0.3.0 Production on 星期四 10月 13 15:46:36 2016Copyright (c) 1982, 2011, Oracle. All rights reserved.串連到: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing optionsSQL> show parameter db_nameNAME TYPE VALUE------------------------------------ --------------------------------- ------------------------------db_name string PROD5SQL> select sysdate from dual;SYSDATE------------13-10月-16SQL> show parameter langNAME TYPE VALUE------------------------------------ --------------------------------- ------------------------------nls_date_language string SIMPLIFIED CHINESEnls_language string SIMPLIFIED CHINESE
SQL> select * from nls_database_parameters;PARAMETER VALUE---------------------------------------- ----------------------------------------NLS_LANGUAGE AMERICANNLS_TERRITORY AMERICANLS_CURRENCY $NLS_ISO_CURRENCY AMERICANLS_NUMERIC_CHARACTERS .,NLS_CHARACTERSET AL32UTF8NLS_CALENDAR GREGORIANNLS_DATE_FORMAT DD-MON-RRNLS_DATE_LANGUAGE AMERICANNLS_SORT BINARYNLS_TIME_FORMAT HH.MI.SSXFF AMPARAMETER VALUE---------------------------------------- ----------------------------------------NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AMNLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZRNLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZRNLS_DUAL_CURRENCY $NLS_COMP BINARYNLS_LENGTH_SEMANTICS BYTENLS_NCHAR_CONV_EXCP FALSENLS_NCHAR_CHARACTERSET AL16UTF16NLS_RDBMS_VERSION 11.2.0.3.0已選擇20行。
總結:通過以上的實驗可以看出,用戶端展現是否亂碼是由NLS_LANG決定,發生中文亂碼的情況下,首先查看資料庫的NLS_CHARACTERSET是否支援中文儲存,如果不支援,無論如何設定均無法正常顯示中文。Oracle官方文檔上給出了各種語言的編碼支援如下。
http://docs.oracle.com/database/121/NLSPG/applocaledata.htm#NLSPG593
Table A-13 Languages and Character Sets Supported by LCSSCAN and GDK
Language |
Character Sets |
Arabic |
AL16UTF16, AL32UTF8, AR8ISO8859P6, AR8MSWIN1256, UTF8 |
Bulgarian |
AL16UTF16, AL32UTF8, CL8ISO8859P5, CL8MSWIN1251, UTF8 |
Catalan |
AL16UTF16, AL32UTF8, US7ASCII, UTF8, WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252 |
Croatian |
AL16UTF16, AL32UTF8, EE8ISO8859P2, EE8MSWIN1250, UTF8 |
Czech |
AL16UTF16, AL32UTF8, EE8ISO8859P2, EE8MSWIN1250, UTF8 |
Danish |
AL16UTF16, AL32UTF8, US7ASCII, UTF8, WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252 |
Dutch |
AL16UTF16, AL32UTF8, US7ASCII, UTF8, WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252 |
English |
AL16UTF16, AL32UTF8, US7ASCII, UTF8, WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252 |
Estonian |
AL16UTF16, AL32UTF8, NEE8IOS8859P4, UTF8 |
Finnish |
AL16UTF16, AL32UTF8, US7ASCII, UTF8, WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252 |
French |
AL16UTF16, AL32UTF8, US7ASCII, UTF8, WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252 |
German |
AL16UTF16, AL32UTF8, US7ASCII, UTF8, WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252 |
Greek |
AL16UTF16, AL32UTF8, EL8ISO8859P7, EL8MSWIN1253, UTF8 |
Hebrew |
AL16UTF16, AL32UTF8, IW8ISO8859P8, IW8MSWIN1255, UTF8 |
Hindi |
AL16UTF16, AL32UTF8, IN8ISCII, UTF8 |
Hungarian |
AL16UTF16, AL32UTF8, EE8ISO8859P2, EE8MSWIN1250, UTF8 |
Indonesian |
AL16UTF16, AL32UTF8, US7ASCII, UTF8, WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252 |
Italian |
AL16UTF16, AL32UTF8, US7ASCII, UTF8, WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252 |
Japanese |
AL16UTF16, AL32UTF8, ISO2022-JP, JA16EUC, JA16SJIS, UTF8 |
Korean |
AL16UTF16, AL32UTF8, ISO2022-KR, KO16KSC5601, KO16MSWIN949, UTF8 |
Latvian |
AL16UTF16, AL32UTF8, NEE8ISO8859P4, UTF8 |
Lithuanian |
AL16UTF16, AL32UTF8, NEE8ISO8859P4, UTF8 |
Malay |
AL16UTF16, AL32UTF8, US7ASCII, UTF8, WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252 |
Norwegian |
AL16UTF16, AL32UTF8, US7ASCII, UTF8, WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252 |
Persian |
AL16UTF16, AL32UTF8, AR8MSWIN1256, UTF8 |
Polish |
AL16UTF16, AL32UTF8, EE8ISO8859P2, EE8MSWIN1250, UTF8 |
Portuguese |
AL16UTF16, AL32UTF8, US7ASCII, UTF8, WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252 |
Romanian |
AL16UTF16, AL32UTF8, EE8ISO8859P2, EE8MSWIN1250, UTF8 |
Russian |
AL16UTF16, AL32UTF8, CL8ISO8859P5, CL8KOI8R, CL8MSWIN1251, RU8PC866, UTF8 |
Serbian |
AL16UTF16, AL32UTF8, CL8ISO8859P5, CL8MSWIN1251, UTF8 |
Simplified Chinese |
AL16UTF16, AL32UTF8, HZ-GB-2312, UTF8, ZHS16GBK, ZHS16CGB231280 |
Slovak |
AL16UTF16, AL32UTF8, EE8ISO8859P2, EE8MSWIN1250, UTF8 |
Slovenian |
AL16UTF16, AL32UTF8, EE8ISO8859P2, EE8MSWIN1250, UTF8 |
Spanish |