Some questions about the character set:
I recently started learning Oracle, and I have to say its scope is very broad: you are not only learning the database itself, you also keep running into a variety of operating systems. I remember that my first Oracle installation on Red Hat took almost a full day. Anyway, I recently ran into a problem while testing some simple SQL statements:
What on earth is this? It can't recognize Chinese? I quickly searched online for the cause, which appears to involve the character set. The short answer is that the character set and default language were not selected when the database was installed.
(A note on character sets: a character set is a character encoding table (code page). Every character, whether English, Chinese, Korean, or anything else, is stored in memory or on disk as binary bytes. That binary encoding is the character's code (also called its inner code), and the character set is the table that maps characters to their inner codes.
Because different countries use different languages, character sets were developed around native languages. ASCII, the most widely used encoding, was defined by the American National Standards Institute (ANSI) and covers English letters and other basic Latin characters. Mainland China uses character sets such as GB2312, GBK, and GB18030, which contain the inner codes of Chinese characters; GBK and GB18030 are known as large character sets and also encode traditional Chinese characters. Hong Kong, Taiwan, and Macao use Big5, which encodes traditional Chinese characters (some traditional and simplified characters differ) and does not encode simplified Chinese characters.
This passage is quoted from:
http://www.cnblogs.com/zhwl/p/3745257.html)
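To make the idea of a mapping table concrete, here is a minimal Python sketch. It uses Python's own codec names (gbk, gb18030, big5, ascii) as stand-ins for the character sets above; these are not Oracle character set names, and the helper functions inner_code and can_encode are made up for this illustration.

```python
# A character set maps each character to a byte code (its "inner code").
# The same character can have different codes in different character sets.

def inner_code(ch: str, charset: str) -> str:
    """Return the bytes that `charset` assigns to `ch`, as a hex string."""
    return ch.encode(charset).hex()


def can_encode(ch: str, charset: str) -> bool:
    """Return True if `charset` has a code for `ch`."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # The character 中 in three Chinese character sets.
    for charset in ("gbk", "gb18030", "big5"):
        print(charset, inner_code("中", charset))
    # ASCII has a code for "A" but none for 中.
    print(can_encode("A", "ascii"), can_encode("中", "ascii"))
```

The same can_encode check can be used to see whether a given simplified character has a code in Big5.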
UTF-8: variable length. English (ASCII) characters are represented with a single byte, while Chinese characters are represented with multiple bytes (usually three).
UTF-16: uses two bytes for most common characters, including English letters and most Chinese characters (characters outside the Basic Multilingual Plane take four bytes). English text therefore takes more space than in UTF-8, but the mostly fixed width can make it faster to read.
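The size difference is easy to check. The sketch below counts encoded bytes in Python; the function name encoded_length is made up for this example, and utf-16-le is used so that no byte order mark is added to the count.

```python
def encoded_length(text: str, encoding: str) -> int:
    """Number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


if __name__ == "__main__":
    # An English letter, a Chinese character, and a character
    # outside the Basic Multilingual Plane.
    for sample in ("A", "中", "\U0001F600"):
        print(
            repr(sample),
            "utf-8:", encoded_length(sample, "utf-8"),
            "utf-16:", encoded_length(sample, "utf-16-le"),
        )
```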
The database character set in the figure above is the character set of the database itself (note that this is the server-side setting). The client character set comes from the user's environment variables. If it is left at the default, characters inserted into the database may come back garbled when queried (the reason is described below). The option chosen here is the GBK character set, which contains the Chinese characters.
Character set: controls the character set used by the client application. The last option, default territory, is easy to understand: it selects the region, which affects currency and number formats as well as the conventions for calculating weeks and dates. And what does default language do? It is the language used for Oracle messages, and it also affects how month and day names are displayed in dates.
The server-side settings are reflected in three values of the view v$nls_parameters: NLS_LANGUAGE, NLS_TERRITORY, and NLS_CHARACTERSET, where the value of NLS_CHARACTERSET is the actual database character set. For example, the query SQL> select * from v$nls_parameters; returns results such as the following:
PARAMETER            VALUE
NLS_LANGUAGE         SIMPLIFIED CHINESE
NLS_TERRITORY
NLS_CHARACTERSET     ZHS16GBK
That is, the character set used by the current database is ZHS16GBK.
PS: According to Oracle's official documentation, the character set of a database cannot be changed once the database has been created. The server-side character set is fixed when the database is created. However, the following methods can be used to change it:
Method One: Rebuild the database, specifying the desired character set when you create it.
Method Two: Modify the sys.props$ table. Log in to Oracle as the SYS user, then run and commit the following statements:
SQL> update props$ set value$='ZHS16GBK' where name='NLS_CHARACTERSET';
SQL> commit;
A character set changed this way only applies to data written after the change; the data already in the database is still stored in the original character set. This is a very risky hack, because it changes one of Oracle's base tables.
The current setting can be queried with a SQL statement, for example:
SQL> select userenv('language') from dual;
Query the client character set: $ echo $NLS_LANG or $ env | grep NLS_LANG
To change it:
1. Add export NLS_LANG=AMERICAN_AMERICA.ZHS16GBK to the .bash_profile file (a permanent, instance-level setting).
2. SQL> alter session set nls_language='AMERICAN' nls_territory='AMERICA';
or, as two separate statements:
SQL> alter session set nls_language='AMERICAN';
SQL> alter session set nls_territory='AMERICA'; (session level)
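The NLS_LANG value has the form LANGUAGE_TERRITORY.CHARSET, and the string returned by userenv('language') uses the same layout. As a rough sketch, the three parts can be split apart in Python as below. The names NlsLang and parse_nls_lang are made up for this example and are not part of any Oracle tool; the parser assumes missing parts should simply come back as empty strings.

```python
import os
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split LANGUAGE_TERRITORY.CHARSET; missing parts become ''."""
    # Everything after the first "." is the character set.
    locale_part, _, charset = value.partition(".")
    # The first "_" separates language from territory; language
    # names such as "SIMPLIFIED CHINESE" contain spaces, not "_".
    language, _, territory = locale_part.partition("_")
    return NlsLang(language.strip(), territory.strip(), charset.strip())


if __name__ == "__main__":
    # Falls back to the value used in this post if NLS_LANG is unset.
    current = os.environ.get("NLS_LANG", "AMERICAN_AMERICA.ZHS16GBK")
    print(parse_nls_lang(current))
```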
Operating system Character Set:
View: echo $LANG or env | grep LANG
GDM: the GNOME Display Manager
To modify:
1. Set the variables directly, using either of the following two commands:
[root@david ~]# LANG=xxx or export LANG=xxx
[root@david ~]# LC_ALL="xxx" or export LC_ALL="xxx"
2. Modify the configuration file /etc/sysconfig/i18n:
[root@david ~]# vim /etc/sysconfig/i18n
LANG="zh_CN.GB18030" (the system language)
SUPPORTED="zh_CN.UTF-8:zh_CN.GB18030:zh_CN:zh:en_US.UTF-8:en_US:en"
SYSFONT="lat0-sun16"
After saving the file and exiting, run the following command for the change to take effect:
[test@pan ~]$ source /etc/sysconfig/i18n
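Both LANG and LC_ALL appear above because they interact: in the usual POSIX precedence, LC_ALL overrides the individual LC_* variables, which in turn override LANG. The sketch below works out which locale string wins for character handling and extracts its character set part. The function names effective_locale and codeset are made up for this example, and falling back to "C" when nothing is set is an assumption about typical systems.

```python
import os


def effective_locale(env) -> str:
    """Pick the locale used for character handling.

    LC_ALL wins over LC_CTYPE, which wins over LANG.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # assumed default when nothing is set


def codeset(locale_name: str) -> str:
    """Return the part after '.', e.g. 'GB18030' from 'zh_CN.GB18030'."""
    _, _, rest = locale_name.partition(".")
    # Drop an optional modifier such as '@euro'.
    return rest.partition("@")[0]


if __name__ == "__main__":
    name = effective_locale(os.environ)
    print(name, codeset(name))
```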
Questions:
What happens if the operating system character set is inconsistent with the client character set?
SQL> select userenv('language') from dual; returns AMERICAN_AMERICA.ZHS16GBK. Is the AMERICAN_AMERICA part at the session level (that is, consistent with the current session), or is it related only to the server side?
Does adding export NLS_LANG=AMERICAN_AMERICA.ZHS16GBK to the .bash_profile file change the client character set at the instance level?
The real cause of garbled text:
NLS stands for National Language Support. NLS is a very powerful database feature that controls many aspects of the data, such as how it is stored. In general it governs the following two areas:
1. How text data is encoded when it is persisted on disk.
2. Transparent conversion of data from one character set to another.
Suppose you store 8-bit data in the database using the WE8ISO8859P1 character set, but some of your clients use a 7-bit character set such as US7ASCII. The character set conversion usually modifies the data, because you are mapping a larger character set (the 8-bit one in this case) onto a smaller one (the 7-bit one in this case). This is a lossy conversion: characters are changed simply because the smaller character set cannot represent every character in the larger one. But the conversion has to happen, and this is the cause of garbled text. Likewise, if the database stores data in a single-byte character set but the client (such as a Java application, since the Java language uses Unicode) expects a multibyte representation, a conversion must be performed so that the client application can use the data. (http://www.cnblogs.com/kerrycode/p/3749085.html)
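The lossy conversion described above can be reproduced outside the database. In the Python sketch below, the codecs latin-1 and ascii roughly stand in for WE8ISO8859P1 and US7ASCII. The functions convert and misread are made up for this illustration, and replacing unmappable characters with "?" is Python's behavior with errors="replace"; Oracle's own replacement rules may differ.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Decode from `source`, re-encode in `target`.

    Characters the target cannot represent are replaced with '?'.
    """
    return data.decode(source).encode(target, errors="replace")


def misread(text: str, actual: str, assumed: str) -> str:
    """Encode with one character set and decode with another."""
    return text.encode(actual).decode(assumed, errors="replace")


if __name__ == "__main__":
    stored = "café".encode("latin-1")  # 8-bit data
    # 8-bit to 7-bit: the accented letter cannot survive.
    print(convert(stored, "latin-1", "ascii"))
    # 8-bit to Unicode: nothing is lost.
    print(convert(stored, "latin-1", "utf-8").decode("utf-8"))
    # GBK bytes read as if they were Latin-1: garbled text.
    print(misread("中文", "gbk", "latin-1"))
```

The last line shows the second way text gets garbled: no character is lost, but the bytes are interpreted with the wrong character set.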
References:
http://blog.itpub.net/29519108/viewspace-1298298/ (mostly English-language material)
http://www.cnblogs.com/kerrycode/p/3749085.html (description of the basic concepts)
http://www.cnblogs.com/zhwl/p/3745257.html (Oracle database text storage, a very detailed Chinese-language article)
http://blog.csdn.net/jovitang/article/details/5174062 (Summary of a technical person)