Concept Description
An Oracle database has two character sets: the database character set and the national character set. Both are specified when the database is created. The national character set is used for data in NCHAR, NVARCHAR2, and NCLOB columns. The database character set is used for data in CHAR, VARCHAR2, CLOB, and LONG columns.
Oracle character set names generally follow the pattern <language or region><bits per character><standard character set name>. An optional S or C suffix marks a character set that can be used only on the server (S) or only on the client (C). The Oracle character sets UTF8 and UTFE do not follow this rule; the others share this basic form.
On the client side, the NLS_LANG setting has the form NLS_LANG=<language>_<territory>.<client character set>. For example, in a Windows command prompt:
set NLS_LANG=AMERICAN_AMERICA.UTF8
set NLS_LANG=SIMPLIFIED CHINESE_AMERICA.UTF8
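The NLS_LANG format can be illustrated with a small parser. This is a minimal sketch, not part of any Oracle API. `NlsLang` and `parse_nls_lang` are hypothetical names. It assumes that the client character set follows the last "." and that language and territory are separated by the first "_", as in the two examples. Missing parts are returned as empty strings.

```python
from typing import NamedTuple

class NlsLang(NamedTuple):
    language: str   # e.g. "AMERICAN", "SIMPLIFIED CHINESE"
    territory: str  # e.g. "AMERICA"
    charset: str    # client character set, e.g. "UTF8"

def parse_nls_lang(value: str) -> NlsLang:
    """Split <language>_<territory>.<client character set> (hypothetical helper)."""
    value = value.strip()
    # The client character set follows the last ".".
    if "." in value:
        head, charset = value.rsplit(".", 1)
    else:
        head, charset = value, ""
    # Language and territory are separated by the first "_";
    # a language name such as "SIMPLIFIED CHINESE" may contain a space.
    language, _, territory = head.partition("_")
    return NlsLang(language.strip().upper(), territory.strip().upper(), charset.strip().upper())

print(parse_nls_lang("AMERICAN_AMERICA.UTF8"))
print(parse_nls_lang("SIMPLIFIED CHINESE_AMERICA.UTF8"))
```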
NLS (National Language Support) is a powerful database feature that controls many aspects of how data is handled. For character data, it mainly governs two areas:
How text data is encoded when it is stored on disk
How data is transparently converted from one character set to another
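Both areas can be illustrated outside the database. The sketch below assumes that Python's built-in codecs are reasonable stand-ins for Oracle character sets: "gbk" for ZHS16GBK, "utf-8" for AL32UTF8, and "utf-16-be" for AL16UTF16. This is an analogy, not Oracle's conversion code. `stored_bytes` and `transcode` are hypothetical helpers.

The example makes two points. First, the same character is stored as different bytes depending on the character set. Second, converting between character sets means decoding to Unicode and re-encoding.

```python
# Assumption: Python codecs stand in for Oracle character sets:
#   "gbk" ~ ZHS16GBK, "utf-8" ~ AL32UTF8, "utf-16-be" ~ AL16UTF16.
TEXT = "中"  # U+4E2D

def stored_bytes(text: str, codec: str) -> bytes:
    """Area 1: the bytes written to disk depend on the character set."""
    return text.encode(codec)

def transcode(data: bytes, source: str, target: str) -> bytes:
    """Area 2: convert bytes from one character set to another
    by decoding to Unicode and re-encoding."""
    return data.decode(source).encode(target)

for codec in ("gbk", "utf-8", "utf-16-be"):
    print(codec, stored_bytes(TEXT, codec).hex())

gbk_data = stored_bytes(TEXT, "gbk")
print("gbk -> utf-8:", transcode(gbk_data, "gbk", "utf-8").hex())
```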
Suppose you store 8-bit data in the database using the WE8ISO8859P1 character set, but some of your clients use a 7-bit character set such as US7ASCII. Character set conversion then usually alters the data, because a larger character set (8-bit in this case) is being mapped onto a smaller one (7-bit in this example). This is a lossy conversion: characters are changed simply because the smaller character set cannot represent every character of the larger one. The conversion still has to happen, and it is a common source of garbled text. Conversely, suppose the database stores data in a single-byte character set, but the client expects a multibyte representation. A Java application is one example, because Java uses Unicode. In that case the data must be converted before the client application can use it.
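The lossy case can be reproduced with Python codecs. As an assumption, "latin-1" stands in for WE8ISO8859P1 and "ascii" for US7ASCII. Python's `errors="replace"` handler substitutes "?" for characters the target cannot represent. Oracle applies its own replacement characters, so treat this as an analogy. `convert_charset` is a hypothetical helper.

```python
# Assumption: Python codecs stand in for Oracle character sets:
#   "latin-1" ~ WE8ISO8859P1 (8-bit), "ascii" ~ US7ASCII (7-bit).

def convert_charset(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes in the source character set and re-encode them in the
    target one, substituting '?' for characters the target cannot hold."""
    return data.decode(source).encode(target, errors="replace")

stored = "café".encode("latin-1")               # 8-bit data in the database
to_7bit = convert_charset(stored, "latin-1", "ascii")
to_unicode = convert_charset(stored, "latin-1", "utf-8")

print(to_7bit)                                  # é has no 7-bit mapping
print(to_7bit.decode("ascii") == "café")        # the loss cannot be undone
print(to_unicode.decode("utf-8") == "café")     # a Unicode client loses nothing
```

Converting toward a Unicode target, as in the Java case, keeps every character, because Unicode can represent everything in WE8ISO8859P1.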
Oracle supports several Unicode character sets. The following list gives each character set's name, the database versions that support it, and the Unicode version it implements.
AL24UTFFSS: Oracle's first Unicode character set, introduced in version 7.2. It implements Unicode 1.1 and is no longer supported as of Oracle9i.
UTF8: a UTF-8 character set available since Oracle8. From Oracle 8.0 through 8.1.6 it implements Unicode 2.1. From Oracle 8.1.7 through 10g it implements Unicode 3.0.
UTFE: the Unicode database character set for EBCDIC platforms. It is therefore a special-purpose character set; its other properties are essentially the same as UTF8.
AL32UTF8: a UTF-8 character set available since Oracle9i. It tracks newer versions of the Unicode standard than UTF8. In 10g, AL32UTF8 implements Unicode 4.0.1, whereas UTF8, kept unchanged for compatibility reasons, still implements Unicode 3.0.
AL16UTF16: Oracle's first UTF-16 character set, introduced in Oracle9i, where it is the default national character set. It cannot be used as the database character set. The database character set also determines the encoding of SQL and PL/SQL source code, and UTF-16 uses two bytes for every character, even plain English letters, so it is unsuitable for that role. Every database character set Oracle currently supports is an encoding that contains ASCII or EBCDIC as a subset.
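The ASCII-subset requirement is easy to see by encoding a short SQL statement. As an assumption, Python's "utf-8" codec stands in for AL32UTF8 and "utf-16-be" for AL16UTF16. `is_ascii_compatible` is a hypothetical helper.

```python
# Assumption: Python codecs stand in for Oracle character sets:
#   "utf-8" ~ AL32UTF8, "utf-16-be" ~ AL16UTF16.
SQL = "SELECT 1 FROM dual"

def is_ascii_compatible(codec: str, text: str) -> bool:
    """True if the codec encodes this ASCII text byte-for-byte like ASCII."""
    return text.encode(codec) == text.encode("ascii")

for codec in ("utf-8", "utf-16-be"):
    encoded = SQL.encode(codec)
    print(codec, len(encoded), encoded[:6], is_ascii_compatible(codec, SQL))
```

UTF-8 produces exactly the ASCII bytes of the statement. UTF-16 doubles its length and interleaves zero bytes, which is why it is confined to the national character set.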
For example, in US7ASCII, US is the region, 7 is the number of bits per character, and ASCII is the standard character set name.
In the Chinese character set ZHS16GBK, ZHS denotes Simplified Chinese (ZHT denotes Traditional Chinese), 16 is the number of bits per character, and GBK is the standard character set name. Likewise, ZHS16CGB231280 denotes Simplified Chinese with 16 bits per character and the standard character set name CGB231280. This name corresponds to the GB2312-80 standard of 1981 mentioned earlier. GBK is generally described as an extension of GB2312, but the database character sets ZHS16GBK and ZHS16CGB231280 are not strictly in a superset/subset relationship. Some Chinese characters have different code values in the two character sets, which causes problems when converting between them.
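The naming convention can be checked against these examples with a small parser. This is a minimal sketch. It assumes that a name consists of uppercase letters (language or region), then the first run of digits (bits per character), then the remainder (standard name). `CharsetName` and `parse_charset_name` are hypothetical names. The exception set contains only the two names this section lists, and the optional S/C suffix is left inside the standard-name field.

```python
import re
from typing import NamedTuple, Optional

class CharsetName(NamedTuple):
    language: str   # language or region, e.g. "US", "ZHS", "WE"
    bits: int       # bits per character, e.g. 7, 8, 16
    standard: str   # standard character set name, e.g. "ASCII", "GBK"

# Names that do not follow the convention, as stated in this section.
EXCEPTIONS = {"UTF8", "UTFE"}

# Shortest run of letters, then the first run of digits, then the rest.
_PATTERN = re.compile(r"^([A-Z]+?)(\d+)([A-Z0-9]+)$")

def parse_charset_name(name: str) -> Optional[CharsetName]:
    """Split an Oracle character set name (hypothetical helper); None if it doesn't fit."""
    name = name.upper()
    if name in EXCEPTIONS:
        return None
    match = _PATTERN.match(name)
    if match is None:
        return None
    return CharsetName(match.group(1), int(match.group(2)), match.group(3))

for n in ("US7ASCII", "ZHS16GBK", "ZHS16CGB231280", "WE8ISO8859P1", "UTF8"):
    print(n, parse_charset_name(n))
```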
Viewing Character Set Parameters
1: The parameters to check are NLS_CHARACTERSET (the database character set) and NLS_NCHAR_CHARACTERSET (the national character set).
Instance Character Set Environment
SELECT * FROM NLS_INSTANCE_PARAMETERS;
The main values of interest here are NLS_LANGUAGE and NLS_TERRITORY. NLS_INSTANCE_PARAMETERS is derived from V$PARAMETER. Note: many online sources claim that "NLS_INSTANCE_PARAMETERS represents the client's character set settings, which can come from the parameter file, environment variables, or the registry," and others simply repeat the claim. Remember that this view shows the NLS environment of the instance, not of the client.
Valid NLS Parameter Values
SELECT * FROM V$NLS_VALID_VALUES;
Database Server Character Set
SELECT * FROM NLS_DATABASE_PARAMETERS;
NLS_DATABASE_PARAMETERS is derived from SYS.PROPS$ and shows the character set settings of the database itself.
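To read the two character sets programmatically, one could filter the (PARAMETER, VALUE) rows of NLS_DATABASE_PARAMETERS. This is a minimal sketch in Python. `character_sets` is a hypothetical helper. The python-oracledb calls in the comment show only one assumed way of fetching the rows; they are not executed here, and the connection details are placeholders.

```python
from typing import Iterable, Optional, Tuple

def character_sets(rows: Iterable[Tuple[str, str]]) -> Tuple[Optional[str], Optional[str]]:
    """Return (database character set, national character set) from
    (PARAMETER, VALUE) rows of NLS_DATABASE_PARAMETERS; None if absent."""
    params = {name.upper(): value for name, value in rows}
    return params.get("NLS_CHARACTERSET"), params.get("NLS_NCHAR_CHARACTERSET")

# Assumed usage with python-oracledb (not run here; placeholders for credentials):
#   import oracledb
#   with oracledb.connect(user=..., password=..., dsn=...) as conn:
#       cur = conn.cursor()
#       cur.execute("SELECT parameter, value FROM nls_database_parameters")
#       db_charset, nchar_charset = character_sets(cur.fetchall())
```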