Tips for using Oracle Database VARCHAR2
Maximum VARCHAR2 length in Oracle and encoding problems
VARCHAR2 type:
Suppose an Oracle field is declared as varchar2 (4000):
1. When the database character set is GB2312, a Chinese character occupies two bytes, so the field can hold up to 2000 Chinese characters.
2. When the database character set is UTF-8, a Chinese character occupies three bytes, so the field can hold only 4000/3, that is about 1333, Chinese characters.
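The capacity figures above can be checked with a small sketch. It assumes that Python's "gbk" and "utf-8" codecs are a reasonable stand-in for the Oracle character sets discussed here; max_chars is a hypothetical helper name, not an Oracle function.

```python
# Hypothetical helper: how many copies of one sample character
# fit into a column with a given byte limit.
def max_chars(limit_bytes: int, encoding: str, sample: str = "汉") -> int:
    bytes_per_char = len(sample.encode(encoding))  # 2 in GBK, 3 in UTF-8
    return limit_bytes // bytes_per_char

for encoding in ("gbk", "utf-8"):
    print(encoding, max_chars(4000, encoding))
```

The integer division matters: in UTF-8 the last byte of the 4000 cannot hold a partial character, so the usable capacity is 1333 characters (3999 bytes).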
The Oracle database uses SIMPLIFIED CHINESE_CHINA.ZHS16GBK while the Java backend uses UTF-8 throughout, yet Chinese text can still be inserted. The reason is character set conversion:
When project data is stored, the incoming UTF-8 bytes are first decoded into characters, and those characters are then re-encoded in GBK and written to the database.
When data is read, the GBK bytes stored in the database are first decoded into characters, the characters are re-encoded in UTF-8, and the result is finally returned to the client.
GBK corresponds to SIMPLIFIED CHINESE_CHINA.ZHS16GBK
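The round trip described above can be imitated in a few lines. This is only a sketch of the idea, not of Oracle's actual conversion code: to_storage and to_client are hypothetical names, and Python's "gbk" codec is assumed to approximate ZHS16GBK.

```python
def to_storage(client_bytes: bytes) -> bytes:
    """Write path: UTF-8 bytes from the application -> GBK bytes for storage."""
    return client_bytes.decode("utf-8").encode("gbk")

def to_client(stored_bytes: bytes) -> bytes:
    """Read path: GBK bytes from storage -> UTF-8 bytes for the application."""
    return stored_bytes.decode("gbk").encode("utf-8")

original = "数据库".encode("utf-8")   # what the UTF-8 backend sends
stored = to_storage(original)         # what ends up on disk
returned = to_client(stored)          # what the backend gets back

print(len(original), len(stored), returned == original)
```

The same three characters take 9 bytes on the client side and 6 bytes in storage, which is why the column length has to be judged against the database character set and not against the application's encoding.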
Is the Oracle VARCHAR2 length measured in bytes or characters?
Inserting seven Chinese characters into a VARCHAR2 (20) field of an Oracle table fails with an error message saying that the inserted value is too large.
Six Chinese characters can be inserted.
A search showed that this is related to the Oracle character set setting. (Previous projects had all been set to ZHS16GBK.)
With the character set ZHS16GBK, a Chinese character occupies two bytes.
With the character set AL32UTF8, a Chinese character occupies three bytes, so six characters need 18 bytes and fit into 20, while seven need 21 bytes and do not.
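The observation can be used to work out which character set the database is using. The sketch below assumes a 20-byte limit and uses Python's "gbk" and "utf-8" codecs in place of ZHS16GBK and AL32UTF8; fits is a hypothetical helper.

```python
COLUMN_BYTES = 20  # VARCHAR2(20) with byte semantics

def fits(text: str, encoding: str, limit: int = COLUMN_BYTES) -> bool:
    """True if the encoded text is no longer than the byte limit."""
    return len(text.encode(encoding)) <= limit

six, seven = "汉" * 6, "汉" * 7
for encoding in ("gbk", "utf-8"):
    print(encoding, fits(six, encoding), fits(seven, encoding))
```

Under GBK both strings fit (12 and 14 bytes), so only the three-byte encoding matches the behaviour that six characters succeed and seven fail.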
Query the database character set:
select * from v$nls_parameters t where t.PARAMETER='NLS_CHARACTERSET';
VARCHAR2 traps in Oracle databases
Everyone is familiar with Oracle databases, and one of the most common data types in them is the string type.
There are three common string types in Oracle: varchar2 (byte), varchar2 (char), and nvarchar2 ().
So what are the differences between the three types?
First, we need to remember: whether it is varchar2 or nvarchar2, the maximum size is 4000 bytes. (This is the standard limit; Oracle 12c and later can raise it to 32767 bytes by setting MAX_STRING_SIZE = EXTENDED.)
Varchar2 (byte): the default form (with the default setting NLS_LENGTH_SEMANTICS = BYTE). Writing varchar2 (100) is equivalent to writing varchar2 (100 byte): the field can hold at most 100 bytes, and the emphasis is on storage size. Because the length is given in bytes, we must be careful when saving Chinese and other multi-byte characters. If the database uses GBK encoding, a Chinese character occupies 2 bytes and the field can store up to 50 Chinese characters. If the database uses UTF8 encoding, a Chinese character occupies 3 bytes and the field can store up to 33 Chinese characters.
Varchar2 (char): writing varchar2 (100 char) means the field can hold at most 100 characters, and the emphasis is on the count. Every digit, letter, or Chinese character counts as one character, and at most 100 of them can be written. Of course, the more Chinese characters there are, the more space is used, and the database encoding rules above still apply: a Chinese character occupies 2 or 3 bytes underneath, and a letter occupies 1 byte. It definitely does not mean, as some articles claim, that a letter or digit also occupies 2 or 3 bytes!
Nvarchar2 (): takes no byte or char qualifier. It is similar to varchar2 (char) in that the length is given in characters, but nvarchar2 () is independent of the database encoding: it uses the national character set, and with the default AL16UTF16 each Chinese character occupies two bytes whatever the database encoding is.
That is what the usual tutorials cover, but if you think a little further you will find a fatal problem.
In practice, a declaration such as varchar2 (1400 char) is quite likely to appear. We take it to mean that the field can hold at most 1400 characters, so storing 1399 characters seems to be fine.
However, if these 1399 characters are all Chinese characters, the length does not exceed 1400 and everything looks normal, yet we actually lose part of the data. Why?
Because 1399 Chinese characters encoded in UTF8 (the encoding most projects use) take 1399*3 = 4197 bytes. As stated at the beginning of this article, no matter what the char length is, the maximum is 4000 bytes and not one byte more. Therefore the extra 197 bytes are dropped, and in the case described here no error is reported anywhere in the process: your data simply evaporates!
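Therefore, for GBK-encoded databases, the safe declarations are varchar2 (2000 char) and nvarchar2 (2000). For UTF8-encoded databases, the safe declarations are varchar2 (1333 char) and nvarchar2 (2000). The figure 1333 assumes ordinary Chinese characters at three bytes each; rarer characters that need more bytes in UTF-8 reduce it further.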
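One way to avoid the trap is to check the encoded size in the application before writing. The sketch below is an assumption-laden illustration, not Oracle behaviour: check_varchar2_char and MAX_VARCHAR2_BYTES are hypothetical names, the 4000-byte cap is the standard limit discussed in this article, and Python's "utf-8" codec stands in for the database encoding.

```python
MAX_VARCHAR2_BYTES = 4000  # standard limit assumed throughout the article

def check_varchar2_char(text: str, declared_chars: int,
                        encoding: str = "utf-8") -> int:
    """Return the encoded size in bytes, or raise ValueError if the text
    breaks either the declared CHAR length or the byte cap."""
    if len(text) > declared_chars:
        raise ValueError(f"{len(text)} characters exceed {declared_chars} CHAR")
    size = len(text.encode(encoding))
    if size > MAX_VARCHAR2_BYTES:
        raise ValueError(f"{size} bytes exceed the {MAX_VARCHAR2_BYTES}-byte cap")
    return size

risky = "汉" * 1399
encoded = risky.encode("utf-8")
print(len(encoded), len(encoded) - MAX_VARCHAR2_BYTES)  # total bytes, overflow

# Characters left if only the first 4000 bytes were kept
survivors = encoded[:MAX_VARCHAR2_BYTES].decode("utf-8", errors="ignore")
print(len(survivors))

print(check_varchar2_char("汉" * 1333, 1333))  # the safe declaration fits
```

Calling check_varchar2_char(risky, 1400) raises ValueError even though 1399 is below the declared 1400 characters, which is exactly the case that slips through when only the character count is checked.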