The "length" functions return the length of char. LENGTH calculates length using characters as defined by the input character set. LENGTHB uses bytes instead of characters. LENGTHC uses Unicode complete characters. LENGTH2 uses UCS2 code points. LENGTH4 uses UCS4 code points.
In other words, LENGTH returns the number of characters in the string, counted according to the character set in use, while LENGTHB counts bytes rather than characters.
VSIZE returns the number of bytes in the internal representation of expr.
In other words, VSIZE returns the number of bytes used internally. What exactly "the internal representation of expr" means is not clear to me; can anyone explain?
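The difference between counting UCS2 code units and UCS4 code points only shows up for characters outside the Basic Multilingual Plane. The following is a rough Java analogy, not Oracle code: String.length() counts UTF-16 code units, which is similar in spirit to LENGTH2, while codePointCount counts whole code points, which is similar in spirit to LENGTH4. The class and method names are made up for illustration.

```java
class LengthUnitsSketch {
    // Number of UTF-16 code units; a character outside the BMP counts as 2.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; every character counts as 1.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a', one Chinese character (U+5408), and U+1F600 written as a surrogate pair
        String s = "a\u5408\uD83D\uDE00";
        System.out.println("code units:  " + codeUnits(s));  // 4
        System.out.println("code points: " + codePoints(s)); // 3
    }
}
```

For ordinary Chinese text, which lies inside the Basic Multilingual Plane, both counts are the same.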
SQL example:
Select length ('adfad reasonable ') "bytesLengthIs" from dual -- 7
Select lengthb ('adfad') "bytesLengthIs" from dual -- 5
Select lengthb ('adfad reasonable ') "bytesLengthIs" from dual -- 11
Select vsize ('adfad reasonable ') "bytesLengthIs" from dual -- 11
Select lengthc ('adfad reasonable ') "bytesLengthIs" from dual -- 7
Conclusion: under the UTF-8 character set, for the test strings above:
LENGTHB = VSIZE
LENGTHC = LENGTH
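The gap between the character count (7) and the byte count (11) can be reproduced outside the database. The sketch below assumes the test string is five ASCII letters followed by two Chinese characters (合理 is used as a stand-in, written as Unicode escapes) and that the database character set is UTF-8. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharVersusByteCount {
    // For each character (code point) of s, the number of bytes it needs in cs.
    static int[] bytesPerCharacter(String s, Charset cs) {
        return s.codePoints()
                .map(cp -> new String(Character.toChars(cp)).getBytes(cs).length)
                .toArray();
    }

    public static void main(String[] args) {
        String s = "adfad\u5408\u7406"; // five ASCII letters plus two Chinese characters
        int[] sizes = bytesPerCharacter(s, StandardCharsets.UTF_8);
        System.out.println("characters: " + sizes.length);                // 7
        System.out.println("bytes:      " + Arrays.stream(sizes).sum());  // 11
        System.out.println("per char:   " + Arrays.toString(sizes));      // [1, 1, 1, 1, 1, 3, 3]
    }
}
```

The five ASCII letters contribute one byte each and the two Chinese characters three bytes each, which gives 5 + 6 = 11.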
Q: Why does a Chinese character occupy 3 bytes instead of 2? Is it because of the UTF-8 character set?
Does anyone know?
References: Oracle9i SQL Reference Release 2 (9.2)
..................................................
We tested this with the getBytes method of Java's String class.
The conclusion: in UTF-8 a Chinese character occupies 3 bytes, and in GBK it occupies 2 bytes. ISO-8859-1 does not support Chinese characters at all (it covers only Latin letters), so in the test each Chinese character was replaced by a single substitute byte, and the two Chinese characters together showed up as 2 bytes. None of this is specific to Oracle; Oracle is only responsible for storing the data.
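The 3-byte figure follows from the UTF-8 layout itself: code points from U+0800 to U+FFFF are written as three bytes of the form 1110xxxx 10xxxxxx 10xxxxxx, and the common Chinese characters (U+4E00 to U+9FFF) fall in that range. The hand-rolled encoder below is only an illustration of that rule under a hypothetical class name; it does not reject surrogate code points, and real code should simply call getBytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8LayoutSketch {
    // Encodes one code point following the UTF-8 bit layout (no surrogate checks).
    static byte[] encodeUtf8(int cp) {
        if (cp < 0x80) {          // 0xxxxxxx
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {         // 110xxxxx 10xxxxxx
            return new byte[] {(byte) (0xC0 | (cp >> 6)), (byte) (0x80 | (cp & 0x3F))};
        }
        if (cp < 0x10000) {       // 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        return new byte[] {       // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))};
    }

    public static void main(String[] args) {
        int cp = 0x5B9D; // a Chinese character in the U+4E00..U+9FFF block
        byte[] manual = encodeUtf8(cp);
        byte[] library = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes needed: " + manual.length);                        // 3
        System.out.println("matches getBytes: " + Arrays.equals(manual, library));   // true
    }
}
```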
You can run SELECT * FROM v$nls_parameters to check the Oracle character set (see the NLS_CHARACTERSET row).
Below is the test class:
import java.io.UnsupportedEncodingException;

public class TextEncoding
{
    /**
     * @author: sunflower
     * @date: 10:09:40 AM
     * @todo: calls the String's own getBytes(charsetName) method, which
     * encodes the String into a byte sequence using the named character set and stores the result in a new byte array.
     * @param content the string to encode
     * @param charsetName the name of the character set to use
     * @return the encoded bytes
     */
    public static byte[] getBytes(String content, String charsetName)
            throws UnsupportedEncodingException {
        return content.getBytes(charsetName);
    }

    /**
     * @author: sunflower
     * @date: 10:19:40 AM
     * @todo: calls the String's own getBytes() method, which
     * encodes the String into a byte sequence using the platform's default character set and stores the result in a new byte array.
     * @param content the string to encode
     * @return the encoded bytes
     */
    public static byte[] getBytes(String content) {
        return content.getBytes();
    }

    public static void main(String[] args) {
        // Two ASCII characters followed by two Chinese characters
        String content = "1e宝宝";
        byte[] len;
        try {
            len = getBytes(content, "UTF-8");
            System.out.println("The byte array length is " + len.length);
            len = getBytes(content, "GBK");
            System.out.println("The byte array length is " + len.length);
            len = getBytes(content, "ISO-8859-1");
            System.out.println("The byte array length is " + len.length);
        } catch (UnsupportedEncodingException e) {
            // Thrown only if the named character set is not available
            System.out.println("Can't recognize");
        }
    }
}
Output:
The byte array length is 8
The byte array length is 6
The byte array length is 4
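The last figure (4) does not mean ISO-8859-1 stores a Chinese character in one byte; the characters are lost. A small sketch, assuming the test string is two ASCII characters followed by two Chinese characters (written here as Unicode escapes) and using a hypothetical class name, shows that each unmappable character becomes a single '?' byte and cannot be recovered:

```java
import java.nio.charset.StandardCharsets;

class IsoFallbackSketch {
    // Encodes to ISO-8859-1 and decodes back; unmappable characters come back as '?'.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // True only if every character of s exists in ISO-8859-1.
    static boolean fitsInLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String content = "1e\u5B9D\u5B9D"; // "1e" plus two Chinese characters
        System.out.println("encodable: " + fitsInLatin1(content));  // false
        System.out.println("round trip: " + roundTrip(content));    // 1e??
    }
}
```

Checking canEncode before writing data is one way to catch this kind of silent loss early.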