Core Java (8): Code Points and Code Units in Java

Source: Internet
Author: User

I recently read about this topic in Core Java, but I never really understood it. After reading many posts and blogs, it finally made sense.

In Java, char[], String, StringBuilder and StringBuffer all use UTF-16. A single char covers U+0000 to U+FFFF, the Basic Multilingual Plane (BMP). Within that range, the values U+D800 to U+DBFF (high surrogates) and U+DC00 to U+DFFF (low surrogates) are reserved as surrogates and do not stand for characters on their own. Most common Unicode characters fit in a single code unit, while supplementary characters (U+10000 and above) need a pair of code units. In other words, a BMP character is represented by one char, and a supplementary character is represented by a pair of char values.
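As a rough illustration of how a supplementary character becomes a pair of code units, the sketch below splits a code point into its high and low surrogate by hand and compares the result with Character.toChars. The class name SurrogatePairs and the method encode are made up for this example; only the Character and Arrays calls are standard library.

```java
import java.util.Arrays;

class SurrogatePairs {
    // Hypothetical helper: split a supplementary code point into its UTF-16 surrogate pair.
    static char[] encode(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits go into the high surrogate
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits go into the low surrogate
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        char[] manual = encode(0x10400);
        char[] library = Character.toChars(0x10400);
        System.out.printf("%04x %04x%n", (int) manual[0], (int) manual[1]); // d801 dc00
        System.out.println(Arrays.equals(manual, library));                 // true
    }
}
```

In real code there is no need to do this arithmetic yourself; Character.toChars, Character.highSurrogate and Character.lowSurrogate already cover it.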

Java uses a code point, stored as an int, for a character value in the range U+0000 to U+10FFFF, and a code unit, stored as a char, for one 16-bit unit of the UTF-16 encoding. So a single character can have a code point count of 1 and a code unit count of 2. Therefore, the number of code units in a string is not necessarily the number of characters.
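To make the int versus char distinction concrete, here is a small sketch that asks how many code units a given code point needs. The class name CodeUnitsPerCodePoint and the method unitsNeeded are hypothetical; Character.charCount is the standard call doing the work.

```java
class CodeUnitsPerCodePoint {
    // Hypothetical wrapper: number of char values needed to store one code point.
    static int unitsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int latinA = 0x41;      // 'A', a BMP character
        int cjk = 0x4E2D;       // a CJK ideograph, still inside the BMP
        int deseret = 0x10400;  // a supplementary character

        System.out.println(unitsNeeded(latinA));  // 1
        System.out.println(unitsNeeded(cjk));     // 1
        System.out.println(unitsNeeded(deseret)); // 2
    }
}
```

Note that a code point has to be held in an int: the value 0x10400 does not fit in a 16-bit char.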

Compared with code points, code units are the lower-level concept: they are the raw 16-bit values that make up the UTF-16 form of the string.

Related methods:

The length() method returns the number of UTF-16 code units in the string.

The codePointCount(beginIndex, endIndex) method returns the number of Unicode code points in the given range of the string.
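The difference between the two methods only shows up when a string contains a supplementary character. The following sketch mixes BMP and supplementary characters in one string; the class name LengthVsCodePointCount and its two helper methods are invented for the example.

```java
class LengthVsCodePointCount {
    // Hypothetical helpers that wrap the two String methods described above.
    static int codeUnits(String s) {
        return s.length();
    }

    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "a", then U+10400 (two code units), then "b"
        String s = "a" + new String(Character.toChars(0x10400)) + "b";
        System.out.println(codeUnits(s));  // 4
        System.out.println(codePoints(s)); // 3
    }
}
```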

Test program:

package com.xujin;

public class Test {
    public static void main(String... args) {
        char[] ch = Character.toChars(0x10400);
        System.out.printf("U+10400 high surrogate: %04x%n", (int) ch[0]); // d801
        System.out.printf("U+10400 low surrogate: %04x%n", (int) ch[1]);  // dc00

        String str = new String(ch);
        System.out.println("Number of code units: " + str.length());                         // 2
        System.out.println("Number of code points: " + str.codePointCount(0, str.length())); // 1
        System.out.println(str.codePointAt(0)); // code point that starts at index 0: 66560
        System.out.println(str.charAt(1));      // code unit at index 1; a lone low surrogate is not a character, so it usually prints as ?

        // Print every code point in the string by walking it one code point at a time.
        str += "Hello,world!";
        int i = 0;
        while (i < str.length()) {
            int cp = str.codePointAt(i); // read inside the loop; reading it once before the loop skips characters
            System.out.println(cp);
            if (Character.isSupplementaryCodePoint(cp)) {
                i += 2; // a supplementary code point occupies two code units
            } else {
                i++;
            }
        }
        /*
         * Output of the loop, one value per line:
         * 66560 72 101 108 108 111 44 119 111 114 108 100 33
         */
    }
}
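The traversal in the test program can also be written without the explicit supplementary check. The sketch below shows two variants, assuming Java 8 or later for String.codePoints(); the class name CodePointIteration and the method names viaLoop and viaStream are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CodePointIteration {
    // Variant 1: index loop that advances by the width of each code point.
    static List<Integer> viaLoop(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(cp);
            i += Character.charCount(cp); // 1 for BMP, 2 for supplementary
        }
        return out;
    }

    // Variant 2: let the library do the walking (Java 8+).
    static int[] viaStream(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x10400)) + "Hi";
        System.out.println(viaLoop(s));                    // [66560, 72, 105]
        System.out.println(Arrays.toString(viaStream(s))); // [66560, 72, 105]
    }
}
```

Both variants treat a surrogate pair as one unit, so neither can land in the middle of a supplementary character.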

