My Java Diary: char

Source: Internet
Author: User
  • Every char variable in Java is 16 bits wide and holds one code unit (CU) of the UTF-16 encoding. Java's char type follows the UTF-16 specification exactly. For a detailed explanation of the encoding, see "Thoroughly Understand Character Encoding".
    A CU may correspond to a complete code point in the Unicode table (CP for short; a CP identifies one real Unicode character), or it may be only one of the two CUs (a surrogate pair) that together encode a single CP. Therefore, a char variable in Java does not necessarily represent one character.
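The difference between a CU and a CP can be checked with a small sketch. The class and method names below are hypothetical, chosen only for illustration; the supplementary character U+1D56B is built with Character.toChars so that the example does not depend on any particular escape sequence.

```java
class CharVersusCodePoint {
    // Number of UTF-16 code units (chars) in the text.
    static int codeUnits(String text) {
        return text.length();
    }

    // Number of Unicode code points in the text.
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String basic = "A";
        // U+1D56B lies above 0xFFFF, so it needs two chars.
        String supplementary = new String(Character.toChars(0x1D56B));

        System.out.println(codeUnits(basic) + " " + codePoints(basic));                 // 1 1
        System.out.println(codeUnits(supplementary) + " " + codePoints(supplementary)); // 2 1

        // Each char of the pair is only half of the character.
        System.out.println(Character.isHighSurrogate(supplementary.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(supplementary.charAt(1)));  // true
    }
}
```

For a character in the Basic Multilingual Plane both counts are equal; for a supplementary character a single char is only one surrogate of the pair.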
    When writing a char constant in Java, you can use the \u escape to specify a CU. The escape must be followed by exactly four hexadecimal digits, for example '\u0012', so that the constant is always 16 bits wide. If you write '\u12', the Java compiler reports an error. A supplementary character in the Unicode table (one whose CP value is greater than 0xFFFF), for example U+12345, cannot be written as '\u12345'. It must be encoded as a surrogate pair according to the UTF-16 specification, and because the pair occupies two CUs it has to be written as a String rather than as a single char constant:
    "\ud808\udf45"
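The surrogate pair for U+12345 can be derived by hand and compared with what the JDK produces. The following is a minimal sketch of the UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves); the class name and the toSurrogates helper are hypothetical, while Character.toChars and Character.isSupplementaryCodePoint are standard library methods.

```java
import java.util.Arrays;

class SurrogatePairDemo {
    // Splits a supplementary code point into its high and low surrogate.
    static char[] toSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // 20 significant bits remain
        char high = (char) (0xD800 + (offset >> 10));   // upper 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // lower 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        char[] units = toSurrogates(0x12345);
        System.out.println(String.format("%04x %04x", (int) units[0], (int) units[1])); // d808 df45

        // The JDK performs the same conversion.
        System.out.println(Arrays.equals(units, Character.toChars(0x12345))); // true

        // The String literal form holds exactly this one code point.
        String literal = "\ud808\udf45";
        System.out.println(literal.codePointAt(0) == 0x12345); // true
    }
}
```

In everyday code Character.toChars or new StringBuilder().appendCodePoint(...) is usually preferable to the manual arithmetic, which is shown here only to make the encoding step visible.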
    The length() method of a String object returns the number of CUs, while the codePointCount() method returns the number of CPs; the two are not necessarily equal. Similarly, the index parameter of charAt() is a CU index. The index parameter of codePointAt() is also a CU index, so to read the n-th CP you must first convert the CP position into a CU index with the String object's offsetByCodePoints() method. offsetByCodePoints(int index, int codePointOffset) starts at the CU at position index, moves forward by codePointOffset CPs, and returns the index of the first CU of the CP it reaches.
    Example: String sentence = "\ud835\udd6b\ud836\udd6cqq";

    • sentence.offsetByCodePoints(1, 0) refers to the 0th CP counted from CU 1 (\udd6b). Because counting starts at \udd6b, which is not a high surrogate (it does not start with D8), \udd6b is treated as an independent CP, so the return value is 1;
    • sentence.offsetByCodePoints(1, 1) refers to the 1st CP after CU 1 (\udd6b), that is, the CP "\ud836\udd6c". This CP is made up of two CUs, and the return value is the index of its first CU, \ud836, which is 2;
    • sentence.offsetByCodePoints(2, 0) refers to the 0th CP counted from CU 2 (\ud836). Because \ud836 is a high surrogate (it starts with D8), the CP is still "\ud836\udd6c", so the return value is 2, the same as sentence.offsetByCodePoints(1, 1);
    • sentence.offsetByCodePoints(1, 2) refers to the 2nd CP after CU 1 (\udd6b), which is the CP 'q'. This CP occupies only one CU, and the return value is 4.
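The four calls above, together with length() and codePointCount(), can be run against the same sentence as a check. This is a small sketch: the class name and the nthCodePoint helper are hypothetical, and the helper assumes a 0-based CP position that it converts into a CU index before calling codePointAt().

```java
class OffsetByCodePointsDemo {
    // CU layout: 0-1 first surrogate pair, 2-3 second surrogate pair, 4 and 5 the two letters.
    static final String SENTENCE = "\ud835\udd6b\ud836\udd6cqq";

    // Returns the n-th code point (0-based) by first locating its CU index.
    static int nthCodePoint(String text, int n) {
        return text.codePointAt(text.offsetByCodePoints(0, n));
    }

    public static void main(String[] args) {
        System.out.println(SENTENCE.length());                             // 6
        System.out.println(SENTENCE.codePointCount(0, SENTENCE.length())); // 4

        System.out.println(SENTENCE.offsetByCodePoints(1, 0)); // 1
        System.out.println(SENTENCE.offsetByCodePoints(1, 1)); // 2
        System.out.println(SENTENCE.offsetByCodePoints(2, 0)); // 2
        System.out.println(SENTENCE.offsetByCodePoints(1, 2)); // 4

        System.out.println(Integer.toHexString(nthCodePoint(SENTENCE, 1))); // 1d96c
        System.out.println((char) nthCodePoint(SENTENCE, 2));               // q
    }
}
```

Starting the count at index 1 splits the first surrogate pair, which is why the lone low surrogate is counted as a CP of its own. Starting at index 0, as nthCodePoint does, avoids that situation.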