About emoji and UTF-16 encoding

Source: Internet
Author: User

Yesterday, colleagues in the iOS group ran into a tricky question: how do you get the number of characters in a text box when the input contains emoji (with each emoji counting as one character)?

Start with Java, which I have been working with recently. For ordinary Chinese and English characters, the length method of String works as expected. But if a character's Unicode code point is greater than 0xFFFF, length does not return the correct number of characters: each such character is counted as 2, because length counts UTF-16 code units rather than code points. Java already has a ready-made solution for this problem: codePointCount.

Unfortunately, even after a long search I could not find a similar ready-made method in Objective-C. (It seems that if you split the string into substrings, the length of the resulting array is the exact number of characters.) I'm not an iOS programmer, so I can't provide an Objective-C solution for the time being.

Still, yesterday's digging turned up a few things worth sharing.

1. Most emoji have Unicode code points greater than 0xFFFF, so they occupy 4 bytes in UTF-16. Only a small portion of emoji have code points below 0xFFFF, and those occupy 2 bytes in UTF-16.
2. On both Android and iOS, the string read from a text box is held in memory as a sequence of UTF-16 code units by default. (Byte order, such as big-endian, only comes into play when those units are serialized to bytes.)
3. For reference, here are the UTF-16 encoding rules (once you understand them, counting code points on iOS is straightforward):
   1) If U < 0x10000, encode U as a 16-bit unsigned integer and
      terminate.

   2) Let U' = U - 0x10000. Because U is less than or equal to 0x10FFFF,
      U' must be less than or equal to 0xFFFFF. That is, U' can be
      represented in 20 bits.

   3) Initialize two 16-bit unsigned integers, W1 and W2, to 0xD800 and
      0xDC00, respectively. These integers each have 10 bits free to
      encode the character value, for a total of 20 bits.

   4) Assign the 10 high-order bits of the 20-bit U' to the 10 low-order
      bits of W1 and the 10 low-order bits of U' to the 10 low-order
      bits of W2. Terminate.

   Graphically, steps 2 through 4 look like:

   U' = yyyyyyyyyyxxxxxxxxxx
   W1 = 110110yyyyyyyyyy
   W2 = 110111xxxxxxxxxx
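
The four steps above can be sketched in Java as follows. This is only an illustration: the class name Utf16Encoder and the method encode are made up for this example, and in real code the standard Character.toChars does the same job.

```java
class Utf16Encoder {
    // Hypothetical helper: encode one Unicode code point into UTF-16
    // code units, following steps 1 to 4 of the rules above.
    // Note: this sketch does not reject the surrogate range 0xD800..0xDFFF.
    static char[] encode(int u) {
        if (u < 0 || u > 0x10FFFF) {
            throw new IllegalArgumentException("not a Unicode code point: " + u);
        }
        // Step 1: values below 0x10000 fit in a single 16-bit unit.
        if (u < 0x10000) {
            return new char[] { (char) u };
        }
        // Step 2: subtract 0x10000, leaving a value that fits in 20 bits.
        int uPrime = u - 0x10000;
        // Step 3: start from the two surrogate base values.
        int w1 = 0xD800;
        int w2 = 0xDC00;
        // Step 4: high 10 bits go into W1, low 10 bits go into W2.
        w1 |= uPrime >> 10;
        w2 |= uPrime & 0x3FF;
        return new char[] { (char) w1, (char) w2 };
    }

    public static void main(String[] args) {
        char[] units = encode(0x1F600); // U+1F600, a common smiley emoji
        // prints "U+1F600 -> D83D DE00"
        System.out.printf("U+1F600 -> %04X %04X%n", (int) units[0], (int) units[1]);
    }
}
```

A code point below 0x10000, such as 0x4E2D, comes back as a single unit, which is why ordinary Chinese and English text never triggers the counting problem.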

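Because W1 always falls in 0xD800..0xDBFF and W2 in 0xDC00..0xDFFF, a code point count can be recovered from raw UTF-16 code units by treating each W1/W2 pair as one character. Below is a minimal Java sketch of that idea, compared against String.length and String.codePointCount. The class name CodePointCounter and the method countCodePoints are hypothetical names for this example, and the same loop could in principle be ported to Objective-C, though that port is not shown or tested here.

```java
class CodePointCounter {
    // Hypothetical helper: count code points by walking the UTF-16 code
    // units and consuming a high surrogate (W1) together with the low
    // surrogate (W2) that follows it. Unpaired surrogates count as one.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            char unit = s.charAt(i);
            if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next >= 0xDC00 && next <= 0xDFFF) {
                    i++; // skip W2, it belongs to the same code point
                }
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Hi\uD83D\uDE00"; // "Hi" followed by U+1F600
        System.out.println(text.length());                         // prints 4
        System.out.println(text.codePointCount(0, text.length())); // prints 3
        System.out.println(countCodePoints(text));                 // prints 3
    }
}
```

Note that this counts code points, not what a user sees as one symbol: an emoji built from several code points (for example with a skin-tone modifier) would still be counted more than once.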