How UTF-8 Encoded Strings Are Stored in Java

Source: Internet
Author: User



Knowledge Points:
Calling byte[] bytes = "xxxx".getBytes("UTF-8") returns the string encoded as a UTF-8 byte array. In UTF-8, a character in the ASCII range is stored in 1 byte, and a common Chinese character is stored in 3 bytes.
UTF-8 is a variable-length encoding. If a character is encoded in a single byte, the highest bit of that byte is 0. If it is encoded in several bytes, the number of consecutive 1 bits at the top of the first byte gives the total number of bytes in the sequence, and every remaining byte begins with 10. The original UTF-8 design allowed sequences of up to 6 bytes; the current standard (RFC 3629) limits them to 4 bytes, so the 5-byte and 6-byte forms listed below are historical.
The byte layouts are:
1 byte 0xxxxxxx
2 bytes 110xxxxx 10xxxxxx
3 bytes 1110xxxx 10xxxxxx 10xxxxxx
4 bytes 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
5 bytes 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
6 bytes 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
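
As a rough illustration of the table above, the sketch below counts the leading 1 bits of a byte to find out how long its sequence is. The class name Utf8LeadByte and the method sequenceLength are hypothetical names chosen for this example, and returning 0 for a continuation byte is an assumption of the sketch, not part of any standard API.

```java
import java.nio.charset.StandardCharsets;

final class Utf8LeadByte {
    /**
     * Returns the sequence length announced by a lead byte,
     * or 0 if the byte is a continuation byte (10xxxxxx).
     * Results of 7 or 8 mean the byte can never occur in UTF-8.
     */
    static int sequenceLength(byte lead) {
        int v = lead & 0xFF;                 // view the signed byte as 0..255
        if ((v & 0x80) == 0) {
            return 1;                        // 0xxxxxxx: single-byte (ASCII)
        }
        int ones = 0;
        // Count consecutive 1 bits starting from the highest bit.
        for (int mask = 0x80; (v & mask) != 0; mask >>= 1) {
            ones++;
        }
        return ones == 1 ? 0 : ones;         // exactly one leading 1 means 10xxxxxx
    }

    public static void main(String[] args) {
        // "a" followed by U+4E2D, written as an escape to avoid source-encoding issues.
        byte[] bytes = "a\u4e2d".getBytes(StandardCharsets.UTF_8);
        for (byte b : bytes) {
            System.out.println(Integer.toHexString(b & 0xFF) + " -> " + sequenceLength(b));
        }
    }
}
```

For the sample string, the first byte reports length 1, the lead byte of the Chinese character reports 3, and its two continuation bytes report 0.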

Note: Every byte of a multibyte UTF-8 character has its highest bit set to 1. Java's byte type is signed and treats that bit as the sign bit, so printing such bytes directly produces negative numbers.

    byte[] bss = "这是一个神奇的世界".getBytes("UTF-8");
    System.out.println("bss length: " + bss.length); // output: 27, each Chinese character takes three bytes
    // output: -24 -65 -103 -26 -104 -81 -28 -72 -128 -28 -72 -86 -25 -91 -98 -27 -91 -121 -25 -102 -124 -28 -72 -106 -25 -107 -116
    for (byte b : bss) {
        System.out.print(b + " ");
    }

To obtain the actual encoded value that each byte represents, mask it with 0xFF before printing, as shown in the following three ways. (This requires familiarity with bit operations and with the sign-magnitude, ones' complement, and two's complement representations of integers.)
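
To make the role of the mask concrete, here is a minimal sketch. The class name SignedByteDemo and the helper toUnsigned are hypothetical; Byte.toUnsignedInt is the standard library method (Java 8 and later) that performs the same conversion.

```java
final class SignedByteDemo {
    /** Reinterprets a signed Java byte as its unsigned value in the range 0..255. */
    static int toUnsigned(byte b) {
        // b is first widened to int with sign extension (0xFFFFFFE8 for -24);
        // the mask then keeps only the low 8 bits.
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE8;                       // bit pattern 11101000
        System.out.println(b);                      // -24: two's complement reading
        System.out.println(toUnsigned(b));          // 232: the encoded value
        System.out.println(Byte.toUnsignedInt(b));  // 232: library equivalent
    }
}
```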

1. Decimal

    byte[] bss = "这是一个神奇的世界".getBytes("UTF-8");
    System.out.println("bss length: " + bss.length); // output: 27, each Chinese character takes three bytes
    // output: 232 191 153 230 152 175 228 184 128 228 184 170 231 165 158 229 165 135 231 154 132 228 184 150 231 149 140
    for (byte b : bss) {
        System.out.print(Integer.valueOf(b & 0xff) + " ");
    }

2. Hexadecimal

    byte[] bss = "这是一个神奇的世界".getBytes("UTF-8");
    System.out.println("bss length: " + bss.length); // output: 27, each Chinese character takes three bytes
    // output: e8 bf 99 e6 98 af e4 b8 80 e4 b8 aa e7 a5 9e e5 a5 87 e7 9a 84 e4 b8 96 e7 95 8c
    for (byte b : bss) {
        System.out.print(Integer.toHexString(b & 0xFF) + " ");
    }

3. Binary

    byte[] bss = "这是一个神奇的世界".getBytes("UTF-8");
    System.out.println("bss length: " + bss.length); // output: 27, each Chinese character takes three bytes
    // output: 11101000 10111111 10011001 11100110 10011000 10101111 11100100
    // 10111000 10000000 11100100 10111000 10101010 11100111 10100101 10011110
    // 11100101 10100101 10000111 11100111 10011010 10000100 11100100
    // 10111000 10010110 11100111 10010101 10001100
    for (byte b : bss) {
        System.out.print(Integer.toBinaryString(b & 0xFF) + " ");
    }
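
One caveat with the binary form: Integer.toBinaryString drops leading zeros, so an ASCII byte prints fewer than 8 digits. The following sketch pads every byte to exactly 8 bits. The class name BinaryDump and its methods are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class BinaryDump {
    /** Formats one byte as exactly 8 binary digits. */
    static String toBits(byte b) {
        // Setting bit 8 forces a 9-digit string "1xxxxxxxx"; dropping the
        // first character leaves the 8 payload bits with leading zeros intact.
        return Integer.toBinaryString((b & 0xFF) | 0x100).substring(1);
    }

    /** Formats a byte array as space-separated 8-bit groups. */
    static String dump(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(toBits(b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a" followed by U+4E16, written as an escape to avoid source-encoding issues.
        System.out.println(dump("a\u4e16".getBytes(StandardCharsets.UTF_8)));
        // prints: 01100001 11100100 10111000 10010110
    }
}
```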

Practice: Truncating a mixed Chinese and English string

* Given a string and a byte count, truncate the string to that number of bytes. In UTF-8, non-English characters occupy multiple bytes.
* If the cut position falls in the middle of a non-English character, that partially cut character is discarded.

  

    import java.io.UnsupportedEncodingException;
    import java.util.Scanner;

    public class StrTruncate {
        public static void main(String[] args) throws UnsupportedEncodingException {
            Scanner scanner = new Scanner(System.in);
            System.out.println("Input (string, number of bytes)");
            String[] parts = scanner.nextLine().split(",");
            String sub = new StrTruncate().getSubStr(parts[0], Integer.parseInt(parts[1].trim()));
            System.out.println("The truncated string is: " + sub);
        }

        public String getSubStr(String resource, int charLen) throws UnsupportedEncodingException {
            if (charLen <= 0) {
                return null;
            }
            byte[] bytes = resource.getBytes("UTF-8");
            // Nothing to cut if the requested length covers the whole string.
            if (charLen >= bytes.length) {
                return resource;
            }
            // bytes[charLen] is the first byte that will be dropped.
            // A negative value means it belongs to a multibyte character.
            if (bytes[charLen] < 0) {
                // Back up until the cut lands on a lead byte (binary form starts with "11").
                while (!Integer.toBinaryString(bytes[charLen] & 0xff).startsWith("11")) {
                    charLen--;
                }
            }
            return new String(bytes, 0, charLen, "UTF-8");
        }
    }
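
The same idea can also be written with a bit mask instead of binary strings. The sketch below is one possible variant, not the article's own method: the class name Utf8Truncate and the method truncate are hypothetical, and returning an empty string for a non-positive byte count is an assumption made for this example.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Truncate {
    /**
     * Returns the longest prefix of s whose UTF-8 encoding fits in maxBytes
     * without splitting a character.
     */
    static String truncate(String s, int maxBytes) {
        if (maxBytes <= 0) {
            return "";                       // assumption: empty result instead of null
        }
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (maxBytes >= bytes.length) {
            return s;                        // the whole string already fits
        }
        int end = maxBytes;
        // bytes[end] is the first byte that would be dropped. If it is a
        // continuation byte (10xxxxxx), the cut is inside a character, so
        // back up to that character's lead byte.
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "ab\u4e16\u754c";         // "ab" plus two 3-byte characters, 8 bytes in total
        System.out.println(truncate(s, 4));  // cut falls inside the first Chinese character
        System.out.println(truncate(s, 5));  // cut falls exactly on a character boundary
    }
}
```

With the sample string, a limit of 4 bytes keeps only "ab", while a limit of 5 bytes keeps "ab" plus the first Chinese character.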

