A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:

For a 1-byte character, the first bit is 0, followed by its Unicode code.
For an n-byte character, the first n bits are all 1s, the (n+1)th bit is 0, followed by n-1 bytes whose most significant 2 bits are 10.

This is how the UTF-8 encoding works:

   Char. number range  |        UTF-8 octet sequence
      (hexadecimal)    |              (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 0xxxxxxx
   0000 0080-0000 07FF | 110xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Given an array of integers representing the data, return whether it is a valid UTF-8 encoding.

Note: The input is an array of integers. Only the least significant 8 bits of each integer are used to store the data. This means each integer represents only 1 byte of data.

Example 1:

data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001.

Return true. It is a valid UTF-8 encoding for a 2-byte character followed by a 1-byte character.

Example 2:

data = [235, 140, 4], which represents the octet sequence: 11101011 10001100 00000100.

Return false. The first 3 bits are all 1s and the 4th bit is 0, which means it is a 3-byte character. The next byte is a continuation byte that starts with 10, which is correct. But the second continuation byte does not start with 10, so it is invalid.
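To check the examples by hand, it helps to print each integer as an 8-bit octet. The following is a small sketch for that purpose only; the class name OctetDump and the helper toOctets are made up for illustration and are not part of the problem. It uses the integers 197, 130, 1 and 235, 140, 4, which are the decimal values of the octets listed in the two examples.

```java
class OctetDump {
    // Renders each value as an 8-bit binary string, separated by spaces.
    // Only the low 8 bits of each integer are shown, as the problem's note describes.
    static String toOctets(int[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) sb.append(' ');
            String bits = Integer.toBinaryString(data[i] & 0xFF);
            // Left-pad with zeros so every octet is exactly 8 digits wide.
            for (int pad = bits.length(); pad < 8; pad++) sb.append('0');
            sb.append(bits);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toOctets(new int[] {197, 130, 1})); // 11000101 10000010 00000001
        System.out.println(toOctets(new int[] {235, 140, 4})); // 11101011 10001100 00000100
    }
}
```

Reading the output against the table, the leading bits of each octet show directly which row a byte belongs to.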
The problem first describes how a single UTF-8 character is encoded, then gives a sequence of bytes and asks whether the whole sequence is a valid encoding. (It took me a long time to understand the question.)
The key to this problem is learning how to use the bitwise AND operator (&) with a mask to extract the leading bits of a byte.
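As a minimal sketch of the masking idea: AND the byte with a mask that keeps only the leading bits of interest, then compare the result with the expected pattern. The class and method names below (LeadingBitsDemo, sequenceLength, isContinuation) are hypothetical helpers chosen for this illustration.

```java
class LeadingBitsDemo {
    // Returns the number of bytes (1 to 4) announced by a leading byte,
    // or -1 if the byte cannot start a character (for example 10xxxxxx).
    static int sequenceLength(int b) {
        if ((b & 0b10000000) == 0b00000000) return 1; // 0xxxxxxx
        if ((b & 0b11100000) == 0b11000000) return 2; // 110xxxxx
        if ((b & 0b11110000) == 0b11100000) return 3; // 1110xxxx
        if ((b & 0b11111000) == 0b11110000) return 4; // 11110xxx
        return -1;
    }

    // A continuation byte has the form 10xxxxxx: keep the top two bits and compare.
    static boolean isContinuation(int b) {
        return (b & 0b11000000) == 0b10000000;
    }

    public static void main(String[] args) {
        System.out.println(sequenceLength(197));  // 11000101 -> 2
        System.out.println(sequenceLength(235));  // 11101011 -> 3
        System.out.println(isContinuation(130));  // 10000010 -> true
        System.out.println(isContinuation(4));    // 00000100 -> false
    }
}
```

The mask always covers one bit more than the run of 1s, so that the terminating 0 bit is checked as well.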
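Binary literal notation in Java: add 0b in front (Java 7 and later); hexadecimal uses 0x. Octal uses a leading 0 in Java (the 0o prefix belongs to other languages such as Python).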
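A short sketch of the literal forms in Java, writing the same value (the mask for the top two bits) in each notation. The class name LiteralDemo and its constants are invented for this example.

```java
class LiteralDemo {
    static final int BINARY = 0b11000000; // binary literal, prefix 0b
    static final int HEX = 0xC0;          // hexadecimal literal, prefix 0x
    static final int OCTAL = 0300;        // octal literal, leading 0
    static final int DECIMAL = 192;       // plain decimal

    public static void main(String[] args) {
        // All four constants hold the same value.
        System.out.println(BINARY == HEX && HEX == OCTAL && OCTAL == DECIMAL); // true
        System.out.println(Integer.toBinaryString(DECIMAL)); // 11000000
        System.out.println(Integer.toHexString(DECIMAL));    // c0
    }
}
```

For bit masks the binary form is usually the easiest to compare against the patterns in the table.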
    public class Solution {
        public boolean validUtf8(int[] data) {
            if (data == null || data.length == 0) return false;
            for (int i = 0; i < data.length; i++) {
                // Each value is expected to fit in a single byte.
                if (data[i] > 255) return false;
                // moreChecks is the number of continuation bytes that still need checking for this char
                int moreChecks = 0;
                if ((data[i] & 0b10000000) == 0) moreChecks = 0;                 // 0xxxxxxx
                else if ((data[i] & 0b11100000) == 0b11000000) moreChecks = 1;   // 110xxxxx
                else if ((data[i] & 0b11110000) == 0b11100000) moreChecks = 2;   // 1110xxxx
                else if ((data[i] & 0b11111000) == 0b11110000) moreChecks = 3;   // 11110xxx
                else return false;
                for (int j = 1; j <= moreChecks; j++) {
                    // The sequence ends before the character is complete.
                    if (i + j >= data.length) return false;
                    // Every continuation byte must look like 10xxxxxx.
                    if ((data[i + j] & 0b11000000) != 0b10000000) return false;
                }
                // Skip the continuation bytes that were just checked.
                i = i + moreChecks;
            }
            return true;
        }
    }
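For comparison, here is a sketch of a different formulation, not the solution above: a single pass that keeps a counter of continuation bytes still expected. It assumes that values above 255 should be reduced to their low 8 bits (following the problem's note) instead of being rejected, and it treats an empty array as valid; both are choices made for this sketch. The class name Utf8CounterSketch is hypothetical.

```java
class Utf8CounterSketch {
    static boolean validUtf8(int[] data) {
        int remaining = 0; // continuation bytes still expected for the current character
        for (int value : data) {
            int b = value & 0xFF; // assumption: only the low 8 bits carry data
            if (remaining > 0) {
                // Inside a multi-byte character: the byte must be 10xxxxxx.
                if ((b & 0b11000000) != 0b10000000) return false;
                remaining--;
            } else if ((b & 0b10000000) == 0) {
                remaining = 0;                                  // 0xxxxxxx
            } else if ((b & 0b11100000) == 0b11000000) {
                remaining = 1;                                  // 110xxxxx
            } else if ((b & 0b11110000) == 0b11100000) {
                remaining = 2;                                  // 1110xxxx
            } else if ((b & 0b11111000) == 0b11110000) {
                remaining = 3;                                  // 11110xxx
            } else {
                return false; // 10xxxxxx or 11111xxx cannot start a character
            }
        }
        // A character that was started must also have been finished.
        return remaining == 0;
    }

    public static void main(String[] args) {
        System.out.println(validUtf8(new int[] {197, 130, 1})); // true
        System.out.println(validUtf8(new int[] {235, 140, 4})); // false
    }
}
```

Like the solution above, this only checks the bit patterns described in the problem statement and does not index ahead into the array, so no bounds check is needed.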
LeetCode: UTF-8 Validation