Android Development writeUTF and readUTF principles

Source: Internet
Author: User
Tags 0xc0

I was playing around with some code tonight using java.io.RandomAccessFile.writeUTF(String). The file was being opened as GBK by default, so its content was obviously garbled. That reminded me to look at how the string is actually encoded when it is stored, so I read a few articles on how writeUTF(String) works and am recording what I found here.
First you need to understand the rules of Unicode and UTF-8. I found an article by @feng, "Character Encoding Notes: ASCII, Unicode and UTF-8", that explains them very clearly. Here is an excerpt:

| Unicode symbol range (hex) | UTF-8 encoding (binary)
| 0000 0000-0000 007F | 0xxxxxxx
| 0000 0080-0000 07FF | 110xxxxx 10xxxxxx
| 0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
| 0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Below, the Chinese character 严 ("strict") is again used as an example to show how UTF-8 encoding works.
The Unicode code point of 严 is 4E25 (100111000100101). According to the table above, 4E25 falls in the range of the third row (0000 0800-0000 FFFF), so the UTF-8 encoding of 严 needs three bytes, in the format "1110xxxx 10xxxxxx 10xxxxxx". Then, starting from the last bit of 严, fill the x positions of the format from back to front, and pad any x that is left over with 0. The resulting UTF-8 encoding of 严 is "11100100 10111000 10100101", which is E4B8A5 in hexadecimal.

That is, the bits of 4E25 (0100 111000 100101, with one leading 0 added as padding) are dropped in order into the x positions of 1110xxxx 10xxxxxx 10xxxxxx.
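The bit filling described above can be sketched in a few lines of Java. This is only an illustration of the table, not code from the JDK: the class name `Utf8ByHand` and its helpers are made up for this example, and it covers only the first three rows of the table (code points up to U+FFFF).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8ByHand {
    // Encodes one code point in the range U+0000..U+FFFF by following the table.
    // Assumption: surrogates and the 4-byte row are left out to keep the sketch short.
    static byte[] encode(int cp) {
        if (cp <= 0x7F) {
            // 0xxxxxxx
            return new byte[] { (byte) cp };
        } else if (cp <= 0x7FF) {
            // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | ((cp >> 6) & 0x1F)),
                (byte) (0x80 | (cp & 0x3F))
            };
        } else {
            // 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | ((cp >> 12) & 0x0F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))
            };
        }
    }

    // Formats bytes as lowercase hex without separators.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] manual = encode(0x4E25);
        byte[] library = "\u4e25".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(manual));                    // e4b8a5
        System.out.println(Arrays.equals(manual, library)); // true
    }
}
```

The string literal uses the escape `\u4e25` instead of the character itself so that the result does not depend on the encoding of the source file.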

A point the article emphasizes is that Unicode is a single character set created to unify the world's many incompatible encodings, and that UTF-8 is just one way of implementing Unicode.
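To make the distinction concrete, the following sketch prints one code point next to two different byte encodings of it. The class name `OneCodePointManyEncodings` is hypothetical; the charsets come from the standard `java.nio.charset.StandardCharsets`.

```java
import java.nio.charset.StandardCharsets;

class OneCodePointManyEncodings {
    // Formats bytes as lowercase hex without separators.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u4e25"; // the character used in the example, code point U+4E25

        // The code point is the same no matter how the text is stored.
        System.out.println(Integer.toHexString(s.codePointAt(0)));      // 4e25

        // The stored bytes depend on which encoding is chosen.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));    // e4b8a5
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 4e25
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // 254e
    }
}
```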

Running System.out.println(Integer.toHexString('严')); prints 4e25, the Unicode code point. When RandomAccessFile.writeUTF(String) is called, 严 is converted to UTF-8 and written to the file.
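A quick way to see the bytes that actually get written is to send the output to memory and dump it as hex. This sketch uses DataOutputStream over a ByteArrayOutputStream instead of RandomAccessFile, on the assumption that both produce the same byte format (both implement DataOutput.writeUTF). The class name `WriteUtfBytes` is made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WriteUtfBytes {
    // Returns the bytes produced by writeUTF(s) as lowercase hex.
    static String writeUtfHex(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream and a short string.
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : buffer.toByteArray()) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two length bytes (0003) followed by the three UTF-8 bytes.
        System.out.println(writeUtfHex("\u4e25"));  // 0003e4b8a5
        // An ASCII 'A' adds one byte, so the length prefix becomes 0004.
        System.out.println(writeUtfHex("A\u4e25")); // 000441e4b8a5
    }
}
```

The two leading bytes are the encoded length, which is why a text editor that opens such a file shows stray characters in front of the text even when it picks the right encoding.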

Here is the source code with some of my own comments, for future reference:

java.io.DataOutputStream.writeUTF(String, DataOutput)

    static int writeUTF(String str, DataOutput out) throws IOException {
        int strlen = str.length();
        int utflen = 0;
        int c, count = 0;

        /* Compute the encoded length utflen from the value of each char c.
           The maximum is 65535 bytes (just under 64 KB). */
        for (int i = 0; i < strlen; i++) {
            c = str.charAt(i);
            if ((c >= 0x0001) && (c <= 0x007F)) {
                utflen++;
            } else if (c > 0x07FF) {
                utflen += 3;
            } else {
                utflen += 2;
            }
        }

        if (utflen > 65535)
            throw new UTFDataFormatException(
                "encoded string too long: " + utflen + " bytes");

        /* Create a byte array bytearr of a suitable length. The +2 is there
           because the length utflen is stored in the first two bytes of bytearr. */
        byte[] bytearr = null;
        if (out instanceof DataOutputStream) {
            DataOutputStream dos = (DataOutputStream) out;
            if (dos.bytearr == null || (dos.bytearr.length < (utflen + 2)))
                dos.bytearr = new byte[(utflen * 2) + 2];
            bytearr = dos.bytearr;
        } else {
            bytearr = new byte[utflen + 2];
        }

        bytearr[count++] = (byte) ((utflen >>> 8) & 0xFF);
        bytearr[count++] = (byte) ((utflen >>> 0) & 0xFF);

        /* ASCII chars are stored directly in bytearr. Source text is mostly
           ASCII, so this case is very likely, and this loop skips the extra
           branches of the general loop below. */
        int i = 0;
        for (i = 0; i < strlen; i++) {
            c = str.charAt(i);
            if (!((c >= 0x0001) && (c <= 0x007F))) break;
            bytearr[count++] = (byte) c;
        }

        /* Everything after the first non-ASCII char is handled here. */
        for (; i < strlen; i++) {
            c = str.charAt(i);
            if ((c >= 0x0001) && (c <= 0x007F)) {
                /* One byte, encoding rule: 0xxxxxxx (ASCII) */
                bytearr[count++] = (byte) c;
            } else if (c > 0x07FF) {
                /* Three bytes, encoding rule: 1110xxxx 10xxxxxx 10xxxxxx.
                   For 严 above (0100 111000 100101) the result is
                   1110 0100, 10 111000, 10 100101. */
                bytearr[count++] = (byte) (0xE0 | ((c >> 12) & 0x0F));
                bytearr[count++] = (byte) (0x80 | ((c >>  6) & 0x3F));
                bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
            } else {
                /* Two bytes, encoding rule: 110xxxxx 10xxxxxx */
                bytearr[count++] = (byte) (0xC0 | ((c >>  6) & 0x1F));
                bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
            }
        }
        out.write(bytearr, 0, utflen + 2);
        return utflen + 2;
    }
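Three consequences of the code above are easy to miss, and the sketch below checks them: the char 0x0000 fails the `c >= 0x0001` test and is written as two bytes, a character outside the BMP is stored in a Java String as two surrogate chars and each is written as three bytes, and a string whose encoded length exceeds 65535 bytes is rejected. The class name `ModifiedUtf8Quirks` and its helper methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ModifiedUtf8Quirks {
    // Returns the bytes produced by writeUTF(s) as lowercase hex.
    static String writeUtfHex(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : buffer.toByteArray()) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // True when writeUTF refuses the string because its encoded form is too long.
    static boolean tooLongForWriteUtf(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return false;
        } catch (UTFDataFormatException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The char 0x0000 takes the two-byte branch: c0 80.
        System.out.println(writeUtfHex("\u0000"));       // 0002c080

        // U+1F600 is the surrogate pair D83D DE00; each half takes the three-byte branch.
        System.out.println(writeUtfHex("\uD83D\uDE00")); // 0006eda0bdedb880

        // 65536 ASCII chars need 65536 bytes, one more than the length prefix can hold.
        char[] big = new char[65536];
        Arrays.fill(big, 'a');
        System.out.println(tooLongForWriteUtf(new String(big))); // true
    }
}
```

For comparison, standard UTF-8 would encode U+1F600 as the four bytes f0 9f 98 80, so files written with writeUTF are not always valid UTF-8 for other tools.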

Once the writing rules are understood, the reading rules are simply the reverse.

java.io.DataInputStream.readUTF(DataInput)

    public final static String readUTF(DataInput in) throws IOException {
        int utflen = in.readUnsignedShort();
        byte[] bytearr = null;
        char[] chararr = null;
        if (in instanceof DataInputStream) {
            DataInputStream dis = (DataInputStream) in;
            if (dis.bytearr.length < utflen) {
                dis.bytearr = new byte[utflen * 2];
                dis.chararr = new char[utflen * 2];
            }
            chararr = dis.chararr;
            bytearr = dis.bytearr;
        } else {
            bytearr = new byte[utflen];
            chararr = new char[utflen];
        }

        int c, char2, char3;
        int count = 0;
        int chararr_count = 0;

        in.readFully(bytearr, 0, utflen);

        while (count < utflen) {
            c = (int) bytearr[count] & 0xff;
            if (c > 127) break;
            count++;
            chararr[chararr_count++] = (char) c;
        }

        while (count < utflen) {
            c = (int) bytearr[count] & 0xff;
            switch (c >> 4) {
                case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                    /* 0xxxxxxx */
                    count++;
                    chararr[chararr_count++] = (char) c;
                    break;
                case 12: case 13:
                    /* 110x xxxx   10xx xxxx */
                    count += 2;
                    if (count > utflen)
                        throw new UTFDataFormatException(
                            "malformed input: partial character at end");
                    char2 = (int) bytearr[count - 1];
                    if ((char2 & 0xC0) != 0x80)
                        throw new UTFDataFormatException(
                            "malformed input around byte " + count);
                    chararr[chararr_count++] = (char) (((c & 0x1F) << 6) |
                                                       (char2 & 0x3F));
                    break;
                case 14:
                    /* 1110 xxxx   10xx xxxx   10xx xxxx */
                    count += 3;
                    if (count > utflen)
                        throw new UTFDataFormatException(
                            "malformed input: partial character at end");
                    char2 = (int) bytearr[count - 2];
                    char3 = (int) bytearr[count - 1];
                    if (((char2 & 0xC0) != 0x80) || ((char3 & 0xC0) != 0x80))
                        throw new UTFDataFormatException(
                            "malformed input around byte " + (count - 1));
                    chararr[chararr_count++] = (char) (((c     & 0x0F) << 12) |
                                                       ((char2 & 0x3F) << 6)  |
                                                       ((char3 & 0x3F) << 0));
                    break;
                default:
                    /* 10xx xxxx,   1111 xxxx */
                    throw new UTFDataFormatException(
                        "malformed input around byte " + count);
            }
        }
        // The number of chars produced may be less than utflen
        return new String(chararr, 0, chararr_count);
    }
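The branches of the switch above can be exercised with hand-built byte arrays. The sketch below decodes the five bytes from the earlier example and then feeds readUTF three inputs that should hit its error paths. The class name `ReadUtfRoundTrip` and its helpers are hypothetical, and the comments name the branch each input is expected to reach based on the source shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

class ReadUtfRoundTrip {
    // Decodes a length-prefixed string with DataInputStream.readUTF.
    static String readUtf(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True when readUTF rejects the data with UTFDataFormatException.
    static boolean isMalformed(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readUTF();
            return false;
        } catch (UTFDataFormatException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Length 3, then e4 b8 a5: the "case 14" branch rebuilds U+4E25.
        byte[] ok = { 0x00, 0x03, (byte) 0xE4, (byte) 0xB8, (byte) 0xA5 };
        System.out.println(readUtf(ok).equals("\u4e25")); // true

        // Lead byte e4 announces three bytes, but the length prefix says two.
        byte[] truncated = { 0x00, 0x02, (byte) 0xE4, (byte) 0xB8 };
        System.out.println(isMalformed(truncated));       // true

        // Second byte 0x41 does not match the 10xxxxxx continuation pattern.
        byte[] badContinuation = { 0x00, 0x03, (byte) 0xE4, 0x41, (byte) 0xA5 };
        System.out.println(isMalformed(badContinuation)); // true

        // Lead byte f0 (standard 4-byte UTF-8) falls into the default branch.
        byte[] fourByteForm = { 0x00, 0x04, (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80 };
        System.out.println(isMalformed(fourByteForm));    // true
    }
}
```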

It all seemed very clear.

Original: http://my.oschina.net/diligentSt/blog/147933