Differences between character stream and byte stream storage in Java. The same values, of several common data types, are written with each kind of stream to compare what ends up in the file.
int a = 5;
boolean b = true;
char c = 'G';
String d = "你好";
Print the data of the above types to a file using a character stream:
PrintWriter dos = new PrintWriter(new BufferedWriter(new FileWriter("C://buffertest.txt")));
dos.print(a);
dos.print(b);
dos.print(c);
dos.print(d);
dos.close(); // flush the buffered text to the file
The result is as follows:
a is 5
b is true
c is G
d is 你好
The output of the character stream is readable text, completely consistent with the characters we entered.
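The character stream case can be tried without touching the disk. The following is only a minimal sketch: the class name CharStreamSketch and the method render are made up for illustration, a StringWriter stands in for the FileWriter, and d is assumed to hold the two Chinese characters 你好 (U+4F60, U+597D), written as Unicode escapes so the source file encoding does not matter.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class CharStreamSketch {
    // Prints the four sample values through a character stream
    // and returns the text that was produced.
    static String render() {
        int a = 5;
        boolean b = true;
        char c = 'G';
        String d = "\u4f60\u597d"; // assumed sample string: the characters U+4F60 U+597D

        StringWriter buffer = new StringWriter(); // in-memory stand-in for FileWriter
        try (PrintWriter out = new PrintWriter(buffer)) {
            out.print(a); // written as the character '5'
            out.print(b); // written as the characters "true"
            out.print(c);
            out.print(d);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

Every value is converted to its textual form before it is written, which is why the result can be read directly in a text editor.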
Let's look at the byte stream.
DataOutputStream dos = new DataOutputStream(new FileOutputStream("C://streamtest.txt"));
dos.writeInt(a);
dos.writeBoolean(b);
dos.writeChar(c);
dos.writeUTF(d);
dos.writeChars(d);
dos.writeBytes(d);
dos.close();
The result is a binary file. Opening it in a hexadecimal editor shows:
a is 00 00 00 05; an int is four bytes.
b is 01; a boolean is one byte.
c is 00 47; a char is two bytes.
d is written three times, once by each of the three methods:
The first is 00 06 E4 BD A0 E5 A5 BD. The leading 00 06 is added by writeUTF and gives the number of bytes that follow. The next six bytes are the UTF-8 encoding of "你好", 3 bytes for each Chinese character.
The second is 4F 60 59 7D, written by writeChars. This is the big-endian Unicode (UTF-16) code of "你好", 2 bytes for each Chinese character.
The third is 60 7D, written by writeBytes. It keeps only the low byte of each character, that is, the low bytes of 4F 60 and 59 7D, so the high bytes are lost.
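The byte layout described above can be checked without a hexadecimal editor. This is a sketch under a few assumptions: the class name DataStreamSketch and the helpers hex and dump are hypothetical, a ByteArrayOutputStream replaces the FileOutputStream, and d is taken to be the characters U+4F60 U+597D (你好).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DataStreamSketch {
    // Formats bytes as two-digit uppercase hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", x & 0xFF));
        }
        return sb.toString();
    }

    // Writes the sample values with DataOutputStream and returns a hex dump.
    static String dump() {
        String d = "\u4f60\u597d"; // assumed sample string
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(5);        // 4 bytes, high byte first
            out.writeBoolean(true); // 1 byte
            out.writeChar('G');     // 2 bytes
            out.writeUTF(d);        // 2-byte length, then modified UTF-8
            out.writeChars(d);      // 2 bytes per char, high byte first
            out.writeBytes(d);      // low byte of each char only
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hex(buffer.toByteArray());
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

Note that writeUTF uses Java's modified UTF-8, which gives the same bytes as standard UTF-8 for these two characters.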
Further description
When Notepad saves a file in one of these encodings, it writes a marker at the start of the file to identify the encoding type. This is how Notepad recognizes the encoding correctly when the file is opened again. If you open such a file in a hexadecimal editor, you will see the marker written in the file header. The markers are as follows:
EF BB BF        UTF-8
FF FE           UTF-16/UCS-2, little endian
FE FF           UTF-16/UCS-2, big endian
FF FE 00 00     UTF-32/UCS-4, little endian
00 00 FE FF     UTF-32/UCS-4, big endian
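The table above can be turned into a small check on the first bytes of a file. The following is a rough sketch, not how Notepad itself works: the class BomSketch and its methods detect and startsWith are hypothetical names. The 4-byte markers are tested first because FF FE is also the start of FF FE 00 00.

```java
class BomSketch {
    // Returns the encoding suggested by the marker at the start of the data,
    // or "unknown" when no marker from the table is present.
    static String detect(byte[] head) {
        // Longer markers first: FF FE 00 00 would otherwise match FF FE.
        // (A UTF-16 little-endian file starting with a NUL character is
        // indistinguishable here; this sketch ignores that case.)
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        return "unknown";
    }

    // True when data begins with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detect(sample));
    }
}
```

A file without any marker is reported as unknown, since the marker is optional and many programs do not write one.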
When UTF-8 stores a character with a code up to 0xFFFF, it uses 1 to 3 bytes, that is, 8 to 24 bits (characters above 0xFFFF take 4 bytes):
Code <= 0x007F: saved as 1 byte
0x0080 <= code <= 0x07FF: saved as 2 bytes
Code >= 0x0800: saved as 3 bytes
The GB2312 encoding of "你好" is C4 E3 BA C3, 2 bytes per character. The Unicode codes of the two characters, 4F60 and 597D, are both greater than 0x0800. Therefore, in UTF-8 each Chinese character is saved as 3 bytes.
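The ranges above can be compared against what Java's own UTF-8 encoder produces. This is a small sketch with hypothetical names (Utf8LengthSketch, utf8Length, actualUtf8Length); the 4-byte branch covers codes above 0xFFFF, which the list above does not discuss.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthSketch {
    // Number of UTF-8 bytes predicted from the code ranges.
    static int utf8Length(int codePoint) {
        if (codePoint <= 0x007F) return 1;
        if (codePoint <= 0x07FF) return 2;
        if (codePoint <= 0xFFFF) return 3;
        return 4; // supplementary characters
    }

    // Number of bytes Java actually produces when encoding the code point.
    static int actualUtf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xE9, 0x4F60, 0x597D};
        for (int cp : samples) {
            System.out.printf("U+%04X predicted=%d actual=%d%n",
                    cp, utf8Length(cp), actualUtf8Length(cp));
        }
    }
}
```

For 4F60 and 597D both numbers are 3, which matches the six bytes that writeUTF stores after its 2-byte length.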
Little endian: the low address stores the low byte; x86 uses this order.
Big endian: the low address stores high bytes, and the network byte order is in this order.
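The two byte orders can be seen side by side with java.nio.ByteBuffer. This is only an illustrative sketch; the class EndianSketch and its methods are hypothetical names. The big-endian result is the same layout that DataOutputStream.writeInt produces.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianSketch {
    // Encodes an int as 4 bytes in the given byte order and returns hex text.
    static String encode(int value, ByteOrder order) {
        byte[] bytes = ByteBuffer.allocate(4).order(order).putInt(value).array();
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", x & 0xFF));
        }
        return sb.toString();
    }

    // High byte at the lowest address (network byte order).
    static String bigEndianHex(int value) {
        return encode(value, ByteOrder.BIG_ENDIAN);
    }

    // Low byte at the lowest address (x86 order).
    static String littleEndianHex(int value) {
        return encode(value, ByteOrder.LITTLE_ENDIAN);
    }

    public static void main(String[] args) {
        System.out.println("big endian:    " + bigEndianHex(5));
        System.out.println("little endian: " + littleEndianHex(5));
    }
}
```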
This article is from: IT Knowledge Network (http://www.itwis.com). Detailed source reference: http://www.itwis.com/html/java/j2se/20080428/1367.html