IO and Garbled Chinese Text in Java

Source: Internet
Author: User

This is my first blog post, and I struggled with how to begin, so I will go straight to the topic and record some of my own notes. I am starting with simple material; I hope to keep at it, and that later posts will be better written and more meaningful.

In practice I have rarely run into garbled Chinese text at work, because my company uses a single encoding for all files to simplify development. Still, I think it is worth understanding the principles behind it.

IO stands for input and output streams. From an object-oriented point of view, they are input and output stream objects, used mainly to operate on files. So let me first say a little about the File object. In Java, a File is not a concrete file in the everyday sense but an object that represents a path. For example, File file = new File("D:\\aaa"); creates a File object: it may represent a folder, or the path may not exist at all, but the code really does create a File object for that path. This form is not used much, because most of the time we work with a text file, an image, and so on, as in File f = new File("aaa.txt");
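To make the point about File concrete, here is a minimal sketch (the class name FilePathDemo is only illustrative) that reuses the two paths from the text. Constructing the File objects succeeds whether or not the paths exist; only calls such as exists(), isFile() and isDirectory() consult the file system, so the printed answers depend on the machine that runs it.

```java
import java.io.File;

class FilePathDemo {
    public static void main(String[] args) {
        // Constructing a File only wraps a path string; nothing is read or created on disk.
        File folder = new File("D:\\aaa");
        File text = new File("aa.txt");
        // These calls are what actually query the file system.
        System.out.println(folder.getPath() + " exists? " + folder.exists());
        System.out.println(text.getName() + " is a regular file? " + text.isFile());
        System.out.println(folder.getPath() + " is a directory? " + folder.isDirectory());
    }
}
```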

That briefly covers what IO and File are. Next: how are images, text, videos and other data stored on a storage device? As I understand it, every file, whatever its type, is stored in binary, and the smallest unit is one byte, that is, eight bits of 0s and 1s. So to copy a file it is in principle enough to work with a byte stream: read all the bytes of one file and write them into another. Files that hold characters are a special case, however, and that is where garbled Chinese text comes from.

Most people know, or have at least heard of, the ASCII table, one of the earliest code tables. It was originally used to represent the English letters and some special symbols: since a computer only understands binary, each character is replaced by a corresponding byte value, and that mapping forms a code table. As computers spread, ASCII was no longer enough, and many countries defined their own encoding schemes, which is why there are many different code tables. Common ones include GBK and UTF-8. Inside the JVM, text is held as Unicode: a Java char is a 2-byte UTF-16 code unit, so a common Chinese character takes 2 bytes in memory. UTF-8 is different: a common Chinese character takes 3 bytes there (rarer characters take 4), while GBK uses 2 bytes. So the same character maps to a different number of bytes, with different byte values, in different code tables. How do we deal with that?
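A short sketch of the byte counts above, assuming a standard JDK. The hypothetical helper byteCount encodes one character with several charsets. UTF-16BE stands in for Java's in-memory representation (plain UTF-16 would prepend a 2-byte byte-order mark), and GBK is checked with Charset.isSupported first, because the Java platform only guarantees a small set of charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingSizes {
    // Number of bytes the text occupies when encoded with the given charset.
    static int byteCount(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String han = "\u4E2D"; // one common Chinese character, written as an escape so the source file encoding does not matter
        System.out.println("Java chars: " + han.length());                               // 1
        System.out.println("UTF-16BE:   " + byteCount(han, StandardCharsets.UTF_16BE)); // 2
        System.out.println("UTF-8:      " + byteCount(han, StandardCharsets.UTF_8));    // 3
        System.out.println("UTF-8 bytes: " + Arrays.toString(han.getBytes(StandardCharsets.UTF_8)));
        // GBK is not on the list of charsets every Java platform must support, so check first.
        if (Charset.isSupported("GBK")) {
            Charset gbk = Charset.forName("GBK");
            System.out.println("GBK:        " + byteCount(han, gbk));                   // 2
            System.out.println("GBK bytes:   " + Arrays.toString(han.getBytes(gbk)));
        }
    }
}
```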

Copying an image from drive A to drive B just means moving all of its bytes from A to B. A text file can be copied the same way, provided the text in A and B uses the same encoding; an image has no such byte-encoding issue. But what if we want to transfer Chinese text over a network or from a server? Bytes alone are not enough there, because we cannot fix every problem by manually changing a file's encoding. So Java provides character streams: a character stream is a byte stream with an encoding setting added on top, and that is how the garbling problem is solved.
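The idea that a character stream is a byte stream plus an encoding can be sketched in memory, without any files. In this illustrative example (encodeViaWriter is a made-up name), an OutputStreamWriter wraps a ByteArrayOutputStream and produces exactly the bytes that String.getBytes gives for the same charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharStreamLayer {
    // Writes text through a character stream layered over a byte stream and returns the bytes produced.
    static byte[] encodeViaWriter(String text, Charset cs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(); // the underlying byte stream
        try (Writer w = new OutputStreamWriter(bytes, cs)) {       // the character layer supplies the charset
            w.write(text);
        } // closing the writer flushes any buffered characters into 'bytes'
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u4E2D\u6587"; // two Chinese characters (U+4E2D U+6587)
        byte[] utf8 = encodeViaWriter(text, StandardCharsets.UTF_8);
        System.out.println(utf8.length);                                                 // 6
        System.out.println(Arrays.equals(utf8, text.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```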

A few small examples illustrate this:

1. First create aa.txt and bb.txt in the current project directory and write a few Chinese characters into aa.txt. You will find that both of the following approaches work.

a. Using a character stream

import java.io.*;

// Copy aa.txt to bb.txt one character at a time (uses the platform default charset)
try (FileReader fr = new FileReader("aa.txt");
     FileWriter fw = new FileWriter("bb.txt")) {
    int c;
    while ((c = fr.read()) != -1) {
        fw.write(c);
    }
}

b. Using a byte stream

import java.io.*;

// Copy aa.txt to bb.txt one byte at a time (no charset involved)
try (FileInputStream fis = new FileInputStream("aa.txt");
     FileOutputStream fos = new FileOutputStream("bb.txt")) {
    int b;
    while ((b = fis.read()) != -1) {
        fos.write(b);
    }
}
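Whether approach (a) reproduces aa.txt exactly depends on the platform default charset, which FileReader and FileWriter use when no charset is given; approach (b) never decodes anything and copies the bytes unchanged. The sketch below passes the charset explicitly, assuming Java 11 or later for the FileReader(File, Charset) and FileWriter(File, Charset) constructors. The copy helper and the temporary files standing in for aa.txt and bb.txt are illustrative.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

class CharCopy {
    // Copies a text file character by character, using one explicit charset for both
    // decoding and encoding. The (File, Charset) constructors need Java 11 or later.
    static void copy(File src, File dst, Charset cs) throws IOException {
        try (Reader in = new FileReader(src, cs);
             Writer out = new FileWriter(dst, cs)) {
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The charset FileReader/FileWriter fall back to when none is given.
        System.out.println("Default charset: " + Charset.defaultCharset());
        File aa = File.createTempFile("aa_", ".txt"); // temporary stand-ins for aa.txt and bb.txt
        File bb = File.createTempFile("bb_", ".txt");
        aa.deleteOnExit();
        bb.deleteOnExit();
        byte[] original = "\u4E2D\u6587".getBytes(StandardCharsets.UTF_8);
        Files.write(aa.toPath(), original);
        copy(aa, bb, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(original, Files.readAllBytes(bb.toPath()))); // true
    }
}
```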

2. Now suppose aa.txt is saved in UTF-8 and we switch bb.txt to GBK. Run the same two approaches again: both produce garbled text.

The reason is that the two files use different encodings, so the Chinese characters are looked up in different code tables, and the result is garbled.
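A minimal in-memory simulation of step 2, using the hypothetical helper reread: the text is encoded as UTF-8 (how aa.txt was saved) and decoded as GBK (how bb.txt is viewed). The exact garbled characters are not shown, since they depend on how the GBK decoder handles these particular bytes; what matters is that the result no longer equals the original. GBK availability is checked first because it is not a guaranteed charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GarbleDemo {
    // Encodes text with one charset and decodes the resulting bytes with another,
    // which is what happens when a UTF-8 file is viewed as GBK.
    static String reread(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // two Chinese characters (U+4E2D U+6587)
        System.out.println(reread(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(text)); // true
        if (Charset.isSupported("GBK")) {
            // The 6 UTF-8 bytes are looked up in the GBK table instead, so different characters come out.
            String garbled = reread(text, StandardCharsets.UTF_8, Charset.forName("GBK"));
            System.out.println(garbled.equals(text)); // false
        }
    }
}
```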

3. So when the two files use different encodings, we can specify each file's own encoding when reading and writing.

Here's how it's implemented:

import java.io.*;

// Decode aa.txt as UTF-8, then re-encode the characters as GBK into bb.txt
try (InputStreamReader isr = new InputStreamReader(new FileInputStream("aa.txt"), "UTF-8");
     OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream("bb.txt"), "GBK")) {
    char[] arr = new char[1024];
    int len;
    while ((len = isr.read(arr)) != -1) {
        String s = new String(arr, 0, len);
        System.out.println(s);
        osw.write(s);
    }
}

The code is short, but it deserves a brief explanation: according to the API, InputStreamReader and OutputStreamWriter are character-stream classes that extend Reader and Writer respectively.

Their main job is converting bytes into characters and characters into bytes, which is why their constructors take a byte stream object. Here the reader decodes the bytes of aa.txt into characters using UTF-8, and the writer encodes those characters into bytes using GBK before writing them to bb.txt.
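The same decode-then-encode pipeline can be tried in memory, which makes the byte counts visible. This sketch (the transcode helper is illustrative, not a library method) mirrors the code above but swaps the files for byte-array streams: 中文 goes in as 6 UTF-8 bytes and comes out as 4 GBK bytes, assuming the JVM ships the GBK charset.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcode {
    // Decodes 'input' with 'from', then re-encodes the characters with 'to'.
    static byte[] transcode(byte[] input, Charset from, Charset to) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer w = new OutputStreamWriter(out, to)) {
            char[] buf = new char[1024];
            int len;
            while ((len = r.read(buf)) != -1) {
                w.write(buf, 0, len); // write only the characters actually read
            }
        } // the writer is closed (and flushed) before the bytes are collected
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u4E2D\u6587".getBytes(StandardCharsets.UTF_8); // 6 bytes
        if (Charset.isSupported("GBK")) {
            byte[] gbk = transcode(utf8, StandardCharsets.UTF_8, Charset.forName("GBK"));
            System.out.println(utf8.length + " UTF-8 bytes -> " + gbk.length + " GBK bytes"); // 6 UTF-8 bytes -> 4 GBK bytes
        }
    }
}
```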

The remaining lines use only basic methods and need no explanation. Note how the byte-stream objects are created inline as anonymous objects and passed straight into the reader and writer constructors: this is the decorator design pattern, which is worth a quick look on its own.

Byte streams and character streams also come with many other useful classes, such as BufferedInputStream and BufferedReader, which I will not go into here.
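As an illustration of the decorator layering and of BufferedReader, here is a sketch (class and method names are hypothetical) that stacks three streams: FileInputStream supplies bytes, InputStreamReader decodes them as UTF-8, and BufferedReader adds buffering plus readLine().

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

class LayeredRead {
    // Each wrapper decorates the one inside it: bytes -> characters (UTF-8) -> buffered lines.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("lines", ".txt"); // temporary sample file
        f.deleteOnExit();
        Files.write(f.toPath(), "\u4E2D\u6587\nabc\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(f).size()); // 2
    }
}
```

Since Java 7, Files.newBufferedReader(path, charset) builds an equivalent buffered, charset-aware reader in one call.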

Encoding issues involving the JVM and the operating system platform are not covered here.

You can experiment with String yourself to observe how a string's bytes and its encoding behave at compile time and at run time.
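One such experiment, sketched with illustrative helper names: garble decodes UTF-8 bytes as ISO-8859-1, and repair reverses it. The repair works only because ISO-8859-1 maps every byte value to exactly one char and back; after a wrong GBK or UTF-8 decode, invalid byte sequences may already have been replaced, and the damage cannot be undone this way. The escaped literals keep the example independent of the encoding javac uses to read the source file, which is the compile-time side of the problem.

```java
import java.nio.charset.StandardCharsets;

class StringBytesDemo {
    // Simulates garbling: UTF-8 bytes wrongly decoded as ISO-8859-1.
    static String garble(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps each byte to exactly one char and back, so re-encoding with it
    // recovers the original bytes, which can then be decoded correctly as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587"; // two Chinese characters (U+4E2D U+6587)
        String garbled = garble(original);
        System.out.println(original.length() + " chars -> " + garbled.length() + " chars"); // 2 chars -> 6 chars
        System.out.println(repair(garbled).equals(original));                              // true
    }
}
```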
