[Java learning notes] Character Encoding


Author: gnuhpc
Source: http://www.cnblogs.com/gnuhpc/

1. ASCII code
In the 1960s, the United States developed a character encoding that defines the mapping between English characters and binary values. This is ASCII, which is still in use today.
ASCII defines 128 characters in total. For example, the space is 32 (binary 00100000) and the uppercase letter A is 65 (binary 01000001). These 128 symbols (including 33 non-printing control characters: codes 0 to 31 and 127) occupy only the low seven bits of a byte; the highest bit is always 0.
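The two values quoted above can be checked directly in Java. The following is only a small sketch; the class name AsciiSketch and the helper toByteBits are made up for illustration and are not part of any library.

```java
final class AsciiSketch {
    /** Returns the 8-bit binary form of an ASCII character, e.g. 'A' -> "01000001". */
    static String toByteBits(char c) {
        if (c > 127) {
            throw new IllegalArgumentException("not an ASCII character: " + (int) c);
        }
        // toBinaryString drops leading zeros, so pad back to a full byte.
        String bits = Integer.toBinaryString(c);
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println((int) ' ' + " " + toByteBits(' ')); // 32 00100000
        System.out.println((int) 'A' + " " + toByteBits('A')); // 65 01000001
    }
}
```

The leading bit is 0 in both results, matching the rule that ASCII uses only the low seven bits.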
2. Unicode
If a single encoding included every symbol in the world and gave each one a unique code, the garbled-text problem would disappear. This is Unicode. As its name suggests, it is an encoding for all symbols.
Unicode is, of course, a very large set. Its code space has room for more than 1 million symbols (1,114,112 code points, from U+0000 to U+10FFFF), and each symbol has its own code. For example, U+0639 is the Arabic letter ain, U+0041 is the English capital letter A, and U+4E25 is the Chinese character 严 ("strict"). You can look up a specific symbol in the code charts at unicode.org or in a dedicated Chinese character table.
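As a quick check of the code points mentioned above, Java can report the code point of a character directly. This is a minimal sketch; CodePointSketch and its methods are hypothetical names chosen for the example. The characters are written as \u escapes so that the source file's own encoding does not matter.

```java
final class CodePointSketch {
    /** Formats a code point in the usual U+XXXX notation. */
    static String describe(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    /** Number of code points in the Unicode code space (U+0000 to U+10FFFF). */
    static int codeSpaceSize() {
        return Character.MAX_CODE_POINT + 1;
    }

    public static void main(String[] args) {
        System.out.println(describe("\u0639".codePointAt(0))); // U+0639 (Arabic letter ain)
        System.out.println(describe("A".codePointAt(0)));      // U+0041
        System.out.println(describe("\u4E25".codePointAt(0))); // U+4E25
        System.out.println(codeSpaceSize());                   // 1114112
    }
}
```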
3. UTF-8

As the Internet spread, a unified encoding became a strong requirement. UTF-8 is the most widely used Unicode implementation on the Internet. Other implementations include UTF-16 and UTF-32, but they are rarely used on the Internet. To repeat, the relationship is that UTF-8 is one of the ways to implement Unicode.
The most important feature of UTF-8 is that it is a variable-length encoding. It uses 1 to 4 bytes to represent a symbol, and the number of bytes depends on the symbol.
The UTF-8 encoding rules are simple. There are only two:
1) For a single-byte symbol, the first bit of the byte is set to 0 and the remaining seven bits hold the Unicode code of the symbol. For English letters, UTF-8 is therefore identical to ASCII.
2) For an n-byte symbol (n > 1), the first n bits of the first byte are set to 1 and bit n+1 is set to 0; the first two bits of every following byte are set to 10. All remaining bits are filled with the Unicode code of the symbol.
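The two rules above can be sketched as a small hand-written encoder. This is only an illustration under stated assumptions: the class Utf8RulesSketch and its methods encode and hex are hypothetical names, and in real code String.getBytes(StandardCharsets.UTF_8) already does this work.

```java
final class Utf8RulesSketch {
    /** Encodes one Unicode code point to UTF-8 by applying the two rules by hand. */
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not an encodable code point: " + cp);
        }
        if (cp < 0x80) {            // rule 1: 0xxxxxxx
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {           // rule 2, n = 2: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        if (cp < 0x10000) {         // rule 2, n = 3: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        return new byte[] {         // rule 2, n = 4: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))};
    }

    /** Renders bytes as space-separated upper-case hex, e.g. "E4 B8 A5". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(0x0041))); // 41
        System.out.println(hex(encode(0x0639))); // D8 B9
        System.out.println(hex(encode(0x4E25))); // E4 B8 A5
    }
}
```

For U+4E25 the 16 bits 0100 1110 0010 0101 are split as 0100, 111000, 100101 and placed into the pattern 1110xxxx 10xxxxxx 10xxxxxx, which gives E4 B8 A5.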

4. Application:

In Java, if we read and write files with java.io.FileReader or java.io.FileWriter, we find that these classes only let us query the encoding (via getEncoding()), not set it. (This applies before Java 11, which added constructors that accept a Charset.) The encoding they use therefore depends on the platform default, so garbled characters appear easily when reading and writing files in other encodings. The solution is to use java.io.FileInputStream with java.io.InputStreamReader, and java.io.FileOutputStream with java.io.OutputStreamWriter. InputStreamReader and OutputStreamWriter accept an encoding, so UTF-8 files can be read and written correctly. Wrapping them in java.io.BufferedReader and java.io.BufferedWriter improves efficiency.
For example:
java.io.BufferedWriter writer = null;
java.io.FileOutputStream writerStream = new java.io.FileOutputStream(fileName);
// Wrap the byte stream in a writer that encodes characters as UTF-8
writer = new java.io.BufferedWriter(new java.io.OutputStreamWriter(writerStream, "UTF-8"));
// Do something
// Write to the file
writer.close();
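The reading side works the same way with InputStreamReader. The following is a minimal round-trip sketch, assuming Java 7 or later for try-with-resources and StandardCharsets; the class Utf8FileSketch and its methods are hypothetical names used only for this example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

final class Utf8FileSketch {
    /** Writes text to the file, encoding it as UTF-8 regardless of the platform default. */
    static void writeUtf8(File file, String text) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            writer.write(text);
        }
    }

    /** Reads the whole file, decoding its bytes as UTF-8. */
    static String readUtf8(File file) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                content.append(buffer, 0, n);
            }
        }
        return content.toString();
    }
}
```

Passing a Charset constant instead of the string "UTF-8" avoids the checked UnsupportedEncodingException, and try-with-resources closes the streams even if an exception is thrown.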

The same approach can convert a file from one encoding to another.

For example, to convert a file from GBK to UTF-8 in Java:

import java.io.*;

private static void transferFile(String srcFileName, String destFileName) throws IOException {
    String lineSeparator = System.getProperty("line.separator");
    FileInputStream fis = new FileInputStream(srcFileName);
    StringBuilder content = new StringBuilder();
    DataInputStream in = new DataInputStream(fis);
    // Decode the source bytes as GBK
    BufferedReader d = new BufferedReader(new InputStreamReader(in, "GBK"));
    String line = null;
    // Read line by line; every line ends with the platform line separator afterwards
    while ((line = d.readLine()) != null)
        content.append(line).append(lineSeparator);
    d.close();
    in.close();
    fis.close();
    // Encode the collected text as UTF-8 when writing it out
    Writer ow = new OutputStreamWriter(new FileOutputStream(destFileName), "UTF-8");
    ow.write(content.toString());
    ow.close();
}
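The conversion above holds the whole file in memory and rewrites line endings. As an alternative sketch (not the author's method), the java.nio.file API can stream the characters from one charset to another. It assumes Java 7 or later; TranscodeSketch and transcode are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class TranscodeSketch {
    /** Copies src to dest, decoding with one charset and encoding with another. */
    static void transcode(Path src, Charset from, Path dest, Charset to) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(src, from);
             BufferedWriter writer = Files.newBufferedWriter(dest, to)) {
            // Copy in chunks of characters so the file never has to fit in memory
            // and the original line endings are preserved.
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }
}
```

For the GBK case this could be called as transcode(src, Charset.forName("GBK"), dest, StandardCharsets.UTF_8), assuming the running JDK provides the GBK charset. Unlike InputStreamReader, the reader returned by Files.newBufferedReader throws an exception on malformed input instead of silently substituting replacement characters.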

 
