[Java learning notes] Coding Learning

Last Update: 2018-12-06 Source: Internet

Author: User


Author: gnuhpc
Source: http://www.cnblogs.com/gnuhpc/

1. ASCII code
In the 1960s, the United States developed a character encoding that defines the mapping between English characters and binary bits. It is called ASCII and is still in use today.
ASCII defines 128 characters in total. For example, the space is 32 (binary 00100000) and the uppercase letter A is 65 (binary 01000001). These 128 symbols (including 33 non-printing control characters, codes 0 to 31 and 127) occupy only the low seven bits of a byte; the highest bit is always 0.
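
As a quick check of the two values quoted above, here is a minimal Java sketch that prints a character's decimal code and its 8-bit binary form. The class name AsciiDemo and the helper toBinary8 are illustrative names, not part of the original notes.

```java
public class AsciiDemo {
    // Hypothetical helper: format the low 8 bits of a character code as binary.
    public static String toBinary8(char c) {
        String bits = Integer.toBinaryString(c & 0xFF);
        // Left-pad with zeros so the result is always one full byte wide.
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println((int) ' ' + " " + toBinary8(' ')); // 32 00100000
        System.out.println((int) 'A' + " " + toBinary8('A')); // 65 01000001
    }
}
```

In both results the leading bit is 0, which matches the seven-bit layout described above.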
2. Unicode
Imagine a single encoding that includes every symbol in the world and gives each symbol a unique code. Garbled text would then disappear. That is Unicode: as the name suggests, it is an encoding of all symbols.
Unicode is, of course, a very large set; its code space currently has room for more than 1 million symbols. Each symbol has its own code. For example, U+0639 is the Arabic letter ain, U+0041 is the English capital letter A, and U+4E25 is the Chinese character 严 ("strict"). You can look up specific symbols in the code charts at unicode.org or in a dedicated Chinese character table.
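
The three code points mentioned above can be inspected from Java with String.codePointAt. This is only a small sketch; the class name CodePointDemo and the helper describe are made up for illustration, and the characters are written as Unicode escapes so the source file's own encoding does not matter.

```java
public class CodePointDemo {
    // Hypothetical helper: format the first code point of a string as U+XXXX.
    public static String describe(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(describe("A"));      // U+0041
        System.out.println(describe("\u0639")); // U+0639 (Arabic letter ain)
        System.out.println(describe("\u4E25")); // U+4E25 (Chinese character)
    }
}
```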
3. UTF-8

As the Internet became popular, a unified encoding was urgently needed. UTF-8 is the most widely used implementation of Unicode on the Internet. Other implementations include UTF-16 and UTF-32, but they are rarely used on the Internet. To repeat: UTF-8 is one of the ways to implement Unicode.
The main feature of UTF-8 is that it is a variable-length encoding. It uses 1 to 4 bytes to represent a symbol, and the number of bytes depends on the symbol.
The UTF-8 encoding rules are simple; there are only two:
1) For a single-byte symbol, the first bit of the byte is set to 0 and the remaining seven bits hold the Unicode code of the symbol. For English letters, therefore, the UTF-8 encoding is identical to the ASCII code.
2) For an n-byte symbol (n > 1), the first n bits of the first byte are set to 1 and bit n + 1 is set to 0; the first two bits of each following byte are set to 10. All remaining bits are filled with the Unicode code of the symbol.
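
The two rules above can be sketched by hand and compared with the JDK's own encoder. This is an illustrative sketch only: the class name Utf8Sketch and the method encode are hypothetical, and the method assumes a valid code point (it does not reject surrogates or values above U+10FFFF).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Sketch {
    // Encode one Unicode code point by hand, following the two rules above.
    public static byte[] encode(int cp) {
        if (cp < 0x80) {            // rule 1: 0xxxxxxx
            return new byte[] { (byte) cp };
        } else if (cp < 0x800) {    // rule 2, n = 2: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else if (cp < 0x10000) {  // rule 2, n = 3: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else {                    // rule 2, n = 4: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
    }

    public static void main(String[] args) {
        int cp = 0x4E25; // the Chinese character used as an example earlier
        byte[] manual = encode(cp);
        byte[] library = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(manual, library)); // true
    }
}
```

For U+4E25 the hand-built result is the three bytes E4 B8 A5, while U+0041 stays a single byte 41, the same as its ASCII code.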

4. Application:

In Java, if we read and write files with java.io.FileReader or java.io.FileWriter, we find that (in Java versions before 11) these classes let us query the encoding with getEncoding() but give us no way to set it. They use the platform default encoding, which depends on lower-level system settings, so garbled characters appear easily when reading and writing files in several languages or encodings. The solution is to use java.io.FileInputStream with java.io.InputStreamReader, and java.io.FileOutputStream with java.io.OutputStreamWriter. InputStreamReader and OutputStreamWriter accept an encoding name, so UTF-8 files can be read and written explicitly. For efficiency, wrap them in java.io.BufferedReader and java.io.BufferedWriter.
For example:
java.io.BufferedWriter writer = null;
java.io.FileOutputStream writerStream = new java.io.FileOutputStream(fileName);
writer = new java.io.BufferedWriter(new java.io.OutputStreamWriter(writerStream, "UTF-8"));
// Do something
// Write to the file
writer.close();
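
The same idea can be written with try-with-resources, so the streams are closed even if an exception is thrown, and with StandardCharsets.UTF_8 in place of a charset name string. The following is a minimal sketch rather than the author's code; the class name Utf8FileDemo and its two helper methods are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class Utf8FileDemo {
    // Write text to a file, encoding it explicitly as UTF-8.
    public static void write(File file, String text) throws IOException {
        try (BufferedWriter w = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            w.write(text);
        }
    }

    // Read the first line of a file, decoding it explicitly as UTF-8.
    public static String readFirstLine(File file) throws IOException {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            return r.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("utf8-demo", ".txt");
        tmp.deleteOnExit();
        String text = "A\u0639\u4E25"; // Latin, Arabic, and Chinese characters
        write(tmp, text);
        System.out.println(text.equals(readFirstLine(tmp))); // true
    }
}
```

Because both sides name the encoding, the round trip does not depend on the platform default.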

The same approach also works for converting a file from one encoding to another.

For example: use Java to convert a file's encoding from GBK to UTF-8

// Requires: import java.io.*;
private static void transferFile(String srcFileName, String destFileName) throws IOException {
    String lineSeparator = System.getProperty("line.separator");
    FileInputStream fis = new FileInputStream(srcFileName);
    StringBuffer content = new StringBuffer();
    DataInputStream in = new DataInputStream(fis);
    // Decode the source bytes as GBK
    BufferedReader d = new BufferedReader(new InputStreamReader(in, "GBK"));
    String line = null;
    // Read the whole file into memory, line by line
    while ((line = d.readLine()) != null)
        content.append(line + lineSeparator);
    d.close();
    in.close();
    fis.close();
    // Encode the collected text as UTF-8 when writing it out
    Writer ow = new OutputStreamWriter(new FileOutputStream(destFileName), "UTF-8");
    ow.write(content.toString());
    ow.close();
}
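
The conversion method above holds the whole file in memory and rewrites every line ending with the platform separator. A possible streaming variant, sketched here under the assumption that Java 7 or later is available, copies characters in fixed-size chunks with java.nio.file.Files. The class name TranscodeSketch and the method transcode are hypothetical. Note also that Files.newBufferedReader throws an exception on malformed input instead of silently substituting characters.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TranscodeSketch {
    // Copy src to dest, decoding with `from` and encoding with `to`.
    public static void transcode(Path src, Charset from, Path dest, Charset to) throws IOException {
        try (Reader in = Files.newBufferedReader(src, from);
             Writer out = Files.newBufferedWriter(dest, to)) {
            char[] buf = new char[8192];
            int n;
            // Characters are copied as-is, so line endings are preserved.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: TranscodeSketch <gbk-source> <utf8-dest>");
            return;
        }
        // Assumes the running JDK provides the GBK charset.
        transcode(Paths.get(args[0]), Charset.forName("GBK"),
                  Paths.get(args[1]), StandardCharsets.UTF_8);
    }
}
```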


