Java Basics ----> Character Encoding Problems in Java (I)

Source: Internet
Author: User

This post summarizes character encoding in Java, since encoding problems come up often in real projects.

Encoding format of the file I. Binary byte problem with Chinese
 Public StaticString Charset_text = "I love ll";//16 binary representation of printed bytesPrivate voidPrintbinarys (byte[] buffer) {     for(byteB:buffer) {System.out.print (integer.tohexstring (b& 0xff) + "");    } System.out.println ();} @Test Public voidCharset_1 ()throwsException {/*** Utf-8 is a Chinese three byte, GBK is a Chinese two byte*/Printbinarys (Charset_text.getbytes ());//e6 E7 B1 4c 4cPrintbinarys (Charset_text.getbytes ("Utf-8"));//e6 E7 B1 4c 4cPrintbinarys (Charset_text.getbytes ("GBK"));//CE D2 b0 AE 4c 4cPrintbinarys (Charset_text.getbytes ("iso-8859-1"));//3f 3f 4c 4c}
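The byte counts above can be checked with a small standalone program. This is only a sketch: the class name EncodingBytesDemo and the helper toHex are made up for illustration, and it assumes the running JDK ships the GBK charset. Unlike Integer.toHexString in the test above, it pads every byte to two hex digits, and it writes the Chinese text with Unicode escapes so the result does not depend on how the source file itself is saved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytesDemo {

    // Format bytes as lowercase two-digit hex values separated by spaces.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+6211, U+7231) followed by "LL".
        String text = "\u6211\u7231LL";
        // Assumption: GBK is available in this JDK (it is not in StandardCharsets).
        Charset gbk = Charset.forName("GBK");

        System.out.println(toHex(text.getBytes(StandardCharsets.UTF_8)));      // e6 88 91 e7 88 b1 4c 4c
        System.out.println(toHex(text.getBytes(gbk)));                         // ce d2 b0 ae 4c 4c
        // ISO-8859-1 cannot represent Chinese, so each character becomes 3f ('?').
        System.out.println(toHex(text.getBytes(StandardCharsets.ISO_8859_1))); // 3f 3f 4c 4c
    }
}
```

Passing a Charset object instead of a charset name also avoids the checked UnsupportedEncodingException.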

Second, the character encoding and decoding method
Private voidPrintstringbycharset (byte[] buffer, String charsetname)throwsunsupportedencodingexception {System.out.println (NewString (buffer, charsetname));}
@Test Public voidCharset_2 ()throwsException {/*** 1, charset_text.getbytes ("Utf-8"): Return is E6 E7 B1 4c 4c. Description Charset_text utf-8 encoded into E6, E7 B1 4c 4c * 2, new String (buffer, CharsetName): E6, E7, 4c 4c B1, Utf-8 way to decode 。 Get the Charset_text, so there is no garbled. * 3, as for iso-8859-1 garbled:*/Printstringbycharset (Charset_text.getbytes (),"Utf-8");//I love ll .Printstringbycharset (Charset_text.getbytes ("Utf-8"), "Utf-8");//I love ll .Printstringbycharset (Charset_text.getbytes ("GBK"), "Utf-8");// ??? LLPrintstringbycharset (Charset_text.getbytes ("iso-8859-1"), "iso-8859-1");// ?? LLPrintstringbycharset (Charset_text.getbytes ("iso-8859-1"), "Utf-8");// ?? LL}
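new String(bytes, charset) never fails; it silently substitutes replacement characters, which is why the garbled output above only shows up on screen. One possible way to detect such problems in code is sketched below with the standard java.nio.charset encoder and decoder classes. The class name StrictDecodeDemo and the helpers decodeStrict and canEncode are hypothetical names chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {

    // Decode strictly: throw instead of substituting a replacement character.
    public static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if every character of text can be represented in the charset.
    public static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "\u6211\u7231LL"; // two Chinese characters followed by "LL"

        System.out.println(canEncode(text, StandardCharsets.UTF_8));      // true
        System.out.println(canEncode(text, StandardCharsets.ISO_8859_1)); // false

        // The Chinese characters are replaced by '?' while encoding, so decoding cannot restore them.
        byte[] lossy = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(lossy, StandardCharsets.ISO_8859_1)); // ??LL

        // The GBK bytes of the same text are not a valid UTF-8 sequence.
        byte[] gbkBytes = {(byte) 0xce, (byte) 0xd2, (byte) 0xb0, (byte) 0xae, 0x4c, 0x4c};
        try {
            decodeStrict(gbkBytes, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("not valid UTF-8: " + e);
        }
    }
}
```

Checking canEncode before writing, or decoding strictly when reading, turns silent garbling into an explicit error.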

Third, file encoding and reading the contents of a file

Note that a file is always stored as raw bytes; the encoding only determines how those bytes are interpreted as text.

    public static String FILE_PATH = "C:/users/76801/desktop/charset/huhx.txt";
    public static String FILENAME_PATH = "C:/users/76801/desktop/charset/linux.txt";

    // Read the contents of a file into a byte array and print them decoded with charsetName
    private void printBinaryFromFile(String filePath, String charsetName) {
        File file = new File(filePath);
        // try-with-resources closes the stream even if reading fails
        try (InputStream stream = new FileInputStream(file)) {
            byte[] buffers = new byte[stream.available()];
            stream.read(buffers);
            printStringByCharset(buffers, charsetName);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Test
    public void charset_3() throws Exception {
        // The file is encoded as UTF-8 without BOM
        printBinaryFromFile(FILE_PATH, "UTF-8"); // I Love you,中国。
        printBinaryFromFile(FILE_PATH, "GBK");   // I Love you, followed by garbled characters

        // After switching the editor's encoding to GB2312, the editor shows the same garbled text.
        // Deleting the content and typing it again does not look garbled, but that really changes the content.
        // Merely switching the encoding does not change the actual content.
        printStringByCharset("I Love you,中国。".getBytes("UTF-8"), "GBK"); // the same garbled text

        /**
         * 1. The file is first written with UTF-8 encoding. Its content is: I love you,L玲。
         * 2. After switching its encoding to GBK, the editor displays "I love you,L" followed by garbled characters.
         * 3. The test below still displays the content correctly when it is read as UTF-8. So the bytes of the file
         *    are the UTF-8 ones: 49 20 6c 6f 76 65 20 79 6f 75 2c 4c e7 8e b2 e3 80 82
         * 4. Changing the encoding in Notepad++ does not change the bytes of the file. It only changes how they are
         *    displayed: the same bytes 49 20 6c 6f 76 65 20 79 6f 75 2c 4c e7 8e b2 e3 80 82 are decoded as GBK.
         */
        printBinarys("I love you,L玲。".getBytes("UTF-8"));                // 49 20 6c 6f 76 65 20 79 6f 75 2c 4c e7 8e b2 e3 80 82
        printStringByCharset("I love you,L玲。".getBytes("UTF-8"), "GBK"); // I love you,L followed by garbled characters
        printBinaryFromFile(FILENAME_PATH, "UTF-8");                       // I love you,L玲。
    }
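The point that decoding never changes the stored bytes can be reproduced without any editor. The following is a minimal sketch, not the code used for the tests above: FileCharsetDemo and readAs are hypothetical names, a temporary file stands in for the files on the desktop, and GBK is assumed to be available in the JDK. It reads the whole file with Files.readAllBytes instead of relying on InputStream.available().

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class FileCharsetDemo {

    // Read the raw bytes of a file and decode them with the given charset.
    public static String readAs(Path path, Charset charset) throws IOException {
        return new String(Files.readAllBytes(path), charset);
    }

    public static void main(String[] args) throws IOException {
        // "I love you,L" followed by U+73B2 and the ideographic full stop U+3002.
        String text = "I love you,L\u73b2\u3002";
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);

        // Hypothetical temporary file used only for this demonstration.
        Path file = Files.createTempFile("charset-demo", ".txt");
        try {
            Files.write(file, utf8Bytes);

            String asUtf8 = readAs(file, StandardCharsets.UTF_8);
            String asGbk = readAs(file, Charset.forName("GBK")); // assumes GBK is available

            System.out.println(asUtf8.equals(text)); // true
            System.out.println(asGbk.equals(text));  // false: same bytes, wrong decoder

            // Decoding with the wrong charset did not touch the file itself.
            System.out.println(Arrays.equals(Files.readAllBytes(file), utf8Bytes)); // true
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```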

[Animated demonstration of the steps above; image not included]

Fourth, the encoding settings in Eclipse affect Java's default encoding

Here the project encoding is UTF-8, the FirstJava.java file is encoded as ISO-8859-1, and the SecondJava.java file is encoded as UTF-8.

Here's the code for the test:

    // FirstJava.java (file encoding: ISO-8859-1)
    import java.nio.charset.Charset;

    public class FirstJava {
        public static void main(String[] args) {
            String charsetName = Charset.defaultCharset().displayName(); // ISO-8859-1
            System.out.println(charsetName);
        }
    }

    // SecondJava.java (file encoding: UTF-8)
    import java.nio.charset.Charset;

    public class SecondJava {
        public static void main(String[] args) {
            String charsetName = Charset.defaultCharset().displayName(); // UTF-8
            System.out.println(charsetName);
        }
    }

In FirstJava, if the file contains Chinese characters, Eclipse reports an error when the file is saved, because ISO-8859-1 cannot represent them.
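Because the default charset depends on how the JVM was started, code that must behave the same everywhere can name the charset explicitly. The sketch below prints the defaults next to an explicit encoding. DefaultCharsetDemo, describeDefaults and encodeUtf8 are illustrative names, and the idea that the IDE influences the default through the file.encoding system property at launch is an assumption here, not something verified in this post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Report the JVM default charset and the file.encoding system property.
    public static String describeDefaults() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    // Encode with an explicit charset so the result does not depend on the defaults.
    public static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The output varies with the environment and launch settings.
        System.out.println(describeDefaults());

        // One Chinese character (U+6211) is always three bytes in UTF-8.
        System.out.println(encodeUtf8("\u6211").length); // 3
    }
}
```

Running the same class from different launch configurations shows whether the first line changes while the second stays at 3.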
