A summary of handling character encoding problems in Java

Source: Internet
Author: User

When you are handed a stream of bytes without being told its encoding, you cannot know what it actually means.
Keep this in mind whenever you face a "characters to bytes" or "bytes to characters" conversion; otherwise garbled text may follow.
In fact, garbling comes down to one thing: the encoding step and the decoding step did not use the same charset. Once you understand this, garbled text is easy to fix.
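
The point about mismatched encoding and decoding can be illustrated with a minimal sketch. The class and method names (EncodingMismatchDemo, transcode) are made up for illustration, and the text is written with \u escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Encode text with one charset, then decode the bytes with another
    // (possibly different) charset. Hypothetical helper for illustration.
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d"; // two Chinese characters, U+4F60 U+597D

        // Same charset on both sides: the text survives the round trip.
        String ok = transcode(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Different charsets: each of the 6 UTF-8 bytes is read as one Latin-1 character.
        String garbled = transcode(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(ok.equals(text));      // true
        System.out.println(garbled.equals(text)); // false
        System.out.println(garbled.length());     // 6
    }
}
```

Nothing is lost in the bytes themselves; only the interpretation differs, which is why decoding again with the right charset repairs the text.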
This comes up in the following common places in Java:
1. The String class has a constructor that takes a byte[], String(byte[] bytes), and also provides two overloads for specifying the encoding:
(1) String(byte[] bytes, Charset charset)
(2) String(byte[] bytes, String charsetName)

2. The getBytes method of the String class, byte[] getBytes(), also has the following two overloads:
(1) byte[] getBytes(Charset charset)
(2) byte[] getBytes(String charsetName)
Every variant that does not take an encoding uses the platform's default charset, which you can obtain with System.getProperty("file.encoding") or Charset.defaultCharset().
3. PrintStream's print(String s) is also affected by this problem. For this reason the PrintStream constructors include, in addition to PrintStream(File file), PrintStream(File file, String csn), where csn names the charset to use.
Otherwise the string's characters are converted into bytes according to the platform's default character encoding.
The DataOutputStream constructor has no way to specify the encoding, but the class provides writeUTF(String str), which always writes modified UTF-8.
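
The three cases above can be sketched as follows. This is an illustration rather than a recommended utility: the class and helper names are invented, the streams are in-memory, and the PrintStream(OutputStream, boolean, Charset) constructor used here assumes Java 10 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {

    // String -> bytes -> String with the charset named on both sides.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    // Bytes written by a PrintStream that was given an explicit charset.
    static byte[] printWith(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charset)) {
            out.print(text);
        }
        return sink.toByteArray();
    }

    // Bytes written by writeUTF: a 2-byte length prefix, then modified UTF-8.
    static byte[] writeUtf(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "hi\u4f60"; // "hi" followed by the Chinese character U+4F60

        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
        System.out.println(printWith(text, StandardCharsets.UTF_8).length);       // 5
        System.out.println(printWith(text, StandardCharsets.UTF_16BE).length);    // 6
        System.out.println(writeUtf(text).length);                                // 7
    }
}
```

The same three characters take 5, 6, or 7 bytes depending on the API and charset, so a reader of those bytes has to be told which one was used.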

The first example illustrates why the encoding must be specified:
Suppose a page declares its encoding as UTF-8 with <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>, and a form on the page is submitted to a servlet.
Then the user entered the character of the word stream is according to the specified encoding encoding, for example, you entered the "Hello hi", if it is utf-8, then passed over is the following:

 
 

, we see the back of the Chinese characters each used 3 bytes, this can refer to Utf-8 related knowledge.
But if your page specifies GBK, the bytes are different:

 [104, 101, 108, 108, 111, -60, -29, -70, -61]
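
The byte values for this input can be reproduced with a short program. The sketch below assumes the text entered is "hello" followed by the two Chinese characters U+4F60 and U+597D, and that the running JVM ships the GBK charset (it is not among the charsets every JVM is required to support). The class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FormBytesDemo {

    // Signed byte values of text in the given charset, as a bracketed list.
    static String bytesOf(String text, Charset charset) {
        return Arrays.toString(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String text = "hello\u4f60\u597d";

        // UTF-8: 3 bytes per Chinese character.
        System.out.println(bytesOf(text, StandardCharsets.UTF_8));
        // [104, 101, 108, 108, 111, -28, -67, -96, -27, -91, -67]

        // GBK: 2 bytes per Chinese character. Looked up by name because
        // StandardCharsets has no constant for it.
        System.out.println(bytesOf(text, Charset.forName("GBK")));
        // [104, 101, 108, 108, 111, -60, -29, -70, -61]
    }
}
```

The first five values are identical because both encodings represent ASCII letters the same way; only the non-ASCII characters differ.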

So on the servlet side, when you call request.getParameter, the container in effect performs
String s = new String(bytes, request.getCharacterEncoding()). If no encoding has been set on the request, getCharacterEncoding() returns null and a default encoding is used instead (the platform default such as GBK, or ISO-8859-1 as the Servlet specification prescribes, depending on the container), and the Chinese text becomes garbled.
To avoid garbled text, JSP sites therefore usually install a filter that sets one uniform encoding for all pages and servlets, using request.setCharacterEncoding and response.setCharacterEncoding.
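
The effect of decoding a submitted form value with the wrong encoding can be simulated without a servlet container. The sketch below is a stand-in, not the container's actual code: it uses URLEncoder and URLDecoder from the JDK to play the roles of browser and server, the class and method names are invented, and the Charset overloads it calls assume Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FormDecodingDemo {

    // Roughly what a browser does with a form value on a page declared as pageCharset.
    static String submit(String value, Charset pageCharset) {
        return URLEncoder.encode(value, pageCharset);
    }

    // Roughly what the server does when it decodes the parameter with serverCharset.
    static String receive(String encoded, Charset serverCharset) {
        return URLDecoder.decode(encoded, serverCharset);
    }

    public static void main(String[] args) {
        String value = "hello\u4f60\u597d";
        String wire = submit(value, StandardCharsets.UTF_8);

        System.out.println(wire); // hello%E4%BD%A0%E5%A5%BD

        // Server agrees with the page: the value is recovered.
        System.out.println(receive(wire, StandardCharsets.UTF_8).equals(value));      // true

        // Server falls back to another charset: the value is garbled.
        System.out.println(receive(wire, StandardCharsets.ISO_8859_1).equals(value)); // false
    }
}
```

In a real servlet application the equivalent of choosing the right serverCharset is calling request.setCharacterEncoding before the first parameter is read, which is what the filter approach centralizes.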

Internally, a Java String is a char[], and a char is one 16-bit UTF-16 code unit. So whenever you convert characters or strings into bytes for output to a file or the network, or turn a byte stream read from a file or the network back into meaningful characters, you need to know which encoding is in use.
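
A small sketch can make the "char is a UTF-16 code unit" statement concrete. The class and helper names are illustrative; the two sample strings are one character inside the Basic Multilingual Plane and one outside it, which needs a surrogate pair.

```java
import java.nio.charset.StandardCharsets;

class CodeUnitDemo {

    // Number of 16-bit UTF-16 code units (what String.length() reports).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (characters as Unicode defines them).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes once the string is encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String bmp = "\u4f60";                 // U+4F60, fits in one char
        String supplementary = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair

        System.out.println(codeUnits(bmp) + " " + codePoints(bmp) + " " + utf8Bytes(bmp));
        // 1 1 3
        System.out.println(codeUnits(supplementary) + " " + codePoints(supplementary)
                + " " + utf8Bytes(supplementary));
        // 2 1 4
    }
}
```

The in-memory size in chars and the encoded size in bytes are unrelated quantities, which is why the target encoding always has to be stated when leaving the JVM.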

Some experience
1. The String class always stores its content in Unicode (UTF-16).
2. Be careful with String.getBytes():
Without a charset argument it relies on the JVM's default charset, which is generally a Unicode encoding (typically UTF-8) on Linux and GBK on Chinese-locale Windows. (To change the JVM default charset, start the JVM with the option -Dfile.encoding=UTF-8.)
To be safe, always call it with an argument, for example: String s; s.getBytes("UTF-8").
3. The Charset class is very handy:
(1) Charset.encode encodes: it turns a string into bytes (returned as a ByteBuffer) in the charset you specify.
(2) Charset.decode decodes: it takes bytes (a ByteBuffer) in the charset you specify and turns them back into characters (a CharBuffer, which toString() converts to a String).
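
The dependence of the no-argument getBytes() on the JVM default can be checked with a minimal sketch like the one below. The class and method names are invented for illustration, and the two lines without an expected-output comment print different things on different machines, which is exactly the point.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetDemo {

    // True when getBytes() without arguments yields the same bytes as the explicit charset.
    static boolean defaultMatches(String text, Charset explicit) {
        return Arrays.equals(text.getBytes(), text.getBytes(explicit));
    }

    public static void main(String[] args) {
        String text = "hello\u4f60\u597d";

        // Platform dependent: the charset the JVM picked as its default.
        System.out.println(Charset.defaultCharset());

        // Platform dependent: whether the default happens to be UTF-8 compatible for this text.
        System.out.println(defaultMatches(text, StandardCharsets.UTF_8));

        // Always true: getBytes() is defined in terms of the default charset.
        System.out.println(defaultMatches(text, Charset.defaultCharset())); // true

        // Always the same: an explicit charset does not depend on the platform.
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 11
    }
}
```

Passing a StandardCharsets constant rather than a charset name also avoids the checked UnsupportedEncodingException that getBytes(String) declares.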

Examples are as follows:

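  import java.nio.ByteBuffer;
  import java.nio.CharBuffer;
  import java.nio.charset.Charset;
  import java.nio.charset.StandardCharsets;

  public class CharsetExample {
    public static void main(String[] args) {
      // Name of the platform default charset
      String s = Charset.defaultCharset().displayName();
      System.out.println(s);

      String s1 = "I like you, My Love";

      // Wrap the UTF-8 bytes of s1 and print them in hex
      ByteBuffer bb1 = ByteBuffer.wrap(s1.getBytes(StandardCharsets.UTF_8));
      for (byte bt : bb1.array()) {
        System.out.printf("%02x ", bt);
      }
      System.out.println();

      // char[] usage
      char[] chArray = {'I', 'L', 'o', 'V', 'e', '你'};

      // CharBuffer usage. wrap() already leaves the position at 0 and the limit
      // at the array length, so flip() is not needed (it would empty the buffer).
      CharBuffer cb = CharBuffer.wrap(chArray);

      String s2 = new String(chArray);

      // ByteBuffer usage: encode a CharBuffer with the specified charset
      ByteBuffer bb2 = Charset.forName("UTF-8").encode(cb);

      // Encode a String with the specified charset
      ByteBuffer bb3 = Charset.forName("UTF-8").encode(s1);

      // Copy only the encoded bytes; array() may include unused trailing capacity
      byte[] b = new byte[bb3.remaining()];
      bb3.get(b);

      // Use Charset to decode bytes in the specified charset back to a String
      ByteBuffer bb4 = ByteBuffer.wrap(b);
      String s3 = Charset.forName("UTF-8").decode(bb4).toString();
      System.out.println(s3.equals(s1)); // true
    }
  }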
