Garbled-character problems in Java and Java Web (with Tomcat and Eclipse) encountered while learning, and a summary of solutions (to be extended)

Source: Internet
Author: User
Tags: wrapper, java, web

Java is a cross-platform language, and it represents characters using the Unicode character set.

Even so, character encoding problems keep turning up during development wherever data is processed. When handling Chinese characters in particular, one careless step can produce a pile of strange symbols, commonly known as garbled text (mojibake).

Garbled characters appear because the wrong character encoding scheme was used somewhere in the encoding or decoding process.

First, here is my understanding of the two concepts of encoding and decoding (if I have made mistakes, I hope more experienced readers will point them out; questions and discussion are also welcome, thank you!):

First, take a string:

String str = "Hello, encode";

Calling the String method getBytes(String charsetName) converts str into its byte-array form using the named character encoding scheme. This is the encoding process:

byte[] b_str = str.getBytes("UTF-8");

Calling the overloaded String constructor String(byte[] bytes, String charsetName) builds a string from a byte array using the named character encoding scheme. This is the decoding process:

String str_b = new String(b_str, "UTF-8");

Similarly, the classes and methods that convert between byte streams and character streams can be understood the same way:

characters → bytes: encoding

bytes → characters: decoding
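As a minimal sketch of the two directions, the following self-contained program encodes a string to bytes and decodes it back. The class and method names (EncodeDecodeDemo, encode, decode) are made up for illustration, and it uses the StandardCharsets.UTF_8 constant instead of a charset name string, which avoids the checked UnsupportedEncodingException. The Chinese text is written with Unicode escapes so that the example does not depend on the encoding of the source file itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodeDecodeDemo {
    // Encoding: characters -> bytes, with an explicitly chosen charset.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding: bytes -> characters, with the same charset.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String str = "Hello, encode";
        byte[] bytes = encode(str);
        System.out.println(Arrays.toString(bytes));
        // Decoding with the charset used for encoding restores the string.
        System.out.println(decode(bytes).equals(str));

        // Two Chinese characters (U+4F60 U+597D) take three bytes each in UTF-8.
        String chinese = "\u4f60\u597d";
        System.out.println(encode(chinese).length);
    }
}
```

Decoding only restores the original text when it uses the same scheme as the encoding step; the rest of this article is about what happens when it does not.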

About the encoding of Java source files:

A Java source file may be written in any of various encodings, as long as the encoding scheme is passed to the compiler through the following compilation parameter:

javac -encoding <charset> <source files>

Whether the file is .java or .jsp, wherever Java source code is involved, the compiler needs to be told which character encoding scheme to use when reading the source.

Today most of this work is done by integrated development environments, but it is still worth understanding in case it is ever needed.

About encoding inside Java programs:

Inside a Java program, encoding problems arise in file reading and writing and in the use of the various input and output streams. This section focuses on these runtime encoding problems.

When reading and writing files whose content is stored as characters, wrapper classes are used to wrap the file input and output streams.

On the surface, these character-oriented wrapper classes have nothing to do with encoding. In fact, the character stream they produce still has to convert between characters and bytes whenever it calls the underlying byte stream; it simply uses the platform default character encoding scheme for the encoding and decoding unless told otherwise.

When we have special requirements, we can take control of this process and make the wrapper class use a character encoding scheme that we specify for the character-byte conversion.
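To see which scheme is picked when none is specified, here is a small sketch. The class and method names (DefaultCharsetDemo, implicitCharset, explicitCharset) are hypothetical; the calls themselves (Charset.defaultCharset(), InputStreamReader.getEncoding()) are standard library methods. The reported default depends on the JDK version and the platform, so no particular output is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Charset a reader falls back to when the constructor is given none.
    // The reader wraps an in-memory stream, so there is nothing to close.
    static Charset implicitCharset() {
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(new byte[0]));
        // getEncoding() may return a historical name such as "UTF8";
        // Charset.forName resolves such aliases to the canonical charset.
        return Charset.forName(reader.getEncoding());
    }

    // Charset a reader uses when one is passed in explicitly.
    static Charset explicitCharset() {
        InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), StandardCharsets.UTF_8);
        return Charset.forName(reader.getEncoding());
    }

    public static void main(String[] args) {
        System.out.println("platform default: " + Charset.defaultCharset());
        System.out.println("implicit reader:  " + implicitCharset());
        System.out.println("explicit reader:  " + explicitCharset());
    }
}
```

Because the implicit choice can differ from one machine to another, code that must read or write a known encoding is usually safer naming the charset explicitly.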

Take character streams (PrintWriter, a wrapper class for writing characters, and BufferedReader, a wrapper class for buffered character reading) over underlying byte streams (file I/O streams) as an example:

We use the two intermediate classes InputStreamReader and OutputStreamWriter to specify the character encoding scheme for the character-byte conversion, and pass the resulting Reader or Writer to the constructor of the wrapper class:

try {
    File f = new File("D:\\coding.txt");
    FileOutputStream fos = new FileOutputStream(f);
    FileInputStream fis = new FileInputStream(f);
    // Wrap the byte streams and name the charset explicitly
    PrintWriter pw = new PrintWriter(new OutputStreamWriter(fos, "UTF-8"));
    BufferedReader br = new BufferedReader(new InputStreamReader(fis, "UTF-8"));
} catch (Exception e) {
    e.printStackTrace();
}

With the code above, we get the convenience of the character-oriented wrapper classes while still using the character encoding scheme we specified.

Other wrappers over byte-based streams work the same way, so I will not repeat them here.
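The same wrapping can be sketched as a complete write-then-read round trip. This is only an illustration under a few assumptions: the class and method names (FileCharsetDemo, writeLine, readLine, roundTrip) are hypothetical, a temporary file stands in for D:\coding.txt, and try-with-resources is used so that the streams are flushed and closed.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class FileCharsetDemo {
    // Write one line, encoding characters to bytes as UTF-8.
    static void writeLine(File f, String line) throws IOException {
        try (PrintWriter pw = new PrintWriter(
                new OutputStreamWriter(new FileOutputStream(f), StandardCharsets.UTF_8))) {
            pw.println(line);
        }
    }

    // Read the first line back, decoding bytes to characters as UTF-8.
    static String readLine(File f) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8))) {
            return br.readLine();
        }
    }

    // Write text to a temporary file and read it back with the same charset.
    static String roundTrip(String text) {
        try {
            File f = Files.createTempFile("coding", ".txt").toFile();
            f.deleteOnExit();
            writeLine(f, text);
            return readLine(f);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this example independent of the source file encoding.
        String text = "\u4f60\u597d, encode";
        System.out.println(text.equals(roundTrip(text)));
    }
}
```

Since the writer and the reader name the same charset, the text read back matches what was written regardless of the platform default.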

Note that when bytes are decoded with the wrong character encoding scheme, some of them may be lost.

For example, if an odd number of Chinese characters is encoded as UTF-8, decoded as GBK, re-encoded as GBK, and finally decoded as UTF-8, part of the text comes back garbled:

try {
    String str = "哈哈哈";
    byte[] encode_by_utf8 = str.getBytes("UTF-8");
    String decode_by_gbk = new String(encode_by_utf8, "GBK");
    byte[] encode_by_gbk = decode_by_gbk.getBytes("GBK");
    String decode_by_utf8 = new String(encode_by_gbk, "UTF-8");
    System.out.println(decode_by_utf8); // output: 哈哈??
} catch (Exception e) {
    e.printStackTrace();
}

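The root cause: the two encoding schemes use a different number of bytes for a single Chinese character (three in UTF-8, two in GBK), so bytes are lost in the intermediate step. Three characters give nine UTF-8 bytes, which GBK cannot pair up evenly; the leftover byte at the end is not valid GBK and is discarded during decoding. Most of the text is restored, but the lost boundary byte leaves garbled characters.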
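The byte loss can be made visible by comparing the byte arrays before and after the detour through GBK. The sketch below uses hypothetical names (LossyRoundTripDemo, throughGbk, roundTrip) and assumes the GBK charset is available in the running JDK, as it is in standard full JDK distributions. The character U+54C8 is written as a Unicode escape so the example does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyRoundTripDemo {
    static final Charset GBK = Charset.forName("GBK");

    // UTF-8 bytes wrongly decoded as GBK, then re-encoded as GBK.
    static byte[] throughGbk(byte[] utf8Bytes) {
        String wronglyDecoded = new String(utf8Bytes, GBK);
        return wronglyDecoded.getBytes(GBK);
    }

    // Full detour: UTF-8 encode, GBK decode, GBK encode, UTF-8 decode.
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(throughGbk(utf8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String two = "\u54c8\u54c8";          // 6 UTF-8 bytes: pairs up evenly
        String three = "\u54c8\u54c8\u54c8";  // 9 UTF-8 bytes: one byte left over

        byte[] before = three.getBytes(StandardCharsets.UTF_8);
        byte[] after = throughGbk(before);
        System.out.println("before: " + Arrays.toString(before));
        // The final byte differs: the unpaired byte could not be decoded as GBK,
        // so re-encoding writes a substitute byte in its place.
        System.out.println("after:  " + Arrays.toString(after));

        System.out.println("even count survives: " + roundTrip(two).equals(two));
        System.out.println("odd count survives:  " + roundTrip(three).equals(three));
    }
}
```

That the even-count string survives is a property of this particular character, whose UTF-8 bytes happen to form valid GBK pairs; it should not be relied on as a way of repairing text in general.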

For more detailed analysis see this article: http://blog.csdn.net/beyondlpf/article/details/7519786

(Off to eat; I will come back and write more.)
