Java Chinese character encoding details console output

Last Update:2018-12-05 Source: Internet

Author: User


Many file types default to ISO-8859-1, while the default encoding of a Simplified Chinese operating system is GB18030, so projects created in an Eclipse workspace on such a system are also encoded in GB18030. The encoding we normally want is UTF-8, which gives better international support, for example for plug-ins. If the default encoding is not changed when writing a JSP file, Chinese characters cannot be output correctly and garbled characters are displayed. The Eclipse workspace takes its default encoding from the operating system, so on a Simplified Chinese system (Windows XP, Windows 2000) the workspace encoding is GB18030, and newly created Java files are GB18030 as well.

Java source file --> compiled into bytecode --> Java run configuration --> output to the console

You must ensure that the internal conversion process of each link is correct to avoid garbled characters.
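To make the point concrete, here is a minimal sketch of what happens when one link in the chain decodes bytes with a different encoding than the one used to write them. The class and method names are our own and purely illustrative; the sample text is the word for "Chinese", written as Unicode escapes so that the encoding of the source file itself does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Encode text with one charset, then decode the resulting bytes with another.
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes to keep this file pure ASCII.
        String original = "\u4E2D\u6587";
        Charset gbk = Charset.forName("GBK");

        // Same charset on both sides: the text survives.
        String matched = transcode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // UTF-8 bytes decoded as GBK: the text is garbled.
        String mismatched = transcode(original, StandardCharsets.UTF_8, gbk);

        System.out.println("matched equals original:    " + matched.equals(original));
        System.out.println("mismatched equals original: " + mismatched.equals(original));
    }
}
```

Every step discussed below (saving the file, compiling, running, printing) is an instance of this encode/decode pair, and each pair has to agree.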

Modifying the workspace encoding in Eclipse

1. Window -> Preferences... opens the "Preferences" dialog box. In the navigation tree on the left, go to General -> Workspace. Under "Text file encoding" on the right, select Other and change it to UTF-8. Projects created afterwards will show UTF-8 as the text file encoding in their Properties dialog box.

2. Window -> Preferences... opens the "Preferences" dialog box. In the navigation tree on the left, go to General -> Content Types. In the content types tree on the right, open each sub-item under Text, enter "UTF-8" in the "Default encoding" field, and click Update.

3. Other file types used in Java application development, such as properties and XML files, already have defaults set by Eclipse (ISO-8859-1 and UTF-8 respectively). If development requires a different encoding for them, it can also be changed here.

MyEclipse encoding settings

After installation, my MyEclipse used GB18030 as its default encoding, whereas UTF-8 is generally recommended. If garbled characters appear after a project is imported, the encoding settings are wrong.

Global encoding settings: toolbar --> Window --> Preferences --> General --> Workspace --> Text file encoding, then set the appropriate encoding.

Local encoding settings: right-click the source file and choose --> General --> Editors --> Text Editors --> Spelling --> Encoding. This sets the encoding for a single file.

We recommend that you use global encoding settings.

4. After the three steps above, new Java files are encoded in UTF-8, and compiling, running, and debugging inside Eclipse all work. However, exporting an RCP application product or a plug-in still fails: either compilation fails (the sources are recompiled during export), or Chinese characters are garbled when the exported plug-in runs. To fix this, add the line javacDefaultEncoding.. = UTF-8 to the build.properties file of the RCP application or plug-in project, so that the export build knows the Java source files are UTF-8. This setting assumes that every Java source file is UTF-8. If that is not the case, refer to the Eclipse help (Plug-in Development Environment Guide > Reference > Feature and Plug-in Build Configuration); even so, keeping all Java source files in UTF-8 is recommended.

If an existing plug-in or RCP application was developed in another encoding, such as GB18030, and you want to move it to UTF-8, first carry out the steps above. Then convert the files from the original encoding to UTF-8 with an encoding conversion tool, for example a batch conversion tool based on iconv. Convert only the Java source files, since other file types may already use a more suitable encoding. Finally, change the text file encoding in the project properties from the original encoding to UTF-8.
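If no iconv-based tool is at hand, the conversion itself is small enough to sketch in Java. The following is only an illustration under our own assumptions: the class name, the method name, and the temporary sample file are hypothetical, and a real batch job would walk the source tree and keep backups. Note also that new String(bytes, charset) silently replaces malformed input, so the source encoding must be known in advance.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SourceFileConverter {
    // Re-encode one file in place: decode with the old charset, write back as UTF-8.
    static void convertToUtf8(Path file, Charset sourceCharset) throws IOException {
        String text = new String(Files.readAllBytes(file), sourceCharset);
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Charset gb18030 = Charset.forName("GB18030");

        // Hypothetical sample standing in for a GB18030-encoded Java source file.
        Path sample = Files.createTempFile("Sample", ".java");
        String source = "// \u4E2D\u6587\u6CE8\u91CA\nclass Sample {}\n";
        Files.write(sample, source.getBytes(gb18030));

        convertToUtf8(sample, gb18030);

        // Reading the file back as UTF-8 should give the original text.
        String reread = new String(Files.readAllBytes(sample), StandardCharsets.UTF_8);
        System.out.println("content preserved: " + reread.equals(source));
        Files.delete(sample);
    }
}
```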

System.out.println uses the PrintStream class to output character data to the console. PrintStream encodes characters with the platform default encoding, which on our Chinese system is GBK. The Unicode characters in memory are therefore transcoded to GBK and handed to the operating system's output service. Because the operating system is a Chinese system, the terminal decodes what it receives as GBK. If the bytes that arrive at this step are not GBK-encoded, the terminal displays garbled characters.

In our case, System.out.println correctly encodes the Unicode characters in memory as GBK and sends them to the Eclipse console. But on the Common tab of the Run Configurations dialog box, the console's character encoding is set to UTF-8. That is the problem: System.out.println has encoded the characters as GBK, the console still reads them as UTF-8, and the output is naturally garbled.

Set the character encoding in the console to GBK to solve the garbled problem.

(Note that the Eclipse console encoding is inherited from the workspace settings. The console encoding list usually has no GBK option and does not accept typed input. We can first enter GBK in the workspace encoding settings, after which GBK appears as an option in the console settings. Once the console is set, change the workspace encoding back to UTF-8.)
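The console mismatch can be reproduced without Eclipse. The sketch below is our own illustration, not part of the original analysis: it sends text through a PrintStream created with an explicit encoding, captures the raw bytes in memory, and then decodes them the way a console with a given setting would. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleEncodingDemo {
    // Print text through a PrintStream that uses the given encoding and return the raw bytes.
    static byte[] printWith(String text, String encoding) throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, encoding);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u4E2D\u6587"; // two Chinese characters, as escapes

        byte[] gbkBytes = printWith(text, "GBK");
        byte[] utf8Bytes = printWith(text, "UTF-8");
        System.out.println("GBK byte count:   " + gbkBytes.length);
        System.out.println("UTF-8 byte count: " + utf8Bytes.length);

        // The "console" decodes with its own setting; only a matching setting restores the text.
        System.out.println("GBK output read as GBK:   " + new String(gbkBytes, "GBK").equals(text));
        System.out.println("GBK output read as UTF-8: " + new String(gbkBytes, "UTF-8").equals(text));
    }
}
```

The same PrintStream constructor can also wrap System.out when the console encoding is known and differs from the platform default.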

From writing the code to viewing the page in a browser, a JSP page goes through four encoding/decoding steps:

1. The JSP file is saved with some character encoding.

2. Tomcat reads the JSP file with the specified encoding and compiles it.

3. Tomcat sends the HTML content to the browser with the specified encoding.

4. The browser parses the HTML content with the specified encoding.

An error at any of these steps produces garbled characters. We will look at how the encoding for each step is set in turn.

The JSP file begins with <%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>, where pageEncoding tells Tomcat which character encoding the file uses. It should match the encoding Eclipse uses to save the file. Tomcat reads and compiles the JSP file with this encoding.

The contentType attribute of the page directive sets the encoding Tomcat uses to send the HTML content to the browser. This encoding is stated in the HTTP response header to inform the browser.

For example, we use the following code in the JSP file:

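<%@ page import="java.io.BufferedReader, java.io.FileReader" %>

<%
    // FileReader decodes the file with the platform default encoding
    BufferedReader reader = new BufferedReader(new FileReader("D:\\test.txt"));

    String content = reader.readLine();

    reader.close();
%>

<%= content %>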

test.txt contains Chinese characters, but garbled characters appear in the browser. This is a common problem. We analyze the input and output streams step by step, as before.

1. test.txt stores the Chinese characters in some encoding, for example UTF-8.

2. BufferedReader reads the bytes of test.txt and builds the string with the default encoding. Looking at the code, BufferedReader calls FileReader's read method, FileReader in turn relies on the native read method of FileInputStream to fetch the raw bytes, and FileReader decodes those bytes with the platform default encoding, which is GBK here. Because test.txt was saved as UTF-8, decoding its content as GBK gives the wrong characters.

3. <%= content %> is effectively out.print(content), which goes through the HTTP output stream JspWriter. The string content is therefore encoded into bytes with the UTF-8 encoding specified in the JSP page directive and sent to the browser.

4. The browser decodes the characters with the encoding given in the HTTP header. Whether it decodes with GBK or UTF-8, the display is garbled.

The conversion went wrong in step 2: the UTF-8 bytes were read into memory as if they were GBK.

There are two ways to solve this. One is to save test.txt as GBK, so that FileReader decodes the Chinese characters correctly with the default encoding. The other is to use InputStreamReader to specify the character encoding, for example:

InputStreamReader sr = new InputStreamReader(new FileInputStream("D:\\test.txt"), "UTF-8");

BufferedReader reader = new BufferedReader(sr);

In this way, Java uses UTF-8 to read character data from the file.
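The fix can be tried as a standalone program. The following is a minimal sketch under our own assumptions: a temporary file stands in for D:\test.txt, and the class and method names are illustrative. It reads the same UTF-8 file twice, once with the correct charset and once as GBK, to show that only the charset passed to InputStreamReader decides the result.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetReader {
    // Read the first line of a file, decoding its bytes with the given charset.
    static String readFirstLine(Path file, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for D:\test.txt, saved as UTF-8.
        Path file = Files.createTempFile("test", ".txt");
        String text = "\u5927\u5BB6\u597D"; // three Chinese characters, as escapes
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));

        String asUtf8 = readFirstLine(file, StandardCharsets.UTF_8);
        String asGbk = readFirstLine(file, Charset.forName("GBK"));

        System.out.println("decoded as UTF-8 is correct: " + text.equals(asUtf8));
        System.out.println("decoded as GBK is correct:   " + text.equals(asGbk));
        Files.delete(file);
    }
}
```

Using try-with-resources here also closes the reader when an exception is thrown, which the JSP snippet above does not do.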

In addition, we can add the -Dfile.encoding parameter to the java command to specify the default character encoding the VM uses when reading files, for example java -Dfile.encoding=UTF-8 Test. With this setting, System.getProperty("file.encoding") in Java code returns UTF-8.
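To check which default is actually in effect, a small program like the one below can be run with and without the parameter. This is only a sketch with a hypothetical class name. Keep in mind that the result depends on the JDK as well as the operating system: on JDK 18 and later the default charset is UTF-8 on every platform unless it is overridden.

```java
import java.nio.charset.Charset;

class DefaultCharsetInfo {
    // Report the defaults that APIs such as FileReader fall back on when no charset is given.
    static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

For example, compare the output of java DefaultCharsetInfo with that of java -Dfile.encoding=UTF-8 DefaultCharsetInfo.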

4. After a JSP reads Chinese parameters with request.getParameter, garbled characters are displayed on the page.

In Java web applications, handling Chinese in request parameters has long been one of the most common and most stubborn problems: one case gets fixed and garbled text shows up somewhere else. The main reason for the complexity is that an unusually large number of encoding and decoding steps are involved, and neither browsers nor web servers, Tomcat in particular, give us satisfactory support.

First, we analyze garbled parameters submitted in GET mode.

For example, enter the following URL in the address bar of the browser: http://localhost:8080/test.jsp?param=大家好 (the parameter value is the Chinese for "Hello everyone").

Our JSP code processes the param parameter as follows:

<% String text = request.getParameter("param"); %>

<%= text %>

With just these two lines, we are likely to see garbled output such as the following on the page: ? Ó ????
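A result of this kind can be reproduced outside a servlet container. The sketch below is our own simulation, not Tomcat code: it assumes the browser percent-encodes the parameter as GBK and the server decodes the query string as ISO-8859-1, which the analysis later in this article identifies as a typical combination. The sample value is the greeting 大家好 ("Hello everyone"), written as Unicode escapes, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class GarbledParameterDemo {
    // Percent-encode a value the way a browser might, using the given charset.
    static String browserEncode(String value, String charset) throws UnsupportedEncodingException {
        return URLEncoder.encode(value, charset);
    }

    // Decode a query-string value the way a server configured with the given charset would.
    static String serverDecode(String encoded, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String param = "\u5927\u5BB6\u597D"; // the sample greeting, as escapes

        String onTheWire = browserEncode(param, "GBK");
        System.out.println("sent as: " + onTheWire);

        // Matching charsets restore the value; ISO-8859-1 does not.
        System.out.println("decoded as GBK is correct:        "
                + param.equals(serverDecode(onTheWire, "GBK")));
        System.out.println("decoded as ISO-8859-1 is correct: "
                + param.equals(serverDecode(onTheWire, "ISO-8859-1")));
    }
}
```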

There are many articles on the Internet about fixing garbled characters from request.getParameter, and the methods they give are correct, but there are so many of them that people never understand what is actually happening. Here we analyze what is going on.

First, let's look at the encoding settings that relate to the request object:

1. The character encoding of the JSP file

2. The character encoding of the source page that issues the request with URL parameters

3. The option "sending URL addresses in UTF-8 mode" in the advanced settings of IE

4. The URIEncoding setting in Tomcat's server.xml

5. The function request.setCharacterEncoding()

6. JavaScript's encodeURIComponent function and Java's URLDecoder class

It is no wonder people get dizzy with so many related encoding settings. We analyze them here case by case.

The test results show that IE's "sending URL addresses in UTF-8 mode" setting does not affect how parameters are parsed, but that a URL requested from a page behaves differently from a URL typed into the address bar.

From these observations, a few network captures taken with SmartSniff, and a brief look at the Tomcat source code, the following conclusions can be drawn:

1. The "sending URLs in UTF-8 format" setting in IE takes effect only on the path part of the URL, not on the query string. That is, with this option selected, in a URL such as http://localhost:8080/大家好.jsp?param=大家好, the first 大家好 is converted to UTF-8 form and the second is left unchanged. The "UTF-8 form" here is really UTF-8 plus percent-escaping, that is, a percent-escaped form like %B4%F3%BC%D2%BA%C3 (strictly, this particular sequence is the GBK escaping of 大家好; the UTF-8 escaping is %E5%A4%A7%E5%AE%B6%E5%A5%BD).

So which encoding is used to send the Chinese characters in the query string to the server? The answer is the system default encoding, GBK. In other words, on our Chinese operating system, the query string sent to the web server is always GBK-encoded.

2. When a URL is requested from a page, through a link, a Location redirect, or opening a new window, which encoding is used for the Chinese characters in the URL? The answer is the encoding of the page. For example, if we reach http://localhost:8080/test.jsp?param=大家好 through a link on a source JSP page, and the source page is encoded in UTF-8, then 大家好 is sent as UTF-8.

When a URL is typed directly into the address bar or pasted there from the system clipboard, the input does not come from a page but from the operating system, so the encoding is simply the system default and has nothing to do with any page. We also found that browsers differ when you press Enter in the address bar of a page that was opened through a link. In IE nothing changes after pressing Enter, whereas in Maxthon (Aoyou) a correct page may become garbled, or a garbled page may become correct. When you press Enter, IE resends the URL it has kept in memory, while Maxthon re-reads the URL from the current address bar.

3. If Tomcat's URIEncoding is not set, the URL is decoded as ISO-8859-1 by default (this holds for the Tomcat versions discussed here; from Tomcat 8 the default is UTF-8); if it is set, the configured encoding is used. The decoding covers both the path part and the query string. This makes URIEncoding the most critical setting for Chinese parameters passed in GET mode. It applies only to GET parameters, however, and has no effect on POST. From the Tomcat source code we can see that when a page is requested, Tomcat builds a request object, reads the URIEncoding value from server.xml, and assigns it to the queryStringEncoding variable of the Parameters class. That variable then guides character decoding when the GET parameters behind request.getParameter are parsed.

4. The request.setCharacterEncoding function applies only to POST parameters, not to GET parameters. It must be called before the first call to request.getParameter. The reason is that the Parameters class holds two encoding settings, encoding and queryStringEncoding, and setCharacterEncoding sets encoding, which is used only when POST parameters are parsed.

Therefore we usually have to configure the character encoding for POST and GET separately. Tomcat's built-in filter can only handle POST, so URIEncoding must be set in addition to handle GET. This is troublesome, and because URIEncoding cannot choose the encoding dynamically based on the request content, it remains a problem.

While investigating the Tomcat code, we found another parameter in server.xml, useBodyEncodingForURI, that solves this problem. If it is set to true, Tomcat uses the encoding set by request.setCharacterEncoding to parse GET parameters as well. In this way, the SetCharacterEncodingFilter can handle both GET and POST parameters.
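The conclusions above can be checked with the JDK's own URL classes. The sketch below is an illustration under our own assumptions and does not use Tomcat: it prints the percent-escaped forms of 大家好 in GBK and in UTF-8, decodes one of them as ISO-8859-1 as an unconfigured server would, and then applies a widely used workaround that is not part of the analysis above, namely turning the garbled string back into bytes and decoding it again. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class QueryStringRecoveryDemo {
    // Undo a wrong ISO-8859-1 decode. ISO-8859-1 maps every byte to exactly one
    // char, so the original bytes can be recovered and decoded again.
    static String recover(String garbled, String actualCharset) throws UnsupportedEncodingException {
        return new String(garbled.getBytes("ISO-8859-1"), actualCharset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String param = "\u5927\u5BB6\u597D"; // the sample greeting, as escapes

        // The same value escaped with the two encodings discussed in the text.
        String gbkForm = URLEncoder.encode(param, "GBK");
        String utf8Form = URLEncoder.encode(param, "UTF-8");
        System.out.println("GBK form:   " + gbkForm);
        System.out.println("UTF-8 form: " + utf8Form);

        // A server that decodes as ISO-8859-1 garbles the UTF-8 form.
        String garbled = URLDecoder.decode(utf8Form, "ISO-8859-1");
        System.out.println("garbled equals original:   " + garbled.equals(param));

        // The bytes are intact, so decoding them again with the right charset works.
        System.out.println("recovered equals original: " + recover(garbled, "UTF-8").equals(param));
    }
}
```

The workaround only helps when the code knows which encoding the browser really used, which is exactly the uncertainty that URIEncoding and useBodyEncodingForURI are meant to remove.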

1. Garbled characters

Occasion: the page itself contains Chinese characters

Solution: Servlet: resp.setContentType("text/html; charset=GBK");

JSP: <%@ page contentType="text/html; charset=gb2312" %>

Note: this must be written before PrintWriter out = resp.getWriter();

Occasion: garbled parameters submitted in GET mode

Solution: modify server.xml -> URIEncoding="GBK"

Occasion: garbled content submitted in POST mode

Solution: request.setCharacterEncoding("GBK");

Note: this must be written before the first parameter is accessed.

Do not call response.setCharacterEncoding("GBK") here; it sets the encoding of the response, not of the request.

Occasion: <jsp:param name="user" value="<%=s%>"/>, where the URL contains Chinese parameters

Solution: <% request.setCharacterEncoding("GBK"); %>

Note:

Character set    Character encoding                                                   Corresponding language
ASCII            ASCII (7 bits)                                                       English
ISO-8859-1       ISO-8859-1 (8 bits)                                                  Latin letters
GB2312           GB2312 (16 bits)                                                     Simplified Chinese
GBK              GBK (16 bits)                                                        Simplified Chinese
Unicode          UTF-8 (one to four bytes; up to three in the Basic Multilingual Plane) Multi-language
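The sizes in this table are easy to verify. The following sketch, with a hypothetical class name, prints how many bytes a few sample characters occupy in each encoding. One caveat applies when reading the output: String.getBytes replaces a character the charset cannot represent with a single substitute byte, so a length of 1 for a Chinese character in ASCII or ISO-8859-1 means the character was lost, not stored.

```java
import java.nio.charset.Charset;

class EncodedLengthDemo {
    // Number of bytes the text occupies in the named charset.
    // Unencodable characters are replaced by the charset's substitute byte.
    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String latin = "A";
        String chinese = "\u4E2D"; // a common Chinese character
        // A character outside the Basic Multilingual Plane (U+20000).
        String supplementary = new String(Character.toChars(0x20000));

        String[] charsets = {"US-ASCII", "ISO-8859-1", "GB2312", "GBK", "UTF-8"};
        for (String name : charsets) {
            System.out.println(name
                    + ": latin=" + encodedLength(latin, name)
                    + " chinese=" + encodedLength(chinese, name)
                    + " supplementary=" + encodedLength(supplementary, name));
        }
    }
}
```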


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
