Causes of Java Encoding Problems and Solutions

Last Update:2018-03-25 Source: Internet

Author: User


I. File Encoding

Unicode is the preferred choice. It is a global character standard that assigns a code point to every character; encodings such as UTF-8 store those code points as bytes.

Conclusion: conversion between GBK and Unicode goes through the GBK-to-Unicode mapping table.

Conversion between UTF-8 and Unicode follows a fixed conversion rule (a formula), with no lookup table.

Therefore, Unicode is the core intermediary. To convert GBK to UTF-8, first convert GBK to Unicode, then convert Unicode to UTF-8, and vice versa.
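The two-step conversion described above can be sketched in Java, where a String plays the role of the Unicode intermediary. This is a minimal illustration, not a definitive implementation: the class and method names are hypothetical, and it assumes the JDK in use provides the GBK charset (standard JDK distributions do, but the platform is not required to).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class GbkToUtf8 {
    // Assumption: GBK is available in this JDK; forName throws if it is not.
    static final Charset GBK = Charset.forName("GBK");

    /** Decodes GBK bytes into a Unicode String, then encodes that String as UTF-8. */
    static byte[] gbkToUtf8(byte[] gbkBytes) {
        String unicode = new String(gbkBytes, GBK);       // GBK -> Unicode (mapping table)
        return unicode.getBytes(StandardCharsets.UTF_8);  // Unicode -> UTF-8 (fixed rule)
    }

    public static void main(String[] args) {
        // The characters U+4F60 U+597D encoded in GBK (2 bytes each).
        byte[] gbk = {(byte) 0xC4, (byte) 0xE3, (byte) 0xBA, (byte) 0xC3};
        byte[] utf8 = gbkToUtf8(gbk);
        System.out.println(utf8.length); // 6: each character takes 3 bytes in UTF-8
    }
}
```

Converting in the other direction only swaps the two charsets.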

II. XML Encoding

★The encoding attribute should name the encoding actually used when the document was saved.

★The encoding attribute is the one declared in the XML declaration, for example <?xml version="1.0" encoding="UTF-8"?>.

★W3C defines three rules for an XML parser to determine the encoding of an XML file:

1. If the file starts with a BOM (byte order mark), the BOM determines the file encoding, that is, the encoding chosen when the file was saved. In general, files saved in a Unicode format contain a BOM, and files saved as ANSI do not.

2. If there is no BOM, use the encoding attribute in the XML declaration.

3. If neither is present, assume the XML file is encoded in UTF-8.

★Eclipse and other editors save the file using the encoding named in the encoding attribute of the XML declaration.
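The three rules can be sketched as a small detection routine. This is an illustrative simplification, not how any particular XML parser is implemented: the class name is hypothetical, only UTF-8 and UTF-16 BOMs are recognized, and the declaration is assumed to be readable as ASCII-compatible text.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlEncodingSniffer {
    // Matches the encoding attribute of an XML declaration at the start of the text.
    private static final Pattern DECLARED = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the encoding name chosen by the three rules, given the first bytes of a file. */
    static String detect(byte[] head) {
        // Rule 1: a byte order mark decides the encoding.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";

        // Rule 2: no BOM, so look for the encoding attribute in the XML declaration.
        // ISO-8859-1 maps every byte to a character, so this decode never fails.
        String prefix = new String(head, 0, Math.min(head.length, 200),
                StandardCharsets.ISO_8859_1);
        Matcher m = DECLARED.matcher(prefix);
        if (m.find()) return m.group(1);

        // Rule 3: neither is present, so assume UTF-8.
        return "UTF-8";
    }

    private static boolean startsWith(byte[] data, int... marker) {
        if (data.length < marker.length) return false;
        for (int i = 0; i < marker.length; i++) {
            if ((data[i] & 0xFF) != marker[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"GBK\"?><a/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(xml)); // GBK
    }
}
```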

III. String Encoding

★Consider String s = "Hello!";

1. If the source file is GBK-encoded and the operating system (Windows) uses GBK as its default encoding, the compiler (javac) decodes the source bytes into characters as GBK and then stores the characters in Unicode form.
If the source file is UTF-8-encoded, the compiler must be told so with javac -encoding UTF-8 ...; it then decodes the source as UTF-8 and likewise stores the characters in Unicode form.
Whatever the encoding of the source file, the same string yields exactly the same Unicode data. When displayed, it is converted to GBK (this depends on the OS environment).

2. System.out.println(new String(s.getBytes(), "UTF-8")); // Wrong: getBytes() uses the platform default encoding (GBK here), but the bytes are then decoded as UTF-8, so the result is garbled.

★How to correctly convert GBK to UTF-8? (In fact, this is Unicode to UTF-8.)

Note: it is NOT new String(s.getBytes("GBK"), "UTF-8");

// The source file is GBK-encoded, or the string was read from a GBK file; once it is a String, it is in Unicode form.
// The string is held in memory as Unicode.
String gbkStr = "Hello!";

// Use getBytes to convert the Unicode string into a UTF-8 byte array.
byte[] utf8Bytes = gbkStr.getBytes("UTF-8");

// Then decode this byte array as UTF-8 into a new string.
String utf8Str = new String(utf8Bytes, "UTF-8");

That is, new String(s.getBytes("UTF-8"), "UTF-8");
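To see why the encoding used by getBytes must match the one used to decode, the following sketch compares a matched pair with a mismatched pair. The class and method names are hypothetical, and the sample text is written with Unicode escapes so the result does not depend on how this source file itself is saved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeMismatch {
    // Assumption: GBK is available in this JDK.
    static final Charset GBK = Charset.forName("GBK");

    /** Encodes with one charset and decodes with another. */
    static String recode(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String s = "\u4f60\u597d"; // two Chinese characters

        // Matched charsets: the string survives the round trip.
        System.out.println(recode(s, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(s)); // true

        // Mismatched charsets: GBK bytes are not valid UTF-8, so the text is garbled.
        System.out.println(recode(s, GBK, StandardCharsets.UTF_8).equals(s)); // false
    }
}
```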

★new String(s.getBytes("ISO-8859-1"), "GBK")

Generally used when text that was originally GBK-encoded has been wrongly decoded as ISO-8859-1 and must be turned back into GBK.
Note: not all such conversions are reversible. This one works because ISO-8859-1 is a single-byte encoding in which every byte value maps to a character, so the original bytes can be recovered intact.
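A short sketch of this recovery, contrasted with a decode that does lose information. The class and method names are hypothetical, and GBK is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {
    // Assumption: GBK is available in this JDK.
    static final Charset GBK = Charset.forName("GBK");

    /** Undoes a wrong ISO-8859-1 decode: recover the raw bytes, then decode them as GBK. */
    static String repair(String wronglyDecoded) {
        byte[] original = wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, GBK);
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d";
        byte[] gbkBytes = text.getBytes(GBK);

        // Reversible: ISO-8859-1 keeps every byte as one character.
        String garbled = new String(gbkBytes, StandardCharsets.ISO_8859_1);
        System.out.println(repair(garbled).equals(text)); // true

        // Not reversible: invalid UTF-8 sequences are replaced with U+FFFD on decode.
        String lossy = new String(gbkBytes, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(lossy.getBytes(StandardCharsets.UTF_8), gbkBytes)); // false
    }
}
```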

★Encoding can be specified during read/write.
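For instance, java.io readers and writers accept a Charset. The sketch below uses in-memory streams so it runs without touching the file system; the class and method names are hypothetical, and the same constructors apply when the streams wrap files or sockets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetIo {
    /** Writes text through a Writer that encodes with the given charset. */
    static byte[] writeText(String text, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reads text through a Reader that decodes with the given charset. */
    static String readText(byte[] data, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        String text = "\u4f60\u597d";
        System.out.println(writeText(text, gbk).length);                    // 4
        System.out.println(writeText(text, StandardCharsets.UTF_8).length); // 6
        System.out.println(readText(writeText(text, gbk), gbk).equals(text)); // true
    }
}
```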

IV. Java Encoding Problems

★File loading
Java source file encoding: matches the operating system default, but can be changed.
Compiled to a class file: the encoding used in class files is fixed to UTF-8.
Class file loaded into the JVM: Unicode
In memory: Unicode
In other words: whatever the encoding of the source file, the result after loading into the JVM is the same.

★Network transmission works in bytes, so all data must be serialized to bytes. In Java, a class whose objects are serialized must implement the Serializable interface.

★When reading a resource file from the network, the bytes obtained depend only on the encoding of that resource file, regardless of the encoding of the current Java source file.

★If you know the encoding of the resource, simply use that encoding when converting the bytes to a string.

Therefore, the key issue is determining the encoding of the resource file.
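One common place to learn a resource's encoding is the charset parameter of its Content-Type value. The sketch below is a hypothetical helper (the names ResourceDecoder, charsetOf, and decode are not from any library) that extracts that parameter and falls back to a caller-supplied default when it is missing or unsupported.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ResourceDecoder {
    // Finds "charset=NAME" (optionally quoted) inside a Content-Type value.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in the Content-Type value, or the fallback. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) return fallback;
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) return fallback;
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name.
            return fallback;
        }
    }

    /** Decodes the resource bytes using the encoding the resource itself declares. */
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // U+4F60 U+597D encoded in GBK; assumes the JDK supports GBK.
        byte[] body = {(byte) 0xC4, (byte) 0xE3, (byte) 0xBA, (byte) 0xC3};
        System.out.println(decode(body, "text/html; charset=GBK").equals("\u4f60\u597d")); // true
    }
}
```

The UTF-8 fallback is a design choice made for this sketch; a real application might prefer to fail loudly when the encoding is unknown.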

V. encodeURI and encodeURIComponent

★Browsers differ, so encoding is required. Both functions use UTF-8 encoding rules.

★Difference: encodeURI is for a whole URL and leaves the URL's reserved characters unencoded; encodeURIComponent is for parameter segments and encodes more thoroughly: the URL's reserved characters are encoded as well.

★Two encoding scenarios.

(1) If the server decodes as UTF-8, the following is not garbled. Otherwise, garbled characters appear.
Front end: var url1 = encodeURI(url);
Back end: String name = request.getParameter("name");

(2) No garbled characters appear regardless of the server encoding.
Front end: url1 = encodeURI(url); // encodes the Chinese characters in the URL as ASCII
url2 = encodeURI(url1); // encodes that ASCII text again

Back end: // Tomcat decodes automatically here; if the Tomcat configuration file does not set an encoding, the default is ISO-8859-1
String name1 = request.getParameter("name");
String name2 = java.net.URLDecoder.decode(name1, "UTF-8");

Whether Tomcat decodes with GBK, UTF-8, or ISO-8859-1, it recovers url1 correctly, because ASCII characters are encoded identically in GBK, UTF-8, and ISO-8859-1.
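The double-encoding trick can be simulated entirely in Java. In this sketch, java.net.URLEncoder stands in for the browser's encodeURI (an approximation: URLEncoder applies form encoding and treats a few characters differently), and the first URLDecoder call stands in for the servlet container's automatic decode. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class DoubleEncoding {
    static String encode(String s, String charset) {
        try {
            return URLEncoder.encode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String decode(String s, String charset) {
        try {
            return URLDecoder.decode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Encodes twice as UTF-8, lets the "container" decode once, then decodes as UTF-8. */
    static String roundTrip(String name, String containerCharset) {
        String once = encode(name, "UTF-8");   // non-ASCII characters become %XX escapes
        String twice = encode(once, "UTF-8");  // each '%' becomes "%25"; the result is pure ASCII
        String name1 = decode(twice, containerCharset); // container's automatic decode
        return decode(name1, "UTF-8");                  // application's explicit decode
    }

    public static void main(String[] args) {
        String name = "\u4f60\u597d";
        for (String cs : new String[] {"ISO-8859-1", "GBK", "UTF-8"}) {
            System.out.println(cs + " -> " + roundTrip(name, cs).equals(name)); // true for each
        }
    }
}
```

The container's charset does not matter because the text it decodes contains only ASCII.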

★For POST submissions, the browser encodes the form data according to the page's ContentType ("text/html; charset=GBK").

Server: call request.setCharacterEncoding() to set the encoding, then obtain the correct value through request.getParameter().

VI. Garbled Characters in Java

★Making the browser parse as UTF-8:

(Manual) ==> right-click in the browser and select UTF-8 as the encoding
(Automatic) ==> write in the file: <meta http-equiv="content-type" content="text/html; charset=UTF-8">
The <meta> tag simulates the response header and tells the browser to parse using UTF-8.
(Automatic) ==> response.setContentType("text/html; charset=UTF-8");
or response.setHeader("content-type", "text/html; charset=UTF-8");
or response.getOutputStream().write("<meta http-equiv='content-type' content='text/html; charset=UTF-8'>".getBytes());

The purpose is to control browser behavior, that is, to make the browser decode using UTF-8.

Frequently used:
<meta http-equiv="content-type" content="text/html; charset=UTF-8"> or <meta charset="UTF-8">
<%@ page pageEncoding="UTF-8" %>
<?xml version="1.0" encoding="UTF-8"?>

★response.setCharacterEncoding("UTF-8"); sets the code table the response uses to store data. It determines how the character stream written through response.getWriter() is encoded.

This setting is not needed for response.getOutputStream(), because that sends a byte stream: the data has already been encoded to bytes (for example as UTF-8) before it is written.

★response.setContentType("text/html; charset=UTF-8"); also calls setCharacterEncoding internally, so it is equivalent to setCharacterEncoding("UTF-8"); plus setHeader("content-type", "text/html; charset=UTF-8");
★response.setCharacterEncoding can override the charset set by an earlier response.setContentType.

★1. By default, IE and web servers use ISO-8859-1; setCharacterEncoding can be used to set the character encoding.
2. URLs support only ISO-8859-1 by default.

★GET request: the query string parameters are decoded as ISO-8859-1 by default, and request.setCharacterEncoding("UTF-8") cannot fix this.

1. Set URIEncoding="UTF-8" on the <Connector> node in the Tomcat server configuration file.
2. If URIEncoding is not set, use new String(username.getBytes("ISO-8859-1"), "UTF-8");

3. useBodyEncodingForURI="true": use the charset defined in the Content-Type header.

4. For URLs, cookies, and Ajax GET requests, URLEncoder is generally used.

★POST request: request.setCharacterEncoding("UTF-8"); is effective only for POST requests.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
