Web Development Garbled Text Summary

Source: Internet
Author: User
Tags java web

Garbled characters come up often in Java Web development. This article discusses what causes them and how to approach fixing them.

A complete Web request involves four encoding/decoding conversions, as shown below.

First: the client (usually a browser) converts characters into a byte stream and sends it to the server over TCP.

This is a character-to-byte conversion.

Second: the server reads the byte stream from the client and converts it into a string.

This is a byte-to-character conversion.

Third: the server converts the resulting string into a byte stream and sends it to the client.

This is again a character-to-byte conversion.

Fourth: the client reads the response byte stream sent by the server and converts it into a string for display.

[Figure: Aaaa.png, the four conversions between client and server]

That completes one Web request.

As you may have noticed, the first and second conversions form a matching encode/decode pair, and so do the third and fourth. Whatever character set the first conversion encodes with, the second must decode with the same one.

The third conversion may use a different character set from the first two, but the fourth must use the same one as the third. That is the basic rule.
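To make the pairing concrete, here is a minimal sketch of one encode/decode pair. The class and method names (RoundTripDemo, roundTrip) are made up for illustration, and ISO-8859-1 merely stands in for "some other character set" on the receiving side.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {
    /** Encodes text with one charset (sender) and decodes the bytes with another (receiver). */
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] wire = text.getBytes(encodeWith); // character-to-byte conversion
        return new String(wire, decodeWith);     // byte-to-character conversion
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D U+6587), written as escapes so the
        // encoding of this source file does not matter.
        String original = "\u4e2d\u6587";
        String same = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(same.equals(original));  // true: matching pair
        System.out.println(mixed.equals(original)); // false: mismatched pair
        System.out.println(mixed.length());         // 6: one char per UTF-8 byte
    }
}
```

The mismatched case does not throw anything; it quietly produces a different string, which is why garbled text tends to be noticed only on screen.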

How do we find out which character set the first encoding uses? The author of a custom Web client knows exactly which character set the program uses to send requests, so we won't dwell on that case. We only discuss browsers here, because the vast majority of requests come from browsers.

When the browser submits a form via POST or GET, it encodes the data using the encoding of the current page.
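As a rough illustration, java.net.URLEncoder produces the same kind of percent-encoded form data, so it can approximate what a page in a given encoding would submit. This is only a sketch: FormEncodingDemo is a hypothetical name, and the GBK line assumes the Java runtime ships the GBK charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class FormEncodingDemo {
    /** Percent-encodes one form value using the named page charset. */
    static String encode(String value, String pageCharset) {
        try {
            return URLEncoder.encode(value, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("charset not available: " + pageCharset, e);
        }
    }

    public static void main(String[] args) {
        String value = "\u4e2d"; // one Chinese character, U+4E2D
        System.out.println(encode(value, "UTF-8")); // %E4%B8%AD
        System.out.println(encode(value, "GBK"));   // %D6%D0 (assumes GBK is installed)
    }
}
```

The same character ends up as different bytes on the wire, which is why the server has to know which page encoding the form came from.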

To view the current page encoding in Chrome, 360 Speed Browser, and similar browsers: click the menu icon on the right side of the browser, then go to Tools → Encoding to view or change the encoding of the current page.

The current page encoding is determined by the fourth conversion, performed when the browser fetched the page. The browser decides which encoding to use based on the response headers and the response body.

Notice something? Above we said that the first conversion determines the encoding for the second, and the third determines the encoding for the fourth. Here, the fourth in turn determines the encoding for the first conversion of the next request. That forms a cycle.

A1=A2, A3=A4, A4=A1, so A1=A2=A3=A4. This shows that using the same character set everywhere is a sufficient condition for every conversion to come out right.

Having covered the first conversion (encoding), let's look at the second (decoding).

The request message sent by the client has three parts: the request line, the request headers, and the POST body. Garbling can occur in two places: the parameter part of the request URL and the POST body. (Why are English characters never garbled? Because most character sets encode them the same way ASCII does.)

The server uses a separate character set to parse each of these two parts. In Tomcat, URIEncoding specifies the encoding used to parse the URL parameter portion, and request.getCharacterEncoding() returns the character set used to parse the POST body.
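The effect of the server-side setting can be imitated outside a container with java.net.URLDecoder. This is a sketch of the idea, not Tomcat's actual parsing code; QueryDecodingDemo and the sample value are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class QueryDecodingDemo {
    /** Decodes one percent-encoded parameter value with the charset the server is configured for. */
    static String decode(String rawValue, String serverCharset) {
        try {
            return URLDecoder.decode(rawValue, serverCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("charset not available: " + serverCharset, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%E4%B8%AD"; // U+4E2D as a UTF-8 page would send it
        System.out.println(decode(raw, "UTF-8").length());      // 1: the original character
        System.out.println(decode(raw, "ISO-8859-1").length()); // 3: one wrong char per byte
    }
}
```

The raw bytes are identical in both calls; only the server's choice of character set decides whether the parameter comes out intact.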

After the second conversion (decoding) comes the third (encoding).

To send characters to the client, the server must convert them into a byte stream. Which encoding does it use? A JSP page has two settings: pageEncoding and contentType. Did you notice?

In general they appear together. pageEncoding is the encoding of the JSP file itself, while contentType gives the character set the server uses to encode the characters it sends to the client. This character set is written into the Content-Type field of the response headers, for example Content-Type: text/javascript;charset=GBK. contentType on its own is easy to understand. pageEncoding, however, is somewhat entangled with contentType. Bear with me for a moment. As you know, JSP files need to be compiled into Java files.

The process is: 1. read the JSP file, 2. convert it into the source text of a Java class, 3. write that to a Java file. Files are all byte streams. When the JSP file is read, it is decoded with the pageEncoding character set. The Java file is written in UTF-8 (because javac is run with UTF-8 to compile the Java file into a class).

The entanglement is this: if contentType is absent, the pageEncoding character set is used in its place.
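The read-then-write step described above amounts to a transcoding, which can be sketched in a few lines. This is not the JSP container's real code; JspTranscodeDemo is a hypothetical name, and ISO-8859-1 stands in for any wrongly declared pageEncoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class JspTranscodeDemo {
    /** Decodes the file bytes with the declared pageEncoding and re-encodes them as UTF-8. */
    static byte[] toUtf8(byte[] jspBytes, Charset declaredPageEncoding) {
        String text = new String(jspBytes, declaredPageEncoding); // step 1: read the JSP file
        return text.getBytes(StandardCharsets.UTF_8);             // step 3: write the Java file
    }

    public static void main(String[] args) {
        // Pretend the JSP file contains two Chinese characters and was saved as UTF-8.
        byte[] saved = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        byte[] right = toUtf8(saved, StandardCharsets.UTF_8);
        byte[] wrong = toUtf8(saved, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(right, saved)); // true: text survives translation
        System.out.println(wrong.length);                // 12: each byte became a 2-byte char
    }
}
```

When the declared encoding does not match how the file was really saved, the damage is already baked into the generated Java file, before any request arrives.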

The fourth conversion happens when the browser displays the final result.

The browser uses the charset in the Content-Type field of the response headers to decode the response body and display it.

If Content-Type is absent, the browser decodes with the character set specified by a tag such as:
<meta http-equiv="Content-Type" content="text/html;charset=gb2312">
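The lookup order above (header first, then the meta tag) can be sketched as a small helper. Real browsers do considerably more than this; ContentTypeCharset and charsetOf are hypothetical names, and the fallback argument plays the role of the meta tag's value.

```java
import java.util.Locale;

final class ContentTypeCharset {
    /** Returns the charset parameter of a Content-Type value, or the fallback if there is none. */
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Header parameters may be quoted, e.g. charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                if (!value.isEmpty()) {
                    return value;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html;charset=GBK", "gb2312")); // GBK
        System.out.println(charsetOf("text/html", "gb2312"));             // gb2312
    }
}
```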

That's all there is to it, and it sounds simple. So why are there still so many cases of garbled text? Common situations:

    1. Ajax requests.

The encoding of an Ajax request is controlled by the program, so it falls outside the A1-to-A4 cycle. The programmer may not have understood the various encoding parameters, which leads to errors. Analyze them with the four steps above. If that still doesn't help, there's not much more I can say.

    2. URIEncoding differs between servers.

The POST body is parsed with the request.getCharacterEncoding() character set, which the program controls. URL request parameters are parsed with the URIEncoding character set. URIEncoding is usually set differently from server to server; for example, older Tomcat versions (7 and earlier) default to ISO-8859-1. Watch out for this when migrating.

    3. The JSP file is not saved in the declared encoding.

If pageEncoding in the JSP is set to GBK but the file is saved as UTF-8, the conversion produces garbled text. Nothing downstream can come out right after that, so rule this out first when tracking down garbled text.

    4. Inconsistent encodings.

Several encodings in one project: what kind of team is that? It can work within a subsystem, but as soon as data crosses a boundary, it breaks.
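For the URL parameter case above, a widely used stopgap is to turn the wrongly decoded string back into bytes and decode it again. Here is a sketch, with MojibakeRepair as a made-up name. It works only because ISO-8859-1 maps every byte to exactly one char, so no information was lost; fixing the server configuration is the cleaner solution.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {
    /** Undoes a decode done with the wrong charset, then decodes with the right one. */
    static String redecode(String garbled, Charset wrongCharset, Charset rightCharset) {
        byte[] originalBytes = garbled.getBytes(wrongCharset); // recover the bytes from the wire
        return new String(originalBytes, rightCharset);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        // Simulate a server that decoded UTF-8 bytes as ISO-8859-1.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        String repaired = redecode(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(repaired.equals(original)); // true
    }
}
```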

If garbled text appears, how do you troubleshoot it?

Use bisection: first check whether the text is correct on the server, typically by printing the parameter with System.out to the console or a log. (Pay attention to the encoding you use when opening the log file; the output may have been correct, and it is your viewer that garbles it.)

If you see the correct string, the third or fourth conversion is usually at fault; if you see garbled text, the first or second conversion is at fault. The second scenario is far more common, because the programmer has little involvement in the former case.
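Since the console or log viewer can itself garble the output, one way to take it out of the equation is to print code points instead of characters. This is a small sketch; UnicodeDump is a hypothetical helper, not something from the original setup.

```java
import java.nio.charset.StandardCharsets;

final class UnicodeDump {
    /** Keeps printable ASCII as is and renders every other char as a backslash-u escape. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7f) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String correct = "\u4e2d"; // U+4E2D, decoded properly
        String garbled = new String(correct.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(escape(correct)); // a single escape for 4e2d
        System.out.println(escape(garbled)); // three escapes, one per UTF-8 byte
    }
}
```

The output is pure ASCII, so it reads the same no matter which encoding the log is opened with. Seeing the expected code point means the first two conversions were fine.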

I will only cover the second case:

    1. First determine whether it is a URL parameter or a POST parameter that is garbled.

    2. Based on step 1, check URIEncoding or request.getCharacterEncoding().

    3. Determine the encoding of the client request by checking the browser, or with tools such as HttpWatch or tcpdump. In China it is basically either GBK or UTF-8. UTF-8 generally uses 3 bytes per Chinese character and GBK uses 2, so they are easy to tell apart.

    4. Change them to be the same and you're done.
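Step 3 can also be checked in code once you have the raw bytes from a capture: a strict UTF-8 decode fails on typical GBK data. A minimal sketch follows, with Utf8Sniffer as a hypothetical name. The GBK sample bytes are hard-coded so the example does not depend on the runtime shipping a GBK charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Sniffer {
    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD}; // U+4E2D in UTF-8, 3 bytes
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0};               // the same character in GBK, 2 bytes
        System.out.println(isValidUtf8(utf8)); // true
        System.out.println(isValidUtf8(gbk));  // false
    }
}
```

A true result does not prove the data is UTF-8 (pure ASCII passes too), but a false result is a strong hint that the client sent something else, most likely GBK in this setting.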

2016-12-24 Night, Suzhou


This article is from the "No Thieves" blog, please be sure to keep this source http://guojuanjun.blog.51cto.com/277646/1885688
