Java Web garbled-text analysis and solutions (I): garbled GET requests

Source: Internet
Author: User
Tags: rfc

Introduction:

When you start out in web development, garbled text is the most common and most basic problem you run into. An experienced programmer solves it easily, while a beginner easily gets bogged down. And often, even when we do solve a garbled-text problem, we are not sure why the fix worked.

In fact, the garbled-text problem is simple: it is caused by the client and the server using different character sets. That is, the encoding used to send the data differs from the encoding used to parse it. So once we work out how our data is encoded and decoded, the problem is easy to solve. We analyze garbled text in two parts, garbled requests and garbled responses, and garbled requests must in turn be analyzed separately for GET and POST.

Garbled requests: GET

The request is encoded by the browser. When the GET method is used to request information from the server, the HTTP request normally carries no request body, so the request parameters can only be placed in the URL. Therefore, when communicating with the server via GET, the encoding questions we care about are how the browser encodes the URL and how the server decodes it.

About URLs

We work with URLs all the time, and they seem very simple: a URL looks like nothing more than a string. In fact, the structure of a URL is quite complex, even though everyday usage is relatively simple. For a detailed description of URLs, refer to the following article:

I am the teleport door!!!

The URL specification is defined in RFC 1738 (since updated by RFC 3986). From a URL we can obtain the communication protocol, host domain name, port, application path, path parameters, query parameters, page fragment, and other information. For example:

http://user:pass@example.com/a/b;q=1/c?d=2;sessionid=qewfewrwer#2

From the above URL, we can get the following information:

Part              Data                       Server API
Scheme            http                       req.getScheme()
User              user                       none known
Pass              pass                       none known
Host address      example.com                req.getServerName()
Port              80                         req.getServerPort()
Path              /a/b;q=1/c                 req.getRequestURI()
Query parameters  d=2;sessionid=qewfewrwer   req.getQueryString()
Fragment          2                          none (fragments are not sent to the server)

Note that req.getContextPath() returns only the application's context prefix of the path, not the whole path.

In development we mostly use the path and the query parameters. The other parts are used less often, although in RESTful code path parameters may also be used.
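The breakdown above can be reproduced outside a servlet container with the JDK's java.net.URI class. This is only a minimal sketch: the class name UrlParts and the helper split are made up for illustration, and the URL assumes user info of the form user:pass. The servlet methods in the table are not used here; java.net.URI is just a convenient way to see the same parts.

```java
import java.net.URI;

final class UrlParts {
    /** Returns scheme, user info, host, port, path, query, fragment, in that order. */
    static String[] split(String url) {
        URI u = URI.create(url);
        return new String[] {
            u.getScheme(),
            u.getRawUserInfo(),
            u.getHost(),
            // getPort() returns -1 when the URL has no explicit port
            String.valueOf(u.getPort()),
            u.getRawPath(),
            u.getRawQuery(),
            u.getRawFragment()
        };
    }

    public static void main(String[] args) {
        String url = "http://user:pass@example.com/a/b;q=1/c?d=2;sessionid=qewfewrwer#2";
        String[] names = {"scheme", "userInfo", "host", "port", "path", "query", "fragment"};
        String[] parts = split(url);
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " = " + parts[i]);
        }
    }
}
```

The "raw" getters are used on purpose: they return the components exactly as they appear in the URL, before any percent-decoding, which is the form that matters when reasoning about encodings.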


The browser's encoding of the path part

The path is used to match the handler for the request, and it rarely contains Chinese characters. The RFC document does not explicitly specify an encoding for the path. However, according to other articles, browsers generally encode the path with UTF-8, and the latest URI standard specifies UTF-8 for URI encoding.

Definition: put simply, the path is the application-routing part of the URL, that is, what remains after removing the protocol, domain name, port, and query information.
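To make "the path is encoded with UTF-8" concrete, the following sketch percent-encodes a path containing Chinese characters the way a UTF-8 browser would. It relies on java.net.URI, whose toASCIIString() escapes non-ASCII characters as UTF-8 bytes. The class name PathEncoding, the helper encodePath, and the host example.com are assumptions for illustration, and this is not a claim about what any particular browser does internally.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class PathEncoding {
    /** Builds http://host/path and percent-encodes non-ASCII characters as UTF-8. */
    static String encodePath(String host, String path) {
        try {
            // The multi-argument constructor quotes illegal characters;
            // toASCIIString() then escapes the remaining non-ASCII ones as UTF-8.
            return new URI("http", host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "\u4e2d\u56fd" is the two-character word for China, written with escapes
        // so the source file's own encoding cannot interfere.
        System.out.println(encodePath("example.com", "/a/\u4e2d\u56fd"));
    }
}
```

Each Chinese character becomes three percent-escaped bytes, which is the signature of UTF-8 for characters in this range.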


The server's decoding of the path part (three options)

Typically, our requests go to the web container first (Tomcat in the examples below), and the container decodes the URL. For Tomcat, the URL decoding parameter can be added to the Connector tag in conf/server.xml. By default the container decodes URLs as ISO-8859-1 (this is the default in Tomcat 7 and earlier; Tomcat 8 and later default to UTF-8).

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />

The above is Tomcat's default setting. You can add the URIEncoding attribute to the tag to specify the URL decoding scheme. (PS: the attribute name says URI, not URL.)

If you do not want to hard-code the decoding scheme this way, you can set another attribute instead, useBodyEncodingForURI. It tells the web container that, if the request specifies an encoding, the URL's query parameters should be decoded with the encoding set via request.setCharacterEncoding.

I have not tested this second option; try it if you need it. For more information, refer to the official Tomcat documentation:

http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q2

In addition, if you do not want to modify the container's global configuration (after all, the container may host more applications than just ours), you can extract the parameters manually as follows:

String path = req.getServletPath(); // manual extraction; not suitable when a framework does the routing
path = new String(path.getBytes("ISO-8859-1"), "UTF-8"); // re-decode the container's ISO-8859-1 result as UTF-8

With this approach we must first confirm that the web container really decodes URLs as ISO-8859-1, because someone may have modified the container configuration, or the container's own configuration may be unusual.
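The caution above can be checked without a container. The sketch below simulates a container that wrongly decodes UTF-8 bytes as ISO-8859-1 and then applies the manual repair. It also shows what happens when the repair is applied to a string that was already decoded correctly. The class Redecode and its methods are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class Redecode {
    /** Simulates a container that reads UTF-8 bytes as ISO-8859-1. */
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** The manual repair: recover the raw bytes, then decode them as UTF-8. */
    static String repair(String fromContainer) {
        byte[] raw = fromContainer.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "/a/\u4e2d\u56fd";

        // Case 1: the container used ISO-8859-1, so the repair restores the text.
        System.out.println(repair(misdecode(original)).equals(original));

        // Case 2: the container already used UTF-8. The Chinese characters cannot
        // be represented in ISO-8859-1, so the "repair" replaces them with '?'.
        System.out.println(repair(original));
    }
}
```

The repair works only because ISO-8859-1 maps every byte to exactly one character, so the original bytes can be recovered without loss. Applied to correctly decoded text, the same code destroys it, which is why the container's actual setting has to be known first.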

The browser's encoding of query parameters

Query parameters differ from the path: without query parameters the web container can still route the request to our handler, but without the path it cannot. In addition, the reserved characters for the path and for query parameters are not the same.

Definition: put simply, the query parameters are the part that immediately follows the path after the ?, with individual parameters joined by &.

Because the path and the query parameters differ, some browsers encode the query parameters differently from the path. For how confusing the actual encoding behavior can be, refer to the following article:

Another portal, cheer up!

That article summarizes the following rules:

(1) the path, or any part of the URL other than the query parameters, is encoded with UTF-8 by every browser;

(2) the query parameters are encoded by each browser according to the operating system's encoding.

That article is fairly old, so its rules may no longer hold in practice, but they still explain certain problems. Some articles claim that query parameters are encoded according to the page encoding. I have not tested this, but that conclusion is certainly incomplete, for the following reasons:

First, the page's meta tag tells the browser the encoding of the page. Second, when data is sent with the POST method, the browser encodes the request body according to the meta encoding. With GET, however, a request can be issued without any page at all, in which case the browser has no meta tag and no page encoding to refer to.

As for which encoding browsers actually use for query parameters, I did not find a professional, authoritative, credible answer. I think it has to be analyzed case by case, and a small experiment is enough. After all, times change, and vendors increasingly standardize on UTF-8. A solution that does not depend on the browser's encoding is also described later.
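Why the browser's choice matters can be seen by encoding the same parameter value with two different character sets. The sketch below uses java.net.URLEncoder as a stand-in for the browser; this is an assumption for illustration, since URLEncoder implements form encoding and differs from browsers in details such as how spaces are written. GBK is used as an example of an operating-system encoding and is only printed if the running JDK supports it. The class name QueryEncoding is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;

final class QueryEncoding {
    /** Percent-encodes a parameter value using the named character set. */
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String value = "\u4e2d\u56fd"; // the two-character word for China

        // Three bytes per character under UTF-8.
        System.out.println("UTF-8: k=" + encode(value, "UTF-8"));

        // Two bytes per character under GBK, if this JDK ships the charset.
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK:   k=" + encode(value, "GBK"));
        }
    }
}
```

The two outputs share no bytes, so a server that assumes the wrong one of these encodings cannot recover the original text.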

The server's decoding of query parameters

Query parameters are also part of the URL, so the web container decodes them in the same way, using the same scheme as for the path. The solutions are therefore the same.


When garbled text appears:

When processing query parameters we usually call req.getParameter() to get a parameter. Very few people care how this method works behind the scenes, and normally they do not need to. This is where garbled text is most likely to appear, because the parameter values may be user input rather than something we designed. When this happens with GET, do not panic: first work out which encoding the browser actually used for the query parameters. A simple (if somewhat involved) way is to press F12 in Chrome to open the developer tools.


Find the Network tab, where the request URL shows k=%e4%b8%ad%e5%9b%bd. Remove the % signs and you get six hexadecimal byte values. Look them up in a UTF-8 code table and you will see that they are exactly the UTF-8 encodings of "中" and "国" (which together form "中国", China). So you can conclude that the browser is using UTF-8. This way of checking requires some familiarity with character encodings, but it is not difficult: read an introductory article or two on character encoding and the pattern becomes easy to see.

PS: do not judge the URL encoding from the browser's address bar; many browsers decode the URL before displaying it there.

On the server side, first determine the decoding scheme your web container uses for URLs, and then choose either new String(param.getBytes("ISO-8859-1"), "UTF-8") or the useBodyEncodingForURI / URIEncoding settings.
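Instead of looking the bytes up by hand, the same check can be done with a few lines of Java. The sketch below decodes the captured value under a candidate encoding and prints the resulting Unicode code points; if the guess is right, the code points are those of the expected characters. The class name SniffEncoding and its helpers are made up for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class SniffEncoding {
    /** Percent-decodes a raw query value using the named character set. */
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Formats a string as space-separated code points, for example "U+0041 U+0042". */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "%e4%b8%ad%e5%9b%bd";
        // Correct guess: six bytes collapse into two Chinese characters.
        System.out.println(codePoints(decode(raw, "UTF-8")));
        // Wrong guess: every byte becomes its own character, i.e. garbled text.
        System.out.println(codePoints(decode(raw, "ISO-8859-1")));
    }
}
```

With the UTF-8 guess the six bytes yield the code points U+4E2D and U+56FD, the two characters of "中国". With the ISO-8859-1 guess they yield six unrelated characters, which is the garbled form a misconfigured server would see.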


Summary:

When a GET request comes out garbled, the most important thing is to find out how the browser encodes the URL. If you build the request in JavaScript, you can use the encodeURIComponent function in the browser to encode the Chinese parameters before assembling them, and decode them on the Java side with URLDecoder.decode. The JavaScript side must encode twice; otherwise the single layer of URL encoding is decoded by the web container, and the parameter you get may still be garbled. For details, refer to:

Transfer!!!!!
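The double-encoding approach from the summary can be simulated end to end in Java. In this sketch, URLEncoder stands in for the browser's encodeURIComponent (an assumption: the two differ for a few characters such as spaces, but behave the same for Chinese text), and containerDecode stands in for the web container's automatic decoding. All class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class DoubleEncoding {
    /** Client side: percent-encode twice with UTF-8. */
    static String clientEncodeTwice(String value) {
        try {
            return URLEncoder.encode(URLEncoder.encode(value, "UTF-8"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Simulates the container's automatic decoding with whatever charset it is set to. */
    static String containerDecode(String raw, String containerCharset) {
        try {
            return URLDecoder.decode(raw, containerCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Application side: decode the remaining layer explicitly as UTF-8. */
    static String appDecode(String fromContainer) {
        try {
            return URLDecoder.decode(fromContainer, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String onTheWire = clientEncodeTwice("\u4e2d\u56fd");
        System.out.println(onTheWire);

        // Whatever the container's charset, the first decode only turns %25 into %,
        // leaving a pure-ASCII string that cannot be garbled.
        String afterContainer = containerDecode(onTheWire, "ISO-8859-1");
        System.out.println(afterContainer);

        System.out.println(appDecode(afterContainer).equals("\u4e2d\u56fd"));
    }
}
```

The point of the second layer is that the container only ever sees ASCII, so its configured charset no longer matters; the application alone decides that the inner layer is UTF-8.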



Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.
