URL encoding and solutions to garbled text in GET and POST submissions

Source: Internet
Author: User

1. What is URL encoding

URL encoding is the format a browser uses to package form input. The browser collects every name and its corresponding value from the form, encodes them as name/value pairs, and sends them to the server either as part of the URL or separately.


2. URL encoding rules

Name/value pairs are separated from each other by &, and within each pair the name is separated from the value by =. If the user does not enter a value, the name still appears, but with no value.

A character is URL-encoded by writing % followed by the hexadecimal value of its ASCII code. For example, the URL encoding of \ (whose hexadecimal value is 5C) is %5C.
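
The two rules above can be tried out with a short Java sketch. It assumes Java 10 or later (for the URLEncoder.encode overload that takes a Charset) and assumes UTF-8 as the encoding; the class and method names are hypothetical and exist only for illustration. Note that java.net.URLEncoder produces the form-submission variant of URL encoding, so a space becomes + rather than %20.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class for illustration; not part of the original article.
class FormEncodingSketch {

    // Percent-encodes one name and one value, then joins them with '='.
    static String encodePair(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Joins all encoded pairs with '&', keeping the map's insertion order.
    static String encodeForm(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            joiner.add(encodePair(field.getKey(), field.getValue()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("path", "\\");   // a single backslash, ASCII 0x5C
        fields.put("comment", "");  // the user entered nothing
        System.out.println(encodeForm(fields)); // path=%5C&comment=
    }
}
```

The empty field still contributes its name and the = sign, which matches the rule that a name appears even when no value was entered.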


3. A brief introduction to garbled text and HTTP requests

Garbled text occurs frequently in web development. With the encoding rules above in mind, let's look at how it arises.

1) Garbled text is a common problem in web development. The main cause is that non-ASCII characters are used in the URL, which leads to garbled text when the server-side program parses it.

2) Chinese characters in a URL most commonly appear in the parameter values of the query string and in the servlet path.

3) The HTTP request process is as follows:



Step 1: The browser encodes the URL and sends it to the server;

Step 2: The server decodes the request, then encodes the response content and sends it to the client browser;

Step 3: The browser displays the web page according to the specified encoding.


4) Detailed analysis of how GET submissions are encoded, how the server decodes them, and solutions to garbled text

With GET, the request data is appended to the end of the URL as parameters. Garbled text arises easily this way, because the parameter names and values may well contain non-ASCII characters.

After the URL is assembled, the browser encodes it and then sends it to the server. For the specific rules, see the URL encoding rules above.

Here we look in detail at the problems that can occur during encoding. Two things need to be understood. First, the characters that need URL encoding are generally non-ASCII characters, so garbled text mainly arises from Chinese or special characters appended to the URL. Second, we need to know which character encoding is used to URL-encode those characters; this is in fact determined by the browser. Different browsers, and different settings of the same browser, affect the URL encoding. To avoid these inconsistencies, we can control the encoding uniformly in Java or JavaScript code.
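
To see why the choice of character encoding matters, the following minimal Java sketch encodes the same characters under two charsets. It assumes Java 10 or later; the class and method names are hypothetical, and the sample characters are chosen only because both UTF-8 and ISO-8859-1 are guaranteed to be available on every Java platform.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; shows that the %XY bytes depend on the charset.
class CharsetDependentEncoding {

    // URL-encodes text after converting it to bytes with the given charset.
    static String encodeWith(String text, Charset charset) {
        return URLEncoder.encode(text, charset);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // é
        String zhong = "\u4e2d";  // 中

        // The same character yields different percent-encoded bytes.
        System.out.println(encodeWith(eAcute, StandardCharsets.UTF_8));      // %C3%A9
        System.out.println(encodeWith(eAcute, StandardCharsets.ISO_8859_1)); // %E9
        System.out.println(encodeWith(zhong, StandardCharsets.UTF_8));       // %E4%B8%AD
    }
}
```

Fixing the charset explicitly in application code, as encodeWith does here, is one way to make the result independent of browser settings.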

Once URL encoding is complete, the URL consists only of characters in the ASCII range. It is then converted to binary using ISO-8859-1 and sent along with the request header.

When the request arrives, the server first decodes it using ISO-8859-1. What the server obtains is the request header, whose characters are all within the ASCII range. The request URL contains the parameter data; if that data includes Chinese or special characters, it arrives in the encoded %XY form (XY being the hexadecimal number from the encoding rules), and calling request.setCharacterEncoding() has no effect on it. This reveals the root cause of the garbled text: the client generally encodes the data with UTF-8, GBK, or a similar encoding, but the server decodes it with ISO-8859-1, which obviously does not work.
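
The mismatch can be reproduced outside a server with a small Java sketch. It assumes Java 10 or later and a client that encoded with UTF-8; the class and method names are hypothetical, and URLDecoder stands in for whatever decoding the server performs internally.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; simulates a server decoding with the wrong charset.
class MismatchedDecoding {

    // Turns %XY sequences back into bytes and interprets them with the charset.
    static String decodeWith(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String onTheWire = "%E4%B8%AD"; // 中 as sent by a UTF-8 client

        String garbled = decodeWith(onTheWire, StandardCharsets.ISO_8859_1);
        String correct = decodeWith(onTheWire, StandardCharsets.UTF_8);

        System.out.println(garbled.length());          // 3, one char per byte
        System.out.println(correct.length());          // 1
        System.out.println(correct.equals("\u4e2d"));  // true
    }
}
```

Decoding with ISO-8859-1 maps each of the three bytes to its own character, which is exactly the garbled text the user sees.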

There are two solutions.

One is to use the getBytes method of the String class to convert the encoding. The Java code is:

new String(request.getParameter("name").getBytes("ISO-8859-1"), "client encoding")
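
The conversion can be checked without a servlet container. The sketch below assumes the server decoded UTF-8 bytes as ISO-8859-1; the class name and the recover method are hypothetical, and the hard-coded byte array stands in for the value that request.getParameter would have returned.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; mirrors the getBytes/new String round trip.
class ReDecodeSketch {

    // Recovers the original bytes via ISO-8859-1, then decodes them
    // with the charset the client actually used.
    static String recover(String garbled, Charset clientCharset) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, clientCharset);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of 中, wrongly decoded as ISO-8859-1 (assumed scenario).
        byte[] utf8Bytes = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD};
        String garbled = new String(utf8Bytes, StandardCharsets.ISO_8859_1);

        String fixed = recover(garbled, StandardCharsets.UTF_8);
        System.out.println(fixed.equals("\u4e2d")); // true
    }
}
```

This works because ISO-8859-1 maps every byte value to exactly one character, so the original bytes survive the wrong decoding. If the server had decoded with some other charset, the round trip would not be guaranteed to recover them.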


The other is to modify the connector configuration in the server's XML configuration file (for example, server.xml in Tomcat):

<Connector port="8080" protocol="HTTP/1.1" maxThreads="150" connectionTimeout="20000"
           redirectPort="8443" URIEncoding="client encoding"/>

 

5) Detailed analysis of POST submission encoding, server decoding, and solutions to garbled text

With POST, the name/value pairs in the form are sent to the server in the request body. In this case, the browser encodes them according to the charset declared in the web page's Content-Type (for example, "text/html; charset=GBK") and then sends them to the server.

In the server-side program, we can call request.setCharacterEncoding() to set the encoding, and then call request.getParameter() to obtain the correct data.

If garbled text appears here, it can be resolved directly through request.setCharacterEncoding().
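
Conceptually, setting the request encoding tells the server which charset to use when it turns the %XY sequences in the form body back into characters. The plain Java sketch below imitates that step without the servlet API; it assumes Java 10 or later and a UTF-8 page, and parseFormBody is a hypothetical helper, not how any particular server implements it.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; decodes a form body with a caller-chosen charset.
class FormBodyDecodingSketch {

    // Splits the body on '&' and '=', decoding each part with the charset.
    // A repeated name overwrites the earlier value in this simplified version.
    static Map<String, String> parseFormBody(String body, Charset charset) {
        Map<String, String> params = new LinkedHashMap<>();
        if (body.isEmpty()) {
            return params;
        }
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            String rawName = eq >= 0 ? pair.substring(0, eq) : pair;
            String rawValue = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(rawName, charset),
                       URLDecoder.decode(rawValue, charset));
        }
        return params;
    }

    public static void main(String[] args) {
        // Body as a browser would send it from a UTF-8 page (assumed).
        String body = "name=%E4%B8%AD%E6%96%87&comment=";
        Map<String, String> params = parseFormBody(body, StandardCharsets.UTF_8);

        System.out.println(params.get("name").equals("\u4e2d\u6587")); // true
        System.out.println(params.get("comment").isEmpty());           // true
    }
}
```

Passing the same charset that the page declared is what makes the decoded values come out correctly; passing a different one reproduces the garbled text.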
