This kid explains the nature of garbled characters in JSP!

Source: Internet
Author: User

The reprinted content is as follows:

 

By default, Tomcat treats everything as ISO-8859-1. No matter what encoding your page uses, Tomcat ends up decoding all submitted characters as ISO-8859-1. Then, when the target page interprets the text as GBK, those wrongly decoded characters are taken for GBK text, and the output is garbled.

So we first need to take the received "characters" (whatever they look like) and turn them back into a byte array by encoding them with ISO-8859-1, which recovers the bytes as they were before Tomcat decoded them. For example, "AB" becomes [65, 66]. Then decode that array with GBK to get a readable string.
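The step just described can be sketched in plain Java. This is only an illustration: the class name `MojibakeFix` and the method `recover` are made up for this example, and the garbled string is simulated by decoding GBK bytes as ISO-8859-1 rather than by a real Tomcat request.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeFix {
    // Hypothetical helper: undo a wrong ISO-8859-1 decode.
    // ISO-8859-1 maps every byte value to exactly one character,
    // so encoding with it gives back the original bytes unchanged.
    static String recover(String garbled, Charset actual) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String original = "\u4f60\u597d"; // two Chinese characters, written as escapes
        // Simulate the server decoding the GBK bytes as ISO-8859-1.
        String garbled = new String(original.getBytes(gbk), StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(original));               // false
        System.out.println(recover(garbled, gbk).equals(original)); // true
    }
}
```

The trick only works because the wrong decode was ISO-8859-1, which loses no bytes. If the server had decoded with a character set that replaces unknown byte sequences, the original bytes could not be recovered this way.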

This gives us the following conversion process.
Assume the GBK character "you" (你) is submitted -> URL encoding turns it into %C4%E3 -> Tomcat automatically decodes that as ISO-8859-1 for you -> you get the two characters U+00C4 and U+00E3 (each byte is taken as one ISO-8859-1 character) -> the receiving page -> convert the string back to a byte array with ISO-8859-1, giving [0xC4, 0xE3] -> decode those bytes with GBK -> readable text, "you" (你).
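The whole chain can be traced with the standard `URLEncoder` and `URLDecoder` classes. This is a sketch under assumptions: `UrlRoundTrip` and its three methods are hypothetical names, and `decodeAsLatin1` merely stands in for what the server does; it is not Tomcat's actual code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UrlRoundTrip {
    // What a GBK page submits: percent-escapes of the GBK bytes.
    static String submitAsGbk(String text) {
        try {
            return URLEncoder.encode(text, "GBK");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for the server decoding the escapes as ISO-8859-1.
    static String decodeAsLatin1(String encoded) {
        try {
            return URLDecoder.decode(encoded, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // The repair done on the receiving page.
    static String repair(String garbled) {
        try {
            return new String(garbled.getBytes("ISO-8859-1"), "GBK");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sent = submitAsGbk("\u4f60");     // the character "you"
        String garbled = decodeAsLatin1(sent);   // two Latin-1 characters
        System.out.println(sent);                // %C4%E3
        System.out.println(garbled.length());    // 2
        System.out.println(repair(garbled).equals("\u4f60")); // true
    }
}
```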

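Apart from Unicode (UTF-16, for example), the code values defined by different character sets overlap: the same number means different characters in different character sets.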

For example, suppose the Chinese character "I" (我) has the value 22530 (I did not check the actual number).
In a Japanese character set, 22530 might stand for a completely different character (also assumed), and in a Korean one for yet another.
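To see the overlap in practice, the same two bytes can be decoded with several character sets. The sketch below assumes the JDK in use ships the GBK, Shift_JIS and EUC-KR charsets (a standard full JDK does); the class name `SameBytesDifferentText` is made up for illustration. It prints code points rather than the characters themselves, so the console encoding does not get in the way.

```java
import java.nio.charset.Charset;

class SameBytesDifferentText {
    // Decode the given bytes using the named character set.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The GBK bytes of the Chinese character "I" (U+6211).
        byte[] bytes = "\u6211".getBytes(Charset.forName("GBK"));
        for (String name : new String[] {"GBK", "Shift_JIS", "EUC-KR"}) {
            String text = decode(bytes, name);
            StringBuilder codePoints = new StringBuilder();
            for (int i = 0; i < text.length(); i++) {
                codePoints.append(" U+").append(Integer.toHexString(text.charAt(i)));
            }
            // Only the GBK line shows U+6211; the others show different text.
            System.out.println(name + " ->" + codePoints);
        }
    }
}
```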

A value that large cannot be sent over the network as it is, because the lowest layer of the network only deals in unsigned char values, which correspond to byte in Java (although Java's byte is signed), so
the int 22530 must be converted to a byte array:

bytes[0] = (byte) ((22530 >> 8) & 0xff); // high byte
bytes[1] = (byte) (22530 & 0xff);        // low byte
I did not calculate the actual values. Assume the result is the bytes [125, 231].
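For reference, here is a runnable sketch of that split and its inverse. The names `IntToBytes`, `toTwoBytes` and `fromTwoBytes` are hypothetical. Working the arithmetic through, 22530 is 88 * 256 + 2, so it actually splits into [88, 2]; the [125, 231] used in the text is just the author's stand-in value.

```java
class IntToBytes {
    // Split a value below 65536 into a high byte and a low byte.
    static byte[] toTwoBytes(int value) {
        return new byte[] {
            (byte) ((value >> 8) & 0xff), // high byte
            (byte) (value & 0xff)         // low byte
        };
    }

    // Rebuild the value. Java bytes are signed, so mask with 0xff
    // to read each one as an unsigned 0..255 value.
    static int fromTwoBytes(byte[] b) {
        return ((b[0] & 0xff) << 8) | (b[1] & 0xff);
    }

    public static void main(String[] args) {
        byte[] b = toTwoBytes(22530);
        System.out.println(b[0] + ", " + b[1]); // 88, 2
        System.out.println(fromTwoBytes(b));    // 22530
    }
}
```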

When these bytes arrive at the server, do they mean the Chinese "I", some Japanese character, or something else entirely?
Normally the communication protocol states the character set. For example, an HTTP request tells the server:
Content-Type: xxxxxxxxxx; charset=GBK
The server then knows that the received [125, 231] is the GBK "I" and not some other text.
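How a receiver might act on that header can be sketched as follows. This is an assumption-laden illustration, not how any particular server does it: `ContentTypeCharset` and `charsetOf` are invented names, the parsing is deliberately simplistic, and the fallback is passed in by the caller for the case where the header names no character set.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {
    // Hypothetical helper: pick the charset named in a Content-Type
    // value, or return the fallback when none is given.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" parameter.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                // Throws an unchecked exception for an illegal or unsupported name.
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        System.out.println(
            charsetOf("application/x-www-form-urlencoded; charset=GBK", latin1)); // GBK
        System.out.println(charsetOf("text/html", latin1)); // ISO-8859-1
    }
}
```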

That is how communication is supposed to work. However, if a careless programmer does not tell the server the character set when submitting a request, the server has no way of knowing it.
It has to guess, falling back on the most common character set as the default.

That is still not so bad. The worst case is when the programmer who wrote the server lacks the necessary knowledge. Take the programmer who wrote the older versions of Tomcat: born in the West, he assumed that everyone in the world uses 26 letters and a few symbols, so no matter what the client submits, he decodes it as ISO-8859-1. The results can be imagined.

There is nothing to be done about it; those of us who use GBK did not write Tomcat. So first use ISO-8859-1 to turn the string that the bad programmer generated incorrectly back into
[125, 231], and then use GBK to generate the correct string.
