Codec (encoding and decoding) common sense

Source: Internet
Author: User

Character encoding

Character encoding maps the characters of a character set to binary numbers. Common character encodings include ISO-8859-1 (which does not support Chinese), GB2312, GBK, and UTF-8. In Java web development, the situations where you most often need to encode or decode are response encoding, request encoding, and URL encoding:
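To make the mapping concrete, here is a minimal sketch that encodes the same two Chinese characters with different charsets. The class and method names (CharsetDemo, byteLength, canEncode) are invented for illustration, and the GBK line is guarded because that charset is not guaranteed to be present on every Java runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Number of bytes the text occupies when encoded with the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // True when every character of the text can be represented in the charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Unicode escapes for the two characters meaning "Chinese",
        // so the source file itself stays pure ASCII.
        String text = "\u4e2d\u6587";

        System.out.println("UTF-8 bytes: " + byteLength(text, StandardCharsets.UTF_8));
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK bytes: " + byteLength(text, Charset.forName("GBK")));
        }
        System.out.println("ISO-8859-1 can encode: "
                + canEncode(text, StandardCharsets.ISO_8859_1));
    }
}
```

UTF-8 needs three bytes for each of these characters, while ISO-8859-1 has no byte value for them at all, which is the root of the garbling discussed in the following sections.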

Response encoding

The server sends data to the client through the response object. If the response data is a byte stream, encoding is not a concern. If it is a character stream, you must consider encoding:

response.getWriter() sends data using ISO-8859-1 by default. That character set does not support Chinese, so Chinese text will be garbled.

To send Chinese, set the encoding first:

response.setCharacterEncoding("UTF-8");
response.getWriter() ...

Because UTF-8 is set before getWriter() is called, the output characters are UTF-8 encoded. However, the client has not been told which encoding to use to read the response, so the encoding must also be declared in the response header (Content-Type):

response.setContentType("text/html;charset=utf-8");
response.getWriter() ...

Note: this call not only adds the encoding information to the response header, it also has the same effect as calling response.setCharacterEncoding("UTF-8"), so a separate call is unnecessary.
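The effect of the writer's encoding can be reproduced without a servlet container. The sketch below is an approximation, not the servlet API: it uses a plain OutputStreamWriter over an in-memory stream as a stand-in for response.getWriter(), and the names WriterEncodingDemo and write are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WriterEncodingDemo {
    // Writes the text through a Writer bound to the given charset and returns
    // the raw bytes, roughly what a response writer would put on the wire.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters

        // ISO-8859-1 cannot represent them, so each one is replaced by '?'.
        byte[] latin1 = write(text, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // prints "??"

        // UTF-8 keeps them intact: three bytes per character.
        byte[] utf8 = write(text, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // prints 6
    }
}
```

With ISO-8859-1 the characters are lost before the bytes ever leave the server, so no setting on the client can recover them. That is why the encoding has to be set before the writer is obtained.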

Request encoding

1. Browser address bar encoding

When you type character data into the browser address bar, the browser encodes it before sending it to the server, so the encoding of Chinese typed into the address bar is determined by the browser. The values below reflect the browsers the author observed; current browsers generally encode address bar input as UTF-8.

Browser       Encoding
IE/Firefox    GB2312
Chrome        UTF-8

2. Page requests

If you send data to the server through a hyperlink or form on a page, the encoding is determined by the encoding of the current page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

3. GET

When a client sends a GET request, the server decodes the parameters with ISO-8859-1 no matter how the client encoded them (Tomcat 8.x uses UTF-8 instead). You therefore need to read the value with request.getParameter() and then convert it to the correct encoding:

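// Imports needed by the enclosing class:
import java.io.UnsupportedEncodingException;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.http.HttpServletRequest;

private Map<String, String> convertToParameterMap(HttpServletRequest request) throws UnsupportedEncodingException {
    Enumeration<String> names = request.getParameterNames();
    Map<String, String> parameters = new HashMap<String, String>();
    if (names != null) {
        while (names.hasMoreElements()) {
            String name = names.nextElement();
            String value = request.getParameter(name);
            // Recover the original bytes from the ISO-8859-1 string, then decode them as UTF-8.
            parameters.put(name, new String(value.getBytes("ISO-8859-1"), "UTF-8"));
        }
    }
    return parameters;
}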
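The conversion in the method above works because ISO-8859-1 maps every byte value to exactly one character, so the original bytes can always be recovered. The following standalone sketch shows the round trip without a servlet container; RedecodeDemo, misdecode, and repair are names made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

class RedecodeDemo {
    // Simulates a server that decodes UTF-8 bytes as ISO-8859-1.
    static String misdecode(String original) {
        byte[] sentByClient = original.getBytes(StandardCharsets.UTF_8);
        return new String(sentByClient, StandardCharsets.ISO_8859_1);
    }

    // Recovers the raw bytes from the garbled string and decodes them as UTF-8.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        String garbled = misdecode(original);

        System.out.println(garbled.length());                  // prints 6: one char per byte
        System.out.println(repair(garbled).equals(original));  // prints true
    }
}
```

The repair is only valid when the server really did decode with ISO-8859-1. Applying it to a value that was already decoded correctly (for example on Tomcat 8.x) corrupts the text instead.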

4. POST

When a client sends a POST request, the server also decodes with ISO-8859-1 by default. Because POST data travels in the request body, the request body encoding can be specified with request.setCharacterEncoding(), which must be called before any parameter is read:

// Requires java.io.IOException, java.util.Enumeration, java.util.HashMap, java.util.Map,
// and javax.servlet.http.HttpServletRequest.
private Map<String, String> convertToParameterMap(HttpServletRequest request) throws IOException {
    Map<String, String> parameters = new HashMap<String, String>();
    if ("POST".equals(request.getMethod())) {
        // POST: tell the container how to decode the request body before reading parameters.
        request.setCharacterEncoding("UTF-8");
        Enumeration<String> names = request.getParameterNames();
        while (names.hasMoreElements()) {
            String key = names.nextElement();
            parameters.put(key, request.getParameter(key));
        }
    } else {
        // GET: the container has already decoded with ISO-8859-1, so re-decode as UTF-8.
        Enumeration<String> names = request.getParameterNames();
        while (names.hasMoreElements()) {
            String key = names.nextElement();
            String value = request.getParameter(key);
            parameters.put(key, new String(value.getBytes("ISO-8859-1"), "UTF-8"));
        }
    }
    return parameters;
}
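To see why the body encoding matters, the sketch below decodes a form-encoded request body by hand with a chosen charset. It only approximates what a container does after request.setCharacterEncoding() and is not Tomcat's actual implementation; FormBodyDemo and parse are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodyDemo {
    // Parses an application/x-www-form-urlencoded body such as "a=1&b=2",
    // decoding names and values with the given charset name.
    static Map<String, String> parse(String body, String charsetName) {
        Map<String, String> parameters = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments, e.g. an empty body
            }
            int eq = pair.indexOf('=');
            String rawName = eq < 0 ? pair : pair.substring(0, eq);
            String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
            try {
                parameters.put(URLDecoder.decode(rawName, charsetName),
                        URLDecoder.decode(rawValue, charsetName));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
            }
        }
        return parameters;
    }

    public static void main(String[] args) {
        // The value is two Chinese characters, percent-encoded as UTF-8 bytes.
        String body = "name=%E4%B8%AD%E6%96%87&id=7";

        System.out.println(parse(body, "UTF-8").get("name").length());      // prints 2
        System.out.println(parse(body, "ISO-8859-1").get("name").length()); // prints 6 (garbled)
    }
}
```

The same bytes yield two characters under UTF-8 and six garbled characters under ISO-8859-1, which mirrors the difference between calling and not calling request.setCharacterEncoding("UTF-8").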

URL encoding

The network standard RFC 1738 stipulates:

"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."

In other words, only letters, digits, a few special symbols, and reserved characters serving their reserved purpose can appear in a URL without encoding.

If a URL contains Chinese characters, they must be encoded before use. The URL encoding process is simple:

First choose a character encoding and encode the string with it to obtain a byte[]. Then, for each byte, add 256 if the byte is negative (Java bytes are signed), convert the result to hexadecimal, and put a % in front of it.
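The steps above can be sketched as follows. For simplicity this version percent-encodes every byte, including ASCII letters that a real encoder would leave untouched. PercentEncodeDemo and percentEncode are names chosen for this example only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PercentEncodeDemo {
    // Percent-encodes every byte of the text in the given charset.
    static String percentEncode(String text, Charset charset) {
        StringBuilder encoded = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            // Java bytes are signed (-128..127); shift negatives into 0..255.
            int unsigned = b < 0 ? b + 256 : b;
            // '%' followed by two uppercase hexadecimal digits.
            encoded.append(String.format("%%%02X", unsigned));
        }
        return encoded.toString();
    }

    public static void main(String[] args) {
        // U+4E2D is encoded in UTF-8 as the bytes E4 B8 AD.
        System.out.println(percentEncode("\u4e2d", StandardCharsets.UTF_8)); // prints %E4%B8%AD
    }
}
```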

This process is already packaged in the Java standard library and can be used directly:

URLEncoder
    static String encode(String s, String enc)
    Translates a string into application/x-www-form-urlencoded format using a specific encoding scheme.

URLDecoder
    static String decode(String s, String enc)
    Decodes an application/x-www-form-urlencoded string using a specific encoding scheme.
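A short usage sketch of these two methods follows. The wrapper class UrlCodecDemo and its helper methods are illustrative only; the wrappers exist just to turn the checked UnsupportedEncodingException into an unchecked one.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UrlCodecDemo {
    // Encodes a string as application/x-www-form-urlencoded using UTF-8.
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Decodes an application/x-www-form-urlencoded string using UTF-8.
    static String decode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587 a"; // two Chinese characters, a space, and 'a'
        String encoded = encode(original);

        System.out.println(encoded);                           // prints %E4%B8%AD%E6%96%87+a
        System.out.println(decode(encoded).equals(original));  // prints true
    }
}
```

URLEncoder produces form encoding, so a space becomes + rather than %20. Keep that in mind when the result is placed in a URL path instead of a query string or form body.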

Note: in a web application, the Tomcat container recognizes URL-encoded request data and decodes it automatically.
