Summary of Chinese transcoding issues
1. Encoding basics
1.1
ISO-8859-1
ISO-8859-1 is a single-byte encoding that is backward compatible with ASCII. Its range is 0x00-0xFF: 0x00-0x7F is identical to ASCII, 0x80-0x9F holds control characters, and 0xA0-0xFF holds text symbols.
Single-byte means that each character is encoded as exactly one byte, so ISO-8859-1 cannot encode Chinese characters.
1.2
GBK
1) Chinese characters can be encoded. One Chinese character is encoded in two bytes.
2) It is a superset of GB2312 and encodes more Chinese characters than GB2312.
1.3
GB2312
It can encode Chinese characters. One Chinese character is encoded in two bytes.
1.4
UTF-8
It can encode Chinese characters. Most Chinese characters are encoded in three bytes.
Text made up of Chinese characters, letters, and special characters can be converted between GBK and UTF-8 in either direction.
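The byte counts listed above can be checked with a few lines of Java. This is only a minimal sketch: the class name `EncodingBasics` and its helper methods are made up for illustration, and it assumes the JDK in use provides the GBK and GB2312 charsets (standard JDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBasics {
    // GBK is looked up by name; assumed to be available in the JDK in use.
    static final Charset GBK = Charset.forName("GBK");

    /** Number of bytes the given text occupies in the given charset. */
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    /** Converts GBK bytes to UTF-8 bytes by decoding to a String first. */
    static byte[] gbkToUtf8(byte[] gbkBytes) {
        String text = new String(gbkBytes, GBK);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+6C49 is the Chinese character "han", written as an escape
        // so the example does not depend on the source file encoding.
        String han = "\u6c49";
        // ISO-8859-1 cannot represent it and substitutes a single '?'.
        System.out.println("ISO-8859-1: " + byteLength(han, StandardCharsets.ISO_8859_1)); // 1
        System.out.println("GBK:        " + byteLength(han, GBK));                         // 2
        System.out.println("GB2312:     " + byteLength(han, Charset.forName("GB2312")));   // 2
        System.out.println("UTF-8:      " + byteLength(han, StandardCharsets.UTF_8));      // 3
        System.out.println("GBK -> UTF-8 length: " + gbkToUtf8(han.getBytes(GBK)).length); // 3
    }
}
```

Note that the conversion between GBK and UTF-8 always goes through a `String`: the bytes are decoded with the source charset and then encoded again with the target charset.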
2. Transcoding in web systems
2.1
Principle
Network transmission always involves an encoding step and a decoding step.
The sender must encode the string into bytes before transmitting it.
The encoding can be UTF-8, GBK, and so on, but it must be able to represent every character so that nothing is lost when converting to bytes.
The receiver must decode with the same encoding the sender used. Otherwise, garbled characters may occur.
Generally, the server decides on an encoding and decoding method
and then informs the client of it.
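The principle can be tried out without any network. The following is a rough sketch in which `send` and `receive` are hypothetical helpers standing in for the two ends of a connection; it assumes the GBK charset is available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TransmissionDemo {
    /** Sender side: turn the string into bytes with the chosen charset. */
    static byte[] send(String message, Charset charset) {
        return message.getBytes(charset);
    }

    /** Receiver side: turn the received bytes back into a string. */
    static String receive(byte[] wire, Charset charset) {
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        String message = "\u4f60\u597d"; // two Chinese characters ("ni hao")
        byte[] wire = send(message, StandardCharsets.UTF_8);

        // Same charset on both sides: the text survives.
        String matched = receive(wire, StandardCharsets.UTF_8);
        // Different charset on the receiving side: garbled characters.
        String mismatched = receive(wire, Charset.forName("GBK"));

        System.out.println(message.equals(matched));    // true
        System.out.println(message.equals(mismatched)); // false
    }
}
```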
2.2
Network transmission encoding
2.2.1 Receiving browser POST requests
Set the browser's encoding and decoding to UTF-8,
for example:
<%@ page pageEncoding="utf-8" contentType="text/html; charset=utf-8" language="java"%>
Server decoding method 1:
String name = new String(request.getParameter("name").getBytes("ISO-8859-1"),"UTF-8");
Server decoding method 2:
request.setCharacterEncoding("UTF-8");
2.2.2 Receiving GET requests from browsers
For example:
http://localhost:8888/webtest/EncodeServlet?name=Hello
The browser URL-encodes the URL, using UTF-8.
Server decoding method:
String name = new String(request.getParameter("name").getBytes("ISO-8859-1"),"UTF-8");
Calling request.setCharacterEncoding("UTF-8") to set the decoding does not work here. A GET request places its parameters after the URL, where they are URL-encoded, and the web container decodes the URL before it calls the servlet. In older containers (for example Tomcat 7 and earlier) the default for that decoding is ISO-8859-1.
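Why the ISO-8859-1 round trip recovers the parameter can be shown without a servlet container. The sketch below is only an approximation: `browserEncode`, `containerDecode`, and `recover` are hypothetical helpers that imitate the browser, a container defaulting to ISO-8859-1, and the server-side fix.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GetParameterDemo {
    /** Imitates the browser: URL-encode the value using UTF-8. */
    static String browserEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    /** Imitates a container that decodes the query string as ISO-8859-1. */
    static String containerDecode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // ISO-8859-1 is always supported
        }
    }

    /** The server-side fix: get the raw bytes back, then decode them as UTF-8. */
    static String recover(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u6c49"; // one Chinese character
        String onTheWire = browserEncode(name);
        System.out.println(onTheWire); // %E6%B1%89

        String garbled = containerDecode(onTheWire);
        System.out.println(garbled.length()); // 3: one char per UTF-8 byte

        System.out.println(recover(garbled).equals(name)); // true
    }
}
```

The trick works because ISO-8859-1 maps every byte value to exactly one character, so the wrongly decoded string still carries the original UTF-8 bytes unchanged.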
2.2.3 Responding to the browser
Response:
The response character encoding is the encoding used to turn the response into bytes when it is sent to the client. The default is ISO-8859-1.
You can check the current value as follows:
response.getCharacterEncoding();
Set the encoding of the response stream:
response.setCharacterEncoding("UTF-8");
Set the encoding and decoding used by the browser:
response.setContentType("text/html; charset=UTF-8");
JSP settings:
<%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" language="java" %>
pageEncoding: sets the encoding in which the JSP file is stored.
charset in contentType: sets the encoding and decoding used on the browser side;
the browser decodes with it when parsing the response and encodes with it when sending a request.
The encoding of the response stream must match the browser's decoding method to avoid garbled characters.
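The need for the response stream encoding and the announced charset to agree can be simulated in plain Java. This is a container-free sketch under stated assumptions: `writeBody`, `charsetOf`, and `browserDecode` are hypothetical helpers, not servlet API, and the Content-Type parsing is deliberately simplistic.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ResponseEncodingDemo {
    /** Extracts the charset from a value such as "text/html; charset=UTF-8". */
    static Charset charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return Charset.forName(p.substring("charset=".length()).trim());
            }
        }
        return StandardCharsets.ISO_8859_1; // the default named in the text
    }

    /** Server side: write the body as bytes using the response encoding. */
    static byte[] writeBody(String body, Charset responseEncoding) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, responseEncoding)) {
            writer.write(body);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    /** Browser side: decode the body with the charset from Content-Type. */
    static String browserDecode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType));
    }

    public static void main(String[] args) {
        String page = "\u4f60\u597d";
        String contentType = "text/html; charset=UTF-8";

        // Stream encoding matches the announced charset: text survives.
        byte[] good = writeBody(page, StandardCharsets.UTF_8);
        System.out.println(browserDecode(good, contentType).equals(page)); // true

        // Stream left at the ISO-8859-1 default: the characters are lost.
        byte[] bad = writeBody(page, StandardCharsets.ISO_8859_1);
        System.out.println(browserDecode(bad, contentType).equals(page)); // false
    }
}
```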
2.2.4 HTTPClient Encoding
2.3
Struts
Configure struts.xml as follows:
<constant name="struts.i18n.encoding" value="UTF-8"></constant>
2.4
Spring encoding control
The configuration in web.xml is as follows:
<filter>
    <filter-name>encodingFilter</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>
encoding: sets the character encoding the server uses for encoding and decoding.
forceEncoding: forces this encoding to be used, even if the request already specifies one.
3. Converting a string to bytes
String s = "s汉";
byte[] bytes1 = s.getBytes("ISO-8859-1"); // the Chinese character is lost
byte[] bytes2 = s.getBytes("GBK");
byte[] bytes3 = s.getBytes("UTF-8");
4. Converting bytes to a string
String s1 = new String(bytes1, "UTF-8"); // the character was already lost when encoding
String s2 = new String(bytes2, "GBK");
String s3 = new String(bytes3, "UTF-8");
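To see what actually ends up in each byte array, the conversions above can be wrapped in a small runnable class that prints the bytes in hex. This is a minimal sketch: `StringBytesDemo`, `hex`, and `roundTrip` are illustrative names, the test string is assumed to be the letter s followed by the character U+6C49, and GBK is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringBytesDemo {
    /** Formats bytes as space-separated uppercase hex, e.g. "73 3F". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes with one charset and decodes the result with another. */
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String s = "s\u6c49"; // 's' followed by one Chinese character

        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // 73 3F
        System.out.println(hex(s.getBytes(gbk)));                         // 73 BA BA
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // 73 E6 B1 89

        // The ISO-8859-1 bytes hold a literal '?', so decoding cannot restore the text.
        System.out.println(roundTrip(s, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)); // s?
        System.out.println(roundTrip(s, gbk, gbk).equals(s));                                   // true
        System.out.println(roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(s)); // true
    }
}
```

The hex output shows that the loss happens at encoding time: the ISO-8859-1 array already contains 0x3F ('?') in place of the Chinese character, so no choice of decoder can bring it back.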