This chapter gives a brief summary of how encoding and decoding work in Java and of the problems that arise when transcoding Chinese text.
Contents
1 Basics of Coding
ISO-8859-1 encoding
GB2312
GBK
UTF-8
2 Web System conversion encoding
Principle
Servlet Network transfer encoding
Struts2 encoding control
Spring encoding control
3 String to byte transcoding
4 Byte to string transcoding
1 Basics of Coding
ISO-8859-1 encoding
ISO-8859-1 is a single-byte encoding that is backward compatible with ASCII. Its code range is 0x00-0xFF: 0x00-0x7F is identical to ASCII, 0x80-0x9F holds control characters, and 0xA0-0xFF holds text symbols.
Because it is single-byte, each byte corresponds to exactly one character code, so it cannot encode Chinese characters.
GB2312
Can encode Chinese characters; each Chinese character is encoded with 2 bytes.
GBK
1) Can encode Chinese characters; each Chinese character is encoded with 2 bytes
2) Encodes more Chinese characters than GB2312
UTF-8
1) Can encode Chinese characters; a commonly used Chinese character is encoded with 3 bytes
2) Its range covers Chinese characters, letters, and special symbols; text can be converted between GBK and UTF-8 as long as every character exists in both
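The byte counts listed above can be checked directly. The following is a minimal sketch, assuming the running JDK ships the GB2312 and GBK charsets (standard JDK builds do, but only UTF-8 and ISO-8859-1 are guaranteed); the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    // Number of bytes the given text occupies in the named charset.
    static int byteLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    // Whether ISO-8859-1 has a code for the given character.
    static boolean latin1CanEncode(char c) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        // U+6C49 is a common Chinese character, written as an escape
        // so that the encoding of the source file does not matter.
        String han = "\u6C49";
        for (String name : new String[] {"GB2312", "GBK", "UTF-8"}) {
            System.out.println(name + ": " + byteLength(han, name) + " byte(s)");
        }
        System.out.println("ISO-8859-1 can encode it: " + latin1CanEncode(han.charAt(0)));
    }
}
```

The sample character takes 2 bytes in GB2312 and GBK and 3 bytes in UTF-8, while the ISO-8859-1 encoder reports that it has no code for it.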
2 Web System conversion encoding
Principle
We analyze the client-server pattern; for example, the browser is the client and the web server is the server.
This is the encoding and decoding process: the client encodes the string into bytes using ISO-8859-1, UTF-8, GBK, or another encoding; the default is ISO-8859-1.
No characters may be lost during the conversion to bytes, and the server must decode with the same encoding the sender used; otherwise the text appears garbled.
Typically, the server decides which encoding to use and then tells the client to use it as well.
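The principle can be illustrated without any web server. The sketch below assumes a hypothetical transfer method that stands in for the network: it encodes with the client's charset and decodes with the server's. The sample text and all names are chosen for illustration only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatch {
    // Encode on the "client" with one charset, decode on the "server" with another.
    static String transfer(String text, Charset clientCharset, Charset serverCharset) {
        byte[] wire = text.getBytes(clientCharset); // bytes that travel over the network
        return new String(wire, serverCharset);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587"; // two Chinese characters, as Unicode escapes
        Charset gbk = Charset.forName("GBK");

        String matched = transfer(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatched = transfer(original, StandardCharsets.UTF_8, gbk);

        System.out.println("same charset on both sides intact: " + original.equals(matched));
        System.out.println("different charsets intact: " + original.equals(mismatched));
    }
}
```

Only the run that uses the same charset on both sides returns the original string; the mismatched run yields garbled text.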
Servlet Network transfer encoding
Receiving a browser POST request
In the case of JSP, the server sends the HTML generated by the JSP to the client.
Set the browser's encoding and decoding method to UTF-8.
For example:
<%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" language="java" %>
Server-side decoding mode 1:
String name = new String(request.getParameter("name").getBytes("ISO-8859-1"), "UTF-8");
Server-side decoding mode 2 (must be called before any request parameter is read):
request.setCharacterEncoding("UTF-8");
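Decoding mode 1 works because ISO-8859-1 maps every byte value to exactly one character, so getBytes("ISO-8859-1") hands back the original bytes unchanged and they can then be decoded properly. The sketch below simulates this without the servlet API; decodeAsLatin1 is a hypothetical stand-in for what the container does by default, and repair mirrors mode 1.

```java
import java.nio.charset.StandardCharsets;

class Latin1Repair {
    // Stand-in for a container that decodes the request bytes as ISO-8859-1.
    static String decodeAsLatin1(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    // Decoding mode 1: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "\u4F60\u597D"; // two Chinese characters, as Unicode escapes
        byte[] body = sent.getBytes(StandardCharsets.UTF_8); // what the browser sends

        String garbled = decodeAsLatin1(body);
        System.out.println("garbled length: " + garbled.length()); // one char per byte
        System.out.println("repaired equals sent: " + sent.equals(repair(garbled)));
    }
}
```

This trick is only safe when the wrong decoding was ISO-8859-1; other charsets can drop bytes they consider invalid.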
Receiving a browser GET request
For example: http://localhost:8888/webtest/EncodeServlet?name=Hello
The browser URL-encodes the URL, using UTF-8 to produce the bytes.
Server-side decoding method:
String name = new String(request.getParameter("name").getBytes("ISO-8859-1"), "UTF-8");
Calling request.setCharacterEncoding("UTF-8") does not work here: a GET request appends its parameters to the URL in URL-encoded form, and the web container decodes the URL before calling the servlet, using ISO-8859-1 as its default decoding method (this default is container-specific and can usually be configured). setCharacterEncoding only affects the request body.
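The GET case can be reproduced with java.net.URLEncoder and java.net.URLDecoder. This is a rough sketch under the assumptions stated in the text (browser encodes as UTF-8, container decodes as ISO-8859-1); the parameter value is a hypothetical two-character Chinese string, and the method names are illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class GetParameterDecoding {
    // What the browser does: percent-encode the UTF-8 bytes of the value.
    static String browserEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // What the container is assumed to do: percent-decode as ISO-8859-1.
    static String containerDecode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // ISO-8859-1 is always available
        }
    }

    // The server-side fix from the text.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String value = "\u4F60\u597D";
        String encoded = browserEncode(value);
        System.out.println("in the URL: name=" + encoded);
        String seenByServlet = containerDecode(encoded);
        System.out.println("repaired equals original: " + value.equals(repair(seenByServlet)));
    }
}
```

Each Chinese character becomes three percent-encoded bytes in the URL, which is why the servlet sees six garbled characters until the repair step is applied.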
Responding to a browser
Setting the response encoding:
The response encoding is the encoding used to turn characters into bytes when responding to the client; the default is ISO-8859-1.
It can be checked as follows:
response.getCharacterEncoding();
To set how the response stream is encoded:
response.setCharacterEncoding("UTF-8");
To set the browser's encoding and decoding method:
response.setContentType("text/html;charset=utf-8");
JSP settings:
<%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" language="java" %>
pageEncoding: sets the encoding in which the JSP file is stored
charset inside contentType: sets the encoding the browser uses, which decodes when parsing a response and encodes when sending a request
Keep the response stream encoding and the browser decoding the same so that the text is not garbled.
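To see why the charset in the content type matters, here is a simplified sketch of a client choosing its decoder from a Content-Type value. It is not how any particular browser is implemented; the helper names are hypothetical, quoted charset values are not handled, and falling back to ISO-8859-1 is an assumption that follows the default mentioned in the text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharset {
    // Extract the charset parameter from a value such as "text/html;charset=utf-8".
    static Charset charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return Charset.forName(p.substring("charset=".length()).trim());
            }
        }
        return StandardCharsets.ISO_8859_1; // assumed fallback when no charset is given
    }

    // Decode the response bytes the way the header says they were encoded.
    static String decodeBody(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType));
    }

    public static void main(String[] args) {
        String page = "\u4F60\u597D";
        byte[] body = page.getBytes(StandardCharsets.UTF_8); // response stream encoded as UTF-8

        System.out.println(page.equals(decodeBody(body, "text/html;charset=utf-8")));
        System.out.println(page.equals(decodeBody(body, "text/html")));
    }
}
```

The first check succeeds because the declared charset matches the stream encoding; the second fails because the client falls back to a different decoder.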
HttpClient Setting the Encoding
Struts2 encoding control
Configure the following in struts.xml:
<constant name="struts.i18n.encoding" value="UTF-8"></constant>
Spring encoding control
Configure the following in web.xml:
<filter>
  <filter-name>encodingFilter</filter-name>
  <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  <init-param>
    <param-name>forceEncoding</param-name>
    <param-value>true</param-value>
  </init-param>
</filter>
encoding sets the encoding the server uses to encode and decode.
forceEncoding indicates whether this encoding is enforced, overriding an encoding that has already been specified.
3 String to byte transcoding
String s = "s汉";
byte[] bytes1 = s.getBytes("ISO-8859-1"); // the Chinese character is lost
byte[] bytes2 = s.getBytes("GBK");
byte[] bytes3 = s.getBytes("UTF-8");
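Printing the bytes makes the loss visible. The following sketch dumps the encoded form of a sample string that mixes one ASCII letter with one Chinese character (an assumed sample value); the hex helper is illustrative, and GBK availability depends on the JDK build.

```java
import java.nio.charset.Charset;

class StringToBytes {
    // Encode the text in the named charset and format the bytes as hex pairs.
    static String hex(String text, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "s\u6C49"; // the letter s followed by one Chinese character
        for (String name : new String[] {"ISO-8859-1", "GBK", "UTF-8"}) {
            System.out.println(name + ": " + hex(s, name));
        }
    }
}
```

With ISO-8859-1 the second byte is 3F, the code for a question mark: the encoder substituted it because it has no code for the Chinese character, and the original character cannot be recovered from it.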
4 Byte to string transcoding
String s1 = new String(bytes1, "UTF-8"); // the Chinese character was already lost when encoding
String s2 = new String(bytes2, "GBK");
String s3 = new String(bytes3, "UTF-8");
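Loss can also happen on the decoding side. As a small sketch (names and sample value are assumptions), decoding GBK bytes with the UTF-8 decoder replaces the bytes it cannot interpret with U+FFFD, and re-encoding afterwards does not bring the original bytes back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WrongDecoder {
    static final Charset GBK = Charset.forName("GBK");

    // Encode as GBK, then decode with the wrong decoder (UTF-8).
    static String decodeGbkBytesAsUtf8(String text) {
        return new String(text.getBytes(GBK), StandardCharsets.UTF_8);
    }

    // Try to undo the mistake by reversing the two steps.
    static String tryToUndo(String wrong) {
        return new String(wrong.getBytes(StandardCharsets.UTF_8), GBK);
    }

    public static void main(String[] args) {
        String original = "s\u6C49";
        String wrong = decodeGbkBytesAsUtf8(original);
        System.out.println("contains U+FFFD: " + (wrong.indexOf('\uFFFD') >= 0));
        System.out.println("recovered: " + original.equals(tryToUndo(wrong)));
    }
}
```

Unlike the ISO-8859-1 round trip, this mistake is not reversible, so bytes should be decoded with the charset they were encoded in from the start.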
Summary of Java encoding (Chinese transcoding)