Encoding and decoding problems in Java

Source: Internet
Author: User
Tags java web
Encoding problems in Java
I. Encoding formats:
1. ASCII: 128 characters in total.
2. ISO-8859-1: 256 characters in total, covering most Western European languages.
3. GB2312: contains 682 symbols and 6,763 Chinese characters.
4. GBK: compatible with GB2312, contains 21,003 Chinese characters.
5. GB18030: the national standard "Chinese character coded character set for information interchange", compatible with GB2312.
6. UTF-16: a Unicode encoding that represents a character in two bytes (four bytes, as a surrogate pair, for characters outside the Basic Multilingual Plane). Java uses UTF-16 as the character storage format in memory.
7. UTF-8: a variable-length encoding in which different ranges of characters have different code lengths. A character takes 1 to 4 bytes in the current standard (the original design allowed up to 6).
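As a rough illustration of the sizes listed above, the following sketch prints how many bytes a few sample characters occupy in UTF-8 and UTF-16. It uses only charsets from StandardCharsets, which every Java platform provides. The class name EncodedLengthDemo, the helper encodedLength and the sample characters are chosen here for illustration and are not part of the original article.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengthDemo {

    /** Number of bytes the text occupies once encoded in the given charset. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "A";                     // U+0041
        String chinese = "\u4e2d";              // U+4E2D, a common Chinese character
        String supplementary = "\ud83d\ude00";  // U+1F600, outside the Basic Multilingual Plane

        for (String s : new String[] {ascii, chinese, supplementary}) {
            // UTF_16BE is used instead of UTF_16 so that no byte order mark is counted.
            System.out.printf("U+%X: UTF-8=%d bytes, UTF-16=%d bytes%n",
                    s.codePointAt(0),
                    encodedLength(s, StandardCharsets.UTF_8),
                    encodedLength(s, StandardCharsets.UTF_16BE));
        }
    }
}
```

The ASCII letter takes 1 byte in UTF-8 and 2 in UTF-16, the Chinese character takes 3 and 2, and the supplementary character takes 4 in both.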
II. Encoding scenarios in Java
1. Encoding in I/O operations
Writing to a file
String file = "C:/a.txt";
String charset = "UTF-8";
FileOutputStream os = new FileOutputStream(file);
OutputStreamWriter writer = new OutputStreamWriter(os, charset); // if no charset is set here, the platform default is used
try {
    writer.write("This is the saved Chinese character");
} finally {
    writer.close();
}


Reading a file
FileInputStream is = new FileInputStream(file);
InputStreamReader reader = new InputStreamReader(is, charset); // the charset must be the same as, or compatible with, the one used when writing, otherwise the text will be garbled
StringBuilder sb = new StringBuilder();
char[] buf = new char[64];
int count = 0;
try {
    while ((count = reader.read(buf)) != -1) { // read() returns -1 at end of stream
        sb.append(buf, 0, count);
    }
} finally {
    reader.close();
}
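The two snippets above can be combined into one self-contained sketch that writes text with one charset and reads it back with another. The class FileCharsetDemo and its roundTrip helper are hypothetical names introduced here. A temporary file replaces the fixed path, and try-with-resources replaces the explicit finally blocks.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileCharsetDemo {

    /** Writes text to a temp file with one charset and reads it back with another. */
    static String roundTrip(String text, Charset writeCharset, Charset readCharset) {
        try {
            Path file = Files.createTempFile("charset-demo", ".txt");
            try {
                try (Writer writer = new OutputStreamWriter(Files.newOutputStream(file), writeCharset)) {
                    writer.write(text);
                }
                StringBuilder sb = new StringBuilder();
                char[] buf = new char[64];
                try (Reader reader = new InputStreamReader(Files.newInputStream(file), readCharset)) {
                    int count;
                    while ((count = reader.read(buf)) != -1) {
                        sb.append(buf, 0, count);
                    }
                }
                return sb.toString();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters
        // Same charset on both sides: the text survives.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));
        // Different charsets: the text comes back garbled.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));
    }
}
```

Reading UTF-8 bytes as ISO-8859-1 does not raise an error, because every byte is a valid ISO-8859-1 character. The mismatch only shows up as garbled text.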


2. Encoding in memory
Conversion between a string and a byte array
String str = "This is a Chinese character";
byte[] b = str.getBytes("UTF-8"); // convert the string to a byte array; set the charset explicitly, otherwise the default is used
String n = new String(b, "UTF-8"); // decode with the same charset
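To make the point about the default value concrete, here is a minimal sketch that prints the platform default charset and shows what happens when the encoding and decoding sides disagree. StringBytesDemo and transcode are names made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StringBytesDemo {

    /** Encodes with one charset and decodes with another, as happens when two sides disagree. */
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String str = "\u4e2d\u6587"; // two Chinese characters

        // The no-argument getBytes() uses this charset, which can differ between machines.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Matching charsets give back the original string.
        System.out.println(str.equals(transcode(str, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));

        // Mismatched charsets: the 6 UTF-8 bytes turn into 6 unrelated Latin-1 characters.
        System.out.println(transcode(str, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).length());
    }
}
```

Passing a StandardCharsets constant instead of a charset name also avoids the checked UnsupportedEncodingException that getBytes(String) declares.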


Encoding a string into a byte buffer and decoding it into a character buffer
String str = "This is a Chinese character";
Charset charset = Charset.forName("UTF-8");
ByteBuffer byteBuffer = charset.encode(str);
CharBuffer charBuffer = charset.decode(byteBuffer);
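Charset.decode, used above, silently replaces bytes it cannot decode. When it matters whether the input really is in the expected charset, a CharsetDecoder can be configured to report errors instead. The following is a sketch of that idea, and the names StrictDecodeDemo and isValid are invented for it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeDemo {

    /** Returns true if the bytes form a valid sequence in the given charset. */
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);

        System.out.println(isValid(utf8, StandardCharsets.UTF_8));  // bytes match the charset
        System.out.println(isValid(utf16, StandardCharsets.UTF_8)); // UTF-16 bytes are not valid UTF-8 here
    }
}
```

A check like this can only prove that bytes are not in a given charset. Passing the check does not prove the charset is the right one, since ISO-8859-1, for instance, accepts any byte sequence.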


III. Comparison of encoding formats
GBK and GB2312: GBK can handle all the Chinese characters, while GB2312 cannot handle many of them, so GBK should be chosen.
UTF-8 and UTF-16: both encode Unicode, but their encoding rules differ. UTF-16 is comparatively efficient to encode: characters are stored in two bytes (16 bits), apart from supplementary characters, which take a pair of 16-bit units. This makes conversion between characters and bytes simpler and string manipulation easier. It is suitable for use between local disk and memory, and Java's in-memory encoding uses UTF-16. When UTF-16 is sent as a byte stream, however, data lost during transmission can cause the data that follows to fail to decode. UTF-8 is a variable-length encoding that stores ASCII characters in a single byte, which makes it suitable for network transmission. Each byte carries marker bits that identify whether it starts or continues a character, so if data is lost in transit, only the damaged character fails to decode and the rest is unaffected. UTF-8's encoding efficiency lies between that of UTF-16 and GBK. UTF-8 strikes a balance between encoding efficiency and encoding safety, and is the ideal encoding for Chinese.
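The claim about data loss can be tried out with a small experiment. The sketch below encodes a string, drops a single byte to simulate loss in transit, and decodes what remains. ByteLossDemo and decodeAfterLoss are hypothetical names, and dropping exactly one byte is only one of many possible kinds of damage.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLossDemo {

    /** Encodes the text, drops the byte at lostIndex, and decodes what is left. */
    static String decodeAfterLoss(String text, Charset charset, int lostIndex) {
        byte[] all = text.getBytes(charset);
        byte[] damaged = new byte[all.length - 1];
        System.arraycopy(all, 0, damaged, 0, lostIndex);
        System.arraycopy(all, lostIndex + 1, damaged, lostIndex, all.length - lostIndex - 1);
        // new String(...) replaces malformed input with U+FFFD instead of throwing.
        return new String(damaged, charset);
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d\u4e16\u754c"; // four Chinese characters
        String tail = text.substring(1);          // the three characters after the damaged one

        String utf8 = decodeAfterLoss(text, StandardCharsets.UTF_8, 1);
        String utf16 = decodeAfterLoss(text, StandardCharsets.UTF_16BE, 1);

        System.out.println("UTF-8 keeps the remaining characters:  " + utf8.endsWith(tail));
        System.out.println("UTF-16 keeps the remaining characters: " + utf16.endsWith(tail));
    }
}
```

With UTF-8 the decoder resynchronizes at the next lead byte, so only the first character is lost. With UTF-16 the missing byte shifts every following 16-bit unit, so all later characters decode to something else.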


IV. Encoding and decoding in the Java Web
1. Where encoding and decoding occur on the Web:
The user sends an HTTP request from the browser to the server. The parts that need encoding are the URL, cookies and parameters. After receiving the request, the server needs to decode the URI, cookies and POST form parameters. The server may also need to read data from a database or read other files, locally or over the network. After the server finishes processing the request, it encodes the data again and sends it through the socket to the user's browser. The browser then decodes the response and displays it.
2. Where URLs are encoded and decoded
The components of a link, using the link below:
http://www.360buy.com/Group Buy/search?keyword= Shoes (this link is just an example; "Group Buy" and "Shoes" stand in for Chinese text)
This link can be divided into several parts: ① scheme: http ② domain name: www.360buy.com ③ URI: /Group Buy/search ④ query string: keyword= Shoes
URI (/Group Buy/search): if it contains Chinese, the charset used to decode it is set in Tomcat with <Connector URIEncoding="UTF-8"/>. If this is not defined, the default ISO-8859-1 is used for decoding (Tomcat 8 and later default to UTF-8 instead).
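To see what the browser sends and why the server's decoding charset matters, the percent-encoding step can be reproduced with java.net.URLEncoder and URLDecoder. This is only a sketch: UrlCodecDemo and its helpers are invented names, and the two characters used are an assumed Chinese word for "shoes". URLEncoder applies form encoding (a space becomes "+"), which matches query strings more closely than path segments.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UrlCodecDemo {

    /** Percent-encodes text using the named charset. */
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Decodes percent-encoded text using the named charset. */
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String keyword = "\u978b\u5b50"; // assumed Chinese word for "shoes"

        String encoded = encode(keyword, "UTF-8");
        System.out.println(encoded); // prints %E9%9E%8B%E5%AD%90

        // Decoding with the charset used for encoding restores the keyword.
        System.out.println(keyword.equals(decode(encoded, "UTF-8")));
        // Decoding with ISO-8859-1 yields six garbled characters instead.
        System.out.println(decode(encoded, "ISO-8859-1").length());
    }
}
```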
QueryString (keyword= Shoes): if it contains Chinese, Tomcat decodes it either with the Connector's URIEncoding or, when <Connector URIEncoding="UTF-8" useBodyEncodingForURI="true"/> is set, with the charset declared in the request's ContentType, the same one used for the body. Despite its name, useBodyEncodingForURI affects only the query string, not the URI path. On the encoding side, different browsers use different default charsets for the query string. Decoding takes place on the first call to request.getParameter("...").
With GET requests it is best not to put Chinese in the parameters, as garbling occurs easily. Chinese is not recommended in the URI path either. It is best to configure Tomcat with <Connector URIEncoding="UTF-8" useBodyEncodingForURI="true"/> and to keep encoding and decoding consistent throughout the application.
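When the container configuration cannot be changed and a parameter has already been decoded as ISO-8859-1, a workaround often seen in practice is to turn the garbled string back into bytes and decode it again with the charset the client actually used. The sketch below shows the idea. ParameterRepairDemo and redecode are hypothetical names, and the approach only works when the wrong decoding really was ISO-8859-1, which maps every byte to a character without loss.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ParameterRepairDemo {

    /** Re-decodes a string that was wrongly decoded as ISO-8859-1. */
    static String redecode(String garbled, Charset actual) {
        // ISO-8859-1 maps each char 0..255 back to the original byte value.
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, actual);
    }

    public static void main(String[] args) {
        String sent = "\u978b\u5b50"; // what the client meant to send

        // Simulate a server decoding UTF-8 bytes with the ISO-8859-1 default.
        String garbled = new String(sent.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(sent.equals(garbled));

        // Re-decoding with the right charset recovers the text.
        System.out.println(sent.equals(redecode(garbled, StandardCharsets.UTF_8)));
    }
}
```

Configuring a consistent charset, as recommended above, is preferable. Applying this repair to a string that was decoded correctly would damage it.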
3. POST request encoding and decoding:
When a client submits a form with POST, the client first encodes the form using the charset set in ContentType. On the server, the first call to request.getParameter("...") automatically decodes it using the charset set in ContentType. Alternatively, the decoding charset can be set with request.setCharacterEncoding(charset), which must be called before the first request.getParameter("..."). POST form submissions generally do not cause problems.
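The decoding described above can be imitated without a servlet container. The following sketch takes the charset from a Content-Type value and uses it to decode a form body. FormBodyDemo, charsetOf and parse are invented for this example and are not servlet API methods. Real containers handle more cases, such as quoted charset values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class FormBodyDemo {

    /** Extracts the charset parameter from a Content-Type value, or returns the fallback. */
    static String charsetOf(String contentType, String fallback) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return p.substring("charset=".length()).trim();
            }
        }
        return fallback;
    }

    /** Decodes an application/x-www-form-urlencoded body with the given charset. */
    static Map<String, String> parse(String body, String charsetName) {
        Map<String, String> params = new LinkedHashMap<>();
        try {
            for (String pair : body.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                params.put(URLDecoder.decode(name, charsetName), URLDecoder.decode(value, charsetName));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        return params;
    }

    public static void main(String[] args) {
        String contentType = "application/x-www-form-urlencoded; charset=UTF-8";
        String body = "keyword=%E9%9E%8B%E5%AD%90&page=1";

        String charset = charsetOf(contentType, "ISO-8859-1");
        Map<String, String> params = parse(body, charset);
        System.out.println(charset + " " + params.get("keyword").length() + " " + params.get("page"));
    }
}
```

If the Content-Type carries no charset, the fallback is used. This is the situation in which a server would call request.setCharacterEncoding(charset) before reading parameters.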
4. Encoding and decoding of HTTP headers
First, decoding. Header values are also decoded on the first call to request.getHeader(). If a value has not been decoded yet, the MessageBytes.toString method is called, and it converts bytes to chars using the default ISO-8859-1. No other decoding format can be set for headers, so non-ASCII characters in a header will be garbled. If a header must carry Chinese, it has to be explicitly converted to ASCII characters first.
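One possible way to convert a header value to ASCII is to percent-encode it before sending and decode it after receiving. The sketch below assumes that convention. HeaderValueDemo and its methods are hypothetical, and both sides of the connection would have to agree on the scheme.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class HeaderValueDemo {

    /** Converts a value to an ASCII-only form by percent-encoding its UTF-8 bytes. */
    static String toHeaderSafe(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    /** Restores the original value from its percent-encoded form. */
    static String fromHeaderSafe(String headerValue) {
        try {
            return URLDecoder.decode(headerValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if every character is in the ASCII range. */
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        String name = "\u5f20\u4e09"; // a Chinese name used as sample data

        String safe = toHeaderSafe(name);
        System.out.println(isAscii(safe));                     // safe to place in a header
        System.out.println(name.equals(fromHeaderSafe(safe))); // receiver recovers the value
    }
}
```

Because the encoded value is pure ASCII, the ISO-8859-1 decoding applied to headers leaves it unchanged, and the receiver can decode it explicitly.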
5. Encoding and decoding of the HTTP body
Encoding: the body is encoded with the charset set via response.setCharacterEncoding(charset) and sent to the browser. The browser decodes it using the charset in the Content-Type header or in the page's <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>. If neither is set, the browser's default decoding is used.

Encoding problems in JS will be covered in detail later.
