A discussion of character set problems in web development

Last Update: 2014-11-07 Source: Internet

Author: User

http://www.lanceyan.com/tech/arch/web_luanma.html

I remember that when I first started Java web development, encoding problems made my head spin: text that displayed fine one moment would suddenly turn into garbage the next. Back then, under project pressure, I usually knew how to make the problem go away without knowing why. Later, when I had time to work through the whole system, I finally understood the ins and outs.

In C++ CGI development, people like to use latin1, a single-byte encoding: it saves storage space in MySQL, and C++ makes it relatively easy to work at the byte level. So with that stack, encoding is basically never a problem for the framework.

Java, by contrast, has a lot of places where encoding must be handled, and missing a single one leaves you with garbled text everywhere. Roughly, they fall into the following areas: browser, server, database, and operating system.
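Every setting discussed below guards against the same failure: bytes written with one charset are read back with another. A minimal JDK-only sketch of that failure (MojibakeDemo is an illustrative name, not from the original):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode text with one charset, then decode the bytes with another.
    // If the two charsets differ, the result is garbled text ("mojibake").
    public static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String chinese = "\u4E2D\u6587"; // the two characters for "Chinese"
        // Same charset on both sides: the text survives unchanged.
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // UTF-8 bytes read as ISO-8859-1: each of the 6 bytes becomes its own character.
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

The garbled output is not random: it is the same bytes viewed through the wrong charset, which is why the fixes below are all about declaring the charset consistently at each step.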

Browser:
If you use a template language, the HTML must declare the charset it is displayed in; the browser uses this to decide which encoding to display the page with.

As an aside, the browser determines a page's encoding in this order:
1. If the HTTP header declares a charset, the browser uses it.
2. If the HTTP header does not set one, the browser parses the META tag.
3. If there is no META tag either, the browser detects the encoding itself, provided auto-detect is enabled.
4. Otherwise, the encoding of the local UI is used.
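The lookup order above can be modelled in a few lines. This is a simplified sketch, not how any real browser is implemented; BrowserCharsetOrder, resolve and charsetFromContentType are hypothetical names, and null stands for "not present":

```java
public class BrowserCharsetOrder {

    // Extracts the charset parameter from a Content-Type header value,
    // e.g. "text/html; charset=UTF-8" -> "UTF-8". Returns null if absent.
    public static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = p.substring(8).trim();
                // Strip optional quotes: charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Applies the four steps in order; the first one that yields a charset wins.
    public static String resolve(String httpHeaderCharset, String metaCharset,
                                 boolean autoDetectEnabled, String detectedCharset,
                                 String localUiCharset) {
        if (httpHeaderCharset != null) {
            return httpHeaderCharset;                 // 1. HTTP header
        }
        if (metaCharset != null) {
            return metaCharset;                       // 2. META tag
        }
        if (autoDetectEnabled && detectedCharset != null) {
            return detectedCharset;                   // 3. auto-detect, if enabled
        }
        return localUiCharset;                        // 4. local UI default
    }
}
```

The practical consequence of step 1 is that a charset sent by the server in the HTTP header overrides whatever the page's META tag says, so the two should agree.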

Server:
For dynamic pages such as JSP, the JSP header needs to declare the encoding. The Java EE server processing this JSP will then output the whole page encoded as UTF-8; without the declaration it falls back to the default encoding, ISO-8859-1. The JSP directive looks like this:

<%@ page language="java" contentType="text/html; charset=UTF-8"
    pageEncoding="UTF-8" %>

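As is well known, each JSP is compiled into a servlet. In a servlet, the equivalent setting is:

    // Inside an HttpServlet subclass (imports from javax.servlet and javax.servlet.http)
    @Override
    public void service(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // Sets the charset in the Content-Type header and for response.getWriter()
        response.setContentType("text/html;charset=UTF-8");
    }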
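Declaring the charset matters because it controls how characters are turned into bytes on the way out. A rough JDK-only imitation (no servlet container involved; ResponseEncodingDemo and writeBody are made-up names) of a writer bound to the response charset shows why a page written as ISO-8859-1, the traditional servlet default, loses its Chinese text:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseEncodingDemo {

    // Encodes a response body the way a writer bound to the response charset would.
    // An in-memory buffer stands in for the real socket.
    public static byte[] writeBody(String body, Charset responseCharset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, responseCharset)) {
            writer.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String chinese = "\u4E2D\u6587";
        // ISO-8859-1 cannot represent Chinese, so each character is replaced by '?'.
        byte[] latin1 = writeBody(chinese, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
        // UTF-8 encodes each of these characters as 3 bytes: 6 bytes in total.
        System.out.println(writeBody(chinese, StandardCharsets.UTF_8).length);
    }
}
```

Once the characters have been replaced by `?`, no browser setting can bring them back, which is why the charset has to be set on the server side.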

Also don't overlook the encoding filter that ships with Spring; it is very practical. When you use Struts or Spring MVC, this filter sets the request encoding wherever it has not been set explicitly. Configure it in web.xml as follows:

<filter>
    <filter-name>Set Character Encoding</filter-name>
    <filter-class>
        org.springframework.web.filter.CharacterEncodingFilter
    </filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
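Roughly speaking, Spring's CharacterEncodingFilter calls request.setCharacterEncoding(...) before any parameter is read, and that charset is what the container uses to decode a POST form body. The JDK's URLDecoder can imitate that decoding step; FormDecodingDemo and decodeField are hypothetical names and no servlet API is involved:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodingDemo {

    // Decodes an application/x-www-form-urlencoded value with the given charset,
    // the step that request.setCharacterEncoding(...) configures in a container.
    public static String decodeField(String encodedValue, String charsetName) {
        try {
            return URLDecoder.decode(encodedValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // What a UTF-8 page submits for the two characters U+4E2D U+6587.
        String submitted = "%E4%B8%AD%E6%96%87";
        System.out.println(decodeField(submitted, "UTF-8"));      // decoded correctly
        System.out.println(decodeField(submitted, "ISO-8859-1")); // six garbled characters
    }
}
```

In Tomcat this setting covers the request body; parameters in the URL query string are decoded by the connector, which is why GET requests need separate treatment.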

What if text is still garbled? Parameters passed with GET (handled in doGet) are almost certain to be garbled, because the server decodes the URL itself. The fix is to set the URI encoding on Tomcat's connector, via the URIEncoding attribute of the <Connector> element in server.xml (usually found at <Tomcat installation directory>/conf/server.xml).
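The proper fix is the connector setting (URIEncoding="UTF-8"); older Tomcat versions such as Tomcat 7 decode the URI as ISO-8859-1 by default. When the server configuration can't be changed, a common workaround is to re-decode the parameter by hand. A sketch, assuming the container decoded UTF-8 bytes as ISO-8859-1 (GetParamRepair is a made-up name):

```java
import java.nio.charset.StandardCharsets;

public class GetParamRepair {

    // ISO-8859-1 maps each byte 0x00-0xFF to exactly one char, so decoding with it
    // loses nothing: getBytes(ISO_8859_1) recovers the original bytes, which can
    // then be decoded again with the charset the browser actually used (UTF-8).
    public static String repair(String misDecoded) {
        byte[] originalBytes = misDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String chinese = "\u4E2D\u6587";
        // Simulate what the container handed to doGet(): UTF-8 bytes read as ISO-8859-1.
        String fromRequest = new String(chinese.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(repair(fromRequest).equals(chinese));
    }
}
```

In a servlet this would be applied as `repair(request.getParameter("name"))`. Only apply it to values you know were mis-decoded; running it on text that was already correct garbles it.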

During development, don't forget that the Java source files themselves must use the same encoding. Right-click the file and check its encoding under Properties.
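To find source files that are not actually UTF-8, one option is a strict decode: a CharsetDecoder set to REPORT fails on the first malformed byte sequence. This is a heuristic sketch (Utf8Check is an illustrative name): pure-ASCII files pass as valid in every common charset, while GBK-encoded Chinese text typically fails.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // True if the bytes form valid UTF-8. The decoder is told to REPORT errors,
    // so any malformed sequence raises an exception instead of becoming U+FFFD.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Usage: java Utf8Check src/Foo.java src/Bar.java
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] content = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + (isValidUtf8(content) ? "valid UTF-8" : "not UTF-8 (GBK?)"));
        }
    }
}
```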

If you forgot to set the file encoding during development, the files end up in the Windows default, GBK, and are then deployed to Linux, where the default is UTF-8. With a huge number of files, you can't convert them one by one. The fix is actually simple: add -Dfile.encoding=GBK to the arguments of the java command.
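To check which charset a JVM actually picked up, print the file.encoding system property and Charset.defaultCharset(). The default charset is what APIs such as new String(bytes) or FileReader use when no charset is passed, which is why -Dfile.encoding changes their behavior. From JDK 18 the default charset is UTF-8 on all platforms (JEP 400), so results may differ on newer JDKs. A small sketch (ShowDefaultCharset is an illustrative name):

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {

    // Reports the charset the JVM uses when code does not name one explicitly.
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // e.g. run as: java -Dfile.encoding=GBK ShowDefaultCharset
        System.out.println(describe());
    }
}
```

Code that always passes an explicit Charset does not depend on this setting at all, which is the more robust long-term fix.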

When compiling Java code with Ant, set the encoding attribute of the javac task so the compiler reads the sources with the right character set. Then the logs printed to files or the console will not be garbled.

With Maven, set the character set in the compiler plugin configuration:

<plugin>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>2.5</version>
    <configuration>
        <optimize>true</optimize>
        <showDeprecation>false</showDeprecation>
        <debuglevel>lines,source</debuglevel>
        <source>1.6</source>
        <target>1.6</target>
        <encoding>UTF-8</encoding>
        <meminitial>128m</meminitial>
        <maxmem>768m</maxmem>
    </configuration>
</plugin>
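Both the Ant encoding attribute and the Maven <encoding> setting end up as javac's -encoding option, which tells the compiler how to read the source bytes. The same option can be passed through the standard javax.tools API. A sketch (CompileWithEncoding, compile and demo are illustrative names; it needs a full JDK, since a plain JRE has no system compiler):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileWithEncoding {

    // Runs javac in-process, telling it which charset the source file is written in.
    // Returns javac's exit code (0 means success).
    public static int compile(Path source, String encoding, Path outputDir) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler; run this on a JDK, not a JRE");
        }
        return javac.run(null, null, null,
                "-encoding", encoding,
                "-d", outputDir.toString(),
                source.toString());
    }

    // Writes a small UTF-8 source file containing Chinese text and compiles it.
    public static int demo() {
        try {
            Path dir = Files.createTempDirectory("encoding-demo");
            Path source = dir.resolve("Hello.java");
            String code = "public class Hello { String s = \"\u4E2D\u6587\"; }";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));
            return compile(source, "UTF-8", dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("javac exit code: " + demo());
    }
}
```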

SqlMap's SQL XML files and Spring's XML files need it too, since they move across platforms. Add an encoding declaration at the top, for example:
<?xml version="1.0" encoding="UTF-8"?>
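An XML parser picks the charset for a file from its declaration, such as <?xml version="1.0" encoding="UTF-8"?>, and assumes UTF-8 (or UTF-16, when a byte-order mark says so) if there is none. The sketch below uses the JDK's built-in DOM parser to show the difference. It uses ISO-8859-1 rather than GBK only so that it relies on a charset every JDK must support; XmlDeclarationDemo and rootText are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class XmlDeclarationDemo {

    // Parses raw bytes and returns the root element's text. The parser chooses the
    // charset from the XML declaration and assumes UTF-8 when there is none.
    public static String rootText(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes))
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String body = "<name>caf\u00E9</name>";
        // Latin-1 bytes with a matching declaration: parsed correctly.
        byte[] declared = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(declared));
        // The same bytes without a declaration are read as UTF-8 and rejected.
        try {
            rootText(body.getBytes(StandardCharsets.ISO_8859_1));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```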

Database:
Below are the most common MySQL character set settings. Open the MySQL configuration file (on Linux usually /etc/my.cnf, on Windows my.ini in the MySQL installation directory) and set the following:

[mysqld]
character-set-server = utf8

[mysql]
default-character-set = utf8

The JDBC connection URL also needs the encoding parameters:
jdbc:mysql://192.168.0.237:3306/dzh_db?useUnicode=true&characterEncoding=utf-8
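If either parameter is missing from the URL, the driver falls back to its defaults, so it can be worth checking connection strings in code or in a startup test. A hypothetical helper (not part of Connector/J) that splits the query string and checks the two parameters shown above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JdbcUrlCheck {

    // Splits the query part of a JDBC URL ("...?a=1&b=2") into key/value pairs.
    public static Map<String, String> queryParams(String jdbcUrl) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = jdbcUrl.indexOf('?');
        if (q < 0) {
            return params;
        }
        for (String pair : jdbcUrl.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // True when the URL sets useUnicode=true and names a characterEncoding.
    public static boolean declaresEncoding(String jdbcUrl) {
        Map<String, String> p = queryParams(jdbcUrl);
        return "true".equalsIgnoreCase(p.get("useUnicode")) && p.get("characterEncoding") != null;
    }
}
```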

With all of this configured, Chinese text generally works without problems.

Recently, though, I ran into a rather amusing problem. I had assumed that once the character sets were configured, any data could be stored in the database, but some characters, such as certain special symbols, were rejected. When I converted them to bytes, they turned out not to be three-byte UTF-8 sequences at all but four-byte ones, which MySQL's utf8 charset (at most three bytes per character) cannot store. That had me sweating. After some searching, I found they can be handled by filtering out these special UTF-8 characters:

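    // Decodes UTF-8 bytes by hand, keeping only 1-, 2- and 3-byte sequences.
    // Bytes of 4-byte sequences (lead bytes 0xF0-0xF7 and their continuation
    // bytes) match no branch and are silently dropped, which filters out the
    // characters that MySQL's utf8 cannot store.
    public static String utf2string(byte[] buf) {
        int len = buf.length;
        StringBuilder sb = new StringBuilder(len / 2);
        for (int i = 0; i < len; i++) {
            int b = by2int(buf[i]);
            if (b <= 0x7F) {
                // 1-byte (ASCII) character
                sb.append((char) b);
            } else if (b >= 0xC0 && b <= 0xDF && i + 1 < len) {
                // 2-byte sequence: 110xxxxx 10xxxxxx
                int bh = b & 0x1F;
                int bl = by2int(buf[++i]) & 0x3F;
                sb.append((char) (bh << 6 | bl));
            } else if (b >= 0xE0 && b <= 0xEF && i + 2 < len) {
                // 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
                int bh = b & 0x0F;
                int bl = by2int(buf[++i]) & 0x3F;
                int bll = by2int(buf[++i]) & 0x3F;
                int c = bh << 12 | bl << 6 | bll;
                // Convert this special space character (0xE5F1 = 58865) to a half-width space
                if (c == 58865) {
                    c = 32;
                }
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Treats a signed Java byte as an unsigned value 0-255
    private static int by2int(byte b) {
        return b & 0xFF;
    }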
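On a current JDK the same filtering can be done without decoding bytes by hand. The sketch below (Utf8Filter and decodeBmpOnly are made-up names) decodes with the built-in UTF-8 decoder, drops supplementary characters (the ones that need four bytes), and keeps the special case for 0xE5F1. One behavioral difference: malformed bytes become U+FFFD here instead of being skipped.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Filter {

    // Decodes UTF-8, keeps only characters up to U+FFFF (3 bytes or fewer in UTF-8),
    // and maps U+E5F1 to an ordinary space, mirroring utf2string above.
    public static String decodeBmpOnly(byte[] buf) {
        String decoded = new String(buf, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(decoded.length());
        decoded.codePoints().forEach(cp -> {
            if (cp == 0xE5F1) {
                sb.append(' ');
            } else if (cp <= 0xFFFF) {
                sb.appendCodePoint(cp);
            }
            // Anything above U+FFFF needs 4 bytes in UTF-8 and is dropped.
        });
        return sb.toString();
    }
}
```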

Alternatively, change the MySQL character set to utf8mb4. Remember that utf8mb4 is only supported from MySQL 5.5.3 onward:

[mysqld]
character-set-server = utf8mb4

[mysql]
default-character-set = utf8mb4
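The difference between utf8 and utf8mb4 comes down to byte length: MySQL's utf8 stores at most three bytes per character, which covers code points up to U+FFFF, while emoji and other supplementary characters take four bytes in UTF-8. A small check (MySqlUtf8Check and its methods are hypothetical names) that tells whether a string fits a utf8 column or needs utf8mb4:

```java
import java.nio.charset.StandardCharsets;

public class MySqlUtf8Check {

    // Number of bytes a single code point occupies in UTF-8 (1 to 4).
    public static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    // MySQL's utf8 holds only code points up to U+FFFF (at most 3 bytes each).
    // Anything above needs 4 bytes in UTF-8 and therefore utf8mb4.
    public static boolean fitsMySqlUtf8(String s) {
        return s.codePoints().allMatch(cp -> cp <= 0xFFFF);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length('A'));              // 1
        System.out.println(utf8Length(0x4E2D));           // 3 (a common Chinese character)
        System.out.println(utf8Length(0x1F600));          // 4 (an emoji)
        System.out.println(fitsMySqlUtf8("\u4E2D\u6587")); // true
        System.out.println(fitsMySqlUtf8("\uD83D\uDE00")); // false (U+1F600)
    }
}
```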

Operating system:
Windows uses GBK by default, and usually there is no need to change that. But what if you want new files to be created in UTF-8? You can't change each file's properties by hand after creating it; that would be far too tedious. Once the text file encoding is set in Eclipse, every new file of that type will be created in UTF-8.
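Files created before the Eclipse setting was changed stay in GBK until they are converted. A batch conversion can be sketched with the JDK alone, assuming the JDK provides the GBK charset; Transcode and its methods are illustrative names. A strict decoder is used so that a file which is not really in the source charset fails loudly instead of being corrupted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class Transcode {

    // Re-encodes bytes from one charset to another. newDecoder() reports malformed
    // input by default, so bytes that are not valid in `from` raise an exception.
    public static byte[] transcode(byte[] input, Charset from, Charset to) {
        try {
            String text = from.newDecoder().decode(ByteBuffer.wrap(input)).toString();
            return text.getBytes(to);
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rewrites a file in place,
    // e.g. transcodeFile(path, Charset.forName("GBK"), StandardCharsets.UTF_8).
    public static void transcodeFile(Path file, Charset from, Charset to) throws IOException {
        Files.write(file, transcode(Files.readAllBytes(file), from, to));
    }
}
```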

On Linux, changing two places is basically enough:
vi /etc/sysconfig/i18n
and change it to:

LANG="zh_CN.GB2312"
LANGUAGE="zh_CN.GB18030:zh_CN.GB2312:zh_CN"
SUPPORTED="zh_CN.GB18030:zh_CN:zh:en_US.UTF-8:en_US:en"

vi /etc/profile
and add:

export LC_ALL="zh_CN.GB2312"
export LANG="zh_CN.GB2312"
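These variables follow the usual POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG, and the part after the dot names the codeset. A hypothetical helper that applies that rule, for example to see which codeset a process started from the configured shell would inherit:

```java
import java.util.Map;

public class LocaleCodeset {

    // Codeset part of a locale name: "zh_CN.GB2312" -> "GB2312",
    // "en_US.UTF-8@euro" -> "UTF-8", "C" -> null (no codeset given).
    public static String codesetOf(String locale) {
        int dot = locale.indexOf('.');
        if (dot < 0) {
            return null;
        }
        int at = locale.indexOf('@', dot);
        return at < 0 ? locale.substring(dot + 1) : locale.substring(dot + 1, at);
    }

    // LC_ALL wins over LC_CTYPE, which wins over LANG; empty values are ignored.
    public static String effectiveCodeset(Map<String, String> env) {
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return codesetOf(value);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("Effective codeset: " + effectiveCodeset(System.getenv()));
    }
}
```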

Original article. When reposting, please credit: reproduced from lanceyan.com

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.