About Java platform encoding

Source: Internet
Author: User

Compiled from online sources, with personal insights added.

Some points may be inaccurate and may be revised at any time.

Encoding needs attention in many places, and the defaults in those places are not the same. If any of them is set incorrectly, Chinese text easily turns into garbled characters.

About editor encoding: Files saved with Notepad or Eclipse use the operating system's default encoding, which on a Simplified Chinese Windows system is GBK.
Eclipse is fairly smart: it can choose the encoding used when saving based on the encoding declared inside the file. For example, if you declare an encoding in a JSP, XML, or HTML file, Eclipse saves the file in that encoding. If the file declares no encoding, it is saved in the default GBK; for example, ".txt" and ".java" files written in Eclipse are GBK-encoded by default.

About Java encoding: Strings inside Java are always Unicode (UTF-16 code units), so an in-memory Java string can be thought of as having no external encoding. The Java platform, however, has a default encoding, which is taken from the operating system, that is, GBK here. (Since JDK 18 the default charset is UTF-8 regardless of the operating system.) So when Java compiles a ".java" file, it assumes by default that the file is GBK-encoded. You can specify the source encoding manually at compile time with the javac "-encoding" option, which is not covered further here. String constants in the compiled ".class" file are stored in (modified) UTF-8. Java converts correctly from ".java" to ".class", but only if it knows the real encoding of the ".java" file. For example, if your ".java" file is saved as UTF-8 rather than the default GBK and Java assumes GBK, then Chinese text will be garbled when the program is compiled and run, because Java did not know the file's real encoding.
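The mismatch described above can be reproduced without a compiler involved. The following is a minimal sketch, not the compiler's actual code path: the class name SourceEncodingDemo and the helper reinterpret are made up for illustration, and the example assumes the JDK ships the GBK charset (most full JDKs do). The Chinese text is written with Unicode escapes so the example itself does not depend on how the file is saved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SourceEncodingDemo {

    // Encode the text with the charset the file was really saved in, then
    // decode the same bytes with the charset the reader assumes.
    static String reinterpret(String text, Charset actual, Charset assumed) {
        byte[] bytes = text.getBytes(actual);
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes on purpose.
        String chinese = "\u4e2d\u6587";
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available

        System.out.println("Default charset: " + Charset.defaultCharset());
        // Bytes saved as UTF-8 and read as UTF-8: text survives.
        System.out.println(reinterpret(chinese, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // Bytes saved as UTF-8 but read as GBK: text is garbled.
        System.out.println(reinterpret(chinese, StandardCharsets.UTF_8, gbk));
    }
}
```

The same bytes give different strings depending only on the charset assumed by the reader, which is why telling javac the real source encoding matters.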

About Java IO encoding: The InputStreamReader class lets you specify the encoding used to convert the bytes it reads into characters. If the encoding you specify differs from the actual encoding of the byte stream, the text comes out garbled.
The same is true when writing files. For example, if you use OutputStreamWriter with UTF-8 to write data to a ".txt" file and then open the file in Notepad, which assumes the operating system's GBK encoding by default, the text may appear garbled. BufferedReader and BufferedWriter take no encoding parameter of their own; they use whatever the Reader or Writer they wrap uses. When they wrap a FileReader or FileWriter created without a charset, that is the Java default encoding, that is, the operating system encoding, GBK here. To choose the encoding, wrap an InputStreamReader or OutputStreamWriter instead.
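As an illustration of the point about readers and writers, here is a small sketch that wraps InputStreamReader and OutputStreamWriter in the buffered classes so the encoding is chosen explicitly. The class name ExplicitCharsetIo, its helper methods, and the temporary file name are assumptions made for this example only.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ExplicitCharsetIo {

    // Write text using an explicitly chosen charset.
    static void write(Path file, String text, Charset charset) {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(Files.newOutputStream(file), charset))) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the whole file using an explicitly chosen charset.
    static String read(Path file, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            int c;
            while ((c = in.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Write to a temporary file with one charset and read it back with another.
    static String roundTrip(String text, Charset writeWith, Charset readWith) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                write(file, text, writeWith);
                return read(file, readWith);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587";
        // Same charset on both sides: the text comes back intact.
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // Written as UTF-8, read as ISO-8859-1: six garbled characters.
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

Passing the charset at the InputStreamReader or OutputStreamWriter level keeps the result independent of the platform default.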

About JSP encoding: pageEncoding is the encoding of the JSP file itself. A JSP file is unlike a ".java" file, which the compiler reads using the operating system's default encoding.
The charset in contentType is the encoding of the content that the server sends to the client.
The default charset for both pageEncoding and contentType is ISO-8859-1 (Latin-1), so you have to set them manually. A JSP goes through three encoding stages: the first uses pageEncoding, the second goes from UTF-8 to UTF-8, and the third, in which Tomcat sends the page out, uses contentType.
The first stage translates the JSP into a ".java" file. The JSP is read according to the pageEncoding setting and converted from that encoding into a UTF-8 encoded ".java" file. If pageEncoding is set incorrectly, the result is garbled.
The second stage compiles the ".java" file into a ".class" file, which is UTF-8 to UTF-8.
The third stage is Tomcat loading and executing the compiled Java bytecode and sending the output, which is what the client sees. This is where the charset in contentType takes effect. In a servlet, you can call request.setCharacterEncoding("UTF-8") and response.setCharacterEncoding("UTF-8") at the beginning of the doPost or doGet method to set the encoding.
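Why the ISO-8859-1 default must be changed for Chinese pages can be shown with a standard-library model of the third stage. This is only a sketch under stated assumptions: ResponseCharsetModel and seenByBrowser are hypothetical names, no servlet API is involved, and the browser is assumed to decode with the same charset the server declared.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ResponseCharsetModel {

    // Model of the output stage: the container encodes the page text with the
    // response charset, and the browser decodes the bytes with that charset.
    static String seenByBrowser(String pageText, Charset responseCharset) {
        byte[] wire = pageText.getBytes(responseCharset);
        return new String(wire, responseCharset);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587";
        // ISO-8859-1 cannot represent Chinese, so each character is replaced
        // by a question mark before it ever leaves the server.
        System.out.println(seenByBrowser(chinese, StandardCharsets.ISO_8859_1)); // prints ??
        // UTF-8 can represent every character, so the text survives.
        System.out.println(seenByBrowser(chinese, StandardCharsets.UTF_8));
    }
}
```

Unlike a decoding mismatch, this loss happens at encoding time and cannot be repaired on the client.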

About Tomcat encoding: Tomcat's default encoding is ISO-8859-1 (Latin-1) in older versions such as Tomcat 7; from Tomcat 8, URIs are decoded as UTF-8 by default. With the ISO-8859-1 default, you must set the container's encoding manually when pages pass data, otherwise the text is garbled. You can set two attributes on the Connector element in Tomcat's "server.xml" to change the encoding:
(1) useBodyEncodingForURI="true": tells Tomcat to decode URI query parameters with the page's encoding (that is, the charset value in contentType) instead of URIEncoding. Recommended.
(2) URIEncoding="UTF-8": forces URIs to be decoded as UTF-8, which is a bit heavy-handed.
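The effect of the container's URI encoding can be sketched with the standard URLEncoder and URLDecoder classes. This is an illustration of the decoding step only, not Tomcat's own code; the class name UriDecodingDemo and its helpers are assumptions for this example. The repair method shows the common workaround of re-encoding a wrongly decoded parameter as ISO-8859-1 and decoding it again as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class UriDecodingDemo {

    // Percent-encode a value the way a browser does for a page in this charset.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decode a percent-encoded value the way a container configured with
    // this charset would.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Undo an ISO-8859-1 decode of bytes that were really UTF-8.
    static String repairLatin1(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = encode("\u4e2d", "UTF-8");
        System.out.println(sent);                                // prints %E4%B8%AD
        System.out.println(decode(sent, "UTF-8"));               // correct character
        String garbled = decode(sent, "ISO-8859-1");             // three wrong characters
        System.out.println(garbled.length());                    // prints 3
        System.out.println(repairLatin1(garbled));               // correct character again
    }
}
```

The repair works only because ISO-8859-1 maps every byte to exactly one character, so no information is lost by the wrong decode.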

About MySQL encoding: MySQL's default encoding is ISO-8859-1 (Latin-1) in versions before 8.0, and you can change it. There are two settings in "my.ini":
(1) default-character-set: tells MySQL which encoding the data sent by a connecting client uses. If the encoding set here differs from the actual encoding of the page content, MySQL decodes the incoming data incorrectly, and the data stored in the database is garbled.
(2) character-set-server: the encoding used for data stored in the database.
It is also best to specify the connection encoding in the URL used to connect to the database, for example useUnicode=true&characterEncoding=UTF-8.
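For the connection URL, a short sketch of how the parameters fit together may help. The class name MysqlUrlSketch, the helper utf8Url, and the host, port, and database values are all hypothetical; the sketch only builds the string and does not open a connection.

```java
final class MysqlUrlSketch {

    // Build a MySQL JDBC URL that asks the driver to exchange text as UTF-8.
    // Parameters in the query part are separated by '&'.
    static String utf8Url(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Hypothetical local server and database name.
        System.out.println(utf8Url("localhost", 3306, "demo"));
    }
}
```

When such a URL is placed in an XML configuration file, the "&" has to be written as "&amp;".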
