The difference between contentType, charset, and pageEncoding (repost)

Source: Internet
Author: User

======================== Explanation one ===========================

The ContentType property specifies the HTTP content type of the response. If ContentType is not specified, the default is text/html. Syntax: Response.ContentType [= ContentType]. Parameter: ContentType.

pageEncoding is the encoding of the JSP file itself.

The charset in contentType is the encoding of the content the server sends to the client.

A JSP passes through three stages and is encoded twice along the way: the first stage uses pageEncoding, the second goes from UTF-8 to UTF-8, and the third is the page Tomcat sends out, which uses contentType.

In the first stage the JSP is translated into a .java file. The container reads the JSP according to the pageEncoding setting and converts it from the specified encoding into uniform UTF-8 Java source (the .java file). If pageEncoding is set incorrectly, or not set at all, Chinese text comes out garbled.

In the second stage javac compiles the Java source into Java bytecode. Whatever encoding the JSP was written in, what reaches this stage is always UTF-8 encoded Java source.
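The garbling described for the first stage can be reproduced outside a JSP container. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes the file was saved as UTF-8 while the reader falls back to ISO-8859-1. It decodes the same bytes with the right and the wrong charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: decoding JSP source bytes with the right and the wrong charset. */
final class PageEncodingDemo {

    /** Simulates stage one: the file's bytes are decoded using the declared pageEncoding. */
    static String readJspSource(byte[] fileBytes, Charset pageEncoding) {
        return new String(fileBytes, pageEncoding);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so this file stays ASCII-only.
        String original = "\u4e2d\u6587";
        // Assumption: the JSP file was saved on disk as UTF-8.
        byte[] fileBytes = original.getBytes(StandardCharsets.UTF_8);

        String correct = readJspSource(fileBytes, StandardCharsets.UTF_8);
        String garbled = readJspSource(fileBytes, StandardCharsets.ISO_8859_1);

        System.out.println(correct.equals(original)); // true
        System.out.println(garbled.equals(original)); // false
        System.out.println(garbled.length());         // 6, one char per UTF-8 byte
    }
}
```

Once the text has been decoded with the wrong charset, every later stage faithfully carries the wrong characters forward, which is why the mistake shows up only in the browser.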

pageEncoding: sets the character encoding of the JSP source file and of the response body.
contentType: sets the character encoding and MIME type of the JSP source file and the response body.

So both pageEncoding and contentType can set the character encoding of the JSP source file and of the response body, but they differ in priority:
For the character encoding of the JSP source file, the priority is pageEncoding > contentType. If neither is set, the default is ISO-8859-1.
For the character encoding of the response output, the priority is contentType > pageEncoding. If neither is set, the default is ISO-8859-1.
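The two priority rules can be written down as a small lookup. This is a minimal sketch of the rules as stated here, not the code of any real container; the class name, method names, and the use of null for "not set" are all assumptions made for illustration.

```java
/** Hypothetical helper that mirrors the two priority rules described above. */
final class JspEncodingResolver {
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    /** Source-file encoding: pageEncoding first, then the contentType charset, then the default. */
    static String sourceEncoding(String pageEncoding, String contentTypeCharset) {
        if (pageEncoding != null) return pageEncoding;
        if (contentTypeCharset != null) return contentTypeCharset;
        return DEFAULT_ENCODING;
    }

    /** Response encoding: contentType charset first, then pageEncoding, then the default. */
    static String responseEncoding(String pageEncoding, String contentTypeCharset) {
        if (contentTypeCharset != null) return contentTypeCharset;
        if (pageEncoding != null) return pageEncoding;
        return DEFAULT_ENCODING;
    }

    public static void main(String[] args) {
        System.out.println(sourceEncoding("GBK", "UTF-8"));   // GBK
        System.out.println(responseEncoding("GBK", "UTF-8")); // UTF-8
        System.out.println(sourceEncoding(null, "UTF-8"));    // UTF-8
        System.out.println(responseEncoding(null, null));     // ISO-8859-1
    }
}
```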

Put simply, pageEncoding is the encoding of the JSP file itself, and the charset in contentType is the encoding of the content the server sends to the client. Take pageEncoding="GBK" as an example. It tells the JSP container that the JSP file itself is encoded in GBK, so that when the JSP is translated into a servlet, the container decodes the source file as GBK and converts it into uniform UTF-8 Java source. If it is not set, ISO-8859-1 is used by default. charset=GBK in contentType means the page is output to the browser in GBK. In this process a JSP source file passes through three stages and is encoded twice before the output is complete.

First stage: the JSP is translated into a servlet (.java) file. The directive attribute involved is pageEncoding. Given pageEncoding="xxx", the server looks up the encoding rules for "xxx", reads the JSP accordingly when translating it into a .java file, and converts it from the specified encoding into uniform UTF-8 encoded Java source (the .java file).
Second stage: from the servlet file (.java) to the Java bytecode file (.class), UTF-8 to UTF-8. Whatever encoding the JSP was written in, what reaches this stage is always UTF-8 encoded Java source. javac reads the Java source as UTF-8 and compiles it into a binary (the .class file) in which string constants are stored in UTF-8 form, as the JVM specification requires for string constants inside class files. This step is fixed by the JVM's internal specification and cannot be controlled from outside.
Third stage: from the server to the browser. The directive attribute involved is contentType. The server loads and executes the Java binary produced in the second stage and outputs the result, which is what the client sees. During output, the charset in the contentType attribute determines the encoding: the strings held in the binary are written out in that charset. If nothing is set explicitly, the default is ISO-8859-1.
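The first and third stages amount to one decode and one encode. The following sketch models only those two steps, under stated assumptions: the names are hypothetical, the second stage is not modeled, and the example text is a Latin word with one accented letter so that it runs with charsets every JVM is guaranteed to ship (GBK would behave the same way where it is available).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the decode (stage one) and encode (stage three) steps. */
final class ThreeStageSketch {

    /** Stage one: decode the source-file bytes using pageEncoding into a Java String. */
    static String decodeSource(byte[] jspBytes, Charset pageEncoding) {
        return new String(jspBytes, pageEncoding);
    }

    /** Stage three: encode the in-memory String using the charset from contentType. */
    static byte[] encodeResponse(String text, Charset responseCharset) {
        return text.getBytes(responseCharset);
    }

    public static void main(String[] args) {
        // The word "cafe" with an accented final e, as stored in an ISO-8859-1 file.
        byte[] jspBytes = {0x63, 0x61, 0x66, (byte) 0xE9};

        String text = decodeSource(jspBytes, StandardCharsets.ISO_8859_1);
        byte[] body = encodeResponse(text, StandardCharsets.UTF_8);

        System.out.println(text.equals("caf\u00e9")); // true
        System.out.println(jspBytes.length);          // 4 bytes on disk
        System.out.println(body.length);              // 5 bytes in the response
    }
}
```

The byte counts differ because the accented letter takes one byte in ISO-8859-1 and two in UTF-8, while the String in between is the same text in both cases.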

======================== Explanation two ===========================

"ContentType" is a string describing the content type. It is usually formatted as type/subtype, where the type is the general content category and the subtype is the specific content type.

In short, the server responds to the client with the type given in contentType. That part is easy to understand, but while reading Baidu Encyclopedia I ran into a question: contentType has a charset property that specifies an encoding, and pageEncoding is also an encoding, so what is the difference between the two?

After reading up on it, I understood it much better.

pageEncoding is the encoding of the JSP file itself.

The charset in contentType is the encoding of the content the server sends to the client.

A JSP passes through three stages and is encoded twice along the way: the first stage uses pageEncoding, the second goes from UTF-8 to UTF-8, and the third is the page Tomcat sends out, which uses contentType.

In the first stage the JSP is translated into a .java file. The container reads the JSP according to the pageEncoding setting and converts it from the specified encoding into uniform UTF-8 Java source (the .java file). If pageEncoding is set incorrectly, or not set at all, Chinese text comes out garbled.

In the second stage javac compiles the Java source into Java bytecode. Whatever encoding the JSP was written in, what reaches this stage is always UTF-8 encoded Java source.

javac reads the Java source as UTF-8 and compiles it into a binary (the .class file) whose string constants are stored in UTF-8 form, which is how the JVM specification represents string constants inside class files.

In the third stage Tomcat (or another application container) loads and executes the Java binary from stage two and outputs the result, which is what the client sees. This is where the contentType parameter, which lay dormant during stages one and two, takes effect.
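For the second stage, the UTF-8 form used for string constants in class files is the "modified UTF-8" format, which the JDK also exposes through DataOutputStream.writeUTF. The sketch below uses that method purely as a stand-in to show what such bytes look like; it is not how javac itself is implemented, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical demo of modified UTF-8, the string format used inside class files. */
final class ModifiedUtf8Demo {

    /** Writes a string as a two-byte length followed by modified UTF-8 bytes. */
    static byte[] writeConstant(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads back a string written by writeConstant. */
    static String readConstant(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        byte[] stored = writeConstant(original);
        System.out.println(stored.length);                          // 8: 2 length bytes + 3 bytes per character
        System.out.println(readConstant(stored).equals(original)); // true
    }
}
```

The point is that nothing in this step depends on the encoding the JSP was originally written in; the string is already Unicode text by the time it is stored.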

Setting contentType

The defaults of pageEncoding and contentType are both ISO-8859-1. If one of them is set, the other takes the same value (Tomcat 4.1.27). This is not absolute, though; it depends on how each JSP compiler (jspc) handles it. And pageEncoding is not the same thing as contentType. Consider:

<%@ page contentType="text/html;charset=utf-8" %>

I remember that when our teacher ran into the situation above in class, his fix was to change utf-8 to GBK:

<%@ page contentType="text/html;charset=GBK" %>

That seems to rely on the rule that changing one of the two makes the other follow. Strictly speaking, the formal way should be:

<%@ page contentType="text/html;charset=utf-8" pageEncoding="GBK" %>

With this form the Chinese text is no longer garbled on the server side, but it was still garbled when opened in the client, because charset=utf-8 means the output sent to the client is UTF-8 encoded. So the formal way should really be:

<%@ page contentType="text/html;charset=GBK" pageEncoding="GBK" %>

Seen that way, this is better:

<%@ page contentType="text/html;charset=GBK" %>

It is simpler, so I will probably keep using this short form.

This is purely my own understanding from self-study. If anything is wrong, please point it out.

=========================== Explanation three ===============================

Terms and their roles

1. contentType: <%@ page contentType="text/html; charset=utf-8" %>

2. pageEncoding: <%@ page pageEncoding="UTF-8" %>

3. HTML page charset: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

4. setCharacterEncoding: request.setCharacterEncoding(), response.setCharacterEncoding()

5. setContentType: response.setContentType()

6. setHeader: response.setHeader()

7. JSP page encoding: the encoding of the JSP file itself

8. Web page display encoding: the encoding in which the JSP's output stream is displayed in the browser

9. Web page input encoding: the encoding of text typed into input fields

10. Web server input request stream: the request data the browser sends to the web server

11. Web server output response stream: the response data the web server sends to the browser

How they interact, their scope, and the order in which they take effect

1. pageEncoding: only indicates the encoding of the JSP page itself and has nothing to do with the encoding used to display the page. When the container reads a file, a database, or a string constant, it converts the data to internal Unicode; when the page is displayed, the internal Unicode is converted to the encoding specified by contentType and the page content is then shown. If the pageEncoding attribute exists, the character encoding of the JSP page is determined by pageEncoding. Otherwise it is determined by the charset in the contentType attribute, and if that charset does not exist either, the character encoding of the JSP page defaults to ISO-8859-1.

2. contentType: specifies the MIME type and the character encoding of the JSP page's response. The default MIME type is "text/html" and the default character encoding is "ISO-8859-1". The MIME type and the character encoding are separated by a semicolon.
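Since a contentType value is just a MIME type and an optional charset parameter separated by a semicolon, splitting one apart takes only a few lines. The parser below is a minimal sketch with hypothetical names; it handles the simple forms shown in this article and is not a full implementation of the HTTP header grammar.

```java
import java.util.Locale;

/** Hypothetical minimal parser for values such as "text/html; charset=UTF-8". */
final class ContentTypeParser {

    /** Returns the part before the first ';', trimmed and lower-cased. */
    static String mimeType(String contentType) {
        int semi = contentType.indexOf(';');
        String mime = semi < 0 ? contentType : contentType.substring(0, semi);
        return mime.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns the value of the charset parameter, or null when none is present. */
    static String charset(String contentType) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(mimeType("text/html; charset=UTF-8")); // text/html
        System.out.println(charset("text/html; charset=UTF-8"));  // UTF-8
        System.out.println(charset("text/html"));                 // null
    }
}
```

A caller that gets null back would fall back to the default encoding described above.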

The relationship between pageEncoding and contentType:

1. The pageEncoding value is never sent as a header. It tells the web server which encoding to use for the JSP page, and when contentType does not specify a charset it also becomes the encoding of the web server's response stream.

2. In the first stage the JSP is translated into a .java file. The container reads the JSP according to the pageEncoding setting and converts it from the specified encoding into uniform UTF-8 Java source (the .java file).

3. In the second stage javac compiles the Java source into Java bytecode. Whatever encoding the JSP was written in, what reaches this stage is always UTF-8 encoded Java source. javac reads the Java source as UTF-8 and compiles it into a binary (the .class file) whose string constants are stored in UTF-8 form, which is how the JVM specification represents string constants inside class files.

4. In the third stage Tomcat (or another application container) loads and executes the Java binary from stage two and outputs the result, which is what the client sees. This is where the contentType parameter, which lay dormant during stages one and two, takes effect.

Settings that have the same effect as contentType include the HTML page charset, response.setCharacterEncoding(), response.setContentType(), and response.setHeader(). Of these, response.setContentType() and response.setHeader() have the highest priority, followed by response.setCharacterEncoding(), then <%@ page contentType="text/html; charset=GBK" %>, and finally <meta http-equiv="Content-Type" content="text/html; charset=gb2312" />.

5. Web page input encoding: setting the page encoding with <%@ page contentType="text/html; charset=GBK" %> also determines the page's input encoding. If the page is displayed as UTF-8, everything the user types into the page is encoded as UTF-8, so the server-side program should set the input encoding before reading the form input. When the form is submitted, the browser converts each form field value into the bytes of the specified character set and then encodes the resulting bytes according to the standard HTTP URL encoding scheme. The page still needs to tell the server which encoding it uses. request.setCharacterEncoding() changes the encoding the servlet uses to read the request, and response.setCharacterEncoding() changes the encoding of the result the servlet returns.
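The form-submission step can be imitated with the JDK's own URL encoder. This is a rough sketch with hypothetical names: it stands in for what the browser and the server each do, assuming the page is displayed as UTF-8, and shows what happens when the server decodes with a different charset than the one the browser used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical demo of URL-encoding a form value and decoding it on the server side. */
final class FormEncodingDemo {

    /** Browser side: convert the value to bytes in the page charset, then percent-encode them. */
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    /** Server side: percent-decode the bytes and interpret them in the given charset. */
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String typed = "\u4e2d"; // one Chinese character typed into a form field
        String submitted = encode(typed, "UTF-8");

        System.out.println(submitted);                                   // %E4%B8%AD
        System.out.println(decode(submitted, "UTF-8").equals(typed));    // true
        System.out.println(decode(submitted, "ISO-8859-1").equals(typed)); // false
    }
}
```

The mismatch in the last line is the situation that setting the request encoding before reading the form input is meant to prevent.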

Or, to put it another way:

    • pageEncoding sets the character encoding of the JSP page's source. If its value is ISO-8859-1, Chinese characters cannot be written in the JSP source; tools such as Eclipse report an error when you save. Changing it to GBK fixes that.
    • charset is the character encoding of the content returned after a request to the server. Even with pageEncoding set to GBK, if you save, run the program, and view the page, the Chinese characters you just wrote will not display properly until charset is also changed to GBK.
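Whether a given charset can represent the characters in a source file at all can be checked directly with the JDK's CharsetEncoder. The helper below is a small sketch with a hypothetical name; it illustrates the kind of check an editor can make before saving and is not a description of what Eclipse does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: can every character of the text be encoded in the charset? */
final class CanEncodeDemo {

    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two Chinese characters
        System.out.println(canRepresent(chinese, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canRepresent(chinese, StandardCharsets.UTF_8));      // true
        System.out.println(canRepresent("abc", StandardCharsets.ISO_8859_1));   // true
    }
}
```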

Note: for the character encoding of the JSP page source, the value of pageEncoding is used if present; if not, the value of charset is used; if neither is present, ISO-8859-1 is used. The defaults of pageEncoding and contentType are both ISO-8859-1, and if one of them is set, the other takes the same value (Tomcat 4.1.27). This is not absolute, though; it depends on how each JSP container handles it.

For example, if you set pageEncoding in a JSP under Tomcat, contentType is set to the same encoding. Resin does not do this and uses the default instead, which you can see by looking at the Java file of the compiled servlet class. That is exactly where the problem arises, so under Resin it is best to set both attributes explicitly in the JSP.

Summary: usually we put <%@ page contentType="text/html;charset=gb2312" %> on our JSP pages.

Http://www.cnblogs.com/kevin-yuan/archive/2011/12/31/2308479.html
