Repost: Resin 3.1 Problems with UTF-8 JSP Processing

Last Update:2018-12-04 Source: Internet

Author: User


From: http://cnxiaowei.javaeye.com/blog/262766

Today, while deploying a project I had worked on earlier, I ran into this same problem with Resin. The cause is the UTF-8 BOM issue described in the article below.

The following is the original article:

======================================================================

I used to run resin-3.0.x as my server and recently wanted to upgrade to Resin 3.1, so I downloaded Resin 3.1.7a from the official website. Unpacking and configuration went fine, but after I redeployed the application, pages that used to work reported an error:

500 Servlet Exception

/index.jsp:1: contentType 'text/vnd.wap.wml; charset=utf-8' conflicts with previous value of contentType 'text/html; charset=UTF-8'.  Check the .jsp and any included .jsp files for conflicts.

1:  <%@page contentType="text/vnd.wap.wml; charset=utf-8"%>
2:  <%@page import="java.util.*"%>
3:  <%!

According to this message, the contentType set on the first line of the JSP, 'text/vnd.wap.wml; charset=utf-8', differs from an earlier value, 'text/html; charset=UTF-8'. But the first line of the file is <%@page contentType="text/vnd.wap.wml; charset=utf-8"%>, and nothing in it sets 'text/html; charset=UTF-8', so the message is confusing.

Later it occurred to me that the UTF-8 file format might be the problem. I opened the file in UltraEdit (UE) and saved it again as UTF-8 without a BOM, and the page then displayed normally. But with so many files on the server, changing them one by one is impractical, so I had to look for another solution. I searched the Internet for a long time without finding any leads; apparently few people have run into this problem.
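For readers who do want to re-save many files without a BOM, the manual UltraEdit step can be scripted. The following is only a minimal sketch and not something the original author used: the class name BomStripper and its methods are hypothetical, and it assumes every file ending in .jsp under a given directory should be rewritten in place.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Resin): removes a leading UTF-8 BOM from .jsp files.
final class BomStripper {
    // The three bytes of the UTF-8 BOM: EF BB BF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True when the data starts with EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
            && data[0] == UTF8_BOM[0]
            && data[1] == UTF8_BOM[1]
            && data[2] == UTF8_BOM[2];
    }

    // Returns the input unchanged when there is no BOM, else a copy without the first three bytes.
    static byte[] stripBom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    // Rewrites every .jsp file under root that starts with a BOM; returns how many files changed.
    static int stripTree(Path root) throws IOException {
        List<Path> jspFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            jspFiles = paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jsp"))
                .collect(Collectors.toList());
        }
        int changed = 0;
        for (Path file : jspFiles) {
            byte[] data = Files.readAllBytes(file);
            if (hasUtf8Bom(data)) {
                Files.write(file, stripBom(data));
                changed++;
            }
        }
        return changed;
    }
}
```

Calling BomStripper.stripTree with the web application's root directory would rewrite the affected files; it is safest to try it on a copy first, since the files are overwritten.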

Finally, I had to download the source code and study it. I found out why.

When processing a JSP file, Resin first reads the first few bytes to determine the file format. If the first byte is 0xef, the second is 0xbb, and the third is 0xbf, Resin treats the file as UTF-8 and sets the contentType to 'text/html; charset=UTF-8' on its own. Later in processing, the JSP's own contentType directive is parsed; Resin finds that its value differs from the 'text/html; charset=UTF-8' set earlier and throws an exception. A UTF-8 file without a BOM has no such three-byte mark, so this branch is never taken.

Related code:

Java code

case 0xef:
  if ((ch = stream.read()) != 0xbb) {
    stream.unread();
    stream.unread();
  }
  else if ((ch = stream.read()) != 0xbf) {
    throw error(L.l("Expected 0xbf in UTF-8 header.  UTF-8 pages with the initial byte 0xbb expect 0xbf immediately following.  The 0xbb 0xbf sequence is used by some application to suggest UTF-8 encoding without a directive."));
  }
  else {
    _parseState.setContentType("text/html; charset=UTF-8");
    _parseState.setPageEncoding("UTF-8");
    stream.setEncoding("UTF-8");
  }
  break;

Conflicting code:

Java code

else if (CONTENT_TYPE.equals(name)) {
  String oldContentType = _parseState.getContentType();
  if (oldContentType != null && ! value.equals(oldContentType))
    throw error(L.l("contentType '{0}' conflicts with previous value of contentType '{1}'.  Check the .jsp and any included .jsp files for conflicts.", value, oldContentType));
  _parseState.setContentType(value);
  String charEncoding = parseCharEncoding(value);
  if (charEncoding != null)
    _parseState.setCharEncoding(charEncoding);
}
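The interaction between the two excerpts can be reproduced in isolation. The sketch below is a simplified stand-in, not Resin's code: the class ContentTypeConflictDemo and its methods are hypothetical names, and the only behavior it models is "a BOM presets the content type, and a later directive with a different value is rejected".

```java
// Simplified stand-in for the parser state; all names here are hypothetical, not Resin's classes.
final class ContentTypeConflictDemo {
    private String contentType; // stays null until something sets it

    // Models the BOM branch: a leading EF BB BF presets the content type.
    void readHeader(byte[] page) {
        if (page.length >= 3
            && (page[0] & 0xff) == 0xef
            && (page[1] & 0xff) == 0xbb
            && (page[2] & 0xff) == 0xbf) {
            contentType = "text/html; charset=UTF-8";
        }
    }

    // Models the directive check: any different earlier value counts as a conflict.
    void applyPageDirective(String value) {
        if (contentType != null && !value.equals(contentType)) {
            throw new IllegalStateException("contentType '" + value
                + "' conflicts with previous value of contentType '" + contentType + "'");
        }
        contentType = value;
    }

    // Runs both steps and reports whether the directive was rejected.
    static boolean conflicts(byte[] page, String directiveValue) {
        ContentTypeConflictDemo state = new ContentTypeConflictDemo();
        state.readHeader(page);
        try {
            state.applyPageDirective(directiveValue);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

In this sketch, a page that starts with the BOM conflicts with the directive value "text/vnd.wap.wml; charset=utf-8", while the same page without the BOM does not. Because the comparison is an exact string match, a BOM page declaring "text/html; charset=utf-8" in lower case would also be rejected here.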

I really don't understand why Resin does this. For an ordinary web site it may not matter much, since the contentType is text/html anyway. But for a WAP site, or any site that uses another contentType, this "intelligent" encoding detection causes trouble.

Appendix: UTF-8 files come in two formats, with and without a BOM (reposted)

What is a BOM? The three bytes "EF BB BF" are called the BOM, short for "byte order mark". In UTF-8 files the BOM is often used to indicate that the file is UTF-8, but the BOM was really designed for UTF-16, where it indicates whether the high or the low byte comes first.
At the start of a UTF-16 byte stream, the BOM FF FE indicates little-endian order (the low byte comes first). UTF-8 has no byte order to consider, so the BOM is optional.

Microsoft's Notepad, Word, and similar programs can only open UTF-8 files correctly when they contain a BOM, whereas UltraEdit does exactly the opposite and mistakes a UTF-8 file with a BOM for ASCII.

The UTF-8 BOM is EF BB BF. UE converts a UTF-8 file to UTF-16 when loading it, and EF BB BF becomes FF FE (the Unicode-LE BOM) in UTF-16. UltraEdit does not recognize the BOM and adds another one, so there are two FF FE sequences and the file is damaged.
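The byte values quoted in this appendix can be checked by encoding the single code point U+FEFF in each encoding. This is a small illustrative sketch; the class name BomBytes and its methods are made up for the example, and it relies only on the standard java.nio.charset.StandardCharsets constants.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how the code point U+FEFF is serialized in each encoding.
final class BomBytes {
    // U+FEFF is the code point used as the byte order mark.
    static final String BOM = String.valueOf((char) 0xFEFF);

    // Formats bytes as upper-case hex without separators.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    static String utf8()    { return hex(BOM.getBytes(StandardCharsets.UTF_8)); }    // "EFBBBF"
    static String utf16le() { return hex(BOM.getBytes(StandardCharsets.UTF_16LE)); } // "FFFE"
    static String utf16be() { return hex(BOM.getBytes(StandardCharsets.UTF_16BE)); } // "FEFF"
}
```

The same code point therefore appears as EF BB BF in UTF-8 and as FF FE in little-endian UTF-16, which matches the conversion described above.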
