Upgrading to Resin 3.1: Problems with UTF-8 JSP Processing


From: http://cnxiaowei.javaeye.com/blog/262766

 

 

Today I wanted to release a project I had worked on earlier, and I ran into this same problem with Resin. The cause is the UTF-8 BOM issue described in the article below.

 

The following is the original article:

======================================================================

 

I used to run resin-3.0.x as my server and recently wanted to upgrade to Resin 3.1, so I downloaded resin3.1.7a from the official website. Unpacking and configuration went fine, but problems appeared when I redeployed my application: pages that used to work now report an error:

 

500 Servlet Exception

 

/index.jsp:1: contentType 'text/vnd.wap.wml; charset=utf-8' conflicts with previous value of contentType 'text/html; charset=UTF-8'.  Check the .jsp and any included .jsp files for conflicts.

1:  <%@page contentType="text/vnd.wap.wml; charset=utf-8"%>
2:  <%@page import="java.util.*"%>
3:  <%!

 

 

According to this message, the contentType set on the first line of the JSP, 'text/vnd.wap.wml; charset=utf-8', differs from a previously set 'text/html; charset=UTF-8'. But the first line of the file is <%@page contentType="text/vnd.wap.wml; charset=utf-8"%>, and nothing before it sets 'text/html; charset=UTF-8', so the message is really confusing.

Later it occurred to me that the UTF-8 file format might be the problem. I opened the file in UltraEdit (UE) and saved it again as UTF-8 without a BOM, and this time the page displayed normally. However, with so many files on the server, changing them one by one is impractical, so I had to look for another solution. I searched online for a long time without finding any clue; it seems few people have run into this problem.
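Re-saving each file by hand does not scale, but the same fix can be scripted. The following is a minimal sketch and not part of the original article: the class name BomStripper, its methods, and the choice to touch only *.jsp files are assumptions made for illustration. It removes a leading EF BB BF (the UTF-8 BOM) from every matching file under a directory. It rewrites files in place, so back them up first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Resin or of the original article.
class BomStripper {

    /** Returns the data without a leading EF BB BF, or the same array if there is none. */
    static byte[] stripBom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    /** Rewrites every *.jsp file under root that starts with a BOM; returns how many changed. */
    static int stripTree(Path root) throws IOException {
        List<Path> jsps;
        try (Stream<Path> walk = Files.walk(root)) {
            jsps = walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().endsWith(".jsp"))
                       .collect(Collectors.toList());
        }
        int changed = 0;
        for (Path p : jsps) {
            byte[] original = Files.readAllBytes(p);
            byte[] stripped = stripBom(original);
            if (stripped.length != original.length) {
                Files.write(p, stripped); // overwrite in place
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Rewrote " + stripTree(root) + " file(s)");
    }
}
```

Run against the web application directory, for example `java BomStripper /path/to/webapp` (the path here is a placeholder).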

Finally, I had to download the source code and study it, and that is how I found the cause.

When processing a JSP file, Resin first reads the first few bytes to determine the file format. If the first byte is 0xef, the second 0xbb, and the third 0xbf, the file is treated as UTF-8 and Resin sets the contentType to 'text/html; charset=UTF-8' on its own. Later, when it reaches the JSP's own contentType directive, it finds that the value differs from the earlier 'text/html; charset=UTF-8' and throws an exception. A UTF-8 file without a BOM has no such three-byte mark, so this handling never runs.
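The byte check described above can be reproduced outside Resin. This is a rough sketch using only the JDK; Utf8BomSniffer and its methods are hypothetical names, not Resin classes. It reads up to three bytes, consumes them if they are EF BB BF, and otherwise pushes them back so the caller sees the stream unchanged, much like the read/unread pattern in a parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration; not Resin code.
class Utf8BomSniffer {

    /**
     * Consumes a leading EF BB BF if present and returns true.
     * Otherwise pushes back whatever was read and returns false.
     * The stream must have a pushback buffer of at least 3 bytes.
     */
    static boolean skipUtf8Bom(PushbackInputStream in) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int b = in.read();
            if (b < 0) {
                break; // end of stream before three bytes were read
            }
            head[n++] = (byte) b;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            in.unread(head, 0, n); // the next read() returns head[0] again
        }
        return bom;
    }

    /** Convenience wrapper for in-memory data. */
    static boolean startsWithBom(byte[] data) {
        try (PushbackInputStream in =
                 new PushbackInputStream(new ByteArrayInputStream(data), 3)) {
            return skipUtf8Bom(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '%'};
        byte[] without = {'<', '%', '@'};
        System.out.println(startsWithBom(withBom)); // true
        System.out.println(startsWithBom(without)); // false
    }
}
```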

Related code:

Java code
  case 0xef:
    // 0xef may start a UTF-8 BOM (EF BB BF); check the next two bytes
    if ((ch = stream.read()) != 0xbb) {
      // not a BOM: push both bytes back
      stream.unread();
      stream.unread();
    }
    else if ((ch = stream.read()) != 0xbf) {
      throw error(L.l("Expected 0xbf in UTF-8 header.  UTF-8 pages with the initial byte 0xbb expect 0xbf immediately following.  The 0xbb 0xbf sequence is used by some application to suggest UTF-8 encoding without a directive."));
    }
    else {
      // full BOM found: the parser presets the content type and encodings itself
      _parseState.setContentType("text/html; charset=UTF-8");
      _parseState.setPageEncoding("UTF-8");
      stream.setEncoding("UTF-8");
    }
    break;

 

Conflicting code:

 

Java code
  else if (CONTENT_TYPE.equals(name)) {
    String oldContentType = _parseState.getContentType();
    // any earlier value that differs from the directive's value is an error
    if (oldContentType != null && !value.equals(oldContentType))
      throw error(L.l("contentType '{0}' conflicts with previous value of contentType '{1}'.  Check the .jsp and any included .jsp files for conflicts.", value, oldContentType));
    _parseState.setContentType(value);
    String charEncoding = parseCharEncoding(value);
    if (charEncoding != null)
      _parseState.setCharEncoding(charEncoding);
  }
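
To see how the two fragments interact, here is a simplified, self-contained model. It is only an illustration under stated assumptions: ContentTypeConflictDemo and its methods are invented names that mimic the two branches quoted above, and it is not Resin's actual class structure.

```java
// Hypothetical model of the two parser branches; not Resin code.
class ContentTypeConflictDemo {
    private String contentType; // null until something sets it

    /** Models the BOM branch: the parser presets the content type itself. */
    void onUtf8Bom() {
        contentType = "text/html; charset=UTF-8";
    }

    /** Models the directive branch: a differing earlier value is an error. */
    void onPageDirective(String value) {
        if (contentType != null && !value.equals(contentType)) {
            throw new IllegalStateException("contentType '" + value
                    + "' conflicts with previous value of contentType '"
                    + contentType + "'");
        }
        contentType = value;
    }

    /** Returns true if parsing a page with this directive would fail. */
    static boolean conflicts(boolean hasBom, String directiveValue) {
        ContentTypeConflictDemo state = new ContentTypeConflictDemo();
        if (hasBom) {
            state.onUtf8Bom();
        }
        try {
            state.onPageDirective(directiveValue);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String wml = "text/vnd.wap.wml; charset=utf-8";
        System.out.println(conflicts(true, wml));  // true: BOM preset differs
        System.out.println(conflicts(false, wml)); // false: nothing was preset
        System.out.println(conflicts(true, "text/html; charset=UTF-8")); // false
    }
}
```

Because the comparison is an exact string match, in this model even 'text/html; charset=utf-8' (lowercase charset) would conflict with the preset value when the file carries a BOM.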

 

I really don't understand why Resin does this. For an ordinary web site it may not matter much, since the contentType is text/html anyway. But for a WAP site, or any site with another contentType, this "intelligent" encoding detection causes trouble.

 

 

Appendix: UTF-8 files come in two formats, with and without a BOM (reposted)

What is a BOM? The three bytes "EF BB BF" are called the BOM, short for "byte order mark". In UTF-8 files the BOM is often used to indicate that the file is UTF-8, but the BOM really comes from UTF-16, where it indicates the byte order.
Placed at the start of the byte stream, FF FE indicates little-endian order (the low byte comes first) and FE FF indicates big-endian order. UTF-8 has no byte-order issue, so the BOM is optional there.

Microsoft programs such as Notepad and Word can only open a UTF-8 file correctly when it contains a BOM, whereas UltraEdit is exactly the opposite and mistakes a UTF-8 file with a BOM for ASCII.

The UTF-8 BOM is EF BB BF. UE converts a UTF-8 file to UTF-16 when loading it, and EF BB BF corresponds to FF FE (the Unicode-LE BOM) in UTF-16. UltraEdit does not recognize this BOM and adds another one, so the file ends up with two FF FE marks and is damaged.
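
The byte values mentioned above can be checked directly: the BOM is the single code point U+FEFF, and only its encoded form differs between encodings. A short sketch using the JDK's standard charsets (the class name BomBytes and the hex helper are illustrative only):

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper: prints how U+FEFF is encoded in three encodings.
class BomBytes {

    /** Formats bytes as space-separated uppercase hex, e.g. "EF BB BF". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bom = "\uFEFF";
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_8)));    // EF BB BF
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_16LE))); // FF FE
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_16BE))); // FE FF
    }
}
```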
