Questions and Thoughts Prompted by an Interview Question (zt)

Source: Internet
Author: User

Two days ago I interviewed at a company, and one of the questions was about XML. The question itself was not difficult, but an encoding problem came up while I was answering it, so I looked into it after I got home.
Requirement: Use XSL to convert the following XML file into an HTML table, with the rows sorted by Pubdate in descending order.

XML file
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<Books>
  <Book ID="1" Pubdate="20050707">
    <Title>C# getting started with XML</Title>
    <Description>XML skills required by programmers</Description>
  </Book>
  <Book ID="2" Pubdate="20050607">
    <Title>XSL programming skills</Title>
    <Description>From XML to HTML</Description>
  </Book>
  <Book ID="3" Pubdate="20050607">
    <Title>.NET Framework Program Design</Title>
    <Description>Framework Program Design</Description>
  </Book>
</Books>

At the interview, I created a new text file on the desktop and typed in the following by hand:

XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <table border="1pt">
          <tr>
            <td>Bookid</td>
            <td>Title</td>
            <td>Description</td>
            <td>Pubdate</td>
          </tr>
          <xsl:apply-templates select="//Book">
            <xsl:sort select="@Pubdate" order="descending" />
          </xsl:apply-templates>
        </table>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="Book">
    <tr>
      <td><xsl:value-of select="@ID" /></td>
      <td><xsl:value-of select="Title/text()" /></td>
      <td><xsl:value-of select="Description/text()" /></td>
      <td><xsl:value-of select="@Pubdate" /></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>

Note: FTB's XML code-insertion feature tends to add stray spaces next to < and > in the XML code above. Well-formed XML does not allow them (for example, between < and an element name), so remove any such spaces before using the code.
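
As a cross-check of the row order the stylesheet is expected to produce, here is a minimal Python sketch that sorts the same sample data by Pubdate in descending order using only the standard library. It is not part of the interview answer, and the function name rows_by_pubdate_desc is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data mirroring the XML file above (same element and attribute names).
BOOKS_XML = """<Books>
  <Book ID="1" Pubdate="20050707"><Title>C# getting started with XML</Title></Book>
  <Book ID="2" Pubdate="20050607"><Title>XSL programming skills</Title></Book>
  <Book ID="3" Pubdate="20050607"><Title>.NET Framework Program Design</Title></Book>
</Books>"""


def rows_by_pubdate_desc(xml_text):
    """Return (ID, Title, Pubdate) tuples, newest Pubdate first."""
    root = ET.fromstring(xml_text)
    # Equal-length YYYYMMDD strings sort chronologically as plain text.
    # sorted() is stable even with reverse=True, so ties keep document order.
    books = sorted(root.iter("Book"), key=lambda b: b.get("Pubdate"), reverse=True)
    return [(b.get("ID"), b.findtext("Title"), b.get("Pubdate")) for b in books]


for row in rows_by_pubdate_desc(BOOKS_XML):
    print(row)
```

Books 2 and 3 share the same Pubdate, so they stay in document order after book 1; xsl:sort in XSLT 1.0 is likewise stable for equal keys.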

I then opened the XML file in IE and got the error: invalid characters were found in the text content. I checked the file and could find no invalid characters, so I suspected an XML encoding problem and added encoding="UTF-8" to the XML declaration at the top. UTF-8 should cover every character used in the question, yet IE still reported invalid characters in the text content. After I changed the declaration to encoding="gb2312", the page displayed normally. Why?

Back home I experimented and found two cases:
Case 1:
By default, Notepad on Windows saves files in ANSI format. In this case, different encoding declarations give different results:
encoding="Windows-1252": no error, but the Chinese characters are garbled;
encoding="gb2312": displayed normally;
encoding="UTF-8": error, "invalid characters found in text content".
Case 2:
Use Notepad's Save As to save the XML file in UTF-8 format:
encoding="Windows-1252": error, "switching from the current encoding to the specified encoding is not supported";
encoding="gb2312": error, "switching from the current encoding to the specified encoding is not supported";
encoding="UTF-8": displayed normally.
Special case: if the encoding attribute is omitted, the default is UTF-8, so this behaves the same as encoding="UTF-8".

So how can these two cases be explained?
The W3C defines three rules that let an XML parser determine the encoding of an XML file:
1. If the file begins with a BOM (byte order mark; files saved as Unicode generally have one, ANSI files do not), the BOM determines the file's encoding.
2. If there is no BOM, the parser looks at the encoding attribute in the XML declaration.
3. If neither is present, the XML file is assumed to be encoded in UTF-8.
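
The three rules can be sketched as a small Python function. This is a simplified illustration, not how any particular parser is implemented: the name guess_xml_encoding is hypothetical, only the UTF-8 and UTF-16 BOMs are checked, and the declaration is assumed to be readable as ASCII bytes.

```python
import codecs
import re

# Rule 1 candidates: only the common UTF-8 and UTF-16 BOMs are handled here.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Rule 2: encoding="..." inside an XML declaration at the very start of the data.
_DECL = re.compile(rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def guess_xml_encoding(data: bytes) -> str:
    """Apply the three rules in order and return an encoding name."""
    # Rule 1: a BOM, if present, decides the encoding.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # Rule 2: otherwise use the encoding named in the XML declaration.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # Rule 3: with neither a BOM nor a declared encoding, assume UTF-8.
    return "utf-8"


print(guess_xml_encoding(b'<?xml version="1.0" encoding="gb2312"?><Books/>'))
print(guess_xml_encoding(b'<?xml version="1.0"?><Books/>'))
```

The first call falls through to rule 2 and the second to rule 3; prefixing either input with a UTF-8 BOM makes rule 1 win regardless of the declaration.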

Now let's try to use these three rules to explain the results above:
Case 1 analysis:
Notepad saved the XML file in ANSI format by default. With encoding="UTF-8", the parser decodes the file as UTF-8, but the bytes on disk are ANSI. ANSI and UTF-8 agree only on the 128 ASCII characters; the bytes of any other character generally do not form valid UTF-8 sequences, so the parser reports "invalid characters found in text content". With encoding="Windows-1252", the parser decodes every byte as one single-byte character. Nearly every byte value is a valid Windows-1252 character, so parsing succeeds, but each Chinese character occupies two bytes in the file and is shown as two unrelated Western characters, hence the garbled text. But why does encoding="gb2312" display normally? GB2312 is a double-byte Simplified Chinese character set, and I had assumed that ANSI meant the single-byte Windows-1252. That looked like a contradiction. The likely explanation is that "ANSI" in Notepad is not one fixed encoding but the system's default code page: on Simplified Chinese Windows that is code page 936 (GBK), a double-byte superset of GB2312, so a file saved as ANSI there really is GB2312-compatible.
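
The three outcomes of case 1 can be reproduced at the byte level with a short Python sketch. It assumes the file was saved on Simplified Chinese Windows, where Notepad's ANSI is taken to be code page 936 (GBK); the sample text and the helper try_decode are made up for illustration.

```python
# A two-character Chinese sample, written with escapes so that this
# snippet's own file encoding does not matter.
text = "\u4e2d\u6587"

# Assumption: "ANSI" on Simplified Chinese Windows means code page 936 (GBK).
raw = text.encode("gbk")  # two bytes per Chinese character


def try_decode(data, encoding):
    """Decode data, or return a short error string if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason


print(len(raw))                   # 4 bytes for 2 characters
print(try_decode(raw, "cp1252"))  # decodes, but as four unrelated Western characters
print(try_decode(raw, "gb2312"))  # round-trips to the original text
print(try_decode(raw, "utf-8"))   # fails: the bytes are not valid UTF-8
```

The same bytes give garbled text, correct text, or an error depending only on which encoding the reader is told to use, which matches the three results observed in IE.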

Case 2 analysis:
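The file was saved as UTF-8 (one of the Unicode encodings), so it begins with a BOM. Read literally, the W3C rules above say the parser should decode the file as UTF-8 and ignore the encoding declaration. In practice the declaration is evidently not ignored; otherwise the errors in case 2 could not be explained. The behavior is consistent with the parser detecting UTF-8 from the BOM, then reading the declaration, and reporting an error when the declared encoding conflicts with the detected one.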
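
One way to picture the case 2 errors is a parser that trusts the BOM but still cross-checks the declaration. The Python sketch below models only that idea; it is an assumption about the behavior, not IE's actual implementation, and check_declaration is a hypothetical helper. The utf-8-sig codec is used to produce a file that starts with a UTF-8 BOM, as Notepad's UTF-8 save did.

```python
import codecs
import re

_DECL = re.compile(rb"""<\?xml[^>]*?encoding\s*=\s*["']([^"']+)["']""")


def check_declaration(data: bytes):
    """Return an error message if a UTF-8 BOM conflicts with the declared
    encoding, or None if there is no BOM, no declaration, or no conflict."""
    if not data.startswith(codecs.BOM_UTF8):
        return None
    match = _DECL.match(data[len(codecs.BOM_UTF8):])
    if match is None:
        return None
    declared = match.group(1).decode("ascii", "replace")
    try:
        # Normalize aliases such as "UTF-8" and "utf8" to one canonical name.
        canonical = codecs.lookup(declared).name
    except LookupError:
        return "unknown declared encoding " + repr(declared)
    if canonical != "utf-8":
        return "cannot switch from UTF-8 (from BOM) to declared " + repr(declared)
    return None


with_bom = '<?xml version="1.0" encoding="gb2312"?><Books/>'.encode("utf-8-sig")
print(check_declaration(with_bom))
```

With encoding="UTF-8" or no encoding attribute the function returns None, mirroring the two inputs that displayed normally in case 2.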

Open questions:
This article does not fully explain cases 1 and 2. How do you understand them?
