Analyzing the priority order of Web character set settings


Author: 360 weboy
Sina Weibo: http://weibo.com/360weboy
Blog link: http://www.360weboy.com/php/fundament/charset.html

Last week, while migrating data to a new system for an e-commerce website, we found that some product descriptions were always output as garbled characters. After confirming that it was a character set problem, we investigated character sets again and identified the settings that affect which character set a page is displayed with:
1. File encoding
2. The default character set configured in Apache2
3. The default character set configured in php.ini
4. Manually calling header('Content-Type: text/html; charset=xxx'); in the PHP script
5. Adding a meta charset declaration to the head of the HTML page
I tested these five methods one by one and worked out their priority in the browser's choice of character set. First, I created a UTF-8 encoded test.php file with the following content:



I opened test.php in Chrome. Because the default character set of my Chrome browser is not UTF-8 (it is probably gb2312 or gbk), I saw the following garbled characters in the browser:
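The garbling itself can be reproduced outside the browser. The following Python sketch is not the author's test.php (its content is not preserved here); the sample string is an assumption chosen only for illustration. It stores text as UTF-8 bytes and then decodes those bytes as GBK, the way a browser with a GBK default would:

```python
# Illustrative text only; the original test.php content is not shown in the article.
text = "商品描述"

# How the file is stored on disk when saved as UTF-8.
utf8_bytes = text.encode("utf-8")

# A browser that assumes GBK reads the same bytes with the wrong table.
# errors="replace" substitutes a placeholder for any byte sequence GBK rejects.
garbled = utf8_bytes.decode("gbk", errors="replace")

# Decoding with the encoding the file was actually saved in restores the text.
restored = utf8_bytes.decode("utf-8")

print(garbled)
print(restored)
```

The bytes never change between the two decodes; only the character set assumed by the reader does, which is why every setting discussed below matters.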



Comparison of header and meta priorities

Next, we test methods 4 and 5 above. I first added the meta tag declaring utf8 to the HTML page, and the content was displayed correctly.





Next, I removed the meta tag from the head and called header('Content-Type: text/html; charset=utf8') instead. The content was again displayed correctly, so both the meta tag and the header function take priority over the browser's default character set. To compare the two, I set both on the page at the same time, with the header set to gb2312 and the meta tag set to utf8:



The content was garbled, which shows that the header has the higher priority. The browser first uses the character set in the HTTP header and only then falls back to the character set in the meta tag of the HTML page.
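The rule observed here can be written down as a small function. This is a Python sketch of the behaviour described above, not a browser's actual algorithm; the function name, the 1024-byte scan window, and the gbk fallback are assumptions for illustration:

```python
import re

# Matches both <meta charset="..."> and the http-equiv form with "charset=..." in content.
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?\s*([\w-]+)', re.IGNORECASE)


def effective_charset(header_charset, body, browser_default="gbk"):
    """Return the charset chosen under the header-then-meta rule.

    header_charset: charset taken from the HTTP Content-Type header, or None.
    body: raw response bytes, searched for a meta charset declaration.
    browser_default: hypothetical fallback when neither source names a charset.
    """
    if header_charset:
        return header_charset.lower()
    match = META_CHARSET.search(body[:1024])  # only look near the start of the page
    if match:
        return match.group(1).decode("ascii").lower()
    return browser_default


page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf8"></head></html>'
print(effective_charset("gb2312", page))  # header present: the meta tag is ignored
print(effective_charset(None, page))      # no header charset: the meta tag decides
```

With the header set to gb2312 and the meta tag set to utf8, the function returns gb2312, matching the garbled result in the test.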



Impact of setting the default character set in php.ini

Next, let's look at what happens after setting the default character set in php.ini. To test the priority, we first change the character set in both the header call and the meta tag to gb2312, so the content is certain to be garbled. Then we open the php.ini file, find the default_charset setting, uncomment it, and set the character set to utf8:



After changing the setting, remember to restart the apache2 server. The result is as follows:



We can see that after php.ini sets the default character set to utf8, that character set is appended to the Content-Type response header, overriding the gb2312 character set output by the header function in the PHP script. The browser treats the content as utf8 based on the information in the header, and the content is displayed correctly. In this test, the character set in php.ini has a higher priority than both the header function and the meta tag.
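When repeating this test, it helps to read the charset straight out of the Content-Type value the server sent instead of judging by the rendered page. A minimal Python sketch using the standard library's email.message.Message; the helper name and the sample header values are illustrative assumptions, not captured output from the author's server:

```python
from email.message import Message


def charset_from_content_type(value):
    """Extract the charset parameter from a Content-Type header value.

    Returns the charset in lowercase, or None when no charset is present.
    """
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


# Hypothetical header values of the kind discussed in this section.
print(charset_from_content_type("text/html; charset=utf8"))
print(charset_from_content_type("text/html"))
```

Comparing this value across runs shows directly which setting ended up in the response header.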

Set the default character set in apache2

Finally, we test setting the default character set in apache2. This time, set the character set in the header call, the meta tag, and php.ini to gb2312, and set the default character set in apache2 to utf8:



Restart the apache2 server. The result is as follows:





We can see that the character set setting in apache2 does not change the Content-Type header in the HTTP response. The browser therefore decodes the content as gb2312, which produces garbled characters. Would the page display correctly if the character set setting in php.ini were removed? The test result is as follows:



So the character set in apache2 also has a lower priority than the one set by the header function. Let's go on and remove the header call as well:

This shows that the character set in apache2 has a higher priority than the one in the meta tag: charset=utf8 is added to the HTTP header.
Based on the experiments above, the priority order of the character set settings is: php.ini default character set > header function character set > apache2 default character set > meta tag character set
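The reported order can be expressed as a lookup that returns the first setting that is present. This Python sketch only encodes the conclusion as the author states it for this one setup; the source names and the function are hypothetical labels:

```python
# Priority as reported by the author for this one setup, highest first.
REPORTED_PRIORITY = ["php.ini", "header()", "apache2", "meta"]


def winning_setting(settings):
    """Return (source, charset) for the highest-priority setting that is set.

    settings maps a source name to its charset, or to None if that source is unset.
    """
    for source in REPORTED_PRIORITY:
        charset = settings.get(source)
        if charset:
            return source, charset
    return None, None


# The last apache2 experiment: php.ini and header() removed, apache2 utf8, meta gb2312.
print(winning_setting({"php.ini": None, "header()": None, "apache2": "utf8", "meta": "gb2312"}))
```

Changing the order of REPORTED_PRIORITY is all it takes to model a different finding, for example one obtained with another browser or PHP version.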


Reply to discussion (solution)

This result is not quite consistent.

Limitations of the test:
1. Only a single browser was tested.
2. Only one language was tested.
3. Only a single file encoding was tested.

The biggest problem is that all the tests are positive tests, that is, each case only checks that the expected setup comes out as expected. This is a basic error in logic.

Why can the experiment look fine while the conclusion is wrong?
I tested it:
default_charset has a lower priority than the character set set by the header function.

My friends and I tested this, and the priority order is correct. Please check it yourself.

Snmr_com

Please advise. If you know where the problem with the test lies, could you tell me what would count as a proper test? Thank you!

http://www.w3help.org/zh-cn/causes/HR9001
You can refer to this article, although it is also quite old.

You are using inductive reasoning. Complete induction requires covering every possibility and reaching a consistent (unique) conclusion.
In addition, because the possibilities in your proposition are not well-ordered, you cannot use the two-step proof of mathematical induction.
So you need to test every possibility. If you do not, you can only attach restrictions to the conclusion.
For example:
Only one browser was involved, so the conclusion is limited to that browser.
The source document used only one encoding, so the conclusion applies only to that encoding.

With two or three such restrictions added, the conclusion has little practical significance, because its scope of application is very small.
So more tests are still needed to support your conclusion.
Your proposition involves the following variables: browser, the encoding set at each priority level, language family (especially languages that are hard to detect automatically, such as the double-byte Chinese, Japanese, and Korean encodings), and document encoding. That makes for quite a lot of permutations and combinations. If you can test them all, you have my admiration.
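To get a feel for how many cases a full traversal involves, the combinations can be enumerated mechanically. A Python sketch, assuming a hypothetical list of three browsers, two file encodings, and three possible values (unset, utf-8, gbk) for each of the four settings discussed in the article:

```python
from itertools import product

# All lists below are hypothetical examples, not a recommended test plan.
browsers = ["chrome", "firefox", "ie"]
file_encodings = ["utf-8", "gbk"]
sources = ["php.ini", "header()", "apache2", "meta"]
charset_values = [None, "utf-8", "gbk"]  # None means the setting is left unset

# One case = a browser, a file encoding, and one value per setting source.
cases = [
    (browser, encoding, dict(zip(sources, values)))
    for browser, encoding in product(browsers, file_encodings)
    for values in product(charset_values, repeat=len(sources))
]

# 3 browsers * 2 encodings * 3**4 setting combinations = 486 cases
print(len(cases))
```

Even this small matrix is far larger than the handful of cases in the article, and it does not yet vary the language of the page content.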

Professional theoretical guidance, I'm impressed. I will complete the tests when I have time. Thanks for your reply.

My friends and I tested this, and the priority order is correct. Please check it yourself.
php.ini



Run result


Your test results do not represent my test results.
What's more, the header function should be able to override the related preset values; otherwise, what would it be for?
