Views on a Pure JavaScript Implementation of GB2312 Transcoding

Source: Internet
Author: User

Posted in full at http://community.csdn.net/Expert/TopicView3.asp?Id=4563329

Here are my main points:

Encoding problems should not be solved this way. The original poster's intentions are good, but the approach is flawed. At best, this is a patch that works under certain conditions.

By "the approach" I mean this: has the original poster considered that encoding is really an underlying facility, relative to the application being developed?

Encoding only matters in communication, such as stream I/O. For browser/server (B/S) applications, that basically means communication between the browser and the server. A very small number of applications may read and write the local file system (but these usually go through ActiveX or XPCOM, so there is no need to transcode in JS). In essence, a good design encapsulates these details.

In a normal environment, encoding should be handled by the browser and the server, and there are technical specifications and conventions for this. For example, the encoding is specified in the Content-Type header sent by the server.

The right approach is to learn to use the browser and server facilities correctly rather than reinventing the wheel. Your wheel runs at the wrong level of the system and does not run smoothly (performance problems). Your code is useful as a temporary patch only in rare cases, such as a browser bug or a misconfigured server that you cannot change.

The best practice for encoding is to use UTF-8 wherever possible (for HTML, XML, JS, CSS, and so on). Essentially all modern browsers recognize UTF-8 with a BOM.
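To make the BOM remark concrete, here is a minimal sketch of what "UTF-8 with a BOM" means at the byte level: the file starts with the three bytes EF BB BF. The function name `hasUtf8Bom` and the sample data are hypothetical, introduced only for illustration; browsers do this detection themselves, so application code normally never needs it.

```javascript
// Returns true when the byte sequence starts with the UTF-8 byte order
// mark (EF BB BF). `bytes` is any array-like of numbers, e.g. a Uint8Array.
function hasUtf8Bom(bytes) {
  return (
    bytes.length >= 3 &&
    bytes[0] === 0xef &&
    bytes[1] === 0xbb &&
    bytes[2] === 0xbf
  );
}

// Illustrative data: the BOM followed by the ASCII bytes for "hi".
const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x68, 0x69]);
const withoutBom = new Uint8Array([0x68, 0x69]);

console.log(hasUtf8Bom(withBom));    // true
console.log(hasUtf8Bom(withoutBom)); // false
```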

It is somewhat more interesting as part of a framework. Framework development differs a little from general application development, and providing such a function there is fine. Still, I recommend not using it in most cases. The fact that a framework can patch over the problem should not cost the programmers who use it the chance to learn the "correct way".

Take the claim that "many web servers use gb2312 to decode requests".
The key point in this example is that server-side programs (ASP, JSP, etc.) had better not decode with gb2312 at all. I have said this repeatedly on CSDN and in many other places where Java and Chinese text issues are discussed. In particular, GET requests (where the parameters are encoded into the URL) should always be decoded as UTF-8, which is what a large number of standards call for (the HTTP-related RFCs, the W3C HTML specifications, and so on).

Because XMLHttpRequest can only request pages from its own domain, it is fair to assume that the JS developer can influence whoever is responsible for the server side, so the server is basically never out of reach. If a virtual hosting user finds a problem with the hosting configuration (for example, the Content-Type is always sent with the wrong encoding), they can ask the webmaster to fix it. If the service provider is unwilling or unable to do so, that is a matter of service quality or attitude. If it were me, I would switch.
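As a small illustration of submitting URL parameters in UTF-8, the sketch below relies on the built-in `encodeURIComponent`, which percent-encodes the UTF-8 bytes of a string regardless of the page's own encoding. The helper name `buildGetUrl`, the `/search` path, and the parameter `q` are hypothetical.

```javascript
// Builds a GET URL whose parameter names and values are percent-encoded
// as UTF-8. encodeURIComponent always works on the UTF-8 form of the
// string, so no hand-written transcoding table is involved.
function buildGetUrl(base, params) {
  const query = Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
  return query ? base + "?" + query : base;
}

// "\u4e2d\u6587" is the two-character string for "Chinese (language)";
// escapes are used so the example does not depend on this file's encoding.
const url = buildGetUrl("/search", { q: "\u4e2d\u6587" });
console.log(url); // /search?q=%E4%B8%AD%E6%96%87
```

A server that decodes the query string as UTF-8 recovers the original text; `decodeURIComponent("%E4%B8%AD%E6%96%87")` performs the same round trip in JS.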

Isn't development always discussed in terms of a specific environment and situation? For intranet or local applications with higher privileges, you can call higher-performance encoding modules through ActiveX in IE or the corresponding mechanism in Moz. General web applications should always follow the relevant specifications; for example, URL parameters should always be submitted in UTF-8. That is the right path. Too many programmers reach for one hack after another instead of looking closely at which part of the system should be doing the job.

A standalone JS gb2312codec does accomplish certain things, and I think it is fine as a fallback tool inside a framework, but it is clearly inappropriate as a general cross-browser solution. It is like riding a horse down an urban expressway: yes, you will reach your destination, but can it serve as a general means of transport in a city?

If the goal is character processing, consider building a complete index of Chinese characters and Pinyin instead. For highly interactive web application development, such a library would be worth far more than gb2312codec!

Of course, the web page itself can be encoded in gb2312. Parameters, however, must be submitted in UTF-8.

We cannot put all the blame on external circumstances. Have we tried to solve the problem the correct way? In fact, no transcoding is required.

Although XMLHttpRequest does not automatically adjust the encoding when reading plain text files, if you read an XML file with an encoding declaration, both IE and Moz handle the encoding correctly. So there is no need for so-called pure JS transcoding. So far I have not seen a single practical web application for which pure JS transcoding could be called a reasonable solution.
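The point about XML can be sketched as follows: request the document and read `responseXML`, leaving it to the browser's XML parser to honor the encoding declaration (for example `<?xml version="1.0" encoding="GB2312"?>`). The function name `loadXml` and its callback shape are hypothetical; the third parameter exists only so the constructor can be swapped out outside a browser.

```javascript
// Fetches an XML document and hands the parsed DOM to `onDone`.
// No transcoding happens in JS: the browser's XML parser reads the
// encoding declaration and decodes the bytes itself.
// `XhrCtor` defaults to the browser's XMLHttpRequest (an assumption of
// this sketch: it is only available in browser-like environments).
function loadXml(url, onDone, XhrCtor = globalThis.XMLHttpRequest) {
  const xhr = new XhrCtor();
  xhr.open("GET", url, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return; // 4 = request finished
    if (xhr.status === 200 && xhr.responseXML) {
      onDone(null, xhr.responseXML);
    } else {
      onDone(new Error("Request failed or response was not XML: " + xhr.status));
    }
  };
  xhr.send(null);
}

// Usage in a browser (the URL is illustrative):
// loadXml("/data/cities.xml", (err, doc) => {
//   if (!err) console.log(doc.documentElement.nodeName);
// });
```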

Finally, the best solution to encoding problems is for the server to send the correct Content-Type. Multiple specifications stipulate that the Content-Type sent by the server has the highest priority in determining which character encoding the user agent should use. Internet Explorer, Moz, and Opera basically follow this rule (for both web pages and XMLHttpRequest).
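The priority rule described here can be sketched as a small resolver: the `charset` parameter of the Content-Type header wins, an encoding declared inside the document is used only when the header has none, and a fallback applies last. This is a simplified illustration, not what any particular browser implements; the function names and the `utf-8` fallback are assumptions.

```javascript
// Extracts the charset parameter from a Content-Type header value,
// e.g. "text/html; charset=GB2312" -> "gb2312". Returns null if absent.
function charsetFromContentType(headerValue) {
  if (!headerValue) return null;
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(headerValue);
  return match ? match[1].toLowerCase() : null;
}

// Applies the priority order from the text: header first, then the
// encoding declared in the document itself, then a fallback.
function effectiveCharset(headerValue, declaredInDocument, fallback = "utf-8") {
  const fromHeader = charsetFromContentType(headerValue);
  if (fromHeader) return fromHeader;
  if (declaredInDocument) return declaredInDocument.toLowerCase();
  return fallback;
}

console.log(effectiveCharset("text/html; charset=UTF-8", "gb2312")); // utf-8
console.log(effectiveCharset("text/html", "GB2312"));                // gb2312
```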

If you really need to handle multiple encodings, the best practice is this: as long as developers have any chance to influence the server configuration, do not give up on this simplest, most direct, and most efficient fix! For example, even if you are a virtual hosting user and your service provider is stupid and arrogant, so that you cannot get the wrong global configuration corrected, you may still be able to correct it locally with .htaccess (in fact, precisely because the provider is stupid, it is unlikely to have touched the configuration much, so the chances are high that its server still allows .htaccess by default).

For Apache 2, the most common server, the AddCharset directive does even better. It lets you select the character encoding automatically for files with a given suffix. With the configuration that ships with a standard installation, abc.gb.html is sent with gb2312 as its encoding, and everything just works. All you have to do is standardize your file names. Note: the default Apache configuration uses content negotiation, so abc.gb.html can be the variant selected and sent to Chinese users.

For the various Java web containers (Servlet/JSP), as long as your container is not too old, you can use a filter to handle encoding. Java has every capability you need for handling encodings; the only question is whether you know how to use it.
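To make the suffix-based selection described for Apache above more concrete, here is a sketch in JavaScript of the same idea: look for a charset suffix among the dot-separated parts of a file name and build the Content-Type value from it. It only mimics the behavior for illustration and is not Apache's implementation; the mapping table and the function name `contentTypeFor` are hypothetical.

```javascript
// Hypothetical suffix-to-charset table, in the spirit of mapping
// ".gb" to gb2312 as in the abc.gb.html example.
const SUFFIX_CHARSETS = { gb: "gb2312", gb2312: "gb2312", utf8: "utf-8" };

// Returns a Content-Type value for the file, adding a charset parameter
// when one of the file name's suffixes appears in the table.
function contentTypeFor(fileName, mediaType = "text/html") {
  const suffixes = fileName.toLowerCase().split(".").slice(1);
  const hit = suffixes.find((s) =>
    Object.prototype.hasOwnProperty.call(SUFFIX_CHARSETS, s)
  );
  return hit ? mediaType + "; charset=" + SUFFIX_CHARSETS[hit] : mediaType;
}

console.log(contentTypeFor("abc.gb.html")); // text/html; charset=gb2312
console.log(contentTypeFor("abc.html"));    // text/html
```

The design point is the same one the post makes: the charset is decided once, on the server, from a naming convention, and the client never transcodes anything.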
