File Encoding Issues

Source: Internet
Author: User

File encoding problem: 20161201

This is a small problem I ran into during development, and it was the first time I had encountered it. At first I was at a loss and had no idea how to solve it. My initial guess was that different operating systems open the same file with different encodings, and that this caused the garbled display. However, the files are all plain txt files, and the result actually depends on which text program opens them. On Windows, the file is opened with the default ANSI encoding, so it displays normally. When the same file is sent to a Mac, the system's built-in text editor also opens it without problems, but Sublime Text shows garbled characters.

Background:

On Western-language Windows, "ANSI encoding" generally refers to Windows-1252, a single-byte encoding with up to 256 characters. The first 128 characters (00-7F) are identical to 7-bit ASCII, and the upper 128 contain accented letters and symbols used in European languages. More generally, "ANSI" on Windows means the system code page for the current language: in a Chinese environment it is Windows-936 (GBK, a superset of GB2312), in a Japanese environment it is Windows-932 (Shift_JIS), and so on. In these code pages the first 128 characters (00-7F) are again identical to 7-bit ASCII, while the other characters are represented by 2 bytes.
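The single-byte versus double-byte behaviour described above can be checked with a short PHP sketch. The helper name `encoded_hex` is hypothetical, and the encoding names `CP1252` and `GBK` are assumed to be available in the local iconv implementation.

```php
<?php
// Hypothetical helper: encode one character (given as UTF-8) in a target
// code page and return the resulting bytes as a hex string.
function encoded_hex(string $utf8Char, string $encoding): string
{
    return bin2hex(iconv('UTF-8', $encoding, $utf8Char));
}

// ASCII characters keep the same single byte in both code pages.
echo encoded_hex('A', 'CP1252'), "\n";      // 41
echo encoded_hex('A', 'GBK'), "\n";         // 41

// An accented letter is one byte in Windows-1252.
echo encoded_hex("\u{E9}", 'CP1252'), "\n"; // e9

// A Chinese character is two bytes in GBK (code page 936).
echo encoded_hex("\u{4E2D}", 'GBK'), "\n";  // d6d0
```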

UTF-8 is a variable-length character encoding for Unicode in which each character is represented by 1 to 4 bytes. The characters that take one byte are exactly the 7-bit ASCII characters. Accented Latin letters typically take 2 bytes, and most Chinese characters take 3 bytes.
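As a quick illustration of the variable length, the sketch below counts the UTF-8 bytes of four sample characters. It assumes PHP 7 or later, where the `\u{...}` escape produces the UTF-8 bytes of a code point; the sample characters and the helper name `utf8_byte_length` are chosen only for illustration.

```php
<?php
// Hypothetical helper: strlen() counts bytes, not characters, so on a
// UTF-8 string it reports the encoded length.
function utf8_byte_length(string $utf8Char): int
{
    return strlen($utf8Char);
}

$samples = [
    'A (U+0041)'     => "\u{41}",    // ASCII letter
    'e-acute (U+E9)' => "\u{E9}",    // accented Latin letter
    'zhong (U+4E2D)' => "\u{4E2D}",  // common Chinese character
    'emoji (U+1F600)' => "\u{1F600}", // outside the Basic Multilingual Plane
];

foreach ($samples as $label => $char) {
    printf("%s: %d byte(s)\n", $label, utf8_byte_length($char));
}
```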

So if a text contains only 7-bit ASCII characters, the ANSI encodings and UTF-8 produce identical bytes and there is nothing to distinguish. For all other characters the byte sequences differ. Windows-1252 cannot represent anything outside its 256 characters, such as Chinese characters, and other ANSI encodings, such as Windows-936, also cover only a subset of Unicode. This explains the garbled display: a program that decodes the bytes with a different encoding from the one used to write them shows the wrong characters, or cannot represent them at all, for everything outside 7-bit ASCII.
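The effect can be reproduced in a few lines. This is a minimal sketch, assuming the file was written in GBK and that the iconv implementation knows the names `GBK` and `CP1252`; the helper name `decode_as` is hypothetical.

```php
<?php
// Hypothetical helper: interpret raw bytes as the given encoding and
// return the text as UTF-8.
function decode_as(string $bytes, string $encoding)
{
    return iconv($encoding, 'UTF-8', $bytes);
}

// The two bytes that GBK uses for the Chinese character U+4E2D.
$gbkBytes = "\xD6\xD0";

// Decoded with the encoding it was written in, the character is recovered.
var_dump(decode_as($gbkBytes, 'GBK') === "\u{4E2D}");

// Decoded as Windows-1252, each byte becomes a separate accented letter
// (U+00D6 and U+00D0), which is the garbled text the reader sees.
var_dump(decode_as($gbkBytes, 'CP1252') === "\u{D6}\u{D0}");

// Pure ASCII text decodes identically under both encodings.
var_dump(decode_as('abc', 'GBK') === decode_as('abc', 'CP1252'));

// Going the other way, Windows-1252 has no byte for U+4E2D. How iconv
// reports this depends on the underlying library, so no result is assumed.
$unrepresentable = @iconv('UTF-8', 'CP1252', "\u{4E2D}");
var_dump($unrepresentable);
```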

Workaround:

To make the file open correctly on the Mac, open it with Notepad on Windows, choose "Save As", and select "UTF-8" in the "Encoding" drop-down. Save it and send it to the Mac, and it will display properly. In my case, however, the file is generated by a program whose default output encoding is ANSI, while the encoding we want is UTF-8. So after the program generates the file, it also converts the file's encoding to UTF-8, so that the file displays properly on every platform.

For the conversion code, I had earlier found online a utility class, written by an expert, that converts between many different encodings. My requirement, however, is only to convert ANSI to UTF-8, so I took just the part I needed and added it to my program.

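$str = iconv('GBK', 'UTF-8', $str); // source encoding is an assumption; use the file's actual ANSI code page (e.g. 'CP1252')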

The line above is PHP, and it is all I needed. The idea is to read the whole file into a string in one go, convert that string to UTF-8, and then write it back to the file.
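The read, convert, write idea can be sketched as a small function. This is not the author's actual code: the function name, its parameters, and the `GBK` default are assumptions, and the check that skips files which are already valid UTF-8 is an added heuristic to avoid converting the same file twice.

```php
<?php
// Hypothetical helper: convert a file in place from an ANSI code page to UTF-8.
// $fromEncoding must match the encoding the file was really written in.
function convert_file_to_utf8(string $path, string $fromEncoding = 'GBK'): bool
{
    // Read the whole file into a string in one go.
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        return false; // file could not be read
    }

    // Heuristic: an empty pattern with the /u flag only matches when the
    // subject is valid UTF-8 (pure ASCII qualifies too), so such files
    // are left untouched.
    if (preg_match('//u', $bytes) === 1) {
        return true;
    }

    // Convert the string to UTF-8.
    $utf8 = iconv($fromEncoding, 'UTF-8', $bytes);
    if ($utf8 === false) {
        return false; // bytes were not valid in $fromEncoding
    }

    // Write the converted text back to the same file.
    return file_put_contents($path, $utf8) !== false;
}
```

A call such as `convert_file_to_utf8('report.txt', 'CP1252')` would then be made right after the program generates the file; the file name here is only a placeholder.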

iconv functions:

The iconv extension belongs to PHP's internationalization and character encoding support. It provides the following functions:

iconv_get_encoding - retrieves the internal configuration variables of the iconv extension
iconv_mime_decode_headers - decodes multiple MIME header fields at once
iconv_mime_decode - decodes a MIME header field
iconv_mime_encode - composes a MIME header field
iconv_set_encoding - sets the current setting for character encoding conversion
iconv_strlen - returns the number of characters in a string
iconv_strpos - finds the position of the first occurrence of a needle within a haystack
iconv_strrpos - finds the last occurrence of a needle within a haystack
iconv_substr - cuts out part of a string
iconv - converts a string from one character encoding to another
ob_iconv_handler - converts character encoding as an output buffer handler
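A few of the string functions listed above differ from their byte-oriented counterparts in that they count characters in a given encoding. The following sketch assumes a UTF-8 string and uses a sample value chosen only for illustration.

```php
<?php
// Two Chinese characters followed by three ASCII letters, as UTF-8.
$text = "\u{4E2D}\u{6587}abc";

// strlen() counts bytes: 3 + 3 + 1 + 1 + 1.
$byteCount = strlen($text);

// iconv_strlen() counts characters in the stated encoding.
$charCount = iconv_strlen($text, 'UTF-8');

// iconv_strpos() returns a character offset, not a byte offset.
$asciiStart = iconv_strpos($text, 'abc', 0, 'UTF-8');

// iconv_substr() cuts by characters, so multi-byte characters stay intact.
$firstTwo = iconv_substr($text, 0, 2, 'UTF-8');

printf("bytes=%d chars=%d offset=%d\n", $byteCount, $charCount, $asciiStart);
```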
