Fixing garbled characters when reading files in PHP

Source: Internet
Author: User

Reading the file 3.txt with PHP produces garbled characters. D:/3.txt is a UTF-8 file. The code:

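    // Reproduces the problem: D:/3.txt is UTF-8 with a BOM.
    $f1 = fopen('d:/3.txt', 'r');
    $str = fread($f1, 10000);
    fclose($f1);
    // Bytes at offsets 1-3: two BOM bytes plus one content byte, so the output is garbled.
    echo substr($str, 1, 3);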

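To mark a text file as UTF-8, Microsoft tools prepend three bytes, the byte order mark (BOM): 239, 187, 191 (0xEF 0xBB 0xBF). The condition ord($charset[1]) == 239 && ord($charset[2]) == 187 && ord($charset[3]) == 191 comes from PHP code that removes the BOM; it checks whether the first three bytes of the file are that mark.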

A substring can therefore safely start at the fourth byte (offset 3 in substr). Starting at the first, second, or third byte may produce garbled characters. Note also that a Chinese character in UTF-8 may take three bytes. If the file contains Chinese characters, do not save it with ANSI encoding; otherwise garbled characters may occur when it is read.
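As a minimal sketch of the point above, the snippet below drops a leading UTF-8 BOM before any substring is taken. The function name strip_utf8_bom and the in-memory sample string are assumptions made for illustration; the sample stands in for the contents of D:/3.txt.

```php
<?php
// Hypothetical helper (not from the original article): drop a leading UTF-8 BOM.
function strip_utf8_bom(string $text): string
{
    // The BOM is the byte sequence 0xEF 0xBB 0xBF (239, 187, 191).
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        return substr($text, 3); // content starts at the fourth byte (offset 3)
    }
    return $text; // no BOM present, return the input unchanged
}

// Sample data standing in for the file contents: a BOM followed by "abc".
$raw = "\xEF\xBB\xBF" . "abc";
echo strip_utf8_bom($raw), "\n"; // prints "abc"
```

Checking for the BOM first, instead of always skipping three bytes, keeps the helper safe for UTF-8 files that were saved without one.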

ANSI encoding (from an encyclopedia entry)

Unicode and ANSI are both character encodings. To let computers support more languages, one character is usually represented by 2 bytes in the range 0x80 to 0xFF. For example, on a Chinese operating system the character "中" is stored as the bytes [0xD6, 0xD0]. Different countries and regions developed their own standards, such as GB2312, BIG5, and JIS. These schemes, which extend the character set by using two bytes for a single character, are collectively called ANSI encodings. On a Simplified Chinese system, ANSI encoding means GB2312; on a Japanese operating system, it means JIS.

Different ANSI encodings are incompatible with one another. When information is exchanged internationally, text in two languages cannot be stored in the same ANSI-encoded file.

An English letter or symbol is encoded as 1 byte whose highest bit is 0. A Chinese character takes 2 bytes, and the highest bit must be 1. It follows that an ANSI text file saved on a Chinese computer may run into code conflicts if it also contains Japanese or Korean; in other words, Notepad's ANSI encoding cannot store mixed Chinese and Japanese text. Notepad on such a computer follows the Chinese version of the system. To make the file usable everywhere, save the txt file as Unicode text.

So if you want to build something international, Unicode is the convenient choice. In fact, most current operating systems use Unicode internally. Text in ANSI encoding still has to be converted to Unicode for internal processing, which lowers efficiency. Unicode is simply easier to use!
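The high-bit rule described above can be sketched in a few lines. This is an illustration under a simplifying assumption (every byte with the high bit set starts a 2-byte character, as in GB2312-style text); the function name count_ansi_chars is hypothetical, and the sample reuses the two-byte example [0xD6, 0xD0] from the text.

```php
<?php
// Hypothetical helper: count characters in a GB2312-style byte string,
// assuming 1 byte when the high bit is 0 and 2 bytes when it is 1.
function count_ansi_chars(string $bytes): int
{
    $count = 0;
    $i = 0;
    $len = strlen($bytes);
    while ($i < $len) {
        // A set high bit (value >= 0x80) marks the lead byte of a 2-byte character.
        $i += (ord($bytes[$i]) & 0x80) ? 2 : 1;
        $count++;
    }
    return $count;
}

// "A", then the bytes 0xD6 0xD0, then "B".
$sample = "A" . "\xD6\xD0" . "B";
echo strlen($sample), " bytes, ", count_ansi_chars($sample), " characters\n";
// prints "4 bytes, 3 characters"
```

The same bytes would be split differently under another encoding's rules, which is why mixing ANSI encodings in one file leads to conflicts.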

Another case of garbled output in PHP:

    $content = file_get_contents("http://bbs.it-home.org/");
    $pattern = "//imsU"; // the expression between the delimiters is missing in the source
    $match = array();
    preg_match_all($pattern, $content, $match);
    print_r($match);

Garbled characters occur.

To fix this, add header("Content-type: text/html; charset=utf-8"); before any output is sent.

This sets the HTML encoding of the response to UTF-8.
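A minimal sketch of sending that header follows. The helper name content_type_header is hypothetical and exists only to keep the header string in one place. Because header() has no effect once output has started, the call is guarded with headers_sent().

```php
<?php
// Hypothetical helper: build the Content-Type header line for a given charset.
function content_type_header(string $charset = 'utf-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// header() only works before any output has been sent.
if (!headers_sent()) {
    header(content_type_header());
}

// The script file itself must also be saved as UTF-8 (without a BOM),
// otherwise the declared charset and the actual bytes disagree.
echo "中文\n";
```

The header only declares the encoding. It does not convert anything, so the content being printed has to be UTF-8 already.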

Garbled characters usually come down to three places:

  1. database encoding
  2. page encoding
  3. connection encoding

If all three are consistent, no garbled characters occur.
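As a hedged illustration of the connection encoding, the sketch below builds a PDO DSN that pins the connection charset. The article does not name a database, so MySQL, PDO, the host, the database name, and the helper mysql_dsn are all assumptions made for the example.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that pins the connection charset.
// MySQL, the host, and the database name are assumptions for illustration.
function mysql_dsn(string $host, string $db, string $charset = 'utf8mb4'): string
{
    return "mysql:host={$host};dbname={$db};charset={$charset}";
}

$dsn = mysql_dsn('localhost', 'example_db');
echo $dsn, "\n"; // prints "mysql:host=localhost;dbname=example_db;charset=utf8mb4"

// Connecting is left commented out because it needs a real server:
// $pdo = new PDO($dsn, 'user', 'password');
```

With this setup the three places would line up as follows: tables stored as utf8mb4 (database), a UTF-8 Content-Type header (page), and charset=utf8mb4 in the DSN (connection).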
