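Reading the file 3.txt with PHP produces garbled characters. D:/3.txt is a UTF-8 file. Code:

```php
<?php
$f1 = fopen('d:/3.txt', 'r');
$str = fread($f1, 10000);
fclose($f1);
// substr() counts bytes, not characters: starting at offset 1 cuts into the
// 3-byte BOM (and can split a 3-byte Chinese character), so the output is garbled.
echo substr($str, 1, 3);
```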
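The sketch below removes the three BOM bytes (0xEF 0xBB 0xBF) before extracting text. It assumes the mbstring extension is available. `strip_utf8_bom` is a hypothetical helper name, not a built-in PHP function. Unlike `substr()`, `mb_substr()` counts characters rather than bytes.

```php
<?php
// Hypothetical helper (not part of PHP): drop a leading UTF-8 BOM,
// i.e. the bytes 239, 187, 191 (0xEF 0xBB 0xBF), if present.
function strip_utf8_bom(string $bytes): string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return substr($bytes, 3);
    }
    return $bytes;
}

// Usage with the file from the text, only if it exists on this machine.
$path = 'd:/3.txt';
if (is_file($path)) {
    $str = strip_utf8_bom(file_get_contents($path));
    // mb_substr counts characters, so a 3-byte Chinese character is never split.
    echo mb_substr($str, 0, 3, 'UTF-8');
}
```
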
To mark a text file as UTF-8, Microsoft (for example, Windows Notepad) adds three bytes at the beginning, the byte order mark (BOM). Their values are 239, 187 and 191 (0xEF 0xBB 0xBF). PHP code that removes the BOM tests for them with `ord($charset[1]) == 239 && ord($charset[2]) == 187 && ord($charset[3]) == 191`. When the BOM is present, the actual text starts at the fourth byte. Extracting from the first, second, or third byte produces garbled characters. A Chinese character in UTF-8 is usually three bytes long, so byte-based extraction can also cut a character in half.

If the file contains Chinese characters, do not save it in ANSI encoding; otherwise garbled characters may appear when it is read.

ANSI encoding (from the encyclopedia): Unicode and ANSI are both forms of character encoding. To let computers support more languages, two bytes in the range 0x80 to 0xFF are usually used to represent one character. For example, on a Chinese operating system the character 中 is stored as the two bytes [0xD6, 0xD0]. Different countries and regions developed their own standards, such as GB2312, BIG5, and JIS. These extended encodings, which use two bytes to represent one character, are collectively called ANSI encoding. On a Simplified Chinese system, ANSI means GB2312; on a Japanese system, ANSI means JIS. Different ANSI encodings are incompatible with each other. As a result, when information is exchanged internationally, text in two languages cannot be stored in the same ANSI-encoded file.

In ANSI encoding, an English letter or symbol takes 1 byte with the highest bit set to 0. A Chinese character takes 2 bytes, and its highest bit must be 1. So if an ANSI text file on a Chinese computer contains Japanese or Korean, the codes may conflict. In other words, Notepad's ANSI encoding cannot store mixed Chinese and Japanese text. Notepad on a Chinese computer follows the Chinese version of the system. For text that must work everywhere, save the .txt file as a Unicode text file.
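To see the byte values described above, this sketch compares how 中 is stored in UTF-8 and in CP936. CP936 (GBK) is the ANSI code page of Simplified Chinese Windows; it extends GB2312 and stores 中 the same way. The sketch assumes the mbstring extension and a PHP source file saved as UTF-8.

```php
<?php
// The literal below is UTF-8 because this source file is assumed to be saved as UTF-8.
$utf8 = '中';
// Convert to CP936 (GBK), the Simplified Chinese "ANSI" code page.
$ansi = mb_convert_encoding($utf8, 'CP936', 'UTF-8');

echo bin2hex($utf8), PHP_EOL; // e4b8ad: 3 bytes in UTF-8
echo bin2hex($ansi), PHP_EOL; // d6d0: 2 bytes, both with the highest bit set

// An ASCII letter stays 1 byte with the highest bit 0.
echo bin2hex(mb_convert_encoding('A', 'CP936', 'UTF-8')), PHP_EOL; // 41
```

Reading the CP936 bytes as if they were UTF-8 (or the reverse) is exactly what produces garbled text.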
Therefore, for anything international, Unicode is the convenient choice. In fact, most current operating systems use Unicode internally. If we use ANSI encoding, the system still has to convert the text to Unicode during internal processing, which lowers efficiency. Using Unicode is simpler. Another case of garbled output in PHP:
```php
<?php
$content = file_get_contents("http://bbs.it-home.org/");
$pattern = "//imsU";
$match = array();
preg_match_all($pattern, $content, $match);
print_r($match);
```
The output is garbled. The fix is to add `header("Content-Type: text/html; charset=utf-8");` before any output, which sets the HTML page encoding to UTF-8.

Garbled text is resolved in three places:
1. database encoding
2. page encoding
3. connection encoding

If all three are consistent, no garbled text will occur.
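A minimal sketch of making the data encoding agree with the declared page encoding follows. It assumes the incoming text is either already UTF-8 or GBK (CP936). `to_utf8` is a hypothetical helper name, and the mbstring extension is assumed.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// Assumption: anything that is not valid UTF-8 is GBK (CP936).
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8, leave it alone
    }
    return mb_convert_encoding($text, 'UTF-8', 'CP936');
}

// Page encoding: the header must be sent before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// "\xD6\xD0" is 中 in GBK; it is printed as UTF-8.
echo to_utf8("\xD6\xD0"), PHP_EOL; // prints 中
```

For the connection encoding with MySQL, one option is `$mysqli->set_charset('utf8mb4')`. With PDO, another is adding `charset=utf8mb4` to the MySQL DSN, so the database, connection, and page all use UTF-8.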