This article introduces a PHP character encoding conversion class that supports conversion between ANSI, Unicode, Unicode big endian, UTF-8, and UTF-8 + BOM.
Four common text file encoding methods
ANSI encoding (the system code page; the class below treats it as GBK):
No file header (the marker bytes at the start of a file that identify its encoding).
Letters and digits take one byte each, and Chinese characters take two bytes each.
Carriage return and line feed take one byte each: 0D 0A in hexadecimal.
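The byte counts above can be checked with a short sketch. It assumes that "ANSI" means GBK and that the iconv extension is available; the sample string is illustrative only.

```php
<?php
// Assumption: "ANSI" is taken to mean GBK. Requires the iconv extension.
// Sample text: "A", "1", the Chinese character U+4E2D, CR, LF (written as UTF-8 bytes).
$utf8 = "A1\xE4\xB8\xAD\r\n";
$ansi = iconv('UTF-8', 'GBK', $utf8);

echo strlen($utf8), "\n";   // 7 bytes in UTF-8: 1 + 1 + 3 + 1 + 1
echo strlen($ansi), "\n";   // 6 bytes in GBK:   1 + 1 + 2 + 1 + 1
echo bin2hex($ansi), "\n";  // 4131d6d00d0a
```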
Unicode encoding (UTF-16 little endian):
File header, in hexadecimal: FF FE.
Each character is encoded in two bytes, low byte first (characters outside the Basic Multilingual Plane take four bytes).
Carriage return and line feed take two bytes each: 0D 00 0A 00 in hexadecimal.
Unicode big endian encoding (UTF-16 big endian):
File header, in hexadecimal: FE FF.
Each character's high byte comes first and its low byte second, the reverse of the Unicode (little endian) encoding.
Carriage return and line feed take two bytes each: 00 0D 00 0A in hexadecimal.
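To make the two byte orders concrete, the following sketch encodes the same characters both ways and prints the resulting bytes. It assumes the iconv extension. When given the explicit names UTF-16LE and UTF-16BE, iconv does not emit a file header.

```php
<?php
// Requires the iconv extension.
$crlf = "\r\n";
$han  = "\xE4\xB8\xAD"; // the character U+4E2D, written as UTF-8 bytes

$le = iconv('UTF-8', 'UTF-16LE', $crlf);
$be = iconv('UTF-8', 'UTF-16BE', $crlf);

echo bin2hex($le), "\n"; // 0d000a00 (low byte first)
echo bin2hex($be), "\n"; // 000d000a (high byte first)

echo bin2hex(iconv('UTF-8', 'UTF-16LE', $han)), "\n"; // 2d4e
echo bin2hex(iconv('UTF-8', 'UTF-16BE', $han)), "\n"; // 4e2d
```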
UTF-8 encoding:
File header, in hexadecimal: EF BB BF (present only in the UTF-8 + BOM variant; plain UTF-8 has none).
UTF-8 is a variable-length encoding of Unicode: digits, letters, carriage return, and line feed take one byte each, and common Chinese characters take three bytes each.
Carriage return and line feed take one byte each: 0D 0A in hexadecimal.
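The file headers listed above are enough to recognize three of the encodings. The sketch below shows one way to check them. The function name detectBom is hypothetical and not part of the original class, and the returned labels simply mirror the encoding names used in this article.

```php
<?php
/**
 * Hypothetical helper (not from the original class): guess an encoding
 * label from the file header. Returns null when no header is present.
 */
function detectBom(string $data): ?string
{
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        return 'utf-8bom';
    }
    if (strncmp($data, "\xFF\xFE", 2) === 0) {
        return 'unicode';    // UTF-16 little endian
    }
    if (strncmp($data, "\xFE\xFF", 2) === 0) {
        return 'unicodebe';  // UTF-16 big endian
    }
    // ANSI and UTF-8 without a header cannot be told apart this way.
    return null;
}

var_dump(detectBom("\xFE\xFF\x00\x41")); // string(9) "unicodebe"
var_dump(detectBom("plain text"));       // NULL
```

Files without a header still need their encoding supplied by the caller, which is why the conversion class takes the source encoding as an explicit argument.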
Conversion principle: first convert the source encoding to UTF-8, then convert from UTF-8 to the target encoding.
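Using UTF-8 as the intermediate form means each encoding needs only a "to UTF-8" and a "from UTF-8" routine, not a separate converter for every pair. A minimal sketch of the idea, assuming the iconv extension is available; convertViaUtf8 is a hypothetical helper that takes iconv encoding names directly and does not handle file headers:

```php
<?php
/**
 * Hypothetical helper, not part of the original class: convert between
 * two iconv encoding names by way of UTF-8. File headers are not handled.
 */
function convertViaUtf8(string $str, string $from, string $to): string
{
    // Step 1: source encoding -> UTF-8
    $utf8 = iconv($from, 'UTF-8//IGNORE', $str);
    if ($utf8 === false) {
        throw new RuntimeException("Cannot decode input as $from");
    }

    // Step 2: UTF-8 -> target encoding
    $out = iconv('UTF-8', $to . '//IGNORE', $utf8);
    if ($out === false) {
        throw new RuntimeException("Cannot encode output as $to");
    }
    return $out;
}

// GBK bytes D6 D0 (the character U+4E2D) become UTF-16BE bytes 4E 2D.
echo bin2hex(convertViaUtf8("\xD6\xD0", 'GBK', 'UTF-16BE')), "\n"; // 4e2d
```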
CharsetConv.class.php
<?php
/**
 * Character encoding conversion class.
 * Note: the text before the first in_array() check was lost when this page
 * was extracted. The properties and constructor signature below are
 * reconstructed from the surviving code and may differ from the original.
 */
class CharsetConv
{
    private $_in_charset = null;   // source encoding
    private $_out_charset = null;  // output encoding
    private $_allow_charset = array('utf-8', 'utf-8bom', 'ansi', 'unicode', 'unicodebe');

    /** initialize (reconstructed)
     * @param String $in_charset  source encoding
     * @param String $out_charset output encoding
     */
    public function __construct($in_charset, $out_charset)
    {
        // check the source encoding
        if (in_array($in_charset, $this->_allow_charset)) {
            $this->_in_charset = $in_charset;
        }

        // check the output encoding
        if (in_array($out_charset, $this->_allow_charset)) {
            $this->_out_charset = $out_charset;
        }
    }

    /** convert
     * @param String $str the string to be converted
     * @return String the converted string
     */
    public function convert($str)
    {
        $str = $this->convToUtf8($str);   // convert to UTF-8
        $str = $this->convFromUtf8($str); // convert from UTF-8 to the output encoding
        return $str;
    }

    /** convert the source encoding to UTF-8
     * @param String $str
     * @return String
     */
    private function convToUtf8($str)
    {
        if ($this->_in_charset === 'utf-8') { // already UTF-8, no conversion needed
            return $str;
        }

        switch ($this->_in_charset) {
            case 'utf-8bom':
                $str = substr($str, 3); // strip the 3-byte file header
                break;
            case 'ansi':
                $str = iconv('gbk', 'utf-8//IGNORE', $str);
                break;
            case 'unicode':
                $str = iconv('utf-16le', 'utf-8//IGNORE', substr($str, 2)); // skip the 2-byte file header
                break;
            case 'unicodebe':
                $str = iconv('utf-16be', 'utf-8//IGNORE', substr($str, 2)); // skip the 2-byte file header
                break;
            default:
                break;
        }

        return $str;
    }

    /** convert UTF-8 to the output encoding
     * @param String $str
     * @return String
     */
    private function convFromUtf8($str)
    {
        if ($this->_out_charset === 'utf-8') { // output is UTF-8, no conversion needed
            return $str;
        }

        switch ($this->_out_charset) {
            case 'utf-8bom':
                $str = "\xef\xbb\xbf" . $str;
                break;
            case 'ansi':
                $str = iconv('utf-8', 'gbk//IGNORE', $str);
                break;
            case 'unicode':
                $str = "\xff\xfe" . iconv('utf-8', 'utf-16le//IGNORE', $str);
                break;
            case 'unicodebe':
                $str = "\xfe\xff" . iconv('utf-8', 'utf-16be//IGNORE', $str);
                break;
            default:
                break;
        }

        return $str;
    }
} // class end
?>
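Demo: convert Unicode big endian to UTF-8 + BOM
<?php
require 'CharsetConv.class.php';

// Reconstructed: only the convert() call and the output line survived extraction.
// The input path and constructor arguments are assumptions based on the demo title.
$str = file_get_contents('source/unicodebe.txt');
$obj = new CharsetConv('unicodebe', 'utf-8bom');
$response = $obj->convert($str);
file_put_contents('response/utf-8bom.txt', $response);
?>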