======= First, an introduction to the BOM (byte order mark) =======
Bytes          Encoding Form
EF BB BF       UTF-8
FF FE          UTF-16 (or legacy UCS-2), little-endian
FE FF          UTF-16 (or legacy UCS-2), big-endian
FF FE 00 00    UTF-32 (UCS-4), little-endian
00 00 FE FF    UTF-32 (UCS-4), big-endian
=======================
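As a quick check of the table, here is a minimal sketch that serializes the code point U+FEFF by hand in each encoding form. It uses only the core pack() and bin2hex() functions; the $boms array name is just an illustration.

```php
<?php
// U+FEFF serialized by hand in each encoding form (no extensions required).
$boms = [
    'UTF-8'    => "\xEF\xBB\xBF",     // three-byte UTF-8 sequence for U+FEFF
    'UTF-16LE' => pack('v', 0xFEFF),  // 16-bit unit, little-endian
    'UTF-16BE' => pack('n', 0xFEFF),  // 16-bit unit, big-endian
    'UTF-32LE' => pack('V', 0xFEFF),  // 32-bit unit, little-endian
    'UTF-32BE' => pack('N', 0xFEFF),  // 32-bit unit, big-endian
];

// Print each BOM as upper-case hex so it can be compared with the table.
foreach ($boms as $name => $bytes) {
    printf("%-8s %s\n", $name, strtoupper(bin2hex($bytes)));
}
```

Note that the UTF-32 little-endian BOM begins with the same two bytes as the UTF-16 little-endian one, so any detection code should test the four-byte marks first.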
Reading a Unicode CSV file
function fopen_utf8($filename) {
    $encoding = '';
    $handle = fopen($filename, 'r');
    if ($handle === false) {
        return false;
    }
    $bom = fread($handle, 2);
    // Do not fclose() here: the handle is still needed below.
    rewind($handle);
    if ($bom === chr(0xFF) . chr(0xFE) || $bom === chr(0xFE) . chr(0xFF)) {
        // UTF-16 byte order mark present
        $encoding = 'UTF-16';
    } else {
        // Read the first 1000 bytes; appending 'e' is a workaround for an mb_string bug
        $file_sample = fread($handle, 1000) . 'e';
        rewind($handle);
        $encoding = mb_detect_encoding($file_sample, 'UTF-8, UTF-7, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, JIS, ISO-2022-JP');
    }
    if ($encoding) {
        // Convert to UTF-8 transparently while reading
        stream_filter_append($handle, 'convert.iconv.' . $encoding . '/UTF-8');
    }
    return $handle;
}
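To show how a handle with a conversion filter is then consumed, here is a minimal, self-contained sketch. The helper name read_utf16_csv is hypothetical, and it assumes the input is UTF-16 with a BOM and that the iconv extension is available. Whether iconv swallows the BOM or passes it through depends on the iconv build, so the sketch strips a leftover BOM defensively.

```php
<?php
// Hypothetical helper: read every row of a UTF-16 CSV (with BOM) as UTF-8 arrays.
function read_utf16_csv(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Decode on the fly; plain "UTF-16" lets iconv pick the byte order from the BOM.
    stream_filter_append($handle, 'convert.iconv.UTF-16/UTF-8', STREAM_FILTER_READ);

    $rows = [];
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($handle);

    // Some iconv builds pass the BOM through as U+FEFF; drop it if present.
    if (isset($rows[0][0]) && strncmp($rows[0][0], "\xEF\xBB\xBF", 3) === 0) {
        $rows[0][0] = substr($rows[0][0], 3);
    }
    return $rows;
}
```

A call such as read_utf16_csv('test.csv') would then return one array per CSV line, with every field already in UTF-8.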
Generate Unicode CSV (this PHP file must be a UTF-8 encoded file without a BOM)
$content = iconv("UTF-8", "UTF-16LE", $content);
$content = "\xFF\xFE" . $content; // Add BOM
header("Content-Type: text/csv; charset=UTF-16LE");
header("Content-Disposition: attachment; filename=test.csv");
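The snippet above assumes $content already holds the CSV text. As a minimal sketch of how that text might be produced, the hypothetical helper build_utf16le_csv below writes rows with fputcsv() into an in-memory stream, then converts the result and prepends the BOM. It assumes the rows are valid UTF-8 and that the iconv extension is available.

```php
<?php
// Hypothetical helper: turn rows of UTF-8 strings into UTF-16LE CSV bytes with a BOM.
function build_utf16le_csv(array $rows): string
{
    $buffer = fopen('php://temp', 'r+');
    foreach ($rows as $row) {
        // fputcsv() takes care of quoting fields that contain commas or quotes.
        fputcsv($buffer, $row, ',', '"', '\\');
    }
    rewind($buffer);
    $csv = stream_get_contents($buffer);
    fclose($buffer);

    $utf16 = iconv('UTF-8', 'UTF-16LE', $csv);
    if ($utf16 === false) {
        throw new RuntimeException('Input rows are not valid UTF-8');
    }
    return "\xFF\xFE" . $utf16; // prepend the UTF-16LE BOM
}
```

The returned string can be echoed directly after the two header() calls.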
Next, a helper for working with ANSI-encoded, comma-separated (",") files:
<?php
// The Unicode BOM is U+FEFF, but once encoded it looks like this.
define('UTF32_BIG_ENDIAN_BOM', chr(0x00) . chr(0x00) . chr(0xFE) . chr(0xFF));
define('UTF32_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE) . chr(0x00) . chr(0x00));
define('UTF16_BIG_ENDIAN_BOM', chr(0xFE) . chr(0xFF));
define('UTF16_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE));
define('UTF8_BOM', chr(0xEF) . chr(0xBB) . chr(0xBF));

function detect_utf_encoding($filename) {
    // Only the first four bytes are needed to recognize any BOM.
    $text = file_get_contents($filename, false, null, 0, 4);
    $first2 = substr($text, 0, 2);
    $first3 = substr($text, 0, 3);
    $first4 = substr($text, 0, 4);

    // Test UTF-32 before UTF-16: the UTF-32LE BOM starts with the UTF-16LE BOM.
    if ($first3 === UTF8_BOM) return 'UTF-8';
    elseif ($first4 === UTF32_BIG_ENDIAN_BOM) return 'UTF-32BE';
    elseif ($first4 === UTF32_LITTLE_ENDIAN_BOM) return 'UTF-32LE';
    elseif ($first2 === UTF16_BIG_ENDIAN_BOM) return 'UTF-16BE';
    elseif ($first2 === UTF16_LITTLE_ENDIAN_BOM) return 'UTF-16LE';
    return null; // No BOM found
}
?>
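Once the encoding is known, the usual next step is to drop the BOM and convert the text to UTF-8. The following is a minimal sketch of that step working on a string rather than a file. The helper name bom_text_to_utf8 is hypothetical, the iconv extension is assumed, and input without a BOM is simply assumed to be UTF-8 already.

```php
<?php
// Hypothetical helper: strip a leading BOM and return the text as UTF-8.
function bom_text_to_utf8(string $bytes): string
{
    // Four-byte BOMs come first because the UTF-32LE BOM starts with the UTF-16LE one.
    $boms = [
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];
    foreach ($boms as $encoding => $bom) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            $body = substr($bytes, strlen($bom));
            if ($encoding === 'UTF-8') {
                return $body; // already UTF-8, only the BOM had to go
            }
            $converted = iconv($encoding, 'UTF-8', $body);
            if ($converted === false) {
                throw new RuntimeException("Invalid $encoding input");
            }
            return $converted;
        }
    }
    return $bytes; // No BOM: assumed (not verified) to be UTF-8 already.
}
```

One known limitation of this approach: a UTF-16LE text whose first character after the BOM is U+0000 would be mistaken for UTF-32LE, which is rare in practice.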