In PHP, you can use the mb_detect_encoding() function to check whether a string is in a particular encoding. Usage and caveats are described below.
Note that mb_detect_encoding() requires the mbstring extension to be enabled in PHP (enable it in php.ini, then restart the web server).
The check looks like this:
if (mb_detect_encoding($str, 'UTF-8', true))
{
    // the string is valid UTF-8
}
Some people online say this function is not very accurate; in practice it generally works without problems.
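When the only question is whether a string is valid UTF-8, rather than which of several encodings it uses, mbstring's mb_check_encoding() may be a more direct fit. The following is a minimal sketch, assuming the mbstring extension is loaded; looks_like_utf8 is a hypothetical helper name used only for illustration.

```php
<?php
// Hypothetical helper (not from the original article); requires mbstring.
function looks_like_utf8(string $str): bool
{
    // mb_check_encoding() reports whether the bytes are valid in the given encoding.
    return mb_check_encoding($str, 'UTF-8');
}

$utf8 = "\xE4\xB8\xAD\xE6\x96\x87"; // the two characters "中文" encoded as UTF-8
$gbk  = "\xD6\xD0\xCE\xC4";         // the same two characters encoded as GBK

var_dump(looks_like_utf8($utf8)); // bool(true)
var_dump(looks_like_utf8($gbk));  // bool(false)
```

A passing check only shows that the bytes are well-formed UTF-8; it does not prove the text was originally written in that encoding.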
Example: using mb_detect_encoding() to determine whether a string is UTF-8 encoded, and converting it to UTF-8 if it is not.
$encode = mb_detect_encoding($q, array('GB2312', 'GBK', 'UTF-8'));
echo $encode . "<br/>";
if ($encode == "GB2312")
{
    $q = iconv("GBK", "UTF-8", $q);
}
else if ($encode == "GBK")
{
    $q = iconv("GBK", "UTF-8", $q);
}
else if ($encode == "EUC-CN")
{
    $q = iconv("GBK", "UTF-8", $q);
}
else if ($encode == "CP936")
{
    $q = iconv("GBK", "UTF-8", $q);
}
// Any other result (including UTF-8) leaves $q unchanged.
However, mb_detect_encoding() has a weakness: it often guesses wrong. The following approach may help:
Convert the string with iconv and check whether the result is identical to the input (this works, but it is inefficient).
function is_utf8($str) {
    // If converting UTF-8 to UTF-8 changes nothing, the input was valid UTF-8
    if ($str === iconv('UTF-8', 'UTF-8//IGNORE', $str)) {
        return 'UTF-8';
    }
}
Checking several candidate encodings:
function detect_encoding($str) {
    foreach (array('GBK', 'UTF-8') as $v) {
        // The first encoding that round-trips unchanged is returned
        if ($str === iconv($v, $v . '//IGNORE', $str)) {
            return $v;
        }
    }
}
After obtaining the string's encoding in one of the ways above, we can use iconv or mb_convert_encoding to convert it to the desired encoding.
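The two conversion functions take their arguments in a different order, which is easy to mix up. The sketch below converts a GBK string to UTF-8 both ways; it assumes the mbstring and iconv extensions are loaded, and the sample bytes are illustrative only.

```php
<?php
// Sample input (an assumption for illustration): "中文" encoded as GBK.
$gbk = "\xD6\xD0\xCE\xC4";

// mb_convert_encoding(string, to_encoding, from_encoding)
$viaMb = mb_convert_encoding($gbk, 'UTF-8', 'GBK');

// iconv(from_encoding, to_encoding, string)
$viaIconv = iconv('GBK', 'UTF-8', $gbk);

echo bin2hex($viaMb), "\n";        // e4b8ade69687
var_dump($viaMb === $viaIconv);
```

Note that iconv() returns false on failure, so its result is worth checking before use.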
Example
<?php
/**
 * Detect the encoding of a file
 * @param string $file file path
 * @return string|null the encoding name, or null if none matched
 */
function detect_encoding($file) {
    $list = array('GBK', 'UTF-8', 'UTF-16LE', 'UTF-16BE', 'ISO-8859-1');
    $str = file_get_contents($file);
    foreach ($list as $item) {
        // If converting $item to $item leaves the bytes unchanged, they are valid in $item
        $tmp = mb_convert_encoding($str, $item, $item);
        if (md5($tmp) === md5($str)) {
            return $item;
        }
    }
    return null;
}
/**
 * Read a file, detecting its encoding automatically
 * @param string $file file path
 * @param string $charset encoding to return the content in
 * @return string the file content, or "" if no encoding matched
 */
function auto_read($file, $charset = 'UTF-8') {
    $list = array('GBK', 'UTF-8', 'UTF-16LE', 'UTF-16BE', 'ISO-8859-1');
    $str = file_get_contents($file);
    foreach ($list as $item) {
        // Same round-trip test as in detect_encoding()
        $tmp = mb_convert_encoding($str, $item, $item);
        if (md5($tmp) === md5($str)) {
            return mb_convert_encoding($str, $charset, $item);
        }
    }
    return "";
}
Example
I created three files: text1.txt, text2.txt and text3.txt,
saved in ASCII, UTF-8 and UNICODE (UTF-16) encoding, respectively.
The code is as follows:
<?php
// Byte order marks (BOMs) for the Unicode encodings
define('UTF32_BIG_ENDIAN_BOM', chr(0x00) . chr(0x00) . chr(0xFE) . chr(0xFF));
define('UTF32_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE) . chr(0x00) . chr(0x00));
define('UTF16_BIG_ENDIAN_BOM', chr(0xFE) . chr(0xFF));
define('UTF16_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE));
define('UTF8_BOM', chr(0xEF) . chr(0xBB) . chr(0xBF));
function detect_utf_encoding($text) {
    $first2 = substr($text, 0, 2);
    $first3 = substr($text, 0, 3);
    $first4 = substr($text, 0, 4);
    // UTF-32 is checked before UTF-16 because the UTF-32LE BOM starts with the UTF-16LE BOM
    if ($first3 === UTF8_BOM) return 'UTF-8';
    elseif ($first4 === UTF32_BIG_ENDIAN_BOM) return 'UTF-32BE';
    elseif ($first4 === UTF32_LITTLE_ENDIAN_BOM) return 'UTF-32LE';
    elseif ($first2 === UTF16_BIG_ENDIAN_BOM) return 'UTF-16BE';
    elseif ($first2 === UTF16_LITTLE_ENDIAN_BOM) return 'UTF-16LE';
}
function getFileEncoding($str) {
    $encoding = mb_detect_encoding($str);
    // Fall back to BOM detection when mbstring cannot decide
    if (empty($encoding)) {
        $encoding = detect_utf_encoding($str);
    }
    return $encoding;
}
$file = 'text1.txt';
echo getFileEncoding(file_get_contents($file)); // Output: ASCII
echo '<br/>';
$file = 'text2.txt';
echo getFileEncoding(file_get_contents($file)); // Output: UTF-8
echo '<br/>';
$file = 'text3.txt';
echo getFileEncoding(file_get_contents($file)); // Output: UTF-16LE
echo '<br/>';
?>
Note: remove the semicolon in front of extension=php_mbstring.dll in php.ini, then restart Apache.