Using the mb_detect_encoding function in PHP to detect whether a string is UTF-8 encoded

Source: Internet
Author: User


In PHP, you can use the mb_detect_encoding() function to determine whether a string is in a particular encoding. Its usage and some caveats are described below.

Note that mb_detect_encoding requires the mbstring extension to be enabled in PHP (enable it in php.ini, then restart the web server).
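Before relying on any mb_* function, it may help to confirm that the extension is actually loaded. The following is a minimal sketch; the helper name mbstring_available is hypothetical and not part of PHP or of the original article.

```php
<?php
// Hypothetical helper: reports whether the mbstring extension is usable.
function mbstring_available(): bool
{
    // extension_loaded() checks the loaded extension list;
    // function_exists() confirms the specific function is callable.
    return extension_loaded('mbstring') && function_exists('mb_detect_encoding');
}

if (!mbstring_available()) {
    echo "mbstring is not loaded; enable it in php.ini and restart the server.\n";
}
```

If the message is printed, the mb_detect_encoding calls in the rest of this article will fail with an undefined function error.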

The check looks like this:

if (mb_detect_encoding($str, 'UTF-8', true))
{
    // $str is a UTF-8 encoded string
}
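To see what the strict check shown above accepts and rejects, here is a small sketch with one valid and one invalid byte sequence. The helper name is_utf8_strict is hypothetical. mb_check_encoding is a different mbstring function, shown here only as an alternative that answers the same yes/no question directly; it is not the method the article uses.

```php
<?php
// Hypothetical helper wrapping the strict check.
function is_utf8_strict(string $str): bool
{
    // With the third argument set to true, mb_detect_encoding returns
    // false when $str is not valid in any of the listed encodings.
    return mb_detect_encoding($str, 'UTF-8', true) === 'UTF-8';
}

$valid   = "caf\xC3\xA9"; // "café" encoded as UTF-8
$invalid = "caf\xE9";     // "café" encoded as ISO-8859-1; a lone 0xE9 is not valid UTF-8

var_dump(is_utf8_strict($valid));               // bool(true)
var_dump(is_utf8_strict($invalid));             // bool(false)
var_dump(mb_check_encoding($invalid, 'UTF-8')); // bool(false)
```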

Some people online say this function is not very accurate, but in general it works without problems.

Example: using mb_detect_encoding() to determine whether a string is UTF-8 encoded.

$encode = mb_detect_encoding($q, array('GB2312', 'GBK', 'UTF-8'));
echo $encode . "<br/>";
if ($encode == "GB2312")
{
    $q = iconv("GBK", "UTF-8", $q);
}
else if ($encode == "GBK")
{
    $q = iconv("GBK", "UTF-8", $q);
}
else if ($encode == "EUC-CN")
{
    $q = iconv("GBK", "UTF-8", $q);
}
else if ($encode != "UTF-8") // CP936; a string already detected as UTF-8 is left unchanged
{
    $q = iconv("GB2312", "UTF-8", $q);
}
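The branches above all end in a conversion to UTF-8, so the same idea can be sketched more compactly. This is an illustration under assumptions, not the article's method: the helper name to_utf8 is hypothetical, the fallback list uses CP936 (the name under which mbstring lists GBK), and valid UTF-8 input is checked first and returned unchanged, because UTF-8 bytes for Chinese text often also form valid GBK byte pairs.

```php
<?php
// Hypothetical helper: returns $q as UTF-8, or null if no fallback encoding fits.
function to_utf8(string $q, array $fallbacks = ['CP936']): ?string
{
    // Input that is already valid UTF-8 is returned as-is.
    if (mb_check_encoding($q, 'UTF-8')) {
        return $q;
    }
    // Strict detection among the fallback encodings only.
    $encode = mb_detect_encoding($q, $fallbacks, true);
    if ($encode === false) {
        return null;
    }
    return mb_convert_encoding($q, 'UTF-8', $encode);
}

$gbk  = "\xD6\xD0\xCE\xC4";         // "中文" in GBK
$utf8 = "\xE4\xB8\xAD\xE6\x96\x87"; // "中文" in UTF-8

var_dump(to_utf8($gbk) === $utf8);  // bool(true)
var_dump(to_utf8($utf8) === $utf8); // bool(true)
```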

However, mb_detect_encoding has a weakness: its detection is often inaccurate. The following approach may work around this:


// Convert with iconv and check whether the result equals the input; this is not very efficient.
function is_utf8($str) {
    if ($str === iconv('UTF-8', 'UTF-8//IGNORE', $str)) {
        return 'UTF-8';
    }
}
// Handling several candidate encodings
function detect_encoding($str) {
    foreach (array('GBK', 'UTF-8') as $v) {
        if ($str === iconv($v, $v . '//IGNORE', $str)) {
            return $v;
        }
    }
}

After obtaining the string's encoding in this way, you can use iconv or mb_convert_encoding to convert it to the desired encoding.
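The two conversion functions take their arguments in a different order, which is easy to mix up: iconv takes (from, to, string), while mb_convert_encoding takes (string, to, from). A minimal sketch of both, assuming the iconv extension knows the name GBK and using CP936 as the mbstring name for the same encoding:

```php
<?php
$gbk = "\xD6\xD0\xCE\xC4"; // "中文" in GBK

// iconv: source encoding first, then target encoding, then the string.
$viaIconv = iconv('GBK', 'UTF-8', $gbk);

// mb_convert_encoding: the string first, then target encoding, then source encoding.
$viaMb = mb_convert_encoding($gbk, 'UTF-8', 'CP936');

var_dump($viaIconv === $viaMb); // bool(true)
```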

Example

<?php
/**
 * Detect the encoding of a file
 * @param string $file file path
 * @return string|null the encoding name, or null if none matches
 */
function detect_encoding($file) {
    $list = array('GBK', 'UTF-8', 'UTF-16LE', 'UTF-16BE', 'ISO-8859-1');
    $str = file_get_contents($file);
    foreach ($list as $item) {
        // Converting from an encoding to itself leaves valid input unchanged
        $tmp = mb_convert_encoding($str, $item, $item);
        if (md5($tmp) === md5($str)) {
            return $item;
        }
    }
    return null;
}
/**
 * Read a file, detecting its encoding automatically
 * @param string $file file path
 * @param string $charset encoding to return the content in
 * @return string the file content converted to $charset
 */
function auto_read($file, $charset = 'UTF-8') {
    $list = array('GBK', 'UTF-8', 'UTF-16LE', 'UTF-16BE', 'ISO-8859-1');
    $str = file_get_contents($file);
    foreach ($list as $item) {
        $tmp = mb_convert_encoding($str, $item, $item);
        if (md5($tmp) === md5($str)) {
            return mb_convert_encoding($str, $charset, $item);
        }
    }
    return "";
}
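The functions above rely on a round trip: converting a string from an encoding to that same encoding leaves valid input unchanged, while invalid bytes are replaced by a substitute character, so the result no longer matches. The short sketch below illustrates this with sample bytes chosen for the purpose. It also shows why ISO-8859-1 sits last in the list: every byte is valid in it, so it always round-trips and acts as a catch-all.

```php
<?php
$valid   = "caf\xC3\xA9"; // valid UTF-8
$invalid = "caf\xE9";     // not valid UTF-8 (this is ISO-8859-1)

// Valid input survives the UTF-8 to UTF-8 round trip unchanged.
var_dump(mb_convert_encoding($valid, 'UTF-8', 'UTF-8') === $valid);     // bool(true)

// The invalid byte is substituted, so the round trip changes the string.
var_dump(mb_convert_encoding($invalid, 'UTF-8', 'UTF-8') === $invalid); // bool(false)

// Any byte string round-trips through ISO-8859-1.
var_dump(mb_convert_encoding($invalid, 'ISO-8859-1', 'ISO-8859-1') === $invalid); // bool(true)
```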

Example

I created three files, text1.txt, text2.txt, and text3.txt, saved in ASCII, UTF-8, and Unicode (UTF-16LE) encoding, respectively.

The code is as follows:

<?php
// Byte order marks (BOMs) that identify each Unicode encoding
define('UTF32_BIG_ENDIAN_BOM', chr(0x00) . chr(0x00) . chr(0xFE) . chr(0xFF));
define('UTF32_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE) . chr(0x00) . chr(0x00));
define('UTF16_BIG_ENDIAN_BOM', chr(0xFE) . chr(0xFF));
define('UTF16_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE));
define('UTF8_BOM', chr(0xEF) . chr(0xBB) . chr(0xBF));

function detect_utf_encoding($text) {
    $first2 = substr($text, 0, 2);
    $first3 = substr($text, 0, 3);
    $first4 = substr($text, 0, 4);

    // UTF-32 is checked before UTF-16 because their little-endian BOMs share the first two bytes
    if ($first3 == UTF8_BOM) return 'UTF-8';
    elseif ($first4 == UTF32_BIG_ENDIAN_BOM) return 'UTF-32BE';
    elseif ($first4 == UTF32_LITTLE_ENDIAN_BOM) return 'UTF-32LE';
    elseif ($first2 == UTF16_BIG_ENDIAN_BOM) return 'UTF-16BE';
    elseif ($first2 == UTF16_LITTLE_ENDIAN_BOM) return 'UTF-16LE';
}
function getfileencoding($str) {
    $encoding = mb_detect_encoding($str);
    // Fall back to BOM sniffing when mb_detect_encoding finds nothing
    if (empty($encoding)) {
        $encoding = detect_utf_encoding($str);
    }
    return $encoding;
}
$file = 'text1.txt';
echo getfileencoding(file_get_contents($file)); // Output: ASCII
echo '<br/>';

$file = 'text2.txt';
echo getfileencoding(file_get_contents($file)); // Output: UTF-8
echo '<br/>';
$file = 'text3.txt';
echo getfileencoding(file_get_contents($file)); // Output: UTF-16LE
echo '<br/>';
?>
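The order of the BOM comparisons matters, because the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE). A table-driven sketch of the same idea makes that ordering explicit. The function name bom_encoding is hypothetical, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the encoding implied by a leading BOM, or null.
function bom_encoding(string $text): ?string
{
    // Longer BOMs come first so that UTF-32LE is not misreported as UTF-16LE.
    $boms = [
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    ];
    foreach ($boms as $bom => $name) {
        // strncmp is binary safe, so NUL bytes inside a BOM compare correctly.
        if (strncmp($text, $bom, strlen($bom)) === 0) {
            return $name;
        }
    }
    return null;
}

var_dump(bom_encoding("\xFF\xFE\x00\x00a\x00\x00\x00")); // string(8) "UTF-32LE"
var_dump(bom_encoding("\xFF\xFEa\x00"));                 // string(8) "UTF-16LE"
var_dump(bom_encoding("no bom here"));                   // NULL
```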

Note: remove the semicolon (;) in front of extension=php_mbstring.dll in php.ini, then restart Apache.
