Minor issues with PHP Chinese word segmentation source code
This post was last edited by zhuzhaodan on 2014-03-20 22:26:08
define('_SP_', chr(0xFF).chr(0xFE));
define('UCS2', 'ucs-2be');
What are these two constants for? _SP_ is defined as chr(0xFF).chr(0xFE). What does that mean? I can't find FF or FE in the ASCII table, so what characters do they correspond to?
The code that uses them is as follows:
// Load the sub-dictionary
$hw = '';
$ds = file($dicAddon); // a 17-line .txt dictionary file
foreach ($ds as $d)
{
    $d = trim($d);
    if ($d == '') continue;
    $estr = substr($d, 1, 1);
    if ($estr == ':') {
        $hw = substr($d, 0, 1);
    }
    else
    {
        $spstr = _SP_;
        $spstr = iconv(UCS2, 'utf-8', $spstr); // Where does UCS-2 come in? What is going on here?
        $ws = explode(',', $d); // the Chinese words on each line, split into an array on commas
        $wall = iconv('utf-8', UCS2, join($spstr, $ws)); // joined into one string with _SP_, then converted to UCS-2?
        $ws = explode(_SP_, $wall); // and then split back into an array? What is the point of that?!
        foreach ($ws as $estr)
        {
            $this->addonDic[$hw][$estr] = strlen($estr);
        }
    }
}
This code loads the dictionary file, but I don't understand the logic of the else branch. Can anyone explain it?
------ Solution --------------------
A BOM (byte order mark) declares how a text is encoded. The bytes FF FE are the byte order mark of little-endian UTF-16, so to help you understand, think of _SP_ as a BOM.
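To see why FF FE only counts as a BOM in one byte order, the two bytes can be decoded both ways with PHP's built-in unpack(), where format 'n' reads a big-endian 16-bit value and 'v' a little-endian one. A minimal sketch (variable names are illustrative only):

```php
<?php
// The two bytes that _SP_ is built from.
$sp = chr(0xFF) . chr(0xFE);

// Read as one big-endian 16-bit unit, which is what 'ucs-2be' does.
$be = unpack('n', $sp)[1];
// Read as little-endian (UTF-16LE), where FF FE is the byte order mark.
$le = unpack('v', $sp)[1];

printf("big-endian: U+%04X\n", $be);    // U+FFFE
printf("little-endian: U+%04X\n", $le); // U+FEFF
```

Because the code passes 'ucs-2be' to iconv, it actually treats _SP_ as U+FFFE (the byte-swapped BOM), not as the BOM U+FEFF itself.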
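You said you couldn't find FF and FE in the ASCII table. That is expected: ASCII only covers 0x00-0x7F, so these bytes are not characters on their own.
Check what they turn into instead: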
define('_SP_', chr(0xFF).chr(0xFE));
define('UCS2', 'ucs-2be');
$spstr = _SP_;
$spstr = iconv(UCS2, 'utf-8', $spstr);
echo bin2hex($spstr);
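The output is efbfbe, which is U+FFFE encoded in UTF-8. Strictly speaking this is not the UTF-8 BOM (that is EF BB BF, i.e. U+FEFF); U+FFFE is the byte-swapped BOM and a Unicode noncharacter, so it should never appear in normal text.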
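The difference between the real UTF-8 BOM and U+FFFE can be checked without iconv at all, using the "\u{...}" escape available in double-quoted strings since PHP 7.0, which emits the UTF-8 bytes of a code point. A minimal sketch:

```php
<?php
// "\u{...}" in a double-quoted string produces the UTF-8 encoding of that code point.
echo bin2hex("\u{FEFF}"), "\n"; // efbbbf: the actual UTF-8 BOM
echo bin2hex("\u{FFFE}"), "\n"; // efbfbe: the same bytes the iconv call above returns
```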
As for why the author does this, look at what the dictionary file contains.
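The else branch can be traced on a made-up line. The sketch below assumes PHP 7.4+ with the iconv extension enabled; the sample line '中国,人民,北京' and the function names splitBatch/splitPerWord are hypothetical, since the real dictionary format is not shown in the post. One plausible reason for the detour (not confirmed by the original author) is that it converts a whole line to UCS-2BE with a single iconv call instead of one call per word. U+FFFE is a safe separator because it should never occur in real text, and after conversion it turns back into the bytes FF FE, which explode() can split on.

```php
<?php
// Assumes PHP 7.4+ with iconv; splitBatch/splitPerWord are hypothetical names.
define('_SP_', chr(0xFF) . chr(0xFE)); // U+FFFE written as UCS-2BE bytes
define('UCS2', 'ucs-2be');

// What the else branch does: join with U+FFFE, convert once, split on FF FE.
function splitBatch(string $line): array
{
    $sep = iconv(UCS2, 'utf-8', _SP_);                // "\xEF\xBF\xBE"
    $ws = explode(',', $line);                        // UTF-8 words
    $wall = iconv('utf-8', UCS2, implode($sep, $ws)); // separator becomes FF FE again
    return explode(_SP_, $wall);                      // UCS-2BE words
}

// The straightforward alternative: convert each word on its own.
function splitPerWord(string $line): array
{
    return array_map(fn (string $w) => iconv('utf-8', UCS2, $w), explode(',', $line));
}

$line = '中国,人民,北京'; // made-up dictionary line
$words = splitBatch($line);
var_dump($words === splitPerWord($line));
foreach ($words as $w) {
    // UCS-2BE uses 2 bytes per character, so strlen() is twice the character count.
    echo bin2hex($w), ' ', strlen($w), "\n";
}
```

Each printed line shows one word as UCS-2BE hex followed by its byte length, for example 4e2d56fd 4 for 中国 (U+4E2D U+56FD). This also explains the value stored in $this->addonDic[$hw][$estr]: strlen() of a UCS-2BE string is its length in bytes, i.e. two per character.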