UTF-8 encoding
UTF-8 encodes UCS code values in 8-bit units. The mapping from UCS-2 to UTF-8 is as follows:
UCS-2 code (hexadecimal)    UTF-8 byte stream (binary)
0000-007F                   0xxxxxxx
0080-07FF                   110xxxxx 10xxxxxx
0800-FFFF                   1110xxxx 10xxxxxx 10xxxxxx
For example, the Unicode code of the character "汉" (Han) is 6C49. Since 6C49 lies in the range 0800-FFFF, it uses the 3-byte template: 1110xxxx 10xxxxxx 10xxxxxx. Written in binary, 6C49 is 0110 110001 001001. Substituting these bits, in order, for the x positions in the template gives 11100110 10110001 10001001, that is, E6 B1 89.
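The template substitution described above can be sketched in PHP as follows. This is only a minimal illustration, not the author's code: the function name `ucs2_to_utf8_bytes` is hypothetical, and it covers only the three ranges listed in the table (0000-FFFF).

```php
<?php
// Hypothetical helper (not from the original article): encode one UCS-2
// code value (0x0000-0xFFFF) using the three templates from the table.
function ucs2_to_utf8_bytes(int $code): string
{
    if ($code < 0x80) {
        // 0000-007F -> 0xxxxxxx
        return chr($code);
    }
    if ($code < 0x800) {
        // 0080-07FF -> 110xxxxx 10xxxxxx
        return chr(0xC0 | ($code >> 6))
             . chr(0x80 | ($code & 0x3F));
    }
    // 0800-FFFF -> 1110xxxx 10xxxxxx 10xxxxxx
    return chr(0xE0 | ($code >> 12))
         . chr(0x80 | (($code >> 6) & 0x3F))
         . chr(0x80 | ($code & 0x3F));
}

// 0x6C49 lies in 0800-FFFF, so three bytes are produced.
echo strtoupper(bin2hex(ucs2_to_utf8_bytes(0x6C49))), "\n"; // E6B189
```

Shifting by 6 and 12 bits is the same as dividing by 64 and 64*64, which is how the bit groups 0110, 110001 and 001001 are separated.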
Finally, conversion between Unicode and UTF-8 works in both directions.
If the UTF-8 encoded character ch is 3 bytes, XX YY ZZ:
AND XX with 1F to get a
AND YY with 7F to get b
AND ZZ with 7F to get c
(64a + b) * 64 + c = ch (the Unicode code value)
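As a worked check of this formula, the sketch below decodes the bytes E6 B1 89 from the earlier example. The function name `utf8_3byte_to_code` is hypothetical, and the code assumes a well-formed 3-byte sequence; for such input the masks 1F and 7F give the same result as the tighter masks 0F and 3F.

```php
<?php
// Hypothetical helper (not from the original article): decode one
// three-byte UTF-8 sequence XX YY ZZ with the formula (64a + b) * 64 + c.
function utf8_3byte_to_code(string $ch): int
{
    $a = ord($ch[0]) & 0x1F; // XX AND 1F
    $b = ord($ch[1]) & 0x7F; // YY AND 7F
    $c = ord($ch[2]) & 0x7F; // ZZ AND 7F
    return (64 * $a + $b) * 64 + $c;
}

// E6 B1 89: a = 06, b = 31, c = 09 (hex), so (64*6 + 49) * 64 + 9 = 27721.
echo dechex(utf8_3byte_to_code("\xE6\xB1\x89")), "\n"; // 6c49
```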
echo.php is nothing special, just a few functions.
// Write to a Unicode file
$ucs2data = Utf8tounicode($data, "little");
// $endian = chr(0xFE) . chr(0xFF); // big-endian header
$endian = chr(0xFF) . chr(0xFE);    // little-endian header
$rt = file_put_contents("Ucs2.txt", $endian . $ucs2data);
// 19:32. Utf8tounicode function OK.
// 20:09. Found the little endian / big endian problem and resolved it.
// A Unicode string stored big endian is recognized by neither UltraEdit (UE) nor EditPlus.
// Only Notepad identifies it correctly.
$rt = file_put_contents("Usc2ys_data.txt", $ucs2_ysdata);
// Write to a UTF-8 file
$utf8data = UnicodeToUtf8($ucs2data); // 20:52. Converting the string back to UTF-8 works.
$rt = file_put_contents("Utf8.txt", $utf8data);
echo(urlencode($utf8data)); echo("");
$esc = Utf8escape($data);
echot($esc);
$esc = Phpescape($data);
echot($esc);
$unesc = Phpunescape($esc);
echot($unesc);
/**
 * Converts a UTF-8 encoded string into a Unicode (UCS-2) encoded string.
 * Parameter str: the UTF-8 encoded string.
 * Parameter order: the byte order of the stored data, big endian or little endian. The default is little.
 * For example, the Unicode code of "大" is 5927. Little-endian storage is 27 59. Big-endian storage keeps the order: 59 27.
 * A file stored little endian must begin with FF FE. A file stored big endian begins with FE FF. Otherwise the data will be badly misread.
 * This function only converts characters and is not responsible for adding this header.
 * Strings converted by iconv are stored big endian.
 * Returns ucs2string, the converted string.
 * Thanks to 唠叨 (xuzuning).
 */
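function Utf8tounicode($str, $order = "little")
{
    // Most of the loop body was lost from the source text. It is rebuilt here
    // around the surviving masks and branch conditions, as an assumption.
    $ucs2string = "";
    $n = strlen($str);
    for ($i = 0; $i < $n; $i++) {
        $ord = ord($str[$i]);
        if ($ord < 0x80) { // 0xxxxxxx
            $code = $ord;
        } elseif ($ord >= 0xC0 && $ord < 0xE0 && $i + 1 < $n
                && ord($str[$i + 1]) >= 0x80) { // 110xxxxx 10xxxxxx
            $a = (ord($str[$i]) & 0x3F) << 6;
            $b = ord($str[$i + 1]) & 0x3F;
            $code = $a + $b;
            $i += 1;
        } elseif ($ord >= 0xE0 && $ord < 0xF0 && $i + 2 < $n
                && ord($str[$i + 1]) >= 0x80 && ord($str[$i + 2]) >= 0x80) { // 1110xxxx 10xxxxxx 10xxxxxx
            $a = (ord($str[$i]) & 0x1F) << 12;
            $b = (ord($str[$i + 1]) & 0x3F) << 6;
            $c = ord($str[$i + 2]) & 0x3F;
            $code = $a + $b + $c;
            $i += 2;
        } else {
            continue; // bytes outside the three templates are skipped
        }
        $h = chr($code >> 8);   // high byte
        $l = chr($code & 0xFF); // low byte
        $ucs2string .= ($order == "little") ? $l . $h : $h . $l;
    }
    return $ucs2string;
} // End Func
/**
 * This function converts Unicode encoded strings to UTF8 encoded strings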
 * Parameter str: the Unicode encoded string.
 * Parameter order: the byte order of the Unicode string, big endian or little endian.
 * Returns utf8string, the converted string.
 *
 */
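function UnicodeToUtf8($str, $order = "little")
{
    // The branch bodies were lost from the source text. They are rebuilt here
    // from the UCS-2 to UTF-8 table, as an assumption.
    $utf8string = "";
    $n = strlen($str);
    for ($i = 0; $i + 1 < $n; $i++) {
        // Combine the two bytes into one integer code value.
        if ($order == "little") {
            $val = (ord($str[$i + 1]) << 8) | ord($str[$i]);
        } else {
            $val = (ord($str[$i]) << 8) | ord($str[$i + 1]);
        }
        $i++; // Two bytes represent one Unicode character.
        $c = "";
        if ($val < 0x80) { // 0000-007F
            $c .= chr($val);
        } elseif ($val < 0x800) { // 0080-07FF
            $c .= chr(0xC0 | ($val >> 6));
            $c .= chr(0x80 | ($val & 0x3F));
        } else { // 0800-FFFF
            $c .= chr(0xE0 | ($val >> 12));
            $c .= chr(0x80 | (($val >> 6) & 0x3F));
            $c .= chr(0x80 | ($val & 0x3F));
        }
        $utf8string .= $c;
    }
    return $utf8string;
} // End Func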
/*
 * Encodes a UTF-8 string into Unicode escape form, equivalent to JavaScript's escape.
 * Only UTF-8 input is accepted, because only UTF-8 can be converted to Unicode by formula. Other encodings must be converted by looking up a code table.
 * Not sure whether the regular expression that finds the UTF-8 codes is completely correct.
 * The code value of each character is computed by calling Utf2ucs, which is inefficient. However, the code is clear. If the computation were embedded inline,
 * the code would be harder to read.
 */
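function Utf8escape($str) {
    preg_match_all("/[\xc0-\xdf].|[\xe0-\xef]..|[\x01-\x7f]+/", $str, $r);
    // prt($r); // debug output
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        $ord = ord($v[0]);
        if ($ord <= 0x7F) {
            // ASCII run. The original handling of this branch did not survive
            // in the source, so the run is left unchanged here (an assumption).
        }
        elseif ($ord < 0xE0) { // two-byte UTF-8 code
            $ar[$k] = "%u" . Utf2ucs($v);
        }
        elseif ($ord < 0xF0) { // three-byte UTF-8 code
            $ar[$k] = "%u" . Utf2ucs($v);
        }
    } // foreach
    return join("", $ar);
}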
/**
 *
 * Converts one UTF-8 encoded character into its UCS-2 code.
 * Parameter: the UTF-8 encoded character.
 * Returns the Unicode code value of this character as a hexadecimal string.
 *
 * Principle: converting Unicode to UTF-8 ORs fixed bits onto the head of each byte.
 * This function is the inverse of that process: it ANDs the fixed head bits away.
 */
function Utf2ucs($str) {
    $n = strlen($str);
    $ucsCode = 0;
    if ($n == 3) {
        $highCode = ord($str[0]);
        $midCode = ord($str[1]);
        $lowCode = ord($str[2]);
        $a = 0x1F & $highCode;
        $b = 0x7F & $midCode;
        $c = 0x7F & $lowCode;
        $ucsCode = (64 * $a + $b) * 64 + $c;
    }
    elseif ($n == 2) {
        $highCode = ord($str[0]);
        $lowCode = ord($str[1]);
        $a = 0x3F & $highCode; // 0x3F is the complement of 0xC0
        $b = 0x7F & $lowCode;  // 0x7F is the complement of 0x80
        $ucsCode = 64 * $a + $b;
    }
    elseif ($n == 1) {
        $ucsCode = ord($str);
    }
    // Zero-padded to four hex digits so that "%u" . Utf2ucs($v) always has the form %uXXXX.
    return sprintf("%04x", $ucsCode);
}
/*
 * Purpose: this function decodes strings encoded by the JavaScript escape function.
 * The regular expression search is the key part. Not sure whether it has any problems.
 * Parameter: a string encoded by JavaScript.
 * For example: Phpunescape("%u5927") = 大
 * 2005-12-10
 *
 */
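function Phpunescape($escstr) {
    preg_match_all("/%u[0-9A-Fa-f]{4}|%.{2}|[^%]+/", $escstr, $matches); // prt($matches);
    $ar = &$matches[0];
    $c = "";
    foreach ($ar as $val) {
        if (substr($val, 0, 1) != "%") { // unescaped ASCII: alphanumerics and + - _ .
            $c .= $val;
        }
        elseif (substr($val, 1, 1) != "u") { // %XX: ASCII other than alphanumerics and + - _ .
            $x = hexdec(substr($val, 1, 2));
            $c .= chr($x);
        }
        else { // %uXXXX: the code is greater than 0xFF
            $val = intval(substr($val, 2), 16);
            // The branches below were lost from the source text. They are rebuilt
            // from the UCS-2 to UTF-8 table, as an assumption.
            if ($val < 0x80) { // 0000-007F
                $c .= chr($val);
            } elseif ($val < 0x800) { // 0080-07FF
                $c .= chr(0xC0 | ($val >> 6));
                $c .= chr(0x80 | ($val & 0x3F));
            } else { // 0800-FFFF
                $c .= chr(0xE0 | ($val >> 12));
                $c .= chr(0x80 | (($val >> 6) & 0x3F));
                $c .= chr(0x80 | ($val & 0x3F));
            }
        }
    } // foreach
    return $c;
}
// Phpescape: only the following fragment of this function survives in the source.
// It converts a two-byte GBK character to UCS-2 with iconv and prefixes it with "%u":
//     ... "%u" . bin2hex(iconv('GBK', "UCS-2", $chars[$i] . $chars[$i + 1]));
//     $i++;
//     ...
//     return $ar;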
?>
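The byte-order rules stated in the Utf8tounicode comment (27 59 versus 59 27 for code 5927, and the FF FE or FE FF header at the start of the file) can be illustrated with PHP's built-in pack(). This is only a sketch: the function name `ucs2_with_bom` is hypothetical and is not part of the code above.

```php
<?php
// Hypothetical helper (not from the original article): serialize UCS-2
// code values in the requested byte order, preceded by the matching header.
function ucs2_with_bom(array $codes, string $order = "little"): string
{
    // pack("v") writes a 16-bit value little endian, pack("n") big endian.
    $format = ($order === "little") ? "v" : "n";
    // Code FEFF is the header: it becomes FF FE (little) or FE FF (big).
    $out = pack($format, 0xFEFF);
    foreach ($codes as $code) {
        $out .= pack($format, $code);
    }
    return $out;
}

echo bin2hex(ucs2_with_bom([0x5927], "little")), "\n"; // fffe2759
echo bin2hex(ucs2_with_bom([0x5927], "big")), "\n";    // feff5927
```

Writing the header through the same pack() format as the data keeps the two from disagreeing, which is the mismatch the comment warns about.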
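The decoding of JavaScript escape output described for Phpunescape (for example, %u5927 becomes 大) can also be sketched with preg_replace_callback. This is an alternative illustration under stated assumptions, not the author's implementation: the name `unescape_sketch` is hypothetical, only hexadecimal %uXXXX and %XX sequences are handled, and %XX values from 80 to FF are treated as code values and re-encoded as UTF-8.

```php
<?php
// Hypothetical sketch (not the article's Phpunescape): decode the %uXXXX and
// %XX sequences produced by JavaScript's escape() into a UTF-8 string.
function unescape_sketch(string $escaped): string
{
    return preg_replace_callback(
        '/%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})/',
        function (array $m): string {
            // Group 1 is non-empty for %uXXXX; otherwise group 2 holds the XX of %XX.
            $code = ($m[1] !== '') ? hexdec($m[1]) : hexdec($m[2]);
            if ($code < 0x80) {
                return chr($code);                       // 0xxxxxxx
            }
            if ($code < 0x800) {
                return chr(0xC0 | ($code >> 6))          // 110xxxxx
                     . chr(0x80 | ($code & 0x3F));       // 10xxxxxx
            }
            return chr(0xE0 | ($code >> 12))             // 1110xxxx
                 . chr(0x80 | (($code >> 6) & 0x3F))     // 10xxxxxx
                 . chr(0x80 | ($code & 0x3F));           // 10xxxxxx
        },
        $escaped
    );
}

echo unescape_sketch("%u5927%20A"), "\n"; // 大 A
```

Text that contains no % sequence is left untouched by preg_replace_callback, so no separate branch for plain ASCII runs is needed.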