PHP UTF-8 to Unicode conversion functions (page 1 of 2) - PHP tips

Source: Internet
Author: User
UTF-8 encoding
UTF-8 encodes UCS in 8-bit units. The mapping from UCS-2 to UTF-8 is as follows:

UCS-2 code (hexadecimal)    UTF-8 byte stream (binary)
0000-007F                   0xxxxxxx
0080-07FF                   110xxxxx 10xxxxxx
0800-FFFF                   1110xxxx 10xxxxxx 10xxxxxx
For example, the Unicode code of the character "汉" (Han) is 6C49. 6C49 lies between 0800 and FFFF, so it uses the 3-byte template: 1110xxxx 10xxxxxx 10xxxxxx. Written in binary, 6C49 is 0110 1100 0100 1001; regrouped to fit the template it is 0110 110001 001001. Substituting these bits for the x positions in the template, in order, gives 11100110 10110001 10001001, that is, E6 B1 89.
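The template substitution above can be sketched in PHP as follows. This is a minimal illustration, not the article's own function: the helper name `ucs2CodeToUtf8` is hypothetical, and it assumes the input is a single code point in the UCS-2 range 0000-FFFF.

```php
<?php
// Hypothetical helper: encode one UCS-2 code point (0x0000-0xFFFF) as UTF-8
// by filling the 1-, 2-, or 3-byte template from the table above.
function ucs2CodeToUtf8(int $code): string
{
    if ($code <= 0x7F) {
        // 0xxxxxxx
        return chr($code);
    }
    if ($code <= 0x7FF) {
        // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
    }
    // 1110xxxx 10xxxxxx 10xxxxxx
    return chr(0xE0 | ($code >> 12))
         . chr(0x80 | (($code >> 6) & 0x3F))
         . chr(0x80 | ($code & 0x3F));
}

echo strtoupper(bin2hex(ucs2CodeToUtf8(0x6C49))), "\n"; // E6B189
```

The shifts pick out the top 4, middle 6, and low 6 bits of the code; the OR adds the fixed head bits of each template byte.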
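Finally, the conversion in the other direction, from UTF-8 back to Unicode.
Suppose the UTF-8 encoded character ch is 3 bytes long: XX YY ZZ.
AND XX with 0x1F to get a.
AND YY with 0x7F to get b.
AND ZZ with 0x7F to get c.
(64a + b) * 64 + c = ch (the Unicode code)
The masks strip the fixed head bits of each byte (1110 on the first byte, 10 on the other two), leaving only the payload bits.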
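As a quick check of the formula above, here is a minimal PHP sketch. The helper name `utf8ThreeBytesToCode` is hypothetical, and the code assumes its argument is one well-formed 3-byte UTF-8 sequence; it does no validation.

```php
<?php
// Hypothetical helper: decode one 3-byte UTF-8 sequence XX YY ZZ
// with the masks and formula given in the text.
function utf8ThreeBytesToCode(string $ch): int
{
    $a = ord($ch[0]) & 0x1F; // payload bits of XX
    $b = ord($ch[1]) & 0x7F; // payload bits of YY
    $c = ord($ch[2]) & 0x7F; // payload bits of ZZ
    return (64 * $a + $b) * 64 + $c;
}

printf("%04X\n", utf8ThreeBytesToCode("\xE6\xB1\x89")); // 6C49
```

For E6 B1 89 this gives a = 6, b = 49, c = 9, and (64 * 6 + 49) * 64 + 9 = 27721, which is 6C49 in hexadecimal.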
echo.php contains nothing but a few functions.
<?php
// Write to a Unicode file
$ucs2data = Utf8tounicode($data, "little");
// $endian = chr(0xFE) . chr(0xFF); // big-endian byte order mark
$endian = chr(0xFF) . chr(0xFE);    // little-endian byte order mark
$rt = file_put_contents("Ucs2.txt", $endian . $ucs2data);
// 19:32. The Utf8tounicode function works.
// 20:09. Found the little endian / big endian problem and resolved it.
// A Unicode string stored big endian is recognized by neither UE nor EditPlus;
// only Notepad identifies it correctly.
$rt = file_put_contents("Usc2ys_data.txt", $ucs2_ysdata);
// Write to a UTF8 file
$utf8data = UnicodeToUtf8($ucs2data); // 20:52. Converting the string back to UTF8 works.
$rt = file_put_contents("Utf8.txt", $utf8data);
echo(urlencode($utf8data)); echo("");
$esc = Utf8escape($data);
Echot($esc);
$esc = Phpescape($data);
Echot($esc);
$unesc = Phpunescape($esc);
Echot($unesc);
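/**
 * This function converts a UTF8-encoded string into a Unicode (UCS-2) encoded string.
 * Parameter str: the UTF8-encoded string.
 * Parameter order: the storage byte order, big endian or little endian. The default Unicode storage order is little.
 * For example, the Unicode code of "大" is 5927. Stored little endian it is 27 59. Stored big endian the order is unchanged: 59 27.
 * A file stored little endian must begin with FF FE. A file stored big endian begins with FE FF. Otherwise serious confusion results.
 * This function only converts characters; it does not add that header.
 * Strings converted by iconv are stored big endian.
 * Returns ucs2string, the converted string.
 * Thanks to xuzuning.
 */
function Utf8tounicode($str, $order = "little")
{
    // The loop body was truncated in the source; it is reconstructed here
    // from the surviving fragments and the byte templates described earlier.
    $ucs2string = "";
    $n = strlen($str);
    for ($i = 0; $i < $n; $i++) {
        $ord = ord($str[$i]);
        if ($ord <= 0x7F) { // 0xxxxxxx
            $ucsCode = $ord;
        } elseif ($ord >= 0xC0 && $ord < 0xE0 && $i + 1 < $n
            && ord($str[$i + 1]) >= 0x80) { // 110xxxxx 10xxxxxx
            $a = (ord($str[$i]) & 0x3F) << 6; // bit 5 of a valid lead byte is 0, so 0x3F is safe
            $b = ord($str[$i + 1]) & 0x3F;
            $ucsCode = $a + $b;
            $i++;
        } elseif ($ord >= 0xE0 && $ord < 0xF0 && $i + 2 < $n
            && ord($str[$i + 1]) >= 0x80 && ord($str[$i + 2]) >= 0x80) { // 1110xxxx 10xxxxxx 10xxxxxx
            $a = (ord($str[$i]) & 0x1F) << 12; // bit 4 of a valid lead byte is 0, so 0x1F is safe
            $b = (ord($str[$i + 1]) & 0x3F) << 6;
            $c = ord($str[$i + 2]) & 0x3F;
            $ucsCode = $a + $b + $c;
            $i += 2;
        } else {
            continue; // skip bytes that fit none of the three templates
        }
        $h = chr($ucsCode >> 8);   // high byte
        $l = chr($ucsCode & 0xFF); // low byte
        $ucs2string .= ($order == "little") ? $l . $h : $h . $l;
    }
    return $ucs2string;
} // End Func

/**
 * This function converts a Unicode (UCS-2) encoded string to a UTF8-encoded string.
 * Parameter str: the Unicode-encoded string.
 * Parameter order: the byte order of the Unicode string, big endian or little endian.
 * Returns utf8string, the converted string.
 *
 */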
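function UnicodeToUtf8($str, $order = "little")
{
    // The loop body was truncated in the source; it is reconstructed here
    // from the surviving fragments and the byte templates described earlier.
    $utf8string = "";
    $n = strlen($str);
    for ($i = 0; $i + 1 < $n; $i++) {
        if ($order == "little") {
            $val = (ord($str[$i + 1]) << 8) | ord($str[$i]);
        } else {
            $val = (ord($str[$i]) << 8) | ord($str[$i + 1]);
        }
        $i++; // Two bytes represent one Unicode character.
        $c = "";
        if ($val < 0x80) { // 0000-007F
            $c .= chr($val);
        } elseif ($val < 0x800) { // 0080-07FF
            $c .= chr(0xC0 | ($val >> 6));
            $c .= chr(0x80 | ($val & 0x3F));
        } else { // 0800-FFFF
            $c .= chr(0xE0 | ($val >> 12));
            $c .= chr(0x80 | (($val >> 6) & 0x3F));
            $c .= chr(0x80 | ($val & 0x3F));
        }
        $utf8string .= $c;
    }
    return $utf8string;
} // End Func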

/*
 * Encode a UTF8-encoded string into the Unicode %uXXXX form, equivalent to escape.
 * Only UTF8 input is accepted, because only UTF8 and Unicode convert by formula; any other encoding needs a code table lookup.
 * I am not sure whether the regular expression that finds the UTF8 codes is completely correct.
 * The code value of each character is computed by calling Utf2ucs, which is inefficient.
 * The code stays clear this way, though; with the computation embedded inline it would be hard to read.
 */
function Utf8escape($str) {
    // A 2-byte character starts with C0-DF and a 3-byte character with E0-EF;
    // the classes must not overlap at E0, and continuation bytes are 80-BF.
    preg_match_all("/[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\x01-\x7f]+/", $str, $r);
    // prt($r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        $ord = ord($v[0]);
        if ($ord >= 0xC0 && $ord < 0xE0) { // 2-byte UTF8 code
            // Pad to four hex digits so the result has the %uXXXX form.
            $ar[$k] = "%u" . str_pad(Utf2ucs($v), 4, "0", STR_PAD_LEFT);
        }
        elseif ($ord >= 0xE0 && $ord < 0xF0) { // 3-byte UTF8 code
            $ar[$k] = "%u" . str_pad(Utf2ucs($v), 4, "0", STR_PAD_LEFT);
        }
    } // foreach
    return join("", $ar);
}
/**
 *
 * Convert one UTF8-encoded character into its UCS-2 code.
 * Parameter str: the UTF8-encoded character.
 * Returns the Unicode code value of the character as a hexadecimal string.
 * Knowing the code value, a single-byte character can be retrieved with chr.
 *
 * Principle: converting Unicode to UTF-8 ORs the fixed head bits onto each byte.
 * This function is the inverse of that process: it ANDs each byte with the complement of the fixed head bits.
 */
function Utf2ucs($str) {
    $n = strlen($str);
    $ucsCode = 0;
    if ($n == 3) {
        $highCode = ord($str[0]);
        $midCode = ord($str[1]);
        $lowCode = ord($str[2]);
        $a = 0x1F & $highCode;
        $b = 0x7F & $midCode;
        $c = 0x7F & $lowCode;
        $ucsCode = (64 * $a + $b) * 64 + $c;
    }
    elseif ($n == 2) {
        $highCode = ord($str[0]);
        $lowCode = ord($str[1]);
        $a = 0x3F & $highCode; // 0x3F is the complement of 0xC0
        $b = 0x7F & $lowCode;  // 0x7F is the complement of 0x80
        $ucsCode = 64 * $a + $b;
    }
    elseif ($n == 1) {
        $ucsCode = ord($str);
    }
    return dechex($ucsCode);
}

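/*
 * Purpose: this function reverses the encoding applied by the JavaScript escape function.
 * I am not sure whether the key regular expression search has a problem.
 * Parameter: a string encoded by JavaScript escape.
 * For example: Phpunescape("%u5927") = 大
 * 2005-12-10
 *
 */
function Phpunescape($escstr) {
    // The end of this function was truncated in the source; the %uXXXX branch
    // is reconstructed from the Unicode-to-UTF8 templates used above.
    preg_match_all("/%u[0-9A-Fa-f]{4}|%[0-9A-Fa-f]{2}|[^%]+/", $escstr, $matches); // prt($matches);
    $ar = &$matches[0];
    $c = "";
    foreach ($ar as $val) {
        if (substr($val, 0, 1) != "%") { // Characters that escape leaves unencoded (letters, digits, + - _ . and so on)
            $c .= $val;
        }
        elseif (substr($val, 1, 1) != "u") { // %XX: a single-byte code that is not alphanumeric or + - _ .
            $x = hexdec(substr($val, 1, 2));
            $c .= chr($x);
        }
        else { // %uXXXX: a code greater than 0xFF
            $val = intval(substr($val, 2), 16);
            if ($val < 0x80) { // 0000-007F
                $c .= chr($val);
            } elseif ($val < 0x800) { // 0080-07FF
                $c .= chr(0xC0 | ($val >> 6));
                $c .= chr(0x80 | ($val & 0x3F));
            } else { // 0800-FFFF
                $c .= chr(0xE0 | ($val >> 12));
                $c .= chr(0x80 | (($val >> 6) & 0x3F));
                $c .= chr(0x80 | ($val & 0x3F));
            }
        }
    } // foreach
    return $c;
}

// The beginning of Phpescape is missing from the source. Only this tail of its body survives:
//         ... "%u" . bin2hex(iconv('GBK', "UCS-2", $chars[$i] . $chars[$i + 1]));
//         $i++;
//     }
// } // foreach
// return $ar;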
?>
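The byte order point made in the Utf8tounicode comment (27 59 versus 59 27 for the code 5927, with FF FE or FE FF at the start of the file) can be checked with PHP's built-in `pack`. This is only a sketch: the helper names `ucs2Bytes` and `ucs2Bom` are hypothetical and do not appear in the script above.

```php
<?php
// Hypothetical helper: serialize one UCS-2 code point in the requested byte order.
function ucs2Bytes(int $code, string $order = "little"): string
{
    // pack('v') writes an unsigned 16-bit little-endian value, pack('n') a big-endian one.
    return pack($order === "little" ? 'v' : 'n', $code);
}

// Hypothetical helper: the byte order mark that announces each storage order.
function ucs2Bom(string $order = "little"): string
{
    return $order === "little" ? "\xFF\xFE" : "\xFE\xFF";
}

echo bin2hex(ucs2Bytes(0x5927, "little")), "\n"; // 2759
echo bin2hex(ucs2Bytes(0x5927, "big")), "\n";    // 5927
echo bin2hex(ucs2Bom("little") . ucs2Bytes(0x5927, "little")), "\n"; // fffe2759
```

Writing the mark that matches the chosen order in front of the data is what lets an editor tell the two layouts apart.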