PHP UTF-8 to Unicode conversion functions (page 1 of 2)

Last Update:2016-07-21 Source: Internet

Author: User

UTF-8 encoding
UTF-8 encodes UCS in 8-bit units. The mapping from UCS-2 to UTF-8 is as follows:

UCS-2 code (hexadecimal)    UTF-8 byte stream (binary)
0000-007F                   0xxxxxxx
0080-07FF                   110xxxxx 10xxxxxx
0800-FFFF                   1110xxxx 10xxxxxx 10xxxxxx

For example, the Unicode code of the character "汉" (Han) is 6C49. 6C49 lies in the range 0800-FFFF, so it uses the 3-byte template: 1110xxxx 10xxxxxx 10xxxxxx. Written in binary, 6C49 is 0110 110001 001001; substituting these bits in order for the x positions of the template gives 11100110 10110001 10001001, that is, E6 B1 89.
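
The template substitution described above can be sketched in PHP roughly as follows. This is only an illustration, not the article's own code: the function name `codePointToUtf8` is hypothetical, and the sketch assumes the input is a UCS-2 code value in the three ranges of the table (0000-FFFF).

```php
<?php
// Hypothetical helper (not from the article): encode one UCS-2 code value
// (0x0000-0xFFFF) as UTF-8 by filling the x bits of the matching template.
function codePointToUtf8(int $code): string
{
    if ($code < 0x80) {            // 0000-007F: 0xxxxxxx
        return chr($code);
    }
    if ($code < 0x800) {           // 0080-07FF: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($code >> 6))
             . chr(0x80 | ($code & 0x3F));
    }
    // 0800-FFFF: 1110xxxx 10xxxxxx 10xxxxxx
    return chr(0xE0 | ($code >> 12))
         . chr(0x80 | (($code >> 6) & 0x3F))
         . chr(0x80 | ($code & 0x3F));
}

// The example from the text: 6C49 becomes E6 B1 89.
echo strtoupper(bin2hex(codePointToUtf8(0x6C49))), "\n"; // E6B189
```

The shifts pick out the top 4, middle 6 and low 6 bits of the code value, and the OR operations add the fixed header bits of each template byte.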
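Finally, the conversion back from UTF-8 to Unicode.
Suppose a UTF-8 encoded character ch takes 3 bytes: XX YY ZZ.
a = XX AND 0x1F
b = YY AND 0x7F
c = ZZ AND 0x7F
(64a + b) * 64 + c = ch (Unicode code value)
The masks 0x1F and 0x7F give the same result here as the exact masks 0x0F and 0x3F, because the extra bit they let through is always 0 in a valid lead byte 1110xxxx or continuation byte 10xxxxxx.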
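
The formula above can be checked with a few lines of PHP. This is a minimal sketch under the assumption that the input is a valid 3-byte sequence; the function name `utf8ThreeBytesToCode` is hypothetical and does not appear in the article.

```php
<?php
// Hypothetical helper (not from the article): recover the code value from a
// 3-byte UTF-8 sequence XX YY ZZ with the formula (64a + b) * 64 + c.
function utf8ThreeBytesToCode(string $ch): int
{
    $a = ord($ch[0]) & 0x1F; // lead byte 1110xxxx: keeps xxxx (bit 4 is 0)
    $b = ord($ch[1]) & 0x7F; // continuation byte 10xxxxxx: keeps xxxxxx
    $c = ord($ch[2]) & 0x7F; // continuation byte 10xxxxxx: keeps xxxxxx
    return (64 * $a + $b) * 64 + $c;
}

// E6 B1 89 is the UTF-8 form of 6C49 from the earlier example.
echo dechex(utf8ThreeBytesToCode("\xE6\xB1\x89")), "\n"; // 6c49
```

Multiplying by 64 is the same as shifting left by 6 bits, which is how many payload bits each continuation byte carries.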
echo.php does nothing special; it just exercises a few functions.
<?php
// $data is the UTF8 encoded input string.
// echot() and prt() are output helpers that are not defined in this part of the listing.
// Write Unicode file
$ucs2data = utf8ToUnicode($data, "little");
//$endian = chr(0xFE) . chr(0xFF); // byte order mark for big endian
$endian = chr(0xFF) . chr(0xFE); // byte order mark for little endian
$rt = file_put_contents("Ucs2.txt", $endian . $ucs2data);
// 19:32. utf8ToUnicode function OK.
// 20:09. Found and resolved the little endian / big endian problem.
// UE and EditPlus cannot recognize big endian Unicode strings;
// only Notepad recognizes them correctly.
//$rt = file_put_contents("Usc2ys_data.txt", $ucs2_ysdata);
// Write UTF8 file
$utf8data = unicodeToUtf8($ucs2data); // 20:52. Converting the string back to UTF8 OK.
$rt = file_put_contents("Utf8.txt", $utf8data);
echo(urlencode($utf8data));
$esc = utf8Escape($data);
echot($esc);
$esc = phpEscape($data);
echot($esc);
$unesc = phpUnescape($esc);
echot($unesc);
/**
* This function converts a UTF8 encoded string into a Unicode encoded string.
* Parameter str: the UTF8 encoded string.
* Parameter order: the storage format, big endian or little endian. The default Unicode storage order is little.
* For example, the Unicode code of "大" is 5927. Little endian stores it as 27 59; big endian keeps the order: 59 27.
* A little endian file must begin with FF FE, and a big endian file with FE FF. Otherwise serious confusion results.
* This function only converts characters; it does not add the file header.
* Strings converted by iconv are stored big endian.
* Returns ucs2string, the converted string.
* Thanks to xuzuning.
*/
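function utf8ToUnicode($str, $order = "little")
{
    $ucs2string = "";
    $n = strlen($str);
    for ($i = 0; $i < $n; $i++) {
        $ord = ord($str[$i]);
        if ($ord <= 0x7F) { // 0xxxxxxx
            $code = $ord;
        } elseif ($ord < 0xE0 && $i + 1 < $n && ord($str[$i + 1]) >= 0x80) { // 110xxxxx 10xxxxxx
            $a = (ord($str[$i]) & 0x1F) << 6;
            $b = ord($str[$i + 1]) & 0x3F;
            $code = $a + $b;
            $i += 1;
        } elseif ($ord < 0xF0 && $i + 2 < $n && ord($str[$i + 1]) >= 0x80 && ord($str[$i + 2]) >= 0x80) { // 1110xxxx 10xxxxxx 10xxxxxx
            $a = (ord($str[$i]) & 0x0F) << 12;
            $b = (ord($str[$i + 1]) & 0x3F) << 6;
            $c = ord($str[$i + 2]) & 0x3F;
            $code = $a + $b + $c;
            $i += 2;
        } else {
            continue; // skip bytes that fit none of the three templates
        }
        $h = chr($code >> 8);   // high byte
        $l = chr($code & 0xFF); // low byte
        $ucs2string .= ($order == "little") ? $l . $h : $h . $l;
    }//end for
    return $ucs2string;
}//End Func
/**
* This function converts a Unicode encoded string to a UTF8 encoded string.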
* Parameter str: the Unicode encoded string.
* Parameter order: the storage order of the Unicode string, big endian or little endian.
* Returns utf8string, the converted string.
*
*/
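function unicodeToUtf8($str, $order = "little")
{
    $utf8string = "";
    $n = strlen($str);
    for ($i = 0; $i + 1 < $n; $i++) {
        if ($order == "little") {
            $val = (ord($str[$i + 1]) << 8) | ord($str[$i]); // swap the two bytes back
        } else {
            $val = (ord($str[$i]) << 8) | ord($str[$i + 1]);
        }
        $i++; // A Unicode character is represented by two bytes.
        $c = "";
        if ($val < 0x80) { // 0000-007F
            $c = chr($val);
        } elseif ($val < 0x800) { // 0080-07FF
            $c = chr(0xC0 | ($val >> 6)) . chr(0x80 | ($val & 0x3F));
        } else { // 0800-FFFF
            $c = chr(0xE0 | ($val >> 12)) . chr(0x80 | (($val >> 6) & 0x3F)) . chr(0x80 | ($val & 0x3F));
        }
        $utf8string .= $c;
    }
    return $utf8string;
}//End Func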

/*
* Encode a UTF8 encoded string into Unicode escape form, equivalent to escape.
* Only UTF8 input is accepted, because only UTF8 and Unicode can be converted by formula; other encodings need a code table lookup.
* I am not sure whether the regular expression that finds the UTF8 codes is entirely correct.
* Calling utf2ucs to compute the code value of each character is inefficient, but it keeps the code clear;
* embedding that computation here would make the code hard to read.
*/
function utf8Escape($str) {
    // two-byte sequences, three-byte sequences, and runs of ASCII
    preg_match_all("/[\xc0-\xdf].|[\xe0-\xef]..|[\x01-\x7f]+/", $str, $r);
    //prt($r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        $ord = ord($v[0]);
        if ($ord >= 0xC0 && $ord < 0xE0) { // two-byte UTF8 code
            $ar[$k] = "%u" . utf2ucs($v);
        }
        elseif ($ord >= 0xE0 && $ord < 0xF0) { // three-byte UTF8 code
            $ar[$k] = "%u" . utf2ucs($v);
        }
    }//foreach
    return join("", $ar);
}
/**
*
* Convert one UTF8 encoded character to its UCS-2 code.
* Parameter: a UTF8 encoded character.
* Returns the Unicode code value of this character as a hexadecimal string. With the code value known, chr can reproduce the character (for values up to 0xFF).
*
* Principle: the Unicode to UTF-8 algorithm ORs the fixed header bits onto each byte.
* This function is the inverse: it ANDs each byte with the complement of the header bits.
*/
function utf2ucs($str) {
    $n = strlen($str);
    $ucsCode = 0;
    if ($n == 3) {
        $highCode = ord($str[0]);
        $midCode = ord($str[1]);
        $lowCode = ord($str[2]);
        $a = 0x1F & $highCode;
        $b = 0x7F & $midCode;
        $c = 0x7F & $lowCode;
        $ucsCode = (64 * $a + $b) * 64 + $c;
    }
    elseif ($n == 2) {
        $highCode = ord($str[0]);
        $lowCode = ord($str[1]);
        $a = 0x3F & $highCode; // 0x3F is the complement of 0xC0
        $b = 0x7F & $lowCode; // 0x7F is the complement of 0x80
        $ucsCode = 64 * $a + $b;
    }
    elseif ($n == 1) {
        $ucsCode = ord($str);
    }
    return sprintf("%04x", $ucsCode); // four hex digits, as the %uXXXX form expects
}

/*
* Purpose: this function decodes strings encoded by JavaScript's escape function.
* I am not sure whether the key regular expression has problems.
* Parameter: a JavaScript-escaped string.
* For example: phpUnescape("%u5927") = "大"
* 2005-12-10
*
*/
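function phpUnescape($escstr) {
    preg_match_all("/%u[0-9A-Fa-f]{4}|%[0-9A-Fa-f]{2}|[^%]+/", $escstr, $matches); //prt($matches);
    $ar = $matches[0];
    $c = "";
    foreach ($ar as $val) {
        if (substr($val, 0, 1) != "%") { // plain text that was not escaped
            $c .= $val;
        }
        elseif (substr($val, 1, 1) != "u") { // %XX: a single-byte code
            $x = hexdec(substr($val, 1, 2));
            $c .= chr($x);
        }
        else { // %uXXXX: a code that may be greater than 0xFF
            $val = intval(substr($val, 2), 16);
            if ($val < 0x80) { // 0000-007F
                $c .= chr($val);
            }
            elseif ($val < 0x800) { // 0080-07FF
                $c .= chr(0xC0 | ($val >> 6)) . chr(0x80 | ($val & 0x3F));
            }
            else { // 0800-FFFF
                $c .= chr(0xE0 | ($val >> 12)) . chr(0x80 | (($val >> 6) & 0x3F)) . chr(0x80 | ($val & 0x3F));
            }
        }
    }//foreach
    return $c;
}
// Only the tail of phpescape() survives in this listing: it builds each escape as
// "%u" . bin2hex(iconv('GBK', "UCS-2", $chars[$i] . $chars[$i + 1])) and then advances $i.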
?>
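
The byte order rules stated in the comment block of the UTF8 to Unicode function can be illustrated on their own. The sketch below is not part of the original listing: the helper name `ucs2Bytes` is hypothetical, and it uses PHP's built-in `pack()` rather than the article's manual byte handling.

```php
<?php
// Hypothetical helper (not from the article): store one code value as two
// bytes in the requested order. pack("v") writes little endian,
// pack("n") writes big endian.
function ucs2Bytes(int $code, string $order = "little"): string
{
    return $order === "little" ? pack("v", $code) : pack("n", $code);
}

// Byte order marks that go at the start of a file.
$bomLittle = "\xFF\xFE";
$bomBig    = "\xFE\xFF";

// The character with code 5927 from the comment block, in both orders.
echo bin2hex($bomLittle . ucs2Bytes(0x5927, "little")), "\n"; // fffe2759
echo bin2hex($bomBig . ucs2Bytes(0x5927, "big")), "\n";       // feff5927
```

Writing the mark first lets an editor tell which of the two byte orders follows, which is the confusion the author ran into at 20:09.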

http://www.bkjia.com/PHPjc/319074.html



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
