PHP UTF-8 to Unicode functions
…For the specific symbol correspondence table, consult unicode.org or a specialized Chinese character correspondence table. The Unicode character set divides all characters into 17 planes according to usage, each with a code space of 2^16 = 65,536 characters. Plane 0, the BMP (Basic Multilingual Plane), covers essentially all characters in use in the world today. The other planes are either used for ancient scripts or reserved for extension. The Unicode chara…
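Because each plane holds 2^16 code points, the plane of a character is simply the part of its code point above the low 16 bits. The following is a minimal sketch of that arithmetic; `unicode_plane()` is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Return the plane number (0..16) of a Unicode code point.
// unicode_plane() is a hypothetical helper used only for illustration.
function unicode_plane(int $codePoint): int
{
    if ($codePoint < 0 || $codePoint > 0x10FFFF) {
        throw new InvalidArgumentException('code point out of range');
    }
    // Each plane spans 2^16 code points, so shifting out 16 bits gives the plane.
    return $codePoint >> 16;
}

echo unicode_plane(0x4E2D), "\n";   // 0  (inside the BMP)
echo unicode_plane(0x1F600), "\n";  // 1
echo unicode_plane(0x10FFFF), "\n"; // 16 (the last of the 17 planes)
```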
…expressed in the format U-XXXXXXXX, while BMP code points are usually written in the format U+XXXX, where X is a hexadecimal digit. While ISO was developing the UCS, a separate consortium of manufacturers was working on a similar encoding, known as Unicode. The two later agreed on a unified encoding but continued to publish their own standard documents; therefore, the UCS encoding an…
Unicode is also commonly known as unified code, universal code, single code, or standard universal code.
Unicode development is the responsibility of the non-profit Unicode Consortium, which is committed to replacing existing character encoding schemes with Unicode, because many of those schemes have only limited code space and are unsuitable for multilingual environments.
Unicode is recognized and widely used in the internationalization and localization of computer soft…
…the principles of Unicode version 2 are the same, so I will not go into them further.
As mentioned above, to know which encoding a text uses, check the mark at the beginning of the text. Below are the marks (byte order marks) for each encoding.
EF BB BF        UTF-8
FE FF           UTF-16/UCS-2, big endian
FF FE           UTF-16/UCS-2, little endian
FF FE 00 00     UTF-32/UCS-4, little endian
00 00 FE FF     UTF-32/UCS-4, big endian
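Checking for these marks can be sketched in a few lines of PHP. This is only an illustration: `detect_bom()` is a hypothetical helper name, and the labels follow the Unicode standard's definition, in which FE FF marks big endian and FF FE marks little endian.

```php
<?php
// detect_bom() is a hypothetical helper: it returns a label for the byte
// order mark at the start of $bytes, or null when no known mark is present.
function detect_bom(string $bytes): ?string
{
    // Longer signatures come first, because FF FE 00 00 (UTF-32 LE)
    // starts with the same two bytes as FF FE (UTF-16 LE).
    $signatures = [
        "\xFF\xFE\x00\x00" => 'UTF-32/UCS-4, little endian',
        "\x00\x00\xFE\xFF" => 'UTF-32/UCS-4, big endian',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16/UCS-2, big endian',
        "\xFF\xFE"         => 'UTF-16/UCS-2, little endian',
    ];
    foreach ($signatures as $bom => $label) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $label;
        }
    }
    return null;
}

echo detect_bom("\xEF\xBB\xBFhello"), "\n"; // UTF-8
```

A text without a mark returns null, so a caller still needs a default or another heuristic for that case.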
…character set, which is Unicode. The original Unicode standard, UCS-2, uses two bytes to represent one character, which is why you often hear the claim that Unicode uses two bytes per character. Before long, however, 256 × 256 = 65,536 code values were judged insufficient, so the UCS-4 standard appeared, which uses four bytes per character; UCS-2 nevertheless remains the one used most.
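The size difference between the two forms is easy to see by packing one code value both ways. This is a small sketch using PHP's built-in `pack()`; the code point U+4E2D is chosen only as an example, and big-endian byte order is an assumption made for the illustration.

```php
<?php
$codePoint = 0x4E2D; // example character, chosen arbitrarily

$ucs2 = pack('n', $codePoint); // 16-bit unsigned integer, big endian
$ucs4 = pack('N', $codePoint); // 32-bit unsigned integer, big endian

echo strlen($ucs2), ' ', bin2hex($ucs2), "\n"; // 2 4e2d
echo strlen($ucs4), ' ', bin2hex($ucs4), "\n"; // 4 00004e2d
```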
…is a common language developed by all the countries of the world, if we think of the various text encodings as regional dialects.
In this language environment there are no more encoding conflicts: content in any language can be displayed on the same screen. This is the greatest advantage of Unicode.
So how is Unicode encoded? It is actually very simple:
all the text in the world is encoded in 2 bytes. You might ask: 2 bytes can represent at most 65,536 codes, is that enough?
Most of th…
', js_unescape($_REQUEST['p_sort']);
At this point, the JS escape encoding has been successfully decoded.
In addition, I found a function that implements JS escape encoding in PHP.
The code is as follows:
function phpescape($str)
{
    $Sublen = strlen($str);
    $RetrunString = "";
    for ($I = 0; $I < $Sublen; $I++) {
        // Bytes >= 127 are treated as the first byte of a two-byte GB2312 character.
        if (ord($str[$I]) >= 127)
        {
            $TmpString = bin2hex(iconv("gb2312", "UCS-2", substr($str, $I, 2)));
            // $TmpString = substr…
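Since the listing breaks off, here is a separate, self-contained sketch of the same idea rather than a reconstruction of that function. It assumes UTF-8 input instead of GB2312 and that the iconv extension is available; `js_escape_sketch()` is a hypothetical name. It follows the behaviour of JavaScript's `escape()`: letters, digits and `@*_+-./` pass through, other code units below 0x100 become `%XX`, and everything else becomes `%uXXXX`.

```php
<?php
// js_escape_sketch() is a hypothetical helper, not the function quoted above.
// Assumptions: $utf8 is valid UTF-8 and the iconv extension is loaded.
function js_escape_sketch(string $utf8): string
{
    // The /u flag makes PCRE split on UTF-8 characters rather than bytes.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('input is not valid UTF-8');
    }
    $out = '';
    foreach ($chars as $ch) {
        // Characters that JavaScript's escape() leaves untouched.
        if (preg_match('/^[A-Za-z0-9@*_+\-.\/]$/', $ch)) {
            $out .= $ch;
            continue;
        }
        // UTF-16BE gives one 16-bit unit for BMP characters and two
        // (a surrogate pair) for characters outside the BMP.
        $hex = strtoupper(bin2hex(iconv('UTF-8', 'UTF-16BE', $ch)));
        foreach (str_split($hex, 4) as $unit) {
            $out .= hexdec($unit) < 0x100
                ? '%' . substr($unit, 2)
                : '%u' . $unit;
        }
    }
    return $out;
}

echo js_escape_sketch("a b"), "\n";          // a%20b
echo js_escape_sketch("\xE4\xB8\xAD"), "\n"; // %u4E2D
```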
…FE.

Unicode character set
Purpose: unified encoding for 650 languages of the world, compatible with ISO-8859-1.
Size: the Unicode character set can be encoded in several ways, namely UTF-8, UTF-16 and UTF-32.

BIG5 character set
Purpose: unified encoding of Traditional Chinese characters.
Size: 2 bytes per character, covering 13,053 Chinese characters.
Range: high byte from A1 to F9; low byte from 40 to 7E and from A1 to FE.

GB18030 character set
Purpose: addresses the encoding of Chinese, Japanese…
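The BIG5 byte ranges quoted above translate directly into a range check. The sketch below is illustrative only: `is_big5_pair()` is a hypothetical helper, and it tests the ranges without checking whether a given code is actually assigned to a character.

```php
<?php
// is_big5_pair() is a hypothetical helper that tests one byte pair against
// the BIG5 ranges: high byte A1..F9, low byte 40..7E or A1..FE.
function is_big5_pair(int $high, int $low): bool
{
    $highOk = $high >= 0xA1 && $high <= 0xF9;
    $lowOk  = ($low >= 0x40 && $low <= 0x7E) || ($low >= 0xA1 && $low <= 0xFE);
    return $highOk && $lowOk;
}

var_dump(is_big5_pair(0xA4, 0x40)); // bool(true)
var_dump(is_big5_pair(0xA4, 0x80)); // bool(false): low byte falls in the gap
```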
UTF encoding: UTF-8 encodes the UCS in 8-bit units. The encoding from UCS-2 to UTF-8 is as follows:

UCS-2 code (hexadecimal)    UTF-8 byte stream (binary)
0000-007F                   0xxxxxxx
0080-07FF                   110xxxxx 10xxxxxx
0800-FFFF                   1110xxxx 10xxxxxx 10xxxxxx
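The mapping can be sketched with plain bit operations. In the UTF-8 definition, 16-bit values from 0800 upward take the three-byte form 1110xxxx 10xxxxxx 10xxxxxx, which the last branch below uses. `ucs2_to_utf8()` is a hypothetical helper name, and the sketch handles a single 16-bit code value without treating surrogates specially.

```php
<?php
// ucs2_to_utf8() is a hypothetical helper that converts one 16-bit code
// value (0x0000..0xFFFF) into its UTF-8 byte sequence.
function ucs2_to_utf8(int $code): string
{
    if ($code < 0 || $code > 0xFFFF) {
        throw new InvalidArgumentException('not a 16-bit code value');
    }
    if ($code <= 0x7F) {
        return chr($code);                      // 0xxxxxxx
    }
    if ($code <= 0x7FF) {
        return chr(0xC0 | ($code >> 6))         // 110xxxxx
             . chr(0x80 | ($code & 0x3F));      // 10xxxxxx
    }
    return chr(0xE0 | ($code >> 12))            // 1110xxxx
         . chr(0x80 | (($code >> 6) & 0x3F))    // 10xxxxxx
         . chr(0x80 | ($code & 0x3F));          // 10xxxxxx
}

echo bin2hex(ucs2_to_utf8(0x41)), "\n";   // 41
echo bin2hex(ucs2_to_utf8(0xE9)), "\n";   // c3a9
echo bin2hex(ucs2_to_utf8(0x4E2D)), "\n"; // e4b8ad
```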
…MBCS. Likewise, under the default locale of Simplified Chinese Windows, it refers to GBK.
1.3. Unicode
Later, people began to feel that having so many encodings made the world needlessly complicated, so they sat down together and came up with an approach: represent the characters of all languages in a single character set, which is Unicode.
…assigns code page numbers; GBK is code page 936, that is, CP936. You can therefore also use CP936 to refer to GBK.
MBCS (Multi-Byte Character Set) is a generic term for these encodings. Because they have so far all used at most two bytes per character, they are sometimes also called DBCS (Double-Byte Character Set). It is important to be clear that MBCS is not one particular encoding: on Windows, MBCS refers to different encodings depending on the configured region, and on Linux MBCS cannot be used as an encoding name. You can't see MBCS…