This article describes character sets in JavaScript in detail, along with the related concepts of character encoding and decoding.
1) Characters and bytes
A character is a general term for letters, digits, punctuation, and other symbols, including those that display as garbled text. A character is stored as 1 to N bytes; one byte is eight bits, and each bit is either 0 or 1.
2) Character set
A character set is a collection of characters, and different character sets contain different numbers of characters. Common character sets include ASCII, GB2312, and Unicode.
3) Character encoding
Character encoding converts a symbol into binary data that a computer can process; decoding converts that binary data back into a human-readable symbol.
Most character sets correspond to a single encoding (for example, the GBK character set uses the GBK encoding), but Unicode has several encodings, including UTF-8, UTF-16, UTF-32, and UTF-7.
Currently, the most commonly used encoding for web pages is UTF-8. UTF-8 encodes each character in one to four bytes and is a superset of ASCII, so existing ASCII text does not need to be converted.
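The one-to-four-byte range can be checked with a short sketch. It assumes an environment that provides the standard TextEncoder (modern browsers and Node.js); the helper name utf8ByteLengths is made up for this illustration.

```javascript
// TextEncoder always produces UTF-8 bytes.
const encoder = new TextEncoder();

// Hypothetical helper: UTF-8 byte count for each character of a string.
function utf8ByteLengths(text) {
  // Array.from iterates by code point, so characters outside the
  // Basic Multilingual Plane are not split into surrogate halves.
  return Array.from(text, (ch) => encoder.encode(ch).length);
}

// "A", e with acute accent, a Chinese character, and an emoji.
const lengths = utf8ByteLengths("A\u00e9\u4ee3\u{1F600}");
console.log(lengths); // [ 1, 2, 3, 4 ]

// ASCII text keeps the same byte values in UTF-8.
const asciiBytes = Array.from(encoder.encode("abc"));
console.log(asciiBytes); // [ 97, 98, 99 ]
```

The last two lines illustrate why ASCII text needs no conversion: each ASCII character is encoded as the single byte it already had.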
II. Number bases in the browser
1) Decimal and hexadecimal in HTML attributes
In HTML, a character can be written as a decimal reference such as "&#56;" (the character "8") or as a hexadecimal reference such as "&#x5a;" (the character "Z"). The x marks the hexadecimal form, in which the digits a ~ f represent the values 10 ~ 15.
2) Decimal and hexadecimal in CSS properties
CSS properties accept the same notation as HTML. In addition, hexadecimal can be written as a backslash escape in the form "\6c".
3) Encoding in JavaScript code
Strings written with octal or hexadecimal escapes can be executed directly through eval. Octal is written as "\56" and hexadecimal as "\x5c".
If the code contains Chinese characters that must be hexadecimal-encoded, the only option is the hexadecimal Unicode escape, in the format "\u4ee3\u7801".
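A minimal sketch of these escapes in string literals. The variable names are illustrative only. The legacy octal form ("\56") is a syntax error in strict mode and in modules, so the sketch computes the octal value with parseInt instead of writing the escape literally.

```javascript
// Hexadecimal escape: \x followed by exactly two hex digits.
const backslash = "\x5c"; // 0x5c is the backslash character

// Unicode escape: \u followed by exactly four hex digits.
const word = "\u4ee3\u7801"; // the two characters of the Chinese word for "code"

// Octal 56 is decimal 46, the "." character. Computed rather than
// written as "\56" because that legacy escape fails in strict mode.
const dot = String.fromCharCode(parseInt("56", 8));

// The escapes are resolved when the literal is parsed, so eval
// receives the plain source text "1+1" here and evaluates it.
const sum = eval("\x31\x2b\x31");

console.log(backslash, word, dot, sum);
```

Because the escape sequences are decoded before eval sees the string, escaped source text runs exactly like its unescaped form.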
The book "Web Front-end Hacker Technology Secrets" wraps two methods for encoding and decoding, which rely mainly on the following two calls.
Core code: "str.charCodeAt(index).toString(radix)" and "String.fromCharCode(parseInt(code, radix))"
The charCodeAt() method returns an integer between 0 and 65535 representing the UTF-16 code unit at the given index.
The static String.fromCharCode() method returns a string created from the specified sequence of UTF-16 code units.
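The two core calls can be combined into a round trip as follows. This is a sketch under stated assumptions, not the book's actual code: the function names encodeToCodes and decodeFromCodes and the default radix of 16 are choices made for this example.

```javascript
// Hypothetical helper: one code string per UTF-16 code unit.
function encodeToCodes(str, radix = 16) {
  const codes = [];
  for (let i = 0; i < str.length; i++) {
    // charCodeAt(i) returns the code unit at index i (0 to 65535).
    codes.push(str.charCodeAt(i).toString(radix));
  }
  return codes;
}

// Hypothetical helper: rebuild the string from the code strings.
function decodeFromCodes(codes, radix = 16) {
  return codes
    .map((code) => String.fromCharCode(parseInt(code, radix)))
    .join("");
}

const codes = encodeToCodes("\u4ee3\u7801");
console.log(codes); // [ '4ee3', '7801' ]

const roundTrip = decodeFromCodes(codes);
console.log(roundTrip === "\u4ee3\u7801"); // true
```

Because both helpers work on code units, a character outside the Basic Multilingual Plane yields two codes (its surrogate pair) and still decodes back to the original string.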
You can also use the online tool "MonyerJS" to encode and decode.
4) Automatic HTML decoding
For example, if "hello" is written in a web page as the hexadecimal references "&#x68;&#x65;&#x6c;&#x6c;&#x6f;", the browser automatically decodes them and displays "hello".
The well-known non-breaking space entity "&nbsp;" works the same way.
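The idea of numeric character references can be imitated in plain JavaScript. The sketch below is only an approximation for illustration, not how a browser's HTML parser is implemented: it handles decimal and hexadecimal numeric references and ignores named entities such as the non-breaking space. Both function names are hypothetical.

```javascript
// Hypothetical helper: write one character as a hexadecimal reference.
function toHexReference(ch) {
  return "&#x" + ch.codePointAt(0).toString(16) + ";";
}

// Hypothetical helper: decode decimal (&#56;) and hexadecimal (&#x5a;)
// numeric references; everything else is left unchanged.
function decodeNumericReferences(html) {
  return html.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === "x" || body[0] === "X";
    const codePoint = isHex ? parseInt(body.slice(1), 16) : parseInt(body, 10);
    // String.fromCodePoint throws above U+10FFFF, so keep such input as-is.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

const encoded = Array.from("hello", toHexReference).join("");
console.log(encoded); // &#x68;&#x65;&#x6c;&#x6c;&#x6f;

const decoded = decodeNumericReferences(encoded);
console.log(decoded); // hello
```

Mixed input works the same way: decodeNumericReferences("&#56;&#x5a;") gives "8Z".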
III. Browser encoding
JavaScript has three pairs of functions for encoding and decoding strings:
escape/unescape, encodeURI/decodeURI, and encodeURIComponent/decodeURIComponent.
The main difference between them is which characters they leave unencoded.
1) escape leaves 69 characters unencoded
* + - . / @ _ 0~9 a~z A~Z. When escape encodes Unicode values outside 0 ~ 255, it outputs the %u**** format.
2) encodeURI leaves 82 characters unencoded
! # $ & ' ( ) * + , - . / : ; = ? @ _ ~ 0~9 a~z A~Z
3) encodeURIComponent leaves 71 characters unencoded
! ' ( ) * - . _ ~ 0~9 a~z A~Z
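The three counts can be verified by running each function over the 128 ASCII characters. This is a small sketch; countUnencoded and the sample string are made up for the illustration. Note that escape is a legacy function kept for compatibility, so the sketch assumes an environment that still provides it (browsers and Node.js do).

```javascript
// Hypothetical helper: how many ASCII characters does encodeFn leave as-is?
function countUnencoded(encodeFn) {
  let count = 0;
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (encodeFn(ch) === ch) count++;
  }
  return count;
}

const counts = {
  escape: countUnencoded(escape),
  encodeURI: countUnencoded(encodeURI),
  encodeURIComponent: countUnencoded(encodeURIComponent),
};
console.log(counts); // { escape: 69, encodeURI: 82, encodeURIComponent: 71 }

// The same input through all three functions.
const sample = "a b?c=\u4ee3";
console.log(escape(sample));             // a%20b%3Fc%3D%u4EE3
console.log(encodeURI(sample));          // a%20b?c=%E4%BB%A3
console.log(encodeURIComponent(sample)); // a%20b%3Fc%3D%E4%BB%A3
```

The sample output shows the practical difference: encodeURI keeps URL delimiters such as "?" and "=", encodeURIComponent encodes them, and escape uses the %u form instead of UTF-8 bytes for the Chinese character.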