Fixing garbled text when a UTF-8 web page includes an external JS snippet

Source: Internet
Author: User

Some time ago, a site's news system had to be fed by a third party, which supplied a JS snippet for the purpose. Once the snippet was linked in, its output turned out to be garbled.

That is, a JS snippet had to be linked in to replace the previously written news page. Adding a charset attribute directly to the tag that references the JS is enough to convert the encoding. The specific steps are as follows.

The original JS reference was:

<script src="http://news.abvde.edu.cn/user_newscolm.php?j=1.23.456.789"></script>

It was amended to:

<script src="http://news.abvde.edu.cn/user_newscolm.php?j=1.23.456.789" charset="GBK"></script>

That is all. As for why GBK rather than GB2312: some characters exist in GBK but not in GB2312, so GBK is used directly for wider compatibility and fewer errors.
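The effect of the charset attribute can be illustrated outside the browser. The following is a minimal sketch, not the browser's actual decoding path: the sample string is a hypothetical stand-in for the third party's GBK-encoded output, and `decode_as` and `fits` are illustrative helper names.

```python
# Hypothetical sample text standing in for the third party's GBK-encoded output.
sample = "新闻列表"
gbk_bytes = sample.encode("gbk")


def decode_as(data: bytes, charset: str) -> str:
    """Decode bytes with the given charset, replacing undecodable bytes."""
    return data.decode(charset, errors="replace")


def fits(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Without a charset attribute the bytes are read as UTF-8, like the page itself.
garbled = decode_as(gbk_bytes, "utf-8")
# With charset="GBK" the same bytes decode to the intended text.
fixed = decode_as(gbk_bytes, "gbk")

print(garbled == sample, fixed == sample)
# The traditional character 國 is one example that GBK covers and GB2312 does not.
print(fits("國", "gb2312"), fits("國", "gbk"))
```

The second check is the reason for preferring GBK: declaring GB2312 would work for common simplified text but could fail on characters outside its 6,763-character repertoire.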

Note that after switching to that JS, you may also need to adjust some of the static page generation and the CSS styles.

The differences between the GB2312, GBK and GB18030 encodings are detailed below.

Chinese characters in Unicode, GB2312, GBK and GB18030

GB18030 has two versions: GB18030-2000 and GB18030-2005. GB18030-2000 is the replacement for GBK; its main feature is that it adds the characters of CJK Unified Ideographs Extension A on top of GBK. The main feature of GB18030-2005 is that it adds CJK Unified Ideographs Extension B on top of GB18030-2000. This article counts the Chinese characters in GB18030 and, along the way, looks at the Chinese characters in the other standards.

1 Chinese characters in Unicode

Of the 99,089 characters in Unicode 5.0, 71,226 are Chinese characters. They are distributed as follows:

Block name                                First code point  Last code point  Characters
CJK Unified Ideographs                    4E00              9FBB             20924
CJK Unified Ideographs Extension A        3400              4DB5             6582
CJK Unified Ideographs Extension B        20000             2A6D6            42711
CJK Compatibility Ideographs              F900              FA2D             302
CJK Compatibility Ideographs              FA30              FA6A             59
CJK Compatibility Ideographs              FA70              FAD9             106
CJK Compatibility Ideographs Supplement   2F800             2FA1D            542

Leaving aside the compatibility ideographs, the total number of Chinese characters currently supported by Unicode is 20924 + 6582 + 42711 = 70217.
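These counts can be re-derived from the code point ranges alone. The sketch below assumes that every code point inside each listed range is an assigned character, which is what the article's figures imply; the block labels and the `range_size` helper are illustrative.

```python
# (label, first code point, last code point), taken from the table above.
BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FBB),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DB5),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6D6),
    ("CJK Compatibility Ideographs (F900)", 0xF900, 0xFA2D),
    ("CJK Compatibility Ideographs (FA30)", 0xFA30, 0xFA6A),
    ("CJK Compatibility Ideographs (FA70)", 0xFA70, 0xFAD9),
    ("CJK Compatibility Ideographs Supplement", 0x2F800, 0x2FA1D),
]


def range_size(first: int, last: int) -> int:
    """Number of code points in an inclusive range."""
    return last - first + 1


sizes = {label: range_size(first, last) for label, first, last in BLOCKS}

# Total without the compatibility blocks, and total over all seven rows.
unified_total = sum(n for label, n in sizes.items() if "Compatibility" not in label)
grand_total = sum(sizes.values())

for label, n in sizes.items():
    print(f"{label}: {n}")
print(unified_total, grand_total)
```

The first total is the 70217 quoted above, and the second matches the 71,226 Chinese-related characters quoted for Unicode 5.0.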

A small detail: in early versions of Unicode, the CJK Unified Ideographs block was 0x4E00-0x9FA5, which is the 20,902 characters usually cited. The current version of Unicode adds 22 characters at code points 0x9FA6-0x9FBB.

Does GB18030 support these 22 characters? That is discussed later.

2 GB2312

GB2312, published in 1980, contains 7,445 characters in total: 6,763 Chinese characters and 682 other symbols. In the Chinese character area the high byte runs from B0 to F7 and the low byte from A1 to FE, which gives 72*94=6768 code positions. Five of these, D7FA-D7FE, are vacant.
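The code position count and the vacancies can be checked by walking the byte ranges. This is a sketch that relies on Python's built-in `gb2312` codec as a stand-in for the standard's code table; the helper names are illustrative.

```python
def hanzi_area_codes():
    """Yield every two-byte code in the GB2312 Chinese character area."""
    for high in range(0xB0, 0xF7 + 1):
        for low in range(0xA1, 0xFE + 1):
            yield bytes([high, low])


def is_assigned(code: bytes) -> bool:
    """Return True if the codec maps this code position to a character."""
    try:
        code.decode("gb2312")
        return True
    except UnicodeDecodeError:
        return False


positions = list(hanzi_area_codes())
vacant = [code.hex().upper() for code in positions if not is_assigned(code)]

print(len(positions))                # code positions in the area
print(len(positions) - len(vacant))  # positions that hold a character
print(vacant)                        # the unassigned positions
```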

These 6,763 characters are not contiguous in Unicode; they are scattered among the 20,902 Chinese characters of the CJK Unified Ideographs block (0x4E00-0x9FA5).

3 GBK

The 1995 Chinese character extension specification GBK 1.0 contains 21,886 symbols: 21,003 Chinese characters and 883 other symbols.

These 21,003 Chinese characters include the 20,902 characters of the CJK Unified Ideographs block. The remaining 101 comprise 21 characters selected from the CJK Compatibility Ideographs block and 80 added Chinese characters and radicals (28 radicals and 52 Chinese characters). The 80 added characters occupy GBK codes FE50-FE7E and FE80-FEA0. The figure in the original article marks each of them with its Unicode code point.

These 80 characters did not yet exist in Unicode when GBK was drawn up, so they were given code points in the Private Use Area (PUA), 0xE815-0xE864. Unicode later took the 52 Chinese characters into CJK Unified Ideographs Extension A, and 14 of the 28 radicals into the CJK Radicals Supplement block. In the figure, these characters therefore each have two Unicode code points.

The 8 radicals shown on a pale yellow background in the figure were taken into the newly added part of the CJK Unified Ideographs block, the 0x9FA6-0x9FBB range mentioned earlier. The 6 radicals on a pale gray background were taken into CJK Unified Ideographs Extension B (thanks to netizen slt for the correction).
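The bookkeeping for the 80 added characters can be written out as a quick consistency check. This is only arithmetic on the numbers given in the text; the variable names are illustrative.

```python
# GBK code positions FE50-FE7E and FE80-FEA0 (0x7F is never a second byte).
gbk_positions = (0x7E - 0x50 + 1) + (0xA0 - 0x80 + 1)

# PUA code points 0xE815-0xE864 originally assigned to them.
pua_positions = 0xE864 - 0xE815 + 1

# Where the 80 characters later ended up in Unicode, per the text.
hanzi_in_extension_a = 52
radicals = {
    "CJK Radicals Supplement": 14,
    "new part of CJK Unified Ideographs": 8,
    "CJK Unified Ideographs Extension B": 6,
}

total_radicals = sum(radicals.values())
total_added = hanzi_in_extension_a + total_radicals

print(gbk_positions, pua_positions, total_radicals, total_added)
```

All three routes arrive at 80, and the radicals sum to the 28 stated above.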

Note that under GB18030 the 14 characters on pale yellow and pale gray backgrounds should still be mapped to PUA code points. The mapping between these 14 characters and non-PUA code points is only something netizens worked out, not a rule of the standard. Under GBK, all 80 characters map to PUA code points; GB18030 maps the other 66 to non-PUA code points. On Windows, however, the default code page for Simplified Chinese is GBK, not GB18030.

The 21 Chinese characters selected from the CJK Compatibility Ideographs block are listed in the table below:

GBK code  Unicode code point
FD9C      F92C
FD9D      F979
FD9E      F995
FD9F      F9E7
FDA0      F9F1
FE40      FA0C
FE41      FA0D
FE42      FA0E
FE43      FA0F
FE44      FA11
FE45      FA13
FE46      FA14
FE47      FA18
FE48      FA1F
FE49      FA20
FE4A      FA21
FE4B      FA23
FE4C      FA24
FE4D      FA27
FE4E      FA28
FE4F      FA29
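The table above can be compared against an existing GBK implementation. The sketch below uses Python's built-in `gbk` codec, on the assumption that it follows the same mapping for these 21 code positions; `gbk_to_code_point` is an illustrative helper.

```python
# GBK code -> Unicode code point, copied from the table above.
COMPAT = {
    0xFD9C: 0xF92C, 0xFD9D: 0xF979, 0xFD9E: 0xF995, 0xFD9F: 0xF9E7,
    0xFDA0: 0xF9F1, 0xFE40: 0xFA0C, 0xFE41: 0xFA0D, 0xFE42: 0xFA0E,
    0xFE43: 0xFA0F, 0xFE44: 0xFA11, 0xFE45: 0xFA13, 0xFE46: 0xFA14,
    0xFE47: 0xFA18, 0xFE48: 0xFA1F, 0xFE49: 0xFA20, 0xFE4A: 0xFA21,
    0xFE4B: 0xFA23, 0xFE4C: 0xFA24, 0xFE4D: 0xFA27, 0xFE4E: 0xFA28,
    0xFE4F: 0xFA29,
}


def gbk_to_code_point(gbk_code: int) -> int:
    """Decode one two-byte GBK code and return its Unicode code point."""
    return ord(gbk_code.to_bytes(2, "big").decode("gbk"))


# Collect any rows where the codec disagrees with the table.
mismatches = {
    f"{gbk:04X}": f"{gbk_to_code_point(gbk):04X}"
    for gbk, expected in COMPAT.items()
    if gbk_to_code_point(gbk) != expected
}

print(len(COMPAT), mismatches)
```

An empty `mismatches` dictionary means the codec and the table agree on every row.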
4 GB18030-2000

4.1 Character repertoire

The GB18030-2000 standard describes its character repertoire as follows: the characters in this standard are encoded in one, two, or four bytes.
5.1 Single-byte part
The single-byte part of this standard contains all 128 characters 0x00 to 0x7F of GB 11383, plus the single-byte encoded euro sign.
5.2 Double-byte part
The double-byte part of this standard contains the following:
All CJK Unified Ideographs in GB 13000.1.
The 21 Chinese characters selected from the CJK Compatibility area of GB 13000.1.
The 139 graphic characters used in China's Taiwan region that are included in GB 13000.1 but not in GB 2312.
The 31 other characters included in GB 13000.1.
The non-Chinese-character symbols of GB 2312.
The 19 vertical punctuation marks of GB 12345.
The 10 lowercase Roman numerals not included in GB 2312.
The 5 toned Hanyu Pinyin letters not included in GB 2312, as well as ɑ and ɡ.
The Chinese numeral "〇".
The 13 ideographic description characters.
The 80 added Chinese characters and radicals/components.
The double-byte encoded euro sign.
5.3 Four-byte part
The four-byte part of this standard contains all characters of GB 13000.1, including CJK Unified Ideographs Extension A, other than the double-byte characters listed above.
4.2 Chinese characters

As the following table shows, GB18030-2000 contains 27,533 Chinese characters:

Category     Byte ranges (first / second / third / fourth)   Code positions  Characters  Character type
Double-byte  0xB0-0xF7 / 0xA1-0xFE                           6768            6763        Chinese characters
Double-byte  0x81-0xA0 / 0x40-0xFE                           6080            6080        Chinese characters
Double-byte  0xAA-0xFE / 0x40-0xA0                           8160            8160        Chinese characters
Four-byte    0x81-0x82 / 0x30-0x39 / 0x81-0xFE / 0x30-0x39   6530            6530        CJK Unified Ideographs Extension A

27533 = 6763 + 6080 + 8160 + 6530. The 6763 + 6080 + 8160 = 21003 characters of the double-byte part are the 21,003 Chinese characters of GBK.
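The code position counts of the three double-byte rows follow from their byte ranges, provided one remembers that 0x7F is never used as a second byte in GBK. The following sketch recomputes them; `count_two_byte` is an illustrative helper, and the 6,763 figure for the first row is taken from the GB2312 section rather than computed.

```python
def count_two_byte(first_lo: int, first_hi: int, second_lo: int, second_hi: int) -> int:
    """Count two-byte codes in a rectangle of byte ranges, skipping 0x7F as second byte."""
    firsts = first_hi - first_lo + 1
    seconds = sum(1 for b in range(second_lo, second_hi + 1) if b != 0x7F)
    return firsts * seconds


gb2312_area = count_two_byte(0xB0, 0xF7, 0xA1, 0xFE)  # GB2312 Chinese character area
gbk_area_3 = count_two_byte(0x81, 0xA0, 0x40, 0xFE)
gbk_area_4 = count_two_byte(0xAA, 0xFE, 0x40, 0xA0)

# The GB2312 area has 5 vacant positions, so it holds 6763 characters.
gb2312_chars = gb2312_area - 5

gbk_hanzi = gb2312_chars + gbk_area_3 + gbk_area_4
gb18030_2000_hanzi = gbk_hanzi + 6530  # plus the four-byte Extension A characters

print(gb2312_area, gbk_area_3, gbk_area_4)
print(gbk_hanzi, gb18030_2000_hanzi)
```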

In Unicode, CJK Unified Ideographs Extension A has 6,582 characters, so why does the four-byte part list only 6,530?

This is because, back in the GBK era, the double-byte part already included 52 characters of CJK Unified Ideographs Extension A, which leaves 6582 - 52 = 6530.

5 GB18030-2005

5.1 Character repertoire

The GB18030-2005 standard describes its character repertoire as follows: the characters included in this standard are encoded in one, two, or four bytes.
5.1 Single-byte part
The single-byte part of this standard contains all 128 characters 0x00 to 0x7F of GB/T 11383-1989.
5.2 Double-byte part
The double-byte part of this standard contains the following:
All CJK Unified Ideographs in GB 13000.1-1993. See Appendix A.
The 21 Chinese characters selected from the CJK Compatibility area of GB 13000.1-1993. See Appendix A.
The 139 graphic characters used in China's Taiwan region that are included in GB 13000.1-1993 but not in GB 2312. See Appendix A.
The 31 other characters included in GB 13000.1-1993. See Appendix A.
The non-Chinese-character symbols of GB 2312. See Appendix A.
The 19 vertical punctuation marks of GB 12345. See Appendix A.
The 10 lowercase Roman numerals not included in GB 2312. See Appendix A.
The 5 toned Hanyu Pinyin letters not included in GB 2312, as well as ɑ and ɡ. See Appendix A.
The Chinese numeral "〇". See Appendix A.
The 13 ideographic description characters. See Appendix A and Appendix B.
The 80 Chinese characters and radicals/components supplementing GB 13000.1-1993. See Appendix A and Appendix C.
The double-byte encoded euro sign. See Appendix A.
5.3 Four-byte part
The four-byte part of this standard contains, other than the double-byte characters above, CJK Unified Ideographs Extension A and Extension B of GB 13000, and the Chinese minority-script characters already encoded in GB 13000. See Appendix D.

The main change in GB18030-2005 is the addition of CJK Unified Ideographs Extension B. It also removes the single-byte encoded euro sign (0x80).

5.2 Chinese characters

As the following table shows, GB18030-2005 contains 70,244 Chinese characters:

Category     Byte ranges (first / second / third / fourth)   Code positions  Characters  Character type
Double-byte  0xB0-0xF7 / 0xA1-0xFE                           6768            6763        Chinese characters
Double-byte  0x81-0xA0 / 0x40-0xFE                           6080            6080        Chinese characters
Double-byte  0xAA-0xFE / 0x40-0xA0                           8160            8160        Chinese characters
Four-byte    0x81-0x82 / 0x30-0x39 / 0x81-0xFE / 0x30-0x39   6530            6530        CJK Unified Ideographs Extension A
Four-byte    0x95-0x98 / 0x30-0x39 / 0x81-0xFE / 0x30-0x39   42711           42711       CJK Unified Ideographs Extension B
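The first-byte range 0x95-0x98 for Extension B can be cross-checked with the linear four-byte scheme that GB18030 uses for code points from U+10000 upward, where U+10000 corresponds to 0x90308130. The sketch below implements that arithmetic in an illustrative function, `gb18030_supplementary`, and compares it with Python's built-in `gb18030` codec.

```python
def gb18030_supplementary(code_point: int) -> bytes:
    """Four-byte GB18030 code for a code point in U+10000..U+10FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points are handled")
    offset = code_point - 0x10000
    offset, b4 = divmod(offset, 10)   # fourth byte: 0x30-0x39
    offset, b3 = divmod(offset, 126)  # third byte: 0x81-0xFE
    b1, b2 = divmod(offset, 10)       # second byte: 0x30-0x39
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


# First and last code points of CJK Unified Ideographs Extension B.
first = gb18030_supplementary(0x20000)
last = gb18030_supplementary(0x2A6D6)

print(first.hex().upper(), last.hex().upper())
print(first == chr(0x20000).encode("gb18030"))
print(last == chr(0x2A6D6).encode("gb18030"))
```

Both ends of the block land on first bytes 0x95 and 0x98 respectively, in line with the table row.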

70244 = 6763 + 6080 + 8160 + 6530 + 42711.

6 Concluding remarks

GB2312 has 6,763 Chinese characters, GBK has 21,003, GB18030-2000 has 27,533, and GB18030-2005 has 70,244.

Unicode 5.0, leaving aside the compatibility area, currently has 70,217 Chinese characters. Let us compare these 70,217 Unicode characters with the 70,244 characters of GB18030-2005:

GB18030-2005                               Unicode 5.0                                Unicode code points
20902 CJK Unified Ideographs               20902 CJK Unified Ideographs               0x4E00-0x9FA5
6582 CJK Unified Ideographs Extension A    6582 CJK Unified Ideographs Extension A    0x3400-0x4DB5
42711 CJK Unified Ideographs Extension B   42711 CJK Unified Ideographs Extension B   0x20000-0x2A6D6
14 radicals of CJK Radicals Supplement     not counted                                2E81, 2E84, 2E88, 2E8B, 2E8C, 2E97, 2EA7, 2EAA, 2EAE, 2EB3, 2EB6, 2EB7, 2EBB, 2ECA
21 CJK Compatibility Ideographs            not counted                                F92C, F979, F995, F9E7, F9F1, FA0C, FA0D, FA0E, FA0F, FA11, FA13, FA14, FA18, FA1F, FA20, FA21, FA23, FA24, FA27, FA28, FA29
8 radicals                                 8 new CJK Unified Ideographs               0x9FB4-0x9FBB
not counted                                14 new CJK Unified Ideographs              0x9FA6-0x9FB3
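The two columns can be summed to see how they relate to the totals quoted above. This is arithmetic on the article's own numbers. The final step is an assumption, not something the table states: it reads the remaining difference as the 6 radicals that Unicode placed in Extension B, which would then be counted once in the double-byte part and once among the 42,711 four-byte characters.

```python
# Rows of the comparison table that count toward each side.
gb18030_rows = {
    "CJK Unified Ideographs": 20902,
    "Extension A": 6582,
    "Extension B": 42711,
    "radicals in CJK Radicals Supplement": 14,
    "CJK Compatibility Ideographs": 21,
    "radicals at 0x9FB4-0x9FBB": 8,
}
unicode_rows = {
    "CJK Unified Ideographs": 20902,
    "Extension A": 6582,
    "Extension B": 42711,
    "new characters at 0x9FB4-0x9FBB": 8,
    "new characters at 0x9FA6-0x9FB3": 14,
}

gb18030_sum = sum(gb18030_rows.values())
unicode_sum = sum(unicode_rows.values())

# Assumption: the 6 radicals taken into Extension B are counted twice in 70244.
EXTENSION_B_RADICALS = 6
gb18030_total = gb18030_sum + EXTENSION_B_RADICALS

print(unicode_sum, gb18030_sum, gb18030_total)
```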

The 14 new characters of the CJK Unified Ideographs block, 0x9FA6-0x9FB3, have corresponding code positions in GB18030 (0x82358F33-0x82358F39, 0x82359030-0x82359039, 0x82359130-0x82359134), but GB18030 does not list these characters.

Appendix 1 The 80 Chinese characters and radicals added in GBK

The 80 characters added in GBK were originally placed in the PUA and were later included in Unicode. Each can therefore be represented either by a PUA code point or by a non-PUA code point. Because the figure in the body text may not copy well, a table is added here:

GBK code  PUA code point  Non-PUA code point
FE50      E815            2E81
FE51      E816            20087
FE52      E817            20089
FE53      E818            200CC
FE54      E819            2E84
FE55      E81A            3473
FE56      E81B            3447
FE57      E81C            2E88
FE58      E81D            2E8B
FE59      E81E            9FB4
FE5A      E81F            359E
FE5B      E820            361A
FE5C      E821            360E
FE5D      E822            2E8C
