Some time ago, one of our sites needed to show news pushed by a third-party news system. After we obtained the JS code and linked it in, the text came out garbled.
In other words, a snippet of JS had to be linked in to replace a previously written news page. Simply adding a charset attribute to the script tag that references the JS converts the encoding. The specific steps are as follows:
The original script reference:
<script src="http://news.abvde.edu.cn/user_newscolm.php?j=1.23.456.789"></script>
Change it to:
<script src="http://news.abvde.edu.cn/user_newscolm.php?j=1.23.456.789" charset="GBK"></script>
That is all it takes. As for why not GB2312: some characters are missing from GB2312 but present in GBK, so for wider compatibility we use GBK directly to reduce errors.
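Both points can be illustrated with a small Python sketch. It uses only Python's built-in `gbk`, `gb2312` and `utf-8` codecs; the sample headline, the function name `charset_demo`, and the assumption that the browser falls back to UTF-8 when no charset is given are illustrative choices, not details of the actual feed. GBK bytes decoded with the wrong charset turn into replacement characters, decoding them as GBK recovers the text, and a character such as 镕 can be encoded in GBK but not in GB2312.

```python
# Sketch: why a missing charset garbles a GBK feed, and why GBK rather than GB2312.
# Assumption (illustrative): the feed is served as GBK bytes, but without a
# charset attribute the browser decodes the script as UTF-8.

def charset_demo(headline: str = "新闻", rare: str = "镕") -> dict:
    feed_bytes = headline.encode("gbk")          # what the third-party server sends

    # Wrong charset: invalid UTF-8 sequences become U+FFFD replacement characters.
    garbled = feed_bytes.decode("utf-8", errors="replace")
    # charset="GBK" on the <script> tag: the same bytes decode correctly.
    correct = feed_bytes.decode("gbk")

    # GBK is a superset of GB2312, so it can encode characters GB2312 lacks.
    try:
        rare.encode("gb2312")
        in_gb2312 = True
    except UnicodeEncodeError:
        in_gb2312 = False
    in_gbk = rare.encode("gbk").decode("gbk") == rare

    return {"garbled": garbled, "correct": correct,
            "in_gb2312": in_gb2312, "in_gbk": in_gbk}


if __name__ == "__main__":
    print(charset_demo())
```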
Note that after switching to that JS, you may also need to adjust some statically generated pages and CSS styles.
The differences between the GB2312, GBK and GB18030 encodings are detailed below.
Chinese characters in Unicode, GB2312, GBK and GB18030
GB18030 has two versions: GB18030-2000 and GB18030-2005. GB18030-2000 is the successor to GBK; its main feature is the addition of the CJK Unified Ideographs Extension A characters on top of GBK. The main feature of GB18030-2005 is the addition of CJK Unified Ideographs Extension B on top of GB18030-2000. In this article, we count the Chinese characters in GB18030 and compare them with those in the other standards.
1 Chinese characters in Unicode
Of the 99,089 characters in Unicode 5.0, 71,226 characters are associated with Chinese characters. They are distributed as follows:
Block name | Start code point | End code point | Number of characters
CJK Unified Ideographs | 4E00 | 9FBB | 20924
CJK Unified Ideographs Extension A | 3400 | 4DB5 | 6582
CJK Unified Ideographs Extension B | 20000 | 2A6D6 | 42711
CJK Compatibility Ideographs | F900 | FA2D | 302
CJK Compatibility Ideographs | FA30 | FA6A | 59
CJK Compatibility Ideographs | FA70 | FAD9 | 106
CJK Compatibility Ideographs Supplement | 2F800 | 2FA1D | 542
Excluding the compatibility ideographs, the total number of Chinese characters currently supported by Unicode is 20924 + 6582 + 42711 = 70217.
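These totals can be recomputed directly from the start and end code points in the table. A minimal Python sketch, assuming (as the table's counts indicate) that every code point in each range is assigned; `BLOCKS` and `block_size` are illustrative names:

```python
# Unicode 5.0 blocks containing Chinese characters: (name, start, end), inclusive,
# copied from the table above.
BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FBB),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DB5),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6D6),
    ("CJK Compatibility Ideographs", 0xF900, 0xFA2D),
    ("CJK Compatibility Ideographs", 0xFA30, 0xFA6A),
    ("CJK Compatibility Ideographs", 0xFA70, 0xFAD9),
    ("CJK Compatibility Ideographs Supplement", 0x2F800, 0x2FA1D),
]


def block_size(start: int, end: int) -> int:
    """Number of code points in an inclusive range (assumed fully assigned)."""
    return end - start + 1


sizes = [block_size(start, end) for _, start, end in BLOCKS]
unified = sum(block_size(s, e) for name, s, e in BLOCKS if "Unified" in name)
total = sum(sizes)

print(sizes)    # [20924, 6582, 42711, 302, 59, 106, 542]
print(unified)  # 70217: Chinese characters outside the compatibility areas
print(total)    # 71226: all Chinese-character code points in the table
```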
A small detail: in early versions of Unicode, the CJK Unified Ideographs block was 0x4E00-0x9FA5, the 20,902 characters often referred to. The current version of Unicode adds 22 characters at code points 0x9FA6-0x9FBB.
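Both ranges can be checked with Python's standard `unicodedata` module, assuming its bundled Unicode database is version 5.0 or later (true of any current Python 3). The variable names are illustrative:

```python
import unicodedata

early = range(0x4E00, 0x9FA5 + 1)   # the classic 20,902-character block
added = range(0x9FA6, 0x9FBB + 1)   # the 22 later additions

# Every added code point should be an assigned ideograph (general category "Lo").
assigned = [cp for cp in added if unicodedata.category(chr(cp)) == "Lo"]

print(len(early), len(added), len(assigned))   # 20902 22 22
print(unicodedata.name(chr(0x9FA6)))           # CJK UNIFIED IDEOGRAPH-9FA6
```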
Does GB18030 support these 22 characters? That will be discussed later.
2 GB2312
GB2312 (1980) contains 7,445 characters in total: 6,763 Chinese characters and 682 other symbols. In the Chinese character area, the high byte ranges from 0xB0 to 0xF7 and the low byte from 0xA1 to 0xFE, giving 72 × 94 = 6768 code positions. Five of them, 0xD7FA-0xD7FE, are empty.
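The layout of the Chinese character area can be probed with Python's built-in `gb2312` codec, assuming it implements the standard GB2312 mapping. This sketch (the function name is illustrative) tries to decode every position in the 0xB0-0xF7 × 0xA1-0xFE grid, records which positions are empty, and also checks that every decoded character lies in 0x4E00-0x9FA5, the block of 20,902 characters mentioned earlier.

```python
def scan_gb2312_hanzi_area():
    """Try to decode every two-byte position in the GB2312 Chinese character area."""
    decoded, unassigned = {}, []
    for hi in range(0xB0, 0xF7 + 1):          # 72 high bytes
        for lo in range(0xA1, 0xFE + 1):      # 94 low bytes
            code = bytes([hi, lo])
            try:
                decoded[code] = code.decode("gb2312")
            except UnicodeDecodeError:
                unassigned.append(code.hex().upper())
    return decoded, unassigned


decoded, unassigned = scan_gb2312_hanzi_area()
positions = len(decoded) + len(unassigned)
in_cjk_block = all(0x4E00 <= ord(ch) <= 0x9FA5 for ch in decoded.values())

print(positions, len(decoded))   # code positions vs. assigned characters
print(unassigned)                # the empty positions
print(in_cjk_block)
```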
These 6,763 characters are not contiguous in Unicode; they are scattered among the 20,902 characters of the CJK Unified Ideographs block (0x4E00-0x9FA5).
3 GBK
The 1995 Chinese character extension specification GBK 1.0 contains 21,886 symbols: 21,003 Chinese characters and 883 other symbols.
These 21,003 Chinese characters include all 20,902 characters of the CJK Unified Ideographs block. The remaining 101 consist of 80 added Chinese characters and radicals (28 radicals and 52 Chinese characters) plus 21 characters taken from the CJK Compatibility Ideographs area. The 80 additions are encoded at GBK 0xFE50-0xFE7E and 0xFE80-0xFEA0. The figure below labels them with their Unicode code points.
When GBK was drawn up, these characters were not yet in Unicode, so they were given code points in the Private Use Area (PUA): 0xE815-0xE864 for the 80 characters. Unicode later included the 52 Chinese characters in CJK Unified Ideographs Extension A, and 14 of the 28 radicals in the CJK Radicals Supplement block. That is why these characters each have two Unicode code points in the figure above.
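The code arithmetic here is easy to check. In this Python sketch, `gbk_fe_codes` and `pua_for` are hypothetical helpers, and the in-order assignment of the 80 GBK codes to U+E815-U+E864 is an assumption based on the first rows of the appendix table (FE50 → E815, ..., FE5D → E822). It also confirms that 20,902 + 80 + 21 gives GBK's 21,003 Chinese characters.

```python
def gbk_fe_codes() -> list:
    """Hypothetical helper: the 80 GBK codes 0xFE50-0xFE7E and 0xFE80-0xFEA0."""
    # 0x7F is never used as a trail byte, which splits the run in two.
    return [0xFE00 | lo for lo in range(0x50, 0xA0 + 1) if lo != 0x7F]


def pua_for(gbk_code: int) -> int:
    """Hypothetical helper: PUA code point, ASSUMING in-order assignment from U+E815."""
    return 0xE815 + gbk_fe_codes().index(gbk_code)


codes = gbk_fe_codes()
print(len(codes))                                   # 80
print(hex(pua_for(0xFE50)), hex(pua_for(0xFEA0)))   # 0xe815 0xe864
# GBK's Chinese characters: 20,902 unified + 80 additions + 21 compatibility.
print(20902 + len(codes) + 21)                      # 21003
```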
The 8 radicals on a pale yellow background above were added to the new part of the CJK Unified Ideographs block, the 0x9FA6-0x9FBB range mentioned earlier. The 6 radicals on a pale gray background were added to CJK Unified Ideographs Extension B (corrected by netizen slt).
Note that under GB18030, these 14 pale yellow and pale gray characters are still mapped to PUA code points; the mapping between these 14 characters and the PUA codes comes from netizens' findings, not from the standard. Under GBK, all 80 characters map to PUA code points, while GB18030 maps 66 of them to PUA code points. However, on Windows the default code page for Simplified Chinese is GBK, not GB18030.
GBK also includes 21 Chinese characters selected from the CJK Compatibility Ideographs area, listed in the table below:
Chinese character | GBK code | Unicode code point
郎 | FD9C | F92C
凉 | FD9D | F979
秊 | FD9E | F995
裏 | FD9F | F9E7
隣 | FDA0 | F9F1
兀 | FE40 | FA0C
嗀 | FE41 | FA0D
﨎 | FE42 | FA0E
﨏 | FE43 | FA0F
﨑 | FE44 | FA11
﨓 | FE45 | FA13
﨔 | FE46 | FA14
礼 | FE47 | FA18
﨟 | FE48 | FA1F
蘒 | FE49 | FA20
﨡 | FE4A | FA21
﨣 | FE4B | FA23
﨤 | FE4C | FA24
﨧 | FE4D | FA27
﨨 | FE4E | FA28
﨩 | FE4F | FA29
4 GB18030-2000
4.1 Coverage
The standard describes the coverage of GB18030-2000 as follows: the characters are encoded with one, two, or four bytes.
5.1 Single-byte part
In this standard, the single-byte part contains all 128 characters of GB 11383 (0x00 to 0x7F) and a single-byte-encoded euro sign.
5.2 Double-byte part
In this standard, the double-byte part contains the following:
All CJK Unified Ideographs in GB 13000.1.
21 Chinese characters selected from the CJK Compatibility Ideographs area of GB 13000.1.
139 graphic characters used in the Taiwan region of China that are included in GB 13000.1 but not in GB 2312.
31 other characters included in GB 13000.1.
The non-Chinese-character symbols of GB 2312.
19 vertical-form punctuation marks from GB 12345.
10 lowercase Roman numerals not included in GB 2312.
5 toned Hanyu Pinyin letters not included in GB 2312, plus ɑ and ɡ.
The Chinese numeral zero "〇".
13 ideographic description characters.
80 added Chinese characters and radicals/components.
A double-byte-encoded euro sign.
5.3 Four-byte part
The four-byte part of this standard contains all characters of GB 13000.1 other than the double-byte characters above, including CJK Unified Ideographs Extension A.
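The split into one-, two- and four-byte sequences is structural: bytes 0x00-0x7F stand alone, and in a multi-byte sequence a second byte of 0x30-0x39 marks four bytes, since that range never occurs as the second byte of a two-byte code. A minimal Python sketch of a splitter based on just that rule follows; `char_length` and `split_gb18030` are hypothetical helpers, bytes 3 and 4 are not validated, and 0x80 is treated as invalid, following the 2005 rule rather than the 2000 single-byte euro sign.

```python
def char_length(data: bytes, i: int) -> int:
    """Length (1, 2 or 4) of the GB18030 character starting at data[i].

    Structural rule only: 0x00-0x7F is a single byte; otherwise the lead byte
    must be 0x81-0xFE, and a second byte of 0x30-0x39 marks a four-byte
    sequence. Bytes 3 and 4 are not validated here.
    """
    b0 = data[i]
    if b0 <= 0x7F:
        return 1
    if not (0x81 <= b0 <= 0xFE) or i + 1 >= len(data):
        raise ValueError(f"invalid or truncated sequence at offset {i}")
    return 4 if 0x30 <= data[i + 1] <= 0x39 else 2


def split_gb18030(data: bytes) -> list:
    """Hypothetical helper: split GB18030 bytes into per-character chunks."""
    chunks, i = [], 0
    while i < len(data):
        n = char_length(data, i)
        chunks.append(data[i:i + n])
        i += n
    return chunks


sample = "A\u4e2d\u3400"   # ASCII letter, a GB2312 hanzi, an Extension A hanzi
chunks = split_gb18030(sample.encode("gb18030"))
print([len(c) for c in chunks])                                # [1, 2, 4]
print([c.decode("gb18030") for c in chunks] == list(sample))   # True
```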
4.2 Chinese characters
As shown in the following table, GB18030-2000 contains 27,533 Chinese characters:
Category | Code position range | Number of code positions | Number of characters | Character type
Two-byte part | First byte 0xB0-0xF7, second byte 0xA1-0xFE | 6768 | 6763 | Chinese characters
Two-byte part | First byte 0x81-0xA0, second byte 0x40-0xFE (except 0x7F) | 6080 | 6080 | Chinese characters
Two-byte part | First byte 0xAA-0xFE, second byte 0x40-0xA0 (except 0x7F) | 8160 | 8160 | Chinese characters
Four-byte part | First byte 0x81-0x82, second byte 0x30-0x39, third byte 0x81-0xFE, fourth byte 0x30-0x39 | 6530 | 6530 | CJK Unified Ideographs Extension A
27,533 = 6763 + 6080 + 8160 + 6530. The 6763 + 6080 + 8160 = 21,003 characters of the two-byte part are exactly GBK's 21,003 Chinese characters.
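The code-position counts in the table follow from the byte ranges once you account for 0x7F never being used as a second byte. A short Python check, where `positions` is a hypothetical helper and the two GBK extension areas are assumed fully occupied, as the table states:

```python
def positions(first: tuple, second: tuple) -> int:
    """Two-byte code positions for inclusive byte ranges; 0x7F is never a second byte."""
    seconds = [b for b in range(second[0], second[1] + 1) if b != 0x7F]
    return (first[1] - first[0] + 1) * len(seconds)


gb2312_area = positions((0xB0, 0xF7), (0xA1, 0xFE))   # 72 * 94
area_81_a0 = positions((0x81, 0xA0), (0x40, 0xFE))    # 32 * 190
area_aa_fe = positions((0xAA, 0xFE), (0x40, 0xA0))    # 85 * 96

print(gb2312_area, area_81_a0, area_aa_fe)        # 6768 6080 8160
print(6763 + area_81_a0 + area_aa_fe)             # two-byte Chinese characters: 21003
print(6763 + area_81_a0 + area_aa_fe + 6530)      # GB18030-2000 total: 27533
```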
In Unicode, CJK Unified Ideographs Extension A has 6,582 characters, so why does the table list only 6,530?
This is because, already in the GBK era, the two-byte part included 52 Extension A characters, leaving 6582 - 52 = 6,530 for the four-byte part.
5 GB18030-2005
5.1 Coverage
The standard describes the coverage of GB18030-2005 as follows: the characters included in this standard are encoded with one, two, or four bytes.
5.1 Single-byte part
In this standard, the single-byte part contains all 128 characters of GB/T 11383-1989 (0x00 to 0x7F).
5.2 Double-byte part
In this standard, the double-byte part contains the following:
All CJK Unified Ideographs of GB 13000.1-1993. See Appendix A.
21 Chinese characters selected from the CJK Compatibility Ideographs area of GB 13000.1-1993. See Appendix A.
139 graphic characters used in the Taiwan region of China that are included in GB 13000.1-1993 but not in GB 2312. See Appendix A.
31 other characters included in GB 13000.1-1993. See Appendix A.
The non-Chinese-character symbols of GB 2312. See Appendix A.
19 vertical-form punctuation marks from GB 12345. See Appendix A.
10 lowercase Roman numerals not included in GB 2312. See Appendix A.
5 toned Hanyu Pinyin letters not included in GB 2312, plus ɑ and ɡ. See Appendix A.
The Chinese numeral zero "〇". See Appendix A.
13 ideographic description characters. See Appendix A and Appendix B.
80 Chinese characters and radicals/components supplementing GB 13000.1-1993. See Appendix A and Appendix C.
A double-byte-encoded euro sign. See Appendix A.
5.3 Four-byte part
The four-byte part of this standard contains the characters of GB 13000 other than the double-byte characters above: CJK Unified Ideographs Extension A, CJK Unified Ideographs Extension B, and the scripts of China's ethnic minorities already encoded in GB 13000. See Appendix D.
The main changes in GB18030-2005 are the addition of CJK Unified Ideographs Extension B and the removal of the single-byte euro sign (0x80).
5.2 Chinese characters
As shown in the following table, GB18030-2005 contains 70,244 Chinese characters:
Category | Code position range | Number of code positions | Number of characters | Character type
Two-byte part | First byte 0xB0-0xF7, second byte 0xA1-0xFE | 6768 | 6763 | Chinese characters
Two-byte part | First byte 0x81-0xA0, second byte 0x40-0xFE (except 0x7F) | 6080 | 6080 | Chinese characters
Two-byte part | First byte 0xAA-0xFE, second byte 0x40-0xA0 (except 0x7F) | 8160 | 8160 | Chinese characters
Four-byte part | First byte 0x81-0x82, second byte 0x30-0x39, third byte 0x81-0xFE, fourth byte 0x30-0x39 | 6530 | 6530 | CJK Unified Ideographs Extension A
Four-byte part | First byte 0x95-0x98, second byte 0x30-0x39, third byte 0x81-0xFE, fourth byte 0x30-0x39 | 42711 | 42711 | CJK Unified Ideographs Extension B
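Four-byte codes for supplementary-plane characters are assigned linearly starting from 0x90308130 for U+10000, with the second and fourth bytes running 0x30-0x39 and the third byte 0x81-0xFE. A Python sketch of that arithmetic (the function name is illustrative), cross-checked against Python's built-in `gb18030` codec, shows why Extension B (U+20000-U+2A6D6) lands on first bytes 0x95-0x98:

```python
def supplementary_to_gb18030(cp: int) -> bytes:
    """Four-byte GB18030 code for a code point in U+10000..U+10FFFF.

    U+10000 is 0x90308130; the bytes count in a mixed radix
    (bytes 2 and 4: 10 values 0x30-0x39, byte 3: 126 values 0x81-0xFE).
    """
    if not (0x10000 <= cp <= 0x10FFFF):
        raise ValueError("not a supplementary-plane code point")
    n = cp - 0x10000
    n, b4 = divmod(n, 10)
    n, b3 = divmod(n, 126)
    n, b2 = divmod(n, 10)
    return bytes([0x90 + n, 0x30 + b2, 0x81 + b3, 0x30 + b4])


first = supplementary_to_gb18030(0x20000)   # first Extension B character
last = supplementary_to_gb18030(0x2A6D6)    # last Extension B character in Unicode 5.0
print(first.hex().upper(), last.hex().upper())   # 95328236 9835F336
# Cross-check against Python's built-in codec.
print(first == chr(0x20000).encode("gb18030"), last == chr(0x2A6D6).encode("gb18030"))
```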
70,244 = 6763 + 6080 + 8160 + 6530 + 42711.
6 Concluding remarks
GB2312 has 6,763 Chinese characters, GBK has 21,003, GB18030-2000 has 27,533, and GB18030-2005 has 70,244.
Excluding the compatibility areas, Unicode 5.0 currently has 70,217 Chinese characters. Let's compare these 70,217 characters with the 70,244 characters of GB18030-2005:
GB18030-2005 | Unicode 5.0 | Corresponding Unicode code points
20902 Chinese characters of CJK Unified Ideographs | 20902 Chinese characters of CJK Unified Ideographs | 0x4E00-0x9FA5
6582 Chinese characters of CJK Unified Ideographs Extension A | 6582 Chinese characters of CJK Unified Ideographs Extension A | 0x3400-0x4DB5
42711 Chinese characters of CJK Unified Ideographs Extension B | 42711 Chinese characters of CJK Unified Ideographs Extension B | 0x20000-0x2A6D6
The 14 radicals in the CJK Radicals Supplement | Not counted | 2E81, 2E84, 2E88, 2E8B, 2E8C, 2E97, 2EA7, 2EAA, 2EAE, 2EB3, 2EB6, 2EB7, 2EBB, 2ECA
The 21 characters from the CJK Compatibility Ideographs area | Not counted | F92C, F979, F995, F9E7, F9F1, FA0C, FA0D, FA0E, FA0F, FA11, FA13, FA14, FA18, FA1F, FA20, FA21, FA23, FA24, FA27, FA28, FA29
The 8 radicals | The 8 new characters in CJK Unified Ideographs | 0x9FB4-0x9FBB
Not counted | The 14 new characters in CJK Unified Ideographs | 0x9FA6-0x9FB3
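The last row can be checked against GB18030's four-byte codes for the Basic Multilingual Plane. This Python sketch assumes, as stated for these characters in the next paragraph, that U+9FA6 corresponds to 0x82358F33 and that the following code points take consecutive four-byte codes (counted from 0x81308130, with the second and fourth bytes 0x30-0x39 and the third byte 0x81-0xFE). The helper names are hypothetical; the final line compares the result with Python's built-in `gb18030` codec.

```python
def four_bytes_to_linear(code: bytes) -> int:
    """Linear index of a four-byte GB18030 code, counted from 0x81308130."""
    b1, b2, b3, b4 = code
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def linear_to_four_bytes(n: int) -> bytes:
    """Inverse of four_bytes_to_linear."""
    n, b4 = divmod(n, 10)
    n, b3 = divmod(n, 126)
    n, b2 = divmod(n, 10)
    return bytes([0x81 + n, 0x30 + b2, 0x81 + b3, 0x30 + b4])


# Assumption: U+9FA6 <-> 0x82358F33, with consecutive codes for the following code points.
BASE_CP, BASE = 0x9FA6, four_bytes_to_linear(bytes.fromhex("82358F33"))


def new_unified_code(cp: int) -> bytes:
    return linear_to_four_bytes(BASE + (cp - BASE_CP))


codes = [new_unified_code(cp).hex().upper() for cp in range(0x9FA6, 0x9FB3 + 1)]
print(len(codes), codes[0], codes[-1])   # 14 82358F33 82359036
# How many code positions does 0x82358F33-0x82359134 span?
print(four_bytes_to_linear(bytes.fromhex("82359134")) - BASE + 1)   # 22
# Cross-check the 14 codes against Python's built-in codec.
print(all(chr(cp).encode("gb18030") == new_unified_code(cp) for cp in range(0x9FA6, 0x9FB4)))
```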
The 14 new characters at 0x9FA6-0x9FB3 in the CJK Unified Ideographs block have corresponding code positions in GB18030 (0x82358F33-0x82358F39 and 0x82359030-0x82359036), but GB18030 does not list these characters. The full ranges 0x82358F33-0x82358F39, 0x82359030-0x82359039 and 0x82359130-0x82359134 cover all 22 new code points 0x9FA6-0x9FBB.
Appendix 1 GBK's 80 additional Chinese characters and radicals
The 80 characters added by GBK were originally placed in the PUA and were later included in Unicode, so each can be represented either by its PUA code or by its non-PUA code. Since the figure in the body is not easy to copy, a table is added here:
GBK code | PUA code | Non-PUA code
FE50 | E815 | 2E81
FE51 | E816 | 20087
FE52 | E817 | 20089
FE53 | E818 | 200CC
FE54 | E819 | 2E84
FE55 | E81A | 3473
FE56 | E81B | 3447
FE57 | E81C | 2E88
FE58 | E81D | 2E8B
FE59 | E81E | 9FB4
FE5A | E81F | 359E
FE5B | E820 | 361A
FE5C | E821 | 360E
FE5D | E822 | 2E8C
|