Converting between GB code and BIG5 code

Chinese text differs from English text: English uses single-byte ASCII codes, while each Chinese character is represented by two bytes. A text file simply stores the two-byte code of each Chinese character, and the Chinese operating system takes care of displaying it.
The encodings are not uniform: mainland China uses GB code, while Taiwan uses BIG5 code. A BIG5 file stores the BIG5 code of each Chinese character, and a GB file stores the corresponding GB code. The key to conversion is therefore a code table file that records the BIG5 code corresponding to each GB code.
The GB encoding rule is as follows: each Chinese character consists of two bytes. The first byte ranges over 0xA1-0xFE, 94 values in total, and the second byte also ranges over 0xA1-0xFE, 94 values in total. The two bytes can therefore define 94 * 94 = 8,836 codes, of which 6,763 are actually assigned to Chinese characters.
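
The byte-range arithmetic for GB code can be checked with a few lines of Python (used here instead of FoxPro purely for illustration; the constant names are assumptions and not part of the original program). The range 0xA1 to 0xFE inclusive contains 94 values, which is what yields 8,836 codes.

```python
# GB code byte ranges as described in the text (both ends inclusive).
GB_FIRST = range(0xA1, 0xFE + 1)
GB_SECOND = range(0xA1, 0xFE + 1)

first_count = len(GB_FIRST)                   # number of possible first bytes
second_count = len(GB_SECOND)                 # number of possible second bytes
code_space = first_count * second_count       # size of the two-byte code space

print(first_count, second_count, code_space)  # 94 94 8836
```
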
The BIG5 encoding rule is as follows: each Chinese character consists of two bytes. The first byte ranges over 0x81-0xFE, 126 values in total. The second byte ranges over 0x40-0x7E and 0xA1-0xFE, 157 values in total. The two bytes can therefore define 126 * 157 = 19,782 codes. Some of these are frequently used characters, such as 丁 (ding); we call them common characters, and they occupy the BIG5 range 0xA440-0xC67E, 5,401 in total. Less frequently used characters, which we call second-level characters, occupy the range 0xC940-0xF9D5, 7,652 in total. The rest are special characters.
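
As a quick sanity check on the BIG5 figures, the sketch below counts the valid codes in a range by brute force. It is written in Python for illustration only, and `count_big5` is a hypothetical helper name. The counts of 5,401 and 7,652 quoted in the text correspond to the ranges 0xA440-0xC67E and 0xC940-0xF9D5.

```python
def valid_second(b):
    """True if b is a legal BIG5 second byte (0x40-0x7E or 0xA1-0xFE)."""
    return 0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE

def count_big5(start, end):
    """Count valid two-byte BIG5 codes from start to end, inclusive."""
    total = 0
    for code in range(start, end + 1):
        hi, lo = code >> 8, code & 0xFF
        if 0x81 <= hi <= 0xFE and valid_second(lo):
            total += 1
    return total

second_bytes = sum(1 for b in range(256) if valid_second(b))
print(second_bytes)                  # 157
print(126 * second_bytes)            # 19782
print(count_big5(0xA440, 0xC67E))    # 5401 common characters
print(count_big5(0xC940, 0xF9D5))    # 7652 second-level characters
```
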
The code table file is produced as follows: first write every GB code into a file, then use software that can convert GB code to BIG5 code, such as CONVERT.EXE under UCDOS, to convert that file into a BIG5 file. The result is the code table file.
The following program writes all GB codes to the file Gb.txt. (All the programs below are written in FoxPro and can easily be converted to other languages.)

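fp = fcreate("Gb.txt")              && create the file; FOPEN needs an existing file
For i = 161 To 247                  && first byte: 0xA1..0xF7
  For j = 161 To 254                && second byte: 0xA1..0xFE
    =fwrite(fp, Chr(i) + Chr(j))    && one two-byte GB code
  Next
  =fwrite(fp, Chr(13) + Chr(10))    && end each row with CR+LF
Next
=fwrite(fp, Chr(26))                && DOS end-of-file marker (Ctrl-Z)
=fclose(fp)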
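
For readers without FoxPro, the same table can be produced in Python. This is a minimal sketch that mirrors the loops above; `build_gb_table` is a hypothetical name, and the CR+LF row ending and trailing Ctrl-Z are carried over from the FoxPro listing.

```python
def build_gb_table():
    """Return the GB code table as bytes: one CR+LF-terminated row per first byte."""
    rows = []
    for first in range(161, 247 + 1):            # first byte: 0xA1..0xF7
        row = bytearray()
        for second in range(161, 254 + 1):       # second byte: 0xA1..0xFE
            row += bytes([first, second])        # one two-byte GB code
        rows.append(bytes(row) + b"\r\n")
    return b"".join(rows) + b"\x1a"              # trailing Ctrl-Z, as in the FoxPro code

table = build_gb_table()
print(len(table))   # 87 rows * (188 + 2) bytes + 1 = 16531
```

To write the file itself, the result can be saved in binary mode, for example `open("Gb.txt", "wb").write(table)`.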

The file is laid out as follows: each line corresponds to the first byte of a code, and each column position corresponds to the second byte. Note the offsets. For example, the character 啊 ("ah") has GB code 0xB0A1: the first byte is 0xB0 (176) and the second byte is 0xA1 (161), so it sits on line 176-161+1 = 16, starting at column (161-161)*2+1 = 1, counting both from 1.
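
The offset calculation can be written as a small function. The sketch below is in Python for illustration, with `table_position` as a hypothetical helper. It uses 1-based positions to match FoxPro's array and SUBSTR indexing.

```python
def table_position(gb_code: bytes):
    """Return the 1-based (line, column) of a two-byte GB code in the table file."""
    first, second = gb_code
    line = first - 161 + 1             # one line per first byte, starting at 0xA1
    column = (second - 161) * 2 + 1    # two bytes per entry within the line
    return line, column

print(table_position("啊".encode("gb2312")))   # (16, 1)
```
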
Running CONVERT.EXE on Gb.txt converts it to a BIG5 file, which gives Big5.txt, a table of BIG5 codes organized by GB code. Conversely, the same approach can produce a table of GB codes organized by BIG5 code.
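
If CONVERT.EXE is not available, a rough stand-in for the table can be built with Python's standard `gb2312` and `big5` codecs. This is only a sketch under stated assumptions and is not equivalent to CONVERT.EXE: the codecs map the same character from one code page to the other and do not turn simplified forms into traditional ones, so simplified-only characters end up as the placeholder. The placeholder value and the function names are assumptions made for this example.

```python
PLACEHOLDER = b"\xa1\x40"   # assumption: BIG5 full-width space marks "no mapping"

def gb_to_big5_pair(first, second):
    """Map one GB code to two BIG5 bytes, or PLACEHOLDER when no mapping exists."""
    try:
        big5 = bytes([first, second]).decode("gb2312").encode("big5")
    except UnicodeError:    # unassigned GB code, or character absent from BIG5
        return PLACEHOLDER
    return big5 if len(big5) == 2 else PLACEHOLDER

def build_big5_rows():
    """One 188-byte row per GB first byte 161..247, like the lines of Big5.txt."""
    return [b"".join(gb_to_big5_pair(f, s) for s in range(161, 255))
            for f in range(161, 248)]

rows = build_big5_rows()
print(len(rows), len(rows[0]))   # 87 188
```
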

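The conversion itself works as follows (written in FoxPro).
First, load the code table file into an array:
fp = fopen("Big5.txt")              && open the code table read-only
i = 0
Do While !feof(fp)                  && loop until the end of the file
  i = i + 1
  Dimension Dict[i]                 && grow the array by one element
  Dict[i] = fgets(fp)               && one table row per array element
Enddo
=fclose(fp)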
Second, load the text to be converted into a variable:
Create cursor Temp (mm m)           && temporary cursor with one memo field
Append blank
Append memo mm from (textFileName)  && textFileName holds the path of the GB text file
text = mm
Then scan the text and replace every GB code:
temp = ""
i = 1
Do While i <= Len(text)             && <= so that the last byte is not skipped
  ch = Substr(text, i, 1)
  If Asc(ch) < 128                  && ASCII byte: copy unchanged
    temp = temp + ch
    i = i + 1
  Else
    ch1 = Substr(text, i+1, 1)      && second byte of the GB code
    * row = first byte - 161 + 1, column = (second byte - 161) * 2 + 1
    big5 = Substr(Dict[Asc(ch)-161+1], (Asc(ch1)-161)*2+1, 2)
    temp = temp + big5
    i = i + 2
  Endif
Enddo
Finally, the converted text is available in temp.
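
The scan-and-replace loop can also be sketched in Python. The function name `gb_to_big5` and the range check on each byte pair are assumptions added for this example. The demonstration uses a tiny hypothetical table in which only the entry for 啊 is filled in, so it runs without a real Big5.txt.

```python
def gb_to_big5(text: bytes, table: list) -> bytes:
    """Convert GB-encoded bytes to BIG5 using a table with one row per GB first byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else 0
        if 161 <= ch <= 160 + len(table) and 161 <= nxt <= 254:
            row = table[ch - 161]          # 0-based row index (FoxPro adds 1)
            col = (nxt - 161) * 2          # 0-based byte offset (FoxPro adds 1)
            out += row[col:col + 2]
            i += 2
        else:
            out.append(ch)                 # ASCII or anything outside the table: copy as-is
            i += 1
    return bytes(out)

# Hypothetical table for the demo: every entry is a placeholder except 啊.
rows = [bytearray(b"\xa1\x40" * 94) for _ in range(87)]
rows[0xB0 - 161][0:2] = "啊".encode("big5")
table = [bytes(r) for r in rows]

result = gb_to_big5(b"A" + "啊".encode("gb2312") + b"!", table)
print(result == b"A" + "啊".encode("big5") + b"!")   # True
```

With a real table file, the rows could be loaded with something like `open("Big5.txt", "rb").read().split(b"\r\n")`, assuming the file keeps the CR+LF row endings described earlier.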

Note that in FoxPro array indexes start at 1 and the start position passed to SUBSTR must be >= 1, which is why the row and column expressions add 1.
Anyone familiar with FoxPro should be able to follow the code. The converted Big5.txt (17 KB) cannot be posted here. If you need it, please contact me by email: czjsz_ah@stats.gov.cn


