Solutions for Garbled Chinese Characters

Source: Internet
Author: User

www.xyhhxx.com, Publisher: Seo

We often run into garbled characters on computers: for example, when browsing Hong Kong and Taiwan websites or opening e-mail. Worse, the Chinese characters on a normally working Win9x/Win2k desktop and menus can suddenly turn into gibberish overnight, and Chinese text that used to display correctly in applications (including games) becomes garbled. Garbled text causes us a great deal of trouble, and we would all like to be rid of it.
I. Types of garbled Chinese characters
Garbled Chinese characters fall roughly into four categories: web pages and text/document files; the Win9x/Win2k system itself; applications; and e-mail. The first category is caused by the incompatibility between Big5, the Traditional Chinese encoding used in Hong Kong and Taiwan, and GB2312, the Simplified Chinese encoding. The second type appears in the Win9x/Win2k system interface (menus, desktop and message boxes) and is caused by incorrect font settings in the registry. The third type appears wherever applications (including games) normally display Chinese characters. Its causes are more complicated: it may have the same cause as the second type, or the Chinese dynamic link library used by the software may have been overwritten by an English one. The last type is garbled e-mail.
II. Eliminating garbled Chinese characters
Below, we introduce methods for eliminating each of these types of garbled characters in turn.

(1) Eliminating garbled web pages, text files and document files
Garbled web pages arise when the browser (such as IE) interprets an HTML page using the wrong character encoding. The encoding is declared in the page's code (for example, in the <HTML> header), so one fix is to correct that declaration. Another solution, which needs no change to the page code, is to install a multi-language support package for the browser (for example, when installing IE). Then, whenever a page is garbled, select "View" > "Encoding" in the browser and choose "Auto-Select" or "Simplified Chinese (GB2312)"; for Traditional Chinese choose "Traditional Chinese (Big5)", and for other languages choose the corresponding entry. This eliminates garbled web pages.
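Why picking the right encoding matters can be shown with a short Python sketch (Python is our choice here; the article itself contains no code). It encodes an arbitrary sample string as GB2312 and reads the same bytes back once as Big5 and once as GB2312, mimicking a browser that guesses the wrong code page.

```python
# Encode sample text as GB2312, then read the bytes back with two code pages.
text = "中文乱码"
gb_bytes = text.encode("gb2312")

# Wrong code page: Big5. errors="replace" turns byte pairs that are not
# valid Big5 into U+FFFD instead of raising UnicodeDecodeError.
garbled = gb_bytes.decode("big5", errors="replace")

# Right code page: the original text comes back unchanged.
restored = gb_bytes.decode("gb2312")

print("read as Big5:  ", garbled)
print("read as GB2312:", restored)
```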
Another solution is to use a multi-encoding display platform that converts between encodings. Common platforms include:
NJStar ("Antarctic Star"): automatically recognizes GB and Big5 text and can display it in Simplified or Traditional Chinese. It can also show GB and Big5 text on the same screen and correctly displays Japanese and Korean. Website: http://www.njstar.com
Sitong Lifang RichWin ("Four benefits"): supports 17 encodings, including GB, Big5, HZ, Japanese, Korean and Unicode, and offers a preview function plus small but useful utilities such as "add or delete spaces" and "insert prohibited spaces". A good helper for netizens. Website: http://www.srsnet.com
MagicWin 98: displays different encodings on the same screen, so GB and Big5 text can coexist and both display correctly. It supports GB, HZ, Big5, JIS, EUC, SJIS, KSC, UTF-7 and UTF-8, works with software such as Netscape Communicator 4.x, Internet Explorer and Office, and lets you view documents in different encodings in multiple windows. Website: http://www.itwin.com.my/magicwin
To save a web page so that it is not garbled: open the page in the browser, select "View" > "Encoding" > "Auto-Select", then save it with "Web Page" as the file type and "Unicode" as the encoding. When you open the saved page again, select the matching encoding under "View" > "Encoding" in the browser menu, such as Simplified Chinese (GB2312), Simplified Chinese (HZ), Unicode (UTF-8) or Traditional Chinese (Big5), and it will display without garbled characters.
Garbled text and document files usually occur when a Traditional Chinese file is displayed on a Simplified Chinese system, or the other way round. Converting the file from the Traditional Chinese encoding to the Simplified Chinese one (or vice versa) eliminates the garbled text.
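A minimal Python sketch of such a conversion, assuming the input is Big5 and that GBK is an acceptable target. It also shows why a straight Big5 → GB2312 conversion can fail: GB2312 has no code points for traditional forms such as 亂 and 碼, while GBK does. The helper name big5_to_gbk is hypothetical, and it changes only the encoding; it does not turn traditional characters into simplified ones.

```python
def big5_to_gbk(data: bytes) -> bytes:
    """Re-encode Big5 bytes as GBK (encoding change only, no character conversion)."""
    return data.decode("big5").encode("gbk")


traditional = "亂碼"
big5_bytes = traditional.encode("big5")

# GB2312 has no code points for these traditional forms.
try:
    traditional.encode("gb2312")
    fits_gb2312 = True
except UnicodeEncodeError:
    fits_gb2312 = False

gbk_bytes = big5_to_gbk(big5_bytes)
print("fits GB2312:", fits_gb2312)
print("GBK round trip:", gbk_bytes.decode("gbk"))
```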
Word 2000 can do this. For example, to convert Traditional Chinese to Simplified Chinese: select the file whose encoding you want to convert, and in the dialog box that pops up (Figure 1), choose "Traditional Chinese (Big5)" under "Other encoding"; the file then opens without garbled characters. To save it without garbled characters, choose "File" > "Save As", save it as a Word document, reopen it, and then save it as plain text or another format. Alternatively, use Word 2000's Simplified/Traditional Chinese conversion tool: choose "Tools" > "Language" > "Simplified/Traditional Chinese Conversion" on the menu bar, and save the file after conversion.
WPS 2000 can also convert encodings. It supports the three major Chinese encodings, GB2312, Big5 and GBK, and can convert the encoding when exporting RTF, TXT and HTM files.
You can also use an encoding conversion tool to convert between Big5 (Traditional Chinese) and GB2312 (Simplified Chinese). Common tools include:
Hurricane Simplified/Traditional Converter: free Chinese "green" software that needs no installation; just unzip the roughly 300 KB package into any directory on the hard disk. It quickly converts ordinary text, clipboard contents and disk files from Big5 to GB or from GB to Big5. Its standout feature is website conversion: it can generate a Big5 version of your website within a few minutes. Conversely, when you download a website from Hong Kong or Taiwan, you can run a Big5 → GB conversion and then browse it comfortably. Website: http://renliang.yeah.net
Internal Code Conversion Master: lets you flexibly select many files and convert them all at once, and you can view a selected file before and after conversion. Conversion is done in place on the original files, which saves selecting a target directory and copying files back and forth. A unique feature: for HTML files it also converts the declared Chinese character set, so browsers automatically display the page in the converted encoding. Currently GB2312 and Big5 are supported. Website: http://rchan.yeah.net
Mandarin Connect: free software supporting bidirectional Big5 <-> GB conversion of text, web pages, RTF and other formats, with batch conversion. Website: http://lanny.yeah.net
Text Robot: converts between several encodings, such as Big5 <-> GB, Big5 → GBK, Big5 → GBK Simplified, and GBK Traditional → GBK Simplified. It can also turn a text file into a web page and supports batch conversion. Website: http://denvor.yeah.net
Cross-Strait Pass Chinese character encoding converter: free software supporting two-way GB <-> Big5 conversion, batch conversion, and direct conversion of text on the clipboard. Before converting a file, you can preview the result in the preview window. Website: http://www.njstar.com

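The batch converters listed above all do roughly the same job. Here is a minimal Python sketch of a batch Big5 → GBK converter that, like Internal Code Conversion Master, also rewrites the charset declaration in HTML files. The function names, the *.htm* file pattern and the regular expression are assumptions for illustration, and the demo runs on a throwaway temporary directory.

```python
import re
import tempfile
from pathlib import Path

# Matches a charset=big5 declaration (optionally quoted), e.g. inside
# <meta http-equiv="Content-Type" content="text/html; charset=big5">.
CHARSET_RE = re.compile(r"""(charset\s*=\s*["']?)big5""", re.IGNORECASE)


def convert_tree(root: Path, pattern: str = "*.htm*") -> list[Path]:
    """Convert every matching file under root from Big5 to GBK, in place.

    Raises UnicodeDecodeError if a file is not valid Big5, so mixed
    directories should be filtered first.
    """
    converted = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_bytes().decode("big5")
        # Update the declared charset so browsers use the new encoding.
        text = CHARSET_RE.sub(r"\1gbk", text)
        path.write_bytes(text.encode("gbk"))
        converted.append(path)
    return converted


def demo() -> str:
    """Convert a one-page sample site and return the converted page as text."""
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "index.htm"
        html = '<meta http-equiv="Content-Type" content="text/html; charset=big5">亂碼'
        page.write_bytes(html.encode("big5"))
        convert_tree(Path(tmp))
        return page.read_bytes().decode("gbk")


print(demo())
```

Converting in place, as these tools do, is convenient but irreversible, so it is worth running such a script on a copy first.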
(2) Eliminating garbled characters in Win9x/Win2k systems
This type of garbled text is caused by incorrect font settings in the registry. Encoding conversion software such as Sitong Lifang RichWin, NJStar, MagicWin 98 or Cross-Strait Pass cannot fix it. The solution is to restore the font settings in the registry.
If another machine running the same Win9x/Win2k version displays Chinese correctly, follow these steps:
1. On the working machine, select "Start" > "Run" and type "regedit" in the dialog box to open the Registry Editor;
2. Navigate to "HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FontAssoc", choose "Registry" > "Export Registry File", select "Selected branch", and export this branch to a file (such as Li.reg) (Figure 2);
3. Copy Li.reg to the machine with garbled characters, run regedit there, choose "Registry" > "Import Registry File", and import Li.reg into the registry.
If no machine with the same Win9x/Win2k version displays correctly, you need to restore some font registry keys manually as follows:
1. Run regedit on the machine with garbled characters (regedit.exe is in the Windows directory);
2. Go to "HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FontAssoc". Normally it contains two subkeys, "Associated DefaultFonts" and "Associated Charset", with the following correct contents:
Associated Charset subkey:
  Chinese Win98: ANSI(00)="yes", GB2312(86)="yes", OEM(FF)="yes", SYMBOL(02)="no"
  Chinese Win98 (OEM Edition): ANSI(00)="yes", GB2312(86)="yes", OEM(FF)="yes", SYMBOL(02)="no"
  Chinese Win2k: ANSI(00)="yes", OEM(FF)="yes", SYMBOL(02)="no"
Associated DefaultFonts subkey:
  Chinese Win98: AssocSystemFont="simsun.ttf"; FontPackageDecorative, FontPackageDontCare, FontPackageModern, FontPackageRoman, FontPackageScript and FontPackageSwiss all set to ""
  Chinese Win98 (OEM Edition): same as Chinese Win98
  Chinese Win2k: AssocSystemFont="simsun.ttf"; FontPackage, FontPackageDecorative, FontPackageDontCare, FontPackageModern, FontPackageRoman, FontPackageScript and FontPackageSwiss all set to ""
3. When Chinese characters are garbled, the contents of these two subkeys are incomplete: sometimes the Associated Charset subkey is missing or incomplete, and sometimes Associated DefaultFonts is incomplete. Simply open "HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FontAssoc" in regedit and restore the entries according to the correct contents.
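Instead of typing the values by hand, you can prepare them as a registry file and import it with "Registry" > "Import Registry File", as in the first method. The Python sketch below (build_reg is a hypothetical helper, not a Microsoft tool) writes the Chinese Win98 values from the correct-contents list in the REGEDIT4 text format. It only builds text and never touches the registry; substitute the Win2k values if that is your system.

```python
# Values transcribed from the Chinese Win98 entries of the correct-contents list.
FONTASSOC = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FontAssoc"

CHINESE_WIN98 = {
    "Associated Charset": {
        "ANSI(00)": "yes",
        "GB2312(86)": "yes",
        "OEM(FF)": "yes",
        "SYMBOL(02)": "no",
    },
    "Associated DefaultFonts": {
        "AssocSystemFont": "simsun.ttf",
        "FontPackageDecorative": "",
        "FontPackageDontCare": "",
        "FontPackageModern": "",
        "FontPackageRoman": "",
        "FontPackageScript": "",
        "FontPackageSwiss": "",
    },
}


def build_reg(subkeys: dict[str, dict[str, str]]) -> str:
    """Render the values as REGEDIT4 text that regedit can import."""
    lines = ["REGEDIT4", ""]
    for subkey, values in subkeys.items():
        lines.append(f"[{FONTASSOC}\\{subkey}]")
        for name, data in values.items():
            # .reg string values are written as "name"="data"
            lines.append(f'"{name}"="{data}"')
        lines.append("")
    return "\r\n".join(lines) + "\r\n"


print(build_reg(CHINESE_WIN98))
```

Save the output as, say, Li.reg on the affected machine and import it. Back up the registry before importing anything.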

(3) Eliminating garbled characters in applications (including games)
Garbled characters in the menus and other interface elements of Chinese software may be caused by incorrect changes to the font settings in the Windows registry. In that case, use the solution described in point (2) above.
If that does not solve the problem, the software's Chinese dynamic link library has probably been overwritten by an English one. This often happens with Chinese software developed with Microsoft tools such as VB and VC. In such software, the Chinese text on menus and other interface elements comes from a dynamic link library (DLL file), which is usually installed in the Windows system directory. If English software installed later uses a DLL with the same name, the English DLL overwrites the Chinese one under Windows\System. The Chinese software then loads the English DLL when it runs, and garbled characters appear. The solution is to reinstall the Chinese software to restore its DLL.

(4) Eliminating garbled e-mail
1. Causes of garbled e-mail and how to fix them
Garbled e-mail has many causes, including the following:
(1) The mail server does not support 8-bit (non-ASCII) data
Because of differences in mail transport mechanisms or encodings, a mail server may not support 8-bit (non-ASCII) transmission. Such a server cannot handle binary data, so it strips the 8th bit of every byte in the message, corrupting the content, and the recipient sees a mass of garbled characters.
Countermeasure: before sending an 8-bit text file, encode it into 7-bit ASCII (or a format using even fewer bits) so that it is transmitted intact. After receiving the 7-bit message, the recipient converts it back to 8-bit format, avoiding garbled text.
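The damage done by a 7-bit relay, and why encoding first avoids it, can be simulated in a few lines of Python. Clearing the high bit with & 0x7F is our model of how such a server behaves, and Base64 stands in for "a 7-bit format".

```python
import base64

text = "中文"
raw = text.encode("gb2312")  # b'\xd6\xd0\xce\xc4': every byte has bit 8 set

# Simulate a 7-bit-only relay that clears the 8th bit of every byte.
stripped = bytes(b & 0x7F for b in raw)
print(stripped)  # b'VPND': plain ASCII, the Chinese text is gone

# Countermeasure: Base64 output uses only ASCII, so clearing bit 8 is harmless.
wire = base64.b64encode(raw)
survived = bytes(b & 0x7F for b in wire)

# The recipient decodes back to the original 8-bit bytes.
print(base64.b64decode(survived).decode("gb2312"))
```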
(2) The sender and recipient use different e-mail software or settings
Normally, the attachment function of e-mail software encodes the letter automatically before sending it, and as long as the recipient's software (such as Outlook or Netscape Mail) recognizes the encoding, it decodes the letter automatically. However, the sender's and recipient's programs may have different default settings, or the recipient may have customized some options, so the recipient's software may not recognize the encoding of an incoming letter. It then cannot decode the letter automatically, and garbled characters appear.
Countermeasure: you can decode the e-mail with WinZip + IE by dragging the saved mail file into the IE window; the original content of the e-mail is then displayed without garbled characters.
You can also identify the encoding from characteristic strings in the e-mail and choose the appropriate decoding software.
E-mail encoding methods include UUencode, Base64, QP (quoted-printable) and BinHex.
UUencode: the encoding used in Unix environments, now rarely used. Its general format is:
begin 644 kk.zip
M1G) O; 2! I; & En + F) b3t! C (Vee + fyc = '4n961u + g1w (% = E9 "!.; W8 @ (#8 @, 3 (ZM, SDZ, C4 @, 3dy-@ I296-E: 79e9 # H @ 9g) O; 2! F; & % B; 6% I; "YF; & % B + f9u: FET ......
end
Feature: the encoded text starts with a "begin" line giving the file mode (such as 644) and the original file name (such as kk.zip), followed by the encoded content; the last line is "end".
Decoding: e-mail software such as Becky! and Eudora can decode it when you select the corresponding encoding option. You can also save the garbled e-mail from your e-mail software as a file with the ".uue" extension and then expand it with WinZip. The garbled characters disappear after decoding.
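The begin/end layout described above can be reproduced with Python's standard binascii module, whose b2a_uu and a2b_uu functions encode and decode one uuencoded line (at most 45 input bytes) at a time. The wrappers uuencode and uudecode are hypothetical helpers written for this sketch, and kk.txt is only an example file name.

```python
import binascii


def uuencode(data: bytes, name: str, mode: str = "644") -> str:
    """Wrap data in a 'begin MODE NAME' ... 'end' block."""
    lines = [f"begin {mode} {name}"]
    for i in range(0, len(data), 45):  # b2a_uu accepts at most 45 bytes per line
        lines.append(binascii.b2a_uu(data[i:i + 45]).decode("ascii").rstrip("\n"))
    lines.append("`")  # a zero-length line marks the end of the data
    lines.append("end")
    return "\n".join(lines) + "\n"


def uudecode(block: str) -> bytes:
    """Decode the lines between 'begin ...' and 'end'."""
    out = bytearray()
    for line in block.splitlines()[1:]:  # skip the 'begin' line
        if line == "end":
            break
        out += binascii.a2b_uu(line)
    return bytes(out)


encoded = uuencode("中文".encode("gb2312"), "kk.txt")
print(encoded)
print(uudecode(encoded).decode("gb2312"))
```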
MIME/Base64: this encoding represents every three 8-bit bytes with four 6-bit characters. Because each encoded character carries only 6 bits, truncation of the 8th bit cannot damage it. Its general format is:
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: base64
Status: R
Authorization + sxqst6skp owrskxzsn3drlfnrmghqq0kq1 + stqq6vdcx
0lf6tfit07ddw0shrw0kd 1_1py3jvc29mdcuibjbnrlcm5ldcbn ......
Feature: the encoded text is usually preceded by headers such as Content-Type (content type), charset (character set) and Content-Transfer-Encoding (content transfer encoding).
Decoding: select the Base64 decoding option in your e-mail software; the garbled characters disappear after decoding.
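The three-bytes-to-four-characters arithmetic can be checked with Python's standard base64 module. The sample uses the GB2312 bytes of 中 plus an exclamation mark purely so that the input is exactly three bytes.

```python
import base64

chunk = "中".encode("gb2312") + b"!"  # three 8-bit bytes: D6 D0 21
encoded = base64.b64encode(chunk)       # four characters, 6 bits each
print(len(chunk), "bytes ->", len(encoded), "characters:", encoded)  # 3 bytes -> 4 characters: b'1tAh'
print(base64.b64decode(encoded) == chunk)  # True
```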
QP: short for "Quoted-Printable Content-Transfer-Encoding". Because QP-encoded mail consists only of printable characters from the ASCII character set, the name contains "printable". Its general format is:
=A1A=B1z=A6n=A1i=A7=DA=A6b=BA=F4=B8=F4=A4w
=B1o......
=E5=ABH=A5=F3=B0=DD=C3d=B1m=AEA=A1A......
Feature: the content usually contains many equal signs ("="), so you can recognize QP encoding without checking the headers.
Decoding: copy all of the encoded text (such as "=A1A=B1z=A6n..."), paste it into a new plain-text file, and add the following quoted-printable headers at the top of the file:
Content-Type: text/plain; charset="gb2312"
Content-Transfer-Encoding: quoted-printable
Leave a blank line between these headers and the encoded text, save the file with the ".eml" extension, and double-click it in Windows Explorer to display the correct content. If some Chinese characters are still garbled, you can open the saved EML file with WinZip to see the correct content.
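The header trick works because a MIME parser needs only those two headers to undo the quoted-printable layer and pick the right charset. A minimal sketch with Python's standard email package; the body here is a made-up sample (the QP form of the GB2312 bytes for 中文), not the text shown above.

```python
import email

# Hypothetical sample: the QP form of the GB2312 bytes for "中文".
qp_body = "=D6=D0=CE=C4"

# The two headers from the text, a blank line, then the copied body.
raw = (
    'Content-Type: text/plain; charset="gb2312"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n" + qp_body
)

msg = email.message_from_string(raw)
data = msg.get_payload(decode=True)            # removes the quoted-printable layer
text = data.decode(msg.get_content_charset())  # charset from the header: "gb2312"
print(text)  # 中文
```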
BinHex: its general format is:
(This file must be converted with BinHex 4.0)
Sgmhqbf6pm6hsafapmk69lj0pfexb6qsstqq6vdcx
0lf6tfit07ddw0shrw0kdqqtuqx9p2m2rlf6p9q
Oz6xoie ......
Decoding: decode it with your e-mail software, or save the garbled e-mail as a file with the ".hqx" extension and decode it with WinZip. The garbled characters disappear after decoding.
UTF-7/UTF-8: two transformation formats (encodings) of Unicode.
The general format of UTF-7 encoding is:
+ Sgmhqbf/6pm6hsafapmk69l/j0pfexb6q + sxqst6skp. OWrSKXzsN3DRLFNrmGhQQ0Kq1-sTqq6vdCx
0lf6tfit07ddw0shrw0kd 1_1py3jvc29mdcuibjbnrlcm5ldcbn ......
Decoding: add the following headers to the top of the original e-mail:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-7"
Content-Transfer-Encoding: 7bit
After inserting them, leave a blank line, save the e-mail with the ".eml" extension, and then open it in Outlook to decode it and eliminate the garbled characters.
UTF-8:
Decoding: add the following headers to the top of the original e-mail:
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Save the e-mail with the ".eml" extension and open it in Outlook to decode it and eliminate the garbled text.
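Why the UTF-7 headers say 7bit while the UTF-8 headers say 8bit becomes clear when the same text is encoded both ways in Python; the sample string is arbitrary.

```python
text = "中文"
utf7 = text.encode("utf-7")
utf8 = text.encode("utf-8")

# UTF-7 output is pure ASCII, so "Content-Transfer-Encoding: 7bit" fits it.
print(utf7, "all 7-bit:", all(b < 0x80 for b in utf7))  # ... all 7-bit: True
# UTF-8 represents Chinese with bytes >= 0x80, hence "8bit".
print(utf8, "all 7-bit:", all(b < 0x80 for b in utf8))  # ... all 7-bit: False

# Both decode back to the same text when the right charset is declared.
print(utf7.decode("utf-7"), utf8.decode("utf-8"))
```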
(3) The operating systems use different languages
If the recipient of a Chinese e-mail uses an English operating system without a Chinese platform add-on, or has not switched the encoding to Chinese (for example, with Sitong Lifang RichWin or NJStar), the Chinese characters cannot be displayed. All double-byte characters (such as Simplified Chinese GB and Traditional Chinese Big5, Japanese JIS and EUC, and Korean KSC) appear garbled on an operating system for another language. Likewise, in a Simplified Chinese GB environment, text in other double-byte encodings appears garbled.
Countermeasure: install a multi-language support package or use a multi-encoding display platform (such as Sitong Lifang RichWin or NJStar), and switch each received e-mail to the encoding that matches its language to eliminate the garbled text.
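To illustrate the double-byte problem, this Python sketch decodes the GB2312 bytes of 中文 with other code pages. We assume cp1252 as the code page of a Western English Windows system; the remaining code pages use errors="replace", so invalid byte sequences show up as U+FFFD instead of raising an error.

```python
data = "中文".encode("gb2312")

# cp1252 is assumed here as the ANSI code page of a Western English Windows system.
print("cp1252:", data.decode("cp1252"))  # ÖÐÎÄ

# Other double-byte code pages misread the same bytes in their own way.
for codepage in ("big5", "shift_jis", "euc_kr"):
    print(codepage + ":", data.decode(codepage, errors="replace"))
```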
2. What the sender should do to avoid garbled characters
(1) Configure the e-mail software correctly
English e-mail software should be set as follows:
Default charset: ISO 8859-1 (Latin1)
Encoding method: quoted-printable. Do not select 7-bit, because 7-bit does not support Chinese characters.
Code page (optional): 936 or HZ-GB-2312
Mail format: MIME
Font:
Chinese e-mail software should be set as follows:
Default charset: Simplified Chinese (GB2312)
Encoding method: quoted-printable; mail format: MIME
Font:
In Outlook Express, use "Simplified Chinese (GB2312)" as the default mail encoding, and under "International Settings" select the option to use the default encoding for all received messages.
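As a rough programmatic equivalent of the Chinese-software settings above (GB2312 charset, quoted-printable, MIME), this Python sketch uses the standard email.message.EmailMessage class. It only builds the message and does not send it; the subject and body are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "test"  # placeholder subject
# Body encoded as GB2312 and wrapped in quoted-printable;
# EmailMessage.set_content also adds "MIME-Version: 1.0".
msg.set_content("你好，测试邮件。", charset="gb2312", cte="quoted-printable")

print(msg.as_string())  # headers plus an ASCII-only, =XX-escaped body
```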
(2) Re-encode the e-mail in 7-bit format before sending
Before sending an 8-bit text file, encode it into 7-bit ASCII or a format using even fewer bits; after receiving the 7-bit e-mail, the recipient converts it back to 8-bit format.
In the compose options of your mail client, make 7-bit encoding the default.
(3) Convert to an appropriate encoding
With automatic 7-bit encoding set as the default in the compose options, state the Chinese character encoding (such as GB2312, HZ or GBK) in your signature before sending a Chinese e-mail written on a Chinese system. Writers in Hong Kong, Macao, Taiwan and Southeast Asia who use Big5 should convert the e-mail to one of the three mainland GB encodings before sending it to the mainland, and note this in the signature. Otherwise the recipient may not be able to read it, because many mainland e-mail systems do not support Big5.
(4) Send a test letter before sending important information
Before sending important information, first send a test letter to confirm that the body can be sent without encoding and that the recipient can decode attached files. If you send an encoded e-mail, include enough header information for the recipient to know which decoding method to use. We recommend including the "begin" header for UUencode and the Base64 header for MPack encoding.
(5) Use the attachment function to send files whenever possible
Almost all e-mail software, such as Netscape, The Bat! and Becky!, automatically encodes attached non-ASCII files in Base64 (only the attachment part is encoded). You do not need to encode a file yourself before attaching it; doing so is counterproductive. Because e-mail software reliably decodes such attachments automatically, this method should be preferred when sending Chinese e-mail.
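The automatic Base64 encoding of attachments can be observed with Python's standard email package: add_attachment with bytes content defaults to a base64 Content-Transfer-Encoding. The file name and contents below are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "letter"  # placeholder subject
msg.set_content("See the attached letter.")

letter = "中文信件".encode("gb2312")  # 8-bit data to attach
msg.add_attachment(letter, maintype="application", subtype="octet-stream",
                   filename="letter.txt")

attachment = next(msg.iter_attachments())
print(attachment["Content-Transfer-Encoding"])  # base64
print(attachment.get_content() == letter)       # True: decoded back intact
```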
If a file cannot be sent as an attachment, Chinese text or binary data must go in the body. If the sender and recipient are far apart, the eighth bit may be stripped in transit. In that case, first send the recipient a test letter with Chinese in the body to find out whether the body arrives correctly. If the eighth bit is stripped, the recipient sees garbled characters rather than one of the formats described above, such as UU, Base64 or QP, and such letters are almost impossible to recover.
Countermeasure: in Netscape, Eudora or Pegasus Mail, select "Quoted-Printable" or "MIME encoding" in the preferences or options.
(6) Choose good e-mail client software
Good e-mail client software can effectively prevent garbled e-mail.
3. What the recipient should do to eliminate garbled characters
Look in the signature or body of the e-mail for any English text naming the Chinese character encoding it uses. Then open the "View" menu and select "Language"; the submenu lists every Chinese character encoding the system supports, and you can click the one named in the e-mail. If the e-mail does not state its encoding, click the entries one by one until the body displays correctly; the entry that works is the encoding the sender's editor used. In Netscape, select the corresponding item under "Document Encoding" in the Options menu.
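Clicking through the encodings until the text reads correctly is essentially a trial decode over a list of candidates. A minimal Python sketch, with a hypothetical helper guess_decode and an assumed candidate order. A decode that raises no error is only a hint, because the same bytes can be valid in several code pages, so a person still has to check that the result looks right.

```python
def guess_decode(data: bytes, candidates=("utf-8", "gb2312", "gbk", "big5")):
    """Return (codec, text) for the first candidate that decodes without error."""
    for codec in candidates:
        try:
            return codec, data.decode(codec)
        except UnicodeDecodeError:
            continue  # not valid in this code page; try the next one
    raise ValueError("none of the candidate encodings fits")


print(guess_decode("中文".encode("gb2312")))  # ('gb2312', '中文')
```

Order matters: GBK is a superset of GB2312, so listing it first would hide the GB2312 match, and with this order Big5 bytes may be reported as GBK, which is exactly why the visual check is still needed.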
4. Keeping Chinese e-mail free of garbled characters on non-Chinese platforms
When the recipient opens your Chinese e-mail on a system without a Chinese platform, garbled characters appear. There are two solutions:
(1) Use a tool such as E-mail Aid
The E-mail Aid tool bundled with UCWin Gold 1.0 converts a text file into an AID file that is only a few KB larger than the original TXT file. Write the Chinese e-mail, save it as text, convert it to AID format with E-mail Aid, and attach the AID file to the letter together with E-mail Aid itself. The recipient only needs to run E-mail Aid and open the AID file to see the Chinese characters, with no garbled characters on any language platform.
(2) Save the Chinese e-mail as an image
Write the Chinese e-mail in Paint (Paintbrush) or other drawing software by typing the text into an image. Save it in the default BMP format with the attribute set to black and white (to reduce the size of the BMP image), compress it into ZIP format with WinZip, and send it as an e-mail attachment. No garbled characters will appear, whatever the recipient's language platform. The drawback of this method is that the resulting BMP file is large.
References: http://www.xyhhxx.com/news/IT/20050912213229.htm

**************************************** ********

Garbled characters often appear when pasting copied Chinese text (typing Chinese characters directly into the software works fine). This was really frustrating when I ran into it, but I later worked out the root cause.
Solution:
Change the default keyboard to Chinese (CH). Some people enable the English (EN) keyboard in the input method settings and make it the default. In other words, when copying and pasting, keep your input method set to Chinese (CH) rather than English (EN).

I suspect the main cause is that the current application's input method is in English (half-width) mode during copying. So the fix is to switch the input method of the VB window to Chinese input before copying.
You can also set Chinese as the default input language in Control Panel.

It is also possible that the software's Chinese dynamic link library has been overwritten by an English one; in that case, the software may need to be reinstalled.
