Encoding issues in UltraEdit, EditPlus, Notepad, and other editors

Source: Internet
Author: User
UltraEdit is a very powerful tool, but that power makes it a double-edged sword: used well it is a great help, used carelessly it causes a lot of confusion. Its handling of encodings is one such confusing area. I did a little research and am sharing the results here.

The main problem comes from UTF-8.

The method recommended by the Unicode specification for marking byte order is the BOM (byte order mark).

UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to indicate the encoding. If the receiver gets a byte stream starting with EF BB BF, it knows the stream is UTF-8 encoded.
Because the UTF-8 BOM is not universally supported, it causes a certain amount of incompatibility. The following describes how several major tools handle the BOM.
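The check described above, looking for EF BB BF at the start of a byte stream, can be sketched in a few lines of Python. This is only an illustration: the function name `has_utf8_bom` is made up for this example and is not taken from any of the editors discussed.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte stream starts with the UTF-8 BOM (EF BB BF)."""
    # codecs.BOM_UTF8 is the three-byte sequence b"\xef\xbb\xbf".
    return data.startswith(codecs.BOM_UTF8)


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + "中文".encode("utf-8")
    without_bom = "中文".encode("utf-8")
    print(has_utf8_bom(with_bom))
    print(has_utf8_bom(without_bom))
```

A receiver that finds the mark can treat the rest of the stream as UTF-8. A stream without the mark may still be UTF-8, so the absence of a BOM proves nothing by itself.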

1. Notepad

When you save in Notepad and select the UTF-8 format, it writes a BOM header at the start of the file. When reading a file, it checks for a BOM and analyzes whether the file contains Chinese characters, then picks the appropriate encoding.

2. Notepad++

You can choose among various formats, with or without a BOM.

3. EditPlus

When you save a file in UTF-8 format, no BOM header is written at the start of the file. On reading, UTF-8 is recognized.

4. UltraEdit

Under Advanced -> Configuration you can choose whether UltraEdit writes a BOM header when saving a file. On reading, if automatic UTF-8 detection is not enabled, or for some files without a BOM, the content is not displayed properly.

5. Eclipse

If the file's encoding is set to UTF-8, the file is saved without a BOM. Reading works normally.

6. vi

This refers to Vim on Linux. If a UTF-8 file starts with a BOM header, Vim displays the UTF-8 content normally; otherwise the display is garbled.
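To make the difference between the two kinds of saved files concrete, here is a small Python sketch that produces both variants and reads either one back. The helper names `encode_text` and `decode_text` are hypothetical. The only library feature relied on is Python's standard `utf-8-sig` codec, which writes a BOM when encoding and skips one, if present, when decoding.

```python
def encode_text(text: str, with_bom: bool) -> bytes:
    """Encode text as UTF-8, optionally prefixed with the EF BB BF mark."""
    # "utf-8-sig" prepends the BOM; plain "utf-8" does not.
    return text.encode("utf-8-sig" if with_bom else "utf-8")


def decode_text(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if there is one."""
    # "utf-8-sig" accepts input both with and without the BOM.
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    text = "编码 test"
    bom_file = encode_text(text, with_bom=True)    # BOM-style output, as Notepad is described to write
    plain_file = encode_text(text, with_bom=False) # BOM-free output, as EditPlus or Eclipse are described to write

    print(bom_file[:3].hex())                       # the three BOM bytes
    print(decode_text(bom_file) == decode_text(plain_file))

    # A reader that uses plain "utf-8" keeps the BOM as an invisible U+FEFF character.
    print(repr(bom_file.decode("utf-8")[0]))
```

A reader that decodes with plain `utf-8` keeps the mark as a leading U+FEFF character, which is one way tools that do not expect a BOM end up confused by it.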

 

Main problems with UltraEdit

1. If you create a new file and choose Save As UTF-8 without BOM, and the data contains neither Chinese characters nor the text charset=UTF-8, UE still saves the file in ANSI format no matter which option you pick. The encoding then does not change when you add Chinese characters later, so scripts generated by the Java build program contain garbled characters.

2. Even if the file is correct UTF-8 without BOM, if there are no Chinese characters in the first 9205 characters, UE stubbornly treats the file as ANSI, so the Chinese characters in it are garbled (tested with UE 13.10a). The workaround is to deliberately add a Chinese character somewhere within the first 9205 characters.

3. The troublesome automatic UTF-8 detection. Automatic UTF-8 detection can be enabled under Advanced -> Configuration -> Unicode/UTF-8 auto check. When it is enabled, UE appears to use three detection methods:

A) Check whether the file starts with the bytes [EF BB BF] (the BOM); if so, the file is considered UTF-8.

B) Check whether the file contains text similar to charset=UTF-8; if so, the file is considered UTF-8, which causes files actually stored in ANSI to be displayed garbled.

C) For a UTF-8 document without a BOM, check whether the first 9205 characters contain Chinese characters. If they do, the file is treated as UTF-8; if not, it is parsed as ANSI and any Chinese characters further on are garbled. Forcing UE to convert the file to UTF-8 at this point only makes things worse and the file is ruined. This check is not applied to files stored in ANSI format, so Chinese text in them displays normally.

4. When UE opens a UTF-8 file, it converts it to UTF-16 by default; this has little impact.
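The three checks described in item 3 can be modeled with a short Python sketch. This is a reconstruction of the behavior as the article describes it, not UltraEdit's actual code. The function name `guess_encoding` is hypothetical, and several details are assumptions: the 9205 limit is counted in bytes, "Chinese characters" is approximated as "any non-ASCII byte", and the charset text is searched for in the whole file.

```python
import codecs
import re

# Limit reported in the article for UE 13.10a; counting it in bytes is an assumption.
PREFIX_LIMIT = 9205

# Matches text such as "charset=UTF-8" or "charset = utf-8".
CHARSET_HINT = re.compile(rb"charset\s*=\s*utf-8", re.IGNORECASE)


def guess_encoding(data: bytes) -> str:
    """Model of the described heuristic; returns "utf-8" or "ansi"."""
    # A) A leading EF BB BF mark means UTF-8.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # B) Text resembling charset=UTF-8 means UTF-8, even if the bytes are really ANSI.
    if CHARSET_HINT.search(data):
        return "utf-8"
    # C) Only the prefix is inspected for non-ASCII bytes (assumed stand-in for Chinese text).
    if any(byte >= 0x80 for byte in data[:PREFIX_LIMIT]):
        return "utf-8"
    return "ansi"


if __name__ == "__main__":
    chinese = "中文".encode("utf-8")
    late = b"a" * PREFIX_LIMIT + chinese   # Chinese text only after the prefix
    early = chinese + b"a" * PREFIX_LIMIT  # Chinese text inside the prefix
    print(guess_encoding(late))
    print(guess_encoding(early))
```

Under these assumptions, a valid UTF-8 file whose first non-ASCII character comes after the prefix is classified as ANSI, which matches problem 2 above. Decoding the whole file strictly as UTF-8 and falling back only on failure would avoid this, at the cost of reading the entire file.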

 

Advice for users

1. When UE shows garbled text, adding a Chinese comment within the first 9205 characters solves the problem. Alternatively, use [Convert] -> Unicode/UTF-8 to UTF-8 (Unicode Editing) in UE's [File] menu to convert the file.

2. Do not use UE to create a UTF-8 without BOM file that contains no Chinese characters.

3. Do not take a file that is displayed garbled, delete the garbled text, add Chinese characters, and save it.

4. New UTF-8 without BOM files can be created with Eclipse, Notepad++, or EditPlus.

5. UTF-8 script files saved in Notepad can still be read by the Java build program, but Java source files should not be edited in Notepad, because the EF BB BF mark at the start of the file is not recognized.
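If a file has already been saved with a BOM (for example by Notepad, as in item 5) and a downstream tool does not accept the mark, one option is to remove the three leading bytes. The following Python sketch shows this; the names `strip_utf8_bom` and `strip_bom_in_file` are hypothetical helpers for illustration, not part of any tool mentioned above.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading EF BB BF mark; other bytes are untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file without its BOM. Returns True if the file was changed."""
    file_path = Path(path)
    original = file_path.read_bytes()
    stripped = strip_utf8_bom(original)
    if stripped == original:
        return False  # no BOM found, leave the file alone
    file_path.write_bytes(stripped)
    return True
```

Because the helper works on raw bytes and never decodes the content, it cannot introduce garbling of its own. Keep a backup before rewriting files in place.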
