Encoding issues in UltraEdit, EditPlus, Notepad, and similar editors

Last Update:2018-12-07 Source: Internet

Author: User

Tags ultraedit

UltraEdit is a very powerful tool, but that power is a double-edged sword: used well, it is excellent; used carelessly, it can leave you with many puzzles. Its handling of character encodings is particularly confusing, so I did a little research and am sharing the results here.

The main problem comes from UTF-8.

The Unicode standard recommends marking byte order with a BOM (byte order mark), the character U+FEFF placed at the start of a text stream.

UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to indicate the encoding: if a receiver gets a byte stream that starts with EF BB BF (U+FEFF encoded in UTF-8), it knows the stream is UTF-8.
Because the UTF-8 BOM is not universally supported, it causes incompatibilities in some situations. The following describes how several major tools handle the BOM.
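As a minimal sketch of that check (Python is my choice here, since the article itself contains no code), the snippet below tests for the three BOM bytes and decodes text with or without them. `has_utf8_bom` and `decode_utf8` are hypothetical names; `codecs.BOM_UTF8` and the `utf-8-sig` codec, which drops one leading BOM when decoding, are part of Python's standard library.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte stream starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 text, dropping a leading BOM if there is one."""
    # "utf-8-sig" strips a single leading BOM and otherwise decodes like "utf-8"
    return data.decode("utf-8-sig")

without_bom = "中文".encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom

print(has_utf8_bom(with_bom))                             # True
print(has_utf8_bom(without_bom))                          # False
print(decode_utf8(with_bom) == decode_utf8(without_bom))  # True
# Plain "utf-8" keeps the BOM as an invisible U+FEFF character
print(with_bom.decode("utf-8")[0] == "\ufeff")            # True
```

The last line shows why a stray BOM causes trouble: a reader that decodes with plain UTF-8 sees an extra invisible character at the start of the text.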

1. Notepad

When you save a file in Notepad with the UTF-8 format selected, Notepad writes a BOM at the start of the file. When reading a file, it checks for a BOM and for Chinese characters in the content, and then chooses the encoding accordingly.
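Saving with or without a BOM differs only in those three leading bytes. The sketch below assumes Python's `utf-8-sig` codec as a stand-in for Notepad's UTF-8 save option and plain `utf-8` as a stand-in for editors that save without a BOM; `save_text` is a hypothetical helper, not any editor's API.

```python
import os
import tempfile

def save_text(path: str, text: str, with_bom: bool) -> None:
    """Hypothetical helper: save `text` as UTF-8, optionally with a BOM."""
    # "utf-8-sig" writes EF BB BF before the first character; "utf-8" writes none
    encoding = "utf-8-sig" if with_bom else "utf-8"
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)

text = "hello 中文"
with tempfile.TemporaryDirectory() as tmp:
    bom_path = os.path.join(tmp, "with_bom.txt")
    plain_path = os.path.join(tmp, "no_bom.txt")
    save_text(bom_path, text, with_bom=True)
    save_text(plain_path, text, with_bom=False)
    # Read both files back as raw bytes to compare them
    with open(bom_path, "rb") as f:
        bom_bytes = f.read()
    with open(plain_path, "rb") as f:
        plain_bytes = f.read()

print(bom_bytes[:3].hex())           # efbbbf
print(bom_bytes[3:] == plain_bytes)  # True
```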

2. Notepad++

You can choose among various encodings, each with or without a BOM.

3. EditPlus

When you save a file in UTF-8 format, EditPlus does not write a BOM at the start of the file. When reading, it can recognize UTF-8 files.
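Recognizing UTF-8 without a BOM is necessarily a heuristic. One common approach, not necessarily the one EditPlus uses, is to try a strict UTF-8 decode and fall back to the legacy code page if it fails. A sketch, assuming GBK (code page 936, the ANSI code page on Simplified Chinese Windows) as the fallback; `guess_encoding` is a hypothetical name.

```python
import codecs

def guess_encoding(data: bytes, fallback: str = "gbk") -> str:
    """Hypothetical heuristic: BOM first, then strict UTF-8, then a legacy code page."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback

print(guess_encoding("中文".encode("utf-8")))  # utf-8
print(guess_encoding("中文".encode("gbk")))    # gbk
print(guess_encoding(b"plain ASCII"))          # utf-8
```

The last case shows the limitation behind the UltraEdit problems below: pure-ASCII bytes are valid in both encodings, so the content alone cannot say which one the author intended.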

4. UltraEdit

Under Advanced > Configuration, you can choose whether to write a BOM when saving a file. When reading, if automatic UTF-8 detection is not enabled, some files without a BOM are not displayed correctly.

5. Eclipse

If you have set the file's encoding to UTF-8, the file is saved without a BOM and is read back normally.

6. vi

This refers to Vim on Linux. If a UTF-8 file starts with a BOM, Vim displays it correctly as UTF-8; otherwise, the text is garbled.

Main issues with UltraEdit

1. If you create a new file and save it as UTF-8 without a BOM, but the content contains no Chinese characters and no charset=UTF-8 declaration, UE still saves the file as ANSI no matter how you save it. The encoding then stays ANSI when you add Chinese characters later, so the scripts generated by the Java build program contain garbled characters.

2. If a file is correctly encoded as UTF-8 without a BOM but has no Chinese characters in its first 9205 characters, UE stubbornly treats it as ANSI, so the Chinese characters later in the file are garbled (tested with UE 13.10a). The workaround is to add a Chinese character within the first 9205 characters.

3. Confusing automatic UTF-8 detection. There is an option for automatic UTF-8 detection under Advanced > Configuration > Unicode/UTF-8 auto check. If it is enabled, my analysis suggests that UE uses three detection methods:

A) If the file starts with the bytes EF BB BF (the BOM), UE treats it as UTF-8.

B) If the file contains text such as charset=UTF-8, UE treats it as UTF-8. As a result, files saved in ANSI that contain such text are displayed garbled.

C) For a UTF-8 file without a BOM, UE checks whether the first 9205 characters contain Chinese characters. If they do, the file is parsed as UTF-8; if not, it is parsed with the ANSI encoding, and any later Chinese characters are garbled. If you then force UE to convert the file to UTF-8, the garbled text is made worse and the file is ruined. This check does not affect files stored in ANSI, whose Chinese text displays normally.

4. When UE opens a UTF-8 file, it converts it to UTF-16 by default; this has little impact.
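Checks A to C in item 3 can be simulated to show why a Chinese character that appears late in the file is missed. This is a hypothetical re-creation based only on the behavior described above, not UltraEdit's actual code; it assumes the 9205-character window counts bytes and that the charset search covers the whole file.

```python
import codecs
import re

# Window size reported in the article for UE 13.10a; treating it as bytes is an assumption
WINDOW = 9205
# Matches text such as charset=UTF-8 or charset="utf-8" (case-insensitive)
CHARSET_UTF8 = re.compile(rb"charset\s*=\s*[\"']?utf-8", re.IGNORECASE)

def ue_like_detect(data: bytes, window: int = WINDOW) -> str:
    """Hypothetical re-creation of checks A-C; not UltraEdit's actual code."""
    # A) BOM at the very start of the file
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # B) a charset=UTF-8 declaration anywhere in the file (search scope assumed)
    if CHARSET_UTF8.search(data):
        return "utf-8"
    # C) only the first `window` bytes are inspected; all-ASCII there means "ANSI"
    if data[:window].isascii():
        return "ansi"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "ansi"

# Valid UTF-8, but the first non-ASCII character sits past the window (item 2)
late = ("x" * 10_000 + "中文").encode("utf-8")
# The workaround from item 2: a Chinese comment near the top of the file
early = ("// 中文\n" + "x" * 10_000 + "中文").encode("utf-8")
# An ANSI (GBK) file that merely mentions charset=utf-8 (check B)
ansi_with_decl = '<meta charset="utf-8"> 中文'.encode("gbk")

print(ue_like_detect(late))            # ansi
print(ue_like_detect(early))           # utf-8
print(ue_like_detect(ansi_with_decl))  # utf-8 (misdetected)
# Reading the UTF-8 bytes as GBK garbles the trailing Chinese text
print(late.decode("gbk", errors="replace")[-3:])
```

Under these assumptions, the only difference between `late` and `early` is where the first non-ASCII byte falls, which is exactly why adding a Chinese comment near the top works around the problem.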

Recommendations for users

1. When UE displays garbled text, you can fix it by adding a Chinese comment within the first 9205 characters, or by converting the file with [File] > [Convert] > Unicode/UTF-8 to UTF-8 (UNICODE edit) in UE.

2. Do not use UE to create a UTF-8 file without a BOM that contains no Chinese characters.

3. Do not try to repair a garbled file by deleting the garbled text, adding Chinese characters, and saving it.

4. To create new UTF-8 files without a BOM, use Eclipse, Notepad++, or EditPlus.

5. UTF-8 script files saved in Notepad can still be read by the Java build program, but Java source files should not be edited with Notepad, because the EF BB BF marker it writes at the start of the file is not recognized.
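A Java source file that already has a BOM can be repaired by removing the three leading bytes, which leaves ordinary UTF-8 without a BOM. A minimal sketch, with hypothetical helper names and the assumption that the affected files are `.java` sources under one directory; it rewrites files in place, so run it on a copy or under version control.

```python
import codecs
import tempfile
from pathlib import Path
from typing import List

def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from `path` in place; return True if one was removed."""
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True

def strip_boms_under(root: Path, pattern: str = "*.java") -> List[Path]:
    """Strip BOMs from every file under `root` matching `pattern`; return the files changed."""
    return [p for p in sorted(root.rglob(pattern)) if p.is_file() and strip_utf8_bom(p)]

# Usage: one file saved with a BOM (as Notepad would), one without
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "A.java").write_bytes(codecs.BOM_UTF8 + "class A {} // 中文".encode("utf-8"))
    (root / "B.java").write_bytes(b"class B {}")
    changed = strip_boms_under(root)
    a_bytes = (root / "A.java").read_bytes()

print([p.name for p in changed])  # ['A.java']
print(a_bytes[:7])                # b'class A'
```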

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
