Should UTF-8 files use a BOM? Problems caused by DirectVobSub's lack of support for UTF-8 without BOM

Source: Internet
Author: User

I use DirectVobSub as the subtitle plug-in for my player.

When the subtitles are converted to UTF-8 without BOM, they are garbled during playback.

When the subtitles are converted to UTF-8 with BOM, they display correctly during playback.

It seems that DirectVobSub does not support UTF-8 without BOM.
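The workaround described above (re-saving the subtitle as UTF-8 with BOM) can be sketched in Python. This is a minimal sketch, not part of the original post: `add_utf8_bom` and `convert_subtitle` are hypothetical helper names, and it assumes the subtitle file is already valid UTF-8.

```python
import codecs
from pathlib import Path


def add_utf8_bom(data: bytes) -> bytes:
    """Return data with a leading UTF-8 BOM, adding it only if missing."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    # Raises UnicodeDecodeError if the payload is not valid UTF-8,
    # so we never tag non-UTF-8 bytes with a UTF-8 signature.
    data.decode("utf-8")
    return codecs.BOM_UTF8 + data


def convert_subtitle(path: str) -> None:
    """Rewrite the file at path in place as UTF-8 with BOM (hypothetical helper)."""
    p = Path(path)
    p.write_bytes(add_utf8_bom(p.read_bytes()))
```

Working on raw bytes rather than decoded text leaves line endings and everything else in the file untouched; only the three signature bytes are added.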

DirectVobSub (VSFilter) official website: http://sourceforge.net/projects/guliverkli2/files/DirectShow%20Filters/

 

Should UTF-8 files carry a BOM? What does the Unicode standard say?

I looked up some references:

http://zh.wikipedia.org/zh-cn/UTF-8#UTF-8.E7.9A.84.E8.A1.8D.E7.94.9F.E7.89.A9

Wikipedia says:

Although it is not standard, many Windows programs (including Windows Notepad) add the byte sequence EF BB BF at the beginning of a UTF-8-encoded file. This is the UTF-8 encoding of the byte order mark U+FEFF. Text editors and browsers that do not expect UTF-8 display it as the ISO-8859-1 string "ï»¿".
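The byte values quoted from Wikipedia can be checked directly. A small sketch using only the Python standard library:

```python
import codecs

# U+FEFF is the byte order mark; encode it as UTF-8.
bom = "\ufeff".encode("utf-8")

print(bom.hex(" "))               # ef bb bf
print(bom == codecs.BOM_UTF8)     # True
# Read the same three bytes as ISO-8859-1 instead of UTF-8.
print(bom.decode("iso-8859-1"))   # ï»¿
```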

 

Judging from Wikipedia, it seems that the BOM should not be used.

Going by the prejudice that "if Microsoft were reliable, pigs would climb trees", and given that Windows Notepad writes a BOM when saving as UTF-8 while gedit writes UTF-8 without BOM, my first thought was that the BOM should not be used.

Then I found http://unicode.org/faq/utf_bom.html#bom1

Unicode.org says:

Q: How I should deal with BOMs?

A: Here are some guidelines to follow:

  1. A particular protocol (e.g. Microsoft conventions for .txt files) may require use of the BOM on certain Unicode data streams, such as files. When you need to conform to such a protocol, use a BOM.

  2. Some protocols allow optional BOMs in the case of untagged text. In those cases,

    • Where a text data stream is known to be plain text, but of unknown encoding, BOM can be used as a signature. If there is no BOM, the encoding could be anything.

    • Where a text data stream is known to be plain Unicode text (but not which endian), then BOM can be used as a signature. If there is no BOM, the text should be interpreted as big-endian.

  3. Some byte oriented protocols expect ASCII characters at the beginning of a file. If UTF-8 is used with these protocols, use of the BOM as encoding form signature should be avoided.

  4. Where the precise type of the data stream is known (e.g. Unicode big-endian or Unicode little-endian), the BOM should not be used. In particular, whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used. See also [Q: What is the difference between UCS-2 and UTF-16?] [AF] & [MD]
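Using the BOM as a signature, as guideline 2 describes, could look roughly like the following. This is an illustrative sketch, not code from the FAQ; `detect_bom` is a hypothetical name, and the function only recognizes the UTF-8, UTF-16 and UTF-32 signatures.

```python
import codecs

# Longest signatures first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so it has to be tested before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


print(detect_bom(b"\xef\xbb\xbfhello"))  # ('utf-8', 3)
print(detect_bom(b"hello"))              # (None, 0)
```

One caveat of this ordering: UTF-16 LE text that begins with a BOM followed by U+0000 has the same first four bytes as the UTF-32 LE BOM, so the result is a guess in that rare case.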

My understanding is:

 

Use a BOM when a plain text file does not declare its encoding. Without a BOM, the encoding is hard to determine.
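To illustrate why the encoding is hard to determine without a BOM, the sketch below decodes the same BOM-less UTF-8 bytes under two other encodings. The sample word and the choice of ISO-8859-1 and GBK as the "wrong" guesses are assumptions made for the illustration; nothing here reproduces what DirectVobSub actually does internally.

```python
text = "字幕"                 # "subtitle"
raw = text.encode("utf-8")    # no BOM: nothing in the bytes names the encoding

as_utf8 = raw.decode("utf-8")
# ISO-8859-1 accepts every byte value, so this never fails; it just garbles.
as_latin1 = raw.decode("iso-8859-1")
# GBK may also accept the bytes; undecodable ones become U+FFFD.
as_gbk = raw.decode("gbk", errors="replace")

print(as_utf8 == text)    # True
print(as_latin1 == text)  # False
print(as_gbk == text)     # False
```

All three decodes succeed without raising an error, which is why a program that guesses wrong shows garbled text instead of failing outright.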

If the data declares its encoding, the BOM should not be used. Examples are data stored in a database (the encoding is declared in the database), XML (declared with encoding="UTF-8"), and HTML (declared with charset=UTF-8).
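Following this reading, content headed for a format that declares its own encoding can have the signature removed first. A minimal sketch, where `strip_utf8_bom` is a hypothetical helper and the XML fragment is made up for the example:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other bytes are left as-is."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Made-up XML payload that a BOM-writing editor might have produced.
xml = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><r/>'
print(strip_utf8_bom(xml)[:5])  # b'<?xml'
```

When the goal is decoded text rather than bytes, Python's built-in `utf-8-sig` codec does the same job on decoding: it drops a leading BOM if one is present and otherwise behaves like `utf-8`.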

 

Therefore, it is fine to use UTF-8 with BOM for plain text files.

This reminded me that various players on Linux (mplayer and smplayer) used to show garbled characters for UTF-8 subtitles. Was that because text editors on Linux (gedit, etc.) generate UTF-8 without BOM?

 

References:
    • On Unicode encoding: a brief explanation of terms such as UCS, UTF, BMP, and BOM
      http://blog.csdn.net/fmddlmyy/archive/2005/05/04/372148.aspx
    • How I should deal with BOMs?
      http://unicode.org/faq/utf_bom.html#bom1
    • Byte order mark (BOM)
      http://zh.wikipedia.org/zh-cn/%E4%BD%8D%E5%85%83%E7%B5%84%E9%A0%86%E5%BA%8F%E8%A8%98%E8%99%9F
