Getting started with Linux: how to change the character encoding of text files in Linux

Source: Internet
Author: User

Problem: On my Linux system I have a subtitle file encoded in ISO-8859-1, and some of its characters are not displayed properly. I want to convert the text to UTF-8. Is there a good tool on Linux for converting the character encoding of text files?

As we know, computers can only process low-level binary values and cannot process characters directly. When a text file is stored, every character in it is mapped to binary values, and those binary values are what is actually written to the hard disk. When a program opens the text file, it reads the binary values and maps them back to the original readable characters. This save-and-open round trip works only when every program that accesses the file "understands" its encoding, that is, the mapping between binary values and characters.

If different programs use different encodings to process the same file, special characters in the file will not be displayed correctly. Special characters here means non-English characters, such as accented letters (for example ñ, á, ü).
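To make the mapping concrete, here is a minimal sketch that prints the bytes behind the single character "á" in two encodings. It assumes the standard `od` and `tr` utilities are available alongside `iconv`; the variable names are only illustrative.

```shell
# The character "á" written as its UTF-8 bytes (0xC3 0xA1). Octal escapes are
# used so the example does not depend on the encoding of this script itself.
utf8_bytes=$(printf '\303\241' | od -An -tx1 | tr -d ' \n')

# The same character after conversion to ISO-8859-1: a single byte, 0xE1.
latin1_bytes=$(printf '\303\241' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1 | tr -d ' \n')

echo "UTF-8:      $utf8_bytes"    # prints "UTF-8:      c3a1"
echo "ISO-8859-1: $latin1_bytes"  # prints "ISO-8859-1: e1"
```

A program that expects UTF-8 but is handed the lone byte e1 cannot map it back to "á", which is why such characters show up garbled.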

Two questions then arise: 1) How do we determine the character encoding used by a given text file? 2) How do we convert a file to a chosen character encoding?

Step 1

To determine the character encoding of a file, we use a command-line tool named "file". Because the file command is a standard UNIX program, we can find it in all modern Linux distributions.

Run the following command:

  $ file --mime-encoding filename
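As a small illustration of this step, the sketch below creates a throwaway ISO-8859-1 file and asks `file` about it. The sample text and variable names are assumptions made for the example. Keep in mind that `file` inspects the bytes and reports a best guess, so its answer may be wrong for short or ambiguous files.

```shell
# Create a small sample file containing "café" encoded as ISO-8859-1
# (the byte 0xE9, octal 351, is "é" in that encoding).
sample=$(mktemp)
printf 'caf\351\n' > "$sample"

# -b (brief) omits the file name, so only the encoding label is printed.
detected=$(file -b --mime-encoding "$sample")
echo "$detected"
```

For this sample the reported label should be iso-8859-1; a plain ASCII file is typically reported as us-ascii.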

Step 2

The next step is to check which encodings your Linux system supports. For this we use the tool named iconv with the "-l" option (a lowercase L), which lists all currently supported encodings.

  $ iconv -l

The iconv tool is part of the GNU C library (glibc), so it is available out of the box on virtually all Linux distributions.
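The list printed by iconv is long, so it can help to filter it. The following is a minimal sketch; `encoding_supported` is a hypothetical helper name, not part of iconv. It does a case-insensitive substring match, so "iso-8859-1" would also match "ISO-8859-10".

```shell
# Hypothetical helper: succeeds (exit status 0) if the given name
# appears somewhere in the list of encodings printed by "iconv -l".
encoding_supported() {
    iconv -l | grep -qi -- "$1"
}

if encoding_supported "utf-8"; then
    echo "UTF-8 is supported"
fi
```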

Step 3

After choosing a target encoding from those supported by your Linux system, run the following command to perform the conversion:

  $ iconv -f old_encoding -t new_encoding filename

For example, to convert from ISO-8859-1 to UTF-8:

  $ iconv -f iso-8859-1 -t UTF-8 input.txt
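Note that iconv writes the converted text to standard output and leaves the input file unchanged. The sketch below shows one way to save the result, using temporary files and sample text that are assumptions made for the example.

```shell
# Sample input: "café" in ISO-8859-1 (0xE9, octal 351, is "é").
input=$(mktemp)
output=$(mktemp)
printf 'caf\351\n' > "$input"

# Redirect the converted text to a *different* file. Redirecting onto the
# input file itself would truncate it before iconv has read it.
iconv -f ISO-8859-1 -t UTF-8 "$input" > "$output"

# Inspect the converted bytes: "é" is now the two-byte sequence c3 a9.
converted_hex=$(od -An -tx1 "$output" | tr -d ' \n')
echo "$converted_hex"   # prints 636166c3a90a
```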

Once you know how to use these tools, you can repair a subtitle file that is displayed incorrectly.
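Putting the steps together, the whole repair might look like the sketch below. The function name `convert_to_utf8`, the ".utf8" suffix and the file name in the usage note are all hypothetical, and the approach assumes that the encoding guessed by `file` is correct, which is not guaranteed.

```shell
# Hypothetical helper: convert one file to UTF-8, writing the result next to
# the original as "<name>.utf8" and leaving the original untouched.
convert_to_utf8() {
    src=$1
    # Step 1: ask file(1) for its best guess at the current encoding.
    from=$(file -b --mime-encoding "$src") || return 1
    case $from in
        utf-8|us-ascii)
            echo "$src: already $from, nothing to do"
            return 0
            ;;
        binary|unknown-8bit)
            echo "$src: encoding not recognized ($from)" >&2
            return 1
            ;;
    esac
    # Step 3: convert from the detected encoding to UTF-8.
    iconv -f "$from" -t UTF-8 "$src" > "$src.utf8"
}
```

For a subtitle file named, say, movie.srt, running `convert_to_utf8 movie.srt` would produce movie.srt.utf8. If the result still looks wrong, the detected encoding was probably a bad guess, and passing the correct source encoding to iconv by hand is the safer route.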
