How to change the character encoding of a text file on Linux

Source: Internet
Author: User

Computers store and process binary values, not characters. When a text file is saved, each character is mapped to a binary value, and those values are what is written to disk. When a program later opens the file, it reads the binary values and maps them back to readable characters. This save-and-open round trip works only when every program that accesses the file agrees on its encoding, that is, on the mapping between binary values and characters.
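To make the mapping concrete, the following sketch stores the same character, é, in two encodings and dumps the bytes that end up on disk. The scratch directory and file names are assumptions made for illustration; printf, od, and iconv (the conversion tool covered in the steps further on) are standard utilities.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)

# "é" stored as UTF-8 is the two bytes 0xC3 0xA9 (octal 303 251).
printf '\303\251' > "$workdir/utf8.txt"

# The same character stored as ISO-8859-1 is the single byte 0xE9.
iconv -f UTF-8 -t ISO-8859-1 "$workdir/utf8.txt" > "$workdir/latin1.txt"

# Dump the raw bytes of each file in hexadecimal.
utf8_bytes=$(od -An -tx1 "$workdir/utf8.txt" | tr -d ' \n')
latin1_bytes=$(od -An -tx1 "$workdir/latin1.txt" | tr -d ' \n')
echo "UTF-8:      $utf8_bytes"     # c3a9
echo "ISO-8859-1: $latin1_bytes"   # e9
```

One character, two different byte sequences: a program can only show é correctly if it knows which mapping was used when the file was written.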

If different programs use different encodings to process the same file, special characters in it will not display correctly. Special characters here means non-ASCII characters, such as accented letters (e.g., Á, ü).
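A small sketch of how such a mismatch looks in practice: the bytes of ü written as UTF-8 are deliberately read back as if they were ISO-8859-1. The variable name is an assumption; the byte values are those defined by the two encodings.

```shell
# "ü" saved as UTF-8 is the two bytes 0xC3 0xBC (octal 303 274).
# Decoding those bytes as ISO-8859-1 treats each byte as its own character,
# so the reader sees two wrong characters instead of one "ü".
garbled=$(printf '\303\274' | iconv -f ISO-8859-1 -t UTF-8)
echo "$garbled"    # displays "Ã¼" on a UTF-8 terminal
```

Text that shows pairs such as Ã¼ or Ã© where single accented letters should be is a typical sign that UTF-8 data was interpreted as a single-byte encoding.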

This raises two questions: (1) How do we determine which character encoding a text file uses? (2) How do we convert the file to an encoding of our choice?

  Step One

To determine the character encoding of a file, we use a command-line tool called file. Because file is a standard UNIX program, it is available on virtually all modern Linux distributions.

Run the following command:

$ file --mime-encoding filename
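As a hedged illustration, the sketch below builds a throwaway sample file in ISO-8859-1 and asks file for its encoding. The sample text and variable names are assumptions; -b is the standard "brief" option of file, which omits the file name from the output.

```shell
# Create a small sample file containing "café au lait" encoded as
# ISO-8859-1 (0xE9, octal 351, is "é" in that encoding).
sample=$(mktemp)
printf 'caf\351 au lait\n' > "$sample"

# -b (brief) leaves only the detected encoding in the output.
detected=$(file -b --mime-encoding "$sample")
echo "$detected"
```

For this sample, file typically reports iso-8859-1. Keep in mind that the result is a guess based on the bytes in the file: pure ASCII text is reported as us-ascii, and many single-byte encodings cannot be told apart reliably.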

  Step Two

The next step is to see which encodings your Linux system supports. To do this, we use the iconv tool with the -l option (a lowercase L) to list all supported encodings.

The code is as follows:

$ iconv -l

The iconv tool is part of the GNU C library (glibc), so it is available out of the box on most Linux distributions.
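The list is long, so it is usually easier to search it than to read it. A minimal sketch, assuming a glibc-style iconv that supports -l; the variable name is illustrative.

```shell
# Rough size of the list (lines of output, aliases included).
iconv -l | wc -l

# Check whether a particular encoding is available (case-insensitive match).
if iconv -l | grep -qi 'iso-8859-1'; then
    supported=yes
else
    supported=no
fi
echo "ISO-8859-1 supported: $supported"
```

The same grep pattern works for any encoding name you intend to pass to -f or -t.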

  Step Three

After choosing a target encoding from the list of encodings your system supports, run the following command to perform the conversion:

The code is as follows:

$ iconv -f old_encoding -t new_encoding filename

For example, to convert a file from ISO-8859-1 to UTF-8 (iconv writes the result to standard output; redirect it to a file to save it):

The code is as follows:

$ iconv -f iso-8859-1 -t utf-8 input.txt
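The following sketch runs the conversion end to end on a throwaway file and checks that nothing was lost. The directory, file names, and sample text are assumptions for illustration. The output goes to a separate file on purpose: redirecting into the input file itself would truncate it before iconv reads it.

```shell
# Build a sample input file in ISO-8859-1: "Ábaco über" plus a newline.
# (0xC1 = "Á", 0xFC = "ü" in ISO-8859-1; written here as octal escapes.)
dir=$(mktemp -d)
printf '\301baco \374ber\n' > "$dir/input.txt"

# Convert to UTF-8, writing the result to a separate output file.
iconv -f ISO-8859-1 -t UTF-8 "$dir/input.txt" > "$dir/output.txt"

# Converting back should reproduce the original bytes exactly.
if iconv -f UTF-8 -t ISO-8859-1 "$dir/output.txt" | cmp -s - "$dir/input.txt"; then
    roundtrip=ok
else
    roundtrip=failed
fi
echo "round trip: $roundtrip"
```

If iconv meets a byte sequence that is invalid in the source encoding, or a character that does not exist in the target encoding, it stops with an error and a non-zero exit status, so it is worth checking the result before deleting the original.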

Once you've learned how to use these tools, you can repair a subtitle file whose characters display incorrectly.
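The steps can be combined into a small helper. This is only a sketch under stated assumptions: to_utf8 and demo.srt are hypothetical names, the detection relies on the heuristic guess made by file, and the original is kept as a .bak copy in case that guess is wrong.

```shell
# Hypothetical helper: convert one text file to UTF-8 in place,
# keeping the original as FILE.bak.
to_utf8() {
    src=$1
    # Step One: detect the current encoding (a heuristic guess).
    enc=$(file -b --mime-encoding "$src") || return 1
    case $enc in
        utf-8|us-ascii) return 0 ;;         # nothing to convert
        binary|unknown-8bit) return 1 ;;    # no usable guess
    esac
    # Step Three: convert into a temporary file, then replace the original.
    tmp=$(mktemp) || return 1
    if iconv -f "$enc" -t UTF-8 "$src" > "$tmp"; then
        cp "$src" "$src.bak" && cat "$tmp" > "$src"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Demo on a made-up subtitle file stored as ISO-8859-1.
subs=$(mktemp -d)/demo.srt
printf '1\n00:00:01,000 --> 00:00:02,000\nCaf\351 ol\351\n' > "$subs"
to_utf8 "$subs" && echo "converted $subs"
```

If the converted text still looks wrong, the detected encoding was probably not the real one; restore the .bak copy and pass the correct source encoding to iconv -f by hand.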
