Getting started with Linux: How to change the character encoding of a text file
Question: My Linux system has a subtitle file encoded in ISO-8859-1, and some of its characters are not displayed correctly. I want to convert the text to UTF-8. Is there a good Linux tool for converting the character encoding of text files?
As we know, computers can only process low-level binary values; they cannot process characters directly. When a text file is saved, every character in it is mapped to a binary value, and those binary values are what is actually stored on the disk. When a program opens a text file, it reads the binary values and maps them back to the original readable characters. This save-and-open round trip works only if every program that accesses the file understands its encoding, that is, the mapping between binary values and characters. A shared encoding is what keeps the data readable across the round trip.
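To see this mapping on a real system, the commands below print the raw bytes behind a couple of characters using od. The sample characters are arbitrary, and the last line borrows iconv (covered in Step 2) to show that the same character maps to different bytes under a different encoding. This is a small illustration, not part of the conversion procedure itself.

```shell
#!/bin/sh
# Print the bytes behind a few characters as hex.
# printf's octal escapes emit exact bytes, so this does not depend on the terminal locale.
printf 'A' | od -An -tx1           # "A" in ASCII/UTF-8: one byte, 41
printf '\303\261' | od -An -tx1    # "ñ" in UTF-8: two bytes, c3 b1

# The same character in ISO-8859-1 is a different byte sequence:
# ask iconv to re-encode the UTF-8 "ñ" as ISO-8859-1.
printf '\303\261' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1    # one byte, f1
```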
If different programs use different encodings to process the same file, special characters in the file will not be displayed correctly. Here, special characters means non-English characters, such as letters with accents or other diacritical marks (for example, ñ, á, ü).
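A minimal sketch of what such a mismatch looks like, using iconv to play the role of a program that assumes the wrong encoding. The choice of "ñ" is arbitrary; the byte values in the comments are standard UTF-8 and ISO-8859-1 encodings.

```shell
#!/bin/sh
# UTF-8 bytes for "ñ" (c3 b1) read by a program that assumes ISO-8859-1:
# each byte becomes its own Latin-1 character, so "ñ" turns into "Ã±".
printf '\303\261\n' | iconv -f ISO-8859-1 -t UTF-8

# The opposite mistake: the ISO-8859-1 byte for "ñ" (f1) is not valid UTF-8,
# so a strict UTF-8 decoder such as iconv rejects it with an error.
printf '\361\n' | iconv -f UTF-8 -t UTF-8 > /dev/null || echo "not valid UTF-8"
```

The first case is the familiar "mojibake" seen in misread files; the second is why a viewer expecting UTF-8 may show replacement symbols instead of accented letters.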
This raises two questions: 1) How do we determine the character encoding of a given text file? 2) How do we convert a file to a chosen character encoding?
Step 1
To determine the character encoding of a file, we use a command-line tool called file. Because file is a standard UNIX program, you can find it on practically all modern Linux distributions.
Run the following command:
- $ file --mime-encoding filename
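A self-contained way to try this is to create sample files whose bytes are known and then ask file about them. The file names below are hypothetical. On a typical system file reports iso-8859-1, utf-8 and us-ascii for the three files respectively. Keep in mind that file infers the encoding from the content, so the answer is a guess: for instance, it cannot tell ISO-8859-1 apart from other single-byte encodings such as ISO-8859-15 that use the same byte range.

```shell
#!/bin/sh
# Create three small sample files with known encodings.
# \361 is "ñ" in ISO-8859-1; \303\261 is "ñ" in UTF-8.
printf 'Espa\361a\n'     > sample-latin1.txt
printf 'Espa\303\261a\n' > sample-utf8.txt
printf 'Spain\n'         > sample-ascii.txt

# Report the detected encoding of each file ("filename: encoding").
file --mime-encoding sample-latin1.txt sample-utf8.txt sample-ascii.txt

# -b (--brief) omits the "filename:" prefix, which is handy in scripts.
file -b --mime-encoding sample-latin1.txt
```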
Step 2
The next step is to check which encodings your Linux system supports. For this, we use a tool called iconv with the "-l" option (a lowercase L) to list all currently supported encodings.
- $ iconv -l
The iconv tool is part of the GNU C Library (glibc), so it is available out of the box on standard Linux distributions.
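The full list is long, and its exact layout differs between iconv implementations, so piping it through grep is a convenient way to check whether a particular encoding is available. The two encoding names searched for here are just the ones from this article's example.

```shell
#!/bin/sh
# -i ignores case; -w matches whole words, so "ISO-8859-1" does not
# also match ISO-8859-10, ISO-8859-11, and so on.
iconv -l | grep -iw 'ISO-8859-1'
iconv -l | grep -iw 'UTF-8'
```

If grep prints nothing for an encoding, that name is not known to your iconv; try an alias (for example, searching for "8859" instead) before concluding it is unsupported.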
Step 3
Once you have chosen a target encoding from those your system supports, run the following command to convert the file:
- $ iconv -f old_encoding -t new_encoding filename
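iconv prints the converted text on standard output and leaves the input file untouched. A common pitfall is redirecting the output straight back to the input (`> filename`), which makes the shell truncate the file before iconv reads it. The sketch below converts a file "in place" by writing to a temporary file first; the file name, the `.tmp` suffix and the sample data are placeholders chosen for illustration, not part of the original instructions.

```shell
#!/bin/sh
# Sketch: convert a file "in place" without losing it if iconv fails.
# All names and encodings here are placeholders; substitute your own.
old_encoding=ISO-8859-1
new_encoding=UTF-8
filename=notes.txt
tmp="$filename.tmp"

# Demo data only: "café" in ISO-8859-1, where é is the single byte 0xE9 (octal 351).
printf 'caf\351\n' > "$filename"

# Write to a temporary file, then replace the original only on success.
if iconv -f "$old_encoding" -t "$new_encoding" "$filename" > "$tmp"; then
    mv "$tmp" "$filename"
else
    rm -f "$tmp"
    echo "conversion failed; $filename left unchanged" >&2
    exit 1
fi
```

Checking iconv's exit status matters because it stops with an error when the input contains bytes that are invalid in the source encoding.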
For example, to convert a file from ISO-8859-1 to UTF-8:
- $ iconv -f iso-8859-1 -t UTF-8 input.txt
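The example can be reproduced end to end by creating a hypothetical input.txt with known ISO-8859-1 content, saving iconv's output to a new file, and checking the result with file and od.

```shell
#!/bin/sh
# Hypothetical input.txt holding "España" in ISO-8859-1 (ñ is the single byte 0xF1).
printf 'Espa\361a\n' > input.txt

# iconv writes the converted text to standard output; save it to a new file.
iconv -f iso-8859-1 -t UTF-8 input.txt > output.txt

# Verify: the encoding file detects, and the raw bytes of the new file.
file -b --mime-encoding output.txt
od -An -tx1 output.txt
```

In output.txt the "ñ" is now the two-byte UTF-8 sequence c3 b1 instead of the single byte f1, while the plain ASCII letters are unchanged.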
Now that you know how to use these tools, you can fix a damaged subtitle file as follows: