Common encoding problems on Linux
If you work on Linux with files that came from Windows, you will often run into file encoding conversion problems. On Chinese-language Windows the default text encoding is GBK (a superset of GB2312), while Linux generally uses UTF-8.
Viewing a file's encoding
Method one: file filename
Method two: the enca command
Method three: view the file encoding directly in Vim
:set fileencoding
If you only want to view files in other encodings, or want to fix garbled text when opening files in Vim, you can
add the following to your ~/.vimrc file:
set encoding=utf-8 fileencodings=ucs-bom,utf-8,cp936
With this setting Vim identifies the file encoding automatically (it can tell UTF-8 and GBK files apart). What actually happens is that Vim tries each encoding in the fileencodings list in order and uses the first one that decodes the file without errors. A common convention is to end the list with latin1, which accepts any byte sequence and so acts as a last-resort fallback.
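The in-order trial described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Vim's actual code; the helper name guess_encoding and the candidate list (with latin-1 added as a final fallback) are assumptions made for the example.

```python
import codecs

# Hypothetical candidate list, loosely mirroring fileencodings=ucs-bom,utf-8,cp936
# plus a latin-1 fallback. cp936 is Python's alias for its GBK codec.
CANDIDATES = ["utf-8", "cp936", "latin-1"]


def guess_encoding(data: bytes, candidates=CANDIDATES) -> str:
    """Return the first candidate encoding that decodes `data` without error."""
    # A UTF-8 byte order mark is checked first, similar in spirit to "ucs-bom".
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
    raise ValueError("no candidate encoding fits")
```

The order matters: latin-1 accepts every byte sequence, so if it came first the stricter encodings would never be tried.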
File encoding conversion
Cross-platform tools:
iconv: a standard program and API for encoding conversion;
convert_encoding.py: a Python-based text file conversion tool;
decodeh.py: an algorithm and module for heuristically guessing the character encoding of text;
Converting file encodings on Linux:
Method one:
Convert the file encoding directly in Vim, for example to convert a file to UTF-8:
:set fileencoding=utf-8
Or, to convert many files at once:
1) Define the set of files to operate on. Wildcards are allowed; for example, I usually convert the encoding of C++ source files:
:args *.h *.cpp
2) Give the command to run on each file, here the encoding conversion:
:argdo set fenc=utf-8 | update
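The same batch idea (pick a set of files, then re-encode each one in place) can be sketched outside Vim. The following is a minimal Python sketch, not an equivalent of argdo: the function name convert_dir is hypothetical, and it assumes every matching file that is not already valid UTF-8 really is GBK.

```python
from pathlib import Path


def convert_dir(root, patterns=("*.h", "*.cpp"), src="gbk", dst="utf-8"):
    """Re-encode matching files in `root` in place; return the rewritten paths."""
    converted = []
    for pattern in patterns:
        for path in sorted(Path(root).glob(pattern)):
            raw = path.read_bytes()
            try:
                raw.decode(dst)  # already valid in the target encoding: skip it
                continue
            except UnicodeDecodeError:
                pass
            text = raw.decode(src)  # raises if the file is not really `src`
            path.write_bytes(text.encode(dst))
            converted.append(path)
    return converted
```

Skipping files that already decode as UTF-8 avoids converting a file twice, which would corrupt it. Working on bytes also leaves line endings untouched.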
In fact, garbled text is caused by the system's character set configuration: when the correct character set is not in use, the OS cannot interpret the text and displays it garbled. The fix is not difficult.
First, note that the environment variables controlling the language environment on Linux are $LANG and $LC_ALL. To fix garbled text we only need to set these two variables correctly.
Garbled text comes in two situations:
1. Garbled text in the terminal (plain shell interface)
vi /etc/profile
export LC_ALL="zh_CN.GB18030:zh_CN.GB2312:zh_CN.GBK:zh_CN:en_US.UTF-8:en_US:en:zh:zh_TW:zh_CN.BIG5"
Save and exit, then reboot the system.
2. Garbled text in X Window (graphical interface)
vi /etc/sysconfig/i18n
LANG="zh_CN.GB18030:zh_CN.GB2312:zh_CN.GBK:zh_CN:en_US.UTF-8:en_US:en:zh:zh_TW:zh_CN.BIG5"
LANGUAGE="zh_CN.GB18030:zh_CN.GB2312:zh_CN.GBK:zh_CN:en_US.UTF-8:en_US:en:zh:zh_TW:zh_CN.BIG5"
Save and reboot.
Because there are many Chinese character set encodings and I am not entirely clear about how compatible they are with each other, I listed as many of them as I could find; you can trim the list yourself. The general solution is to change the variables that control the environment so that the OS supports more character sets (provided the character set exists in the kernel; otherwise the kernel has to be recompiled).
The web system under development is deployed on Red Hat.
RH version information:
LSB Version: :core-3.1-amd64:core-3.1-ia32:core-3.1-noarch:graphics-3.1-amd64:graphics-3.1-ia32:graphics-3.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 5 (Tikanga)
Release: 5
Codename: Tikanga
-------------------------------
Locale information
LANG=zh_CN.UTF-8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=
---------------------------------
The program directory contains a number of files whose names have to be read and shown on a page, and the file names are in Chinese.
I used the File.list() method to get the list of file names, but they are displayed garbled.
new String(fileName.getBytes("utf-8"), "GBK");
new String(fileName.getBytes("iso-8859-1"), "GBK");
new String(fileName.getBytes(), "GBK");
None of these work.
System.getProperty("file.encoding") returns "UTF-8".
In addition, the file names are garbled when listed with ls, but ls --show-control-chars displays the Chinese names correctly (on the console).
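One possible explanation for why the re-encoding attempts above cannot help, assuming (the post does not confirm this) that the names are stored on disk as GBK bytes and were decoded as UTF-8 when they were read: a failed UTF-8 decode replaces the invalid bytes with U+FFFD, so the original bytes are already lost. The Python sketch below illustrates that effect with a made-up file name; it is an analogy, not the Java code path itself.

```python
name = "报告.txt"                      # hypothetical file name
gbk_bytes = name.encode("gbk")         # assumed on-disk bytes of the name

# Decoding GBK bytes as UTF-8 substitutes U+FFFD for invalid bytes: lossy.
lossy = gbk_bytes.decode("utf-8", errors="replace")
# Analogue of: new String(s.getBytes("utf-8"), "GBK")
attempt = lossy.encode("utf-8").decode("gbk", errors="replace")

# Decoding through Latin-1 keeps every byte, so a round trip is possible.
lossless = gbk_bytes.decode("latin-1")
# Analogue of: new String(s.getBytes("iso-8859-1"), "GBK")
repaired = lossless.encode("latin-1").decode("gbk")

print(attempt == name)   # the lossy path cannot be undone
print(repaired == name)  # the byte-preserving path can
```

Under this assumption, the iso-8859-1 trick only works when the bytes were originally decoded as ISO-8859-1; once they have been decoded as UTF-8 with replacement, no re-encoding recovers them.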
Addendum: add the locale. My guess is that your system does not support the GBK character set.
On Ubuntu, edit the file with vi /var/lib/locales/supported.d/local
After adding the entry, run locale-gen to regenerate the locale cache.
If you work on Linux with files that came from Windows, you will often run into file encoding conversion problems. On Chinese-language Windows the default text encoding is GBK (a superset of GB2312), while Linux generally uses UTF-8. The following describes how to view a file's encoding on Linux and how to convert it.
One, viewing the file encoding:
There are several ways to view a file's encoding on Linux:
1. View the file encoding directly in Vim
:set fileencoding
This displays the file's encoding.
If you only want to view files in other encodings, or want to fix garbled text when opening files in Vim, you can
add the following to your ~/.vimrc file:
set encoding=utf-8
set fileencodings=ucs-bom,utf-8,cp936
With this setting Vim identifies the file encoding automatically (it can tell UTF-8 and GBK files apart). What actually happens is that Vim tries each encoding in the fileencodings list in order and uses the first one that decodes the file without errors. A common convention is to end the list with latin1, which accepts any byte sequence and so acts as a last-resort fallback.
2. Use enca to view the file encoding (if the command is not installed on your system, install it with sudo yum install -y enca)
$ enca filename
filename: Universal transformation format 8 bits; UTF-8
  CRLF line terminators
Note that enca does not identify some GBK-encoded files well; for those it reports:
Unrecognized encoding
Two, file encoding conversion
1. Convert the file encoding directly in Vim, for example to convert a file to UTF-8:
:set fileencoding=utf-8
2. Convert with iconv. The iconv command format is:
iconv -f encoding -t encoding inputfile
For example, to convert a GBK-encoded file to UTF-8:
iconv -f GBK -t UTF-8 file1 -o file2
3. Convert the file encoding with enconv
For example, to convert a GBK-encoded file to UTF-8:
enconv -L zh_CN -x UTF-8 filename
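For readers who prefer a script, the file-to-file conversion that iconv performs can be approximated with Python's standard library. This is a rough sketch under the assumption that the input really is in the stated source encoding; the function name convert_file is made up for the example.

```python
import shutil


def convert_file(src_path, dst_path, from_enc="gbk", to_enc="utf-8"):
    """Copy src_path to dst_path, re-encoding the text from from_enc to to_enc."""
    # newline="" turns off newline translation, so CRLF line endings survive.
    with open(src_path, "r", encoding=from_enc, newline="") as src, \
         open(dst_path, "w", encoding=to_enc, newline="") as dst:
        shutil.copyfileobj(src, dst)  # copies in chunks, so large files are fine
```

Like iconv with -o, this writes to a separate output file and leaves the input untouched; a decoding error is raised if the input does not match from_enc.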
Three, file name encoding conversion:
When files are copied from Linux to Windows or from Windows to Linux, Chinese file names sometimes come out garbled. The cause is that Windows encodes Chinese file names as GBK by default, while Linux encodes file names as UTF-8 by default. Because the encodings differ, the names are garbled, and fixing this requires transcoding the file names.
Linux has a dedicated tool for file name encoding conversion, convmv, which can convert file names from GBK to UTF-8 or from UTF-8 to GBK.
First check whether convmv is installed on your system. If it is not, install it with:
yum -y install convmv
The usage of convmv is as follows:
convmv -f source_encoding -t new_encoding [options] filename
Common options:
-r process subdirectories recursively
--notest actually perform the operation; note that by default convmv does not touch the files and only does a test run.
--list list all supported encodings
--unescape unescape file names, for example turn %20 into a space
For example, to convert a UTF-8 encoded file name to GBK, the command is:
convmv -f UTF-8 -t GBK --notest "UTF8 encoded file name"
After this conversion, "UTF8 encoded file name" is GBK encoded (only the encoding of the file name is converted; the contents of the file do not change).
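To make the idea concrete, here is a minimal Python sketch of file name transcoding in the GBK to UTF-8 direction. It is not how convmv is implemented; the function names plan_renames and convert_names are hypothetical, and the notest parameter merely imitates convmv's dry-run-by-default behaviour.

```python
import os


def plan_renames(raw_names, from_enc="gbk"):
    """Return (old, new) byte-name pairs for names that are not valid UTF-8."""
    plan = []
    for raw in raw_names:
        try:
            raw.decode("utf-8")
            continue  # already valid UTF-8 (or plain ASCII): nothing to do
        except UnicodeDecodeError:
            pass
        plan.append((raw, raw.decode(from_enc).encode("utf-8")))
    return plan


def convert_names(directory, from_enc="gbk", notest=False):
    """Plan renames in `directory`; perform them only when notest is True."""
    dir_bytes = os.fsencode(directory)
    # Listing with a bytes path returns the raw bytes of each name.
    plan = plan_renames(sorted(os.listdir(dir_bytes)), from_enc)
    if notest:
        for old, new in plan:
            os.rename(os.path.join(dir_bytes, old), os.path.join(dir_bytes, new))
    return plan
```

Working with bytes paths matters here: the names on disk are not valid in the system encoding, so they have to be handled as raw bytes until they are decoded with the assumed source encoding.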
Four, setting Vim's encoding options
Like all popular text editors, Vim handles files in many character encodings well, including popular Unicode encodings such as UCS-2 and UTF-8. Unfortunately, as with much software from the Linux world, you have to configure this yourself.
Vim has four options related to character encoding: encoding, fileencoding, fileencodings, and termencoding (the possible values of these options are listed in Vim's online help, :help encoding-names). Their meanings are as follows:
* encoding: the character encoding Vim uses internally, including for buffers, menu text, and message text. By default it is chosen from your locale. The user manual recommends changing it only in .vimrc, and in fact it seems to make sense only there. You can still edit and save files in another encoding: for example, if Vim's encoding is utf-8 and the file being edited is cp936, Vim converts the file to utf-8 when reading it (the form Vim works with) and converts it back to cp936 (the file's own encoding) when writing it.
* fileencoding: the character encoding of the file currently being edited in Vim. Vim also saves the file in this encoding (whether or not it is a new file).
* fileencodings: the ordered list Vim uses to detect fileencoding automatically. Vim probes the file being opened with each listed encoding in turn and sets fileencoding to the encoding it finally detects. It is therefore best to put the Unicode encodings at the front of this list and latin1 at the very end.
* termencoding: the character encoding of the terminal (or Windows console window) Vim is running in. If the terminal's encoding is the same as Vim's encoding, nothing needs to be set. If not, set termencoding so that Vim converts automatically to the terminal's encoding. This option has no effect on gVim, the GUI mode commonly used on Windows; for console-mode Vim it is the Windows console code page, and it usually does not need to be changed.
Five, how Vim works with multiple character encodings
1. Vim starts and sets the character encoding of buffers, menu text, and message text according to the value of encoding set in .vimrc.
2. Vim reads the file to be edited and probes its encoding with each character encoding listed in fileencodings, then sets fileencoding to the encoding that was detected and appears to be correct (note 1).
3. Vim compares the values of fileencoding and encoding. If they differ, it calls iconv to convert the file contents to the encoding named by encoding and puts the converted contents into the buffer opened for the file, after which editing can begin. Note that on Windows this step calls the external iconv.dll (note 2), so that file must exist in $VIMRUNTIME or in another directory on the PATH environment variable.
4. When the file is saved after editing, Vim compares the values of fileencoding and encoding again. If they differ, it calls iconv again to convert the buffer text to the encoding named by fileencoding and writes it to the specified file. This again requires iconv.dll.
Because Unicode covers the characters of almost all languages, and UTF-8 is a very cost-effective Unicode encoding (for mostly ASCII text it takes less space than UCS-2), setting encoding to utf-8 is recommended. Another reason is that Vim's automatic detection of file encodings is more accurate when encoding is utf-8 (perhaps this is the main reason). When editing files on Chinese Windows, it is more appropriate to keep the file encoding as GB2312/GBK for compatibility with other software, so setting fileencoding to chinese is recommended (chinese is an alias: on Unix it means gb2312, and on Windows it means cp936, the code page for GBK).
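Steps 3 and 4 above amount to a convert-on-read, convert-on-write cycle. The following Python sketch models that cycle on byte strings; it is a simplified illustration rather than Vim's implementation, and the names INTERNAL, load_buffer, and save_buffer are invented for the example.

```python
INTERNAL = "utf-8"  # plays the role of Vim's 'encoding' option


def load_buffer(file_bytes: bytes, fileencoding: str) -> bytes:
    """Step 3: convert file bytes to the internal encoding if the two differ."""
    if fileencoding == INTERNAL:
        return file_bytes
    return file_bytes.decode(fileencoding).encode(INTERNAL)


def save_buffer(buffer_bytes: bytes, fileencoding: str) -> bytes:
    """Step 4: convert the buffer back to fileencoding if the two differ."""
    if fileencoding == INTERNAL:
        return buffer_bytes
    return buffer_bytes.decode(INTERNAL).encode(fileencoding)


# A cp936 file is edited internally as UTF-8 and written back as cp936.
original = "编码".encode("cp936")
buffer = load_buffer(original, "cp936")
buffer += "测试".encode(INTERNAL)          # the "edit" happens in UTF-8
written = save_buffer(buffer, "cp936")
```

The point of the design is that editing always happens in one internal encoding, while the file on disk keeps whatever encoding fileencoding names.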
CentOS file name garbled