A Detailed Guide to Fixing Garbled Chinese Text on Linux

Source: Internet
Author: User
Tags: control characters, i18n, locale, posix, time and date, ssh

Newcomers to Linux are often greeted by garbled text, and more than a few have given up on Linux because of it. To get to the point, here are specific solutions for garbled Chinese text on Linux.

Method One: edit the /root/.bash_profile file and add export LANG=zh_CN.GB18030

The file lives in the user's home directory; for other users, edit the corresponding file in their own home directory.

With this method PuTTY can display Chinese, but the desktop stays in English and Chinese web pages are still garbled.

Method Two: edit the /etc/sysconfig/i18n file

Change:

#LANG="en_US.UTF-8"
#SUPPORTED="en_US.UTF-8:en_US:en"
#SYSFONT="latarcyrheb-sun16"

to:

LANG="zh_CN.GB18030"
LANGUAGE="zh_CN.GB18030:zh_CN.GB2312:zh_CN"
SUPPORTED="zh_CN.GB18030:zh_CN:zh"
SYSFONT="lat0-sun16"
SYSFONTACM="8859-15"
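To check what a file in this KEY="value" format actually sets, it can help to parse it. The following is a minimal Python sketch, not a tool from the original article: the function name parse_i18n and the sample text are hypothetical, and it assumes one simple assignment per line.

```python
def parse_i18n(text):
    """Parse KEY="value" lines as used in /etc/sysconfig/i18n.

    Hypothetical helper: comment lines (starting with #) and blank
    lines are skipped, and only simple assignments are handled.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


# Sample content modelled on the settings shown above.
sample = '''
#LANG="en_US.UTF-8"
LANG="zh_CN.GB18030"
LANGUAGE="zh_CN.GB18030:zh_CN.GB2312:zh_CN"
SYSFONT="lat0-sun16"
'''

settings = parse_i18n(sample)
print(settings["LANG"])
print(settings["LANGUAGE"].split(":"))  # fallback languages, in order
```

On a real system the text would come from reading /etc/sysconfig/i18n, assuming the distribution uses that file.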

Reference:

The garbled Chinese text problem on Linux

Recently at my company, Chinese text came out garbled when data was transferred between a Windows XP system and Linux.

First, character sets:

Encodings:

* GB2312 is a simplified Chinese character set, formally GB2312-80, containing 6,763 simplified Chinese characters.
* BIG5 is a traditional Chinese character set containing 13,053 traditional Chinese characters.
* GBK is an extension of GB2312 that covers the Chinese characters of GB2312 and BIG5, 21,003 in total, plus some symbols.
* GB18030 is a mandatory national standard for a large character set, formally GB18030-2000; its introduction gave Chinese character sets a unified standard.
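The differences between these character sets are easy to see by encoding the same text with each of them. The sketch below uses only Python's built-in codecs; the sample strings are my own choice, not from the original article.

```python
text = "中文"

# The same two characters produce different bytes in different encodings.
for codec in ("gb2312", "gbk", "gb18030", "utf-8"):
    data = text.encode(codec)
    print(f"{codec:8} {len(data)} bytes  {data.hex()}")

# A traditional character such as 龍 is outside GB2312 but inside GBK and BIG5.
traditional = "龍"
try:
    traditional.encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False
print("encodable in GB2312:", in_gb2312)
print("GBK bytes:", len(traditional.encode("gbk")))
print("BIG5 bytes:", len(traditional.encode("big5")))

# Reading GBK bytes as if they were UTF-8 is what produces garbled text.
garbled = text.encode("gbk").decode("utf-8", errors="replace")
print(repr(garbled))
```

The last step is the whole problem in miniature: the bytes are fine, but the reader assumes the wrong encoding.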

ASCII:

American Standard Code for Information Interchange. It is the most widely used character set and encoding in computing, developed by the American National Standards Institute (ANSI) and adopted by the International Organization for Standardization (ISO) as ISO 646. The ASCII character set consists of control characters and graphic characters. In storage, an ASCII value occupies one byte (8 bits): the code itself uses the low 7 bits, and the highest bit (b7) can serve as a parity bit. Parity checking is a method for detecting errors introduced during transmission, and comes in two forms, odd and even.

Odd parity rule: a correct byte must contain an odd number of 1 bits; if the count is not odd, the highest bit b7 is set to 1.

Even parity rule: a correct byte must contain an even number of 1 bits; if the count is not even, the highest bit b7 is set to 1.
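The two parity rules can be written out in a few lines. This is a minimal Python sketch; the function names add_parity and check_parity are hypothetical and not part of any standard library.

```python
def add_parity(code, odd=True):
    """Return the 8-bit value for a 7-bit ASCII code with parity in bit b7."""
    if not 0 <= code < 128:
        raise ValueError("ASCII codes are 7-bit values (0..127)")
    ones = bin(code).count("1")
    # Odd parity wants an odd total of 1 bits, even parity an even total.
    need_bit = (ones % 2 == 0) if odd else (ones % 2 == 1)
    return code | 0x80 if need_bit else code


def check_parity(byte, odd=True):
    """Return True if the byte satisfies the chosen parity rule."""
    ones = bin(byte).count("1")
    return ones % 2 == (1 if odd else 0)


a = ord("A")  # 0b1000001 has two 1 bits
print(f"{add_parity(a, odd=True):08b}")   # b7 set so the total becomes odd
print(f"{add_parity(a, odd=False):08b}")  # already even, b7 stays 0

# A single flipped bit in transmission is detected by the check.
sent = add_parity(a, odd=True)
corrupted = sent ^ 0b00000100
print(check_parity(sent), check_parity(corrupted))
```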

UTF:

Unicode's implementation is distinct from its encoding. The Unicode code of a character is fixed, but in actual transmission, because platforms are designed differently and to save space, Unicode is implemented in different ways. An implementation of Unicode is called a Unicode Transformation Format (UTF).

* UTF-8: a variable-length encoding built from 8-bit units. Characters in the ASCII range (0 to 127) take a single byte, and most other commonly used characters (notably Korean and Chinese) take 3 bytes.
* UTF-16: a variable-length encoding built from 16-bit units. It covers values from 0 to 0x10FFFF, and its byte layout depends on the byte order (endianness) in use.
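A quick way to see these size and byte-order differences is to encode a few characters and count the bytes. The sketch below relies only on Python's built-in codecs; the sample characters are my own choice.

```python
samples = {"A": "ASCII letter", "中": "Chinese character", "한": "Korean syllable"}

for ch, label in samples.items():
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")
    print(f"U+{ord(ch):04X} {label}: UTF-8 {len(utf8)} byte(s), UTF-16 {len(utf16)} byte(s)")

# The same code point has two possible UTF-16 byte orders.
big_endian = "中".encode("utf-16-be").hex()
little_endian = "中".encode("utf-16-le").hex()
print(big_endian, little_endian)

# Code points above U+FFFF need two 16-bit units (a surrogate pair) in UTF-16.
high = "\U00020000"
print(len(high.encode("utf-16-be")), len(high.encode("utf-8")))
```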

Note: ASCII char (2); UTF-8 wide character wchar (4 times). UTF-8 is the encoding with the best compatibility. After all, GBK and GB2312 are domestic (Chinese) standards; once you work with a large amount of foreign open source software, UTF-8 is the most common encoding you will meet.

On Linux, the locale defines the language environment a program runs in; locale support is part of ANSI C. Locale names follow the pattern <language>_<territory>.<codeset>, for example zh_CN.UTF-8, where zh stands for Chinese, CN for mainland China, and UTF-8 for the character set.
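The naming rule can be made concrete by splitting a locale name into its parts. This is a rough Python sketch; parse_locale and its regular expression are hypothetical and deliberately stricter than what real systems accept.

```python
import re

# language[_TERRITORY][.codeset][@modifier]
LOCALE_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<territory>[A-Z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def parse_locale(name):
    """Split a locale name into language, territory, codeset and modifier."""
    if name in ("C", "POSIX"):
        return {"language": name, "territory": None, "codeset": None, "modifier": None}
    match = LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"not a well-formed locale name: {name!r}")
    return match.groupdict()


print(parse_locale("zh_CN.UTF-8"))
print(parse_locale("zh_CN.GB18030"))

# A name with wrong letter case or a stray space is rejected by this sketch.
try:
    parse_locale("ZH_CN. GB18030")
except ValueError as exc:
    print(exc)
```

This is a useful sanity check because a mistyped name such as ZH_CN. GB18030 does not name any installed locale.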

In a locale environment, there is a set of variables that represent different settings in an internationalized environment:

1. LC_COLLATE
Defines the sorting and comparison rules for the environment.

2. LC_CTYPE
Used for character classification and string processing. It controls how all characters are handled, including the character encoding, whether characters are single-byte or multibyte, and how they are printed. It is one of the most important of these variables.

3. LC_MONETARY
Currency format.

4. LC_NUMERIC
Display format for non-monetary numbers.

5. LC_TIME
Time and date formats.

6. LC_MESSAGES
The language of messages. There is also a LANGUAGE variable that is similar to LC_MESSAGES; if LANGUAGE is set, LC_MESSAGES is ignored. LANGUAGE can list several languages at once, for example LANGUAGE="zh_CN.GB18030:zh_CN.GB2312:zh_CN".

7. LANG
The default value for the LC_* variables. It is the lowest-priority setting: if an LC_* variable is not set, the value of LANG is used. Similar to LC_ALL.

8. LC_ALL
A macro that, when set, overrides the values of all LC_* variables. Note that the value of LANG itself is not affected by it.
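The precedence among these variables (LC_ALL first, then the individual LC_* variable, then LANG) can be sketched as a small lookup. The function name effective_locale is hypothetical, and the sketch deliberately ignores LANGUAGE, which only affects message translation.

```python
CATEGORIES = (
    "LC_COLLATE", "LC_CTYPE", "LC_MONETARY",
    "LC_NUMERIC", "LC_TIME", "LC_MESSAGES",
)


def effective_locale(category, env):
    """Resolve one category: LC_ALL first, then the LC_* variable, then LANG."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset and empty both count as "not set"
            return value
    return "C"  # nothing set at all: fall back to the default locale


# LANG supplies the default, LC_TIME overrides one category.
env = {"LANG": "zh_CN.UTF-8", "LC_TIME": "en_US.UTF-8"}
for category in CATEGORIES:
    print(category, effective_locale(category, env))

# Once LC_ALL is set, it wins for every category.
env["LC_ALL"] = "zh_CN.GBK"
print(effective_locale("LC_TIME", env))
```

Passing os.environ as env would resolve the categories for the current process under the same assumptions.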

Example:

Before setting, use the default locale:

code example:

[root@ahlinux ~]# locale
LANG="POSIX"
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

After setting it, the zh_CN.GBK Chinese locale is in use:

code example:

[root@ahlinux ~]# export LC_ALL=zh_CN.GBK
[root@ahlinux ~]# locale
LANG=zh_CN.UTF-8
LC_CTYPE="zh_CN.GBK"
LC_NUMERIC="zh_CN.GBK"
LC_TIME="zh_CN.GBK"
LC_COLLATE="zh_CN.GBK"
LC_MONETARY="zh_CN.GBK"
LC_MESSAGES="zh_CN.GBK"
LC_PAPER="zh_CN.GBK"
LC_NAME="zh_CN.GBK"
LC_ADDRESS="zh_CN.GBK"
LC_TELEPHONE="zh_CN.GBK"
LC_MEASUREMENT="zh_CN.GBK"
LC_IDENTIFICATION="zh_CN.GBK"
LC_ALL=zh_CN.GBK

"C" is the default locale of the system, and "POSIX" is the alias of "C". So when we install a new system, the default locale is C or POSIX.
Locales are installed on Debian as follows:

    • Install the locales package with the apt-get install locales command.
    • Once the package is installed, the system configures locales automatically: select the locales you need (you can choose more than one), then specify a system default locale. The system then generates the selected locales and configures the system locale for you.
    • Adding a new locale is just as simple: run dpkg-reconfigure locales to reconfigure.
    • You can also add a locale manually by adding it to the /etc/locale.gen file and then running the locale-gen command to generate it. The system locale can then be set through the LC_* variables described above. The following is an example of a locale.gen file.

code example:

# This file lists locales that you wish to have built. You can find a list
# of valid supported locales at /usr/share/i18n/SUPPORTED. Other
# combinations are possible, but may not be well tested. If you
# change this file, you need to rerun locale-gen.
#
zh_CN.GBK GBK
zh_CN.UTF-8 UTF-8
As far as I am concerned, it is enough to get LANG and SUPPORTED right; the other settings are rarely needed in everyday use.

Here is how to set the environment variables.

Edit the /etc/sysconfig/i18n file, for example:

code example:

LANG="en_US.UTF-8": X Window will display the English interface.
LANG="zh_CN.GB18030": X Window will display the Chinese interface.

Another way is to run cp /etc/sysconfig/i18n $HOME/.i18n

Then edit the $HOME/.i18n file, for example:

code example:

LANG="en_US.UTF-8": X Window will display the English interface.
LANG="zh_CN.GB18030": X Window will display the Chinese interface.

This changes the interface language for one user without affecting other users.

The modified /etc/sysconfig/i18n file is:

code example:

LANG="en_US.UTF-8"
SUPPORTED="zh_CN.GB18030:zh_CN:zh:en_US.UTF-8:en_US:en"
SYSFONT="latarcyrheb-sun16"
LC_ALL="en_US.UTF-8"
export LC_ALL

Reboot after making the change, or use rc.local to make it take effect.

Alternatively, edit the .bash_profile file of the logged-in user:

code example:

export LANG=zh_CN.GB18030
export LANGUAGE=zh_CN.GB18030:zh_CN.GB2312:zh_CN

Keep in mind that Windows XP uses GB2312 encoding (code page 936). If your server's character set is different, text will probably come out garbled, so adjust accordingly.

Some people report that after changing the system environment variables, their existing content shows up garbled. There are two solutions:

1. Convert the content to the current encoding with iconv

2. Go back to your original encoding

Either way, you have to know what your original character encoding was. It all boils down to LANG, SUPPORTED, and the encoding of your original files :)
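When the original encoding is not known, one rough way to narrow it down is to try a few candidates and keep the first one that decodes cleanly. The Python sketch below is a heuristic under stated assumptions, not a reliable detector: the helper names and the candidate list are hypothetical, and the candidate order matters because UTF-8 is far stricter than the GB encodings.

```python
CANDIDATES = ("utf-8", "gb18030", "big5")


def guess_encoding(data, candidates=CANDIDATES):
    """Return the first candidate encoding that decodes data without errors."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


def to_utf8(data):
    """Re-encode bytes as UTF-8 after guessing their original encoding."""
    source = guess_encoding(data)
    if source is None:
        raise ValueError("none of the candidate encodings fit")
    return data.decode(source).encode("utf-8")


gbk_bytes = "中文测试".encode("gbk")      # what a GBK system would have written
utf8_bytes = "中文测试".encode("utf-8")

print(guess_encoding(gbk_bytes))
print(guess_encoding(utf8_bytes))
print(to_utf8(gbk_bytes) == utf8_bytes)
```

A clean decode only shows that the bytes are valid in that encoding, so the result should still be checked by eye.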

Of course, locale -a lists the locales currently available on the system; if the one you need is missing, install it.

The first two methods are very practical; I have tried them myself. The other methods were found on the Internet.

****************************

When you read data from a database and save it to a file on Linux, specify the encoding for the character stream. The code is as follows:

code example:

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

// filePath and s are assumed to be defined by the surrounding code.
FileOutputStream fos = new FileOutputStream(new File(filePath), true); // true = append
Writer out = new OutputStreamWriter(fos, "UTF-8"); // write characters as UTF-8
out.write(s);
out.write("\n");
out.flush();
out.close(); // closing the writer also closes fos

vi .bash_profile
export LANG=zh_CN
vi /etc/sysconfig/i18n
LANG="en_US.UTF-8"
SUPPORTED="en_US.UTF-8:en_US:en:zh_CN.GB18030:zh_CN:zh:zh_TW.Big5:zh_TW:zh:ja_JP.UTF-8:ja_JP:ja:ko_KR.eucKR:ko_KR:ko"
SYSFONT="latarcyrheb-sun16"

The first change (.bash_profile) does not seem to work on its own; the second (/etc/sysconfig/i18n) appears to be the important one and must be made.

1. Garbled text on the console terminal

Add the following as the last line of the /etc/profile file:

code example:

export LC_ALL="zh_CN.GB18030"

2. Garbled text in an X Window terminal

Add the following as the last line of the /etc/sysconfig/i18n file:

code example:

export LC_ALL="zh_CN.GB18030"

Garbled text falls into two cases:

1. Garbled text in a terminal (pure shell interface)

code example:

vi /etc/profile
export LC_ALL="zh_CN.GB18030:zh_CN.GB2312:zh_CN.GBK:zh_CN:en_US.UTF-8:en_US:en:zh:zh_TW:zh_CN.BIG5"

Save, exit, and reboot the system.

2. Garbled text in X Window (graphical interface)

code example:

vi /etc/sysconfig/i18n
LANG="zh_CN.GB18030:zh_CN.GB2312:zh_CN.GBK:zh_CN:en_US.UTF-8:en_US:en:zh:zh_TW:zh_CN.BIG5"
LANGUAGE="zh_CN.GB18030:zh_CN.GB2312:zh_CN.GBK:zh_CN:en_US.UTF-8:en_US:en:zh:zh_TW:zh_CN.BIG5"

Save and reboot. Note that only LANGUAGE is defined as a colon-separated list; LC_ALL and LANG normally take a single locale name such as zh_CN.GB18030.

On a newly created Linux virtual machine, Vim showed garbled Chinese text. After some searching, the solution was:

vi /etc/sysconfig/i18n

Change the content to:

code example:

LANG="zh_CN.GB18030"
LANGUAGE="zh_CN.GB18030:zh_CN.GB2312:zh_CN"
SUPPORTED="zh_CN.GB18030:zh_CN:zh:en_US.UTF-8:en_US:en"
SYSFONT="lat0-sun16"

After this, Chinese displays correctly in SSH and telnet terminals.

The key change is zh_CN.GB18030. Note that editing files under the root directory with vi requires the appropriate permissions.

After every fresh Linux install, Chinese text always shows up garbled over an SSH connection.

Workaround: edit /etc/sysconfig/i18n and change LANG="zh_CN.UTF-8" to LANG="zh_CN.GB2312".

Then disconnect and reconnect.

Appendix 1: Fixing garbled Chinese text on Linux

Files copied from Windows to Linux show up garbled. How do we get the Chinese text to display on Linux? First, test whether Chinese displays normally on Linux at all. Answer: yes. So the cause is clear: the files copied from Windows do not display correctly because Windows and Linux use different encodings.
Linux generally uses UTF-8, while files edited on Windows are GB2312-encoded, so the Chinese text comes out garbled. Fixing this is simple: convert the file to UTF-8 and then import it.

Use the following command to convert:

iconv -f gb2312 -t utf-8 test.txt > testutf8.txt

(-f is the source encoding, -t is the target encoding, test.txt is the source file, and testutf8.txt is the generated file in the target encoding.)
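The same conversion can be done from a script when iconv is not convenient. The following Python sketch is a rough equivalent of the command above; convert_file is a hypothetical helper, and the demonstration creates its own temporary files rather than touching real data.

```python
import os
import tempfile


def convert_file(src_path, dst_path, from_enc="gb2312", to_enc="utf-8"):
    """Rough equivalent of: iconv -f from_enc -t to_enc src_path > dst_path"""
    # newline="" keeps the original line endings (for example \r\n) untouched.
    with open(src_path, "r", encoding=from_enc, newline="") as src, \
         open(dst_path, "w", encoding=to_enc, newline="") as dst:
        for line in src:
            dst.write(line)


# Demonstration with temporary files standing in for test.txt and testutf8.txt.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "test.txt")
dst = os.path.join(workdir, "testutf8.txt")

with open(src, "wb") as f:
    f.write("中文测试\r\n".encode("gb2312"))  # simulate a file written on Windows

convert_file(src, dst)

with open(dst, "rb") as f:
    converted = f.read()
print(converted.decode("utf-8"))
```

If the source file is not really GB2312, the read raises UnicodeDecodeError instead of silently writing garbled output.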

Note: use iconv -l to list the encodings the system supports. You can also change the system encoding:

The default is UTF-8. To use another encoding such as GBK, edit the configuration file manually:

shell> vi /etc/sysconfig/i18n

Change LANG="zh_CN.UTF-8" to:

LANG="zh_CN.GBK"

Save and close, then run the following command to make the configuration take effect:

shell> source /etc/sysconfig/i18n

To make the terminal use a Simplified Chinese character encoding:

shell> vi /etc/profile.d/chinese.sh

Add the following line:

code example:

export LC_ALL=zh_CN.GBK
shell> source /etc/profile.d/chinese.sh

Appendix 2: Fixing garbled Chinese text in Java on Linux

From JDK 1.5 onward, just create a fallback directory under ~/jre/lib/fonts/ and copy the fonts you want Java to use into that directory.

The following method was tested successfully on FC6, assuming the JRE path is /usr/java/jdk1.6.0_03/jre/

code example:

cd /usr/java/jdk1.6.0_03/jre/lib/fonts
sudo mkdir fallback

Copy C:\WINDOWS\FONTS\SIMSUN.TTC to the /usr/java/jdk1.6.0_03/jre/lib/fonts/fallback folder.
export LC_ALL=zh_CN.GB2312; export LANG=zh_CN.GB2312 is the most effective setting.

1. Whichever SSH client you use, its font must be set to one that can display Chinese.

2. The remote locale must be set to LANG=zh_CN.UTF-8

Edit /etc/profile

Add this line:

export LC_ALL=zh_CN.GBK

Appendix 3: Garbled Chinese text over SSH

1) Open /etc/sysconfig/i18n

Set it to:

code example:

LANG="zh_CN.GB2312"
LANGUAGE="zh_CN.GB18030:zh_CN.GB2312:zh_CN"
SUPPORTED="zh_CN.GB18030:zh_CN.GB2312:zh_CN.UTF-8:zh:en_US.UTF-8:en_US:en:ja_JP.UTF-8:ja_JP:ja"
SYSFONT="lat0-sun16"
SYSFONTACM="8859-15"

Here LANG="zh_CN.GB2312" is required if you do not want garbled Chinese text.

The other settings can be changed to suit your needs.

2) Open smb.conf

Add:

code example:

display charset = cp936
unix charset = cp936
dos charset = cp936
