Description of the phenomenon
Let's start with the reason for garbled characters.
Example
As a practical example, we usually work on servers remotely over SSH. Sometimes a command that prints output to the terminal produces garbled characters.
For example, I logged in to an Oracle database server and checked the status of Oracle RAC:
In the example above, everything except the English letters came out garbled. This has nothing to do with which program is running: run any command that ships with the system with a wrong argument, and its error message will be garbled too.
Search the Internet for a fix and you find advice to modify a configuration file, to change environment variables, to switch clients, or to install a language pack.
There are plenty of answers to choose from, but none of them explains why. Before giving my answer, I want to analyze the cause: why change an environment variable rather than something else, and what does that variable represent?
To keep the discussion from going too deep, I limit the scope to the system level. Language implementation and architecture are not covered.
Reason
Two Linux concepts are relevant to this problem: internationalization (with its counterpart, localization) and encoding.
Internationalization (i18n)
At the programming level, a program whose output comes out garbled is, in a sense, a good program: at least it takes into account that its users may not understand English and may be, for example, German speakers.
Such a program produces different output for different languages. The relevant pieces are the environment variable LANG, the system command locale, and so on.
Encoding
An encoding is how data (text) is represented inside the computer. Text is encoded whenever it is stored (in a file) or transmitted, according to an encoding specification such as UTF-8 or GBK.
Off Topic
Text files have an encoding too, although Windows plays the concept down. TXT files on Chinese-language Windows are encoded in GB2312. Take an editor that lets you choose the encoding, such as VS Code (the "number one IDE in the universe"): if you open a GB2312-encoded TXT file as UTF-8, it shows garbled characters.
Garbled
Now to the key point: garbled program output and garbled files are essentially the same thing.
Garbled files
A file is written in encoding "A", then its contents are parsed and displayed in encoding "B".
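The mismatch can be reproduced from a shell. This is a minimal sketch, assuming the iconv utility is installed and supports GBK; the function name show_bytes_as is made up for this illustration. The four bytes are the GBK encoding of 中文.

```shell
# Hypothetical helper: emit the GBK bytes of "中文" (D6 D0 CE C4, written
# in octal so that any POSIX printf accepts them), decode them using the
# encoding named in $1, and convert the result to UTF-8 for display.
show_bytes_as() {
    printf '\326\320\316\304' | iconv -f "$1" -t UTF-8
}

show_bytes_as GBK          # written as GBK, read as GBK: the original text
echo
show_bytes_as ISO-8859-1   # written as GBK, read as Latin-1: garbled
echo
```

The bytes never change between the two calls. Only the encoding used to interpret them does, which is why the second call shows accented Latin letters instead of Chinese.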
Garbled terminal output
A program emits text in encoding "A", as required by the system (LANG), and the terminal program parses and displays that output in its own encoding "B".
The terminal program's encoding is the easier part to understand, so we cover it first. The conventions inside the Linux system are harder to understand; later we'll look at LANG, env, and locale.
Encoding of terminal program output on Windows
The CMD program that ships with Windows
Git shell for Windows
Encoding of terminal program output on Linux
gnome-terminal, a terminal program on Linux
Open the configuration dialog through the menu Terminal -> Preferences -> Profiles -> Edit -> Compatibility -> Encoding.
Except for the cmd that ships with Windows, these terminals let you change the encoding used to display the current program's output.
How is the encoding of a Linux system set?
Simply put, it is set by the environment variable LANG. Every program on the system (including gnome-terminal) reads this value at startup to set the encoding of its menus, interface, output, and so on.
Characters that cannot be displayed at all may indeed be a font package problem, for example the RPM package fonts-chinese-3.02-12.el5 on Red Hat. It consists mostly of font files, the same kind that can be found on Windows.
# rpm -ql fonts-chinese
/usr/share/fonts/chinese/truetype/fonts.cache-1
/usr/share/fonts/chinese/truetype/fonts.dir
/usr/share/fonts/chinese/truetype/fonts.scale
/usr/share/fonts/chinese/truetype/ukai.ttf
/usr/share/fonts/chinese/truetype/uming.ttf
/usr/share/fonts/chinese/fonts.cache-1
/usr/share/fonts/chinese/misc
/usr/share/fonts/chinese/misc/fonts.alias
/usr/share/fonts/chinese/misc/fonts.cache-1
/usr/share/fonts/chinese/misc/fonts.dir
/usr/share/fonts/chinese/misc/fonts.scale
/usr/share/fonts/chinese/misc/taipei16.pcf.gz
/usr/share/fonts/chinese/misc/taipei20.pcf.gz
/usr/share/fonts/chinese/misc/taipei24.pcf.gz
/usr/share/fonts/chinese/misc/vga12x24.pcf.gz
/usr/share/fonts/zh_tw
/usr/share/fonts/zh_tw/truetype
/usr/share/fonts/zh_tw/truetype/bsmi00lp.ttf
We assume these packages are already installed, that is, Chinese was selected under "language support" when the system was installed.
The locales a Linux system supports can be listed with the locale command. We only care about English and Chinese, so we filter the output.
# locale -a | grep '^[z|e][h|n]' | grep \\. | grep -v iso
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZW.utf8
zh_CN.gb18030
zh_CN.gb2312
zh_CN.gbk
zh_CN.utf8
zh_HK.big5hkscs
zh_HK.utf8
zh_SG.gb2312
zh_SG.gbk
zh_SG.utf8
zh_TW.big5
zh_TW.euctw
zh_TW.utf8
The value of the environment variable LANG has the format: <language>_<territory>.<character set>
The value of the environment variable LANGUAGE has the format: <language>_<territory>
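To make the format concrete, here is a small sketch that splits such a value into its parts using only POSIX parameter expansion. The function name parse_locale and its output layout are invented for this illustration. It also accepts the optional @modifier suffix that the general locale syntax allows.

```shell
# Hypothetical helper: split a locale name of the form
# language[_territory][.codeset][@modifier] into its parts.
parse_locale() {
    name=$1
    modifier=""
    codeset=""
    territory=""
    # Peel the optional parts off from the right-hand side.
    case $name in *@*) modifier=${name#*@}; name=${name%%@*} ;; esac
    case $name in *.*) codeset=${name#*.}; name=${name%%.*} ;; esac
    case $name in *_*) territory=${name#*_}; name=${name%%_*} ;; esac
    echo "language=$name territory=$territory codeset=$codeset modifier=$modifier"
}

parse_locale zh_CN.GB2312
parse_locale en_US.utf8
```

A value without a character set, such as a typical LANGUAGE value, simply leaves the codeset field empty.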
The locale command also outputs environment variables whose names begin with LC_; these govern a program's output and interface.
For details, see the web article "The difference between the LANG, LC_ALL, and LANGUAGE environment variables in locale settings".
In case the link breaks, I copy a short passage here:
Locale settings are divided into 12 categories according to the cultural conventions involved:
1. Language symbols and their classification (LC_CTYPE)
2. Numbers (LC_NUMERIC)
3. Comparison and sorting conventions (LC_COLLATE)
4. Time display format (LC_TIME)
5. Currency unit (LC_MONETARY)
6. Messages: mainly informational text, error messages, status messages, titles, labels, buttons, and menus (LC_MESSAGES)
7. Name format (LC_NAME)
8. Address format (LC_ADDRESS)
9. Telephone number format (LC_TELEPHONE)
10. Units of measurement (LC_MEASUREMENT)
11. Default paper size (LC_PAPER)
12. Summary information about the locale itself (LC_IDENTIFICATION)
A locale is the language environment of software at run time. It includes the language (Language), the territory (Territory), and the character set (Codeset). A locale is written as language[_territory[.codeset]]; the complete form is language[_territory][.codeset][@modifier]. For example, zh_CN.GB2312 = Chinese + People's Republic of China + GB 2312 character set.
Locale settings, priority of LC_ALL and LANG: LC_ALL > LC_* > LANG
1. If you want a fully Chinese system, set either LC_ALL=zh_CN.XXXX or LANG=zh_CN.XXXX.
2. If you only want an environment in which you can enter Chinese, while menus, titles, system messages, and so on stay in English, set LC_CTYPE=zh_CN.XXXX and LANG=en_US.XXXX.
3. If you set nothing at all, that is, LC_ALL, LANG, and the LC_* variables have no value, the system uses POSIX as the locale, which is the C locale.
The difference between LANG and LANGUAGE:
LANG - Specifies the default locale for all unset locale variables
LANGUAGE - Most programs use this for the language of its interface
LANGUAGE sets the interface language of applications. LANG is the lowest-priority variable; it specifies the default value for all locale-related variables.
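The priority rule quoted above can be written down as a short function. This is only a sketch of the LC_ALL > LC_* > LANG order; the name effective_locale and its argument order are invented here, and the LANGUAGE variable is deliberately left out.

```shell
# Hypothetical helper: given the values of LC_ALL, one LC_* category
# variable, and LANG (in that order), print the locale that wins.
# Empty strings count as unset; with nothing set, the result is "C".
effective_locale() {
    lc_all=$1
    lc_category=$2
    lang=$3
    if [ -n "$lc_all" ]; then
        echo "$lc_all"
    elif [ -n "$lc_category" ]; then
        echo "$lc_category"
    elif [ -n "$lang" ]; then
        echo "$lang"
    else
        echo "C"
    fi
}

# Case 2 from the passage: Chinese input, English messages.
effective_locale "" "zh_CN.utf8" "en_US.utf8"   # LC_CTYPE is set, so it wins
effective_locale "" "" "en_US.utf8"             # LC_MESSAGES unset, LANG applies
# The same question for the message category of the current shell:
effective_locale "${LC_ALL:-}" "${LC_MESSAGES:-}" "${LANG:-}"
```

Passing the values as arguments rather than reading the environment keeps the sketch free of side effects, so it can be tried without changing the locale of the running shell.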
Conclusion
When the environment variables LANG, LANGUAGE, and those beginning with LC_ are set inconsistently with the terminal program, or the output contains characters that the terminal's character set does not include, garbled characters appear.
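The first half of that conclusion, a character set that differs between the locale and the terminal, can be checked mechanically. The sketch below is a rough illustration with invented function names. It assumes you supply the terminal's encoding by hand, since the shell cannot query it, and it treats any difference in name as a mismatch even when one character set happens to contain the other.

```shell
# Hypothetical helper: normalize a codeset name so that spellings such
# as "utf8", "UTF-8", and "Utf_8" compare equal.
normalize_codeset() {
    printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr -d '_-'
}

# Hypothetical check: $1 is a locale name such as the value of LANG,
# $2 is the encoding selected in the terminal's settings.
check_encoding() {
    case $1 in
        *.*) codeset=${1#*.}; codeset=${codeset%%@*} ;;
        *)   codeset="" ;;   # no codeset part in the locale name
    esac
    if [ "$(normalize_codeset "$codeset")" = "$(normalize_codeset "$2")" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

check_encoding zh_CN.utf8 UTF-8
check_encoding zh_TW.big5 GB2312
```

In practice the first argument would usually be "$LANG" or whichever LC_* value is in effect, and the second whatever the terminal's encoding menu shows.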
After so much groundwork, I won't just throw out "A != B" and leave it there. How about two examples?
The command-line error from the beginning is demonstrated in gnome-terminal on Linux and in Git shell on Windows, respectively.
Linux gnome-terminal
First, confirm the encoding the terminal currently uses, under the menu Terminal -> Preferences -> Profiles -> Edit -> Compatibility -> Encoding. We want to show both the normal and the abnormal case, so we demonstrate with output everyone can recognize: English, Simplified Chinese, Traditional Chinese, and garbled text.
First, look at the character set selected as the terminal's encoding:
SSH may carry some of these variables over to the remote side when it connects; run it in verbose mode to see the details.
bash$
When you change variables such as LANGUAGE, the output of the commands that ship with the system changes as well:
When the terminal's character set contains the characters in the output above, they display normally. When it does not, garbled characters appear.
For example, if the terminal is set to GB2312 and LANGUAGE is set to zh_TW, the program output is:
Part of the output is garbled. Some Chinese characters are still readable because the GB2312 character set contains some of the characters used in Traditional Chinese (those that have no separate simplified form).
Similarly, if the terminal is set to UTF-8, whose character set covers the visible characters of the world's languages, both Chinese and German display correctly.
For example:
# cat a.sh
LANG=ru_UA.utf8
LANGUAGE=ru_UA
LC_CTYPE=ru_UA.utf8
LC_NUMERIC=ru_UA.utf8
LC_TIME=ru_UA.utf8
LC_COLLATE="ru_UA.utf8"
LC_MONETARY=ru_UA.utf8
LC_MESSAGES="ru_UA.utf8"
LC_PAPER=ru_UA.utf8
LC_NAME=ru_UA.utf8
LC_ADDRESS=ru_UA.utf8
LC_TELEPHONE=ru_UA.utf8
LC_MEASUREMENT=ru_UA.utf8
LC_IDENTIFICATION=ru_UA.utf8
# source a.sh
Windows Git-shell
Two examples of fixing garbled output in a Linux terminal