Linux Learning: Linux Character Sets and Handling Garbled Text

Last Update: 2018-06-24 Source: Internet

Author: User

Tags: i18n, locale


Linux character sets and handling garbled text

1. A character is the general name for all kinds of written symbols, including the scripts of various languages, punctuation marks, graphic symbols, digits, and so on. A character set is a collection of characters. There are many character sets, each containing a different number of characters. Common ones include ASCII, GB2312, BIG5, GB18030 and Unicode.
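The following small Python sketch makes the idea of different character sets concrete. Python is used only for illustration, since the article itself works with shell commands. The sketch encodes the same Chinese character in several of the character sets listed above. The helper name `encode_in_charsets` is hypothetical, and the names passed to it are Python codec names, not Linux locale names.

```python
def encode_in_charsets(text, charsets=("ascii", "gb2312", "gb18030", "big5", "utf-8")):
    """Map each charset name to the bytes of `text` in it, or None if not representable."""
    result = {}
    for charset in charsets:
        try:
            result[charset] = text.encode(charset)
        except UnicodeEncodeError:
            # ASCII only has 128 characters, so it cannot hold Chinese text.
            result[charset] = None
    return result


for name, data in encode_in_charsets("中").items():
    shown = data.hex(" ") if data is not None else "not representable"
    print(f"{name:8} -> {shown}")
```

The same character maps to different bytes in each set. GB2312 stores 中 as the two bytes `d6 d0`, UTF-8 uses the three bytes `e4 b8 ad`, and ASCII cannot represent it at all. Text looks garbled whenever bytes written in one set are read as another.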

The character set is represented in the system by environment variables. Checking them shows which character set the current terminal is using.

# echo $LANG    # LANG is the environment variable that holds the character set

en_US.UTF-8

# env | grep LANG    # env lists the system's environment variables

LANG=en_US.UTF-8

# export | grep LANG    # export turns shell variables or functions into environment variables; with no arguments it lists them

declare -x LANG="en_US.UTF-8"

# locale    # get locale-specific information: lists the current locale settings

LANG=en_US.UTF-8    # default value for all locale-related variables

LC_CTYPE="en_US.UTF-8"    # character types (language symbols and their classification)

LC_NUMERIC="en_US.UTF-8"    # number format

LC_TIME="en_US.UTF-8"    # date and time format

LC_COLLATE="en_US.UTF-8"    # collation (sorting) rules

LC_MONETARY="en_US.UTF-8"    # currency format

LC_MESSAGES="en_US.UTF-8"    # program messages: mainly prompts, error messages, status information, titles, labels, buttons, menus, etc.

LC_PAPER="en_US.UTF-8"    # default paper size

LC_NAME="en_US.UTF-8"    # format of personal names

LC_ADDRESS="en_US.UTF-8"    # format of addresses

LC_TELEPHONE="en_US.UTF-8"    # format of telephone numbers

LC_MEASUREMENT="en_US.UTF-8"    # units of measurement

LC_IDENTIFICATION="en_US.UTF-8"    # metadata about the locale itself

LC_ALL=

LC_CTYPE (character classification and encoding) shows that the system is currently using the en_US.UTF-8 locale, that is, the UTF-8 character set.

2. How to change the character set

1) Set the variables directly with the following commands (either form on each line works):

# LANG=xxx    or    # export LANG=xxx

# LC_ALL="xxx"    or    # export LC_ALL="xxx"

Note: xxx is the locale (character set) you want to switch to. Variable names are case-sensitive and must be written in uppercase.

To list the standard locales available on the system, run locale -a. Commonly used ones include zh_CN.GB2312, zh_CN.GB18030, zh_CN.UTF-8 and en_US.UTF-8.

However, changes made this way only affect the current shell session and the programs started from it. They are gone as soon as you log in again or open a new session.

This is why running LANG= right after logging in often makes garbled output go away. The command clears the character-set setting, so programs fall back to the default C locale and print plain ASCII messages (unless an LC_* variable overrides it). You can achieve the same thing by running unset LANG.
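The interaction between LANG, the LC_* variables and LC_ALL can be summarized as a lookup order. A non-empty LC_ALL wins, then the category's own variable, then LANG. The Python sketch below models that POSIX lookup. `effective_locale` is a hypothetical helper. The sketch also assumes that the system falls back to the C locale when nothing is set, as glibc does.

```python
def effective_locale(category: str, env: dict) -> str:
    """Return the locale in force for one category, e.g. "LC_CTYPE".

    Lookup order (POSIX): LC_ALL, then the category's own variable, then LANG.
    Unset or empty variables are skipped. With nothing set we assume the
    "C" locale, which is what glibc uses by default.
    """
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"


# The situation from the locale listing above: only LANG is set.
print(effective_locale("LC_CTYPE", {"LANG": "en_US.UTF-8"}))
# `export LC_ALL=zh_CN.UTF-8` overrides every category.
print(effective_locale("LC_TIME", {"LANG": "en_US.UTF-8", "LC_ALL": "zh_CN.UTF-8"}))
# `LANG=` (or `unset LANG`) with no LC_* variables falls back to "C".
print(effective_locale("LC_MESSAGES", {"LANG": ""}))
```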

2) Edit the configuration file /etc/sysconfig/i18n (used by RHEL/CentOS 6 and earlier; RHEL/CentOS 7 and later read /etc/locale.conf instead)

# vim /etc/sysconfig/i18n

LANG="en_US.UTF-8"    # system language

SYSFONT="lat0-sun16"    # console font

After saving the file and exiting, run the following command so that the change takes effect in the current shell:

$ source /etc/sysconfig/i18n
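You may want to check from a script which LANG the file will set. A minimal Python sketch is to parse the file's simple KEY="value" lines with the standard `shlex` module. `read_shell_vars` is a hypothetical helper. It assumes the file contains only plain assignments and comments, with no other shell logic.

```python
import shlex


def read_shell_vars(text: str) -> dict:
    """Parse simple KEY="value" lines such as those in /etc/sysconfig/i18n.

    Comments and blank lines are skipped; anything beyond plain
    assignments (conditionals, command substitution, ...) is not handled.
    """
    values = {}
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)  # removes quotes and # comments
        if len(tokens) == 1 and "=" in tokens[0]:
            key, _, value = tokens[0].partition("=")
            values[key] = value
    return values


SAMPLE = '# system locale\nLANG="en_US.UTF-8"\nSYSFONT="lat0-sun16"\n'
print(read_shell_vars(SAMPLE)["LANG"])
```

On a real system you would pass `Path("/etc/sysconfig/i18n").read_text()` (from `pathlib`) instead of the sample string.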

3. Encoding-related settings in the Vim editor:

1) fileencoding: the encoding of the current file. Vim sets it when the file is read and uses it when the file is saved. It holds only one value, so setting it globally only suits environments where every file uses the same encoding. For that reason it is generally left alone.

2) fileencodings: as the name suggests, an enhanced version of fileencoding that accepts a list of encodings. When opening a file, Vim tries them in order and uses the first one that decodes the file without errors, so any file whose encoding appears in the list is read correctly. Put latin1 last, because it accepts any byte sequence and ends the search. Recommended configuration: set fileencodings=ucs-bom,utf-8,gbk,gb2312,gb18030,cp936,latin1
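The fileencodings search can be modelled in a few lines of Python: try each entry in order and keep the first one that decodes the bytes without error. This is a simplified sketch, not Vim's actual code. `detect_encoding` is a hypothetical name, and `ucs-bom` is approximated by checking for a UTF-8 or UTF-16 byte-order mark. The other entries are assumed to be valid Python codec names; in Python, `cp936` is an alias of `gbk`.

```python
import codecs

# Byte-order marks treated as matching Vim's "ucs-bom" entry (simplified).
BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def detect_encoding(data: bytes, fileencodings: str) -> str:
    """Return the first entry of a Vim-style 'fileencodings' list that fits data."""
    for name in fileencodings.split(","):
        if name == "ucs-bom":
            if data.startswith(BOMS):
                return name
            continue
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Decoding failed (or Python does not know the name): try the next entry.
            continue
        return name
    raise ValueError("no entry in the list could decode the data")


FENCS = "ucs-bom,utf-8,gbk,gb2312,gb18030,cp936,latin1"
print(detect_encoding("中文".encode("utf-8"), FENCS))
print(detect_encoding("中文".encode("gbk"), FENCS))
```

The second call returns `gbk` because the GBK bytes are not valid UTF-8. The sketch also shows why latin1 belongs at the end: every byte sequence is valid Latin-1, so any entry after it would never be tried.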

3) encoding: Vim's internal encoding. When Vim reads a file, it does not work on the text in the file's own encoding. Instead it converts the text to this internal encoding. The value is usually tied to the operating system: UTF-8 on most Linux systems and GBK on Chinese Windows. Recommended configuration: set encoding=utf-8

4) termencoding: the encoding Vim uses for its output, that is, for what it sends to the operating system or terminal. By default it follows the system's language encoding. In a Linux terminal, configure the terminal emulator and the Linux system with the same encoding and set termencoding to match; otherwise Vim's output and the shell will disagree. If no file names in the shell contain Chinese characters, it is enough to keep the terminal and termencoding consistent. On Windows, GBK and UTF-8 are recognized automatically and need no special configuration. Recommended configuration: set termencoding=utf-8
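The three settings form a pipeline:

- File bytes are decoded with fileencoding into Vim's internal encoding.
- The internal text is encoded with termencoding before it is sent to the terminal.
- The terminal decodes what it receives with its own character set.

The Python sketch below models that chain, with a Python `str` standing in for Vim's internal UTF-8 text. `vim_display` and `terminal_shows` are hypothetical helpers, not Vim functions.

```python
def vim_display(file_bytes: bytes, fileencoding: str, termencoding: str) -> bytes:
    """Bytes Vim would send to the terminal (simplified model)."""
    text = file_bytes.decode(fileencoding)               # read: file bytes -> internal text
    return text.encode(termencoding, errors="replace")   # output: internal text -> terminal bytes


def terminal_shows(term_bytes: bytes, terminal_charset: str) -> str:
    """What a terminal set to terminal_charset renders; undecodable bytes become U+FFFD."""
    return term_bytes.decode(terminal_charset, errors="replace")


gbk_file = "中文".encode("gbk")
# termencoding matches the UTF-8 terminal: the text is shown correctly.
print(terminal_shows(vim_display(gbk_file, "gbk", "utf-8"), "utf-8"))
# termencoding=gbk but the terminal expects UTF-8: the output is garbled.
print(terminal_shows(vim_display(gbk_file, "gbk", "gbk"), "utf-8"))
```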

5) fileformats: distinguishes the line-ending conventions of different operating systems, mainly \n on Unix/Linux versus \r\n on DOS/Windows. Recommended configuration: set fileformats=unix,dos
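With fileformats=unix,dos, Vim chooses dos only when every line ends in \r\n. The Python sketch below approximates that rule and also shows the \r\n to \n conversion that turns a DOS file into a Unix one. Both function names are hypothetical, and edge cases that Vim handles (such as a final line without a newline) are treated simply.

```python
def detect_fileformat(data: bytes) -> str:
    """Approximate Vim's choice with fileformats=unix,dos."""
    lines = data.split(b"\n")[:-1]  # only lines that are terminated by \n
    if lines and all(line.endswith(b"\r") for line in lines):
        return "dos"  # every line ends in \r\n
    return "unix"


def dos_to_unix(data: bytes) -> bytes:
    """Replace DOS line endings (\\r\\n) with Unix ones (\\n)."""
    return data.replace(b"\r\n", b"\n")


sample = b"line one\r\nline two\r\n"
print(detect_fileformat(sample), detect_fileformat(dos_to_unix(sample)))
```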

Common cases of garbled text:

(1) A file created on Windows is garbled after being uploaded to Linux with rz

Solution: 1. Before uploading with rz, use Notepad++ to convert the file to UTF-8 without BOM, or to ANSI encoding; 2. In Vim, set encoding=utf-8.
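The Notepad++ step in solution 1 can be sketched in Python. The sketch removes a UTF-8 BOM if present; otherwise it decodes the file with its original encoding. In both cases it writes plain UTF-8. `to_utf8_no_bom` is a hypothetical helper. Treating "ANSI" as `gbk` assumes a Simplified Chinese Windows system (code page 936).

```python
import codecs


def to_utf8_no_bom(data: bytes, source_encoding: str = "utf-8") -> bytes:
    """Return data re-encoded as UTF-8 without a byte-order mark.

    source_encoding is the file's current encoding; "gbk" is assumed for
    files saved as "ANSI" on Simplified Chinese Windows (code page 936).
    """
    if data.startswith(codecs.BOM_UTF8):
        # UTF-8 with BOM: drop the three BOM bytes; the rest is already UTF-8.
        data = data[len(codecs.BOM_UTF8):]
        source_encoding = "utf-8"
    return data.decode(source_encoding).encode("utf-8")


with_bom = codecs.BOM_UTF8 + "中文".encode("utf-8")
print(to_utf8_no_bom(with_bom))                     # BOM removed
print(to_utf8_no_bom("中文".encode("gbk"), "gbk"))  # GBK ("ANSI") converted to UTF-8
```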

(2) Text is garbled in the SecureCRT or Xterm2 terminal: change the character encoding in the session options to GB2312 or UTF-8, whichever matches the text being displayed.

(3) A log file is garbled when edited in Vim. Most of the time this is because the log file is encoded in GB2312.

Solution: 1. set encoding=gb2312; 2. If solution 1 does not help, change the character encoding of the SecureCRT or Xterm2 session to GB2312.
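When a "GB2312" log still does not display correctly, remember that GBK and GB18030 are supersets of GB2312. A file that contains even one character outside GB2312 cannot be decoded as GB2312, but it can be decoded as GBK. The Python sketch below tries the three encodings in order from narrowest to widest. `decode_gb_text` is a hypothetical helper.

```python
def decode_gb_text(data: bytes) -> tuple:
    """Return (encoding, text) for the narrowest GB encoding that decodes data."""
    for enc in ("gb2312", "gbk", "gb18030"):
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the wider one
    raise ValueError("data is not GB2312, GBK or GB18030 text")


print(decode_gb_text("中文日志".encode("gb2312")))
# 體 (a traditional character) is not in GB2312, but GBK includes it.
print(decode_gb_text("繁體".encode("gbk")))
```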

(4) File names downloaded with wget are garbled

Solution: add --restrict-file-names=nocontrol, for example: wget --restrict-file-names=nocontrol -m www.xxx.com/
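By default on Linux, wget uses --restrict-file-names=unix, which percent-escapes bytes in the control-character ranges 0-31 and 128-159. UTF-8 encodes Chinese characters as multi-byte sequences whose continuation bytes can fall in the 128-159 range. As a result, parts of a Chinese file name get escaped and the name becomes garbage. The nocontrol option switches that escaping off. The Python sketch below is a simplified model of this behaviour. `wget_escape` is a hypothetical helper, not part of wget.

```python
def wget_escape(name: str, mode: str = "unix") -> bytes:
    """Simplified model of wget's --restrict-file-names control-byte escaping."""
    out = bytearray()
    for b in name.encode("utf-8"):
        if mode == "unix" and (b < 32 or 128 <= b <= 159):
            out += b"%%%02X" % b  # escaped as %XX, e.g. byte 0x96 -> "%96"
        else:
            out.append(b)
    return bytes(out)


print(wget_escape("中文.txt"))               # 文 contains bytes 0x96 and 0x87, which get escaped
print(wget_escape("中文.txt", "nocontrol"))  # bytes kept as-is
```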

(5) A file displays correctly with cat but is garbled in Vim

Solution:

A. Append the following lines to the end of /etc/vim/vimrc:

set fileencodings=ucs-bom,utf-8,gbk,gb2312,latin1

set fileencoding=gb2312

set termencoding=utf-8

B. Convert the file with iconv, which writes the result to standard output (redirect it to a new file to keep it): iconv -f gb2312 -t utf-8 19.txt

Command used for batch conversion (-c omits characters that cannot be converted): iconv -c -f gbk -t utf-8 $data_path/$item_uv
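For readers who prefer to script the batch conversion, here is a rough Python equivalent of the iconv commands above. It is a sketch under a few assumptions:

- `iconv_bytes` and `convert_dir` are hypothetical helpers.
- `skip_invalid=True` only approximates `iconv -c`, by dropping whatever cannot be decoded or encoded.
- Converted files are written to a separate directory, so the originals are kept.

```python
from pathlib import Path


def iconv_bytes(data: bytes, from_enc: str, to_enc: str, skip_invalid: bool = False) -> bytes:
    """Rough equivalent of `iconv -f FROM -t TO`; skip_invalid approximates `-c`."""
    errors = "ignore" if skip_invalid else "strict"
    return data.decode(from_enc, errors=errors).encode(to_enc, errors=errors)


def convert_dir(src: Path, dst: Path, from_enc: str = "gbk", to_enc: str = "utf-8",
                pattern: str = "*.txt") -> list:
    """Convert every file in src matching pattern and write the result to dst.

    The originals are left untouched, much like redirecting iconv's output
    to new files. Returns the names of the converted files.
    """
    dst.mkdir(parents=True, exist_ok=True)
    converted = []
    for path in sorted(src.glob(pattern)):
        out = iconv_bytes(path.read_bytes(), from_enc, to_enc, skip_invalid=True)
        (dst / path.name).write_bytes(out)
        converted.append(path.name)
    return converted
```

For example, `convert_dir(Path("logs"), Path("logs_utf8"))` would convert every `*.txt` file in a hypothetical `logs` directory from GBK to UTF-8.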

Reference sources:

https://www.linuxidc.com/Linux/2014-03/97777.htm

50947243

Excellent documentation on character sets:

https://wenku.baidu.com/view/1f476aea9ec3d5bbfd0a746a.html


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
