Checking a file's actual encoding with Python and vim

Source: Internet
Author: User

When you open a file containing Chinese text, you may not know which encoding it actually uses. For example, the file header may declare utf8 while the bytes are really stored in another encoding. In that case you can use Python and vim together to check the encoding. The rest of this article describes how.

Suppose you open a file containing Chinese text and it is unclear which encoding it uses. The header of a Python source file may declare utf8 while the file is actually stored as gbk, and this inconsistency can cause an error when the program runs. One way to find out is to look at the raw bytes, but which encoding do the bytes of a given Chinese character correspond to?
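The mismatch between the declared and the actual encoding of a Python source file can also be checked from Python itself. The following is a minimal sketch, not part of the original article: it uses the standard library's tokenize.detect_encoding to read the coding declaration, then tries to decode the whole file with it. The function names declared_encoding and matches_declaration are hypothetical helpers introduced for illustration.

```python
import io
import tokenize


def declared_encoding(data: bytes) -> str:
    """Return the encoding named by the BOM or coding comment (default utf-8)."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


def matches_declaration(data: bytes) -> bool:
    """True if the whole file decodes cleanly with its declared encoding."""
    try:
        data.decode(declared_encoding(data))
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample: the header claims utf-8, but the bytes are gbk.
source = "# -*- coding: utf-8 -*-\nprint('你好')\n"
print(matches_declaration(source.encode("gbk")))    # declaration is wrong
print(matches_declaration(source.encode("utf-8")))  # declaration is right
```

A clean decode only shows that the bytes are valid in the declared encoding. It does not prove that the text was written in that encoding.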

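Add two lines to your vimrc:

    set fenc=utf-8
    set fileencodings=ucs-bom,utf-8,cp936,big5,euc-jp,euc-kr,latin1

With these settings, new files are saved as UTF-8 by default. When opening an existing file, vim tries the encodings in fileencodings in order and uses the first one that decodes without error. ucs-bom is listed first so that a byte order mark is checked before anything else, and latin1 is listed last because it accepts any byte sequence, so nothing after it would ever be tried.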
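To make the idea of an ordered encoding list concrete, here is a rough Python sketch of the "try each candidate in order, keep the first that decodes" strategy. It is an approximation for illustration only, not vim's actual implementation. The function name first_matching_encoding and the candidate list are assumptions, and BOM detection is left out.

```python
def first_matching_encoding(
    data: bytes,
    candidates=("utf-8", "cp936", "big5", "euc-jp", "euc-kr", "latin1"),
):
    """Return the first candidate that decodes data without error, else None."""
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return encoding
    return None


print(first_matching_encoding("你好".encode("utf-8")))
print(first_matching_encoding("你好".encode("gbk")))
```

The result is only a guess: the same bytes can be valid in several encodings, which is why the order of the list matters and why an 8-bit encoding such as latin1 belongs at the end.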

 
 
    set enc=cp936

This sets the encoding that vim uses internally, which is also the encoding gvim uses for display. cp936 is typical on Windows and utf8 on Linux. It is recommended that you leave this option unset.

If you are not sure whether a newly opened file is utf8 or gbk, open it in vim, locate some Chinese characters, and then run

    :%!xxd

This filters the buffer through xxd and shows its hex dump. If the text contains "你好", you will see the hexadecimal representation of those characters in the hex columns to the left of the text. Then start Python 3 and encode the same string "你好" at the interactive prompt to see its bytes under each candidate encoding:

 
 
    >>> a = '你好'
    >>> b = a.encode('utf8')
    >>> b
    b'\xe4\xbd\xa0\xe5\xa5\xbd'
    >>> c = a.encode('gbk')
    >>> c
    b'\xc4\xe3\xba\xc3'
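The manual comparison can also be scripted. The sketch below is an illustration rather than the article's own method: it encodes a piece of text that you know appears in the file under each candidate encoding and checks whether those bytes occur in the raw data. The function name probe_encodings and the default candidate list are hypothetical.

```python
def probe_encodings(data: bytes, known_text: str,
                    candidates=("utf-8", "gbk", "big5")):
    """Return the candidates whose encoding of known_text occurs in data."""
    matches = []
    for encoding in candidates:
        try:
            needle = known_text.encode(encoding)
        except UnicodeEncodeError:
            continue  # the text cannot be represented in this encoding
        if needle in data:
            matches.append(encoding)
    return matches


# Hypothetical raw bytes, as they might appear in a hex dump of the file.
raw = b"print('\xc4\xe3\xba\xc3')\n"
print(raw.hex())
print(probe_encodings(raw, "你好"))
```

This works best with text that contains non-ASCII characters, because ASCII text is encoded identically by all of these encodings and would match every candidate.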

As you can see, the bytes of the Chinese "你好" in utf8 are

    0xe4bda0 0xe5a5bd

For gbk, gb2312, cp936, and gb18030, the bytes are 0xc4e3 0xbac3. Compare these values with the bytes shown in vim's hex dump to determine which encoding the file actually uses. Once you know the encoding, use

    :%!xxd -r

to convert the hex dump back into plain text, then save the file. To convert existing text to another encoding, you can use iconv on Linux. This concludes the description of how to check a file's encoding with Python 3 and vim.
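If iconv is not available, the same kind of transcoding can be sketched in Python. This is a minimal example under stated assumptions: the function names transcode_bytes and transcode_file are hypothetical, the source encoding is assumed to be known already (gbk here), and the whole file is read into memory.

```python
from pathlib import Path


def transcode_bytes(data: bytes, from_enc: str = "gbk",
                    to_enc: str = "utf-8") -> bytes:
    """Decode data from one encoding and re-encode it in another."""
    return data.decode(from_enc).encode(to_enc)


def transcode_file(src: str, dst: str, from_enc: str = "gbk",
                   to_enc: str = "utf-8") -> None:
    """Read src as from_enc and write the same text to dst as to_enc."""
    Path(dst).write_bytes(
        transcode_bytes(Path(src).read_bytes(), from_enc, to_enc)
    )


print(transcode_bytes(b"\xc4\xe3\xba\xc3"))
```

Decoding raises UnicodeDecodeError if the file is not really in the assumed source encoding, which doubles as a sanity check before anything is written.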
