Getting the encoding of Chinese file names in Python 2

Source: Internet
Author: User

Problem:

In Python 2, file names that contain Chinese characters are returned as byte strings. If they are printed without being decoded, garbled characters may appear.

Assume the folder under test is named test and contains five files with Chinese names (given here in English translation):

Python Performance Analysis and Optimization.pdf

Python Data Analysis and Mining in Practice.pdf

Python Programming in Practice: Creating High-Quality Programs with Design Patterns, Concurrency, and Libraries.pdf

Fluent Python.pdf

59 Effective Ways to Write High-Quality Python Code.pdf

First, print the file names as obtained, without decoding. The code is as follows:

    import os

    for file in os.listdir('./test'):
        print(file)

Output garbled characters:

Python���ܷ������Ż�.pdf
Python���ݷ������ھ�ʵս.pdf
Python���ʵս���������ģʽ�������ͳ���ⴴ������������.pdf
������Python.pdf
��д������Python�����59����Ч����.pdf
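One plausible explanation for output like this is that the bytes of each name are in one encoding while the console interprets them as another. The following is a minimal sketch of that effect, not code from the original article. The sample name and the choice of GBK as the "Windows" encoding are assumptions made for illustration, and the snippet is written so that it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Illustrative name only; any file name with Chinese characters behaves similarly.
name = u'流畅的Python.pdf'

# Bytes as a Chinese-locale Windows system would typically store them
# (assumption: GBK, a superset of GB2312).
raw = name.encode('gbk')

# Interpreting those bytes as UTF-8 fails; with 'replace', the undecodable
# bytes are substituted by the replacement character U+FFFD.
garbled = raw.decode('utf-8', 'replace')

# Decoding with the encoding the bytes were written in recovers the name.
restored = raw.decode('gbk')

print(u'\ufffd' in garbled)  # garbled text contains replacement characters
print(restored == name)      # correctly decoded text matches the original
```

The ASCII part of the name (Python.pdf) survives in both cases, which matches the garbled output above, where only the Chinese portion is damaged.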

Solution:

First, determine the encoding of the file names. Here we use the chardet module, which is installed with:

pip install chardet

Use the chardet.detect function to check the encoding of each file name:
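The article does not show the detection code itself, so here is a minimal sketch of what such a call might look like. The helper name guess_encoding and the sample file name are hypothetical; in the article's setting the byte strings would come from os.listdir('./test') instead. The import is guarded so the sketch still runs when chardet is not installed.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

try:
    import chardet  # third-party package: pip install chardet
except ImportError:
    chardet = None  # lets the sketch run (without a guess) when chardet is missing


def guess_encoding(raw_name):
    """Return chardet's guess for a byte-string file name.

    Hypothetical helper, not from the original article. Returns None
    when chardet is not installed.
    """
    if chardet is None:
        return None
    return chardet.detect(raw_name)


# Stand-in for one entry of os.listdir('./test') on Windows under Python 2:
# a Chinese file name as GB2312-encoded bytes (assumed sample).
sample = u'Python性能分析与优化.pdf'.encode('gb2312')

# Prints a dict with 'encoding' and 'confidence' keys, or None without chardet.
print(guess_encoding(sample))
```

Running the helper over every name in the folder would give one result per file, in the form shown below.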

{'confidence': 0.99, 'encoding': 'GB2312'}
{'confidence': 0.99, 'encoding': 'GB2312'}
{'confidence': 0.99, 'encoding': 'GB2312'}
{'confidence': 0.73, 'encoding': 'windows-1252'}
{'confidence': 0.99, 'encoding': 'GB2312'}

Four of the five names are detected as GB2312 with a confidence of 0.99. The one exception, detected as windows-1252 with a confidence of only 0.73, is the name with the fewest Chinese characters. We therefore decode the file names with GB2312. The code is as follows:

    import os

    for file in os.listdir('./test'):
        # Decode the byte-string name using the detected encoding
        r = file.decode('GB2312')
        print(r)

Output:

Python Performance Analysis and Optimization.pdf

Python Data Analysis and Mining in Practice.pdf

Python Programming in Practice: Creating High-Quality Programs with Design Patterns, Concurrency, and Libraries.pdf

Fluent Python.pdf

59 Effective Ways to Write High-Quality Python Code.pdf

After decoding, the file names are printed correctly.

PS: the longer the string passed to chardet.detect, the more accurate the detection; the shorter the string, the less accurate.

Another problem is that the code above was tested on Windows, whereas file names on Linux are encoded in UTF-8. To be compatible with both Windows and Linux, the code needs to be modified. Here it is wrapped in a function:

    # -*- coding: utf-8 -*-
    import os

    def get_filename_from_dir(dir_path):
        file_list = []
        if not os.path.exists(dir_path):
            return file_list
        for item in os.listdir(dir_path):
            basename = os.path.basename(item)
            # print(chardet.detect(basename))  # find out the file name encoding
            # File names containing Chinese characters are GB2312-encoded
            # on Windows and UTF-8-encoded on Linux
            try:
                decode_str = basename.decode("GB2312")
            except UnicodeDecodeError:
                decode_str = basename.decode("UTF-8")
            file_list.append(decode_str)
        return file_list

    # test code
    r = get_filename_from_dir('./test')
    for i in r:
        print(i)

The function first tries GB2312 decoding and, if that raises an error, falls back to UTF-8. This works on both Windows and Linux (tested on Win7 and Ubuntu 16.04).
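The try-then-fall-back idea can also be isolated into a small helper that works on a single byte-string name, which makes it easy to check without a real folder. The sketch below is a variation, not the article's code: the helper name decode_filename and the sample name are hypothetical, and it tries UTF-8 first rather than GB2312. That order is an assumption based on UTF-8 being strict enough that GB2312 bytes rarely pass as valid UTF-8, whereas UTF-8 bytes can occasionally decode as GB2312 without an error.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def decode_filename(raw_name, encodings=('utf-8', 'gb2312')):
    """Decode a byte-string file name by trying candidate encodings in order.

    Hypothetical helper, not from the original article. If every candidate
    fails, fall back to UTF-8 with replacement characters instead of raising.
    """
    for enc in encodings:
        try:
            return raw_name.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw_name.decode('utf-8', 'replace')


# The same illustrative name, as bytes in each platform's usual encoding
# (assumed: GB2312 on Chinese-locale Windows, UTF-8 on Linux).
windows_style = u'流畅的Python.pdf'.encode('gb2312')
linux_style = u'流畅的Python.pdf'.encode('utf-8')

# Both byte strings should decode to the same text.
print(decode_filename(windows_style) == decode_filename(linux_style))
```

Passing encodings=('gb2312', 'utf-8') reproduces the order used in get_filename_from_dir.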

