Getting the Encoding of Chinese File Names in Python 2

Last Update: 2018-01-18 Source: Internet

Author: User

Problem:

In Python 2, os.listdir returns file names as raw byte strings when given a byte-string path. If names containing Chinese characters are printed without being decoded, the output may be garbled.

Assume that the folder to be tested is named test and contains five files with Chinese names (given here in English translation):

Python Performance Analysis and Optimization.pdf

Python Data Analysis and Mining in Practice.pdf

Python in Practice: Creating High-Quality Programs with Design Patterns, Concurrency, and Libraries.pdf

Fluent Python.pdf

59 Effective Ways to Write High-Quality Python Code.pdf

First, print the file names exactly as os.listdir returns them, without decoding. The code is as follows:

import os

for file in os.listdir('./test'):
    print(file)

The output is garbled:

Python���ܷ������Ż�.pdf
Python���ݷ������ھ�ʵս.pdf
Python���ʵս���������ģʽ�������ͳ���ⴴ������������.pdf
������Python.pdf
��д������Python�����59����Ч����.pdf
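The garbling can be reproduced in memory. The following is a minimal sketch, assuming (as the chardet results below suggest) that the names are stored as GB2312 bytes and that the console interprets output as UTF-8; the sample file name is hypothetical. It runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce the garbled output by decoding GB2312 bytes as UTF-8.
# Assumptions: the console expects UTF-8; the sample name is hypothetical.
from __future__ import print_function, unicode_literals

name = '中文文件名.pdf'            # hypothetical Chinese file name (text)
raw = name.encode('gb2312')        # the bytes a GB2312 system stores on disk

# Reading those bytes with the wrong codec cannot recover the characters;
# errors='replace' substitutes U+FFFD for every invalid sequence, which is
# what shows up as the replacement marks in the output above.
garbled = raw.decode('utf-8', 'replace')

# Decoding with the codec that produced the bytes restores the name.
restored = raw.decode('gb2312')

print('\ufffd' in garbled)   # the wrong codec produced replacement characters
print(restored == name)      # the right codec round-trips the name
```

The ASCII part of the name (".pdf") survives either way, which is why only the Chinese portion of each line above is unreadable.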

Solution:

First, detect how the file names are encoded. Here we use the chardet module, which can be installed with:

pip install chardet

Passing each raw file name to the chardet.detect function reports its likely encoding. For the five files the results are:

{'confidence': 0.99, 'encoding': 'GB2312'}
{'confidence': 0.99, 'encoding': 'GB2312'}
{'confidence': 0.99, 'encoding': 'GB2312'}
{'confidence': 0.73, 'encoding': 'windows-1252'}
{'confidence': 0.99, 'encoding': 'GB2312'}
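The code that produced these results is not included in the article. A minimal sketch of how such a check might look, assuming the third-party chardet package is installed; detect_name_encodings is a hypothetical helper name. Recent chardet versions may also include extra keys (such as 'language') in the result dictionary.

```python
# -*- coding: utf-8 -*-
# Sketch: run chardet.detect on each raw file name in a directory.
# Assumes the third-party chardet package is installed (pip install chardet).
from __future__ import print_function
import os

try:
    import chardet
except ImportError:  # chardet is not part of the standard library
    chardet = None


def detect_name_encodings(raw_names):
    """Return a (raw_name, detection_result) pair for each byte-string name."""
    if chardet is None:
        raise RuntimeError('chardet is not installed; run: pip install chardet')
    # chardet.detect takes a byte string and returns a dict that includes
    # the 'encoding' and 'confidence' keys.
    return [(raw, chardet.detect(raw)) for raw in raw_names]


if __name__ == '__main__' and os.path.isdir(b'./test'):
    # A byte-string path makes os.listdir return byte-string names on both
    # Python 2 and Python 3, which is what chardet.detect expects.
    for raw, result in detect_name_encodings(os.listdir(b'./test')):
        print(result)
```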

GB2312 is reported with a confidence of 0.99 for four of the five names; the remaining name is reported as windows-1252 with a confidence of only 0.73. We therefore decode the file names as GB2312. The code is as follows:

import os

for file in os.listdir('./test'):
    # The raw names are GB2312 byte strings; decode them to unicode.
    r = file.decode('GB2312')
    print(r)

Output:

Python Performance Analysis and Optimization.pdf

Python Data Analysis and Mining in Practice.pdf

Python in Practice: Creating High-Quality Programs with Design Patterns, Concurrency, and Libraries.pdf

Fluent Python.pdf

59 Effective Ways to Write High-Quality Python Code.pdf

After decoding, the file names are printed correctly.
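A related option, not used in the original code: Simplified Chinese Windows uses code page 936, which Python's codec registry calls 'gbk' (with 'cp936' as an alias), and GBK is a superset of GB2312 that adds characters GB2312 lacks. The sketch below, using a hypothetical sample name, checks that decoding GB2312 bytes with 'gbk' gives the same text, so 'gbk' could be substituted where names might contain characters outside GB2312.

```python
# -*- coding: utf-8 -*-
# Sketch: GBK (Windows code page 936) is a superset of GB2312, so decoding
# GB2312 bytes with 'gbk' yields the same text. The sample name is hypothetical.
from __future__ import print_function, unicode_literals

name = '中文文件名.pdf'                 # hypothetical file name
raw = name.encode('gb2312')             # bytes as a GB2312 system stores them

via_gb2312 = raw.decode('gb2312')
via_gbk = raw.decode('gbk')             # 'cp936' names the same codec
via_cp936 = raw.decode('cp936')

print(via_gb2312 == via_gbk == via_cp936 == name)  # all three decodings agree
```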

PS: chardet.detect is more accurate on longer byte strings and less accurate on shorter ones.
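One way to act on this, offered as an assumption rather than something the original code does, is to give chardet a single longer sample by joining all the raw names and detecting once for the whole directory. This only makes sense if every name in the directory shares one encoding. detect_joined is a hypothetical helper, and chardet must be installed.

```python
# -*- coding: utf-8 -*-
# Sketch: detect one encoding for a whole directory by giving chardet a
# single, longer sample built from all raw (byte-string) file names.
# Assumes the third-party chardet package is installed (pip install chardet).
try:
    import chardet
except ImportError:  # chardet is not part of the standard library
    chardet = None


def detect_joined(raw_names):
    """Return chardet's result for all byte-string names joined together."""
    if chardet is None:
        raise RuntimeError('chardet is not installed; run: pip install chardet')
    # A newline separator keeps names apart without adding non-ASCII bytes.
    sample = b'\n'.join(raw_names)
    return chardet.detect(sample)
```

For example, detect_joined(os.listdir(b'./test')) returns one result whose 'encoding' value could then be used to decode every name. chardet may report None as the encoding when it cannot decide, so check for that before decoding.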

Another problem is that the code above was tested on Windows, while on Linux file names are usually encoded in UTF-8. To work on both Windows and Linux, the code needs to be modified; here it is wrapped in a function:

# -*- coding: utf-8 -*-
import os


def get_filename_from_dir(dir_path):
    file_list = []
    if not os.path.exists(dir_path):
        return file_list
    for item in os.listdir(dir_path):
        basename = os.path.basename(item)
        # print(chardet.detect(basename))  # find out the file name encoding
        # The file names contain Chinese characters: on Windows they are
        # GB2312-encoded, on Linux they are UTF-8-encoded.
        try:
            decode_str = basename.decode("GB2312")
        except UnicodeDecodeError:
            decode_str = basename.decode("UTF-8")
        file_list.append(decode_str)
    return file_list


# Test code
r = get_filename_from_dir('./test')
for i in r:
    print(i)
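To check the fallback without preparing a directory, the same logic can be written as a function on byte strings. This is a minimal sketch: decode_name and its encodings parameter are hypothetical. Because a byte string can occasionally decode without error under the wrong codec, the order of the candidate encodings matters, so the sketch makes it a parameter while keeping the article's GB2312-then-UTF-8 order as the default.

```python
# -*- coding: utf-8 -*-
# Sketch: the "try GB2312, then UTF-8" fallback from get_filename_from_dir,
# as a pure function on byte strings. decode_name is a hypothetical name.
from __future__ import print_function, unicode_literals


def decode_name(raw, encodings=('gb2312', 'utf-8')):
    """Decode a byte-string file name with the first codec that succeeds."""
    for encoding in encodings[:-1]:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            pass  # not valid in this codec; try the next one
    # As in the original function, an error from the last codec propagates.
    return raw.decode(encodings[-1])


if __name__ == '__main__':
    name = '中文文件名.pdf'  # hypothetical file name
    # Simulate the bytes a GB2312 system and a UTF-8 system would return.
    for raw in (name.encode('gb2312'), name.encode('utf-8')):
        print(decode_name(raw) == name)
```

Both simulated byte strings decode back to the same name, so the loop prints True twice.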

The function first tries to decode with GB2312; if that raises a UnicodeDecodeError, it falls back to UTF-8. This works on both Windows and Linux (tested on Windows 7 and Ubuntu 16.04).
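A different approach, not used in the original, is to let os.listdir do the decoding. According to the Python 2 documentation, on Windows NT and Unix, passing a unicode path makes os.listdir return unicode names decoded with the file-system encoding (see sys.getfilesystemencoding()), while names it cannot decode are still returned as byte strings. The sketch below separates the two cases; list_names_as_text is a hypothetical helper. On Python 3, os.listdir with a str path always returns str.

```python
# -*- coding: utf-8 -*-
# Sketch: let os.listdir do the decoding by passing it a unicode path.
# list_names_as_text is a hypothetical helper, not part of the original code.
from __future__ import print_function, unicode_literals
import os
import sys


def list_names_as_text(dir_path):
    """Split entries of dir_path (a unicode path) into decoded and raw names."""
    decoded, undecodable = [], []
    for name in os.listdir(dir_path):
        # Python 2 returns a byte string (str) for names it could not decode
        # with the file-system encoding; Python 3 always returns str here.
        if isinstance(name, bytes):
            undecodable.append(name)
        else:
            decoded.append(name)
    return decoded, undecodable


if __name__ == '__main__' and os.path.isdir('./test'):
    print('file-system encoding:', sys.getfilesystemencoding())
    names, leftovers = list_names_as_text('./test')  # unicode via unicode_literals
    print(len(names), 'decoded,', len(leftovers), 'undecodable')
```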

That is all on obtaining the encoding of Chinese file names in Python 2. I hope it serves as a useful reference.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
