Brief introduction:
Description: This module is mainly used to detect the encoding of strings and files.
Quick installation:
pip install --upgrade chardet
Common methods:
chardet.detect(aBuf) -> dict
Description: Detects the encoding of a byte string and returns a dictionary. The confidence key holds the confidence of the match, and the encoding key holds the encoding that was finally detected. When aBuf is empty, encoding may be None, so it is best to check for that before using the result.
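The None check described above can be sketched as follows. This is only an illustration: the helper name safe_detect, its fallback parameter, and the injectable detect argument are assumptions made for this example and are not part of chardet itself.

```python
def safe_detect(data, fallback="utf-8", detect=None):
    """Return (encoding, confidence), using `fallback` when detection fails.

    `safe_detect`, `fallback` and `detect` are hypothetical names chosen
    for this sketch; only chardet.detect() comes from the library.
    """
    if detect is None:
        # Imported lazily so the helper can also be tried with a stub detector.
        import chardet
        detect = chardet.detect

    result = detect(data)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0

    # chardet reports encoding=None when it cannot decide (e.g. empty input).
    if not encoding:
        return fallback, 0.0
    return encoding, confidence
```

Calling safe_detect(raw_bytes) with chardet installed returns the detected encoding together with its confidence, and falls back to "utf-8" with confidence 0.0 when nothing could be detected.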
Best Practices:
1. Firmwareupload automatically connects to the OA system and the corresponding SVN server, and periodically reads the latest firmware and Releasenote files released through OA. On its way through OA, however, a Releasenote may have been modified by several different departments, so the encoding of the synchronized files cannot be known in advance. Because the Releasenote files that are read are ultimately used to generate a directory structure that is published automatically to the Upgradeserver, how can we identify each encoding accurately and convert everything to UTF-8?
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Date    : 2016-11-23 11:14:15
# @Author  : li ([email protected])
# @Link    : http://xmdevops.blog.51cto.com/
# @Version : $Id$
from __future__ import absolute_import, print_function
# Description: import public modules
import os
import chardet
# Description: import other modules
if __name__ == '__main__':
    res_lines = []
    # Read raw bytes: chardet.detect works on bytes, not on decoded text
    with open('Changelog_chinese.dat', 'rb') as fd:
        for line in fd:
            # Default for lines whose encoding cannot be detected (e.g. blank lines)
            res_line = os.linesep.encode('utf-8')
            line = line.lstrip()
            encoding = chardet.detect(line).get('encoding', None)
            print(encoding)
            if encoding:
                # Decode with the detected encoding, then re-encode as UTF-8
                res_line = line.decode(encoding, 'replace').encode('utf-8')
            res_lines.append(res_line)
    print(res_lines)
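The script above trusts any non-empty detection result. One possible refinement, sketched here under assumptions, is to also consider the confidence value and to fall back to a default encoding when the guess is weak or names a codec Python does not know. The function name to_utf8, the min_confidence threshold of 0.5, and the fallback default are illustrative choices for this sketch, not something the original script or chardet prescribes.

```python
def to_utf8(raw, fallback="utf-8", min_confidence=0.5, detect=None):
    """Convert bytes of unknown encoding to UTF-8 bytes.

    `to_utf8`, `fallback`, `min_confidence` and `detect` are hypothetical
    names for this sketch; only chardet.detect() comes from the library.
    """
    if detect is None:
        # Imported lazily so the helper can also be tried with a stub detector.
        import chardet
        detect = chardet.detect

    # Skip detection for empty input, where no encoding can be reported.
    result = detect(raw) if raw else {}
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0

    # Use the fallback when detection failed or the guess is too weak.
    if not encoding or confidence < min_confidence:
        encoding = fallback

    try:
        text = raw.decode(encoding, "replace")
    except LookupError:
        # The reported name is not a codec this Python build knows.
        text = raw.decode(fallback, "replace")
    return text.encode("utf-8")
```

In the loop above, res_line = to_utf8(line) could stand in for the detect/decode/encode steps. Very short lines give chardet little evidence to work with, which is why a confidence threshold may help; the right threshold depends on the data.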
This article is from the "Li-Yun Development Road" blog. Please be sure to keep this source link: http://xmdevops.blog.51cto.com/11144840/1875749
Basic Primer: Python Modules and Packages - Best Practices for chardet Encoding Detection in Operations Development