Use chardet to detect webpage encoding
Environment: Win7_x64 + python3.4.3
You need to download and install chardet first: https://pypi.python.org/packages/source/c/chardet/chardet-2.3.0.tar.gz
Install: enter the extracted directory and run the following command in a command window: python setup.py install
Write a test Python script (DetectURLCoding.py):
# coding: utf-8
'''
python 3.x
'''
import sys
import urllib.request
import chardet

# write data to the file fname
def writeFile(fname, data):
    f = open(fname, "wb")
    if f:
        f.write(data)
        f.close()

def blog_detect(blogurl):
    '''
    Detect the encoding of the page at blogurl
    '''
    try:
        fp = urllib.request.urlopen(blogurl)
    except Exception as e:
        print(e)
        print('Download exception - [%s]' % blogurl)
        return 0
    blog = fp.read()  # in Python 3.x, read() returns the HTML as bytes
    fp.close()
    # writeFile("t.html", blog)
    # get the name of the detected encoding
    codedetect = chardet.detect(blog)['encoding']
    print('%s <- %s' % (blogurl, codedetect))
    return 1

if __name__ == '__main__':
    if len(sys.argv) == 1:
        print('''usage:
    python DetectURLCoding.py http://xxx.com
    ''')
    else:
        v = blog_detect(sys.argv[1])
        print(v)
Running result (here the script was saved as de.py):
D:\profile\Desktop>PYTHON de.py http://hovertree.com/
http://hovertree.com/ <- utf-8
1

D:\profile\Desktop>PYTHON de.py http://photo.cankaoxiaoxi.com/roll10/2015/0318/709734.shtml
http://photo.cankaoxiaoxi.com/roll10/2015/0318/709734.shtml <- utf-8
1
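
The script only prints the name of the detected encoding. A natural next step is to decode the downloaded bytes into text with that name. The helper below is a minimal sketch of that step and is not part of the original script: the function name decode_with_detected and the utf-8 fallback are assumptions made for illustration. It takes a dictionary shaped like the result of chardet.detect(), whose 'encoding' value may be None when detection fails, and needs only the standard library.

```python
def decode_with_detected(raw, detected, fallback="utf-8"):
    """Decode raw bytes using a chardet-style result dict.

    `detected` is expected to look like
    {'encoding': 'utf-8', 'confidence': 0.99}; the 'encoding'
    value may be None when no encoding could be detected.
    This helper is hypothetical, not part of chardet.
    """
    # Use the fallback when there is no result or no encoding name
    encoding = (detected or {}).get("encoding") or fallback
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: decode with the
        # fallback and replace any bytes that cannot be decoded
        return raw.decode(fallback, errors="replace")


if __name__ == "__main__":
    sample = "你好".encode("gbk")
    print(decode_with_detected(sample, {"encoding": "GB2312", "confidence": 0.9}))
```

In blog_detect, this could be called as decode_with_detected(blog, chardet.detect(blog)) after fp.read(). Detection is a statistical guess, so the fallback path is worth keeping even for pages that usually detect cleanly.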