Using Python to Scrape Administrative Division Codes
Preface
The National Bureau of Statistics website publishes a relatively consistent set of administrative division codes. Many websites need this basic data, so I wrote a Python program to scrape it.
Note: After scraping the data, you still need to tidy it up manually.
Sample Code:
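# -*- coding: utf-8 -*-
'''Obtain the administrative division codes.'''
import re

import requests

# The page URL is truncated in the original listing; set it to the
# National Bureau of Statistics page that lists the codes.
base_url = 'http://...'


def get_xzqh():
    # Download the page and decode it so the pattern is matched against text.
    html_data = requests.get(base_url).content.decode('utf-8')
    pattern = re.compile(
        r'<p class="MsoNormal" style=".*?"><span lang="EN-US" style=".*?">(\d+)'
        r'<span>.*?</span><span style=".*?">(.*?)</span></p>')
    areas = re.findall(pattern, html_data)
    print('code, name, level')
    for code, name in areas:
        # The indentation character is missing from the original listing; it is
        # assumed here to be the full-width space (U+3000), one per level.
        print(code, name.replace('\u3000', ''), name.count('\u3000'))


if __name__ == '__main__':
    get_xzqh()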
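To check how the code, name and level columns are derived without touching the network, here is a minimal sketch that runs a similar pattern over a hand-written HTML fragment. The fragment, the simplified pattern and the helper name parse_areas are all hypothetical, and the sketch assumes that each nesting level is marked by one full-width space (U+3000) in front of the name. The real page markup may differ.

```python
import re

# Hypothetical fragment imitating the page markup; not copied from the real page.
SAMPLE_HTML = (
    '<p class="MsoNormal"><span lang="EN-US">110000<span> </span></span>'
    '<span>北京市</span></p>'
    '<p class="MsoNormal"><span lang="EN-US">110100<span> </span></span>'
    '<span>\u3000市辖区</span></p>'
    '<p class="MsoNormal"><span lang="EN-US">110101<span> </span></span>'
    '<span>\u3000\u3000东城区</span></p>'
)

# Simplified pattern (an assumption): group 1 is the code, group 2 the indented name.
PATTERN = re.compile(
    r'<p class="MsoNormal"[^>]*><span lang="EN-US"[^>]*>(\d+)'
    r'<span>.*?</span></span><span[^>]*>(.*?)</span></p>'
)


def parse_areas(html):
    """Return (code, name, level) tuples; level = number of leading U+3000 spaces."""
    rows = []
    for code, raw_name in PATTERN.findall(html):
        level = raw_name.count('\u3000')
        rows.append((code, raw_name.replace('\u3000', ''), level))
    return rows


if __name__ == '__main__':
    for code, name, level in parse_areas(SAMPLE_HTML):
        print(code, name, level)
```

Returning tuples instead of printing directly makes the later manual cleanup easier, since the rows can be sorted, filtered or written to a file as needed.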
Note:
In addition, there is another source of country/region data: the country/region table that ships with the QQ client. The file is named LocList.xml and is usually stored in: C:\Program Files\Tencent\QQ\I18N\2052
That directory holds the Chinese version, which you get by installing the Chinese version of QQ. If you need the English version, install the English (international) version of QQ; its file is in the 1033 directory.
The codes in this file follow the iso00006 standard, so they can be easily imported into a database.
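As a rough illustration of such an import, the sketch below flattens a small XML sample into rows and loads them into an in-memory SQLite database. The element names (CountryRegion, State, City), the Name and Code attributes, the table name location and the helper names are assumptions about the file layout, not something confirmed here, so check them against your own copy of LocList.xml first.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical miniature of LocList.xml; element and attribute names are assumed.
SAMPLE_XML = """<Location>
  <CountryRegion Name="China" Code="1">
    <State Name="Beijing" Code="11">
      <City Name="Dongcheng" Code="1"/>
      <City Name="Xicheng" Code="2"/>
    </State>
  </CountryRegion>
  <CountryRegion Name="Aruba" Code="ABW"/>
</Location>
"""


def flatten(element, prefix=(), level=0):
    """Yield (code_path, name, level) for every descendant of element."""
    for child in element:
        codes = prefix + (child.get('Code', ''),)
        yield '-'.join(codes), child.get('Name', ''), level
        yield from flatten(child, codes, level + 1)


def load_locations(xml_text, conn):
    """Insert all locations into a (hypothetical) location table; return row count."""
    rows = list(flatten(ET.fromstring(xml_text)))
    conn.execute(
        'CREATE TABLE IF NOT EXISTS location '
        '(code TEXT PRIMARY KEY, name TEXT, level INTEGER)'
    )
    conn.executemany('INSERT INTO location VALUES (?, ?, ?)', rows)
    conn.commit()
    return len(rows)


if __name__ == '__main__':
    connection = sqlite3.connect(':memory:')
    print(load_locations(SAMPLE_XML, connection), 'rows imported')
```

For the real file, read it from disk (for example with ET.parse) instead of using the embedded sample, and adjust the tag and attribute names if they differ.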
Summary
That is all on using Python to obtain the administrative division codes. I hope this article helps you learn or use Python. If you have any questions, feel free to leave a message.