1. Parsing JSON data
Python's json library converts JSON objects into dictionaries, JSON arrays into lists, and JSON strings into Python strings.
The following example demonstrates the use of Python's JSON parsing library to handle the different types of data that may occur in a JSON string:
>>> import json
>>> jsonString = '{"arrayOfNums":[{"number":0},{"number":1},{"number":2}], "arrayOfFruits":[{"fruit":"apple"},{"fruit":"banana"},{"fruit":"pear"}]}'
>>> jsonObj = json.loads(jsonString)
>>> print(jsonObj.get("arrayOfNums"))
[{'number': 0}, {'number': 1}, {'number': 2}]
>>> print(jsonObj.get("arrayOfNums")[1])
{'number': 1}
>>> print(jsonObj.get("arrayOfNums")[1].get("number") + jsonObj.get("arrayOfNums")[2].get("number"))
3
>>> print(jsonObj.get("arrayOfFruits")[2].get("fruit"))
pear
The first line of output is a list of dictionaries, the second is a dictionary, the third is an integer (the sum of two of the integers held in the list of dictionaries), and the fourth is a string.
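The mapping also works in reverse: json.dumps serializes Python objects back into a JSON string. A minimal round-trip sketch (not from the book):

```python
import json

# Parse a JSON string into Python objects, then serialize it back.
data = json.loads('{"arrayOfNums": [{"number": 0}, {"number": 1}]}')
data["arrayOfNums"].append({"number": 2})   # the parsed list behaves like any Python list

serialized = json.dumps(data)
print(serialized)
# {"arrayOfNums": [{"number": 0}, {"number": 1}, {"number": 2}]}
```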
Using Python's JSON parsing functions to decode the response from a geolocation API, the following program prints the country code for the IP address 50.78.253.58.
# -*- coding: utf-8 -*-
import json
from urllib.request import urlopen

def getCountry(ipAddress):
    response = urlopen("http://freegeoip.net/json/" + ipAddress).read().decode('utf-8')
    responseJson = json.loads(response)
    return responseJson.get("country_code")

print(getCountry("50.78.253.58"))

US
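The freegeoip.net service has since been retired, so the request above may no longer succeed. The JSON handling itself can still be exercised offline; the response body below is a hypothetical example of the shape the service returned, not captured data:

```python
import json

# Made-up response body imitating the geolocation service's JSON shape;
# only the country_code field is used here.
sampleResponse = '{"ip": "50.78.253.58", "country_code": "US", "country_name": "United States"}'

def getCountryFromResponse(responseText):
    # dict.get returns None when the key is missing, so a malformed
    # or empty response does not raise a KeyError
    responseJson = json.loads(responseText)
    return responseJson.get("country_code")

print(getCountryFromResponse(sampleResponse))  # US
```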
2. Edit history pages of Wikipedia entries
Starting from a basic Wikipedia crawler, the program looks up each entry's edit history page, extracts the IP addresses of anonymous editors from that history, and then queries the country code of each IP address.
# -*- coding: utf-8 -*-
import re
import datetime
import random
import json
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

random.seed(datetime.datetime.now())

def getLinks(articleUrl):
    html = urlopen("http://en.wikipedia.org" + articleUrl)
    bsObj = BeautifulSoup(html, "lxml")
    return bsObj.find("div", {"id": "bodyContent"}).findAll("a",
        href=re.compile("^(/wiki/)((?!:).)*$"))

def getHistoryIPs(pageUrl):
    # The edit history page URL format is:
    # http://en.wikipedia.org/w/index.php?title=Title_in_URL&action=history
    pageUrl = pageUrl.replace("/wiki/", "")
    historyUrl = "http://en.wikipedia.org/w/index.php?title=" + pageUrl + "&action=history"
    print("history url is: " + historyUrl)
    html = urlopen(historyUrl)
    bsObj = BeautifulSoup(html, "lxml")
    # Find the links whose class attribute is "mw-anonuserlink";
    # they show an IP address instead of a username
    ipAddresses = bsObj.findAll("a", {"class": "mw-anonuserlink"})
    addressList = set()
    for ipAddress in ipAddresses:
        addressList.add(ipAddress.get_text())
    return addressList

def getCountry(ipAddress):
    try:
        response = urlopen("http://freegeoip.net/json/" + ipAddress).read().decode('utf-8')
    except HTTPError:
        return None
    responseJson = json.loads(response)
    return responseJson.get("country_code")

links = getLinks("/wiki/Python_(programming_language)")

while(len(links) > 0):
    for link in links:
        print("-------------------")
        historyIPs = getHistoryIPs(link.attrs["href"])
        for historyIP in historyIPs:
            country = getCountry(historyIP)
            if country is not None:
                print(historyIP + " is from " + country)
    newLink = links[random.randint(0, len(links)-1)].attrs["href"]
    links = getLinks(newLink)
The program first gets the edit histories of all entries linked from the starting entry (in this example, the Python (programming language) entry). It then randomly selects one of those links as a new starting point, gets the edit histories of all entries linked from that page, and looks up the country and region of each editor's IP address. This process repeats until it reaches a page with no links to other wiki entries.
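The link filter inside getLinks relies on the regular expression ^(/wiki/)((?!:).)*$, which keeps plain article links and rejects namespaced pages (Talk:, File:, and so on). A small standalone check (the sample hrefs below are made up for illustration):

```python
import re

# Same pattern as in getLinks: must start with /wiki/ and
# contain no colon anywhere after that prefix.
wikiLink = re.compile("^(/wiki/)((?!:).)*$")

print(bool(wikiLink.match("/wiki/Python_(programming_language)")))  # True
print(bool(wikiLink.match("/wiki/Talk:Python")))                    # False: colon in namespace
print(bool(wikiLink.match("/w/index.php?title=Python")))            # False: wrong prefix
```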
The function getHistoryIPs searches for all links with the class mw-anonuserlink (these show the anonymous editor's IP address instead of a username) and returns a set of the address strings.
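The same class-based extraction can be sketched without a live request using only the standard library's html.parser. The HTML fragment below is a made-up stand-in for Wikipedia's history page markup; only the mw-anonuserlink class name follows the real convention described above:

```python
from html.parser import HTMLParser

class AnonUserLinkParser(HTMLParser):
    """Collects the text of <a> tags whose class is mw-anonuserlink."""
    def __init__(self):
        super().__init__()
        self.inLink = False
        self.addressList = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "a" and ("class", "mw-anonuserlink") in attrs:
            self.inLink = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.inLink = False

    def handle_data(self, data):
        if self.inLink:
            self.addressList.add(data)

# Invented fragment imitating an edit-history page.
html = ('<a class="mw-anonuserlink">192.0.2.1</a>'
        '<a class="mw-userlink">SomeUser</a>'
        '<a class="mw-anonuserlink">198.51.100.7</a>')

parser = AnonUserLinkParser()
parser.feed(html)
print(sorted(parser.addressList))  # ['192.0.2.1', '198.51.100.7']
```

Using a set, as getHistoryIPs does, deduplicates editors who appear more than once in the history.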
The IP addresses collected from the edit history are then passed to the getCountry function from the previous section to look up each address's country and region.
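Combining the two pieces can be sketched offline by stubbing the lookup with a fixed table; the addresses and mapping below are invented for illustration:

```python
# Hypothetical table standing in for the live geolocation lookup.
countryTable = {"192.0.2.1": "US", "198.51.100.7": None}

def getCountryStub(ipAddress):
    # Returns None when the lookup fails, like the real getCountry
    return countryTable.get(ipAddress)

historyIPs = {"192.0.2.1", "198.51.100.7"}
for historyIP in sorted(historyIPs):
    country = getCountryStub(historyIP)
    if country is not None:
        print(historyIP + " is from " + country)
# 192.0.2.1 is from US
```

As in the full crawler, addresses whose lookup fails are silently skipped rather than printed.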
The following are some of the output results:
-------------------
history url is: http://en.wikipedia.org/w/index.php?title=Programming_paradigm&action=history
168.216.130.133 is from US
223.104.186.241 is from CN
31.203.136.191 is from KW
192.117.105.47 is from IL
193.80.242.220 is from AT
223.230.96.108 is from IN
39.36.182.41 is from PK
68.151.180.83 is from CA
218.17.157.55 is from CN
110.55.67.15 is from PH
42.111.56.168 is from IN
92.115.222.143 is from MD
197.255.127.246 is from GH
2605:6000:ec0f:c800:edfd:179f:b648:b4b9 is from US
2a02:c7d:a492:f200:e126:2b36:53ca:513a is from GB
-------------------
history url is: http://en.wikipedia.org/w/index.php?title=Object-oriented_programming&action=history
103.74.23.139 is from PK
217.225.8.24 is from DE
223.230.215.145 is from IN
162.204.116.16 is from US
170.142.177.246 is from US
205.251.185.250 is from US
117.239.185.50 is from IN
119.152.87.84 is from PK
93.136.125.208 is from HR
113.199.249.237 is from NP
112.200.199.62 is from PH
103.241.244.36 is from IN
27.251.109.234 is from IN
103.16.68.215 is from IN
121.58.212.157 is from PH
2605:a601:474:600:2088:fbde:7512:53b2 is from US
-------------------
"Python Network data Acquisition" Reading notes (v)