"Python Network data Acquisition" Reading notes (v)


1. Parsing JSON data

Python's json module converts JSON objects into dictionaries, JSON arrays into lists, and JSON strings into Python strings.

The following example demonstrates the use of Python's JSON parsing library to handle the different types of data that may occur in a JSON string:

>>> import json
>>> jsonString = '{"arrayOfNums":[{"number":0},{"number":1},{"number":2}],"arrayOfFruits":[{"fruit":"apple"},{"fruit":"banana"},{"fruit":"pear"}]}'
>>> jsonObj = json.loads(jsonString)
>>> print(jsonObj.get("arrayOfNums"))
[{'number': 0}, {'number': 1}, {'number': 2}]
>>> print(jsonObj.get("arrayOfNums")[1])
{'number': 1}
>>> print(jsonObj.get("arrayOfNums")[1].get("number") + jsonObj.get("arrayOfNums")[2].get("number"))
3
>>> print(jsonObj.get("arrayOfFruits")[2].get("fruit"))
pear

The first output is a list object made up of dictionaries, the second is a dictionary object, the third is an integer (the sum of two integers taken from the first list of dictionaries), and the fourth is a string.
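As a quick sanity check (not part of the book's example), Python's built-in type() function confirms the mapping between JSON and Python types described above:

>>> import json
>>> jsonObj = json.loads('{"nums": [0, 1, 2], "name": "pear", "ripe": true}')
>>> print(type(jsonObj))               # JSON object -> dict
<class 'dict'>
>>> print(type(jsonObj.get("nums")))   # JSON array -> list
<class 'list'>
>>> print(type(jsonObj.get("name")))   # JSON string -> str
<class 'str'>
>>> print(type(jsonObj.get("ripe")))   # JSON true/false -> bool
<class 'bool'>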


Using Python's JSON parsing functions to decode the response from the freegeoip.net API, you can print the country code for the IP address 50.78.253.58:

# -*- coding: utf-8 -*-
import json
from urllib.request import urlopen

def getCountry(ipAddress):
    response = urlopen("http://freegeoip.net/json/" + ipAddress).read().decode('utf-8')
    responseJson = json.loads(response)
    return responseJson.get("country_code")

print(getCountry("50.78.253.58"))
# Output: US
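The freegeoip.net service can be slow, rate-limited, or unavailable. The sketch below (not from the book) adds basic error handling and a timeout so that a failed lookup returns None instead of raising an exception; this is the same approach the full script in the next section takes with HTTPError:

# -*- coding: utf-8 -*-
# Sketch: getCountry with defensive error handling.
# Assumes the same freegeoip.net endpoint used above; returns None on failure.
import json
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def getCountry(ipAddress):
    try:
        response = urlopen("http://freegeoip.net/json/" + ipAddress,
                           timeout=10).read().decode('utf-8')
    except (HTTPError, URLError):
        return None
    responseJson = json.loads(response)
    return responseJson.get("country_code")

print(getCountry("50.78.253.58"))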


2. Edit history pages of Wikipedia entries

Building on the basic Wikipedia crawling program, find each entry's edit history page, extract the IP addresses of anonymous editors from that history, and then look up the country code of each IP address.

# -*- coding: utf-8 -*-
import re
import datetime
import random
import json
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

random.seed(datetime.datetime.now())

def getLinks(articleUrl):
    html = urlopen("http://en.wikipedia.org" + articleUrl)
    bsObj = BeautifulSoup(html, "lxml")
    return bsObj.find("div", {"id": "bodyContent"}).findAll("a",
                      href=re.compile("^(/wiki/)((?!:).)*$"))

def getHistoryIPs(pageUrl):
    # The edit history page URL format is:
    # http://en.wikipedia.org/w/index.php?title=Title_in_URL&action=history
    pageUrl = pageUrl.replace("/wiki/", "")
    historyUrl = "http://en.wikipedia.org/w/index.php?title=" + pageUrl + "&action=history"
    print("history url is: " + historyUrl)
    html = urlopen(historyUrl)
    bsObj = BeautifulSoup(html, "lxml")
    # Find links whose class attribute is "mw-anonuserlink";
    # these show an IP address instead of a username
    ipAddresses = bsObj.findAll("a", {"class": "mw-anonuserlink"})
    addressList = set()
    for ipAddress in ipAddresses:
        addressList.add(ipAddress.get_text())
    return addressList

def getCountry(ipAddress):
    try:
        response = urlopen("http://freegeoip.net/json/" + ipAddress).read().decode('utf-8')
    except HTTPError:
        return None
    responseJson = json.loads(response)
    return responseJson.get("country_code")

links = getLinks("/wiki/Python_(programming_language)")

while(len(links) > 0):
    for link in links:
        print("-------------------")
        historyIPs = getHistoryIPs(link.attrs["href"])
        for historyIP in historyIPs:
            # print(historyIP)
            country = getCountry(historyIP)
            if country is not None:
                print(historyIP + " is from " + country)

    newLink = links[random.randint(0, len(links)-1)].attrs["href"]
    links = getLinks(newLink)

The program first gets the edit history of every entry linked from the starting entry (in this example, the Python (programming language) entry) and looks up the country of each editor's IP address. It then randomly picks one of those linked entries as a new starting point and repeats the process, continuing until it reaches a page with no linked wiki entries.

The function getHistoryIPs searches for all links with the mw-anonuserlink class (anonymous editors are identified by IP address rather than username) and returns the set of addresses it finds.

Once the IP addresses from the edit history have been collected, they are passed to the getCountry function from the previous section to look up the country or region each address belongs to.
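Because the same anonymous editor often appears on many pages, identical IP addresses can be looked up repeatedly. A simple optimization (not part of the original code) is to cache the results of getCountry in a dictionary so each address is only sent to the geolocation API once:

# Sketch: wrap getCountry with a simple in-memory cache so repeated
# IP addresses are only looked up once (getCountry as defined above).
countryCache = {}

def getCountryCached(ipAddress):
    if ipAddress not in countryCache:
        countryCache[ipAddress] = getCountry(ipAddress)
    return countryCache[ipAddress]

# Hypothetical usage inside the main loop, replacing the direct call:
# country = getCountryCached(historyIP)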

The following are some of the output results:

-------------------
history url is: http://en.wikipedia.org/w/index.php?title=Programming_paradigm&action=history
168.216.130.133 is from US
223.104.186.241 is from CN
31.203.136.191 is from KW
192.117.105.47 is from IL
193.80.242.220 is from AT
223.230.96.108 is from IN
39.36.182.41 is from PK
68.151.180.83 is from CA
218.17.157.55 is from CN
110.55.67.15 is from PH
42.111.56.168 is from IN
92.115.222.143 is from MD
197.255.127.246 is from GH
2605:6000:ec0f:c800:edfd:179f:b648:b4b9 is from US
2a02:c7d:a492:f200:e126:2b36:53ca:513a is from GB
-------------------
history url is: http://en.wikipedia.org/w/index.php?title=Object-oriented_programming&action=history
103.74.23.139 is from PK
217.225.8.24 is from DE
223.230.215.145 is from IN
162.204.116.16 is from US
170.142.177.246 is from US
205.251.185.250 is from US
117.239.185.50 is from IN
119.152.87.84 is from PK
93.136.125.208 is from HR
113.199.249.237 is from NP
112.200.199.62 is from PH
103.241.244.36 is from IN
27.251.109.234 is from IN
103.16.68.215 is from IN
121.58.212.157 is from PH
2605:a601:474:600:2088:fbde:7512:53b2 is from US
-------------------


"Python Network data Acquisition" Reading notes (v)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.