How to capture Sina's IP address data for China using Python

Source: Internet
Author: User
This article shows how to use a Python program to capture all of Sina's IP address data for China, as a small exercise in handling IP addresses in Python network programming. In data analysis, and in website analytics in particular, visitor IP addresses need to be analyzed, mainly to determine each visitor's province, city, and administrative district. The Pure IP database does not separate these fields well, so another feasible solution was needed (paying for a commercial database was not considered). The solution is to capture Sina's IP data.

Sina's IP data interface is:

http://int.dpool.sina.com.cn/iplookup/iplookup.php?format=json&ip=123.124.2.85
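As a small sketch, the query string can be assembled with the standard library rather than by hand. The helper name build_lookup_url is an assumption made for illustration; only the endpoint and the format and ip parameters come from the interface above.

```python
from urllib.parse import urlencode

# Endpoint taken from the article; the helper name is illustrative only.
DATA_BASE = "http://int.dpool.sina.com.cn/iplookup/iplookup.php"

def build_lookup_url(ip):
    """Return the lookup URL for one IPv4 address, asking for JSON output."""
    return DATA_BASE + "?" + urlencode({"format": "json", "ip": ip})

url = build_lookup_url("123.124.2.85")
print(url)
```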

The returned data is:

{"ret":1,"start":"123.123.221.0","end":"123.124.158.29","country":"\u4e2d\u56fd","province":"\u5317\u4eac","city":"\u5317\u4eac","district":"","isp":"\u8054\u901a","type":"","desc":""}

The returned content includes the province, city, and administrative district, which is exactly what we want.
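The \u sequences in the response are ordinary JSON escapes for Chinese text. A minimal sketch of decoding them with the standard json module, using the sample response shown above as a literal string (no network access is involved):

```python
import json

# Sample response copied from the article, kept as raw text so the
# \uXXXX escapes reach the JSON parser unchanged.
raw = (
    r'{"ret":1,"start":"123.123.221.0","end":"123.124.158.29",'
    r'"country":"\u4e2d\u56fd","province":"\u5317\u4eac",'
    r'"city":"\u5317\u4eac","district":"","isp":"\u8054\u901a",'
    r'"type":"","desc":""}'
)

info = json.loads(raw)  # escapes are decoded into normal Python strings

print(info["country"], info["province"], info["city"], info["isp"])
print(info["start"], "-", info["end"])
```

After decoding, the fields are plain strings and can be stored or compared directly.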

Next, let's look at how to capture this IP data. The main task is enumeration: the IP address in the interface URL is replaced again and again. Querying every possible IP address is clearly impractical, so we narrow the scope and enumerate only the IP segments allocated to China. Because Sina's interface returns a whole IP segment for each query, every address up to the end of that segment can be skipped. In addition, the 256 addresses that differ only in the last octet are basically in one region, so only one of them needs to be queried. Together these two points remove most of the requests. The key step is to convert the first three octets of the IP address to an integer so that prefixes can be enumerated and compared.
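The skipping idea can be sketched as follows. The conversion functions mirror the ones used in the article's scripts; fake_lookup is a hypothetical stand-in for the remote interface that always returns the sample segment, so the example runs without network access.

```python
def ipv3_to_int(s):
    """Pack the first three octets of a dotted address into one integer."""
    parts = [int(p) for p in s.split(".")]
    return (parts[0] << 16) | (parts[1] << 8) | parts[2]

def int_to_ipv4(n):
    """Turn a packed three-octet prefix back into the first address of its block."""
    return "%d.%d.%d.0" % (n >> 16 & 0xFF, n >> 8 & 0xFF, n & 0xFF)

def fake_lookup(ipv4):
    """Hypothetical stand-in for the interface: one fixed segment."""
    return {"ret": 1, "start": "123.123.221.0", "end": "123.124.158.29"}

first = ipv3_to_int("123.123.221.0")
last = ipv3_to_int("123.124.158.29")

queries = 0
end = 0  # highest prefix already covered by a returned segment
for prefix in range(first, last + 1):
    if prefix > end:  # only query prefixes not covered yet
        info = fake_lookup(int_to_ipv4(prefix))
        queries += 1
        end = ipv3_to_int(info["end"])

print("prefixes:", last - first + 1, "queries:", queries)
```

Because integers compare naturally, a single `prefix > end` test is enough to decide whether a prefix still needs a request.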

For the IP address segments allocated to China, see the official APNIC website or the following file:

http://ftp.apnic.net/apnic/dbase/data/country-ipv4.lst
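The article does not show how the listing is turned into the CSV file that the enumeration program reads. The sketch below is one possible way, under the explicit assumption that each data line holds a start address, an end address, and a two-letter country code separated by " : ". The sample lines and the function name china_ranges are made up for illustration, and the layout should be checked against the real file before use.

```python
import re

# Hypothetical sample lines; verify the real file layout first.
SAMPLE = """\
# comment line
1.0.0.0 - 1.0.0.255 : 1.0.0/24 : au : Australia
1.0.1.0 - 1.0.1.255 : 1.0.1/24 : cn : China
1.0.2.0 - 1.0.3.255 : 1.0.2/23 : cn : China
"""

IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

def china_ranges(lines, code="cn"):
    """Yield (start, end) pairs for lines whose country field equals code."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip comments
        fields = [f.strip().lower() for f in line.split(":")]
        ips = IPV4.findall(fields[0])
        if code in fields[1:] and len(ips) == 2:
            yield ips[0], ips[1]

rows = list(china_ranges(SAMPLE.splitlines()))
for start, end in rows:
    print("%s,%s" % (start, end))  # one "start,end" row per CSV line
```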

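The enumeration program reads the start and end address of each segment from ChinaIPAddress.csv and writes every three-octet prefix, as an integer, to ChinaIPAddress.txt: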


import re

def ipv3_to_int(s):
    # Pack the first three octets ("a.b.c") into one integer.
    l = [int(i) for i in s.split('.')]
    return (l[0] << 16) | (l[1] << 8) | l[2]

def int_to_ipv3(s):
    # Inverse of ipv3_to_int.
    ip1 = s >> 16 & 0xFF
    ip2 = s >> 8 & 0xFF
    ip3 = s & 0xFF
    return "%d.%d.%d" % (ip1, ip2, ip3)

# Matches an IPv4 address and captures its first three octets.
pattern = re.compile(r'(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}')

i = open('ChinaIPAddress.csv', 'r')
o = open('ChinaIPAddress.txt', 'a')
for iplist in i:
    ips = pattern.findall(iplist)
    if len(ips) < 2:
        continue  # skip lines without a start and an end address
    x = ips[0]
    y = ips[1]
    # +1 so that the last prefix of the segment is included
    for ip in range(ipv3_to_int(x), ipv3_to_int(y) + 1):
        ipadress = str(ip)
        #ip_address = int_to_ipv3(ip)
        o.write(ipadress)
        o.write('\n')
o.close()
i.close()

Once this step is complete, the Sina IP interface can be crawled. The capture code is as follows:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import json, sqlite3, time
import urllib.error, urllib.parse, urllib.request

def ipv3_to_int(s):
    # Pack the first three octets into one integer (a fourth octet is ignored).
    l = [int(i) for i in s.split('.')]
    return (l[0] << 16) | (l[1] << 8) | l[2]

def int_to_ipv4(s):
    # Expand a packed prefix to the first address of its 256-address block.
    ip1 = s >> 16 & 0xFF
    ip2 = s >> 8 & 0xFF
    ip3 = s & 0xFF
    return "%d.%d.%d.0" % (ip1, ip2, ip3)

def fetch(ipv4, **kwargs):
    kwargs.update({
        'ip': ipv4,
        'format': 'json',
    })
    DATA_BASE = "http://int.dpool.sina.com.cn/iplookup/iplookup.php"
    url = DATA_BASE + '?' + urllib.parse.urlencode(kwargs)
    print(url)
    fails = 0
    while True:
        try:
            return json.load(urllib.request.urlopen(url, timeout=20))
        except (urllib.error.URLError, IOError):
            fails += 1
            if fails >= 10:
                # After 10 consecutive failures, wait 10 minutes before retrying.
                sleep_download_time = 60 * 10
                time.sleep(sleep_download_time)
                fails = 0

def dbcreate():
    c = conn.cursor()
    c.execute('''create table if not exists ipdata(
        ip integer primary key,
        ret integer,
        start text,
        end text,
        country text,
        province text,
        city text,
        district text,
        isp text,
        type text,
        desc text
    )''')
    conn.commit()
    c.close()

def dbinsert(ip, address):
    c = conn.cursor()
    c.execute('insert into ipdata values(?,?,?,?,?,?,?,?,?,?,?)',
              (ip, address['ret'], address['start'], address['end'],
               address['country'], address['province'], address['city'],
               address['district'], address['isp'], address['type'],
               address['desc']))
    conn.commit()
    c.close()

conn = sqlite3.connect('ipaddress.sqlite3.db')
dbcreate()

i = open('ChinaIPAddress.txt', 'r')
ip_list = [s.strip() for s in i.readlines()]
end = 0  # highest prefix already covered by a returned segment
for ip in ip_list:
    ip = int(ip)
    if ip > end:
        ipaddress = int_to_ipv4(ip)
        info = fetch(ipaddress)
        if info['ret'] == -1:
            pass  # no data for this address
        else:
            dbinsert(ip, info)
            end = ipv3_to_int(info['end'])
            print(ip, end)
i.close()

At this point, all of Sina's IP address data for China has been captured and can be used in the data analysis project.
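As an illustration of how the stored data might then be used, the sketch below looks up a visitor's province and city. It assumes a reduced version of the ipdata table (only a few of its columns) held in an in-memory database and filled with the sample segment from earlier. The function names lookup and ipv4_to_int are assumptions for this example and are not part of the capture script.

```python
import sqlite3

def ipv3_to_int(s):
    """Pack the first three octets into one integer, as in the capture script."""
    parts = [int(p) for p in s.split(".")]
    return (parts[0] << 16) | (parts[1] << 8) | parts[2]

def ipv4_to_int(s):
    """Full 32-bit value, used to check the visitor against the segment end."""
    a, b, c, d = (int(p) for p in s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

conn = sqlite3.connect(":memory:")  # in-memory copy for the demonstration
conn.execute('create table ipdata(ip integer primary key, start text, '
             '"end" text, province text, city text)')
conn.execute("insert into ipdata values(?,?,?,?,?)",
             (ipv3_to_int("123.123.221.0"), "123.123.221.0",
              "123.124.158.29", "\u5317\u4eac", "\u5317\u4eac"))

def lookup(conn, visitor_ip):
    """Return (province, city), or None if no stored segment covers the IP."""
    row = conn.execute(
        'select "end", province, city from ipdata '
        'where ip <= ? order by ip desc limit 1',
        (ipv3_to_int(visitor_ip),),
    ).fetchone()
    if row is None or ipv4_to_int(visitor_ip) > ipv4_to_int(row[0]):
        return None
    return row[1], row[2]

print(lookup(conn, "123.124.2.85"))
print(lookup(conn, "123.125.0.1"))
```

The query picks the nearest stored segment that starts at or before the visitor's prefix, and the comparison against the segment end rejects addresses that fall into a gap between segments.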
