A small Python network programming example: using Python to get a site's domain name information

Last Update:2015-05-13 Source: Internet

Author: User

About Whois

Whois (read as "who is"; it is not an abbreviation) is a transfer protocol used to query information about a domain name or IP address, such as its owner. Put simply, whois is a database for checking whether a domain name has been registered and for looking up the details of a registered domain name (such as the domain owner and the domain name registrar). Domain name information is queried through whois. Early whois queries were mostly made with command-line tools, but there are now online query tools with simplified web interfaces that can query several databases at once. The web-based tools still rely on the whois protocol to send query requests to the server, and the command-line tools are still widely used by system administrators. Whois typically uses TCP port 43. The whois information for each domain name or IP address is kept by the corresponding managing organization.

Domain names with different suffixes are looked up in different whois databases; for example, the whois database for .com is different from the one for .edu. Websites that currently offer a WHOIS query service in China include Wanwang and Webmaster's Home (chinaz.com). The WHOIS information for each domain name or IP address is kept by the corresponding managing organization: for example, WHOIS information for domain names ending in .com is managed by VeriSign, the operator of the .com domain, and the Chinese country-code top-level domain .cn is managed by CNNIC [1].
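To illustrate the point that each suffix has its own whois database, here is a minimal sketch that picks a WHOIS server from a domain's top-level domain. The table and the function name whois_server_for are hypothetical, and the server hostnames are assumptions that should be checked against each registry's current documentation.

```python
# Illustrative mapping from top-level domain to WHOIS server.
# The hostnames are assumptions; verify them before relying on them.
WHOIS_SERVERS = {
    "com": "whois.verisign-grs.com",  # .com is operated by VeriSign
    "cn": "whois.cnnic.cn",           # .cn is managed by CNNIC
    "edu": "whois.educause.edu",
}


def whois_server_for(domain):
    """Return the WHOIS server for the domain's suffix, or None if unknown."""
    # Drop a trailing dot, then take the text after the last dot as the suffix.
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return WHOIS_SERVERS.get(tld)
```

A suffix that is missing from the table returns None, so the caller can decide whether to fall back to another lookup method.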

The WHOIS service is an online request/response service. The WHOIS server runs in the background and listens on port 43. When an Internet user looks up a domain name (or other information such as hosts or contacts), a TCP connection is established between the client and the WHOIS server; the server then receives the user's query and searches the back-end domain name database. If the database contains a matching record, the server returns the relevant information, such as administrative and technical contact information, to the client. Once the server has finished sending its output, the connection is closed and the query is complete.
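The request/response exchange described above can be sketched with Python's standard socket module. This is a minimal Python 3 illustration, not a full client: the names build_query and whois_query are hypothetical, the 10-second timeout is an arbitrary choice, and the server name in the commented call is an assumption.

```python
import socket

WHOIS_PORT = 43  # standard WHOIS port


def build_query(domain):
    """Return the request bytes: the query text followed by CRLF."""
    return domain.strip().encode("ascii") + b"\r\n"


def whois_query(domain, server, port=WHOIS_PORT, timeout=10):
    """Send one WHOIS query and return the server's full text response."""
    chunks = []
    # Open the TCP connection and send the query as a single line.
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_query(domain))
        # Read until recv() returns empty bytes, which means the server
        # has finished its output and the connection is closing.
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example call (needs network access; the server name is an assumption):
# print(whois_query("baidu.com", "whois.verisign-grs.com"))
```

The call is left commented out so that importing or pasting the sketch does not trigger a network request.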

Some registrars hide the whois information for international domain names; it can then be obtained only by contacting the registrar concerned. This protection prevents the contact details in whois from being used maliciously and the customer's private information from being exposed.

Getting site domain information with Python

The following example uses the whois service of Webmaster's Home (http://whois.chinaz.com/) together with a simple web crawler to obtain whois information.

#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
This code is to get the WHOIS info from http://whois.chinaz.com/
Created on 2015-05-13
Version: 0.1
@author: zhang
'''
from urllib2 import Request, urlopen, URLError
from bs4 import BeautifulSoup
import chardet
import socket

# Default socket timeout in seconds. The value was lost from the original
# listing; 10 is an assumed value.
socket.setdefaulttimeout(10)


class Get_whois:
    def __init__(self, url):
        self.url = url

    def getdomaininfo(self):
        # Build the query URL for the domain and request the page
        domain = 'http://whois.chinaz.com/' + self.url
        req = Request(domain)
        try:
            response = urlopen(req)
        except URLError, e:
            if isinstance(e.reason, socket.timeout):
                print 'Socket timeout'
            print 'network meets some problems:'
            print 'Reason: ', e.reason
        else:
            if response is not None:
                the_page = response.read()
                char = chardet.detect(the_page)
                # Decode to unicode using the detected encoding
                the_page = the_page.decode(char['encoding'], 'ignore')
                soup = BeautifulSoup(the_page)
                # The information returned by http://whois.chinaz.com/ is
                # contained in the <div class="div_whois"> tag, so fetch it directly
                domain = soup.find('div', id="whoisinfo")
                domain = unicode(domain)
                return domain


if __name__ == "__main__":
    domain = ['baidu.com']
    test = Get_whois(domain[0])
    info = test.getdomaininfo()
    print info

After execution, the content of info is:

<div class="div_whois" id="whoisinfo">
Registrar: MarkMonitor Inc.<br/>
Domain name server: whois.markmonitor.com<br/>
DNS server: dns.baidu.com<br/>
DNS server: ns2.baidu.com<br/>
DNS server: ns3.baidu.com<br/>
DNS server: ns4.baidu.com<br/>
DNS server: ns7.baidu.com<br/>
Domain name status: the operator has set client delete protection http://www.icann.org/epp#the operator has set client delete protection<br/>
Domain name status: the operator has set client transfer protection http://www.icann.org/epp#the operator has set client transfer protection<br/>
Domain name status: the operator has set client modify protection http://www.icann.org/epp#the operator has set client modify protection<br/>
Domain name status: delete protection on the domain name server http://www.icann.org/epp#delete protection on the domain name server<br/>
Domain name status: transfer protection on the domain name server http://www.icann.org/epp#transfer protection on the domain name server<br/>
Domain name status: modify protection on the domain name server http://www.icann.org/epp#modify protection on the domain name server<br/>
Updated: October 14, 2013<br/>
Created: October 11, 1999<br/>
Expiry date: October 11, 2015<br/>
Contact: Zhiyong Duan<br/><br/>
</div>

Because the service is free, the information obtained is not comprehensive, and much of it is withheld by the service provider.

Once the information has been obtained, it can be parsed further, for example by using Python's re module to extract the dates.
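As a rough sketch of that further parsing, the Python 3 snippet below pulls the dates out of the returned text with the re module. It assumes the English date format shown in the output above; the sample string, the field labels, and the function name extract_dates are illustrative, and the live service may label or format its fields differently.

```python
import re
from datetime import datetime

# Sample text modeled on the output shown earlier (labels are assumptions).
SAMPLE = (
    '<div class="div_whois" id="whoisinfo">'
    'Updated: October 14, 2013<br/>'
    'Created: October 11, 1999<br/>'
    'Expiry date: October 11, 2015<br/>'
    '</div>'
)

# A label, a colon, then a date such as "October 11, 1999".
DATE_RE = re.compile(
    r'(Updated|Created|Expiry date):\s*([A-Z][a-z]+ \d{1,2}, \d{4})'
)


def extract_dates(html):
    """Return a dict mapping each date label to a datetime.date."""
    dates = {}
    for label, text in DATE_RE.findall(html):
        dates[label] = datetime.strptime(text, '%B %d, %Y').date()
    return dates


dates = extract_dates(SAMPLE)
```

Converting the matches to date objects makes follow-up checks simple, such as comparing the expiry date with today's date.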

The example above uses two Python toolkits. The first, BeautifulSoup, is an HTML/XML parser written in Python that copes well with nonstandard markup and builds a parse tree. It provides simple, commonly used operations for navigating, searching, and modifying the parse tree, which can save a great deal of programming time.

The other is chardet, which detects the encoding of a string or file.

Before running the code, make sure both toolkits are installed where the program can find them. The URLs for their source packages are listed in the reference documents.

Reference documents:

http://baike.baidu.com/link?url=2eMXzLsFeJFyKxD6117bmL1L6Ka-bFP-SAxK0zF30VagwYD8StCLPnoEf9rDdjj_8metwL1QpcF2XImNDoUvLK

http://whois.chinaz.com/

http://www.crummy.com/software/BeautifulSoup/bs4/doc/

https://pypi.python.org/pypi/chardet

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.