The previous article mainly introduced querying WHOIS in Python over a raw socket, covering the relevant techniques based on the socket module. It has some reference value; readers who need it can see http://www.cnblogs.com/Zhangmengqin/p/9144022.html
Knock on the blackboard!!!
Each domain suffix corresponds to a different WHOIS server!
So how do I find out which WHOIS server corresponds to the domain suffix I want to query??
Don't worry, baby: https://www.iana.org/domains/root/db lists the WHOIS servers for every suffix in the world; just find one and click in.
For example, I clicked into one of the suffixes: its detail page shows exactly which WHOIS server that suffix corresponds to.
But there are thousands of domain suffixes, and manually collecting every suffix along with its corresponding WHOIS server would wear my hands out...
Then I immediately thought of a Python crawler: it can conveniently build a dictionary locally, and matching a key against that dictionary automatically yields the corresponding WHOIS server to query, as the sketch below illustrates.
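For illustration, this is the shape of the lookup table the crawler is meant to produce. The snippet is hand-written, not crawled; the two entries shown are real-world values for .com and .cn, and the full table would come from the crawl below.

# Hypothetical shape of the target lookup table: suffix -> WHOIS server.
whois_servers = {
    '.com': 'whois.verisign-grs.com',  # registry WHOIS for .com
    '.cn': 'whois.cnnic.cn',           # registry WHOIS for .cn
}
print(whois_servers.get('.com'))  # whois.verisign-grs.com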
1. First, get every domain suffix in the world from https://www.iana.org/domains/root/db
import requests
from bs4 import BeautifulSoup

iurl = 'https://www.iana.org/domains/root/db'
res = requests.get(iurl, timeout=600)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')
list1 = []
list2 = []
jsonstr = {}
# each TLD on the page sits in a <span class="domain tld">
for tag in soup.find_all('span', class_='domain tld'):
    d_suffix = tag.get_text()
    print(d_suffix)
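Run as-is, this prints every suffix one per line; at the time of writing, the IANA root database starts with entries like .aaa and .aarp and runs to well over a thousand TLDs.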
The parsing here uses BeautifulSoup. Its advantage is that you don't have to write your own regular expressions: you just write to its syntax. After many rounds of testing, the data finally parsed out cleanly.
2. For each domain suffix obtained, fetch its WHOIS server and save the results to a local file
import requests
from bs4 import BeautifulSoup
import re
import time

iurl = 'https://www.iana.org/domains/root/db'
res = requests.get(iurl, timeout=600)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')
list1 = []
list2 = []
jsonstr = {}
for tag in soup.find_all('span', class_='domain tld'):
    d_suffix = tag.get_text()
    print(d_suffix)
    list2.append(d_suffix)
    n_suffix = d_suffix.split('.')[1]  # strip the leading dot
    new_url = iurl + '/' + n_suffix    # per-TLD detail page
    server = ''
    try:
        res2 = requests.get(new_url, timeout=600)
        res2.encoding = 'utf-8'
        soup2 = BeautifulSoup(res2.text, 'html.parser')
        # the detail page lists the server as "<b>WHOIS Server:</b> ..."
        retxt = re.compile(r'<b>WHOIS Server:</b> (.*?)\n')
        arr = retxt.findall(res2.text)
        if len(arr) > 0:
            server = arr[0]
            list2.append(server)
            print(server)
            time.sleep(1)
    except Exception as e:
        print('timed out')

    with open('suffixList.txt', 'a', encoding='utf-8') as my_file:
        my_file.write(n_suffix + ':' + server + '\n')

print('crawl end!!!')
Getting the code above working really cost me a lot of time: writing and debugging at the same time, with Baidu searches on the side. But fortunately it came out in the end. Once the data was sorted out, I saved it to the TXT file.
Because the fetched HTML pages carry styling, I assumed I would have to parse them layer by layer; I didn't expect the regular expression to be this simple! It seems this beginner's foundations are solid after all~
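Before step 3 below, suffixList.txt has to be loaded back into a dictionary. The article doesn't show that step, so here is a minimal sketch, assuming each line has the n_suffix:server format written above; a leading dot is added so the keys match the lookup in step 3.

# Rebuild the suffix -> WHOIS server dictionary from suffixList.txt.
# Assumes lines like "com:whois.verisign-grs.com"; suffixes crawled
# without a server have an empty field and are stored as None.
dictionary = {}
with open('suffixList.txt', encoding='utf-8') as f:
    for line in f:
        suffix, _, server = line.strip().partition(':')
        if suffix:
            dictionary['.' + suffix] = server or None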
3. Enter any domain name and automatically query its WHOIS
import socket

temp = input('Please enter the domain name you want to query:')
result = temp.split('.')[0]
result1 = temp.split('.')[1]
r_suf = '.' + result1
print(type(r_suf))
# print(result)
print(r_suf)

# d = json.dumps(dictionary)
# dictionary is the suffix -> WHOIS server mapping built in step 2
whois_server = dictionary.get(r_suf)
print(whois_server)
print(type(whois_server))

if whois_server is None:
    print(r_suf + ' has no WHOIS server on file; it is deserted out there.')
else:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((whois_server, 43))
    temp = (temp + '\r\n').encode()
    s.send(temp)
    response = b''
    while True:
        data = s.recv(4096)
        response += data
        if not data:
            break
    s.close()
    print(response.decode())
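The same query logic can also be wrapped in a reusable function. This is my own sketch rather than code from the article; it relies only on the standard socket module and on the fact that WHOIS (RFC 3912) is plain text over TCP port 43.

import socket

def whois_query(domain, server, port=43, timeout=10):
    # Send the query followed by CRLF, then read until the server
    # closes the connection, as the WHOIS protocol expects.
    with socket.create_connection((server, port), timeout=timeout) as s:
        s.sendall((domain + '\r\n').encode())
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b''.join(chunks).decode(errors='replace')

# Example, using the .com entry from the dictionary:
# print(whois_query('example.com', 'whois.verisign-grs.com'))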
As the old verse goes, what comes from paper always feels shallow; to truly know a thing, you must do it yourself. Watching and doing really are two different things. A bit of Baidu here, a bit of Google there, the problems got solved and the program came out. The most important thing is to get hands-on!
You can also query directly at https://whois.22.cn/.