Python crawler: downloading a WHOIS server dictionary and automating WHOIS queries

Source: Internet
Author: User
Tags: tld, domain

The previous article mainly covered querying WHOIS from Python over a raw socket, with some useful techniques for the socket module; if you need it, see http://www.cnblogs.com/Zhangmengqin/p/9144022.html

Knock on the blackboard, this is the key point!!!

Each domain suffix corresponds to a different WHOIS server!

So how do you find out which WHOIS server corresponds to the suffix you want to query?

Don't worry, baby: https://www.iana.org/domains/root/db lists every suffix in the world along with its WHOIS server; just pick one and click into it.

For example, click into any suffix: its delegation page shows exactly which WHOIS server that suffix corresponds to.

There are thousands of domain suffixes, and collecting every suffix and its WHOIS server by hand would be a hopeless waste of effort...

But then I immediately thought of a Python crawler: scrape the data once and build a local dictionary, so that a simple key lookup automatically returns the right WHOIS server to query. The target data structure looks like the sketch below.
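A minimal sketch of the mapping we want to end up with; the two sample entries are well-known public WHOIS servers, and any suffix from the IANA list works the same way:

# target data structure: suffix -> WHOIS server (sample entries)
whois_servers = {
    '.com': 'whois.verisign-grs.com',
    '.cn': 'whois.cnnic.cn',
}
print(whois_servers.get('.com'))  # -> whois.verisign-grs.com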

1. First, grab every domain suffix in the world from https://www.iana.org/domains/root/db

import requests
from bs4 import BeautifulSoup

iurl = 'https://www.iana.org/domains/root/db'
res = requests.get(iurl, timeout=600)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

# every suffix on the page sits in a <span class="domain tld"> element
for tag in soup.find_all('span', class_='domain tld'):
    d_suffix = tag.get_text()  # e.g. ".aaa"
    print(d_suffix)

This uses BeautifulSoup; the advantage is that you don't have to write your own regular expressions, you just follow its syntax. After many test runs, the parsing of the data finally worked.

2. For each suffix obtained, fetch its WHOIS server and save the result to a local file
import requests
from bs4 import BeautifulSoup
import re
import time

iurl = 'https://www.iana.org/domains/root/db'
res = requests.get(iurl, timeout=600)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

# the WHOIS server is printed on each suffix's delegation page
retxt = re.compile(r'<b>whois server:</b>\s*(.*?)\n', re.IGNORECASE)

for tag in soup.find_all('span', class_='domain tld'):
    d_suffix = tag.get_text()          # e.g. ".aaa"
    print(d_suffix)
    n_suffix = d_suffix.split('.')[1]  # the suffix without its leading dot
    new_url = iurl + '/' + n_suffix    # that suffix's delegation page
    server = ''
    try:
        res2 = requests.get(new_url, timeout=600)
        res2.encoding = 'utf-8'
        arr = retxt.findall(res2.text)
        if len(arr) > 0:
            server = arr[0].strip()
            print(server)
        time.sleep(1)                  # be polite to iana.org
    except Exception:
        print('timed out')

    with open('SuffixList.txt', 'a', encoding='utf-8') as my_file:
        my_file.write(n_suffix + ':' + server + '\n')

print('crawl end!!!')

This code really cost me a lot of time: writing, debugging, and Baidu-ing all at the same time ~ ~ but fortunately it came out in the end. Once the data was sorted out, I saved it to the txt file.

Because what comes back is a styled HTML page, I first thought of parsing it layer by layer, and didn't expect a regular expression to make it so simple! It seems this newbie's fundamentals are pretty solid after all ~
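One step the post never shows is reading SuffixList.txt back into the dictionary that step 3 looks up. Here is a minimal sketch, assuming the suffix:server line format written by the crawler above (note the file stores suffixes without their leading dot, while the query code below looks them up with one):

# load SuffixList.txt into the `dictionary` used by the query script
dictionary = {}
with open('SuffixList.txt', encoding='utf-8') as f:
    for line in f:
        suffix, _, server = line.strip().partition(':')
        if server:  # skip suffixes that had no WHOIS server listed
            dictionary['.' + suffix] = server  # keys carry a leading dot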

3. Enter any domain name and have its WHOIS queried automatically

import socket

# `dictionary` is the suffix -> WHOIS server mapping loaded above
temp = input('Please enter the domain name you want to query:')
r_suf = '.' + temp.split('.')[1]  # the suffix, with its leading dot
print(r_suf)

whois_server = dictionary.get(r_suf)
print(whois_server)

if whois_server is None:
    print(r_suf + ': no WHOIS server found for this suffix')
else:
    # WHOIS protocol: send the domain plus CRLF to port 43 of the
    # suffix's WHOIS server, then read until the connection closes
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((whois_server, 43))
    s.send((temp + '\r\n').encode())
    response = b''
    while True:
        data = s.recv(4096)
        if not data:
            break
        response += data
    s.close()
    print(response.decode())

As the old poem says, what you learn on paper is always shallow; to truly understand something you must do it yourself. Seeing and doing really are two different things. A bit of Baidu here, a bit of Google there, the problem got solved and the program came out, but the most important thing is hands-on practice!

You can also query directly at https://whois.22.cn/.
