The previous article mainly introduced querying WHOIS in Python over a raw socket, covering the relevant techniques based on the socket module. It has some reference value; readers who need it can see http://www.cnblogs.com/Zhangmengqin/p/9144022.html
Knock on the blackboard!!!
Each domain suffix corresponds to a different WHOIS server!
So how do I find out which WHOIS server corresponds to the domain suffix I want to query??
Don't worry, baby: https://www.iana.org/domains/root/db lists the WHOIS servers for every suffix in the world; just find one and click in.
For example, I clicked into one of the suffixes: its detail page shows exactly which WHOIS server that suffix corresponds to.
But there are thousands of domain suffixes, and manually collecting every suffix along with its corresponding WHOIS server would wear my hands out...
Then I immediately thought of a Python crawler: it can conveniently build a dictionary locally, and matching a key against that dictionary automatically yields the corresponding WHOIS server to query, as the sketch below illustrates.
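For illustration, this is the shape of the lookup table the crawler is meant to produce. The snippet is hand-written, not crawled; the two entries shown are real-world values for .com and .cn, and the full table would come from the crawl below.

# Hypothetical shape of the target lookup table: suffix -> WHOIS server.
whois_servers = {
    '.com': 'whois.verisign-grs.com',  # registry WHOIS for .com
    '.cn': 'whois.cnnic.cn',           # registry WHOIS for .cn
}
print(whois_servers.get('.com'))  # whois.verisign-grs.com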
1. First, get every domain suffix in the world from https://www.iana.org/domains/root/db
import requests
from bs4 import BeautifulSoup

iurl = 'https://www.iana.org/domains/root/db'
res = requests.get(iurl, timeout=600)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')
list1 = []
list2 = []
jsonstr = {}
# each TLD on the page sits in a <span class="domain tld">
for tag in soup.find_all('span', class_='domain tld'):
    d_suffix = tag.get_text()
    print(d_suffix)
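Run as-is, this prints every suffix one per line; at the time of writing, the IANA root database starts with entries like .aaa and .aarp and runs to well over a thousand TLDs.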
The parsing here uses BeautifulSoup. Its advantage is that you don't have to write your own regular expressions: you just write to its syntax. After many rounds of testing, the data finally parsed out cleanly.
2. For each domain suffix obtained, fetch its WHOIS server and save the results to a local file
import requests
from bs4 import BeautifulSoup
import re
import time

iurl = 'https://www.iana.org/domains/root/db'
res = requests.get(iurl, timeout=600)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')
list1 = []
list2 = []
jsonstr = {}
for tag in soup.find_all('span', class_='domain tld'):
    d_suffix = tag.get_text()
    print(d_suffix)
    list2.append(d_suffix)
    n_suffix = d_suffix.split('.')[1]  # strip the leading dot
    new_url = iurl + '/' + n_suffix    # per-TLD detail page
    server = ''
    try:
        res2 = requests.get(new_url, timeout=600)
        res2.encoding = 'utf-8'
        soup2 = BeautifulSoup(res2.text, 'html.parser')
        # the detail page lists the server as "<b>WHOIS Server:</b> ..."
        retxt = re.compile(r'<b>WHOIS Server:</b> (.*?)\n')
        arr = retxt.findall(res2.text)
        if len(arr) > 0:
            server = arr[0]
            list2.append(server)
            print(server)
            time.sleep(1)
    except Exception as e:
        print('timed out')

    with open('suffixList.txt', 'a', encoding='utf-8') as my_file:
        my_file.write(n_suffix + ':' + server + '\n')

print('crawl end!!!')
Getting the code above working really cost me a lot of time: writing and debugging at the same time, with Baidu searches on the side. But fortunately it came out in the end. Once the data was sorted out, I saved it to the TXT file.
Because the fetched HTML pages carry styling, I assumed I would have to parse them layer by layer; I didn't expect the regular expression to be this simple! It seems this beginner's foundations are solid after all~
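Before step 3 below, suffixList.txt has to be loaded back into a dictionary. The article doesn't show that step, so here is a minimal sketch, assuming each line has the n_suffix:server format written above; a leading dot is added so the keys match the lookup in step 3.

# Rebuild the suffix -> WHOIS server dictionary from suffixList.txt.
# Assumes lines like "com:whois.verisign-grs.com"; suffixes crawled
# without a server have an empty field and are stored as None.
dictionary = {}
with open('suffixList.txt', encoding='utf-8') as f:
    for line in f:
        suffix, _, server = line.strip().partition(':')
        if suffix:
            dictionary['.' + suffix] = server or None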
3. Enter any domain name and automatically query its WHOIS
import socket

temp = input('Please enter the domain name you want to query:')
result = temp.split('.')[0]
result1 = temp.split('.')[1]
r_suf = '.' + result1
print(type(r_suf))
# print(result)
print(r_suf)

# d = json.dumps(dictionary)
# dictionary is the suffix -> WHOIS server mapping built in step 2
whois_server = dictionary.get(r_suf)
print(whois_server)
print(type(whois_server))

if whois_server is None:
    print(r_suf + ' has no WHOIS server on file; it is deserted out there.')
else:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((whois_server, 43))
    temp = (temp + '\r\n').encode()
    s.send(temp)
    response = b''
    while True:
        data = s.recv(4096)
        response += data
        if not data:
            break
    s.close()
    print(response.decode())
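The same query logic can also be wrapped in a reusable function. This is my own sketch rather than code from the article; it relies only on the standard socket module and on the fact that WHOIS (RFC 3912) is plain text over TCP port 43.

import socket

def whois_query(domain, server, port=43, timeout=10):
    # Send the query followed by CRLF, then read until the server
    # closes the connection, as the WHOIS protocol expects.
    with socket.create_connection((server, port), timeout=timeout) as s:
        s.sendall((domain + '\r\n').encode())
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b''.join(chunks).decode(errors='replace')

# Example, using the .com entry from the dictionary:
# print(whois_query('example.com', 'whois.verisign-grs.com'))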
As the old verse goes, what comes from paper always feels shallow; to truly know a thing, you must do it yourself. Watching and doing really are two different things. A bit of Baidu here, a bit of Google there, the problems got solved and the program came out. The most important thing is to get hands-on!
You can also query directly at https://whois.22.cn/.