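I scraped some proxy IPs for a friend and, based on a connectivity test, sorted the reachable ones into a separate file. I'm sharing the source code here.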
Notes:
1. Environment: Python 3.5
2. Install BeautifulSoup4 and requests (pip install beautifulsoup4 requests)
The code is as follows:
```python
# -*- coding: utf-8 -*-
import subprocess

import requests
from bs4 import BeautifulSoup

# Listing pages to scrape; the page number is appended to each base URL.
all_url_add = {
    'url2': 'http://ip84.com/gn/',
}


def func(url):
    """Scrape one listing page and append every ip:port pair to ip.txt."""
    r = requests.get(url, timeout=10)
    content = r.text
    soup = BeautifulSoup(content, "html.parser")
    ListTable = soup.find_all("table", class_="list")
    for table in ListTable:
        ListTr = table.find_all("tr")
        for tr in ListTr:
            try:
                ListTd = tr.find_all("td")
                if len(ListTd) < 7:
                    continue  # header row (no td cells) or malformed row
                ipaddress = ListTd[0].get_text().strip()
                port = ListTd[1].get_text().strip()
                city = ListTd[2].get_text().strip().replace("\n", "")
                leixing = ListTd[3].get_text().strip()  # proxy type
                xieyi = ListTd[4].get_text().strip()    # protocol
                shudu = ListTd[5].get_text().strip()    # speed
                time1 = ListTd[6].get_text().strip()    # last checked
                with open('ip.txt', 'a') as f:
                    f.write(ipaddress + ":" + port + '\n')
                print('Address: ' + ipaddress + ' Port: ' + port +
                      ' Region: ' + city + ' Type: ' + leixing +
                      ' Protocol: ' + xieyi + ' Speed: ' + shudu +
                      ' Time: ' + time1)
            except Exception as e:
                print('------------------- Failed to parse row -------------------', e)
    # The original printed this after the return, so it never ran.
    print('Finished scraping this page, moving on to the next one')
    return 'success'


def pin():
    """Ping every address in ip.txt; copy the reachable ones to SuccessIp.txt."""
    # Read all lines once. The original called readline() a second time when
    # writing, which saved the line after the one that had been tested.
    with open('ip.txt', 'r') as f2:
        lines = [line.strip() for line in f2 if line.strip()]
    for line in lines:
        ip = line.split(':')[0]
        # Windows ping syntax: -n is the echo count, -w the timeout in
        # milliseconds (5 ms is the original value; raise it if needed).
        # An argument list keeps scraped text away from the shell.
        return1 = subprocess.call(['ping', '-n', '5', '-w', '5', ip])
        if return1:
            print('Test failed')
        else:
            print('Test passed, writing to the new file')
            with open('SuccessIp.txt', 'a') as f3:
                f3.write(line + '\n')
    print('Done. The working IPs are in SuccessIp.txt')


if __name__ == '__main__':
    for x in all_url_add:
        print(x)
        for y in range(1, 50):
            url = all_url_add[x] + str(y)
            print(url)
            status = func(url)
            if status == 'success':
                print('Page', y, 'done')
    print('**** Scraping finished, now checking the connectivity of the '
          'collected IPs; do not close the window ****')
    pin()
```
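The ping flags used in the script (-n and -w) are Windows syntax, so the connectivity check will not behave the same way elsewhere. Below is a minimal sketch of a platform-aware variant. The helper names build_ping_command and is_reachable and their parameters are my own and not part of the original script, and the non-Windows branch assumes Linux-style flags (other systems, such as macOS, may interpret -W differently). It also validates each address with the standard ipaddress module before it reaches the ping command, since the values come from a scraped page.

```python
import ipaddress
import platform
import subprocess


def build_ping_command(ip, count=5, timeout_s=1, system=None):
    """Return the ping argument list for the given (or current) OS.

    Hypothetical helper: raises ValueError if ip is not a literal address.
    """
    ipaddress.ip_address(ip)  # rejects anything that is not an IPv4/IPv6 literal
    system = system or platform.system()
    if system == 'Windows':
        # Windows: -n = echo count, -w = reply timeout in milliseconds.
        return ['ping', '-n', str(count), '-w', str(timeout_s * 1000), ip]
    # Assumed Linux flags: -c = echo count, -W = reply timeout in seconds.
    return ['ping', '-c', str(count), '-W', str(timeout_s), ip]


def is_reachable(ip, **kwargs):
    """Return True if ping exits with status 0, False otherwise."""
    try:
        cmd = build_ping_command(ip, **kwargs)
    except ValueError:
        return False  # malformed address, do not run ping at all
    return subprocess.call(cmd,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0
```

Keep in mind that a successful ping only shows the host answers ICMP; it does not prove that the proxy port actually accepts connections.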
It's a bit messy. When I have time, I'll store the data in a database and then integrate this feature into the blog.
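As a rough idea of what the database step could look like, here is a minimal sketch using the standard sqlite3 module. The choice of SQLite, the table name proxy, and the functions save_proxies and load_proxies are all assumptions for illustration; the post does not say which database is planned.

```python
import sqlite3


def save_proxies(conn, proxies):
    """Insert (ip, port) pairs, skipping pairs that are already stored."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS proxy ('
        ' ip TEXT NOT NULL,'
        ' port INTEGER NOT NULL,'
        ' PRIMARY KEY (ip, port))'
    )
    conn.executemany(
        'INSERT OR IGNORE INTO proxy (ip, port) VALUES (?, ?)', proxies
    )
    conn.commit()


def load_proxies(conn):
    """Return every stored proxy as an 'ip:port' string, in sorted order."""
    rows = conn.execute('SELECT ip, port FROM proxy ORDER BY ip, port')
    return ['%s:%d' % (ip, port) for ip, port in rows]


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')  # pass a file path to persist the data
    save_proxies(conn, [('10.0.0.1', 8080), ('10.0.0.2', 3128)])
    print(load_proxies(conn))
    conn.close()
```

The composite primary key together with INSERT OR IGNORE means repeated scraping runs do not pile up duplicate entries, which the append-only ip.txt file cannot guarantee.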
Rex's blog. All rights reserved.
Python crawler: scraping proxy IPs and testing their connectivity