The task is to check whether each URL in a text file can be accessed normally, and then randomly pick n lines from the URLs that responded normally. My approach: read the file line by line and request each URL with urlopen. Put every URL that returns status code 200 into a list, take the length of that list, and use random to generate values in that range as list subscripts until n distinct rows have been picked. The specific implementation is as follows:
```python
# Python 3 version: urllib2 became urllib.request, and the built-in set/list replace sets.Set
import random
import urllib.request

URL_FILE = r'C:\Users\888\Desktop\urls.txt'
OUT_FILE = r'C:\Users\888\Desktop\goodurl.txt'
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:10.0.1) Gecko/20100101 Firefox/10.0.1'}

good_list = []  # URLs that answered with status 200
bad_list = []   # URLs that failed or answered with another status


def get_responses(url):
    """Request one URL and record it in good_list or bad_list."""
    # Add the scheme only when it is missing, so http_url is always defined
    if url.startswith(('http://', 'https://')):
        http_url = url
    else:
        http_url = 'http://' + url
    try:
        request = urllib.request.Request(http_url, headers=HEADERS)
        # The timeout keeps one dead host from stalling the whole run
        with urllib.request.urlopen(request, timeout=10) as resp:
            retcode = resp.status
    except (OSError, ValueError) as e:
        # URLError, HTTPError and timeouts are OSError subclasses;
        # malformed URLs raise ValueError
        print(url, e)
        bad_list.append(url)
        return 0
    print(url, retcode)
    if retcode == 200:
        good_list.append(url)
        return 1
    bad_list.append(url)
    return 0


def read_file():
    """Check every non-blank line of URL_FILE."""
    try:
        with open(URL_FILE, 'r') as urllist:
            for item in urllist:
                item = item.strip()
                if item:  # skip blank lines
                    get_responses(item)
    except IOError:
        print('File does not exist.')
        return
    print('Total URLs: %d, good URLs: %d, bad URLs: %d.'
          % (len(good_list) + len(bad_list), len(good_list), len(bad_list)))


def write_file(linenum):
    """Pick linenum distinct good URLs at random and write them to OUT_FILE."""
    linelen = len(good_list)
    # Cannot pick more rows than exist; otherwise the loop below never ends
    linenum = min(int(linenum), linelen)
    picked = []
    while len(picked) < linenum:
        s = random.randint(0, linelen - 1)
        if s not in picked:  # skip subscripts already drawn, so no duplicates
            picked.append(s)

    # Put the good URLs in the goodurl.txt file
    with open(OUT_FILE, 'w') as goodurl:
        for s in picked:
            goodurl.write(good_list[s] + '\n')

    print('The mission is done, please check the goodurl.txt file.')


if __name__ == '__main__':
    read_file()
    write_file(100)
```
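The hand-rolled loop of random subscripts can also be replaced by the standard library's `random.sample`, which returns k distinct elements chosen without replacement. This is a swapped-in technique rather than the approach above; a minimal sketch, assuming the good URLs are already in a list. `pick_random_rows` is a hypothetical helper name, not part of the original script:

```python
import random


def pick_random_rows(rows, n, seed=None):
    """Return up to n distinct rows chosen at random, in random order."""
    # Hypothetical helper: a seeded generator makes runs reproducible
    rng = random.Random(seed)
    # sample() raises ValueError if n > len(rows), so cap n first
    return rng.sample(rows, min(n, len(rows)))


if __name__ == "__main__":
    good_list = ["http://example.com/%d" % i for i in range(10)]
    print(pick_random_rows(good_list, 3, seed=1))
```

Because `sample` draws without replacement, the result never contains duplicates, and capping n with `min()` keeps the call from raising `ValueError` when fewer good URLs exist than were requested.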
Python: randomly reading n lines of data from a text file