Python crawler: automatically fetching PoCs from Seebug (seebugpoc)

Source: Internet
Author: User


I wrote a small script to crawl the PoCs on www.seebug.org ~

First, we perform packet capture analysis.

The first problem: Seebug requires you to be logged in before downloading. That is easy to solve. We only need to capture a request that returns 200 and copy its headers.

(I won't paste my own headers here, but the fields that need to be modified or watched are clearly described below.)

headers = {
    'Host': ******,
    'Connection': 'close',
    'Accept': ******,
    'User-Agent': ******,
    'Referer': 'https://www.seebug.org/vuldb/ssvid-',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'Cookie': ***********,
}

As you can see, the key item is the Referer field, which is what we will be modifying.

So how can we modify this?

I first clicked a download link while capturing packets, and found that Seebug's PoC download URLs are particularly tidy:

'https://www.seebug.org/vuldb/downloadPoc/xxxxx'

You only need to append a five-digit number, and that five-digit number is the vulnerability's serial number!

This is clear at a glance. But when I changed the five-digit number and sent the request, no beautiful 200 status code came back. A glance at the headers revealed the Referer field:

'Referer': 'https://www.seebug.org/vuldb/ssvid-xxxxx'

In other words, the five digits in the Referer must change along with the download URL; only then is our GET request header complete.
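To make the pairing concrete, here is a minimal sketch; the helper name build_request is my own, not part of the original script:

```python
# Build a PoC download URL and its matching Referer from one serial number.
# build_request is a hypothetical helper added for illustration.
BASE_DOWNLOAD = 'https://www.seebug.org/vuldb/downloadPoc/'
BASE_REFERER = 'https://www.seebug.org/vuldb/ssvid-'

def build_request(ssvid):
    """Return (url, referer); both must carry the same five-digit id."""
    sid = str(ssvid)
    return BASE_DOWNLOAD + sid, BASE_REFERER + sid

url, referer = build_request(93000)
print(url)      # https://www.seebug.org/vuldb/downloadPoc/93000
print(referer)  # https://www.seebug.org/vuldb/ssvid-93000
```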

Next is the thread issue.

I use queue and threading for multi-threaded processing, but we can't fetch too fast, or we will be discovered by the anti-crawler.

So I import time and add time.sleep(1): just one second of sleep per download, with two threads (the threads are of little real benefit here, but I wrote them in anyway).

# coding=utf-8
import queue
import threading
import time

import requests

headers = {***}  # paste your captured headers here (Host, User-Agent, Cookie, ...)


class SeeBugPoc(threading.Thread):
    def __init__(self, q):
        threading.Thread.__init__(self)
        self._queue = q

    def run(self):
        while not self._queue.empty():
            url_download = self._queue.get_nowait()
            self.download_file(url_download)

    def download_file(self, url_download):
        # The Referer must carry the same five-digit id as the download URL.
        # Build per-request headers so the two threads don't clobber each
        # other's Referer (the original set it while filling the queue, so
        # only the last value ever took effect -- a bug).
        name = url_download.split('/')[-1]
        req_headers = dict(headers,
                           Referer='https://www.seebug.org/vuldb/ssvid-' + name)
        r = requests.get(url=url_download, headers=req_headers)
        print(r.status_code, name)
        if r.status_code == 200:
            with open('e:/poc/' + name + '.txt', 'wb') as f:
                f.write(r.content)
            print('it ok!')
        else:
            print('no poc for this one!')
        time.sleep(1)  # one second of sleep, to stay under the anti-crawler radar


def main():
    q = queue.Queue()
    # queue stores the constructed URLs so the threads can pull from it
    for i in range(93000, 93236):
        q.put('https://www.seebug.org/vuldb/downloadPoc/' + str(i))
    threads = [SeeBugPoc(q) for _ in range(2)]  # two threads
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == '__main__':
    main()

In the code above, range(93000, 93236) controls the five-digit ids to download; just look up the serial numbers (that is, their ssvid numbers) of the first and last entries you want in the Seebug database.

As for the returned status code: if a project provides no PoC download, if the PoC does not exist, or if the PoC must first be exchanged for, a normal 200 will not be returned (you'll see 404/403/521, etc. instead).

Of course, if 521 keeps being returned, you can refresh the page, capture the headers again, and update them in the code.
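As a sketch of that status handling (the function names and messages below are mine, not from the original script):

```python
# Classify Seebug download responses; only a 200 response carries a PoC body.
def should_save(status_code):
    return status_code == 200

def describe(status_code):
    reasons = {
        200: 'ok, write the poc to disk',
        404: 'no poc exists for this ssvid',
        403: 'poc must be exchanged for / not downloadable',
        521: 'anti-crawler challenge: refresh the page and re-copy the headers',
    }
    return reasons.get(status_code, 'unexpected status')

print(describe(200))
print(describe(521))
```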

Finally, check the status code and write the file only when it is 200.
 

(It's a simple script to write. If you find any errors or have any questions, leave a message and we can discuss them.)

That's all for this Python crawler for automatically obtaining Seebug PoCs. I hope it serves as a useful reference.
