Python Getting Started: Web bot Crawler

Source: Internet
Author: User

I started learning Python two days ago. Coming from C, I found Python's simplicity and ease of use refreshing, which made me much more interested in learning it.

Starting today, I will record my Python lessons and notes here, partly for my own future reference and partly to share what I learn with you.

After a brief look at Python's basic syntax, I searched online for more material and came across a set of Python learning videos produced by zhipu education. One video, titled "Web bot crawler", caught my attention.

The basic principle of the web bot crawler: each time a blog post is opened, the blog site increases that post's view count. Opening the same post repeatedly therefore increases its view count significantly.

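The program needs a third-party library module: httplib2. (The complete code shown later in this post imports only standard-library modules, so httplib2 is not called directly there.)

Library homepage: https://code.google.com/p/httplib2/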

Before using it, add the Python installation directory to the end of the system Path environment variable. Then go to the directory where you extracted the httplib2 module and run setup.py to install it (python setup.py install).
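To confirm that the installation worked, a small check like the one below can help. It is only a sketch: the helper name is_module_installed is my own, not something from the video, and it assumes Python 3, where importlib.util.find_spec is available.

```python
import importlib.util
import sys


def is_module_installed(name):
    """Return True if this interpreter can find the named top-level module."""
    # find_spec returns None when no top-level module with that name exists.
    return importlib.util.find_spec(name) is not None


# Show which interpreter is running; this should be the one added to Path.
print("Python executable:", sys.executable)
print("httplib2 installed:", is_module_installed("httplib2"))
```

If the second line prints False, the module was probably installed for a different Python installation than the one found through Path.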

The code for opening a webpage is:

    import webbrowser
    webbrowser.open_new_tab('website')  # replace 'website' with the URL

As more pages are opened, the browser uses more and more memory, so we need to close the browser periodically. The code for closing the browser is as follows (using Chrome on Windows as an example):

    import os
    os.system('taskkill /F /IM chrome.exe')  # /F forces termination, /IM selects by image name
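The same taskkill call can also be made through the subprocess module, which passes the arguments as a list and returns the exit code. This is a sketch rather than the video's method: the function names build_kill_command and close_browser are my own, and it assumes Python 3.7 or later (for capture_output) on Windows.

```python
import subprocess
import sys


def build_kill_command(image_name="chrome.exe"):
    """Build the taskkill argument list for the given process image name."""
    # /F forces termination, /IM selects processes by image (executable) name.
    return ["taskkill", "/F", "/IM", image_name]


def close_browser(image_name="chrome.exe"):
    """Force-close every process with the given image name (Windows only).

    Returns True if taskkill exited with code 0, otherwise False.
    """
    if not sys.platform.startswith("win"):
        # taskkill only exists on Windows, so do nothing elsewhere.
        return False
    result = subprocess.run(build_kill_command(image_name), capture_output=True)
    return result.returncode == 0
```

Calling close_browser() closes every Chrome window, including ones you opened yourself, so save your work first.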

We also need a while loop to repeat these operations so that the blog keeps being refreshed. The code is modeled on the zhipu education video, with thanks to zhipu education. The complete code is as follows:

    import webbrowser as web
    import time
    import os
    import random

    count = random.randint(5, 7)   # number of rounds is chosen at random
    j = 0
    while j <= count:
        i = 0
        while i <= 8:
            web.open_new_tab('website')  # enter the URL
            i = i + 1
            time.sleep(0.8)          # short pause between tabs
        else:
            # runs once the inner loop finishes: close Chrome to free memory
            os.system('taskkill /F /IM chrome.exe')
            print(j, 'time web browser closed')
        j = j + 1
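
The nested while loops above can also be written as a function with for loops. The following is a sketch under my own assumptions, not the code from the video: the name refresh_in_rounds and its parameters are hypothetical, and the open and close actions are passed in as arguments so the loop can be tried out without actually opening a browser.

```python
import time
import webbrowser


def refresh_in_rounds(url, rounds, tabs_per_round,
                      open_tab=webbrowser.open_new_tab,
                      close_browser=None, delay=0.8):
    """Open url in tabs_per_round tabs, rounds times; return the tabs opened."""
    opened = 0
    for round_number in range(1, rounds + 1):
        for _ in range(tabs_per_round):
            open_tab(url)
            opened += 1
            time.sleep(delay)  # short pause between tabs
        if close_browser is not None:
            # Close the browser after each round to free memory.
            close_browser()
        print(round_number, "round(s) finished")
    return opened
```

For a dry run, pass a list's append method as open_tab, for example refresh_in_rounds('website', 2, 3, open_tab=visited.append, delay=0) with visited = []. To match the original loop bounds, rounds would be random.randint(5, 7) + 1 and tabs_per_round would be 9, because both while loops use <= and start from 0.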
