Introduction to Python Web Crawlers 001 (Popular Science): What Is a Web Crawler
1. What is a web crawler?
Let me give a few examples from everyday life:
Example One:
I usually write up what I have learned and the experience I have gathered as blog posts and publish them on the CSDN blog site. My posts about Blender 3D modeling contain a lot of pictures. To publish one of them, I have to upload the pictures one at a time, and the upload is sometimes very slow. Over the course of publishing a single post, uploading the pictures alone takes me more than half an hour, which is far too inefficient.
Can I write a program that automatically uploads the pictures for me in the background?
Example Two:
I am lazy. How lazy? I do not want to go out to eat, so basically all of my meals are takeout ordered online. In fact, I order the same few dishes every time. (The selection on offer is just too small.) And because I am so lazy, I do not even want to place the order myself; I would like someone to simply bring me food when it is time to eat. I also have trouble making choices: every time I am hungry, I spend a long time deciding what to eat. Not only does that waste time, but the real point is that I am hungry now, and even the fastest takeout takes half an hour to arrive.
Can I write a program that automatically orders my meals for the day?
Example Three:
Suppose I run a Taobao store and want to keep up with my competitors' prices so that I can respond in time. I have to visit their Taobao shops every day and compare their prices with mine; whenever a competitor's price changes, I have to change the price of the same item in my own shop accordingly. That takes a lot of time. One word in capital letters sums up my mood: ANNOYING. I run an online shop, so big or small, I am still a boss, and a boss's time is precious. If I did this by hand every day, I would be losing several billion a day.
Can I write a program that monitors my competitors' prices in real time and automatically adjusts the prices in my own shop?
The answer to all of these real-life problems is: yes, you can write such a program to improve your productivity.
Through the tutorials in this blog column, you can use web crawler techniques to automate repetitive tasks like these.
2. Is web crawling legal?
Yes, for lazy people like me, the web crawler really is a savior. Not only can it help you improve your productivity, it can also crawl the information on a website for you. So here is the question: is web crawling legal?
How should I put it: as of now (2016-9-2 21:34:06), China does not have any relevant legal provisions.
But everything has its limits. If you crawl recklessly and without restraint, you may well be breaking the law.
3. The official definition of a web crawler
Finally, here is the official description of a web crawler:
A web crawler, also called a web spider, ant, automatic indexer, or (in the FOAF software context) Web scutter, is a program that automatically browses the web, also known as a web robot. Crawlers are widely used by Internet search engines and similar sites to gather or update the content of those sites and the way it is indexed. They automatically collect every page they are able to access so that the search engine can process it further (sorting and organizing the downloaded pages), which lets users find the information they need more quickly.
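To make the description above a little more concrete, here is a minimal sketch of what "automatically browsing the web" can look like in Python. It is only an illustration, not how any real search engine works. The names `LinkExtractor`, `crawl`, and `FAKE_SITE` are made up for this example, and the "site" is a small dictionary held in memory so the code runs without touching the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl starting at start_url.

    fetch is any callable that takes a URL and returns the page HTML as a
    string, or None if the page cannot be retrieved.
    Returns a dict mapping each visited URL to its HTML.
    """
    seen = {start_url}          # URLs already queued, to avoid loops
    queue = deque([start_url])  # URLs waiting to be fetched
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be fetched
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# A hypothetical three-page "site" held in memory, so the sketch runs offline.
FAKE_SITE = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.com/a.html": '<a href="/">Home</a>',
    "http://example.com/b.html": '<a href="/a.html">A</a> <a href="/missing.html">?</a>',
}

pages = crawl("http://example.com/", FAKE_SITE.get)
print(sorted(pages))
```

To crawl a real site, `fetch` could be replaced by a function that downloads the page, for example with `urllib.request.urlopen`. In keeping with the point about restraint in section 2, a well-behaved crawler would also pause between requests and check the site's robots.txt (the standard library's `urllib.robotparser` can help with that).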
Summary:
In this section, we learned what a web crawler is. In the next section, we will look at how to get a rough understanding of a target site's size and structure before crawling it.