What era is coming? The age of data! Data analysis services, internet finance, data modeling, natural language processing, medical case analysis ... more and more work is built on data, crawlers are one of the most important ways to obtain data quickly, and Python crawlers are simpler and more efficient to write than those in other languages.
----------------------Course Catalogue------------------------------
Chapter 1: Course Introduction
Introduces the course objectives, what you can learn from the course, and what you need to know before developing the system.
Chapter 2: Setting Up the Development Environment on Windows
Introduces the software that needs to be installed for the project, the installation and use of the Python virtual-environment tools virtualenv and virtualenvwrapper, and finishes with an introduction to PyCharm and basic use of Navicat.
Chapter 3: Crawler Fundamentals Review
Reviews the basics of crawler development: what crawlers can do, regular expressions, depth-first and breadth-first traversal algorithms and their implementations, URL de-duplication strategies, and a thorough treatment of the difference between Unicode and UTF-8 encodings and how to apply each.
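As a small taste of that material, here is a minimal sketch of breadth-first crawling with set-based URL de-duplication (the `get_links` helper and the seed URL are hypothetical placeholders, not code from the course):

```python
from collections import deque

def bfs_crawl(seed_url, get_links, max_pages=100):
    """Visit pages breadth-first, skipping URLs that were already seen."""
    seen = {seed_url}            # set-based URL de-duplication
    queue = deque([seed_url])    # a FIFO queue yields breadth-first order
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        for link in get_links(url):   # get_links fetches a page, returns its hrefs
            if link not in seen:      # enqueue each URL at most once
                seen.add(link)
                queue.append(link)
```

Swapping the deque for a stack (append/pop from the same end) would turn the same loop into depth-first traversal.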
Chapter 4: Crawling a Well-Known Technical Article Site with Scrapy
Sets up the Scrapy development environment, introduces Scrapy's common commands, and analyzes the project directory structure. The chapter also explains XPath and CSS selectors in detail, then crawls all articles with a Scrapy spider. Finally, it covers extracting specific fields with Item and ItemLoader, and uses Scrapy pipelines to save the data to a JSON file and to a MySQL database. ...
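For example, a minimal Scrapy pipeline that appends items to a JSON Lines file might be sketched as follows (the filename is illustrative; the course's actual pipelines may differ):

```python
import json

class JsonExportPipeline:
    """Write each scraped item as one JSON line."""

    def open_spider(self, spider):
        self.file = open("articles.jl", "w", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item   # pass the item on to any later pipelines
```

The pipeline takes effect once registered under ITEM_PIPELINES in settings.py.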
Chapter 5: Crawling a Well-Known Q&A Site with Scrapy
Covers extracting questions and answers from the site. Besides analyzing the site's network requests in detail, the chapter performs simulated login in two ways, with the requests library and with Scrapy's FormRequest, then analyzes the site's question-and-answer API endpoints and saves the data to MySQL. ...
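A simulated login with FormRequest usually follows the pattern below (a sketch only; the URL, form field names, and success check are placeholders, not the actual site's):

```python
import scrapy

class LoginSpider(scrapy.Spider):
    name = "login_demo"
    start_urls = ["https://example.com/login"]   # placeholder login page

    def parse(self, response):
        # from_response pre-fills hidden form fields such as CSRF tokens.
        return scrapy.FormRequest.from_response(
            response,
            formdata={"username": "user", "password": "pass"},  # placeholders
            callback=self.after_login,
        )

    def after_login(self, response):
        if "Welcome" in response.text:   # crude, illustrative success check
            self.logger.info("Logged in; session cookies persist automatically")
```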
Chapter 6: Whole-Site Crawling of a Recruitment Site with CrawlSpider
Designs the database table structure for job postings, then configures a CrawlSpider with LinkExtractor and Rule to crawl the entire recruitment site. The chapter also analyzes CrawlSpider at the source-code level so that you gain a deep understanding of how it works.
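The LinkExtractor/Rule configuration typically looks something like this (a sketch; the domain, URL patterns, and selectors are hypothetical):

```python
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class JobSpider(CrawlSpider):
    name = "jobs_demo"
    allowed_domains = ["example.com"]           # placeholder domain
    start_urls = ["https://example.com/jobs"]

    rules = (
        # Follow listing pages without a callback (extract links only).
        Rule(LinkExtractor(allow=r"/jobs\?page=\d+")),
        # Parse each individual job posting.
        Rule(LinkExtractor(allow=r"/job/\d+"), callback="parse_job"),
    )

    def parse_job(self, response):
        yield {
            "title": response.css("h1::text").get(),       # illustrative selectors
            "salary": response.css(".salary::text").get(),
        }
```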
Chapter 7: Breaking Through Anti-Crawler Restrictions with Scrapy
Starts with the back-and-forth struggle between crawlers and anti-crawler measures, then explains how Scrapy works internally. It defeats common anti-crawler restrictions by randomly switching the User-Agent and configuring IP proxies in Scrapy. The chapter also examines Scrapy's Request and Response objects in detail, and finally uses a cloud-based captcha-solving platform for online verification-code recognition, disables cookies, and limits the access frequency to reduce the chance of the spider being blocked. ...
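Random User-Agent switching is normally implemented as a downloader middleware along these lines (a minimal sketch with a hand-rolled list; a library such as fake-useragent could supply the strings instead):

```python
import random

class RandomUserAgentMiddleware:
    """Downloader middleware that assigns a random User-Agent to each request."""

    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
        "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
    ]

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.USER_AGENTS)
        # Returning None tells Scrapy to continue processing the request normally.
```

It takes effect once registered under DOWNLOADER_MIDDLEWARES in settings.py.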
Chapter 8: Advanced Scrapy Development
Explains Scrapy's more advanced features, including crawling dynamic site data with Selenium and PhantomJS and integrating both into Scrapy, Scrapy signals, custom middleware, pausing and resuming crawlers, Scrapy's core API, Scrapy telnet, the Scrapy web service, and Scrapy log configuration and email notification. These features allow us to do much more with Scrapy ...
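As one example of these features, connecting to a Scrapy signal follows the documented from_crawler pattern (the spider itself is a hypothetical stand-in):

```python
import scrapy
from scrapy import signals

class SignalDemoSpider(scrapy.Spider):
    name = "signal_demo"
    start_urls = ["https://example.com"]   # placeholder

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # Run a callback when the spider finishes crawling.
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def parse(self, response):
        yield {"url": response.url}

    def spider_closed(self, spider):
        self.logger.info("Spider closed: %s", spider.name)
```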
Chapter 9: Distributed Crawling with scrapy-redis
Covers using scrapy-redis for distributed crawling and analyzes the scrapy-redis source code, so that you can modify the source to fit your own needs. Finally, the chapter explains how to integrate a Bloom filter into scrapy-redis.
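Switching a project over to scrapy-redis is mostly a matter of configuration; a typical settings snippet, following the scrapy-redis README (the Redis URL is a placeholder), looks like:

```python
# settings.py: route scheduling and de-duplication through a shared Redis
SCHEDULER = "scrapy_redis.scheduler.Scheduler"               # shared request queue
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"   # shared de-dup filter
SCHEDULER_PERSIST = True                 # keep the queue across runs (pause/resume)
REDIS_URL = "redis://localhost:6379"     # placeholder Redis instance
```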
Chapter 10: Using the Elasticsearch Search Engine
Explains the installation and use of Elasticsearch, introduces its basic concepts and APIs, covers how search engines work, demonstrates elasticsearch-dsl, and finally shows how to save data to Elasticsearch through a Scrapy pipeline.
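With elasticsearch-dsl, defining a mapping and saving a document can be sketched as follows (field names, index name, and host are illustrative):

```python
from elasticsearch_dsl import Date, Document, Keyword, Text, connections

connections.create_connection(hosts=["localhost"])   # placeholder ES node

class ArticleDoc(Document):
    title = Text()        # analyzed, full-text searchable
    url = Keyword()       # stored verbatim for exact matching
    created = Date()

    class Index:
        name = "articles"   # illustrative index name

ArticleDoc.init()   # create the index and mapping if they do not exist
ArticleDoc(title="Hello Scrapy", url="https://example.com/1").save()
```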
Chapter 11: Building a Search Site with Django
Explains how to quickly build a search site with Django and how to implement Django search queries against Elasticsearch.
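A Django view backed by an Elasticsearch query might be sketched like this (hypothetical names throughout; it reuses the `articles` index from the previous sketch):

```python
from django.http import JsonResponse
from elasticsearch_dsl import Search, connections

connections.create_connection(hosts=["localhost"])   # placeholder ES node

def search_view(request):
    """Return the top matching articles for the ?q= query string."""
    query = request.GET.get("q", "")
    s = Search(index="articles").query("match", title=query)[:10]
    results = [{"title": hit.title, "url": hit.url} for hit in s.execute()]
    return JsonResponse({"results": results})
```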
Chapter 12: Deploying the Scrapy Crawler with Scrapyd
Completes the online deployment of the Scrapy crawler using Scrapyd.
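After deploying with scrapyd-deploy, spiders are started through Scrapyd's HTTP JSON API; a minimal sketch using the requests library (the project and spider names are placeholders):

```python
import requests

# Scrapyd listens on port 6800 by default; names below are placeholders.
resp = requests.post(
    "http://localhost:6800/schedule.json",
    data={"project": "article_search", "spider": "jobs_demo"},
)
print(resp.json())   # e.g. {"status": "ok", "jobid": "..."}
```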
Chapter 13: Course Summary
Revisits the entire system-development process so that students gain a more intuitive understanding of the system and how it was built.
Download: Baidu Network Disk
Python Distributed Crawler: Build a Search Engine Website (worth 388 yuan)