Http://www.cnblogs.com/jinxiao-pu/p/6706319.html
I recently took an online course on the Scrapy crawler and found it quite good. The catalogue below is still being updated, and I think it is worth taking careful notes and studying the material in depth.
Chapter 1: Course Introduction
- 1-1 Introduction to building a search engine with a Python distributed crawler 07:23
Chapter 2: Building a Development Environment on Windows
- 2-1 Installing PyCharm and basic usage 10:27
- 2-2 Installing and using MySQL and Navicat 16:20
- 2-3 Installing Python 2 and Python 3 on Windows and Linux 06:49
- 2-4 Installing and configuring virtual environments 30:53
Chapter 3: Review of Crawler Fundamentals
- 3-1 Technology selection and what crawlers can do 09:50
- 3-2 Regular expressions-1 18:31
- 3-3 Regular expressions-2 19:04
- 3-4 Regular expressions-3 20:16
- 3-5 Depth-first and breadth-first principles 25:15
- 3-6 URL deduplication methods 07:44
- 3-7 Thoroughly understanding Unicode and UTF-8 encoding 18:31
Chapter 4: Crawling a Well-Known Technical Article Site with Scrapy (Jobbole in practice)
- 4-1 Scrapy installation and introduction to the directory structure 22:33
- 4-2 Debugging the Scrapy execution flow in PyCharm 12:35
- 4-3 XPath usage-1 22:17
- 4-4 XPath usage-2 19:00
- 4-5 XPath usage-3 21:22
- 4-6 CSS selectors for field parsing-1 17:21
- 4-7 CSS selectors for field parsing-2 16:31
- 4-8 Writing a spider to crawl all Jobbole articles-1 15:40
- 4-9 Writing a spider to crawl all Jobbole articles-2 09:45
- 4-10 Items design-1 14:49
- 4-11 Items design-2 15:45
- 4-12 Items design-3 17:05
- 4-13 Data table design and saving items to a JSON file 18:17
- 4-14 Saving data to MySQL via a pipeline-1 18:41
- 4-15 Saving data to MySQL via a pipeline-2 17:58
- 4-16 Scrapy Item Loader mechanism-1 17:26
- 4-17 Scrapy Item Loader mechanism-2 20:31
Chapter 5: Crawling a Well-Known Q&A Site with Scrapy (Zhihu in practice)
- 5-1 Session and cookie automatic login mechanism 20:10
- 5-2 Simulated login with requests-1 13:32
- 5-3 Simulated login with requests-2 13:16
- 5-4 Simulated login with requests-3 12:21
- 5-5 Simulated login with Scrapy 20:46
- 5-6 Zhihu analysis and data table design-1 15:17
- 5-7 Zhihu analysis and data table design-2 13:35
- 5-8 Extracting questions with Item Loader-1 14:57
- 5-9 Extracting questions with Item Loader-2 15:20
- 5-10 Extracting questions with Item Loader-3 06:45
- 5-11 Zhihu spider crawling logic and answer extraction-1 15:54
- 5-12 Zhihu spider crawling logic and answer extraction-2 17:04
- 5-13 Saving data to MySQL-1 17:27
- 5-14 Saving data to MySQL-2 17:22
- 5-15 Saving data to MySQL-3 16:09
- 5-16 (Supplementary section) Zhihu CAPTCHA login-1_1 16:41
- 5-17 (Supplementary section) Zhihu CAPTCHA login-2_1 10:32
Chapter 6: Whole-Site Crawling of a Recruitment Site with CrawlSpider (Lagou in practice)
- 6-1 Data table structure design 15:33
- 6-2 CrawlSpider source code analysis: creating a CrawlSpider and configuring settings 12:50
- 6-3 CrawlSpider source code analysis 25:29
- 6-4 Using Rule and LinkExtractor 14:28
- 6-5 Parsing job positions with Item Loader 24:46
- 6-6 Saving job data to the database-1 19:01
- 6-7 Saving job information to the database-2 11:19
Chapter 7: Breaking Through Anti-Crawler Restrictions with Scrapy (Lagou in practice)
- 7-1 The contest between crawlers and anti-crawler measures: process and strategies 20:17
- 7-2 Scrapy architecture source code analysis 10:45
- 7-3 Introduction to Requests and Response 10:18
- 7-4 Randomly rotating the User-Agent via a downloader middleware-1 17:00
- 7-5 Randomly rotating the User-Agent via a downloader middleware-2 17:13
- 7-6 Implementing an IP proxy pool in Scrapy-1 16:51
- 7-7 Implementing an IP proxy pool in Scrapy-2 17:39
- 7-8 Implementing an IP proxy pool in Scrapy-3 18:46
- 7-9 CAPTCHA recognition with a cloud CAPTCHA-solving service 22:37
- 7-10 Disabling cookies, automatic throttling, and custom spider settings 07:22
Chapter 8: Advanced Scrapy Development
- 8-1 Requesting dynamic web pages with Selenium and simulating login 21:24
- 8-2 Simulating Weibo login and mouse scrolling with Selenium 11:06
- 8-3 Running chromedriver without loading images, and fetching dynamic pages with PhantomJS 09:59
- 8-4 Integrating Selenium into Scrapy 19:43
- 8-5 Introduction to other dynamic-page techniques: headless Chrome, scrapy-splash, selenium-grid, splinter 07:50
- 8-6 Pausing and resuming Scrapy 12:58
- 8-7 How Scrapy URL deduplication works 05:45
- 8-8 Scrapy telnet service 07:37
- 8-9 Spider middleware in detail 15:25
- 8-10 Scrapy stats collection 13:44
- 8-11 Scrapy signals in detail 13:05
- 8-12 Scrapy extension development 13:16
Chapter 9: scrapy-redis Distributed Crawler (hands-on project)
- 9-1 Distributed crawler essentials 08:39
- 9-2 Redis basics-1 20:31
- 9-3 Redis basics-2 15:58
- 9-4 Writing distributed crawler code with scrapy-redis 21:06
- 9-5 Scrapy source code analysis: connection.py, defaults.py 11:05
- 9-6 scrapy-redis source code analysis: dupefilter.py 05:29
- 9-7 scrapy-redis source code analysis: pipelines.py, queue.py 10:41
- 9-8 scrapy-redis source code analysis: scheduler.py, spider.py 11:52
- 9-9 Integrating a Bloom filter into scrapy-redis 19:30
Building a search engine with a Python distributed crawler: Scrapy implementation