Create a search engine with a Python distributed crawler: scrapy and scrapy-redis implementation
I recently took a scrapy crawler course online and thought it was quite good. The course directory is below (it is still being updated); I think it is worth taking careful notes and studying it properly.
Chapter 1 Course introduction
- 1-1 introduction to creating a search engine using python distributed crawlers
Chapter 2 Setting up the development environment on Windows
- 2-1 install and use pycharm
- 2-2 install and use mysql and navicat
- 2-3 install python2 and python3 on windows and linux
- 2-4 virtual environment installation and configuration
Chapter 3 Review of basic crawler knowledge
- 3-1 What can a web crawler do?
- 3-2 regular expressions-1
- 3-3 regular expressions-2
- 3-4 regular expressions-3
- 3-5 depth-first and breadth-first traversal principles (see the sketch after this list)
- 3-6 url deduplication methods
- 3-7 thoroughly understanding unicode and utf8 encoding
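A quick personal note on 3-5 and 3-6 (my own sketch, not course code): a breadth-first crawl is just a FIFO queue of URLs plus a set of already-seen URLs for deduplication; swap the queue for a stack and the same loop becomes depth-first. `fetch` and `extract_links` below are stand-in helpers I made up.

```python
from collections import deque


def bfs_crawl(start_url, fetch, extract_links, max_pages=100):
    """Breadth-first crawl: FIFO queue + seen-set for URL deduplication.
    Popping from the right (LIFO) instead would make it depth-first."""
    seen = {start_url}          # 3-6: dedupe URLs so each page is fetched only once
    queue = deque([start_url])  # 3-5: FIFO order -> breadth-first traversal
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)                  # e.g. requests.get(url).text
        pages.append((url, html))
        for link in extract_links(html, url):
            if link not in seen:           # set membership test is O(1)
                seen.add(link)
                queue.append(link)
    return pages
```

Scrapy's own scheduler and request dupefilter play essentially these two roles internally (revisited in 8-7).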
Chapter 4 Crawling well-known technical article sites with scrapy
- 4-1 scrapy installation and directory structure Overview
- 4-2 debugging the scrapy execution flow in pycharm
- 4-3 xpath usage-1
- 4-4 xpath usage-2
- 4-5 xpath usage-3
- 4-6 css selector for field parsing-1
- 4-7 css selector for field parsing-2
- 4-8 writing the spider to crawl all jobbole articles-1 (a minimal spider sketch follows this chapter's list)
- 4-9 writing the spider to crawl all jobbole articles-2
- 4-10 items design-1
- 4-11 items design-2
- 4-12 items design-3
- 4-13 designing items and saving them to a json file
- 4-14 use pipeline to save data to mysql-1
- 4-15 use pipeline to save data to mysql-2
- 4-16 scrapy item loader mechanism-1
- 4-17 scrapy item loader mechanism-2
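To make the chapter titles concrete, here is my own minimal sketch of the spider/Item/ItemLoader pattern Chapter 4 appears to build (referenced at 4-8 above). The start URL, CSS/XPath selectors and field names are placeholders I made up; only the structure follows the lesson titles.

```python
import scrapy
from scrapy.loader import ItemLoader


class ArticleItem(scrapy.Item):
    # 4-10 ~ 4-12: fields are declared on the Item and assigned like dict keys
    title = scrapy.Field()
    create_date = scrapy.Field()
    content = scrapy.Field()


class ArticleSpider(scrapy.Spider):
    name = "jobbole"
    start_urls = ["http://blog.jobbole.com/all-posts/"]  # placeholder listing page

    def parse(self, response):
        # 4-3 ~ 4-7: the same data can be reached via xpath or css selectors;
        # the selector strings here are made-up placeholders
        for url in response.css(".post-title a::attr(href)").getall():
            yield response.follow(url, callback=self.parse_detail)

    def parse_detail(self, response):
        # 4-16 / 4-17: an ItemLoader replaces manual item["field"] = ... assignments
        loader = ItemLoader(item=ArticleItem(), response=response)
        loader.add_xpath("title", "//h1/text()")
        loader.add_css("create_date", ".entry-meta::text")
        loader.add_css("content", ".entry")
        # without output processors each field comes back as a list;
        # TakeFirst() from scrapy.loader.processors is the usual fix
        yield loader.load_item()
```

The pipelines from 4-13 ~ 4-15 would then receive each yielded item in their process_item() method and write it out to json or mysql.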
Chapter 5 Crawling a well-known Q&A website with scrapy
- 5-1 session and cookie automatic login mechanism
- 5-2 requests simulated login to zhihu-1
- 5-3 requests simulated login to zhihu-2
- 5-4 requests simulated login to zhihu-3
- 5-5 scrapy simulated login to zhihu (a generic login sketch follows this chapter's list)
- 5-6 zhihu analysis and data table design-1
- 5-7 zhihu analysis and data table design-2
- 5-8 extracting question data with the item loader-1
- 5-9 extracting question data with the item loader-2
- 5-10 extracting question data with the item loader-3
- 5-11 spider crawl logic implementation and answer extraction-1
- 5-12 spider crawl logic implementation and answer extraction-2
- 5-13 save data to mysql-1
- 5-14 save data to mysql-2
- 5-15 save data to mysql-3
- 5-16 (supplementary section) zhihu captcha login-1
- 5-17 (supplementary section) zhihu captcha login-2
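As referenced at 5-5, here is a generic sketch of the "log in, then crawl" pattern in scrapy. The URL, form fields and credentials are placeholders; the real zhihu login the course tackles involves extra steps (tokens and the captcha handling from 5-16/5-17) that this sketch deliberately skips.

```python
import scrapy


class LoginSpider(scrapy.Spider):
    """Generic 'log in first, then crawl' pattern (5-1, 5-5).
    example.com, the form fields and the credentials are all placeholders."""
    name = "login_demo"

    def start_requests(self):
        # fetch the login page first so its form (and any hidden token) is available
        yield scrapy.Request("https://example.com/login", callback=self.login)

    def login(self, response):
        # FormRequest.from_response copies the page's <form> fields (hidden ones included)
        # and overrides the ones given in formdata
        yield scrapy.FormRequest.from_response(
            response,
            formdata={"username": "me@example.com", "password": "secret"},
            callback=self.after_login,
        )

    def after_login(self, response):
        # scrapy's cookie middleware keeps the session cookies (5-1), so every
        # request from here on is made as the logged-in user
        if "logout" in response.text.lower():
            yield scrapy.Request("https://example.com/me", callback=self.parse_profile)

    def parse_profile(self, response):
        yield {"name": response.css("h1::text").get()}
```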
Chapter 6 Full-site crawling of a recruitment website with CrawlSpider
- 6-1 Data Table Structure Design
- 6-2 analyze CrawlSpider source code-create a crawler and configure settings
- 6-3 CrawlSpider source code analysis
- 6-4 using Rule and LinkExtractor (see the sketch after this list)
- 6-5 parsing job positions with the item loader
- 6-6 saving job position data to the database-1
- 6-7 saving job position data to the database-2
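The CrawlSpider pattern from 6-2 ~ 6-5 in one short sketch; the domain, URL regexes and selectors are placeholders for whatever recruitment site the course targets.

```python
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class JobSpider(CrawlSpider):
    """CrawlSpider sketch for 6-2 ~ 6-5; domain and URL patterns are placeholders."""
    name = "jobs"
    allowed_domains = ["example-jobs.com"]
    start_urls = ["https://www.example-jobs.com/"]

    # 6-4: Rules declare which links to follow and which pages to parse
    rules = (
        # keep walking listing/category pages without parsing them
        Rule(LinkExtractor(allow=r"/list/\d+")),
        # parse individual job postings and keep following links found on them
        Rule(LinkExtractor(allow=r"/jobs/\d+\.html"), callback="parse_job", follow=True),
    )

    def parse_job(self, response):
        # 6-5 uses an item loader here; a plain dict keeps the sketch short
        yield {
            "title": response.css("h1::text").get(),
            "salary": response.css(".salary::text").get(),
            "url": response.url,
        }
```

Lessons 6-6/6-7 would then persist these items through a mysql pipeline, same as in Chapter 4.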
Chapter 7 Scrapy breaking through anti-crawler restrictions
- 7-1 crawler and anti-crawler confrontation process and strategy
- 7-2 scrapy architecture source code analysis
- 7-3 Introduction to Requests and Response
- 7-4 using downloader middleware to randomly switch the user-agent-1 (see the sketch after this list)
- 7-5 using downloader middleware to randomly switch the user-agent-2
- 7-6 implementing an ip proxy pool in scrapy-1
- 7-7 implementing an ip proxy pool in scrapy-2
- 7-8 implementing an ip proxy pool in scrapy-3
- 7-9 captcha recognition via a cloud-based captcha-solving service
- 7-10 disabling cookies, automatic throttling, and custom spider settings
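A sketch of the random user-agent downloader middleware from 7-4/7-5. A hard-coded UA list keeps it self-contained; the same process_request hook is also where an ip proxy pool (7-6 ~ 7-8) would set the proxy for each request.

```python
import random


class RandomUserAgentMiddleware:
    """Downloader middleware that swaps the User-Agent per request (7-4 / 7-5).
    Enable it in settings.py, e.g.:
    DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.RandomUserAgentMiddleware": 543}
    (the module path is a placeholder for your own project)."""

    user_agents = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
        "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
    ]

    def process_request(self, request, spider):
        # set the header before the request reaches the downloader
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # an ip proxy pool (7-6 ~ 7-8) would similarly set request.meta["proxy"] here
        return None  # None lets the request continue through the middleware chain
```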
Chapter 8 Advanced scrapy development
- 8-1 selenium dynamic webpage requests and simulated login to zhihu
- 8-2 selenium simulated login to weibo and simulating mouse scroll-down
- 8-3 disabling image loading in chromedriver, fetching dynamic pages with phantomjs
- 8-4 integrating selenium into scrapy (see the sketch after this list)
- 8-5 introduction to other dynamic page fetching techniques: headless chrome, scrapy-splash, selenium-grid, splinter
- 8-6 pause and restart scrapy
- 8-7 scrapy url deduplication Principle
- 8-8 scrapy telnet Service
- 8-9 spider middleware
- Scrapy signal details
- 8-12 scrapy extension development
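As referenced at 8-4, here is a sketch of one common way to hand selected requests to selenium from inside scrapy: a downloader middleware that renders the page in headless chrome and returns the result as an HtmlResponse. It assumes chromedriver is installed and on PATH, and "use_selenium" is a meta flag of my own invention; requests without it pass through scrapy's normal downloader.

```python
from scrapy.http import HtmlResponse
from selenium import webdriver


class SeleniumMiddleware:
    """Downloader middleware that renders JavaScript-heavy pages in a browser (8-4).
    Assumes chromedriver is on PATH; "use_selenium" is a made-up meta flag."""

    def __init__(self):
        options = webdriver.ChromeOptions()
        options.add_argument("--headless")        # 8-5: run chrome without a UI
        self.driver = webdriver.Chrome(options=options)

    def process_request(self, request, spider):
        if not request.meta.get("use_selenium"):
            return None                           # normal scrapy download
        self.driver.get(request.url)
        # returning a Response from process_request short-circuits the downloader
        # and hands the browser-rendered HTML straight to the spider callback
        return HtmlResponse(
            url=self.driver.current_url,
            body=self.driver.page_source,
            encoding="utf-8",
            request=request,
        )
```

A spider opts in with scrapy.Request(url, meta={"use_selenium": True}); a fuller version would also quit the browser via the spider_closed signal, which ties back to the scrapy signals lesson above.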
Chapter 9 scrapy-redis distributed crawler
Chapter 10 Using elasticsearch
Chapter 11 Deploying scrapy crawlers with scrapyd
Chapter 12 Building a search website with django
Chapter 13 Course summary