First, a short video to start with:
https://www.iesdouyin.com/share/video/6550631947750608142/?region=CN&mid=6550632036246555405&titleType=title&timestamp=1525407578&utm_campaign=client_share&app=aweme&utm_medium=ios&iid=30176260384&utm_source=qq&tt_from=mobile_qq&utm_source=mobile_qq&utm_medium=aweme_ios&utm_campaign=client_share&uid=92735989673&did=30176260384
Video tutorial for this article:
https://v.qq.com/x/page/b0643tut4ze.html
Preface
This kitten recently needed Node at work and also wants to grow into a full-stack engineer, so the Node learning journey has begun. Along the way I will collect practical examples and turn them into blog posts and video tutorials, using each example to understand how Node is used. So come learn Node with the kitten, from the basics to the deeper topics! The next few articles will be introductory and mainly cover Node's various features. They suit front-end engineers who know a little about Node but have not yet built anything with it. Once the basics are in place, later articles will explore and summarize more advanced topics. This article uses one example, crawling the related keywords shown in Baidu search results, to teach you how to build the simplest crawler with Node.js:
The Node modules used in this example and what they do:
request:
Used to send page requests and fetch the page source; this example uses GET requests.
cheerio:
cheerio implements a subset of core jQuery, providing a jQuery-style DOM manipulation API that does not depend on a browser. This example uses its load method, which parses an HTML string into a document that can be queried with selectors.
express:
A fast, unopinionated, minimalist web development framework for the Node.js platform. Here it is only used for simple routing, mainly through get, so it is not covered in detail; see the official website for more.
Implementation:
1. First, use express to build a simple Node service.
Run node demo.js from the command line and visit localhost:3000/key in the browser to see the result.
2. Use request to fetch the page.
Run node demo.js from the command line and visit localhost:3000/key in the browser to see the result.
3. Use cheerio to parse the page source into a jQuery-style object, then locate the content to crawl with jQuery selector syntax. With that, the crawler is done!
Run node demo.js from the command line and visit localhost:3000/index in the browser to see the result.
Tips: Some sites are not UTF-8 encoded; in that case iconv-lite can be used to fix garbled GB2312 text. Also, many sites have anti-crawler measures, so it is worth studying how to simulate a normal user to get around some of these problems (Baidu searches for Chinese keywords will also be blocked).
This article is only a primer; when there is a chance later, I will discuss a more advanced version with you in detail. Thank you for your attention.
Combined video and blog tutorial: implementing a simple crawler with Node.js