Before you start, make sure you have the Node.js environment installed. If you haven't installed it yet, please search for an installation tutorial first.

1. Install the two required dependency packages in the project folder
npm install superagent --save-dev
superagent is a lightweight, progressive Ajax API with good readability and a low learning curve. In a Node.js environment it relies internally on Node's native request API.
npm install cheerio --save-dev
cheerio is a Node.js module for parsing crawled pages: a fast, flexible implementation of the core of jQuery designed for the server, suitable for all kinds of web crawler programs. It is essentially the equivalent of jQuery in Node.js.
2. Create a new Crawler.js file
Import the dependency packages:

const http = require("http");
const path = require("path");
const url = require("url");
const fs = require("fs");
const superagent = require("superagent");
const cheerio = require("cheerio");
3. Obtain the Boss Zhipin data
superagent
  .get("https://www.zhipin.com/job_detail/?city=100010000&source=10&query=%E5%89%8D%E7%AB%AF")
  .end((error, response) => {
    // Get the page document data
    var content = response.text;
    // cheerio wraps the whole document into a jQuery-like collection; assign it to $
    var $ = cheerio.load(content);
    // Define an empty array to receive the data
    var result = [];
    // Looking at the document structure, first select each li, then traverse
    // (each li holds the data we want to extract)
    $(".job-list li .job-primary").each((index, value) => {
      // The address and the type are displayed on a single line and need string splitting
      // Address
      let address = $(value).find(".info-primary").children().eq(1).html();
      // Type
      let type = $(value).find(".info-company p").html();
      // Decode the HTML entities
      address = unescape(address.replace(/&#x/g, "%u").replace(/;/g, ""));
      type = unescape(type.replace(/&#x/g, "%u").replace(/;/g, ""));
      // Split the strings
      let addressArr = address.split('<em class="vline"></em>');
      let typeArr = type.split('<em class="vline"></em>');
      // Add the extracted data to the array as an object
      result.push({
        title: $(value).find(".name .job-title").text(),
        money: $(value).find(".name .red").text(),
        address: addressArr,
        company: $(value).find(".info-company a").text(),
        type: typeArr,
        position: $(value).find(".info-publis .name").text(),
        tximg: $(value).find(".info-publis img").attr("src"),
        time: $(value).find(".info-publis p").text()
      });
      // Debug: log the type of the raw address HTML
      console.log(typeof $(value).find(".info-primary").children().eq(1).html());
    });
    // Convert the array into a string
    result = JSON.stringify(result);
    // Write the array to a JSON file; refresh the directory and you will see a new
    // boss.json file in the current folder (open boss.json, press Ctrl+A to select
    // all, then Ctrl+K followed by Ctrl+F to auto-format the JSON)
    fs.writeFile("boss.json", result, "utf-8", (error) => {
      // Watch for errors; on a normal write, error is null
      if (error == null) {
        console.log("Congratulations, the data crawl succeeded! Open the boss.json file, press Ctrl+A, then Ctrl+K, and finally Ctrl+F to format the JSON (Visual Studio Code only).");
      }
    });
  });
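The entity-decoding trick in the code above deserves a closer look. The page serves non-ASCII text as HTML numeric entities (e.g. &#x5317;&#x4eac; for 北京); rewriting &#x as %u and stripping the semicolons turns each entity into a %uXXXX escape that the legacy unescape() function can decode. A minimal sketch of the same logic (the helper name decodeEntities is my own, not from the article):

```javascript
// decodeEntities is a hypothetical helper name; the replace/unescape chain
// matches the crawler above. Note that it strips ALL semicolons, so only
// feed it strings made of entities and markup, not arbitrary prose.
function decodeEntities(s) {
  return unescape(s.replace(/&#x/g, "%u").replace(/;/g, ""));
}

console.log(decodeEntities("&#x5317;&#x4eac;")); // → 北京
```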
If you followed Setaria through this Node.js crawler article, you should have picked up a new skill. Setaria will keep working hard together with everyone!
Acknowledgement: Mrs.zhang
Reprint to: 80204322
Node.js implements a crawler to crawl data