Node.js: Implementing a Crawler to Scrape Data

Source: Internet
Author: User

Before you start, make sure you have a Node.js environment installed; if you haven't, please look up an installation tutorial and set it up first. 1. Install the two required dependency packages in the project folder
npm install superagent --save-dev

  superagent is a lightweight, progressive Ajax API with good readability and a gentle learning curve; in a Node.js environment it relies internally on Node's native request API.
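  As a quick illustration of the API (a minimal sketch added for clarity, not part of the original tutorial; the URL is just a placeholder), a GET request looks like this:

const superagent = require("superagent");

// Issue a GET request; the raw HTML body arrives on response.text
superagent
    .get("https://example.com")
    .end((error, response) => {
        if (error) {
            console.error(error); // network or HTTP error
            return;
        }
        console.log(response.text); // the page's HTML as a string
    });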

npm install cheerio --save-dev

  cheerio is a Node.js module for parsing fetched pages: a fast, flexible, server-side implementation of the core jQuery API. It is suitable for all kinds of web crawler programs; in effect, it is jQuery for Node.js.
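  To see what "jQuery for Node.js" means in practice, here is a minimal sketch (the sample HTML is made up for illustration):

const cheerio = require("cheerio");

// Load an HTML string into a queryable, jQuery-like collection
const $ = cheerio.load('<ul><li class="job">Front-end</li><li class="job">Node.js</li></ul>');

// Use familiar jQuery selectors and traversal methods on the server
$(".job").each((index, element) => {
    console.log($(element).text()); // prints "Front-end", then "Node.js"
});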

2. Create a new Crawler.js file

// Import the dependency packages
// (http, path, and url are imported here as in the original, though only fs,
// superagent, and cheerio are used below)
const http       = require("http");
const path       = require("path");
const url        = require("url");
const fs         = require("fs");
const superagent = require("superagent");
const cheerio    = require("cheerio");

3. Fetching the Boss Zhipin data

superagent
    .get("https://www.zhipin.com/job_detail/?city=100010000&source=10&query=%E5%89%8D%E7%AB%AF")
    .end((error, response) => {
        // Get the page's document data
        var content = response.text;
        // cheerio is Node.js's jQuery: wrap the whole document into a collection and assign it to $
        var $ = cheerio.load(content);
        // Define an empty array to receive the data
        var result = [];
        // Analyze the document structure: first get each li, then traverse them
        // (each li holds the data we want)
        $(".job-list li .job-primary").each((index, value) => {
            // The address and the type are each displayed on a single line and need string splitting
            // Address
            let address = $(value).find(".info-primary").children().eq(1).html();
            // Type
            let type = $(value).find(".info-company p").html();
            // Decode the HTML entities
            address = unescape(address.replace(/&#x/g, "%u").replace(/;/g, ""));
            type = unescape(type.replace(/&#x/g, "%u").replace(/;/g, ""));
            // Split the strings
            let addressArr = address.split('<em class="vline"></em>');
            let typeArr = type.split('<em class="vline"></em>');
            // Add the scraped data to the array as an object
            result.push({
                title: $(value).find(".name .job-title").text(),
                money: $(value).find(".name .red").text(),
                address: addressArr,
                company: $(value).find(".info-company a").text(),
                type: typeArr,
                position: $(value).find(".info-publis .name").text(),
                tximg: $(value).find(".info-publis img").attr("src"),
                time: $(value).find(".info-publis p").text()
            });
        });
        // Convert the array into a string
        result = JSON.stringify(result);
        // Write the array to a JSON file; refresh the directory and a Boss.json file
        // appears in the current folder
        fs.writeFile("Boss.json", result, "utf-8", (error) => {
            // Watch for errors; on success, error is null
            if (error == null) {
                console.log("Congratulations, the data was crawled successfully! Open the JSON file, press Ctrl+A to select all, then Ctrl+K followed by Ctrl+F to auto-format it (Visual Studio Code editor only).");
            }
        });
    });
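Run the script from the project folder (using the file name from step 2):

node Crawler.js

If the request succeeds, a Boss.json file containing the scraped job listings appears in the current folder.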

Having followed Setaria through this Node.js crawler article, you should have picked up a new skill. Setaria will keep working hard together with everyone!

Acknowledgement: Mrs. Zhang

Reprinted from: 80204322
