Crawling JSON data with Node.js by simulating a browser POST request

Source: Internet
Author: User
Tags: http, post

Today I wanted to crawl the backend data of a website. I ran into a lot of obstacles along the way, and it took 2 hours before a request finally returned data, so I am summarizing the experience here.

First, here is the request address I crawled: http://api.chuchujie.com/api/?v=1.0

Let's start crawling the data.

I. Write a crawler based on Nodejs.

1. Introduce the required modules

We need the http module (the module Node.js uses to send HTTP requests to a server) and the querystring module (which converts the parameter object passed in from the front end into a string).

  

    var http = require("http");          // HTTP requests
    // var https = require("https");     // HTTPS requests
    var querystring = require("querystring");
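To make the role of querystring concrete, here is a minimal sketch of the conversion. The object `sampleParam` is a hypothetical name; its values are reconstructed from the commented-out sample parameters in the complete code, so treat the exact field values as an assumption.

```javascript
var querystring = require("querystring");

// Sample form fields (assumed values, reconstructed from the article's example).
var sampleParam = {
  query: '{"function":"newest","module":"zdm"}',
  client: '{"gender":"0"}',
  page: 1
};

// stringify percent-encodes each value and joins the key=value pairs with "&".
var encoded = querystring.stringify(sampleParam);
console.log(encoded);

// Byte length of the encoded body; this is the value Content-Length must carry.
console.log(Buffer.byteLength(encoded)); // 111
```

The length of 111 bytes equals the Content-Length value hard-coded in the complete crawler code, which suggests that value was copied from a request carrying exactly these parameters.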

2. Configure the options parameter of http.request(options, fn)

In this configuration, the emphasis is on simulating the browser's request headers. Generally you must simulate Cookie, User-Agent (which identifies the client's browser and operating system), and Content-Type; some sites require more. Here the site does not use a cookie, so we don't have to send one.
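The options described above can be sketched as follows. The helper name `buildOptions` is hypothetical and only used for illustration; the host, port, and header values are taken from the complete code later in the article. One assumption differs from that code: Content-Length is computed from the body instead of being hard-coded, since the header has to match the byte length of whatever is actually sent.

```javascript
var querystring = require("querystring");

// Hypothetical helper: builds the options object passed to http.request.
function buildOptions(path, body) {
  return {
    hostname: "api.chuchujie.com",
    port: 80, // HTTP default port
    path: path,
    method: "POST",
    headers: {
      "Connection": "keep-alive",
      // Must equal the byte length of the request body.
      "Content-Length": Buffer.byteLength(body),
      "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
      // Browser-like User-Agent copied from the article's code.
      "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
    }
  };
}

var body = querystring.stringify({ page: 1 }); // "page=1"
var options = buildOptions("/api/?v=1.0", body);
console.log(options.headers["Content-Length"]); // 6
```

Nothing is sent over the network here; the object is only constructed so that its fields can be inspected.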

  

3. Send the HTTP POST request to the target backend to get the data

    var req = http.request(options, function (res) {
        var json = ""; // accumulates the data received from the server
        console.log(res.statusCode);
        // res.on listens for events while the response comes back. The "data"
        // event fires each time a piece of the response arrives; chunk is that piece.
        res.on("data", function (chunk) {
            json += chunk; // json is the concatenation of all chunks
        });
        // "end" fires once the whole response has arrived; callback(json)
        // hands the backend's result back to the caller.
        res.on("end", function () {
            callback(json);
        });
    });
    req.on("error", function () {
        console.log("error");
    });
    // Example of the parameters sent by the front end. Here param is passed in by
    // the backend routing module, which in turn receives it from the front end.
    // var obj = {
    //     query: '{"function":"newest","module":"zdm"}',
    //     client: '{"gender":"0"}',
    //     page: 1
    // }
    req.write(querystring.stringify(param)); // POST request body
    req.end(); // required: signals that the request is complete
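One detail of the chunk handling above is worth a sketch. Each chunk arrives as a Buffer unless an encoding is set on the response, and appending it to a string decodes it on its own, so a multi-byte UTF-8 character that happens to be split across two chunks can come out garbled. A possible alternative, shown here as an assumption rather than the author's method, is to collect the raw chunks and decode once at the end (calling res.setEncoding("utf8") is another option). The helper name `joinChunks` and the sample JSON are made up for the demonstration.

```javascript
// Hypothetical helper: joins raw Buffer chunks first, then decodes once.
function joinChunks(chunks) {
  return Buffer.concat(chunks).toString("utf8");
}

// Made-up response body, split in the middle of a 3-byte character
// to imitate an unlucky chunk boundary.
var whole = Buffer.from('{"name":"你好"}', "utf8");
var chunks = [whole.subarray(0, 10), whole.subarray(10)];

// Appending each chunk to a string decodes the chunks separately.
var naive = "";
chunks.forEach(function (chunk) {
  naive += chunk;
});

console.log(naive === whole.toString("utf8")); // false: the split character is lost
console.log(JSON.parse(joinChunks(chunks)).name); // 你好
```

In a real response handler, the "data" listener would push each chunk into an array and the "end" listener would call `joinChunks` on it.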

4. Export as a module

Complete crawler code

    /**
     * Created by Administrator on 2017/2/12.
     */
    var http = require("http");          // HTTP requests
    // var https = require("https");     // HTTPS requests
    var querystring = require("querystring");

    function request(path, param, callback) {
        var body = querystring.stringify(param); // form-encoded POST body
        var options = {
            hostname: "api.chuchujie.com",
            port: 80, // port number: HTTPS defaults to 443, HTTP defaults to 80
            path: path,
            method: "POST",
            headers: {
                "Connection": "keep-alive",
                // Must match the body's byte length; a fixed value such as 111
                // is only correct for one specific set of parameters.
                "Content-Length": Buffer.byteLength(body),
                "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
                "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
            } // forged request headers
        };
        var req = http.request(options, function (res) {
            var json = ""; // accumulates the data received from the server
            console.log(res.statusCode);
            // The "data" event fires each time a piece of the response arrives;
            // chunk is that piece.
            res.on("data", function (chunk) {
                json += chunk; // json is stitched together from the chunks
            });
            // "end" fires once the whole response has arrived; callback(json)
            // hands the backend's result back to the caller.
            res.on("end", function () {
                callback(json);
            });
        });
        req.on("error", function () {
            console.log("error");
        });
        // Example of the parameters sent by the front end. Here param is passed in by
        // the backend routing module, which in turn receives it from the front end.
        // var obj = {
        //     query: '{"function":"newest","module":"zdm"}',
        //     client: '{"gender":"0"}',
        //     page: 1
        // }
        req.write(body); // POST request parameters
        req.end(); // required: signals that the request is complete
    }

    module.exports = request;
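For completeness, here is a rough sketch of how a caller might use the exported function and turn the returned string into an object. The name `fetchPage` is hypothetical, and the parameter values are the assumed sample parameters from the article. The crawler function is passed in as an argument so that the sketch can be tried with a stub instead of a live request.

```javascript
// Hypothetical caller: `request` is the function exported by the crawler module.
function fetchPage(request, page, done) {
  var param = {
    query: JSON.stringify({ function: "newest", module: "zdm" }),
    client: JSON.stringify({ gender: "0" }),
    page: page
  };
  request("/api/?v=1.0", param, function (raw) {
    var data;
    try {
      data = JSON.parse(raw); // the crawler hands back the raw response text
    } catch (err) {
      return done(err); // the response was not valid JSON
    }
    done(null, data);
  });
}

// Stub standing in for the real crawler, so no network access is needed here.
function stubRequest(path, param, callback) {
  callback('{"page":' + param.page + ',"items":[]}');
}

var result = null;
fetchPage(stubRequest, 2, function (err, data) {
  result = err ? null : data;
});
console.log(result); // { page: 2, items: [] }
```

With the real module, the first argument would be the value returned by `require` for the file containing the crawler code (its file name is not given in the article).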

  
