A colleague reported that a spider crawling a site via POST has a problem: the request is always answered with a 302 redirect back to the same URL. The details are as follows:
URL: http://www.meituan.com/multiact/default/deal/25814805.html
Post data: "yui_3_16_0_1_1423700000_000:{\"act\":\"deal/dynamiccomponent\",\"args\":25814805,\"__referer\":\"\"}"
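One way to confirm the symptom described above (a POST answered with a 302 pointing back at the same URL) is to send the request once without following redirects and look at the status and Location header. The following is a minimal sketch in Python 3 using the standard library; the names NoRedirect and post_once are hypothetical helpers introduced here for illustration, not part of the original spider.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to build a follow-up request,
        # so the 3xx response surfaces as an HTTPError instead.
        return None


def post_once(url, body, headers=None):
    """POST once without following redirects.

    Returns (status_code, location), where location is the Location
    header of a redirect/error response, or None on a normal response.
    """
    opener = urllib.request.build_opener(NoRedirect)
    req = urllib.request.Request(url, data=body, headers=headers or {})
    try:
        with opener.open(req, timeout=10) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")
```

If post_once returns 302 together with a Location equal to the requested URL, the server is rejecting the request as sent rather than serving the page.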
With Python, the page can be crawled successfully using the following code:
import urllib
import urllib2

# Single form field whose value is a JSON string describing the request
values = {
    'yui_3_16_0_1_1423700000_000': '{"act":"deal/dynamiccomponent","args":25814805,"__referer":""}',
}
header = {
    "X-Requested-With": "XMLHttpRequest",
}
url = "http://www.meituan.com/multiact/default/deal/25814805.html"

data = urllib.urlencode(values)  # form-encode the POST body
print data
req = urllib2.Request(url, data, header)  # supplying data makes this a POST
response = urllib2.urlopen(req)
the_page = response.read()
print the_page
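To make it clearer what the form-encoding step puts on the wire, here is a small Python 3 sketch (the script above is Python 2) that builds the same body and decodes it again. The helper name encode_form is an assumption made for illustration, and the compact JSON separators are a guess at the original formatting.

```python
import json
from urllib.parse import parse_qs, urlencode

# Field name taken from the post data shown above
FIELD = "yui_3_16_0_1_1423700000_000"


def encode_form(deal_id):
    """Return the form-encoded POST body as bytes."""
    payload = json.dumps(
        {"act": "deal/dynamiccomponent", "args": deal_id, "__referer": ""},
        separators=(",", ":"),  # compact JSON, no spaces (assumed)
    )
    # urlencode percent-escapes braces, quotes, colons and slashes
    return urlencode({FIELD: payload}).encode("ascii")


body = encode_form(25814805)
print(body)

# Decoding the body recovers the original JSON value
decoded = parse_qs(body.decode("ascii"))
print(json.loads(decoded[FIELD][0]))
```

A body in this form is only interpreted correctly by the server when the request declares it as application/x-www-form-urlencoded.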
However, the HTTP request packet configured in the spider fails to crawl the page. The request packet is as follows:
POST /multiact/default/deal/25814805.html HTTP/1.1
Host: www.meituan.com
Content-Length: 126
Connection: close
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
Accept-Encoding: gzip
Accept: */*
X-Requested-With: XMLHttpRequest
The crawl fails because this header is missing: Content-Type: application/x-www-form-urlencoded
After adding it, the request packet is as follows:
POST /multiact/default/deal/25814805.html HTTP/1.1
Host: www.meituan.com
Content-Length: 126
Connection: close
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
Accept-Encoding: gzip
Accept: */*
X-Requested-With: XMLHttpRequest
Content-Type: application/x-www-form-urlencoded
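The same fix can be sketched in code by setting the Content-Type header explicitly when building the request. The example below is a Python 3 sketch using urllib.request rather than the Python 2 urllib2 module used earlier; build_post_request is a hypothetical helper name. It only constructs the request and inspects it, without sending anything. Note that urllib itself adds this Content-Type by default when a request carries a body and none is set, which may be why the Python script worked while the spider's hand-configured request did not.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "http://www.meituan.com/multiact/default/deal/25814805.html"


def build_post_request(url, form_fields):
    """Build a form POST with Content-Type set explicitly."""
    body = urlencode(form_fields).encode("ascii")
    headers = {
        # The header the spider's request was reported to be missing
        "Content-Type": "application/x-www-form-urlencoded",
        "X-Requested-With": "XMLHttpRequest",
    }
    return Request(url, data=body, headers=headers)


req = build_post_request(
    URL,
    {
        "yui_3_16_0_1_1423700000_000":
            '{"act":"deal/dynamiccomponent","args":25814805,"__referer":""}',
    },
)

# Inspect the request before sending it
print(req.get_method())
# urllib stores header names in capitalized form, e.g. "Content-type"
print(req.get_header("Content-type"))
```

Setting the header explicitly keeps the request independent of whatever defaults the HTTP client or spider framework applies.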
Problems crawling a page via POST