Use Python2 to crawl Youdao translations

Source: Internet
Author: User
Tags urlencode

Crawler Core idea: Simulate the browser normal access to the server, the general situation as long as the browser can access, can crawl, if the anti-crawling, then consider the repeated test to add the request header data, know that can crawl so far.

Anti-crawling ideas are now known: user-agent,cookie,referer, access speed, verification code, user login and front-end JS code verification. This example encounters JS validation user-agent Referer Cookie total of 4 anti-crawl mechanisms.

The key part is the construction of parameter headers and data, headers to be repeated testing, the variables inside the data to find ideas.

Resources:

Using Python to decipher Youdao translation anti-crawler mechanism 75294947

Python cracked netease anti-crawler mechanism 79522067

Some anti-crawl mechanisms 79841901

Youdao translation page, the left input to translate the string, the right side will automatically output the results of the translation, such as

After several input character tests, we found that the page was not refreshed, guess that Ajax might be used, and then grab the packet and discover that the data was actually transmitted using AJAX

The

Code is as follows:

#!/usr/bin/env python#-*-coding:utf-8-*-import urllibimport urllib2import timeimport hashliburl = '/http/ Fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule ' keyword = raw_input (' Please enter a string to translate: ') # Headers function Simulation Browser headers = {# "Accept": "Application/json, Text/javascript, */*; q=0.01 ", #" Connection ":" Keep-alive ", #" Content-type ":" application/x-www-form-urlencoded; Charset=utf-8 "," Cookie ":" Your browser Cookie value "," Referer ":" http://fanyi.youdao.com/"," user-agent ":" Mozilla/5.0 (Windows NT 10.0; Win64; x64) applewebkit/537.36 (khtml, like Gecko) chrome/67.0.3396.87 safari/537.36 ", #" X-requested-with ":" XMLHttpRequest ", }salt = str (int (time.time () *1000)) m = hashlib.md5 () str = "Fanyideskweb" + keyword + salt + "Ebsefb%=xz%t[kz" C (sy! ") M.update (str) sign = M.hexdigest (). Encode (' Utf-8 ') the print (sign) # data is the POST request for the database: {"I", keyword, "from": "AUTO", " To ': ' AUTO ', ' smartresult ': ' dict ', ' client ': ' Fanyideskweb ', ' salt ': salt, ' sign ': sign, ' doctype ': ' JSON ', ' Version ":" 2.1 "," Keyfrom ":" Fanyi.web "," Action ":" Fy_by_realtime "," Typoresult ":" false "}# encode post-uploaded data UrlEncode . UrlEncode (data) # Urllib can only accept URLs, cannot create a Request class instance, and cannot set parameter headers, but can encode URLs, and urllib2 cannot encode, so it's often used together # The Urllib2.urlopen (URL) cannot construct a complex request, so use URLLIB2. Request (Url,data=data,headers=headers), 2 is a data parameter when the post submission data, the value of headers to mimic the browser in the request header, in the form of a dictionary, The data that is accepted by the server looks like it is accessed using a browser. Request = Urllib2. Request (url,data=data,headers=headers) response = Urllib2.urlopen (request) print (Response.read ())

Code tests, such as

Use Python2 to crawl Youdao translations

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.