The core idea of a crawler: simulate a normal browser visit to the server. In general, anything a browser can access can be crawled; if anti-crawling measures block the request, repeatedly test adding request-header fields until the page can be crawled.
Known anti-crawling techniques include: User-Agent checks, Cookies, Referer checks, access-rate limits, CAPTCHAs, required user login, and front-end JS validation. This example runs into four of them: JS validation (the sign parameter), User-Agent, Referer, and Cookie.
The key part is constructing the headers and data parameters: the headers require repeated testing, and the variables inside data require working out how each one is generated.
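The trickiest variable in data is sign, which can be reproduced in isolation. A minimal sketch (Python 3 syntax here for convenience; the secret constant comes from the page's front-end JS and changes whenever Youdao updates it, so treat it as a snapshot in time):

```python
import hashlib
import time

# Secret constant taken from the page's front-end JS at the time of writing;
# it changes whenever Youdao updates its JS.
SECRET = "ebSeFb%=XZ%T[KZ)c(sy!"

def make_salt():
    # Millisecond timestamp, matching the page's JS: "" + (new Date).getTime()
    return str(int(time.time() * 1000))

def make_sign(keyword, salt):
    # sign = md5("fanyideskweb" + keyword + salt + SECRET), hex-encoded
    raw = "fanyideskweb" + keyword + salt + SECRET
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

if __name__ == "__main__":
    salt = make_salt()
    print(salt, make_sign("hello", salt))
```

Because salt is a timestamp, the server only needs to recompute the same MD5 to verify that the request came from its own front end rather than a bare script.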
Resources:
Using Python to crack Youdao Translate's anti-crawler mechanism (75294947)
Cracking NetEase's anti-crawler mechanism with Python (79522067)
Some common anti-crawling mechanisms (79841901)
On the Youdao Translate page, typing a string into the left box makes the translated result appear automatically on the right.
After typing a few test characters, we noticed that the page never refreshed, which suggested Ajax; capturing the traffic confirmed that the data is indeed transmitted via an Ajax POST request.
The code is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib
import urllib2
import time
import hashlib

url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
keyword = raw_input('Please enter a string to translate: ')

# headers mimics a browser's request headers
headers = {
    # "Accept": "application/json, text/javascript, */*; q=0.01",
    # "Connection": "keep-alive",
    # "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Cookie": "your browser Cookie value",
    "Referer": "http://fanyi.youdao.com/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36",
    # "X-Requested-With": "XMLHttpRequest",
}

# salt is a millisecond timestamp; sign is the MD5 of a fixed client name,
# the input, the salt, and a secret constant taken from the page's JS
salt = str(int(time.time() * 1000))
m = hashlib.md5()
sign_str = "fanyideskweb" + keyword + salt + "ebSeFb%=XZ%T[KZ)c(sy!"
m.update(sign_str)
sign = m.hexdigest()
print(sign)

# data is the POST form submitted to the server
data = {
    "i": keyword,
    "from": "AUTO",
    "to": "AUTO",
    "smartresult": "dict",
    "client": "fanyideskweb",
    "salt": salt,
    "sign": sign,
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "FY_BY_REALTIME",
    "typoResult": "false",
}

# URL-encode the POST data
data = urllib.urlencode(data)

# urllib can only encode URLs; it cannot build a Request instance or set
# headers. urllib2 can build a Request but cannot encode, so the two are
# often used together. urllib2.urlopen(url) cannot construct a complex
# request, so use urllib2.Request(url, data=data, headers=headers):
# passing data makes this a POST, and headers (a dict) mimics a browser's
# request headers so the server sees what looks like normal browser traffic.
request = urllib2.Request(url, data=data, headers=headers)
response = urllib2.urlopen(request)
print(response.read())
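The script above is Python 2 only (urllib2, raw_input). As a hedged sketch, the same logic ports to Python 3 with urllib.request and urllib.parse; this assumes the endpoint and form fields still behave as described above, which may no longer hold since Youdao updates its API:

```python
import hashlib
import time
import urllib.parse
import urllib.request

URL = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule"
# Secret constant from the page's front-end JS at the time of writing
SECRET = "ebSeFb%=XZ%T[KZ)c(sy!"

def build_form(keyword):
    """Build the POST form the same way the Python 2 script does."""
    salt = str(int(time.time() * 1000))
    sign = hashlib.md5(
        ("fanyideskweb" + keyword + salt + SECRET).encode("utf-8")
    ).hexdigest()
    return {
        "i": keyword,
        "from": "AUTO",
        "to": "AUTO",
        "smartresult": "dict",
        "client": "fanyideskweb",
        "salt": salt,
        "sign": sign,
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "action": "FY_BY_REALTIME",
        "typoResult": "false",
    }

def translate(keyword, cookie="your browser Cookie value"):
    # Same four anti-crawl headers as the Python 2 version
    headers = {
        "Cookie": cookie,
        "Referer": "http://fanyi.youdao.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/67.0.3396.87 Safari/537.36",
    }
    # In Python 3 the encoded form must also be converted to bytes
    data = urllib.parse.urlencode(build_form(keyword)).encode("utf-8")
    req = urllib.request.Request(URL, data=data, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    print(translate(input("Please enter a string to translate: ")))
```

Separating build_form from translate keeps the sign logic testable without touching the network.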
A test run of the code produces the translation result.
Crawling Youdao Translate with Python 2