On the default User-Agent value submitted by crawlers that use urllib.urlopen

Source: Internet
Author: User

urllib.request.urlopen(url) is often used in crawlers to open web pages, for example to get the status code a page returns.

The problem is that urlopen puts the Python urllib version into the User-Agent header of the GET request it sends, as this packet capture shows:

GET /xxx.do?p=xxxxxxxx HTTP/1.1
Accept-Encoding: identity
Host: xxx.xxx.com
Connection: close
User-Agent: Python-urllib/3.4
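To reproduce a capture like this without an external site, here is a minimal sketch that uses only the standard library: a throwaway HTTP server on localhost records the User-Agent it receives. The handler name EchoUAHandler and the /xxx.do path are illustrative assumptions, not part of urllib.

```python
import http.server
import threading
import urllib.request

seen = {}  # filled in by the handler with the User-Agent it received

class EchoUAHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Record the User-Agent header exactly as the client sent it
        seen["ua"] = self.headers.get("User-Agent")
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging

# Port 0 asks the OS for any free port on the loopback interface
server = http.server.HTTPServer(("127.0.0.1", 0), EchoUAHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/xxx.do" % server.server_address[1]

# build_opener() gives the same default headers urlopen() uses; the empty
# ProxyHandler only stops environment proxy settings from rerouting localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url) as resp:
    status = resp.status

server.shutdown()
server.server_close()
print(status, seen["ua"])  # 200 Python-urllib/<major>.<minor>
```

The printed User-Agent carries the running interpreter's version, matching the Python-urllib/3.4 line in the capture when run on Python 3.4.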

  

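The source of urllib.request explains why: unless the request already has its own User-Agent, the opener adds a default one of the form Python-urllib/<version>.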
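A quick way to see this default without sending anything: urlopen() builds its opener with build_opener() when none is installed, and the resulting OpenerDirector keeps its default headers in the addheaders attribute. This sketch just prints that attribute.

```python
import urllib.request

# OpenerDirector.__init__ sets addheaders to [("User-agent", "Python-urllib/X.Y")].
# The HTTP handler copies these onto any request that lacks the header.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)
print(default_headers["User-agent"])  # Python-urllib/<major>.<minor>
```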

A request from a normal browser carries that browser's own User-Agent instead.

If a site filters or analyzes User-Agent values, any request whose User-Agent contains Python-urllib can be flagged as a spider crawling its data (I have not tested whether the same applies to requests).

Could one go further and use Panabit, ROS, or any other layer-7 firewall to add a rule that matches requests whose User-Agent contains Python-urllib, then add the sender's IP to a blacklist to block scraping?
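The matching logic such a rule would apply can be sketched at the application level. This is not Panabit or ROS configuration; the function names and the in-memory blacklist are hypothetical, and the only marker checked is the Python-urllib string discussed above. Because the client chooses its own User-Agent, a rule like this only catches crawlers that keep the default.

```python
blacklist = set()  # hypothetical in-memory IP blacklist

def is_suspect_user_agent(user_agent):
    """Return True if the User-Agent looks like Python urllib's default."""
    ua = (user_agent or "").lower()  # tolerate a missing header
    return "python-urllib" in ua

def check_request(client_ip, user_agent):
    """Return False (block) for blacklisted IPs or urllib-looking requests."""
    if client_ip in blacklist:
        return False
    if is_suspect_user_agent(user_agent):
        blacklist.add(client_ip)  # remember the sender for later requests
        return False
    return True
```

Once an IP has been caught with the default User-Agent, later requests from it are blocked even if they switch to a browser string.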

How do I change the default User-Agent that urlopen submits?

Actually, it's simple.

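import urllib.request

url = "http://xxx.xxx.com/xxx.do?p=xxxxxxxx"  # placeholder: the page to crawl
# First build the request object and set its header
req = urllib.request.Request(url)
req.add_header("User-Agent", "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.")
# Then open it
urllib.request.urlopen(req)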
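Two related options, sketched below without sending any traffic (example.com is only a placeholder): passing the header through Request's headers argument, and changing the default for every later urlopen() call by installing an opener whose addheaders carries a browser string. Request stores header names with str.capitalize(), so "User-Agent" becomes "User-agent", the same key the default uses; that is why the handler sees the header as already present and does not add Python-urllib.

```python
import urllib.request

url = "http://example.com/"  # placeholder; nothing is sent below
ua = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36")

# Per request, via the headers argument (equivalent to add_header)
req2 = urllib.request.Request(url, headers={"User-Agent": ua})
print(req2.get_header("User-agent"))  # stored under the capitalized key

# Globally: every later urllib.request.urlopen() uses this opener
opener = urllib.request.build_opener()
opener.addheaders = [("User-agent", ua)]
urllib.request.install_opener(opener)
```

The per-request form is easier to reason about; install_opener changes process-wide state, which can surprise other code that also calls urlopen.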

The User-Agent in this example was picked at random; test it yourself before relying on it.

 

Here are some User-Agent strings I collected from the internet and from my own packet captures:

import random

UA = [
    "UCWEB7.0.2.37/28/999",
    "Opera/9.80 (Android 2.3.4; Linux; Opera Mobi/build-1107180945; U; en-GB) Presto/2.8.149 Version/11.10",
    "Opera/8.0 (Windows NT 5.1; U; en)",
    "Openwave/UCWEB7.0.2.37/28/999",
    "NOKIA5700/UCWEB7.0.2.37/28/999",
    "MQQBrowser/26 Mozilla/5.0 (Linux; U; Android 2.3.7; zh-cn; MB200 Build/GRJ22; CyanogenMod-7) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",
    "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.133 Safari/534.16",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0a1) Gecko/20110623 Firefox/7.0a1 Fennec/7.0a1",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.4.3.4000 Chrome/30.0.1599.101 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 OPR/26.0.1656.60",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 UBrowser/4.0.3214.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2",
    "Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 SE 2.X MetaSr 1.0",
    "Mozilla/5.0 (SymbianOS/9.4; Series60/5.0 NokiaN97-1/20.0.019; Profile/MIDP-2.1 Configuration/CLDC-1.1) AppleWebKit/525 (KHTML, like Gecko) BrowserNG/7.1.18124",
    "Mozilla/5.0 (SymbianOS/9.4; Series60/5.0 NokiaN97-1/20.0.019; Profile/MIDP-2.1 Configuration/CLDC-1.1) AppleWebKit/525 (KHTML, like Gecko) BrowserNG/7.1.1812",
    "Mozilla/5.0 (Linux; U; Android 6.0.1; zh-CN; Redmi 3S Build/MMB29M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/37.0.0.0 MQQBrowser/7.3 Mobile Safari/537.36",
    "Mozilla/5.0 (Linux; U; Android 6.0.1; zh-CN; Redmi 3S Build/MMB29M) AppleWebKit/533.1 (KHTML, like Gecko) Mobile Safari/533.1",
    "Mozilla/5.0 (Linux; U; Android 3.0; en-us; Xoom Build/HRI39) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13",
    "Mozilla/5.0 (Linux; U; Android 2.3.7; en-us; Nexus One Build/FRF91) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",
    "Mozilla/5.0 (Linux; U; Android 2.2.1; zh-CN; HTC_Wildfire_A3333 Build/FRG83D) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",
    "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.",
    "Mozilla/5.0 (Linux; Android 6.0.1; Redmi 3S Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.49 Mobile MQQBrowser/6.2 TBS/043124 Safari/537.36 MicroMessenger/6.5.7.1041 NetType/WIFI Language/zh_CN",
    "Mozilla/5.0 (Linux; Android 6.0.1; Redmi 3S Build/MMB29M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.94 Mobile Safari/537.36",
    "Mozilla/5.0 (Linux; Android 5.1.1; SAMSUNG SM-N9200 Build/LMY47X) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/3.4 Chrome/38.0.2125.102 Mobile Safari/537.36",
    "Mozilla/5.0 (Linux; Android 4.4.4; en-us; Nexus 5 Build/JOP40D) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2307.2 Mobile Safari/537.36",
    "Mozilla/5.0 (iPod; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5",
    "Mozilla/5.0 (iPhone; U; CPU iPhone OS 5_1_1 like Mac OS X; en-us) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3 XiaoMi/MiuiBrowser/8.7.0",
    "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5",
    "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A293 Safari/6531.22.7",
    "Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/420.1 (KHTML, like Gecko) Version/3.0 Mobile/1A542a Safari/419.3",
    "Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5",
    "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
    "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10",
    "Mozilla/5.0 (hp-tablet; Linux; hpwOS/3.0.0; U; en-US) AppleWebKit/534.6 (KHTML, like Gecko) wOSBrowser/233.70 Safari/534.6 TouchPad/1.0",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows Phone OS 7.5; Trident/5.0; IEMobile/9.0; HTC; Titan)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/5.0 (BlackBerry; U; BlackBerry 9800; en) AppleWebKit/534.1+ (KHTML, like Gecko) Version/6.0.0.337 Mobile Safari/534.1+",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; SE 2.X MetaSr 1.0)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
    "JUC (Linux; U; 2.3.7; zh-cn; MB200; 320*480) UCWEB7.9.3.103/139/999",
    "Dalvik/2.1.0 (Linux; U; Android 6.0.1; Redmi 3S MIUI/7.3.9)",
    "Dalvik/1.6.0 (Linux; U; Android 4.4; Nexus 5 Build/KRT16M)",
]

# Pick one User-Agent at random and print it
ua = random.choice(UA)
print(ua)

  

Pick one at random and use it.
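A sketch of drawing a fresh User-Agent for every request, assuming you want rotation rather than one fixed string. The helper names build_request and fetch_status and the three-entry UA_POOL are hypothetical; in practice the pool would be the UA list above. The rng parameter only exists so a seeded random.Random can be passed in for repeatable runs.

```python
import random
import urllib.request

# Hypothetical short pool; substitute the full UA list from above.
UA_POOL = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
]

def build_request(url, pool=UA_POOL, rng=random):
    """Return a Request whose User-Agent is picked at random from pool."""
    return urllib.request.Request(url, headers={"User-Agent": rng.choice(pool)})

def fetch_status(url, timeout=10):
    """Open url with a freshly chosen User-Agent and return the HTTP status."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.status
```

Calling fetch_status(url) in a loop sends a different browser string on most requests, which also covers the status-code use case from the start of the article.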


