Selenium: disguising PhantomJS as a Chrome browser by modifying the User-Agent

Source: Internet
Author: User
Tags selenium grid dcap

This article was first published on the author's personal blog: http://zmister.com/archives/179.html


Python crawlers, GUI development, penetration testing, machine learning: all at http://zmister.com/


When writing crawlers, for reasons of system environment or efficiency, we often use PhantomJS as the browser WebDriver driven by Selenium, rather than using the Chrome or Firefox WebDriver directly, even though the latter are more intuitive.

PhantomJS has many advantages but also quite a few shortcomings. One of them, which hardly counts as a shortcoming, is that PhantomJS's browser identifier is "PhantomJS" (it bravely owns up to what it is... :))

The identifier is not a problem in itself, but as more and more sites keep upgrading their anti-crawler technology, PhantomJS has clearly become as much of a target as "requests".

As soon as the server backend identifies the visitor's User-Agent as PhantomJS, it may classify the visit as crawler behavior, which causes the crawler to fail.
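To make the risk concrete, here is a minimal sketch of the kind of server-side check that could flag such a visitor. The function name `looks_like_crawler`, the marker list, and the sample PhantomJS User-Agent string are assumptions for illustration; real anti-crawler systems are usually more elaborate.

```python
# Hypothetical server-side check: flag a request whose User-Agent
# contains a well-known automation marker. The marker list is an
# assumption for illustration, not taken from any real site.
SUSPICIOUS_MARKERS = ("phantomjs", "python-requests")


def looks_like_crawler(user_agent):
    """Return True if the User-Agent contains a suspicious marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in SUSPICIOUS_MARKERS)


# Illustrative User-Agent strings (the PhantomJS one is an example value).
phantomjs_ua = ("Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/538.1 "
                "(KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1")
chrome_ua = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36")

print(looks_like_crawler(phantomjs_ua))  # True
print(looks_like_crawler(chrome_ua))     # False
```

A check this simple is defeated as soon as the client sends a different User-Agent string.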

Just as we modify the headers field in requests to disguise it as a browser, in Selenium we can change PhantomJS's browser identifier to that of any browser. Here is how:
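For comparison, this is roughly what the header-based disguise looks like on the plain HTTP side. The sketch uses the standard library's `urllib.request` instead of requests so that it has no third-party dependency, and it only builds the request without sending it. The helper name `build_disguised_request` is hypothetical.

```python
from urllib.request import Request

# Chrome User-Agent string, the same value used later in this article.
CHROME_UA = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36")


def build_disguised_request(url, user_agent=CHROME_UA):
    """Build (but do not send) a request carrying a browser User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_disguised_request("http://service.spiritsoft.cn/ua.html")
# urllib stores header names in capitalized form, i.e. "User-agent".
print(req.get_header("User-agent"))
```

With requests, the equivalent is to pass the same dictionary through the `headers` argument of `requests.get`.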

PhantomJS's browser identifier

First, let's look at PhantomJS's browser identifier.

http://service.spiritsoft.cn/ua.html is a website that reports the browser's User-Agent; it displays the identifier of the browser currently in use:

[Image: http://upload-images.jianshu.io/upload_images/38544-f3a74c04dc02c3d6.png]

We use Selenium to drive PhantomJS, visit http://service.spiritsoft.cn/ua.html, and look at the returned result:

    # coding:utf-8
    from selenium import webdriver
    from bs4 import BeautifulSoup


    def defaultPhantomJS():
        # Start PhantomJS with its default settings
        driver = webdriver.PhantomJS(executable_path=r"D:\phantomjs.exe")
        driver.get('http://service.spiritsoft.cn/ua.html')
        source = driver.page_source
        soup = BeautifulSoup(source, 'lxml')
        # The page shows the User-Agent in a table cell with this inline style
        user_agent = soup.find_all('td', attrs={
            'style': 'height:40px;text-align:center;font-size:16px;font-weight:bolder;color:red;'})
        for u in user_agent:
            print(u.get_text().replace('\n', '').replace(' ', ''))
        driver.close()


    if __name__ == '__main__':
        defaultPhantomJS()

[Image: http://upload-images.jianshu.io/upload_images/38544-aa4210940841953c.png]

The traces of PhantomJS are clearly visible.

Next, we modify PhantomJS's browser identifier.

Disguising as Chrome

We bring in a key module, DesiredCapabilities:

    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

What is this module for? Let's look at the explanation in the source code:

    class DesiredCapabilities(object):
        """
        Set of default supported desired capabilities.

        Use this as a starting point for creating a desired capabilities object for
        requesting remote webdrivers for connecting to selenium server or selenium grid.

It describes a set of key-value pairs that encapsulate browser properties and are used, roughly speaking, to configure the WebDriver. We use it to set PhantomJS's User-Agent.

First, convert DesiredCapabilities.PHANTOMJS to a dictionary so that key-value pairs can be added easily:

    dcap = dict(DesiredCapabilities.PHANTOMJS)

Then add a key-value pair for the browser identifier:

    dcap['phantomjs.page.settings.userAgent'] = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36')
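The two steps above can be sketched without starting a browser. In the snippet below, `BASE_CAPS` is a stand-in for `DesiredCapabilities.PHANTOMJS` (its contents are an assumption) and `with_user_agent` is a hypothetical helper. The point it illustrates is that `dict(...)` makes a copy, so the shared defaults are left untouched.

```python
# Stand-in for DesiredCapabilities.PHANTOMJS; the exact contents are an
# assumption so that this sketch runs without Selenium installed.
BASE_CAPS = {"browserName": "phantomjs", "javascriptEnabled": True}

CHROME_UA = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36")


def with_user_agent(base_caps, user_agent):
    """Return a copy of base_caps with the PhantomJS User-Agent set."""
    caps = dict(base_caps)  # shallow copy: base_caps itself is not modified
    caps["phantomjs.page.settings.userAgent"] = user_agent
    return caps


dcap = with_user_agent(BASE_CAPS, CHROME_UA)
print(dcap["phantomjs.page.settings.userAgent"])
print("phantomjs.page.settings.userAgent" in BASE_CAPS)  # False
```

Copying first means that other drivers created later from the same defaults do not silently inherit the modified User-Agent.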

Finally, pass these parameters when instantiating PhantomJS:

    driver = webdriver.PhantomJS(executable_path=r"D:\phantomjs.exe", desired_capabilities=dcap, service_args=['--ignore-ssl-errors=true'])

The complete code is as follows:

    # coding:utf-8
    from selenium import webdriver
    from bs4 import BeautifulSoup
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities


    def editUserAgent():
        # Copy the default PhantomJS capabilities and override the User-Agent
        dcap = dict(DesiredCapabilities.PHANTOMJS)
        dcap['phantomjs.page.settings.userAgent'] = (
            'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
            '(KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36')
        driver = webdriver.PhantomJS(
            executable_path=r"D:\phantomjs.exe",
            desired_capabilities=dcap,
            service_args=['--ignore-ssl-errors=true'])
        driver.get('http://service.spiritsoft.cn/ua.html')
        source = driver.page_source
        soup = BeautifulSoup(source, 'lxml')
        # The page shows the User-Agent in a table cell with this inline style
        user_agent = soup.find_all('td', attrs={
            'style': 'height:40px;text-align:center;font-size:16px;font-weight:bolder;color:red;'})
        for u in user_agent:
            print(u.get_text().replace('\n', '').replace(' ', ''))
        driver.close()


    if __name__ == '__main__':
        editUserAgent()

Let's run the code:

[Image: http://upload-images.jianshu.io/upload_images/38544-6ee16cd3ffaced98.png]

PhantomJS was successfully identified as the Chrome browser.

Isn't it simple?


This article comes from the "Mr. State" blog; please contact the author before reprinting.
