Set the proxy IP
Method 1:
from selenium import webdriver

# ip_html holds the proxy address as "ip:port" and is defined elsewhere
service_args = [
    '--proxy=%s' % ip_html,       # proxy ip:port (e.g. 192.168.0.28:808)
    '--proxy-type=http',          # proxy type: http/https
    '--load-images=no',           # disable image loading (optional)
    '--disk-cache=yes',           # enable the disk cache (optional)
    '--ignore-ssl-errors=true',   # ignore HTTPS errors (optional)
]
driver = webdriver.PhantomJS(service_args=service_args)
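The flag list above can also be generated from a few parameters, which keeps each flag as its own list item with no stray whitespace inside the string. The following is only a sketch: `build_service_args` and its parameter names are hypothetical, not part of Selenium or PhantomJS. It uses only the five command-line options shown above and no others.

```python
def build_service_args(proxy, proxy_type="http", load_images=False,
                       disk_cache=True, ignore_ssl_errors=True):
    """Return a list of PhantomJS command-line flags (hypothetical helper)."""
    def flag(value):
        # PhantomJS accepts true/false (and yes/no) for boolean options
        return "true" if value else "false"

    return [
        "--proxy=%s" % proxy,                           # proxy as ip:port
        "--proxy-type=%s" % proxy_type,                 # proxy type
        "--load-images=%s" % flag(load_images),         # skip images when False
        "--disk-cache=%s" % flag(disk_cache),           # enable the disk cache
        "--ignore-ssl-errors=%s" % flag(ignore_ssl_errors),
    ]


args = build_service_args("192.168.0.28:808")
print(args)
```

The resulting list would then be passed as `service_args=args` when creating the driver, as in Method 1.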
Method 2:
from selenium import webdriver
from selenium.webdriver.common.proxy import ProxyType

browser = webdriver.PhantomJS(PATH_PHANTOMJS)

# Use the DesiredCapabilities (proxy settings) parameter to start a new session.
# I think this is equivalent to clearing the browser cache and then
# revisiting the URL through the proxy.
proxy = webdriver.Proxy()
proxy.proxy_type = ProxyType.MANUAL
proxy.http_proxy = '1.9.171.51:800'
# Add the proxy settings to webdriver.DesiredCapabilities.PHANTOMJS
proxy.add_to_capabilities(webdriver.DesiredCapabilities.PHANTOMJS)
browser.start_session(webdriver.DesiredCapabilities.PHANTOMJS)
browser.get('http://1212.ip138.com/ic.asp')
print('1:', browser.session_id)
print('2:', browser.page_source)
print('3:', browser.get_cookies())

# Revert to the default connection (ProxyType.DIRECT means no proxy)
proxy = webdriver.Proxy()
proxy.proxy_type = ProxyType.DIRECT
proxy.add_to_capabilities(webdriver.DesiredCapabilities.PHANTOMJS)
browser.start_session(webdriver.DesiredCapabilities.PHANTOMJS)
browser.get('http://1212.ip138.com/ic.asp')
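Method 2 passes the shared dictionary webdriver.DesiredCapabilities.PHANTOMJS to add_to_capabilities. As far as I know, in Selenium 3 that call writes a "proxy" entry into the dictionary it is given, so the change stays visible to any later code that reads the same dictionary. A minimal sketch of building a per-session copy instead is shown here in plain Python. The helpers `with_http_proxy` and `without_proxy` are hypothetical. The `base` dictionary is only a stand-in for the real capabilities, and the shape of the "proxy" entry is my assumption about what Selenium writes.

```python
import copy


def with_http_proxy(base_caps, http_proxy):
    """Return a copy of base_caps with a manual HTTP proxy entry (hypothetical helper)."""
    caps = copy.deepcopy(base_caps)          # leave the shared dict untouched
    caps["proxy"] = {"proxyType": "MANUAL", "httpProxy": http_proxy}
    return caps


def without_proxy(base_caps):
    """Return a copy of base_caps that requests a direct connection (hypothetical helper)."""
    caps = copy.deepcopy(base_caps)
    caps["proxy"] = {"proxyType": "DIRECT"}
    return caps


base = {"browserName": "phantomjs"}          # stand-in for DesiredCapabilities.PHANTOMJS
proxied = with_http_proxy(base, "1.9.171.51:800")
direct = without_proxy(base)
print(proxied)
print(direct)
```

Either dictionary could then be handed to browser.start_session(...) in place of the shared one, assuming the driver accepts it the same way.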
Set the request header
Method 1
# -*- coding: utf-8 -*-
import random
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.proxy import ProxyType

# headers.my_headers (a list of User-Agent strings) and phantomjs_driver
# (the path to the PhantomJS executable) are defined elsewhere
desired_capabilities = DesiredCapabilities.PHANTOMJS.copy()
# Randomly pick a User-Agent from the list to disguise the browser
desired_capabilities["phantomjs.page.settings.userAgent"] = random.choice(headers.my_headers)
# Pages are crawled much faster without loading images
desired_capabilities["phantomjs.page.settings.loadImages"] = False
# Start PhantomJS with the configured capabilities
driver = webdriver.PhantomJS(executable_path=phantomjs_driver, desired_capabilities=desired_capabilities)
driver.start_session(desired_capabilities)
# Implicitly wait up to 5 seconds; adjust as needed
driver.implicitly_wait(5)
# Set a 20-second page-load timeout, similar to the timeout option of requests.get();
# driver.get() itself has no timeout option.
# I previously hit a case where driver.get(url) neither returned nor raised an error,
# so the program hung; setting this timeout solves the problem.
driver.set_page_load_timeout(20)
# Set a 20-second script timeout
driver.set_script_timeout(20)
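The capability setup in Method 1 can be factored into a small function that works on a copy of the base capabilities. This is a sketch only. `USER_AGENTS` is a hypothetical stand-in for the author's headers.my_headers list, `build_page_settings` is a made-up helper name, and the base dictionary is a placeholder for DesiredCapabilities.PHANTOMJS. The two "phantomjs.page.settings.*" keys are the ones used above.

```python
import random

# Hypothetical stand-in for headers.my_headers
USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0",
]


def build_page_settings(base_caps, user_agents, load_images=False, rng=random):
    """Return a copy of base_caps with a random User-Agent and image setting."""
    caps = dict(base_caps)                                    # do not mutate the input
    caps["phantomjs.page.settings.userAgent"] = rng.choice(user_agents)
    caps["phantomjs.page.settings.loadImages"] = load_images
    return caps


caps = build_page_settings({"browserName": "phantomjs"}, USER_AGENTS)
print(caps["phantomjs.page.settings.userAgent"])
```

Calling the function again before each new session would pick a fresh User-Agent, which is the point of choosing randomly from a list.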
Method 2
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium import webdriver

# Set the request header
user_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) " +
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36"
)
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = user_agent
driver = webdriver.PhantomJS(executable_path=r"/home/zhou/phantomjs-2.1.1-linux-x86_64/bin/phantomjs", desired_capabilities=dcap)
Setting a proxy and headers in PhantomJS with Selenium