A Universal Method for Simulated Login in Python

Source: Internet
Author: User

This article is reposted from https://zhuanlan.zhihu.com/p/28587931. It is transcribed here only for convenience of study; thanks to the author for sharing it.

Simulated login in Python gives many people a headache, so today I will share a universal login method. You don't need to be proficient in HTML, or even in Python, to perform a simulated login successfully. The method works for logging in to any site; Weibo and Zhihu are used only as examples.

The libraries used are selenium and requests. Selenium performs the login, its cookies are then passed to requests, and requests does the actual crawling. This avoids Selenium's slowness as a crawler (because it is used only for logging in), and it also avoids the tedious cookie handling that logging in with requests alone would require (because the cookies come straight from Selenium). The steps and code are listed first, followed by example logins to Weibo and Zhihu.

At the end of the article a "lazy" method is given; readers who want a shortcut can go straight to Part IV. That method also works for any website, and the sites used are only examples.

------------Start---------

Required materials: 1. ChromeDriver (required) 2. Anaconda (optional). Selenium drives a real browser, so in addition to Chrome itself you need to download the small driver program that lets Selenium control it. I also recommend downloading Anaconda: it bundles many Python libraries, is easy to use, and is free. Links: 1. Google ChromeDriver download 2. Anaconda download

Part I: Logging in with Selenium

Import the Selenium library:

from selenium import webdriver

Specify where the browser driver (chromedriver.exe) is stored on your computer; mine is on drive D:

chromePath = r'D:\Python Program\chromedriver.exe'

Use Selenium's webdriver to point to the driver path and open a browser. Selenium can drive several browsers, such as Firefox and Safari; this time we use Chrome. Note: the C in Chrome is uppercase.

wd = webdriver.Chrome(executable_path=chromePath)  # Selenium 3 syntax; Selenium 4 uses service=Service(chromePath)

Open the login page and have WebDriver fill in the username and password for you:

wd.get('URL of the login page')
wd.find_element_by_xpath('XPath of the username field').send_keys('username')
wd.find_element_by_xpath('XPath of the password field').send_keys('password')

Have WebDriver log in: use click() if the login element is a button, or submit() if it is a form.

wd.find_element_by_xpath('XPath of the login button').click()  # use this if it is a button
wd.find_element_by_xpath('XPath of the login button').submit()  # use this instead if it is a form

Login is complete; all the cookies are now held by wd and can be retrieved at any time.
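The Part I steps can be collected into one reusable function. This is a minimal sketch, not the author's code. It assumes Selenium 4, where the find_element_by_xpath() helpers used above were deprecated and later removed in favour of find_element(By.XPATH, ...); the string "xpath" used here is the value of By.XPATH. The function name fill_login_form and its parameters are hypothetical, and every XPath is a placeholder you copy from the real page.

```python
# Sketch assuming Selenium 4: find_element(by, value) replaces the removed
# find_element_by_xpath(); "xpath" is the value of selenium's By.XPATH.
XPATH = "xpath"


def fill_login_form(wd, login_url, user_xpath, password_xpath, button_xpath,
                    username, password, use_submit=False):
    """Open the login page, type the credentials and trigger the login.

    wd is a Selenium WebDriver (e.g. webdriver.Chrome(...)); every XPath is a
    placeholder copied from the real page (hypothetical parameter names).
    """
    wd.get(login_url)                                           # open the login page
    wd.find_element(XPATH, user_xpath).send_keys(username)      # type the username
    wd.find_element(XPATH, password_xpath).send_keys(password)  # type the password
    login = wd.find_element(XPATH, button_xpath)
    if use_submit:
        login.submit()   # element belongs to a <form>: submit it
    else:
        login.click()    # element is a button or link: click it
```

With a real browser you would create the driver with webdriver.Chrome(service=Service(chromePath)) (Service comes from selenium.webdriver.chrome.service) and call fill_login_form(wd, loginUrl, userXpath, passwordXpath, buttonXpath, 'username', 'password'), where every argument is a placeholder. The use_submit flag mirrors the article's advice to try click() first and submit() if that fails.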

Part II: Passing Selenium's cookies to requests

Import the requests library and create a Session:

import requests
req = requests.Session()

Export the cookies from wd:

cookies = wd.get_cookies()

Convert Selenium's cookies (a list of dicts) into cookies that requests can use:

for cookie in cookies:
    req.cookies.set(cookie['name'], cookie['value'])

Done! Now try crawling a page with requests:

req.get('URL to test')

That is the universal method for simulated login in Python. You don't have to analyze the site's cookies; you only have to tell Python where to type the username and password. Very convenient.
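The cookie loop in Part II copies only each cookie's name and value, so requests will send every cookie to every host it talks to. A slightly more careful variant, sketched below, also carries over the domain, path and secure flag that Selenium reports for each cookie. It assumes the jar behaves like requests' RequestsCookieJar, whose set() accepts domain, path and secure keyword arguments; the helper name copy_selenium_cookies is hypothetical.

```python
def copy_selenium_cookies(selenium_cookies, jar):
    """Copy the dicts returned by wd.get_cookies() into a requests cookie jar.

    Assumes jar.set(name, value, domain=..., path=..., secure=...) as offered
    by requests' RequestsCookieJar. Keeping domain and path means requests
    only sends each cookie to the site that set it.
    """
    for c in selenium_cookies:
        jar.set(
            c["name"],
            c["value"],
            domain=c.get("domain", ""),     # "" = no domain restriction
            path=c.get("path", "/"),
            secure=c.get("secure", False),  # True = sent over HTTPS only
        )
    return jar
```

Usage: req = requests.Session() followed by copy_selenium_cookies(wd.get_cookies(), req.cookies). Cookies without a domain fall back to the article's behaviour of being sent everywhere.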

Part III: Weibo login demo
import requests
from selenium import webdriver
chromePath = r'path to chromedriver.exe'
wd = webdriver.Chrome(executable_path=chromePath)  # launch the browser
loginUrl = 'http://www.weibo.com/login.php'
wd.get(loginUrl)  # open the login page
wd.find_element_by_xpath('//*[@id="loginname"]').send_keys('username')  # enter the username
wd.find_element_by_xpath('//*[@id="pl_login_form"]/div/div[3]/div[2]/div/input').send_keys('password')  # enter the password
wd.find_element_by_xpath('//*[@id="pl_login_form"]/div/div[3]/div[6]/a').click()  # click login
req = requests.Session()  # create the Session
cookies = wd.get_cookies()  # export the cookies
for cookie in cookies:
    req.cookies.set(cookie['name'], cookie['value'])  # convert the cookies
test = req.get('URL to test')

A few key points:

1. Locating elements. Use Google Chrome to find each element's XPath (in DevTools, right-click the element and choose Copy > Copy XPath); see "Get an XPath path from Chrome".

2. Choosing click() or submit(). Try both; one of them will work.

3. What if Weibo asks for a verification code when I log in? Sometimes Weibo requires a verification code at login. In that case, add a line of code that lets you type the code in by hand, for example:

wd.find_element_by_xpath('//*[@id="pl_login_form"]/div/div[3]/div[6]/a').click()  # click login
wd.find_element_by_xpath('//*[@id="pl_login_form"]/div/div[3]/div[3]/div/input').send_keys(input("Enter the verification code: "))
wd.find_element_by_xpath('//*[@id="pl_login_form"]/div/div[3]/div[6]/a').click()  # click login again

When a verification code is needed you have to click login twice, because the verification-code box only pops up after the first click. Apply Selenium flexibly according to how each website behaves. Even so, this is a small effort compared with analyzing the site's cookies.
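The two-click pattern can be wrapped so the script only asks for a code when the box actually appears. This is a sketch under stated assumptions: it uses Selenium 4's find_elements("xpath", ...), which returns an empty list when nothing matches; the one-second delay before looking for the box is a guess; and the function name and XPath arguments are hypothetical placeholders.

```python
import time

XPATH = "xpath"  # value of selenium's By.XPATH (Selenium 4 API)


def click_login_with_optional_captcha(wd, button_xpath, captcha_xpath,
                                      ask=input, delay=1.0):
    """Click login; if a verification-code box then appears, ask the user
    for the code, type it in and click login a second time.

    Returns True when a code was entered, False otherwise.
    """
    wd.find_element(XPATH, button_xpath).click()    # first click
    time.sleep(delay)                                # give the box time to pop up
    boxes = wd.find_elements(XPATH, captcha_xpath)   # [] if no box exists
    if boxes and boxes[0].is_displayed():
        boxes[0].send_keys(ask("Enter the verification code: "))
        wd.find_element(XPATH, button_xpath).click()  # second click
        return True
    return False
```

The ask parameter defaults to input(), as in the article, but can be swapped for another prompt; passing it explicitly also makes the function testable without a browser.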

Part IV: The lazy, semi-manual login

Websites update their login pages often, so even a carefully written login script may stop working. So I came up with an ultimate method: semi-manual login. Use Selenium only to open a browser, then type the account and password yourself, plus the verification code if one is requested. Once you are logged in, call get_cookies() to pull out the cookies. It looks clumsy, but it is far more efficient than hundreds of lines of code, and you can even log in by scanning a QR code with your phone. As long as the login happens inside the browser that Selenium opened, Selenium records the complete set of cookies. The code is as follows:

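import time
import requests
from selenium import webdriver
chromePath = r'path to chromedriver.exe'
wd = webdriver.Chrome(executable_path=chromePath)
time.sleep(45)  # sleep 45 seconds and log in by hand meanwhile; this is crucial (explained below)
cookies = wd.get_cookies()  # pull out the cookies
req = requests.Session()
req.headers.clear()  # remove requests' default headers (explained below)
for cookie in cookies:
    req.cookies.set(cookie['name'], cookie['value'])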

req.headers.clear() removes the default headers that requests puts on the session, including the User-Agent that identifies the client as a Python script ("python-requests/<version>"). Some websites, Zhihu among them, detect this, and the logged-in crawl then fails. Be sure to clear them!
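A variation on clearing the headers: after clear(), send the same User-Agent as the Selenium browser that performed the login, which can be read with wd.execute_script("return navigator.userAgent"). Whether a given site actually checks this is an assumption, and the helper below is hypothetical; it only shows the idea. It works on any object with a dict-like headers attribute, such as requests.Session.

```python
def use_browser_user_agent(session, user_agent):
    """Drop the session's default headers (including requests' own
    "python-requests/x.y" User-Agent) and send the browser's User-Agent."""
    session.headers.clear()                    # same step as in the article
    session.headers["User-Agent"] = user_agent
    return session
```

Usage: ua = wd.execute_script("return navigator.userAgent"), then use_browser_user_agent(req, ua) before the first req.get().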

time.sleep() pauses the program. During that time you can log in manually, scan the QR code, and so on. After 45 seconds Python resumes and runs cookies = wd.get_cookies(); Selenium's get_cookies() captures the cookies from your manual login. Set the delay to however long you need: if the program already fills in the site, username and password and only the verification code is left to type by hand, a few seconds is enough. The advantage of time.sleep() is that the program never has to stop: everything after it runs on seamlessly.
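A fixed 45-second sleep is either too long or too short. One alternative, which is not the article's method, is to poll the browser's cookies until a login cookie shows up, with a timeout. What marks a successful login is site-specific: the is_logged_in predicate and the cookie name "session" in the usage example are hypothetical, and the function name is made up for illustration.

```python
import time


def wait_for_manual_login(wd, is_logged_in, timeout=120.0, poll=1.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll wd.get_cookies() until is_logged_in(cookies) is true.

    Returns the cookie list as soon as the login is detected, or raises
    TimeoutError after `timeout` seconds. sleep/clock are injectable so the
    function can be tested without a browser.
    """
    deadline = clock() + timeout
    while True:
        cookies = wd.get_cookies()
        if is_logged_in(cookies):
            return cookies
        if clock() >= deadline:
            raise TimeoutError(f"no login detected within {timeout} seconds")
        sleep(poll)
```

Usage, with a hypothetical cookie name: cookies = wait_for_manual_login(wd, lambda cs: any(c["name"] == "session" for c in cs)). The rest of the program continues as soon as the login is detected instead of after a fixed delay.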

Thanks for reading this far. The "lazy" method described above is the semi-manual way I log in to Zhihu myself. Don't dismiss it: our goal is to crawl the site's content, so solving the login problem as quickly as possible and getting on with the crawl is the right direction, and this method saves a lot of time. It works because it drives a real browser, so any site that a browser can normally log in to can be logged in to this way.

-------------------------------------------

End of the main text. Below are some frequently asked questions and a bonus code sample.

1: What if the website blocks Selenium?

Solution: This is rare, because such an anti-crawler measure easily blocks real users by mistake as well. If it does happen, you only need to hide the information that reveals Selenium is driving the browser. Reference link: "Can a website detect when you are using Selenium with chromedriver?"

2: How do I load previously saved cookies into a newly opened webdriver?

Solution: Save the cookies locally after you obtain them, and import them directly the next time you log in. Reference link: "How to save and load cookies using Python + Selenium WebDriver"
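A minimal sketch of the save-and-reload idea using the standard json module. It assumes the cookies are reloaded into a Selenium browser, which only accepts add_cookie() for the domain of the page currently open, so load_cookies first visits a page on that site (the url argument). Both function names are hypothetical, and cookies that the site has since expired or invalidated will not restore the login.

```python
import json


def save_cookies(wd, path):
    """Write the browser's cookies (a list of dicts) to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(wd.get_cookies(), f)


def load_cookies(wd, path, url):
    """Restore cookies saved by save_cookies() into a fresh browser.

    Selenium only accepts cookies for the domain of the page currently
    open, so a page on the same site (url) is loaded first.
    """
    with open(path, encoding="utf-8") as f:
        cookies = json.load(f)
    wd.get(url)               # must be on the cookies' site before add_cookie()
    for cookie in cookies:
        wd.add_cookie(cookie)
    wd.refresh()              # reload so the page sees the restored cookies
    return len(cookies)
```

The same JSON file can also feed the requests side directly: load the list with json.load() and pass it through the Part II cookie loop instead of wd.get_cookies().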

Bonus: ready-made login code for Zhihu
from selenium import webdriver
from requests import Session
from time import sleep
req = Session()
req.headers.clear()  # remove requests' default headers
chromePath = r'D:\Python Program\chromedriver.exe'
wd = webdriver.Chrome(executable_path=chromePath)
zhihuLoginUrl = 'https://www.zhihu.com/signin'
wd.get(zhihuLoginUrl)
wd.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[2]/div[1]/div[1]/div[2]/span').click()
wd.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[2]/form/div[1]/div[1]/input').send_keys('username')
wd.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[2]/form/div[1]/div[2]/input').send_keys('password')
sleep(10)  # enter the verification code by hand
wd.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[2]/form/div[2]/button').submit()
sleep(10)  # wait for the cookies to load
cookies = wd.get_cookies()
for cookie in cookies:
    req.cookies.set(cookie['name'], cookie['value'])
