This article is reproduced from: http://blog.csdn.net/u011541946/article/details/68485981
Practice scenario: Some fields on a web page are of interest to us, and we want to extract them for further processing. However, these fields may appear in different places on the page. For example, we want to open Baidu's "Contact Us" page and collect all the email addresses on it.
Breaking down the approach:
1. First, get the source of the current page, just as you would by opening the page, right-clicking, and choosing to view the page source.
2. Find the pattern, use a regular expression to extract the matching fields, and store them in a dictionary or list.
3. Loop over the contents of the dictionary or list, which in Python is done with a for statement.
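The three steps can be tried without a browser first. The following is a minimal sketch, assuming a hard-coded HTML string stands in for the real page source; `sample_source`, `extract_emails`, and the example.com addresses are hypothetical and made up for illustration, and the pattern is one simple way to match email-like text rather than a complete email validator.

```python
import re

# Step 1 (stand-in): a hard-coded string plays the role of the page source.
# The addresses below are invented for illustration.
sample_source = """
<div class="contact">
  <p>Business: biz@example.com</p>
  <p>Support: help@example.com</p>
</div>
"""


def extract_emails(page_source):
    """Step 2: return every email-like substring in page_source as a list."""
    return re.findall(r'[\w]+@[\w\.-]+', page_source)


# Step 3: loop over the list with a for statement.
emails = extract_emails(sample_source)
for email in emails:
    print(email)
```

Keeping the extraction in its own function means the same code can later be given a real page source string without any other change.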
Techniques and methods involved:
1. To get the page source, Selenium provides driver.page_source.
2. To use regular expressions in Python, you need to import the re module.
3. Loop over the results with a for statement:
for email in emails:
    print(email)
# coding=utf-8

import re

from selenium import webdriver

driver = webdriver.Chrome()
driver.maximize_window()
driver.implicitly_wait(6)

driver.get("http://home.baidu.com/contact.html")
# Get the page source code
doc = driver.page_source
# Use a regular expression to find every xxx@xxx field and save the matches to the emails list
emails = re.findall(r'[\w]+@[\w\.-]+', doc)
# Loop over the matched mailboxes and print each one
for email in emails:
    print(email)
Explanation:
In Python, a string prefixed with r is a raw string, so backslashes in it are passed to the regular expression unchanged. In the regular expression syntax, \w matches letters, digits, and the underscore. The findall method of the re module returns a list of the matching substrings.
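The points above can be checked in isolation with a short sketch. The sample strings and addresses are made up for illustration, and the email pattern is an assumed simple form, not a full email validator.

```python
import re

# A raw string keeps the backslash as-is, so r'\w' is the two characters \ and w.
print(r'\w' == '\\w')  # True

# \w matches letters, digits, and the underscore, but not '-' or '@'.
word_chars = re.findall(r'\w', 'a_1-@')
print(word_chars)  # ['a', '_', '1']

# findall returns a list of all non-overlapping matches, in order of appearance.
text = 'write to tom_01@example.com or amy@mail.example.org today'
found = re.findall(r'[\w]+@[\w\.-]+', text)
print(found)  # ['tom_01@example.com', 'amy@mail.example.org']

# When nothing matches, findall returns an empty list rather than None.
nothing = re.findall(r'[\w]+@[\w\.-]+', 'no address here')
print(nothing)  # []
```

Because findall always returns a list, the for loop over the result works even when the page contains no addresses; the loop body simply never runs.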
Python + Selenium: extracting all the mailboxes on a web page