Python crawler tool: Selenium usage

Source: Internet
Author: User
Tags: xpath

This article introduces Selenium, a powerful tool for Python crawlers. We hope it helps you learn Python crawling.

What is Selenium?

In a word: an automated testing tool. It supports a range of browsers, including mainstream graphical browsers such as Chrome, Safari and Firefox. With a Selenium plug-in installed in one of these browsers, you can easily test web interfaces. In other words, Selenium supports these browser drivers.

PhantomJS is a browser too, so does Selenium support it? The answer is yes, and the two work together seamlessly.

There is more good news: Selenium supports development in multiple languages, such as Java, C# and Ruby. And Python? Of course. Install the Selenium library for Python, then install PhantomJS, and you have Python + Selenium + PhantomJS working together. PhantomJS renders the page and executes the JS, Selenium drives the browser and connects it to Python, and Python does the post-processing.

Why use the headless PhantomJS rather than a regular browser? The answer is efficiency.

Selenium has two versions; the latest release at the time of writing is 2.53.1 (2016/3/22). Selenium 2 is also known as WebDriver. Its main new feature is the integration of Selenium 1.0 and WebDriver (WebDriver used to be Selenium's competitor). In other words, Selenium 2 is a merger of the Selenium and WebDriver projects. It is compatible with Selenium 1 and supports both the Selenium API and the WebDriver API.

With this general picture of Selenium in mind, let's move on to dynamic crawling.
This article draws on the official Selenium website and the Selenium Python documentation.

Installation

First install Selenium:

    pip install selenium

Quick Start

First experience

Let's start with a small Selenium example. We use Chrome here so the effect is easy to see; switch back to PhantomJS for real crawling.

    from selenium import webdriver

    browser = webdriver.Chrome()
    browser.get('')

Running this code automatically opens the browser and visits Baidu.

If the program fails and the browser does not open, either Chrome is not installed or the Chrome driver is not configured in the environment variables. Download the driver and add its path to the environment variables. On Mac OS, for example, putting the downloaded file in the /usr/bin directory is enough.

Simulated submission

The following code simulates submitting a search: it waits for the page to finish loading, types text into the search box, and submits.

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys

    driver = webdriver.Chrome()
    driver.get("")
    assert "Python" in driver.title
    elem = driver.find_element_by_name("q")
    elem.send_keys("pycon")
    elem.send_keys(Keys.RETURN)
    print(driver.page_source)

Again, run it in Chrome to get a feel for it.

Example description

The driver.get method opens the requested URL. WebDriver waits until the page has fully loaded before returning, that is, until all page content has loaded and the JS has finished rendering. Note: if the page makes heavy use of Ajax, WebDriver may not know when it has fully loaded.

WebDriver offers several ways to find page elements, such as the find_element_by_* methods. For example, an input box can be located by its name attribute with find_element_by_name.
Next we type text and simulate pressing Enter, just as on a keyboard. The Keys class is used to simulate keyboard input.

Finally, and most importantly, we get the page source after rendering by reading the page_source attribute. This is how we crawl dynamically rendered pages.

Test Cases

With the features above, we can of course write a test case.

    import unittest
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys

    class PythonOrgSearch(unittest.TestCase):

        def setUp(self):
            self.driver = webdriver.Chrome()

        def test_search_in_python_org(self):
            driver = self.driver
            driver.get("")
            self.assertIn("Python", driver.title)
            elem = driver.find_element_by_name("q")
            elem.send_keys("pycon")
            elem.send_keys(Keys.RETURN)
            assert "No results found." not in driver.page_source

        def tearDown(self):
            self.driver.close()

    if __name__ == "__main__":
        unittest.main()

Running the program does the same thing as before, now wrapped in a standard test class.

The test case inherits from unittest.TestCase, which marks it as a test class. The setUp method is the initialization method and is called automatically before each test method. Test method names must start with test; such methods are executed automatically. The tearDown method is called after each test method finishes, much like a destructor. Here it calls close; you could call quit instead. close closes the current tab, whereas quit exits the whole browser. When only one tab is open, closing it closes the whole browser as well.

Page Operations

Page interaction

Just fetching a page is of limited use. What we really want is to interact with it: clicking, typing, and so on. The prerequisite is finding elements in the page, and WebDriver provides several ways to do that.
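As a side note on the setUp / test / tearDown ordering described in the Test Cases section, the lifecycle can be observed without launching a browser. The following is a minimal sketch using only the standard library: FakeDriver, LifecycleDemo and run_demo are hypothetical names introduced for illustration, and FakeDriver merely stands in for webdriver.Chrome() by recording that close was called.

```python
import io
import unittest

calls = []  # records the order in which the hooks run


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome(); it only logs calls."""

    def close(self):
        calls.append("close")


class LifecycleDemo(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets a fresh driver.
        calls.append("setUp")
        self.driver = FakeDriver()

    def test_first(self):
        calls.append("test_first")

    def test_second(self):
        calls.append("test_second")

    def tearDown(self):
        # Runs after every test method, whether it passed or failed.
        self.driver.close()
        calls.append("tearDown")


def run_demo():
    """Run LifecycleDemo quietly and return the recorded call order."""
    calls.clear()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LifecycleDemo)
    unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return list(calls)


if __name__ == "__main__":
    print(run_demo())
```

The recorded order shows setUp and tearDown wrapping each test method separately, which is why a real Selenium test class opens and closes a browser once per test rather than once per class.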
Suppose the page contains the following form input:

    <input type="text" name="passwd" id="passwd-id" />

We can locate it in any of these ways:

    element = driver.find_element_by_id("passwd-id")
    element = driver.find_element_by_name("passwd")
    elements = driver.find_elements_by_tag_name("input")  # returns a list
    element = driver.find_element_by_xpath("//input[@id='passwd-id']")

You can also locate an element by its link text, but the text must match exactly, so this is not a very robust way to match.

When using XPath, note that if more than one element matches, only the first match is returned. If nothing matches, a NoSuchElementException is raised.

Once you have the element, the next step is to type into it:

    element.send_keys("some text")

You can also use the Keys class to simulate key presses:

    element.send_keys(" and some", Keys.ARROW_DOWN)

send_keys can be called on any element, which makes it possible to exercise keyboard shortcuts such as those used in Gmail. A side effect is that typing into a text field does not clear it first, so new text is appended to what is already there. To clear the contents of an input, use:

    element.clear()

This clears the text that was entered.

Filling out forms

We already know how to type into a text box, but what about other form elements? A drop-down list, for example, can be handled like this:

    element = driver.find_element_by_xpath("//select[@name='name']")
    all_options = element.find_elements_by_tag_name("option")
    for option in all_options:
        print("Value is: %s" % option.get_attribute("value"))
        option.click()

This first finds the first select element, which is the drop-down list, and then selects each of its options in turn. As you can see, this is not a very efficient approach.
In fact, WebDriver provides a class called Select that handles this for us.

    from selenium.webdriver.support.ui import Select

    select = Select(driver.find_element_by_name('name'))
    select.select_by_index(index)
    select.select_by_visible_text("text")
    select.select_by_value(value)

As you can see, options can be selected by index, by visible text, or by value, which is very convenient.

What about deselecting everything? That is simple too:

    select = Select(driver.find_element_by_id('id'))
    select.deselect_all()

This clears all selections. We can also get all currently selected options:

    select = Select(driver.find_element_by_xpath("xpath"))
    all_selected_options = select.all_selected_options

To get all available options:

    options = select.options

Once the form is filled out, it has to be submitted. How? Very simply:

    driver.find_element_by_id("submit").click()

This simulates a click on the submit button. You can also call submit on an individual element:

    element.submit()

WebDriver then looks for the form that encloses the element. If the element is not inside a form, a NoSuchElementException is raised.

Dragging elements

To drag and drop an element, first locate the element to drag and the target element, then use the ActionChains class.

    from selenium.webdriver import ActionChains

    element = driver.find_element_by_name("source")
    target = driver.find_element_by_name("target")

    action_chains = ActionChains(driver)
    action_chains.drag_and_drop(element, target).perform()

This drags the element from source to target.

Window switching

A browser often has several windows open, so we need a way to switch between them. To switch windows:

    driver.switch_to_window("windowName")

You can also use the window_handles property to get a handle for each window, for example:

    for handle in driver.window_handles:
        driver.switch_to_window(handle)

To switch to a frame:

    driver.switch_to_frame("frameName.0.child")

The focus switches to the frame named child.

Pop-up handling

When an action triggers a pop-up on the page, how do you handle the prompt or read its message?

    alert = driver.switch_to_alert()

This returns the pop-up object.

History

How do you move forward and backward through the page history?

    driver.forward()
    driver.back()

Simple and clear.

Cookie handling

To add a cookie to the page:

    # Go to the correct domain
    driver.get("")

    # Now set the cookie. This one's valid for the entire domain
    cookie = {'name': 'foo', 'value': 'bar'}
    driver.add_cookie(cookie)

To read the page's cookies:

    # Go to the correct domain
    driver.get("")

    # And now output all the available cookies for the current URL
    driver.get_cookies()

That covers cookie handling, which is also very simple.

Element Selection

The following APIs are available for selecting elements.

Single-element selection:

    find_element_by_id
    find_element_by_name
    find_element_by_xpath
    find_element_by_link_text
    find_element_by_partial_link_text
    find_element_by_tag_name
    find_element_by_class_name
    find_element_by_css_selector

Multiple-element selection:

    find_elements_by_name
    find_elements_by_xpath
    find_elements_by_link_text
    find_elements_by_partial_link_text
    find_elements_by_tag_name
    find_elements_by_class_name
    find_elements_by_css_selector

You can also use the By class to specify the selection method:

    from selenium.webdriver.common.by import By

    driver.find_element(By.XPATH, '//button[text()="Some text"]')
    driver.find_elements(By.XPATH, '//button')

The attributes of the By class are:

    ID = "id"
    XPATH = "xpath"
    LINK_TEXT = "link text"
    PARTIAL_LINK_TEXT = "partial link text"
    NAME = "name"
    TAG_NAME = "tag name"
    CLASS_NAME = "class name"
    CSS_SELECTOR = "css selector"

For more detail on element selection, see the element selection section of the official documentation.

Page Waits

This is a very important part. More and more web pages use Ajax, so the program cannot tell when an element has finished loading. This makes elements hard to locate and increases the chance of an ElementNotVisibleException.

Selenium therefore provides two kinds of wait: implicit and explicit. An implicit wait sets a maximum time that WebDriver keeps looking for an element before giving up. An explicit wait specifies a condition and blocks until that condition holds.

Explicit wait

An explicit wait specifies a condition and a maximum wait time. If the element is not found within that time, a TimeoutException is raised.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    driver.get("http://somedomain/url_that_delays_loading")
    try:
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "myDynamicElement"))
        )
    finally:
        driver.quit()

By default the check runs every 500 ms to see whether the element has appeared. If the element is already present, the call returns immediately.
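The polling behaviour described above can be sketched without a browser. This is a minimal standard-library illustration of the idea, not Selenium's actual implementation: wait_until, WaitTimeout and make_delayed_lookup are hypothetical names, and the default poll_interval of 0.5 seconds simply mirrors the 500 ms interval mentioned in the text.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    The first check happens immediately, so a condition that already
    holds returns without sleeping.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


def make_delayed_lookup(ready_after):
    """Simulate an element that only 'appears' on the ready_after-th check."""
    state = {"checks": 0}

    def lookup():
        state["checks"] += 1
        return "element" if state["checks"] >= ready_after else None

    return lookup


if __name__ == "__main__":
    # The simulated element shows up on the third poll.
    print(wait_until(make_delayed_lookup(3), timeout=2.0, poll_interval=0.1))
```

The point of the design is that the caller states what it is waiting for rather than how long to sleep, so a fast page costs almost nothing while a slow page is still bounded by the timeout.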
Selenium ships with a number of built-in wait conditions that you can call directly instead of writing your own:

    title_is
    title_contains
    presence_of_element_located
    visibility_of_element_located
    visibility_of
    presence_of_all_elements_located
    text_to_be_present_in_element
    text_to_be_present_in_element_value
    frame_to_be_available_and_switch_to_it
    invisibility_of_element_located
    element_to_be_clickable (the element is displayed and enabled)
    staleness_of
    element_to_be_selected
    element_located_to_be_selected
    element_selection_state_to_be
    element_located_selection_state_to_be
    alert_is_present

For example:

    from selenium.webdriver.support import expected_conditions as EC

    wait = WebDriverWait(driver, 10)
    element = wait.until(EC.element_to_be_clickable((By.ID, 'someid')))

Implicit wait

An implicit wait is simpler: you just set a wait time, in seconds.

    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.implicitly_wait(10)  # seconds
    driver.get("http://somedomain/url_that_delays_loading")
    myDynamicElement = driver.find_element_by_id("myDynamicElement")

If no implicit wait is set, the default wait time is 0.

Program Framework

For page testing and analysis, the official documentation provides a fairly clear code structure that is worth consulting.

Conclusion

That covers the basic usage of Selenium: interacting with a page and getting its source after rendering. With these tools, even pages rendered by JS are easy to handle.
