Python Basics -- using try/except for exception handling, and some considerations

Source: Internet
Author: User
Tags: terminates, xpath

On the road of growth you will inevitably be confused at times, and inevitably overwhelmed. All you can do is hold on to an unshakable belief and stick with it to the end. Don't lose hope; believe that tomorrow will be better.

This is a little experience from my work this week. On the first day I picked up a new laptop from the company and set up the environment. The afternoon and the following day were spent getting familiar with the company's own framework, which is built on Scrapy, and then stepping through it in the debugger to see the order in which the program executes.

The next afternoon and the third day I finished a simple crawler that just scraped a US official site. Its robustness is poor ~~~ I got a little confused when writing the XPath for it, partly because the site's markup is rather messy and partly because I lack experience, so I will keep adding knowledge in this area.

From the fourth day until now I have been writing another crawler. This one is more of a hassle, because it covers not only the domestic official site but also the US official site and those of some European countries. The other countries' sites are relatively similar, with only small differences between them, but the domestic site generally differs from them quite a lot. For fetching a single item, the class method that is used takes a region parameter, since the countries differ, and then handles each case accordingly ~~~

In fact, for the past two days I have been crawling data; the code was already written. But sometimes the official site gets redesigned, which means the XPath can no longer select the target data, and the code has to be modified again.

Although my level is not high, that code was really hard to look at. After a small modification it ran, and sure enough data came back, but a closer look showed logic errors in several places. So I decided to rewrite it, using the company's code as the standard to compare against. I used to think that all that mattered was that the code ran correctly. Now I find that merely being able to run is the most basic requirement of a program, and everything beyond that is the more important part!

The following is just one of those cases: fetching the description of an item. The corresponding HTML comes from the page referenced below:

<!-- code comparison --><!-- http://www.hugoboss.com/uk/extra-slim-fit-jacket-%27ryan_cyl%27-in-a-new-wool-blend/hbeu50275029.html?cgid=21600&dwvar_hbeu50275029_color=410_dark%20blue -->

Here we need to grab the description and the details. Anyway, let's go straight to the code.

# from scrapy.selector import Selector   (imported at module level)

@classmethod
def fetch_description(cls, response, region=None, spider=None):
    """
    Return the item description, using '\r' as the divider between sibling text nodes.
    The detail block contains a "print" label and possibly a "show more" label,
    so extract all the text first and then strip those labels out.
    :param response:
    :param spider:
    :return:
    """
    sel = Selector(response)
    description = None
    if region == 'CN':
        description_node = sel.xpath('//div[@id="LYR1"][contains(@class, "description")]')
    else:
        description_node = sel.xpath('//div[contains(@class, "product-detail")]//div[@id="TAB1"]')
    if description_node:
        try:
            description = '\r'.join(cls.reformat(val) for val in description_node.xpath('.//text()').extract())
            print_node = description_node.xpath('.//*[contains(@class, "print-page")]/text()').extract()[0]
            if print_node:
                print_node = cls.reformat(print_node)
                description = description.replace(print_node, '')
            show_more_node = description_node.xpath('.//*[contains(@class, "showMore")]/text()').extract()[0]
            if show_more_node:
                show_more_node = cls.reformat(show_more_node)
                description = description.replace(show_more_node, '')
        except (TypeError, IndexError):
            pass
    description = cls.reformat(description)
    return description

It is not hard to see that the code is actually quite simple. It is just that when you first start writing it you do not think everything through, and bugs are everywhere. Then you step through it in the debugger to see whether the code executes the way you expect.

The program roughly works like this: first determine the country, and select a different XPath node depending on the country. If the node exists, continue; because xpath(...).extract() returns a list, taking a single value means indexing the list to get its first element. The list may be empty, however, and indexing an empty list with [0] raises an IndexError. So try ... except ... is used to catch the exception; the exception needs no handling here, and execution simply continues after the try block. The key problem lies in the code inside the try block, which was modified three times before it behaved correctly.
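Before going back to the crawler, here is a minimal, self-contained illustration of that point. The markup and class names are made up for the example, not taken from the real site:

from scrapy.selector import Selector

# Hypothetical markup, just to show extract() returning a list and the
# IndexError raised by [0] on an empty selection.
html = '<div class="product-detail"><p>Lovely jacket</p></div>'
sel = Selector(text=html)

texts = sel.xpath('//p/text()').extract()                                        # ['Lovely jacket']
missing = sel.xpath('//span[contains(@class, "print-page")]/text()').extract()   # []

print(texts[0])        # fine: 'Lovely jacket'
try:
    print(missing[0])  # IndexError: list index out of range
except IndexError:
    pass               # nothing to handle; execution simply continues after the try

Back to the crawler: the earliest version of the try block was as follows: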

    if description_node:
        try:
            print_node = description_node.xpath('.//*[contains(@class, "print-page")]/text()').extract()[0]
            show_more_node = description_node.xpath('.//*[contains(@class, "showMore")]/text()').extract()[0]
            description = '\r'.join(cls.reformat(val) for val in description_node.xpath('.//text()').extract())
            if print_node:
                print_node = cls.reformat(print_node)
                description = description.replace(print_node, '')
            if show_more_node:
                show_more_node = cls.reformat(show_more_node)
                description = description.replace(show_more_node, '')
        except (TypeError, IndexError):
            pass

It is easy to see that this version has a serious problem. When execution enters the try block, the description node is known to exist. But if the XPath for print_node or show_more_node matches nothing, extract() returns an empty list, the [0] raises an IndexError, and the program abandons the rest of the try block and jumps straight to the except handler, so description is never assigned at all.
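A tiny, self-contained sketch of that behaviour (nothing here comes from the crawler itself, it only shows the mechanism):

description = None
try:
    print_node = [][0]             # empty list, so IndexError is raised here
    description = 'the item text'  # never reached: the rest of the try block is skipped
except IndexError:
    pass

print(description)                 # None, even though the text was available

With that understood, the code was modified as follows: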

    if description_node:
        try:
            description = '\r'.join(cls.reformat(val) for val in description_node.xpath('.//text()').extract())
            print_node = description_node.xpath('.//*[contains(@class, "print-page")]/text()').extract()[0]
            show_more_node = description_node.xpath('.//*[contains(@class, "showMore")]/text()').extract()[0]
            if print_node:
                print_node = cls.reformat(print_node)
                description = description.replace(print_node, '')
            if show_more_node:
                show_more_node = cls.reformat(show_more_node)
                description = description.replace(show_more_node, '')
        except (TypeError, IndexError):
            pass

With this version, as long as the HTML contains a description it can be crawled. However, the block contains 'print' and may also contain 'show more'. Running it, I found that the word 'print' sometimes appeared in the result and sometimes did not. That felt strange; at first I guessed the HTML differed slightly between pages, so that the XPath failed to extract print_node. But when debugging with scrapy shell <url>, the 'print' text could be found. Then, stepping through the code, I found that right after the show_more_node line executed, control went straight into the except block. Suddenly I understood: this particular description has no 'show more', so the remaining replacement code was never executed. So the code was modified again:

    if description_node:
        try:
            description = '\r'.join(cls.reformat(val) for val in description_node.xpath('.//text()').extract())
            print_node = description_node.xpath('.//*[contains(@class, "print-page")]/text()').extract()[0]
            if print_node:
                print_node = cls.reformat(print_node)
                description = description.replace(print_node, '')
            show_more_node = description_node.xpath('.//*[contains(@class, "showMore")]/text()').extract()[0]
            if show_more_node:
                show_more_node = cls.reformat(show_more_node)
                description = description.replace(show_more_node, '')
        except (TypeError, IndexError):
            pass

At this point the code finally works correctly.

Another thing to note is the order of the statements inside the try block. The main purpose of this block is to crawl the description; when it exists, a 'print' node may exist and a 'show more' node may exist, and 'print' always appears before 'show more'. So the order needs to be: description, then print_node, then show_more_node, as the small sketch below illustrates.
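A rough sketch of why the required value has to come first (the lists are made up stand-ins for extract() results, not the crawler's own data):

texts = ['main description text']   # the required value is present
extras = []                         # the optional 'show more' text is absent

description = None
try:
    description = texts[0]          # grab the required value first
    show_more = extras[0]           # IndexError here, but description is already set
    description = description.replace(show_more, '')
except IndexError:
    pass

print(description)                  # 'main description text': the required part survived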

Of course, this is also partly a matter of how the code is written. If you use an if check to see whether the list returned by the extraction is empty, you do not have to use try/except at all; a sketch of that alternative follows.
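Here is a rough, self-contained sketch of the if-based alternative, using Scrapy's extract_first(), which returns None for an empty selection instead of raising. The markup is hypothetical; the class names only mirror the ones used above:

from scrapy.selector import Selector

html = '''
<div class="product-detail">
  <div id="TAB1">A wool-blend jacket.<span class="print-page">print</span></div>
</div>
'''
node = Selector(text=html).xpath('//div[contains(@class, "product-detail")]//div[@id="TAB1"]')

description = None
if node:
    description = '\r'.join(t.strip() for t in node.xpath('.//text()').extract())
    print_text = node.xpath('.//*[contains(@class, "print-page")]/text()').extract_first()
    if print_text:
        description = description.replace(print_text.strip(), '')
    show_more_text = node.xpath('.//*[contains(@class, "showMore")]/text()').extract_first()
    if show_more_text:               # None here, and nothing breaks
        description = description.replace(show_more_text.strip(), '')

print(description)                   # the description text with 'print' stripped out

Whether the if version or the try version reads better is a matter of taste; the point is simply that with extract_first() there is no empty-list indexing to blow up.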

Like a little fable, it ends with a moral:

# when using
try:
    pass
    # be sure to note the order of the statements inside
    # once an exception occurs, the rest of the code in the try block is not executed
except:
    pass
# so try/except must be used with caution
# 'knowing' is still some distance from 'having experienced'

I have been going back and forth over whether to say 'use it prudently' or 'use it with caution'. Both seem right, and neither seems quite certain ~ haha ~~~
