Topic:
Problem Solving Ideas:
The problem statement makes it clear that the characters may be in the page's source code. Right-click the page, view its source, and you will find the line: find rare characters in the mess below. Some people simply copy the long block of text below that line and process it directly, which I find a bit crude. My approach is to fetch the web page with urllib2, then extract the text and process it with regular expressions.
Implementation method:
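import urllib2
import re

# Fetch the page source (Python 2; urllib2 is not available in Python 3).
req = urllib2.urlopen('http://www.pythonchallenge.com/pc/def/ocr.html')
res = req.read()
# Capture everything between the first and the last '-->', which is the block of junk characters.
# re.S lets '.' match newlines as well.
mess = ''.join(re.findall('-->(.*)-->', res, re.S))
# Keep only the letters and digits.
chars = ''.join(re.findall(r'[a-z]|[A-Z]|[0-9]', mess))
print chars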
Method Explanation:
- urllib2: a simple urllib2.urlopen(url).read() is enough to get the content of the web page.
- To get the text to be processed, the fetched page content is filtered with a regular expression. Line breaks are handled very simply by passing the re.S flag to findall, which makes '.' match any character, including a newline. Without re.S, '.' matches any character except a newline.
- findall returns a list of the matched strings. For the next step, the elements of the list are concatenated into a single string with the join method. ''.join puts no separator between the elements, while '.'.join separates them with a '.'; the separator can be any string.
- Finally, match the uppercase letters, lowercase letters, and digits in the string. At first I matched only [a-z], that is, all lowercase letters. The result is the same, but the problem does not say whether the characters are uppercase letters, lowercase letters, or digits, so adding [A-Z] and [0-9] is more rigorous.
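The points above can be checked offline on a small string. The following is only a minimal sketch: the sample text is made up to imitate two HTML comments, and the pattern '-->(.*)-->' is an assumption about how the block of junk characters is captured, not necessarily the exact expression used on the real page.

```python
import re

# Made-up sample imitating the page source: a short hint comment
# followed by a multi-line comment full of junk characters.
sample = "<!--\nhint\n-->\n\n<!--\n%%$@e_#q\n^&u*(a\n-->\n"
pattern = '-->(.*)-->'  # assumed pattern, for illustration only

# Without re.S, '.' cannot cross a newline, so nothing is found.
no_flag = re.findall(pattern, sample)
# With re.S, '.' also matches newlines, so the multi-line block is captured.
with_flag = re.findall(pattern, sample, re.S)

# join: an empty separator concatenates, any other string is placed between elements.
joined_plain = ''.join(['a', 'b', 'c'])
joined_dots = '.'.join(['a', 'b', 'c'])

# Merge the captured pieces, then keep only letters and digits.
mess = ''.join(with_flag)
chars = ''.join(re.findall(r'[a-z]|[A-Z]|[0-9]', mess))

print(no_flag)
print(with_flag)
print(joined_plain, joined_dots)
print(chars)
```

The three alternatives [a-z]|[A-Z]|[0-9] could also be written as a single class, [a-zA-Z0-9], which matches the same characters.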
Output:
equality
Replace ocr in the URL with equality to enter the next level.
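For readers on Python 3, where urllib2 no longer exists, the same idea might be sketched as follows. The function names fetch, extract_rare_chars, and next_level_url are hypothetical helpers introduced only for this illustration, and the '-->(.*)-->' pattern is again an assumption about the layout of the page source.

```python
import re
from urllib.request import urlopen

LEVEL_URL = 'http://www.pythonchallenge.com/pc/def/ocr.html'


def fetch(url):
    """Download a page and return its source as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')


def extract_rare_chars(page_source):
    """Return the letters and digits found between the first and last '-->'."""
    # re.S lets '.' match newlines, so the capture can span many lines.
    mess = ''.join(re.findall('-->(.*)-->', page_source, re.S))
    return ''.join(re.findall(r'[a-zA-Z0-9]', mess))


def next_level_url(url, answer):
    """Build the next level's URL by swapping the page name for the answer."""
    return url.replace('ocr.html', answer + '.html')
```

With network access, next_level_url(LEVEL_URL, extract_rare_chars(fetch(LEVEL_URL))) would then give the address of the next level, assuming the page is still laid out as described above.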
Python Challenge 2: Web crawling and regular expressions