Python regular expressions have many details that are easy to get wrong, and steady practice is the best way to master them. The example below works through the main functions of the re module: compile, findall, match, split, and sub. I hope you find it useful.
Python code
# encoding=utf-8
'''
Learning Python regular expressions
Url: http://docs.python.org/library/re.html
Parse html url: http://www.boddie.org.uk/python/HTML.html
Author: liuzheng
'''
import re
import urllib.request


# Analyze the javaeye blog channel
class ParseHTML:
    '''
    Parse HTML for information
    Parse a javaeye page
    '''

    def __init__(self, url):
        self.url = url

    # Fetch and analyze the HTML
    def parse(self):
        sock = urllib.request.urlopen(self.url)
        # Decode the raw bytes to text (assumes the page is UTF-8)
        html = sock.read().decode("utf-8", "replace")
        sock.close()
        self._puts(html)

    # Print the data matched in the HTML
    def _puts(self, html):
        # Groups: href, title, target, link text
        b = re.compile(
            r"<a href='([\w./:]+?)'[\s]*title=([^<>]+?)"
            r"[\s]*target=([^<>]+?)>([^<>]+?)</a>", re.I)
        m = b.findall(html)
        # Is there an encoding problem here? I don't know. Can you help me?
        print(m)


if __name__ == '__main__':
    url = "http://www.javaeye.com/blogs"
    p = ParseHTML(url)
    p.parse()
    if __debug__:
        print("debugging is %s" % __debug__)
    print("regular" + "*" * 30)
    # Match
    phone = "800-820-8800"
    m = re.match(r"(\d{3})-(\d{3})-(\d{4})", phone)
    print("result:", m.groups())
    # Split
    print("split: %s" % re.split(r'\W', 'words, words, words.'))
    # Findall
    text = "He was carefully disguised but captured quickly by police."
    print("findall: %s" % re.findall(r"\w+ly", text))
    # Sub
    text = "hello world! "
    print("sub: %s" % re.sub(r"\s+", "--", text))
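
The listing above needs a live page to exercise the link pattern, so here is a minimal offline sketch that runs the same pattern against a small hand-written HTML string and records what each re call returns. It assumes Python 3. SAMPLE_HTML, LINK_PATTERN, and extract_links are hypothetical names introduced only for illustration, and the example.com links are made up.

```python
import re

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = (
    "<a href='http://example.com/blog/1' title=\"First post\" target=_blank>First</a>\n"
    "<a href='http://example.com/blog/2' title=\"Second post\" target=_blank>Second</a>"
)

# Same pattern as in the listing: groups are href, title, target, link text.
LINK_PATTERN = re.compile(
    r"<a href='([\w./:]+?)'\s*title=([^<>]+?)\s*target=([^<>]+?)>([^<>]+?)</a>",
    re.I,
)


def extract_links(html):
    """Return one (href, title, target, text) tuple per matching anchor."""
    return LINK_PATTERN.findall(html)


links = extract_links(SAMPLE_HTML)

# match: anchored at the start of the string, groups() returns the captures.
groups = re.match(r"(\d{3})-(\d{3})-(\d{4})", "800-820-8800").groups()

# split: r"\W" splits on every single non-word character, so adjacent
# separators (", ") leave empty strings in the result.
pieces = re.split(r"\W", "words, words, words.")

# findall: every non-overlapping word ending in "ly".
adverbs = re.findall(
    r"\w+ly", "He was carefully disguised but captured quickly by police."
)

# sub: replace each run of whitespace with "--".
joined = re.sub(r"\s+", "--", "hello world! ")

for name, value in [("links", links), ("groups", groups), ("pieces", pieces),
                    ("adverbs", adverbs), ("joined", joined)]:
    print(name, "=", value)
```

Note that the title group keeps its surrounding quotes, because the pattern captures everything between title= and target=. The empty strings from split disappear if the pattern is changed to r"\W+", which treats a run of separators as one.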
That concludes this walkthrough of Python regular expressions.