Python real-time tagging project exercise notes

Last Update: 2016-06-06 Source: Internet

Author: User


This is the practice project that follows the basic Python tutorial. I wrote it up to get familiar with Python code and to practice both basic and less basic syntax, since practice makes perfect.

The project was simple at first. After refactoring it became somewhat more complicated, but also more flexible.

According to the book, the refactored program is divided into four modules: the handler module, the filter module, the rule module (really the processing rules), and the parser.

First, the handler module. It has two jobs: it produces the fixed HTML tags (each tag has a start and an end), and it provides a friendly interface for emitting the start and end of each piece of markup. Take a look at handlers.py:

The code is as follows:

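class Handler:
    """
    Receives calls from the parser. start(), end(), and sub() look up a
    method named prefix + name (for example start_paragraph) and call it
    if it exists; unknown names are silently ignored.
    """
    def callback(self, prefix, name, *args):
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)

    def start(self, name):
        self.callback('start_', name)

    def end(self, name):
        self.callback('end_', name)

    def sub(self, name):
        # Returns a replacement function suitable for re.sub().
        def substitution(match):
            result = self.callback('sub_', name, match)
            if result is None:
                # No sub_ method for this name: keep the matched text unchanged.
                result = match.group(0)
            return result
        return substitution


class HTMLRenderer(Handler):
    """
    A concrete handler that renders HTML. Every method except feed() is
    reached through start(), end(), or sub() of the Handler base class.
    """
    def start_document(self):
        print('<html><head><title>...</title></head><body>')

    def end_document(self):
        print('</body></html>')

    def start_paragraph(self):
        print('<p>')

    def end_paragraph(self):
        print('</p>')

    def start_heading(self):
        print('<h2>')

    def end_heading(self):
        print('</h2>')

    def start_list(self):
        print('<ul>')

    def end_list(self):
        print('</ul>')

    def start_listitem(self):
        print('<li>')

    def end_listitem(self):
        print('</li>')

    def start_title(self):
        print('<h1>')

    def end_title(self):
        print('</h1>')

    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)

    def sub_url(self, match):
        return '<a href="%s">%s</a>' % (match.group(1), match.group(1))

    def sub_mail(self, match):
        return '<a href="mailto:%s">%s</a>' % (match.group(1), match.group(1))

    def feed(self, data):
        print(data)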

This module is the cornerstone of the whole project: it produces the tag output and the string substitutions. It is fairly easy to understand.

Next is the second module, the filters. It is even simpler: each filter is just a regular expression string paired with the name of a substitution.
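
To make the name-based dispatch concrete, here is a minimal sketch. CollectingRenderer is a hypothetical handler invented for this illustration (it is not part of the project); it stores output in a list instead of printing, so the effect of each call is easy to inspect.

```python
class Handler:
    """Dispatches to start_/end_ methods by name and ignores unknown names."""

    def callback(self, prefix, name, *args):
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)

    def start(self, name):
        self.callback('start_', name)

    def end(self, name):
        self.callback('end_', name)


class CollectingRenderer(Handler):
    """Hypothetical handler that collects output in a list."""

    def __init__(self):
        self.out = []

    def start_paragraph(self):
        self.out.append('<p>')

    def end_paragraph(self):
        self.out.append('</p>')

    def feed(self, data):
        self.out.append(data)


renderer = CollectingRenderer()
renderer.start('paragraph')   # resolves to start_paragraph
renderer.feed('Hello')
renderer.end('paragraph')     # resolves to end_paragraph
renderer.start('footnote')    # no start_footnote method, so nothing happens
print(renderer.out)
```

Because the lookup goes through getattr with a default of None, supporting a new tag only requires adding the matching start_ and end_ methods to the renderer.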

The code is as follows:

self.addFilter(r'\*(.+?)\*', 'emphasis')
self.addFilter(r'(http://[\.a-z0-9A-Z/]+)', 'url')
self.addFilter(r'([\.a-zA-Z]+@[\.a-zA-Z]+[a-zA-Z]+)', 'mail')

These are the three filters: the emphasis filter (text marked with *), the URL filter, and the email filter. Readers familiar with regular expressions should have no trouble with them.

Next is the third module, the rules. Leaving aside the base class Rule, every rule class should have two methods, condition and action. condition decides whether the block that was read matches the rule. action does the work, which means calling the handler module to output the opening tag, the content, and the closing tag. The relationships among the classes are clearer when drawn as a class diagram. Here is rules.py:
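
As a rough illustration of what the three patterns match, the sketch below applies them directly with re.sub. The replacement strings and the apply_filters helper are assumptions made for this example; in the project itself the replacement is produced by the handler's sub_ methods.

```python
import re

# (pattern, replacement) pairs; the replacements are illustrative only.
FILTERS = [
    (r'\*(.+?)\*', r'<em>\1</em>'),
    (r'(http://[\.a-z0-9A-Z/]+)', r'<a href="\1">\1</a>'),
    (r'([\.a-zA-Z]+@[\.a-zA-Z]+[a-zA-Z]+)', r'<a href="mailto:\1">\1</a>'),
]


def apply_filters(text):
    """Run every filter over the text, in order."""
    for pattern, replacement in FILTERS:
        text = re.sub(pattern, replacement, text)
    return text


print(apply_filters('This is *important*.'))
print(apply_filters('See http://example.com/docs now'))
print(apply_filters('Write to foo@example.com'))
```

The non-greedy (.+?) in the emphasis pattern keeps two emphasized words on the same line from being merged into one match.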

The code is as follows:

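class Rule:
    """Base class for all rules."""
    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block)
        handler.end(self.type)
        # True tells the parser to stop trying further rules for this block.
        return True


class HeadingRule(Rule):
    """A heading is a single short line that does not end with a colon."""
    type = 'heading'

    def condition(self, block):
        # Assumption: a 70-character limit (the number is missing in this listing).
        return '\n' not in block and len(block) <= 70 and not block[-1] == ':'


class TitleRule(HeadingRule):
    """The title is the first block, provided it qualifies as a heading."""
    type = 'title'
    first = True

    def condition(self, block):
        if not self.first:
            return False
        self.first = False
        return HeadingRule.condition(self, block)


class ListItemRule(Rule):
    """A list item is a block that begins with a hyphen."""
    type = 'listitem'

    def condition(self, block):
        return block[0] == '-'

    def action(self, block, handler):
        handler.start(self.type)
        # Drop the leading hyphen before passing the text on.
        handler.feed(block[1:].strip())
        handler.end(self.type)
        return True


class ListRule(ListItemRule):
    """
    Opens a list at the first list item and closes it at the first block
    after the list that is not a list item.
    """
    type = 'list'
    inside = False

    def condition(self, block):
        return True

    def action(self, block, handler):
        if not self.inside and ListItemRule.condition(self, block):
            handler.start(self.type)
            self.inside = True
        elif self.inside and not ListItemRule.condition(self, block):
            handler.end(self.type)
            self.inside = False
        # False lets the parser continue with the remaining rules.
        return False


class ParagraphRule(Rule):
    """A paragraph is any block not claimed by another rule."""
    type = 'paragraph'

    def condition(self, block):
        return True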
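
The way condition selects a rule can be tried out in isolation. The sketch below is a simplification: it keeps only the condition methods of three rules, and the classify helper and the 70-character limit are assumptions made for the example.

```python
class ListItemRule:
    type = 'listitem'

    def condition(self, block):
        return block.startswith('-')


class HeadingRule:
    type = 'heading'

    def condition(self, block):
        # Assumed limit of 70 characters for a heading line.
        return '\n' not in block and len(block) <= 70 and not block.endswith(':')


class ParagraphRule:
    type = 'paragraph'

    def condition(self, block):
        return True


# Order matters: the first rule whose condition holds wins.
RULES = [ListItemRule(), HeadingRule(), ParagraphRule()]


def classify(block):
    """Return the type of the first matching rule (hypothetical helper)."""
    for rule in RULES:
        if rule.condition(block):
            return rule.type


for sample in ['- buy milk', 'Chapter One', 'The ingredients are:', 'A line\nand another']:
    print(repr(sample), '->', classify(sample))
```

ParagraphRule always matches, so it has to come last; placed earlier it would hide every rule after it.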

The supporting module util.py:

The code is as follows:

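def lines(file):
    # Yield every line, then one extra newline so the last block is flushed.
    for line in file:
        yield line
    yield '\n'


def blocks(file):
    # Collect consecutive non-empty lines and yield them as one block.
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []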
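
A quick way to see what these two generators produce is to feed them an in-memory file. The sample text and the use of io.StringIO are assumptions for this illustration; the project itself reads from sys.stdin.

```python
import io


def lines(file):
    for line in file:
        yield line
    yield '\n'   # sentinel blank line so the final block is emitted


def blocks(file):
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []


sample = io.StringIO('Title\n\nFirst line\nsecond line\n\n\n- item')
result = list(blocks(sample))
print(result)   # ['Title', 'First line\nsecond line', '- item']
```

Without the extra newline yielded by lines, the last block ('- item' here) would never be emitted, because the input does not end with a blank line.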

Finally, the parser module. Its job is to coordinate reading the text with the other modules. The key point is that it holds two lists, rules and filters. This makes the whole program much more flexible, because rules and filters become hot-swappable. That in turn is possible because each kind of rule is written as a separate class, and each filter as a separate pattern, instead of being distinguished by if/else branches. Look at the code:

The code is as follows:

import sys, re
from handlers import *
from util import *
from rules import *


class Parser:
    """Reads a text file, applying the registered filters and rules."""
    def __init__(self, handler):
        self.handler = handler
        self.rules = []
        self.filters = []

    def addRule(self, rule):
        self.rules.append(rule)

    def addFilter(self, pattern, name):
        # Wrap the pattern in a function that substitutes via the handler.
        def filter(block, handler):
            return re.sub(pattern, handler.sub(name), block)
        self.filters.append(filter)

    def parse(self, file):
        self.handler.start('document')
        for block in blocks(file):
            for filter in self.filters:
                block = filter(block, self.handler)
            for rule in self.rules:
                if rule.condition(block):
                    last = rule.action(block, self.handler)
                    # A rule that returns True ends processing of this block.
                    if last:
                        break
        self.handler.end('document')


class BasicTextParser(Parser):
    """A parser preloaded with the rules and filters for this project."""
    def __init__(self, handler):
        Parser.__init__(self, handler)
        # Rules are tried in the order they are added.
        self.addRule(ListRule())
        self.addRule(ListItemRule())
        self.addRule(TitleRule())
        self.addRule(HeadingRule())
        self.addRule(ParagraphRule())

        self.addFilter(r'\*(.+?)\*', 'emphasis')
        self.addFilter(r'(http://[\.a-z0-9A-Z/]+)', 'url')
        self.addFilter(r'([\.a-zA-Z]+@[\.a-zA-Z]+[a-zA-Z]+)', 'mail')


handler = HTMLRenderer()
parser = BasicTextParser(handler)

parser.parse(sys.stdin)

The idea in this module is that the client (the entry point of the program) plugs in rules and filters, and the parser iterates over all of them to process the text it reads.

One detail worth mentioning, which ties back to the rules module: while iterating over the rules, the parser calls condition on each one to decide whether the current block matches it.

I think this program closely resembles the command pattern. When I have time I should review that pattern to keep it fresh in memory.

Finally, what I think this program could be used for:

1. Code highlighting. Rewritten in JavaScript, it could form the basis of an online code editor.
2. Learning, and writing my own blog posts.

If you have other ideas, feel free to leave a comment.
I have added a simple class diagram; it should be enough to show the relationships between the classes. I also suggest that if the relationships are unclear from the code alone, you draw a diagram yourself to become familiar with the whole structure.
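
The hot-swappable idea can be sketched without the rest of the project. MiniParser below is a hypothetical, trimmed-down stand-in: rules are plain (condition, tag) pairs and filters are (pattern, replacement) pairs instead of the project's classes and handler callbacks, and the backtick filter is invented for the example.

```python
import re


class MiniParser:
    """Hypothetical simplified parser: rules and filters are just lists."""

    def __init__(self):
        self.rules = []
        self.filters = []

    def add_rule(self, condition, tag):
        self.rules.append((condition, tag))

    def add_filter(self, pattern, replacement):
        self.filters.append((pattern, replacement))

    def parse(self, blocks):
        out = []
        for block in blocks:
            # Every filter is applied to every block.
            for pattern, replacement in self.filters:
                block = re.sub(pattern, replacement, block)
            # Only the first matching rule handles the block.
            for condition, tag in self.rules:
                if condition(block):
                    out.append('<%s>%s</%s>' % (tag, block, tag))
                    break
        return out


parser = MiniParser()
parser.add_rule(lambda b: '\n' not in b and len(b) <= 70, 'h2')
parser.add_rule(lambda b: True, 'p')
parser.add_filter(r'\*(.+?)\*', r'<em>\1</em>')

# Plugging in one more filter requires no change to MiniParser itself.
parser.add_filter(r'`(.+?)`', r'<code>\1</code>')

html = parser.parse(['Notes', 'Use `blocks` to split *all*\nof the input.'])
print(html)
```

The parse loop never names a specific rule or filter, which is what lets new ones be registered from the outside.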



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
