Python instant markup project: practice notes

Source: Internet
Author: User
These notes are practice that follows the basic Python tutorial. I wrote them to get familiar with Python code and to practice both basic and more advanced syntax; practice makes perfect.

The project started out simple. After refactoring it became somewhat more complicated, but also more flexible.

According to the book, the refactored program is divided into four modules: the handler module, the filter module, the rule module (really the processing rules), and the parser.

First, the handler module. It does two things: it outputs the fixed HTML tags (each tag has a start and an end), and it provides a friendly interface for emitting the start and end of each piece of markup. Take a look at handlers.py:

The code is as follows:


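class Handler:
    '''
    Base class: dispatches start/end/sub calls to methods named by prefix + name.
    '''
    def callback(self, prefix, name, *args):
        # Look up e.g. 'start_paragraph'; do nothing if no such method exists.
        method = getattr(self, prefix + name, None)
        if callable(method): return method(*args)
    def start(self, name):
        self.callback('start_', name)
    def end(self, name):
        self.callback('end_', name)
    def sub(self, name):
        # Returns a replacement function suitable for re.sub.
        def substitution(match):
            result = self.callback('sub_', name, match)
            # No sub_ method defined: leave the matched text unchanged.
            if result is None: result = match.group(0)
            return result
        return substitution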

class HTMLRenderer(Handler):
    '''
    Concrete handler that renders the markup as HTML.
    '''
    def start_document(self):
        print('<html><head><title>...</title></head><body>')
    def end_document(self):
        print('</body></html>')
    def start_paragraph(self):
        print('<p>')
    def end_paragraph(self):
        print('</p>')
    def start_heading(self):
        print('<h2>')
    def end_heading(self):
        print('</h2>')
    def start_list(self):
        print('<ul>')
    def end_list(self):
        print('</ul>')
    def start_listitem(self):
        print('<li>')
    def end_listitem(self):
        print('</li>')
    def start_title(self):
        print('<h1>')
    def end_title(self):
        print('</h1>')
    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)
    def sub_url(self, match):
        return '<a href="%s">%s</a>' % (match.group(1), match.group(1))
    def sub_mail(self, match):
        return '<a href="mailto:%s">%s</a>' % (match.group(1), match.group(1))
    def feed(self, data):
        print(data)

This module is the cornerstone of the whole project: it outputs the tags and performs the string substitutions. It is fairly easy to understand.
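
The name-based dispatch in callback can be tried in isolation. The following is only a minimal sketch: the class MiniHandler and its start_note method are hypothetical and not part of the original program. It shows how getattr turns a prefix plus a name into a method call, and how a missing method is silently skipped.

```python
class MiniHandler:
    """Hypothetical stand-in for Handler; not part of the original code."""

    def callback(self, prefix, name, *args):
        # Look up a method such as 'start_note'; None if it does not exist.
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)
        return None

    def start_note(self):
        return '<note>'


handler = MiniHandler()
found = handler.callback('start_', 'note')       # dispatches to start_note
missing = handler.callback('start_', 'unknown')  # no such method, so None
print(found, missing)
```

Because unknown names are ignored rather than raising an error, a handler only needs to define methods for the markup it cares about.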

Next, the second module, the filters. This one is even simpler: each filter is really just a regular expression paired with a name. The relevant code is as follows:

The code is as follows:


self.addFilter(r'\*(.+?)\*', 'emphasis')
self.addFilter(r'(http://[\.a-z0-9A-Z/]+)', 'url')
self.addFilter(r'([\.a-zA-Z]+@[\.a-zA-Z]+[a-zA-Z]+)', 'mail')

These are the three filters: the emphasis filter (text marked with asterisks), the URL filter, and the email filter. Readers who are familiar with regular expressions should have no trouble understanding them.
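
To see what such a filter does to a block of text, here is a small sketch using re.sub with a function as the replacement. The sample sentence, the helper name emphasize, and the HTML tags in the output are assumptions chosen for illustration; only the two patterns come from the filters above.

```python
import re


def emphasize(match):
    # re.sub calls this once per match and inserts the returned string.
    return '<em>%s</em>' % match.group(1)


text = 'This is *very* important, see http://example.com/docs'

# Emphasis filter: the non-greedy group captures the text between asterisks.
step1 = re.sub(r'\*(.+?)\*', emphasize, text)

# URL filter: the captured address is used as both link target and link text.
step2 = re.sub(r'(http://[\.a-z0-9A-Z/]+)',
               lambda m: '<a href="%s">%s</a>' % (m.group(1), m.group(1)),
               step1)
print(step2)
```

Passing a function instead of a fixed replacement string is what lets the handler decide, per match, how the text should be rendered.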

Now the third module, the rules. Leaving aside the base class Rule, every rule class should have two methods, condition and action. The former decides whether a block of text that was read in matches the rule; the latter performs the operation, which means calling the handler module to output the opening tag, the content, and the closing tag. Here is the code of this module; drawing the relationships between its classes as a class diagram makes them clearer. rules.py:

The code is as follows:


class Rule:
    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block)
        handler.end(self.type)
        # True means the block is fully handled; no further rules are tried.
        return True

class HeadingRule(Rule):
    type = 'heading'
    def condition(self, block):
        # A heading is a single line of at most 70 characters that does not end with a colon.
        return not '\n' in block and len(block) <= 70 and not block[-1] == ':'

class TitleRule(HeadingRule):
    type = 'title'
    first = True

    def condition(self, block):
        # Only the first block of the document can be the title.
        if not self.first: return False
        self.first = False
        return HeadingRule.condition(self, block)

class ListItemRule(Rule):
    type = 'listitem'
    def condition(self, block):
        return block[0] == '-'
    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block[1:].strip())
        handler.end(self.type)
        return True

class ListRule(ListItemRule):
    type = 'list'
    inside = False
    def condition(self, block):
        return True
    def action(self, block, handler):
        # Open the list at the first list item, close it at the first non-item.
        if not self.inside and ListItemRule.condition(self, block):
            handler.start(self.type)
            self.inside = True
        elif self.inside and not ListItemRule.condition(self, block):
            handler.end(self.type)
            self.inside = False
        # False lets the remaining rules process the same block.
        return False

class ParagraphRule(Rule):
    type = 'paragraph'
    def condition(self, block):
        return True
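
The interplay between the list rule and the list item rule is the least obvious part of this module, so here is a hedged, self-contained sketch of it. RecordingHandler is a hypothetical handler that records events instead of printing HTML, and the two rule classes are trimmed copies that do not inherit from Rule; the sample blocks are made up.

```python
class RecordingHandler:
    """Hypothetical handler: records events instead of printing HTML."""

    def __init__(self):
        self.events = []

    def start(self, name):
        self.events.append('start ' + name)

    def end(self, name):
        self.events.append('end ' + name)

    def feed(self, data):
        self.events.append('feed ' + data)


class ListItemRule:
    type = 'listitem'

    def condition(self, block):
        return block[0] == '-'

    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block[1:].strip())
        handler.end(self.type)
        return True  # block fully handled


class ListRule(ListItemRule):
    type = 'list'
    inside = False

    def condition(self, block):
        return True  # always consulted, so it can open and close the list

    def action(self, block, handler):
        is_item = ListItemRule.condition(self, block)
        if not self.inside and is_item:
            handler.start(self.type)
            self.inside = True
        elif self.inside and not is_item:
            handler.end(self.type)
            self.inside = False
        return False  # let later rules see the same block


handler = RecordingHandler()
rules = [ListRule(), ListItemRule()]
for block in ['- apples', '- pears', 'Plain text']:
    for rule in rules:
        if rule.condition(block) and rule.action(block, handler):
            break
print(handler.events)
```

The list rule never consumes a block itself; it only tracks whether a list is currently open, which is why it has to be registered before the list item rule.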

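Supplemental util.py:

The code is as follows:


def lines(file):
    # Yield every line, then one extra newline so the last block is flushed.
    for line in file: yield line
    yield '\n'

def blocks(file):
    # Group consecutive non-blank lines into blocks separated by blank lines.
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []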
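
A quick way to check the block splitting is to feed it an in-memory file. This sketch repeats the two generators so that it runs on its own; the sample text and the use of io.StringIO are assumptions for illustration.

```python
import io


def lines(file):
    for line in file:
        yield line
    yield '\n'  # sentinel blank line so the final block is emitted


def blocks(file):
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []


# The sample has no trailing newline, so only the sentinel flushes '- item'.
sample = io.StringIO('Title\n\nfirst line\nsecond line\n\n\n- item')
result = list(blocks(sample))
print(result)
```

Consecutive blank lines do not produce empty blocks, because a block is only yielded when it contains at least one line.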

Finally, the parser module. Its role is to coordinate reading the text with the other modules. The key point is that it keeps two lists, one of rules and one of filters. This makes the whole program much more flexible, because rules and filters become hot-swappable. That in turn is possible because of how the rules and filters were written earlier: each kind of rule is its own class (and each filter its own pattern) instead of being distinguished by if/else branches. Look at the code:

The code is as follows:


import sys, re
from handlers import *
from util import *
from rules import *

class Parser:
    def __init__(self, handler):
        self.handler = handler
        self.rules = []
        self.filters = []

    def addRule(self, rule):
        self.rules.append(rule)

    def addFilter(self, pattern, name):
        # Wrap the pattern in a function that applies it to a block.
        def filter(block, handler):
            return re.sub(pattern, handler.sub(name), block)
        self.filters.append(filter)

    def parse(self, file):
        self.handler.start('document')
        for block in blocks(file):
            # Apply every filter first, then find the rules that match.
            for filter in self.filters:
                block = filter(block, self.handler)
            for rule in self.rules:
                if rule.condition(block):
                    last = rule.action(block, self.handler)
                    if last: break
        self.handler.end('document')

class BasicTextParser(Parser):
    def __init__(self, handler):
        Parser.__init__(self, handler)
        # Order matters: more specific rules come before ParagraphRule.
        self.addRule(ListRule())
        self.addRule(ListItemRule())
        self.addRule(TitleRule())
        self.addRule(HeadingRule())
        self.addRule(ParagraphRule())

        self.addFilter(r'\*(.+?)\*', 'emphasis')
        self.addFilter(r'(http://[\.a-z0-9A-Z/]+)', 'url')
        self.addFilter(r'([\.a-zA-Z]+@[\.a-zA-Z]+[a-zA-Z]+)', 'mail')

handler = HTMLRenderer()
parser = BasicTextParser(handler)

parser.parse(sys.stdin)

The idea in this module is that the client code (the entry point of the program) plugs in rules and filters, and the parser traverses all of them to process the text that is read in.

One detail is worth pointing out, and it echoes what was written earlier: while traversing the rules, the parser calls condition on each one to decide whether the current rule applies to the block. If a matching rule's action returns True, no further rules are tried for that block.
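
Because the rules are tried in order, the order in which they are plugged in decides the result. The sketch below is a simplification under stated assumptions: first_match is a hypothetical helper that only reports the type of the first rule whose condition holds and ignores the action return value, and the 70-character heading limit is assumed from the book's description of a heading.

```python
class HeadingRule:
    type = 'heading'

    def condition(self, block):
        # Assumed limit: one line, at most 70 characters, no trailing colon.
        return '\n' not in block and len(block) <= 70 and block[-1] != ':'


class ParagraphRule:
    type = 'paragraph'

    def condition(self, block):
        return True  # catch-all


def first_match(block, rules):
    """Hypothetical helper: type of the first rule that accepts the block."""
    for rule in rules:
        if rule.condition(block):
            return rule.type
    return None


specific_first = [HeadingRule(), ParagraphRule()]
catch_all_first = [ParagraphRule(), HeadingRule()]

print(first_match('A short line', specific_first))
print(first_match('First line\nsecond line', specific_first))
print(first_match('A short line', catch_all_first))
```

With the catch-all rule first, the heading rule is never reached, which is why the paragraph rule is added last in the parser.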

I think this program closely resembles the Command pattern. When I have time I should review that pattern to keep it fresh in my memory.

Finally, here is what I think this program could be used for:

1. Code highlighting: rewritten in JavaScript, it could serve as the basis of an online code editor.
2. Learning, and writing my own blog posts.

If you have other ideas, feel free to share them.
A simple class diagram should be enough to illustrate the relationships between the classes. I also suggest that, if reading the code does not make the relationships clear, you draw the diagram yourself to get familiar with the whole structure.
