Python Text Parser

Source: Internet
Author: User

I. Course introduction

This course walks through a small Python program that parses plain text and generates an HTML page.

II. Related technologies

Python: an object-oriented, interpreted programming language that can be used for web development, graphics processing, text processing, mathematical computation, and more.

HTML: HyperText Markup Language, used primarily to build web pages.

III. The project

The plain-text input file (blocks are separated by blank lines):

    Welcome to ShiYanLou

    ShiYanLou is the first experiment with IT as the core of online education platform.*Our aim is to do the experiment, easy to learn IT*.

    Course

    -Basic Course

    -Project Course

    -Evaluation Course

    Contact us

    -Web:http://www.shiyanlou.com

    -QQ Group:241818371

    -E-mail:[email protected]

After parsing, the program generates an HTML page from this text.

IV. Project explanation

1. Text block generator

First we need a text block generator that splits the plain text into separate blocks, so that each block can be parsed on its own. The code for util.py is as follows:

    #!/usr/bin/python
    # encoding: utf-8

    def lines(file):
        """Generator: yield each line, then one extra blank line."""
        for line in file:
            yield line
        yield '\n'

    def blocks(file):
        """Generator: yield separate text blocks."""
        block = []
        for line in lines(file):
            if line.strip():
                block.append(line)
            elif block:
                # The source listing breaks off after "yield"; the rest of
                # this branch is reconstructed from context.
                yield ''.join(block).strip()
                block = []
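
To see what the generator produces, the sketch below runs the same two functions on an in-memory string instead of a file. It assumes Python 3 and uses io.StringIO as a stand-in for the input file; the sample text is made up for illustration.

```python
import io

def lines(file):
    """Yield every line of the file, then one extra blank line."""
    for line in file:
        yield line
    yield '\n'

def blocks(file):
    """Group consecutive non-blank lines into stripped text blocks."""
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []

# Hypothetical sample input: a heading, a two-line paragraph, one list item.
sample = io.StringIO("Welcome\n\nfirst line\nsecond line\n\n\n-item\n")
result = list(blocks(sample))
print(result)
```

The extra blank line that lines() appends is what flushes the last block: without it, a file that does not end in a blank line would lose its final block.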

2. Handlers

The text block generator gives us the blocks; next we need a handler that wraps the different kinds of block in the corresponding HTML tags. The code for handlers.py is as follows:

    #!/usr/bin/python
    # encoding: utf-8

    class Handler:
        """Handler parent class."""

        def callback(self, prefix, name, *args):
            # Look up a method such as start_paragraph and call it if present.
            method = getattr(self, prefix + name, None)
            if callable(method):
                return method(*args)

        def start(self, name):
            self.callback('start_', name)

        def end(self, name):
            self.callback('end_', name)

        def sub(self, name):
            # Return a function suitable as the replacement argument of re.sub.
            def substitution(match):
                result = self.callback('sub_', name, match)
                if result is None:
                    result = match.group(0)
                return result
            return substitution

    class HTMLRenderer(Handler):
        """HTML handler: adds the corresponding HTML tags to each text block."""

        # NOTE: the string literals for the document, heading and title tags
        # were lost from the source listing; the plain tags below are assumed.
        def start_document(self):
            print('<html><body>')

        def end_document(self):
            print('</body></html>')

        def start_paragraph(self):
            print('<p style="color: #444;">')

        def end_paragraph(self):
            print('</p>')

        def start_heading(self):
            print('<h2>')

        def end_heading(self):
            print('</h2>')

        def start_list(self):
            print('<ul style="color: #363736;">')

        def end_list(self):
            print('</ul>')

        def start_listitem(self):
            print('<li>')

        def end_listitem(self):
            print('</li>')

        def start_title(self):
            print('<h1>')

        def end_title(self):
            print('</h1>')

        def sub_emphasis(self, match):
            return '<em>%s</em>' % match.group(1)

        def sub_url(self, match):
            return '<a target="_blank" style="text-decoration: none;color: #BC1A4B;" href="%s">%s</a>' % (match.group(1), match.group(1))

        def sub_mail(self, match):
            return '<a style="text-decoration: none;color: #BC1A4B;" href="mailto:%s">%s</a>' % (match.group(1), match.group(1))

        def feed(self, data):
            print(data)
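
The name-based dispatch in callback and sub can be tried in isolation. The following is a minimal sketch, assuming Python 3; DemoHandler and the sample sentence are hypothetical and exist only for this example. It shows that a sub_ method that exists rewrites the match, while a missing one leaves the text unchanged.

```python
import re

class Handler:
    def callback(self, prefix, name, *args):
        # Find a method by its composed name; do nothing if it is absent.
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)

    def sub(self, name):
        def substitution(match):
            result = self.callback('sub_', name, match)
            if result is None:
                # No matching sub_ method: keep the matched text as it was.
                result = match.group(0)
            return result
        return substitution

class DemoHandler(Handler):
    # Hypothetical subclass with a single substitution method.
    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)

handler = DemoHandler()
text = 'this *really* works'
emphasized = re.sub(r'\*(.+?)\*', handler.sub('emphasis'), text)
untouched = re.sub(r'\*(.+?)\*', handler.sub('missing'), text)
print(emphasized)
print(untouched)
```

Because unknown names are silently ignored, a new renderer only has to define the methods it cares about.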

3. Rules

With the handler and the text block generator in place, we need rules that decide which kind of markup each block is passed to the handler with. The code for rules.py is as follows:

    #!/usr/bin/python
    # encoding: utf-8

    class Rule:
        """Rule parent class."""

        def action(self, block, handler):
            """Mark up the block."""
            handler.start(self.type)
            handler.feed(block)
            handler.end(self.type)
            return True

    class HeadingRule(Rule):
        """Heading rule."""
        type = 'heading'

        def condition(self, block):
            """Decide whether the text block matches this rule."""
            return not '\n' in block and len(block) <= 70 and not block[-1] == ':'

    class TitleRule(HeadingRule):
        """Title rule: only the first block can be the title."""
        type = 'title'
        first = True

        def condition(self, block):
            if not self.first:
                return False
            self.first = False
            return HeadingRule.condition(self, block)

    class ListItemRule(Rule):
        """List item rule."""
        type = 'listitem'

        def condition(self, block):
            return block[0] == '-'

        def action(self, block, handler):
            handler.start(self.type)
            handler.feed(block[1:].strip())
            handler.end(self.type)
            return True

    class ListRule(ListItemRule):
        """List rule: opens and closes the list around consecutive items."""
        type = 'list'
        inside = False

        def condition(self, block):
            return True

        def action(self, block, handler):
            if not self.inside and ListItemRule.condition(self, block):
                handler.start(self.type)
                self.inside = True
            elif self.inside and not ListItemRule.condition(self, block):
                handler.end(self.type)
                self.inside = False
            return False

    class ParagraphRule(Rule):
        """Paragraph rule."""
        type = 'paragraph'

        def condition(self, block):
            return True
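
The way the conditions sort blocks into types can be illustrated without the classes. The sketch below, assuming Python 3, copies only the list item and heading conditions into plain functions and tries them in order, with the paragraph as the fallback; classify and the sample blocks are hypothetical, and the stateful TitleRule and ListRule are deliberately left out.

```python
def is_listitem(block):
    # Same test as ListItemRule.condition: the block starts with a dash.
    return block[0] == '-'

def is_heading(block):
    # Same test as HeadingRule.condition: one short line not ending in ':'.
    return '\n' not in block and len(block) <= 70 and block[-1] != ':'

def classify(block):
    """Return the type of the first condition that matches the block."""
    checks = [
        ('listitem', is_listitem),
        ('heading', is_heading),
        ('paragraph', lambda block: True),  # fallback, always matches
    ]
    for name, condition in checks:
        if condition(block):
            return name

samples = ['Course', '-Basic Course', 'Contact us:', 'A first line\nand a second line']
labels = [classify(block) for block in samples]
print(labels)
```

The order of the checks matters: the paragraph condition accepts everything, so it has to come last.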

4. Parsing

Finally, we can do the parsing. The code for markup.py is as follows:

    #!/usr/bin/python
    # encoding: utf-8
    import sys, re
    from handlers import *
    from util import *
    from rules import *

    class Parser:
        """Parser parent class."""

        def __init__(self, handler):
            self.handler = handler
            self.rules = []
            self.filters = []

        def addRule(self, rule):
            """Add a rule."""
            self.rules.append(rule)

        def addFilter(self, pattern, name):
            """Add a filter."""
            def filter(block, handler):
                return re.sub(pattern, handler.sub(name), block)
            self.filters.append(filter)

        def parse(self, file):
            """Parse the file block by block."""
            self.handler.start('document')
            for block in blocks(file):
                for filter in self.filters:
                    block = filter(block, self.handler)
                for rule in self.rules:
                    if rule.condition(block):
                        last = rule.action(block, self.handler)
                        if last:
                            break
            self.handler.end('document')

    class BasicTextParser(Parser):
        """Plain text parser."""

        def __init__(self, handler):
            # The start of this method is missing from the source listing;
            # the base-class call and the rule registrations (most specific
            # rule first) are reconstructed from context.
            Parser.__init__(self, handler)
            self.addRule(ListRule())
            self.addRule(ListItemRule())
            self.addRule(TitleRule())
            self.addRule(HeadingRule())
            self.addRule(ParagraphRule())
            self.addFilter(r'\*(.+?)\*', 'emphasis')
            self.addFilter(r'(http://[\.a-zA-Z/]+)', 'url')
            # The middle of this pattern was obscured in the source;
            # the "+@" part is assumed.
            self.addFilter(r'([\.a-zA-Z]+@[\.a-zA-Z]+[a-zA-Z]+)', 'mail')

    # Run the program
    handler = HTMLRenderer()
    parser = BasicTextParser(handler)
    parser.parse(sys.stdin)
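
The filter step amounts to running re.sub over each block once per registered pattern. As a rough sketch, assuming Python 3, the example below applies the emphasis and URL patterns with fixed replacement strings instead of handler methods; apply_filters, the simplified replacement markup, and the sample block are assumptions made for illustration.

```python
import re

# (pattern, replacement) pairs; \1 refers to the first captured group.
filters = [
    (r'\*(.+?)\*', r'<em>\1</em>'),
    (r'(http://[\.a-zA-Z/]+)', r'<a href="\1">\1</a>'),
]

def apply_filters(block):
    """Run every filter over the block, feeding each result to the next."""
    for pattern, replacement in filters:
        block = re.sub(pattern, replacement, block)
    return block

html = apply_filters('*Web*: http://www.shiyanlou.com')
print(html)
```

In markup.py the replacement is the function returned by handler.sub(name) rather than a fixed string, which lets a different handler produce different markup from the same patterns.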

Run the program (the plain-text file is test.txt; the generated HTML file is test.html):

    python markup.py < test.txt > test.html

V. Code download

You can download the relevant code for this course using the following command:

    git clone http://git.shiyanlou.com/shiyanlou/python_markup

VI. Summary

In this small program we used Python to parse a plain-text file and generate an HTML file. It is only a simple implementation; as an exercise, you can try extending it to parse Markdown files.
