Highlight the org-mode code block with Python

Last Update:2015-01-09 Source: Internet

Author: User

Tags: lexer, virtualenv


1 Preface

Recently I have been studying how to write a blog with org-mode. Almost everything about it has won me over; my one complaint is how code is colored when publishing to HTML. Org-mode uses the htmlize plug-in to color the code in src blocks, so a code block in the published article gets the same colors you see in Emacs. The problem is that my Emacs background is dark and my blog is light, so the highlight style clashes, not to mention that the single highlight theme cannot be customized and the generated line numbers are ugly. All of this could certainly be solved in elisp, but it would be complicated and obscure (color themes ...). So I once again threw myself into the arms of the almighty Python and used its pygments library directly to highlight the code.

2 The implementation framework

First, an introduction to pygments. Pygments can highlight a piece of source code and output it as HTML, LaTeX, PNG, and other formats, and it provides various style controls. Because pygments is a Python library, controlling org-mode publishing from an elisp plug-in is unrealistic. After thinking it over, the only option is to start from the HTML file published by org-mode and change the HTML of its code blocks. Let's look at the HTML that org-mode outputs for a code block.

src block:

    #+begin_src python
    import pygments
    print "aa"
    #+end_src

Output HTML:

    <div class="org-src-container">
    <pre class="src src-python"><span style="color: #66D9EF;">import</span> pygments
    <span style="color: #66D9EF;">print</span> <span style="color: #E6DB74;">"aa"</span>
    </pre>
    </div>

After a code block is exported to HTML, it is always wrapped in <div class="org-src-container">...</div>, and the code language is given by the class attribute of <pre>. The goal is to use pygments to replace the HTML above with the highlight theme we want.
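To make the structure just described concrete, here is a minimal sketch that reads the language and the plain source text back out of such a block. It uses only the Python 3 standard library, not the BeautifulSoup approach the article takes later, and the names ORG_HTML, OrgSrcInspector and inspect_org_html are hypothetical. It also assumes the language appears as a class of the form src-LANG on <pre>.

```python
from html.parser import HTMLParser

# Sample org-mode export, following the structure described above.
ORG_HTML = (
    '<div class="org-src-container">\n'
    '<pre class="src src-python">'
    '<span style="color: #66D9EF;">import</span> pygments\n'
    '<span style="color: #66D9EF;">print</span> '
    '<span style="color: #E6DB74;">"aa"</span>\n'
    '</pre>\n'
    '</div>'
)


class OrgSrcInspector(HTMLParser):
    """Collect (language, plain code) for each <pre> with a src-LANG class."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # finished (language, code) tuples
        self._language = None   # set while we are inside a <pre>
        self._chunks = []       # text pieces of the current <pre>

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            classes = (dict(attrs).get("class") or "").split()
            # The language is whatever follows "src-", e.g. "src-python".
            langs = [c[len("src-"):] for c in classes if c.startswith("src-")]
            self._language = langs[0] if langs else "text"
            self._chunks = []

    def handle_data(self, data):
        # Text between tags; the <span> wrappers themselves are ignored.
        if self._language is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self._language is not None:
            self.blocks.append((self._language, "".join(self._chunks)))
            self._language = None


def inspect_org_html(html_text):
    parser = OrgSrcInspector()
    parser.feed(html_text)
    parser.close()
    return parser.blocks
```

Calling inspect_org_html(ORG_HTML) yields one tuple with the language name and the code stripped of its <span> coloring, which is exactly the input a highlighter needs.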
Program process:

extract: pull the HTML contained in <div class="org-src-container">...</div> out into array A
mark: use BeautifulSoup to parse the HTML in array A and strip its HTML tags, leaving the original source code
highlight: use pygments to highlight the source code and output new HTML
replace: replace the old HTML with the new HTML and write the file back
style: specify or design the CSS for the code blocks

3 Specific implementation

The complete code can be found in my _pygment-html.py.

3.1 Virtual environment

Because my Jekyll site needs several Python scripts for processing, I first set up a virtual environment and develop all the scripts inside it.

3.1.1 Virtualenv

Virtualenv creates isolated Python environments that are independent of one another. It lets you:

- install new packages when you do not have permission to install them system-wide
- use different package versions for different applications
- upgrade packages without affecting other applications

Install:

    pip install virtualenv

Create a virtual environment:

    virtualenv /your/path/of/env

In virtualenv releases before 1.7, the virtual environment by default depends on the site packages of the system environment, that is, third-party packages installed in the system are also visible in the virtual environment. If you do not want to rely on these packages, add the parameter --no-site-packages when creating the environment. From 1.7 on, this isolation is the default.

Start the virtual environment:

    cd /your/path/of/env
    source ./bin/activate

Note that the command prompt now shows an extra (ENV), which is the name of the virtual environment. From then on, all modules are installed only in this directory.

Exit the virtual environment:

    deactivate

3.1.2 VirtualenvWrapper

Virtualenv is very useful, but operating it is cumbersome (think of switching back and forth among several environments).
Virtualenvwrapper can therefore be used to simplify the operation:

- keep all virtual environments in one directory
- manage virtual environments (add, delete, copy)
- switch between virtual environments
- ...

Install:

    pip install virtualenvwrapper

Write the following to .bashrc/.zshrc:

    if [ `id -u` != '0' ]; then
        export VIRTUALENV_USE_DISTRIBUTE=1        # <-- Always use pip/distribute
        export WORKON_HOME=$HOME/.virtualenvs     # <-- Where all virtualenvs will be stored
        source /usr/local/bin/virtualenvwrapper.sh
        export PIP_VIRTUALENV_BASE=$WORKON_HOME
        export PIP_RESPECT_VIRTUALENV=true
    fi

Create the $HOME/.virtualenvs directory; new virtualenvs can then be created in it. If you do not want a virtualenv to live there, you can create just a symbolic link to it instead.

Usage:

list virtual environments: workon or lsvirtualenv
create a virtual environment: mkvirtualenv [virtual environment name]
start/switch virtual environment: workon [virtual environment name]
delete a virtual environment: rmvirtualenv [virtual environment name]
leave the virtual environment: deactivate

Note that once you enter ENV, the python you invoke is the one under the ENV/bin directory, so a #!/usr/bin/python line at the top of a script is useless. You need to call the python interpreter explicitly when running the script.

3.1.3 Shell script for installing ENV

The whole ENV directory is not suitable for uploading to the GitHub Pages repository (various page build errors occur after uploading), so I wrote a shell script that installs ENV:

    mkdir _py_virtualenv
    pip2 install virtualenv &&
    virtualenv _py_virtualenv --no-site-packages &&
    source _py_virtualenv/bin/activate &&
    pip2 install pygments &&
    pip2 install beautifulsoup4

Run this script only with source; it cannot be run as an executable file.
This is because source executes the script directly in the current shell environment, whereas running it as an executable file executes it in a new subshell (where the source step produces an error).

3.2 Encoding problems

I am using Python 2.7, whose handling of encodings has long been criticized. Python 2.7 uses ASCII as its default encoding, and when non-ASCII text appears in a program, Python often reports an error such as:

    UnicodeEncodeError: 'ascii' codec can't encode characters blalbla

There are two ways to deal with this. One is to append encode("utf8") to every string that involves non-ASCII text. This is cumbersome, and missing even one call still produces plenty of errors, so it is not recommended. The other, which is the one I use, is to change the interpreter's default encoding to utf8 when the program starts:

    import sys
    # Python 2 only: reload(sys) restores setdefaultencoding, which site.py removes at startup
    reload(sys)
    sys.setdefaultencoding('utf8')

3.3 Command line interaction

The script is run from the command line. The files to highlight are passed by the user as command line arguments, which the sys module exposes, so shell features can be used to supply them. The code is as follows:

    if len(sys.argv) == 1:
        print 'No Arguments!'
    else:
        for file in sys.argv[1:]:
            # only process HTML files
            if '.html' in file:
                hightlight_instance = Pygments_Html(file)
                hightlight_instance.colorize()

sys.argv is a list: sys.argv[0] is the program name and sys.argv[1:] holds the command line arguments.

3.4 Pygments_Html

Pygments_Html is a class I wrote for highlighting code. It contains only two methods: __init__ and colorize.

3.4.1 __init__

The initialization function:

    def __init__(self, file):
        self.filename = file
        self.language_dict = {'sh': 'sh', 'matlab': 'matlab', 'C': 'C', 'C++': 'C++',
                              'css': 'css', 'python': 'python', 'scheme': 'scheme',
                              'latex': 'latex', 'ruby': 'ruby', 'html': 'html',
                              'others': 'text'}

filename is the HTML file to be processed. language_dict maps the language names supported by org-mode to the language names supported by pygments (the two differ slightly). If a language in org-mode is not supported by pygments, it is mapped to text and output as plain text.

Note: the languages supported by org-mode can be listed with ls /usr/share/emacs/site-lisp/org-mode/ob, and those supported by pygments are listed at pygments.org/docs/lexers.

3.4.2 colorize

The highlighting function, which highlights the code blocks contained in the file filename.

Read file: first read the file's contents into file_read:

    try:
        # open the html
        file = open(self.filename, 'r')
    except IOError:
        print self.filename, 'not exists'
        return
    file_read = file.read()
    print "Opening", self.filename
    file.close()

RE: then extract the HTML contained in <div class="org-src-container">...</div>:

    import re
    src_html_list = re.findall(r'<div class="org-src-container">.*?</div>', file_read, re.S)

The extraction uses the re module. .*? is a lazy match: it matches as few characters as possible. Starting from the first character, as soon as the pattern is satisfied the match is saved to the result list and the search continues. Its opposite is the greedy match .*. re.S is a regular expression flag, needed because the text being searched spans multiple lines. Without this flag, . does not match a newline, so a match cannot extend across lines; with it, .* matches any character, including the newline \n.

BeautifulSoup: next, process each element of src_html_list:

    from bs4 import BeautifulSoup
    for src_html in src_html_list:
        soup = BeautifulSoup(src_html, 'html.parser')
        src_soup = soup.find("div", class_="org-src-container")
        # the classes of <pre> are e.g. ['src', 'src-python']
        language = (src_soup.pre['class'][1]).split('-')[1]

Here BeautifulSoup parses the HTML in src_html. soup.find takes two arguments: the first is the tag to search for, and class_ is the class attribute of that tag. It returns a soup object, src_soup, that satisfies both conditions. The language of the code block is stored in the class attribute of <pre> and is extracted into language. Then map it to the language name supported by pygments:

    if language in self.language_dict:
        language = self.language_dict[language]
    else:
        language = self.language_dict['others']

Pygments: now pygments can be used to highlight the code:

    from pygments import highlight
    from pygments.lexers import get_lexer_by_name
    from pygments.formatters import HtmlFormatter

    lexer = get_lexer_by_name(language, stripall=True)
    formatter = HtmlFormatter(linespans='line', cssclass="highlight")
    src_colorized = highlight(src_soup.text, lexer, formatter)

pygments is a Python library for highlighting code. src_soup.text, passed to highlight, strips all HTML tags from the soup object, leaving only the plain-text source code. The highlight function has three parameters: the first is the code string to highlight, the second is the lexer, which specifies the code language, and the third is the formatter, which specifies the output format. Here the formatter is HtmlFormatter, so the output is HTML. cssclass specifies the class name of the wrapping <div>, and linespans='line' wraps each line in a <span> whose id is prefixed with line. The output format is as follows:

    <div class="highlight"><pre>
    <span id="line-1">...</span>
    <span id="line-2">...</span>
    <span id="line-3">...</span>
    <span id="line-4">...</span>
    </pre></div>

This makes it easy to write CSS against .highlight and control the styles of the code and the line numbers.

Replace: src_colorized now stores the HTML highlighted by pygments, ready to take the place of the original block.
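The extract, parse and highlight steps described so far can be sketched as one function. This is a minimal Python 3 sketch under stated assumptions, not the author's script: colorize_html and SRC_BLOCK are hypothetical names, it works on a string rather than a file, and instead of a language_dict lookup it falls back to the plain-text lexer whenever pygments raises ClassNotFound for an unknown language. It needs the third-party packages beautifulsoup4 and pygments.

```python
import re

from bs4 import BeautifulSoup
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name
from pygments.util import ClassNotFound

# Lazy match plus re.S so that one block may span several lines.
SRC_BLOCK = re.compile(r'<div class="org-src-container">.*?</div>', re.S)


def colorize_html(html_text):
    """Return html_text with every org-mode source block re-highlighted."""
    for src_html in SRC_BLOCK.findall(html_text):
        pre = BeautifulSoup(src_html, "html.parser").find("pre")
        if pre is None:
            continue  # container without a <pre>: leave it untouched
        # Pick the class that looks like "src-python"; default to plain text.
        language = next(
            (c.split("-", 1)[1] for c in pre.get("class", []) if c.startswith("src-")),
            "text",
        )
        try:
            lexer = get_lexer_by_name(language, stripall=True)
        except ClassNotFound:
            # Language unknown to pygments: render as plain text.
            lexer = get_lexer_by_name("text", stripall=True)
        formatter = HtmlFormatter(linespans="line", cssclass="highlight")
        # get_text() drops the <span> coloring and keeps only the source code.
        new_html = highlight(pre.get_text(), lexer, formatter)
        html_text = html_text.replace(src_html, new_html)
    return html_text
```

Keeping the function free of file I/O makes it easy to try on a small HTML string before pointing it at published pages.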
Replace the original HTML with it:

    file_read = file_read.replace(src_html, src_colorized)

replace takes two parameters: the first is the old text to be replaced, the second is the new text.

Rewrite: once the for loop has finished, all code blocks have been highlighted and the new HTML can be written back to the file:

    file = open(self.filename, 'w')
    file.write(file_read)
    file.close()

3.5 CSS

The pygments step above only outputs the HTML structure; no CSS has been specified yet. First, generate the color style for the code:

    pygmentize -S default -f html > your/path/pygments.css

Add the generated style file to the web page:

    <link rel="stylesheet" href="/your/path/pygments.css">

Because I use Jekyll, I put the CSS file under assets/themes/havee/css/. Next, the line number style needs to be specified. It is determined by .highlight pre span:

    .highlight pre { counter-reset: linenumbers; }
    .highlight pre > span:before {
        font-size: .9em;
        color: #aaa;
        content: counter(linenumbers);
        counter-increment: linenumbers;
        text-align: center;
        width: 2.5em;
        left: -0.5em;
        position: relative;
        -webkit-touch-callout: none;
        -webkit-user-select: none;
        -khtml-user-select: none;
        -webkit-user-select: none; /* Chrome all / Safari all */
        -moz-user-select: none;    /* Firefox all */
        -ms-user-select: none;     /* IE 10+ */
        /* No support for these yet, use at own risk */
        -o-user-select: none;
    }

The line numbers are generated automatically by the counter. The *-user-select declarations prevent the line numbers from being selected, so the code can be copied easily while reading it.
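The style sheet can also be produced from Python instead of the pygmentize command. The following is a small sketch, assuming Python 3 with pygments installed. build_pygments_css is a hypothetical helper name, and passing ".highlight" as the selector is a choice made here so that the rules apply only inside the highlighted blocks.

```python
from pygments.formatters import HtmlFormatter


def build_pygments_css(style="default", selector=".highlight"):
    """Return the CSS rules for a pygments style, scoped to `selector`."""
    return HtmlFormatter(style=style).get_style_defs(selector)


if __name__ == "__main__":
    # Printing keeps the sketch free of side effects; redirect the output
    # to a file such as pygments.css to use it on a page.
    print(build_pygments_css())
```

Any style name known to pygments can be passed as style, which makes it easy to compare themes before settling on one for the blog.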

This article is an English version of an article originally written in Chinese and published on aliyun.com.
