A tutorial on creating declarative mini languages in Python


Last Update:2017-01-19 Source: Internet

Author: User



When most programmers think about programming, they envision the imperative styles and techniques used to write applications. The most popular general-purpose programming languages, including Python and other object-oriented languages, are predominantly imperative in style. On the other hand, many programming languages are declarative, including functional and logic languages, and both general-purpose and special-purpose ones.

Let's list a few languages that belong to each category. Many readers have used most of these tools without necessarily thinking about the categorical differences among them. Python, C, C++, Java, Perl, Ruby, Smalltalk, Fortran, Basic, and xBase are all straightforwardly imperative programming languages. Some of them are object-oriented, but that is simply a matter of organizing code and data, not of the fundamental programming style. In these languages, you command the program to carry out a sequence of instructions: put some data in a variable, fetch the data back out of the variable, loop through a block of instructions until some condition is satisfied, or do something if a proposition is true. One nice thing about all these languages is that they are easy to think about using familiar metaphors from everyday life. Ordinary life consists of doing one thing, making a choice, then doing another thing, perhaps using some tools along the way. It is easy to imagine the computer that runs a program as a cook, a bricklayer, or an automobile driver.

Languages such as Prolog, Mercury, SQL, XSLT, EBNF grammars, and indeed configuration files in various formats all declare that something is the case or that certain constraints apply. Functional languages (such as Haskell, ML, Dylan, Ocaml, and Scheme) are similar, but they put more emphasis on stating the internal (functional) relationships between programming objects (recursion, lists, and so on). Our everyday life, at least in its narrative quality, provides no direct analog for the programming constructs of these languages. However, for problems that can be described naturally in these languages, declarative descriptions are far more concise and far less error-prone than imperative solutions. For example, consider the following system of linear equations:

Listing 1. Linear equation system sample

10x + 5y - 7z + 1 = 0
17x + 5y - 10z + 3 = 0
5x - 4y + 3z - 6 = 0

This is a rather elegant shorthand for stating several relationships among objects (x, y, and z). You might come across these facts in various ways in real life, but actually "solving for x" with pencil and paper is tedious and error-prone. From a debugging standpoint, writing out the solution steps in Python is probably even worse.
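To make the contrast concrete, here is a minimal sketch of what the imperative solution steps might look like in Python. It is not taken from any library: the `SYSTEM` table and the `solve` function are names invented for this illustration, and the method used is plain Gauss-Jordan elimination over exact fractions.

```python
from fractions import Fraction

# Coefficients of x, y, z and the constant term, one row per equation
# (each row means a*x + b*y + c*z + d = 0).
SYSTEM = [
    [10, 5, -7, 1],
    [17, 5, -10, 3],
    [5, -4, 3, -6],
]

def solve(system):
    """Solve rows of the form a*x + b*y + c*z + d = 0 by Gauss-Jordan elimination."""
    n = len(system)
    # Move the constant to the right-hand side: a*x + b*y + c*z = -d.
    rows = [[Fraction(v) for v in row[:-1]] + [Fraction(-row[-1])]
            for row in system]
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as the pivot.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so the pivot entry becomes 1.
        factor = rows[col][col]
        rows[col] = [v / factor for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                m = rows[r][col]
                rows[r] = [a - m * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

x, y, z = solve(SYSTEM)
print(x, y, z)
```

Every line of this sketch is a place where a sign or an index can go wrong, whereas the three equations themselves leave little room for such mistakes.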

Prolog is a language that comes close to logic or mathematics. In it, you simply write statements that you know to be true, and then ask the application to derive the consequences for you. Statements are written in no particular order (just as the linear equations have no order), and you, the programmer or user, have no real idea what steps are taken to derive the results. For example:

Listing 2. family.pro Prolog sample

/* Adapted from sample in:

An EBNF (Extended Backus-Naur Form) grammar declaration is not quite the same, but it is similar in spirit. You might write declarations such as the following:

Listing 3. EBNF sample

word        := alphanums, (wordpunct, alphanums)*, contraction?
alphanums   := [a-zA-Z0-9]+
wordpunct   := [-_]
contraction := "'", ("clock"/"d"/"ll"/"m"/"re"/"s"/"t"/"ve")

This is a compact way of stating what a word would look like if you were to encounter one, without giving a sequence of instructions for how to recognize it. A regular expression is similar (and in fact suffices for this particular grammar production).
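As a rough illustration of that last remark, the `word` production from the EBNF sample can be written as a single regular expression with Python's standard `re` module. The constant names and the `is_word` helper below are invented for this sketch, and the translation is only approximate, since an EBNF parser and a backtracking regex engine do not match in exactly the same way.

```python
import re

# Hypothetical translation of the EBNF sample into one regular expression.
ALPHANUMS = r"[a-zA-Z0-9]+"
WORDPUNCT = r"[-_]"
CONTRACTION = r"'(?:clock|d|ll|m|re|s|t|ve)"

# word := alphanums, (wordpunct, alphanums)*, contraction?
WORD = re.compile(
    ALPHANUMS + "(?:" + WORDPUNCT + ALPHANUMS + ")*" + "(?:" + CONTRACTION + ")?"
)

def is_word(text):
    """Return True if the whole string matches the 'word' production."""
    return WORD.fullmatch(text) is not None

for sample in ("o'clock", "well-known", "they've", "two words", "-dash"):
    print(sample, is_word(sample))
```

The pattern is still a declaration: it says what a word looks like and leaves the matching steps to the `re` engine.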

For another declarative example, consider a document type declaration that describes a dialect of valid XML documents:

Listing 4. XML document type declaration

<!ELEMENT dissertation (chapter+)>
<!ELEMENT chapter (title, paragraph+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT paragraph (#PCDATA | figure)+>
<!ELEMENT figure EMPTY>

As with the other examples, the DTD language does not contain any instructions about how to recognize or create a valid XML document. It merely describes what one would be like if it were to exist. Declarative languages speak in the subjunctive mood.

Python as an interpreter vs Python as an environment

Python libraries can make use of declarative languages in one of two fairly distinct ways. Perhaps the more common technique is to parse and process non-Python declarative languages as data. An application or library can read an external source (or a string that is defined internally but used only as a "blob"), and then work out a set of imperative steps to carry out that conform in some way to those external declarations. In essence, these types of libraries are "data-driven" systems: there is a conceptual and categorical gap between the declarative language and what a Python application does to carry out or make use of its declarations. In fact, libraries that process the very same declarations are quite commonly implemented for other programming languages as well.

All of the examples given above belong to the first technique. The library PyLog is a Python implementation of a Prolog system. It reads a Prolog data file like the sample, and then creates Python objects to model the Prolog declarations. The EBNF sample uses a specialized variant for SimpleParse, a Python library that converts these declarations into state tables that can be used by mx.TextTools. mx.TextTools is itself a Python extension library that uses an underlying C engine to run code stored in Python data structures, but it has little to do with Python as such. Python is excellent glue for these tasks, but the languages being glued together are very different from Python. Moreover, most Prolog implementations are not written in Python, and neither are most EBNF parsers.

DTDs are similar to the other examples. If you use a validating parser such as xmlproc, you can use a DTD to verify the dialect of an XML document. But the language of a DTD is not Python, and xmlproc simply treats it as data that needs to be parsed. Moreover, XML validating parsers have been written in many programming languages. XSLT transformations are similar: they are not specific to Python, and modules like ft.4xslt only use Python as "glue".
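The data-driven approach can be sketched in a few lines. The example below does not use any of the libraries mentioned above; the `name := pattern` rule format, the `RULES` string, and the `load_rules` and `classify` helpers are all made up for this illustration. The point is only that the declarations arrive as a string, and Python has to parse them before it can act on them.

```python
import re

# The declarations live in a plain string; Python treats them purely as data.
# The "name := pattern" format is invented for this illustration.
RULES = r"""
alphanums := [a-zA-Z0-9]+
wordpunct := [-_]
"""

def load_rules(blob):
    """Parse 'name := pattern' lines into a dict of compiled regexes."""
    rules = {}
    for line in blob.strip().splitlines():
        name, pattern = (part.strip() for part in line.split(":=", 1))
        rules[name] = re.compile(pattern)
    return rules

def classify(text, rules):
    """Return the name of the first rule that matches the whole string."""
    for name, regex in rules.items():
        if regex.fullmatch(text):
            return name
    return None

rules = load_rules(RULES)
print(classify("abc123", rules))
print(classify("-", rules))
```

Nothing in `RULES` is Python: the same string could just as well be handed to a library written in another language.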

There is nothing wrong with the above approach or with the tools mentioned (I use them all the time). Still, it might be more elegant, and in some ways more expressive, if Python itself could be the declarative language. If nothing else, libraries that make this possible would not require programmers to think about two (or more) languages when writing one application. At times, it is natural and powerful to lean on Python's introspective capabilities to implement "native" declarations.

The magic of introspection

The parsers Spark and PLY let users declare Python values in Python, and then use some magic to let the Python runtime environment act as the configuration for parsing. For example, let's look at the PLY equivalent of the earlier SimpleParse grammar. Spark is similar to this example:

Listing 5. PLY sample

tokens = ('alphanums', 'wordpunct', 'contraction', 'whitespace')
t_alphanums = r"[a-zA-Z0-9]+"
t_wordpunct = r"[-_]"
t_contraction = r"'(clock|d|ll|m|re|s|t|ve)"
def t_whitespace(t):
  r"\s+"
  t.value = ""
  return t
import lex    # in current PLY releases: import ply.lex as lex
lex.lex()
lex.input(sometext)    # sometext is the string to tokenize
while 1:
  t = lex.token()
  if not t: break

I've written about PLY in my upcoming book Text Processing in Python, and I've written about Spark in this column. Without going into the details of the libraries, what you should notice here is that the Python bindings themselves configure the parsing (actually lexing/tokenizing in this example). The PLY module simply knows enough about the Python environment it runs in to act on these pattern declarations.

Just how PLY knows what it does involves some rather fancy Python programming. At first, an intermediate programmer might think of examining the contents of the globals() and locals() dictionaries. That would work fine if the declaration style were slightly different. For example, imagine code more like this:

Listing 6. Using the imported module namespace

import basic_lex as _
_.tokens = ('alphanums', 'wordpunct', 'contraction')
_.alphanums = r"[a-zA-Z0-9]+"
_.wordpunct = r"[-_]"
_.contraction = r"'(clock|d|ll|m|re|s|t|ve)"
_.lex()

This style is no less declarative, and you can imagine that the basic_lex module contains something as simple as the following:

Listing 7. basic_lex.py

def lex():
  # tokens and the pattern names are set on this module by the importer
  for t in tokens:
    print("%s = %s" % (t, globals()[t]))

This would produce:

% python basic_app.py
alphanums = [a-zA-Z0-9]+
wordpunct = [-_]
contraction = '(clock|d|ll|m|re|s|t|ve)

PLY manages to look into the namespace of the importing module by using stack frame information. For example:

Listing 8. magic_lex.py

import sys
# Raise and catch an exception just to get hold of a traceback object
try: raise RuntimeError
except RuntimeError:
  e, b, t = sys.exc_info()
  caller_dict = t.tb_frame.f_back.f_globals
def lex():
  for t in caller_dict['tokens']:
    print("%s = %s" % (t, caller_dict['t_' + t]))

This produces the same output as the basic_app.py sample, but with declarations that use the earlier t_token style.
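A variation on the same idea, sketched here for Python 3, looks up the caller's frame at the moment `lex()` is called rather than at import time. This is not how PLY is implemented; it is a minimal illustration that relies on `sys._getframe()`, which is a CPython implementation detail. To keep the example in a single file, the declarations and the `lex()` function share one module.

```python
import sys

def lex():
    """Collect the t_<name> patterns declared in the caller's global namespace."""
    # sys._getframe(1) is the frame of whoever called lex(). It is a CPython
    # implementation detail, so treat this as a sketch only.
    caller_globals = sys._getframe(1).f_globals
    return [(name, caller_globals["t_" + name])
            for name in caller_globals["tokens"]]

# Declarations, in the same style as the PLY sample.
tokens = ("alphanums", "wordpunct")
t_alphanums = r"[a-zA-Z0-9]+"
t_wordpunct = r"[-_]"

for name, pattern in lex():
    print(name, "=", pattern)
```

Looking up the frame at call time avoids depending on which frame happens to be executing the import statement.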

The actual PLY module is more magical than this. We saw that a token named with the pattern t_token can be either a string that contains a regular expression, or a function that contains both a regular expression docstring and action code. Some type checking allows for this polymorphic behavior:

Listing 9. polymorphic_lex

# ... determine caller_dict using RuntimeError ...
from types import FunctionType
def lex():
  for t in caller_dict['tokens']:
    t_obj = caller_dict['t_' + t]
    if type(t_obj) is FunctionType:
      # function declaration: the pattern is the docstring
      print("%s = %s" % (t, t_obj.__doc__))
    else:
      # string declaration: the pattern is the value itself
      print("%s = %s" % (t, t_obj))

Obviously, the real PLY module does something more interesting with these declared patterns than these toy examples do, but the examples demonstrate some of the techniques involved.
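To give a feel for what "something more interesting" could mean, the following sketch turns the same two kinds of declaration into a working tokenizer. It is an illustration only and makes no claim about PLY's internals: `build_lexer` and the `declarations` dictionary are invented names, and the action function here receives the matched string rather than a token object.

```python
import re

def build_lexer(declarations):
    """Build a tokenizer from a dict holding 'tokens' and t_<name> entries."""
    parts, actions = [], {}
    for name in declarations["tokens"]:
        obj = declarations["t_" + name]
        if callable(obj):
            # Function declaration: the docstring is the pattern and the
            # function body is the action to run on each match.
            parts.append("(?P<%s>%s)" % (name, obj.__doc__))
            actions[name] = obj
        else:
            # String declaration: the pattern itself, with no action.
            parts.append("(?P<%s>%s)" % (name, obj))
    master = re.compile("|".join(parts))

    def tokenize(text):
        result = []
        # Characters that match no declared pattern are silently skipped.
        for match in master.finditer(text):
            name, value = match.lastgroup, match.group()
            if name in actions:
                value = actions[name](value)
            result.append((name, value))
        return result
    return tokenize

def t_whitespace(value):
    r"\s+"
    return " "   # collapse any run of whitespace to a single space

declarations = {
    "tokens": ("alphanums", "wordpunct", "whitespace"),
    "t_alphanums": r"[a-zA-Z0-9]+",
    "t_wordpunct": r"[-_]",
    "t_whitespace": t_whitespace,
}

tokenize = build_lexer(declarations)
print(tokenize("well-known   fact"))
```

Using `callable()` instead of comparing against `FunctionType` is a design choice of this sketch; it also accepts methods and other callable objects as actions.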

The magic of inheritance

Letting a support library poke around in an application's namespace and manipulate it can enable an elegant declarative style. Often, however, combining inheritance structures with introspection allows even greater flexibility.

The module gnosis.xml.validity is a framework for creating classes that map directly to DTD productions. Any gnosis.xml.validity class can only be instantiated with arguments that obey the validity constraints of the XML dialect. Actually, that is not quite true: when there is only one unambiguous way to "lift" the arguments to the correct types, the module will infer the proper types from simpler arguments.

Since I wrote the gnosis.xml.validity module, I am biased toward thinking that its purpose is interesting in itself. But for this article, I just want to look at the declarative style in which validity classes are created. A set of rules/classes that matches the earlier DTD sample consists of:

Listing 10. gnosis.xml.validity rule declarations

from gnosis.xml.validity import *
class figure(EMPTY):      pass
class _mixedpara(Or):     _disjoins = (PCDATA, figure)
class paragraph(Some):    _type = _mixedpara
class title(PCDATA):      pass
class _paras(Some):       _type = paragraph
class chapter(Seq):       _order = (title, _paras)
class dissertation(Some): _type = chapter

You might create instances from these declarations using:

ch1 = LiftSeq(chapter, ("1st Title", "Validity is important"))
ch2 = LiftSeq(chapter, ("2nd Title", "Declaration is fun"))
diss = dissertation([ch1, ch2])
print(diss)

Notice how closely these classes match the preceding DTD. The mapping is essentially one-to-one, except that intermediate classes are needed for the quantification and alternation of nested tags (intermediate names are marked with a leading underscore).

Notice also that these classes, although created with standard Python syntax, are unusual (and more concise) in that they have no methods or instance data. Each class is defined solely to inherit from a framework class, and that framework class is narrowed by a single class attribute. For example, a <chapter> is a sequence of other tags, namely a <title> followed by one or more <paragraph> tags. But all we need to do to ensure that the constraint is obeyed in instances is to declare the chapter class in this simple manner.

The main "trick" involved in writing a parent class like gnosis.xml.validity.Seq is to look at the .__class__ attribute of an instance during initialization. The class chapter has no initialization of its own, so the __init__() method of its parent class is called. But the self passed to the parent's __init__() is an instance of chapter, and it knows it. To illustrate, here is part of the implementation of gnosis.xml.validity.Seq:

Listing 11. Class gnosis.xml.validity.Seq

class Seq(tuple):
  def __init__(self, inittup):
    # self.__class__ is the child class (e.g. chapter), not Seq itself
    if not hasattr(self.__class__, '_order'):
      raise NotImplementedError(
        "Child of Abstract Class Seq must specify order")
    if not isinstance(self._order, tuple):
      raise ValidityError("Seq must have tuple as order")
    self.validate()
    self._tag = self.__class__.__name__

Once an application programmer tries to create a chapter instance, the instantiation code checks that chapter was declared with the required ._order class attribute, and that this attribute is the needed tuple object. The method .validate() performs further checks to make sure that the objects used to initialize the instance belong to the corresponding classes specified in ._order.
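The whole pattern can be tried out without the library. The sketch below is a self-contained Python 3 approximation, not the gnosis.xml.validity code: the `ValidityError` class, the body of `validate()`, and the `title`, `body`, and `chapter` declarations are simplified stand-ins written for this illustration.

```python
class ValidityError(Exception):
    """Raised when an instance breaks its declared constraints."""

class Seq(tuple):
    """Parent class: children declare _order, a tuple of required types."""
    def __init__(self, inittup):
        cls = self.__class__          # the child class, e.g. chapter
        if not hasattr(cls, "_order"):
            raise NotImplementedError(
                "Child of abstract class Seq must specify _order")
        if not isinstance(cls._order, tuple):
            raise ValidityError("Seq must have tuple as _order")
        self.validate()
        self._tag = cls.__name__

    def validate(self):
        # Stand-in check: same length, and each item has the declared type.
        if len(self) != len(self._order):
            raise ValidityError("wrong number of items")
        for item, expected in zip(self, self._order):
            if not isinstance(item, expected):
                raise ValidityError(
                    "%r is not a %s" % (item, expected.__name__))

# Declarations: no methods, no instance data, at most one class attribute.
class title(str): pass
class body(str): pass
class chapter(Seq): _order = (title, body)

ch = chapter((title("1st Title"), body("Validity is important")))
print(ch._tag, tuple(ch))
```

Because `Seq` subclasses `tuple`, the items are stored by `tuple.__new__()` before `__init__()` runs, so `validate()` can simply iterate over `self`.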

When to declare

A declarative programming style is almost always a more direct way of stating constraints than an imperative or procedural one. Of course, not all programming problems are about constraints, or at least that is not always their natural formulation. But problems involving rule-based systems, such as grammars and inference systems, are much easier to manage if they can be described declaratively. Imperative verification of grammaticality quickly turns into convoluted "spaghetti code" that is difficult to debug. Statements of patterns and rules can remain much simpler.

Of course, at least in Python, the verification and enforcement of declared rules always boils down to procedural checks. But the appropriate place for such procedural checks is in well-tested library code. Individual applications should rely on the simpler declarative interfaces provided by libraries such as Spark, PLY, or gnosis.xml.validity. Other libraries such as xmlproc, SimpleParse, or ft.4xslt also enable declarative styles, although the declarations are not written in Python (which is appropriate for their domains, of course).

