How to Design a Programming Language (10): Regular Expressions and Domain-Specific Languages (DSL)


For the past few months, blog readers have been asking about DSLs. Since I have created several DSLs in gac.codeplex.com, I will talk about them today.

For many people, creating a DSL is probably the first time they design a language. My first time was in high school, when I wrote a rather silly ARPG. It had a very simple scripting language that looked much like assembly, although each instruction was written in function-call form. The game needed scripts to control character movement and the like during the plot, but fortunately nothing complicated, so the language did its job. Ten years have passed in the blink of an eye. While writing GacUI I have built some DSLs for development convenience, or implemented other people's DSLs, and I have gradually come to understand some methods of DSL design. But before we talk about these, let's look at a DSL that we love (everyone) and hate (not me, anyway): regular expressions!

First, regular expressions

We all know how poor the readability of regular expressions is, and their reputation for being hard to learn is well deserved: O'Reilly published a book on them that is two centimeters thick. In my experience, as long as you first learn compiler principles and then write a regular expression engine yourself following the .NET specification, you basically never need to read that book. When a regular expression has to be written in some odd way, it is only because the engine in your hands is implemented that way, so you have to follow it; there is no deeper reason. My own regular expression engine has two sets of parsers, one DFA-based and one NFA-based. It checks your regular expression to see whether a DFA can run it and prefers the DFA when possible, which removes a lot of unimportant trouble (for example, a** behaving stupidly). I have been very happy using it, and the code is also on gac.codeplex.com.

Regular expressions fully deserve to be called a DSL, because they use a compact syntax that lets us define a set of strings and extract features from the strings that match. On the whole I like the syntax; the only thing I dislike is what parentheses do. As a way of specifying precedence, parentheses are almost impossible to avoid. Yet in many popular regular expression engines, parentheses also capture, which is really surprising, because most of the time I do not need the capture and the engine just wastes time and space doing extra work. So in my own regular expression engine, parentheses do not capture. If you want to capture, you have to use special syntax, such as <name>pattern, which captures the pattern into a group called name.
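Python's standard re module is one of the engines where plain parentheses capture, so it can illustrate the difference between grouping and capturing. This is only a sketch in Python's syntax ((?:...) and (?P<name>...)); it is not the author's engine or its capture syntax, and the sample date string is made up for the example.

```python
import re

text = "2024-01-15"

# Plain parentheses both group and capture, whether or not we need the groups.
capturing = re.fullmatch(r"(\d+)-(\d+)-(\d+)", text)
print(capturing.groups())        # ('2024', '01', '15')

# (?:...) groups for precedence only; nothing is stored.
grouping_only = re.fullmatch(r"(?:\d+-)+\d+", text)
print(grouping_only.groups())    # ()

# Capturing on request, by name, using Python's (?P<name>...) syntax.
named = re.fullmatch(r"(?P<year>\d+)-(?:\d+)-(?:\d+)", text)
print(named.groupdict())         # {'year': '2024'}
```

In Python the non-capturing form is the longer one to type, which is the opposite of the trade-off the author chose for his own engine.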

So what DSL design principles can we learn from the syntax of regular expressions? I think the principles are actually very simple, and there are only three:

Short syntax should be assigned to the most commonly used features.

The syntax should be either very readable (and thus more direct than writing the same thing in C#) or very compact (and thus much shorter than writing it in C#).

The API should be easy to define (so that it is convenient to call from C#, and so that the goal of the DSL stays clear and simple).
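The second principle can be made concrete by writing the same check twice, once as a regular expression and once by hand. The sketch below uses Python as a stand-in for the host language (the article speaks of C#); the function names and the sample strings are invented for this illustration.

```python
import re

# Compact DSL form: a letter or underscore, then letters, digits, or underscores.
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

def is_identifier_regex(s: str) -> bool:
    return IDENTIFIER.fullmatch(s) is not None

def is_identifier_by_hand(s: str) -> bool:
    # The same check written directly in the host language.
    letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_"
    digits = "0123456789"
    if not s or s[0] not in letters:
        return False
    for ch in s[1:]:
        if ch not in letters and ch not in digits:
            return False
    return True

# Both versions agree on every sample, but one of them is a single line.
for sample in ["gacui", "_tmp1", "1abc", "", "a-b"]:
    assert is_identifier_regex(sample) == is_identifier_by_hand(sample)
```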

Many DSLs actually satisfy this definition. SQL belongs to the kind with a simple API and readable syntax (think of ADO.NET), and regular expressions belong to the kind with a simple API and compact syntax. Why can regular expressions be designed so compactly? Let us lift the veil one layer at a time.

Regular expressions have very few basic elements: only concatenation, alternation and loops, plus some simple syntactic sugar. Concatenation needs no character at all, alternation needs one character, "|", and a loop needs one character, "+" or "*". Then there is ".", which stands for any character, {5,}, which stands for a counted loop, and [a-zA-Z0-9_], which stands for a character set. For a set that contains a single character we do not even need the [] and can write the character directly. Finally, because some characters now have special meaning, we need an escaping mechanism. So let's count how many characters we have defined: "|+*[]-\{},.()". Not many, right?
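Each of these elements can be tried out directly. The following sketch uses Python's re module, which shares this core syntax; the helper name matches and the sample strings are assumptions made for the example.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Concatenation needs no operator character.
assert matches(r"ab", "ab")
# Alternation uses "|".
assert matches(r"cat|dog", "dog")
# Loops use "+" (one or more) and "*" (zero or more).
assert matches(r"a+", "aaa") and not matches(r"a+", "")
assert matches(r"a*", "")
# "." is any character; {5,} means at least five repetitions.
assert matches(r".{5,}", "hello!") and not matches(r".{5,}", "hey")
# A character set with ranges.
assert matches(r"[a-zA-Z0-9_]+", "gac_ui2")
# A single-character set can drop the brackets.
assert matches(r"[a]", "a") and matches(r"a", "a")
# Special characters must be escaped to be matched literally.
assert matches(r"\(\+\)", "(+)")
assert matches(re.escape("1+1=2"), "1+1=2") and not matches(r"1+1=2", "1+1=2")
```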

Although they may look messy, regular expressions have a rigorous grammatical structure. The syntax tree definition for my regular expression engine can be seen here: https://gac.codeplex.com/SourceControl/latest#Common/Source/Regex/RegexExpression.h. From it we can write down a grammar:

DIGIT ::= [0-9]
LITERAL ::= [^|+*\[\]\-\\{}\^,.()]
ANY_CHAR ::= LITERAL | "^" | "|" | "+" | "*" | "[" | "]" | "-" | "\" | "{" | "}" | "," | "." | "(" | ")"

CHAR
    ::= LITERAL
    ::= "\" ANY_CHAR

CHARSET_COMPONENT
    ::= CHAR
    ::= CHAR "-" CHAR

CHARSET
    ::= CHAR
    ::= "[" ["^"] {CHARSET_COMPONENT} "]"

REGEX_0
    ::= CHARSET
    ::= REGEX_0 "+"
    ::= REGEX_0 "*"
    ::= REGEX_0 "{" {DIGIT} ["," [{DIGIT}]] "}"
    ::= "(" REGEX_2 ")"

REGEX_1
    ::= REGEX_0
    ::= REGEX_1 REGEX_0

REGEX_2
    ::= REGEX_1
    ::= REGEX_2 "|" REGEX_1

REGULAR_EXPRESSION
    ::= REGEX_2

This is just a grammar I wrote down offhand. It may not be fully rigorous, but it expresses the whole structure of regular expressions. Why must we master reading and writing EBNF? Because when we look at our language through EBNF, we are no longer distracted by surface details; we strip away the cloak of syntax and see the structure of the language itself. It's always nice to take off other people's clothes.
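To see how directly the grammar maps to code, here is a minimal recursive-descent sketch in Python with one method per rule. It is not the author's RegexExpression.h: the Parser class, the parse function and the tuple-shaped nodes are all assumptions made for this example, and the left-recursive rules are rewritten as loops. It follows the grammar literally, so an unescaped "." is rejected, just as the grammar above has no rule for it.

```python
SPECIAL = set("|+*[]-\\{}^,.()")   # characters that LITERAL excludes
DIGITS = set("0123456789")
LOOP_START = set("+*{")

class Parser:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def take(self, expected=None):
        ch = self.peek()
        if ch == "" or (expected is not None and ch != expected):
            raise ValueError(f"expected {expected or 'a character'} at position {self.pos}")
        self.pos += 1
        return ch

    def char(self):                      # CHAR
        if self.peek() == "\\":
            self.take()
            return self.take()           # "\" ANY_CHAR
        if self.peek() in SPECIAL:
            raise ValueError(f"unescaped {self.peek()!r} at position {self.pos}")
        return self.take()               # LITERAL

    def digits(self):                    # {DIGIT}
        start = self.pos
        while self.peek() in DIGITS:
            self.take()
        return self.text[start:self.pos]

    def charset(self):                   # CHARSET
        if self.peek() != "[":
            return ("char", self.char())
        self.take("[")
        negated = self.peek() == "^"
        if negated:
            self.take()
        ranges = []
        while self.peek() != "]":        # {CHARSET_COMPONENT}
            low = high = self.char()
            if self.peek() == "-":
                self.take()
                high = self.char()
            ranges.append((low, high))
        self.take("]")
        return ("set", negated, ranges)

    def regex_0(self):                   # REGEX_0: an atom, then loop suffixes
        if self.peek() == "(":
            self.take()
            node = self.regex_2()
            self.take(")")
        else:
            node = self.charset()
        while self.peek() in LOOP_START:
            op = self.take()
            if op == "{":
                low = int(self.digits() or "0")
                high = low
                if self.peek() == ",":
                    self.take()
                    upper = self.digits()
                    high = int(upper) if upper else None   # None: unbounded
                self.take("}")
                node = ("loop", node, low, high)
            else:
                node = ("loop", node, 1 if op == "+" else 0, None)
        return node

    def regex_1(self):                   # REGEX_1: concatenation
        items = [self.regex_0()]
        while self.peek() not in ("", "|", ")"):
            items.append(self.regex_0())
        return items[0] if len(items) == 1 else ("seq", items)

    def regex_2(self):                   # REGEX_2: alternation
        branches = [self.regex_1()]
        while self.peek() == "|":
            self.take()
            branches.append(self.regex_1())
        return branches[0] if len(branches) == 1 else ("alt", branches)

def parse(text):
    parser = Parser(text)
    tree = parser.regex_2()              # REGULAR_EXPRESSION
    if parser.pos != len(text):
        raise ValueError(f"unexpected {parser.peek()!r} at position {parser.pos}")
    return tree

print(parse("a|b+"))
```

The three levels REGEX_0, REGEX_1 and REGEX_2 are what encode precedence: loops bind tighter than concatenation, which binds tighter than alternation, so "a|b+" parses as an alternation whose second branch is a loop over b.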
