Code Out of Control and State Machines (Part 2)

Preface

At the end of the article "Code Out of Control and State Machines (Part 1)", we left a piece of "homework": parsing a "member access expression". Let's work through it together.

First, why write a parser that looks useless? Because in IoC or AOP containers (as it happens, I need to implement an IoC container), it is often necessary to evaluate a member access expression dynamically, and parsing the expression is the first step. In fact, this "homework" is a simplified version of lexical analysis in compiler technology. Writing one by hand is a good introductory exercise for understanding the front-end processing covered in "compiler principles".

Second, I'm building an ORM data engine with a cool feature: support for GraphQL-like capabilities in CRUD operations (namely data schema expressions). So I need to write a parser for a GraphQL-like syntax, which makes this a valuable case.

In short, hand-writing parsers for various "expressions" is practical and valuable.

Source code
    • General lexical analysis module (syntax analysis and the compiler are not implemented)

      Github.com/zongsoft/zongsoft.corelibrary/tree/master/src/expressions

    • Member Access Expression parser

      Github.com/zongsoft/zongsoft.corelibrary/tree/feature-data/src/reflection/expressions

    • Data Pattern Expression Parser

      Github.com/zongsoft/zongsoft.corelibrary/blob/feature-data/src/data/schemaparser.cs

Basic knowledge

BNF (Backus-Naur Form): in computer science, BNF is a notation for context-free grammars. It is often used to describe the syntax (grammar) of programming languages, document formats, instruction sets, communication protocols, and so on. In short, it suits anything that needs to be described precisely. Without something like it, we have few ways to express things such as programming language syntax both accurately and concisely.

Beyond basic BNF, people have extended and enhanced it for more concise expression, for example EBNF (Extended Backus-Naur Form) and ABNF (Augmented Backus-Naur Form). I have collected a few articles for your reference (especially the first three):

    • "The Meaning and Usage of BNF and ABNF"
    • "Grammar Notations: BNF and ABNF"
    • "EBNF: Extended Backus-Naur Form"
    • "ABNF: Augmented BNF for Syntax Specifications (RFC 5234)"
    • "C# Language Specification" (the strongest example of BNF in practice)

Unless you are writing a compiler for a programming language, you usually do not need to read or write the very "precise" BNF syntax required by tools such as YACC (Yet Another Compiler-Compiler) or ANTLR (ANother Tool for Language Recognition). For a concrete case of YACC and ANTLR, I recommend this article (no need to dig into the details; focus on the grammar definition part):

"TiDB Source Code Reading Series (5): Implementation of the TiDB SQL Parser"

I recommend reading and borrowing the BNF dialects used in the major SQL manuals to see how they are applied, because their syntax conventions are simple and sufficient for general scenarios. Their links are below (I personally prefer Microsoft's Transact-SQL).

    • Transact-SQL Syntax conventions

      docs.microsoft.com/zh-cn/previous-versions/sql/sql-server-2012/ms177563 (v=sql.110)

    • PostgreSQL Manual

      Www.postgresql.org/docs/10/static/index.html

    • MySQL Manual

      dev.mysql.com/doc/refman/8.0/en

    • Oracle Manuals

      Docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf

Syntax specification

For the detailed syntax (grammar) of "member access expressions", refer to the C# language specification. Let's first look at the example member expression written earlier:

PropertyA.ListProperty[100].MethodA(PropertyB, 'String\'Constant for Arg2', 200, ['key'].XXX.YYY).Children['arg1', PropertyC.Foo]

Let me try to express the meaning of the code above in natural language:

    1. Access the member (a property or field) named PropertyA of some object;
    2. Access the member named ListProperty of the above member's value (the member is a list type, or its type has an indexer);
    3. Access the method named MethodA of the above member's value (the number of method parameters is unlimited; this example has 4);
    4. Access the member named Children of the above method's return value (the member is a list type, or its type has an indexer).

Additional notes:

    • A method takes a variable number of parameters (zero or more), and each parameter can be a constant (string or number) or a member access expression;
    • A list property or indexer takes at least one parameter, and the parameter types are the same as for method parameters;
    • String constants are delimited by single or double quotes and support backslash (\) escapes;
    • Numeric constants support type suffixes: "L" for long, "m" or "M" for decimal, and so on.

As you can see, even after writing this much text, I still have not expressed the grammar of the "member expression" precisely and completely. Clearly we need something like BNF to express it accurately. Here is its BNF (using the Transact-SQL syntax conventions):

    expression ::= {member | indexer}[.member | indexer][...n]
    member ::= identifier | method
    indexer ::= "[" {expression | constant}[,...n] "]"
    method ::= identifier([expression | constant][,...n])
    identifier ::= [_A-Za-z][_A-Za-z0-9]*
    constant ::= "string constant" | number
    number ::= [0-9]+{.[0-9]}?[L|m|M|f|F]
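
The two terminal rules, identifier and number, are simple enough to check with anchored regular expressions. The sketch below is only an illustration: TerminalRules is a hypothetical helper that does not exist in Zongsoft, and it assumes that the fraction part may contain several digits and that the type suffix is optional.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Zongsoft): checks the two terminal
// rules of the grammar with anchored regular expressions.
public static class TerminalRules
{
    // identifier ::= [_A-Za-z][_A-Za-z0-9]*
    private static readonly Regex IdentifierRule =
        new Regex(@"\A[_A-Za-z][_A-Za-z0-9]*\z");

    // number: digits, then an optional fraction, then an optional suffix.
    // Assumption: several fraction digits are allowed and the suffix is optional.
    private static readonly Regex NumberRule =
        new Regex(@"\A[0-9]+(\.[0-9]+)?[LmMfF]?\z");

    public static bool IsIdentifier(string text)
    {
        return text != null && IdentifierRule.IsMatch(text);
    }

    public static bool IsNumber(string text)
    {
        return text != null && NumberRule.IsMatch(text);
    }
}
```

For example, TerminalRules.IsIdentifier("PropertyA") and TerminalRules.IsNumber("3.14M") both succeed, while "1abc" is rejected as an identifier. A hand-written state machine does the same work one character at a time, which is what lets it also handle nesting.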

Even though this is not the "high-precision" kind of BNF from which a parser can be generated directly, it is still accurate and concise enough.

State machine diagram

With a precise grammar (the BNF above) in hand, we can draw the state machine diagram of the expression parser accordingly.

State descriptions:

    • Identifier: the identifier state, which holds the name of a member (property, field, or method);
    • Separator: the separator state, which indicates the member separator (that is, the dot);
    • Gutter: the gap state, which indicates the position after the end of an indexer or of a method's parameters;
    • Indexer: the indexer state, which indicates the ready state inside an indexer; it can go on to accept a valid non-terminator or a terminator;
    • Parameter: the parameter state, which indicates that an indexer or method parameter is complete and a terminator (a comma or a closing bracket or parenthesis) must follow;
    • String: the string constant state, which indicates the inside of a string constant; it accepts any character and moves to the Parameter state when it meets the terminator (the matching single or double quote);
    • Number: the numeric constant state, which indicates a numeric literal; it accepts any number of digit characters and moves to the Parameter state when it meets a terminator (a suffix).

Because the parameters of methods and indexers may themselves be expressions, the implementation needs a stack to handle the recursion. The flowchart therefore shows push and pop actions, with dashed lines marking the operations they trigger. Every opening-bracket path triggers a push, and every closing-bracket path triggers the corresponding pop. Because of layout constraints, the flowchart does not show the push and pop parts of the parenthesis (method parameter) paths, but their logic is the same as for the square bracket (indexer) paths.

Tips:

    • If a character that is not defined in the state diagram appears while deciding a state transition, the input text contains a syntax error at that position.
    • If the recursion stack is still not empty when the whole text has been parsed, the brackets of an indexer or method parameter list are unbalanced.
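
The push and pop behaviour and the two tips above can be sketched in isolation. The code below is an assumption-laden illustration, not the Zongsoft implementation: BracketChecker is a hypothetical class that only tracks brackets, parentheses, and string constants, and ignores everything else in the expression.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not the Zongsoft implementation): verifies that
// brackets and parentheses pair up, skipping over string constants.
public static class BracketChecker
{
    // Returns null when everything is balanced, otherwise an error message.
    public static string Check(string text)
    {
        var stack = new Stack<char>();   // expected closing characters
        char quote = '\0';               // quote that opened the current string, or '\0'
        bool escaping = false;           // true right after a backslash inside a string

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            if (quote != '\0')
            {
                // Inside a string constant: only an unescaped matching quote ends it.
                if (escaping) escaping = false;
                else if (c == '\\') escaping = true;
                else if (c == quote) quote = '\0';
                continue;
            }

            switch (c)
            {
                case '\'':
                case '"':
                    quote = c;
                    break;
                case '[':
                    stack.Push(']');   // push: an indexer starts
                    break;
                case '(':
                    stack.Push(')');   // push: a method parameter list starts
                    break;
                case ']':
                case ')':
                    // pop: the closing character must match the latest opening one
                    if (stack.Count == 0 || stack.Pop() != c)
                        return "Unexpected '" + c + "' at position " + i + ".";
                    break;
            }
        }

        if (quote != '\0')
            return "Unterminated string constant.";
        if (stack.Count > 0)
            return "Missing '" + stack.Peek() + "' before the end of the text.";

        return null;
    }
}
```

Pushing the expected closing character, rather than the opening one, makes the pop a single comparison. A full parser would push its parsing context instead, so that it can resume the outer expression after the nested one is complete.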

As for designing a parser state machine, I have not found any general design guidance. You can define the states according to your own understanding. As for how fine-grained the states should be, the overall principle is that they are logically or conceptually self-consistent and easy to draw and to implement in code.

Source code walkthrough

The interfaces and classes in the Zongsoft.Reflection.Expressions namespace are, on the whole, designed much like the related classes in the System.Linq.Expressions namespace. The approximate class diagram is as follows:

Parsing is provided by MemberExpressionParser, an internal static class (the state machine class). Its Parse method is a state-driven function: it iterates over the characters of the input text and hands each one to a specific private method DoXXX(context), which decides the state transition. Repeating this loop completes the whole parse. The overall structure matches the state machine program structure described in "Code Out of Control and State Machines (Part 1)". The code is as follows:

    public static IMemberExpression Parse(string text, Action<string> onError)
    {
        if (string.IsNullOrEmpty(text))
            return null;

        // Create the parsing context object
        var context = new StateContext(text.Length, onError);

        // State transition driver
        for (int i = 0; i < text.Length; i++)
        {
            context.Character = text[i];

            switch (context.State)
            {
                case State.None:
                    if (!DoNone(ref context, i))
                        return null;
                    break;
                case State.Gutter:
                    if (!DoGutter(ref context, i))
                        return null;
                    break;
                case State.Separator:
                    if (!DoSeparator(ref context, i))
                        return null;
                    break;
                case State.Identifier:
                    if (!DoIdentifier(ref context, i))
                        return null;
                    break;
                case State.Method:
                    if (!DoMethod(ref context, i))
                        return null;
                    break;
                case State.Indexer:
                    if (!DoIndexer(ref context, i))
                        return null;
                    break;
                case State.Parameter:
                    if (!DoParameter(ref context, i))
                        return null;
                    break;
                case State.Number:
                    if (!DoNumber(ref context, i))
                        return null;
                    break;
                case State.String:
                    if (!DoString(ref context, i))
                        return null;
                    break;
            }
        }

        // Get the final parse result
        return context.GetResult();
    }

Code notes:

    • The enumeration representing the state matches the definitions in the parser state machine diagram above exactly.
    • The internal StateContext structure holds the data, state, and character buffer used during parsing, along with other context-related helper methods.
    • The internal StateVector structure holds flag switches (Booleans) used during parsing, such as the type of the current numeric constant, whether the current character inside a string constant is escaped, whether the identifier contains white space, and so on.
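
To make the shape of the per-state handlers concrete, here is a deliberately reduced sketch that follows the same driver pattern. It is not the Zongsoft code: DottedPathParser, its Context class, and its handlers are hypothetical, and it only understands dot-separated identifiers (no indexers, methods, constants, or white space).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// A simplified sketch, not the Zongsoft code: it only understands
// dot-separated identifiers such as "PropertyA.PropertyB".
public static class DottedPathParser
{
    private enum ParserState { None, Identifier, Separator }

    private sealed class Context
    {
        public ParserState State = ParserState.None;
        public readonly StringBuilder Buffer = new StringBuilder();
        public readonly List<string> Names = new List<string>();
        public Action<string> OnError;
    }

    public static IList<string> Parse(string text, Action<string> onError)
    {
        if (string.IsNullOrEmpty(text))
            return null;

        var context = new Context { OnError = onError };

        // State transition driver: one handler call per character.
        for (int i = 0; i < text.Length; i++)
        {
            switch (context.State)
            {
                case ParserState.None:
                case ParserState.Separator:
                    if (!DoStart(context, text[i], i)) return null;
                    break;
                case ParserState.Identifier:
                    if (!DoIdentifier(context, text[i], i)) return null;
                    break;
            }
        }

        // The text must end inside an identifier; "A." is incomplete.
        if (context.State != ParserState.Identifier)
        {
            onError?.Invoke("The expression ends unexpectedly.");
            return null;
        }

        context.Names.Add(context.Buffer.ToString());
        return context.Names;
    }

    // None and Separator both require the first character of an identifier.
    private static bool DoStart(Context context, char c, int position)
    {
        if (!IsLetter(c))
            return Fail(context, c, position);

        context.Buffer.Append(c);
        context.State = ParserState.Identifier;
        return true;
    }

    // Inside an identifier: keep collecting, or close it at a dot.
    private static bool DoIdentifier(Context context, char c, int position)
    {
        if (IsLetter(c) || (c >= '0' && c <= '9'))
        {
            context.Buffer.Append(c);
            return true;
        }

        if (c == '.')
        {
            context.Names.Add(context.Buffer.ToString());
            context.Buffer.Clear();
            context.State = ParserState.Separator;
            return true;
        }

        return Fail(context, c, position);
    }

    private static bool IsLetter(char c)
    {
        return c == '_' || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // A character with no transition defined for the current state is a syntax error.
    private static bool Fail(Context context, char c, int position)
    {
        context.OnError?.Invoke("Illegal character '" + c + "' at position " + position + ".");
        return false;
    }
}
```

Calling DottedPathParser.Parse("PropertyA.ListProperty.Children", Console.WriteLine) returns the three member names, while "A..B" reports the second dot and returns null. Each handler only needs to know which characters are legal in its own state, which is what keeps the driver loop trivial.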

Other extensions

In the Zongsoft.Data data engine there is the concept of a data schema: an expression that defines the shape of the data in a data operation, somewhat like a GraphQL expression (without query conditions).

For example, suppose there is an enterprise entity class named Corporation. Besides single-valued properties such as the enterprise number, name, and abbreviation, it also has composite (navigation) properties, including "one-to-many" ones, such as the legal representative and the departments. Now suppose we call the Select method of the data access class to run a query:

    var entities = dataAccess.Select<Corporation>(
        Condition.GreaterThanEqual("RegisteredCapital", 100));

The code above queries the table of the Corporation entity for records whose registered capital (RegisteredCapital) is greater than or equal to 1 million (written as 100 in the code, presumably in units of ten thousand), but it has no way to express the navigation properties associated with the Corporation entity. A data schema defines the data shape of the operation, roughly as follows:

    var schema = @"CorporationId, Name, Abbr, RegisteredCapital,
    Principal{Name, FullName, Avatar},
    Departments:10(~Level, NumberOfPeople)
    {
        Name, Manager
        {
            Name, FullName, JobTitle, PhoneNumber
        }
    }";

    var entities = dataAccess.Select<Corporation>(
        schema,
        Condition.GreaterThanEqual("RegisteredCapital", 100) &
        Condition.Like("Principal.Name", "钟%"));

Through the schema parameter of the data access method, we can conveniently define the data shape (including paging and sorting settings for one-to-many navigation properties). This removes the need for repeated database round trips to walk the data, which greatly improves efficiency while also simplifying the code.

The members of a data schema are separated by commas. A composite property can use curly braces to limit its inner property set, and a one-to-many composite property can also define paging and sorting settings. Here is its BNF:

    schema ::=
    {
        * |
        ! |
        !identifier |
        identifier[paging][sorting]["{"schema[,...n]"}"]
    } [,...n]

    identifier ::= [_A-Za-z][_A-Za-z0-9]*

    number ::= [0-9]+

    paging ::= ":"
    {
        {*|?}|
        number[/{?|number}]
    }

    sorting ::=
    "("
        {
            [~|!]identifier
        }[,...n]
    ")"

Hint: an exclamation mark means exclusion. A lone exclamation mark excludes all members defined before it. A member identifier prefixed with an exclamation mark excludes that member if it was defined earlier (and is ignored if it was not).
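
The exclusion rules can be tried out on a flat member list. The following is a minimal sketch under stated assumptions: SchemaExclusion is a hypothetical class, member names are compared case-sensitively, and the asterisk, nested braces, paging, and sorting are all out of scope.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not the Zongsoft parser): applies the '!' rules
// to a flat, comma-separated list of member names.
public static class SchemaExclusion
{
    public static List<string> Resolve(string members)
    {
        var result = new List<string>();

        foreach (var part in members.Split(','))
        {
            var token = part.Trim();

            if (token.Length == 0)
                continue;

            if (token == "!")
                result.Clear();                      // exclude everything defined so far
            else if (token[0] == '!')
                result.Remove(token.Substring(1));   // no effect if the member was never defined
            else if (!result.Contains(token))
                result.Add(token);                   // define the member once
        }

        return result;
    }
}
```

With this sketch, "CorporationId, Name, Abbr, !Abbr, !Missing" resolves to CorporationId and Name, and "Name, Abbr, !, RegisteredCapital" resolves to RegisteredCapital alone, because the lone exclamation mark discards what came before it.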

Interpretation of the paging settings:

    • *: returns all records (that is, no paging);
    • ?: returns the first page with the system default page size, equivalent to 1/? (the data engine's default setting);
    • n: returns n records, equivalent to 1/n;
    • n/m: returns page n, with m rows per page;
    • n/?: returns page n, with the system default page size.
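
These paging forms map naturally onto a page number and a page size. The sketch below is illustrative only: PagingSetting and PagingParser are hypothetical names, a value of 0 is used here to stand for "no paging" and "system default", and numbers are assumed to be positive.

```csharp
using System;
using System.Globalization;

// Hypothetical type for illustration; not the Zongsoft API.
public struct PagingSetting
{
    public int PageIndex;   // 1-based page number; 0 means no paging
    public int PageSize;    // rows per page; 0 means the system default
}

public static class PagingParser
{
    // Parses the text that follows the ':' of a schema member, e.g. "10" or "2/20".
    public static bool TryParse(string text, out PagingSetting result)
    {
        result = default(PagingSetting);

        if (string.IsNullOrEmpty(text))
            return false;

        if (text == "*")   // all records: PageIndex and PageSize stay 0
            return true;

        // "n" and "?" are shorthand for "1/n" and "1/?".
        string pageText = "1", sizeText = text;
        int slash = text.IndexOf('/');

        if (slash >= 0)
        {
            pageText = text.Substring(0, slash);
            sizeText = text.Substring(slash + 1);
        }

        int page, size = 0;

        if (!TryParsePositive(pageText, out page))
            return false;
        if (sizeText != "?" && !TryParsePositive(sizeText, out size))
            return false;

        result.PageIndex = page;
        result.PageSize = size;
        return true;
    }

    // Accepts digits only (the [0-9]+ rule) and, by assumption, rejects zero.
    private static bool TryParsePositive(string text, out int value)
    {
        return int.TryParse(text, NumberStyles.None, CultureInfo.InvariantCulture, out value)
            && value > 0;
    }
}
```

For the sample member Departments:10, PagingParser.TryParse("10", out var paging) yields page 1 with a page size of 10. Normalising the shorthand forms first means only the n/m shape has to be parsed.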

The above is the state machine diagram of the data schema expression parser. The implementation code is not covered here; on the whole it is similar to the "member access expression" parser.

End

In many programming scenarios that use state machines, drawing the state machine diagram is very important to the implementation. I hope these two concrete cases offer some inspiration.

In fact, the Linux/Unix command line is also a good case. If you are interested, try writing down its BNF and the state machine diagram for parsing it.

This time we covered the design and implementation of state machines for text parsing. General-purpose state machines related to workflows are also a very interesting scenario; they can be applied to games, workflows, business logic drivers, and so on. In the second half of last year, driven by business needs, I spent about one or two weeks implementing a complete general-purpose state machine. I think the design is good, but because time was tight, the generic implementation of states has a small flaw. I will introduce its architecture, design, and implementation after optimizing it. For now, this series ends here.

Reminder:

This article may be updated. Please read the original at Zongsoft.github.io/blog/zh-cn/zongsoft/coding-outcontrol-statemachine-2 to avoid errors caused by outdated content; it also offers a better reading experience.
