Using Boost.Spirit.X3 in MSVC
Preface
"Examples of designs that meet most of the criteria for "goodness" (easy to understand, flexible, efficient) are a recursive-descent parser, which is traditional procedural code. Another example is the STL, which is a generic library of containers and algorithms depending crucially on both traditional procedural code and on parametric polymorphism." --Bjarne Stroustrup
Let me open with the Bjarne Stroustrup quotation cited in the Boost documentation, and say a few words about it. Boost.Spirit is a recursive-descent parser that relies on traditional procedural code, static (parametric) polymorphism, and expression templates. Procedural code controls the flow, static polymorphism implements pattern matching and dispatch, and expression templates manage the grammar; together they produce the magic of Spirit.
This article does not discuss the performance of Spirit. It only introduces some basic concepts and simple usage of Spirit.X3, and walks through a few simple examples at the end. The next few sections introduce some basic compiler concepts as they appear in X3: lexical analysis, syntax analysis, the abstract syntax tree (AST), synthesized attributes, inherited attributes, terminals, and nonterminals.
Terminals & Nonterminals
namespace x3 = boost::spirit::x3;
In X3, terminals are the basic lexical units (parsers), usually primitive parsers; later articles will analyze the Spirit source code to explain them in detail. Terminals are the most basic units you reach when you expand the productions of a grammar. For example, x3::char_ matches a character, x3::ascii::alpha matches an ASCII letter, and x3::float_ matches a single-precision floating-point number. Character sets such as x3::char_("a-z") accept a regex-like range notation. For details, see the character, numeric, and string parsers in the X3 documentation.
A nonterminal is generally composed of terminals according to some logical relationship; it combines terminals to describe a more complex syntax. For example, x3::float_ >> x3::float_ matches "16.0 1.2", where >> denotes sequence. *x3::char_ matches "asbcdf234", and it also matches "assd s ddd": during phrase parsing, whitespace and anything else matched by a custom skipper (such as comments) is ignored and skipped. For more information, see the operator reference in the X3 documentation.
We can see that in X3 we combine terminals with C++ operators to produce nonterminals. What is the type of a nonterminal? In fact, expression templates are used to build a static tree structure that represents the grammar. Expanding the productions is a top-down, depth-first traversal: when a nonterminal is encountered, x3 tries to match its sub-parsers, and so on down to the terminals.
Synthesized Attribute
Whether it is a terminal or a nonterminal, a parser that matches successfully takes the matched string as input and always outputs a value of some type. This value is the synthesized attribute of the parser. For example, the synthesized attribute of x3::char_ is a value of type char, and that of x3::float_ is a value of type float. The attributes of nonterminals are more complex; see the compound attribute rules in the X3 documentation.
Besides synthesized attributes there are also inherited attributes. Like a synthesized attribute, an inherited attribute is a value of some type; it may come from the synthesized attribute of another node in the grammar. For example, for an XML element <Node></Node>, the name parsed in </Node> must match the name in the opening tag. This is the kind of scenario where inherited attributes are used. Unfortunately, inherited attributes have not yet been implemented in x3, although boost::spirit::qi does implement them. I am trying to implement inherited attributes myself, but this article does not discuss them.
Start Rule
When x3 starts parsing the source language, it needs to know the start rule of the grammar, that is, the root node of the static tree that represents the grammar. The whole analysis recurses downward from the root node. The synthesized attribute of the root node can be a tree structure, namely the abstract syntax tree that represents the source code. Notice that X3 merges lexical analysis and syntax analysis into one pass. Of course, you can also perform lexical analysis as a first step, with the synthesized attribute of the root node being a sequence of strings, and then perform syntax analysis as a second step.
Simple Examples
1. Parse "1.2, 1.3, 1.4, 1.5"
#include <boost/spirit/home/x3.hpp>   // x3 core
#include <boost/fusion/adapted.hpp>   // adapt fusion.vector with std::vector
#include <string>
#include <vector>
// ......
std::string source = "1.2 , 1.3 , 1.4 , 1.5";
auto itr = source.cbegin();
auto end = source.cend();
std::vector<float> result;
// a float followed by zero or more (',' float); whitespace is skipped
auto r = phrase_parse(itr, end, x3::float_ >> *(',' >> x3::float_), x3::ascii::space, result);
x3::float_ >> *(',' >> x3::float_) means a float followed by zero or more (',' >> x3::float_). When you write a compound parser, consider the syntax first and the synthesized attribute second. Here we need to work out the synthesized attribute of the compound expression. ',' is a character literal, which the x3 documentation treats as a literal parser. The synthesized attribute of x3::lit is x3::unused, which means that it only consumes the source string and does not occupy a slot in the synthesized attribute. In short, in ',' >> x3::float_ the ',' can be ignored and the synthesized attribute is a value of type float. The synthesized attribute of the whole expression is a value of type std::vector<float>, or of a type compatible with std::vector<float> (through Fusion adaptation).
auto r = phrase_parse(itr, end, x3::float_ % ',', x3::ascii::space, result);
x3::float_ >> *(',' >> x3::float_) can be simplified to x3::float_ % ','.
2. Parse "1.2, Hello World" and generate a user-defined synthesized attribute
struct user_defined
{
    float value;
    std::string name;
};
BOOST_FUSION_ADAPT_STRUCT(user_defined, value, name)
// .....
std::string source = "1.2, Hello World";
auto itr = source.cbegin();
auto end = source.cend();
user_defined data;
auto r = phrase_parse(itr, end, x3::float_ >> ',' >> x3::lexeme[*x3::char_], x3::ascii::space, data);
With the Boost.Fusion library we can adapt a struct to a tuple. The macro BOOST_FUSION_ADAPT_STRUCT adapts struct user_defined so that it behaves like boost::fusion::vector<float, std::string>.
x3::lexeme is a directive. A directive is also a parser and also has a synthesized attribute. The synthesized attribute of lexeme here is a string value, but lexeme changes how the input is traversed: spaces are not skipped while its subject is matching. With the default skipping, *x3::char_ would skip the space between the words and the result would be "HelloWorld", which is incorrect. With x3::lexeme[*x3::char_] the result is "Hello World".
The phrase_parse function is defined in the namespace boost::spirit::x3. Here phrase_parse is an unqualified name; the function is found through argument-dependent lookup (ADL), because the parser arguments belong to that namespace.
3. Parse a C++ identifier
A C++ identifier must start with a letter or an underscore, and the following characters can be letters, digits, or underscores.
auto const identifier_def = x3::lexeme[x3::char_("_a-zA-Z") >> *x3::char_("_0-9a-zA-Z")];
The first approach is intuitive. x3::char_ matches only one character, and the overloaded function-call operator of x3::char_ takes a set that lists all the characters to match. Do not forget lexeme, so that spaces are not skipped.
auto const identifier_def = x3::lexeme[(x3::alpha | x3::char_('_')) >> *(x3::alnum | x3::char_('_'))];
The second approach uses the built-in character parsers. x3::alpha is the parser for a letter, and x3::alnum is the parser for a letter or a digit.
auto const identifier_def = x3::lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')];
This looks more concise, but it is actually incorrect. The reason is that '_' is a character literal, and x3::lit has no synthesized attribute. Therefore, when we use this parser to parse an identifier, the result loses the underscores.
auto const identifier_def = x3::raw[x3::lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')]];
This example gives us a deeper understanding of the relationship between the matched string and the synthesized attribute. Although the synthesized attribute of the expression inside the subscript operator of x3::raw drops the underscores, the matched string does not! x3::raw is a directive and a unary parser. Its synthesized attribute is the matched character range, which is compatible with a string. It discards the synthesized attribute of the parser inside its subscript operator and replaces it with the matched string. For example, for "_foo_1", x3::lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')] matches the string "_foo_1" while its synthesized attribute is "foo1". identifier_def replaces "foo1" with the matched string "_foo_1".
4. Parse C++ comments
There are two kinds of comments in C++: "//" and "/* */". A "//" comment runs until the end of the line. A "/* */" comment is everything between "/*" and the next "*/".
auto const annotation_def = (x3::lit("//") > x3::seek[x3::eol | x3::eoi]) | (x3::lit("/*") > x3::seek[x3::lit("*/")]);
operator> and operator>> both express a sequence, but the former is stricter than the latter. With operator>>, if the right-hand parser does not match, the sequence simply fails and the enclosing parser can try another alternative. operator> is an expectation: once the left-hand parser has matched, the right-hand parser must match as well, otherwise x3 throws an x3::expectation_failure. x3::eol and x3::eoi are two auxiliary parsers that match a line break and the end of the input respectively. For comments we care about the matched text, which is ignored in real parsing, rather than about the synthesized attribute of the comment parser. x3::seek is another directive. Like x3::lexeme it changes how the input is traversed: it advances through the input until its subject parser matches, and its synthesized attribute is that of its subject.
Using x3 in MSVC
X3 relies on modern C++ features such as expression SFINAE (C++11, and the main culprit here) and generic lambdas (C++14). The VS2015 compiler implements most C++14 features, but not expression SFINAE. I worked through the official X3 examples and found that only the code using expression SFINAE had to be rewritten in the traditional SFINAE style. In addition, the msvc 14.0 compiler has bugs when the Boost.Preprocessor library and decltype are used together. By the way, Microsoft has already begun to implement C++17 proposals in msvc, even though the C++11 standard has not been fully implemented yet!
1. Modify the code in <boost/spirit/home/x3/nonterminal/detail/rule.hpp>
//template <typename ID, typename Iterator, typename Context, typename Enable = void>
//struct has_on_error : mpl::false_ {};
//
//template <typename ID, typename Iterator, typename Context>
//struct has_on_error<ID, Iterator, Context,
//    typename disable_if_substitution_failure<
//        decltype(
//            std::declval<ID>().on_error(
//                std::declval<Iterator&>()
//              , std::declval<Iterator>()
//              , std::declval<expectation_failure<Iterator>>()
//              , std::declval<Context>()
//            )
//        )>::type
//    >
//  : mpl::true_
//{};

// Replacement: detect on_error through overload resolution.
template <typename ID, typename Iterator, typename Context>
struct has_on_error_impl
{
    template <typename U, typename = decltype(std::declval<U>().on_error(
        std::declval<Iterator&>(),
        std::declval<Iterator>(),
        std::declval<expectation_failure<Iterator>>(),
        std::declval<Context>()
    ))>
    static mpl::true_ test(int);

    template <typename>
    static mpl::false_ test(...);

    using type = decltype(test<ID>(0));
};

template <typename ID, typename Iterator, typename Context>
using has_on_error = typename has_on_error_impl<ID, Iterator, Context>::type;

//template <typename ID, typename Iterator, typename Attribute, typename Context, typename Enable = void>
//struct has_on_success : mpl::false_ {};
//
//template <typename ID, typename Iterator, typename Attribute, typename Context>
//struct has_on_success<ID, Iterator, Context, Attribute,
//    typename disable_if_substitution_failure<
//        decltype(
//            std::declval<ID>().on_success(
//                std::declval<Iterator&>()
//              , std::declval<Iterator>()
//              , std::declval<Attribute&>()
//              , std::declval<Context>()
//            )
//        )>::type
//    >
//  : mpl::true_
//{};

// Replacement: same technique for on_success.
template <typename ID, typename Iterator, typename Attribute, typename Context>
struct has_on_success_impl
{
    template <typename U, typename = decltype(std::declval<U>().on_success(
        std::declval<Iterator&>(),
        std::declval<Iterator>(),
        std::declval<Attribute&>(),   // lvalue, as in the original check
        std::declval<Context>()
    ))>
    static mpl::true_ test(int);

    template <typename>
    static mpl::false_ test(...);

    using type = decltype(test<ID>(0));
};

template <typename ID, typename Iterator, typename Attribute, typename Context>
using has_on_success = typename has_on_success_impl<ID, Iterator, Attribute, Context>::type;
2. Modify the code in <boost/spirit/home/x3/support/utility/is_callable.hpp>
//template <typename Sig, typename Enable = void>
//struct is_callable_impl : mpl::false_ {};
//template <typename F, typename... A>
//struct is_callable_impl<F(A...), typename disable_if_substitution_failure<
//    decltype(std::declval<F>()(std::declval<A>()...))>::type>
//  : mpl::true_
//{};

template <typename Sig>
struct is_callable_impl : mpl::false_ {};

template <typename F, typename... A>
struct is_callable_impl<F(A...)>
{
    // The call expression depends on T, so an ill-formed call removes this
    // overload instead of causing a hard error.
    template <typename T, typename = decltype(std::declval<T>()(std::declval<A>()...))>
    static mpl::true_ test(int);

    template <typename T>
    static mpl::false_ test(...);

    using type = decltype(test<F>(0));
};
3. Modify BOOST_SPIRIT_DEFINE in <boost/spirit/home/x3/nonterminal/rule.hpp> to the following code:
#define BOOST_SPIRIT_DEFINE_(r, data, rule_name)                                \
    using BOOST_PP_CAT(rule_name, _t) = decltype(rule_name);                    \
    template <typename Iterator, typename Context, typename Attribute>          \
    inline bool parse_rule(                                                     \
        BOOST_PP_CAT(rule_name, _t) rule_                                       \
      , Iterator& first, Iterator const& last                                   \
      , Context const& context, Attribute& attr)                                \
    {                                                                           \
        using boost::spirit::x3::unused;                                        \
        static auto const def_ = (rule_name = BOOST_PP_CAT(rule_name, _def));   \
        return def_.parse(first, last, context, unused, attr);                  \
    }                                                                           \
    /***/
Expression SFINAE is not implemented in msvc, which explains steps 1 and 2. The reason for the modification in step 3 is that BOOST_SPIRIT_DEFINE seems to conflict with decltype. I wrote some test code and finally narrowed the problem down to the use of decltype(rule_name) as a parameter type. The same code compiles without problems on gcc, so msvc's support for decltype is presumably still incomplete. BOOST_SPIRIT_DEFINE involves the use of x3::rule, which will be covered in detail in the next article.
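The detection technique used in steps 1 and 2 can be tried in isolation, without Boost. The sketch below is a reduced illustration: the single Arg parameter and the types with_handler and without_handler are hypothetical, and std::true_type and std::false_type stand in for mpl::true_ and mpl::false_.

```cpp
#include <type_traits>
#include <utility>

// Overload resolution prefers test(int). That overload only exists when the
// call U().on_error(Arg) is well-formed; otherwise test(...) is chosen.
template <typename ID, typename Arg>
struct has_on_error_impl
{
    template <typename U,
              typename = decltype(std::declval<U>().on_error(std::declval<Arg>()))>
    static std::true_type test(int);

    template <typename>
    static std::false_type test(...);

    using type = decltype(test<ID>(0));
};

template <typename ID, typename Arg>
using has_on_error = typename has_on_error_impl<ID, Arg>::type;

// Hypothetical handler types, used only to exercise the trait.
struct with_handler    { void on_error(int) {} };
struct without_handler {};

static_assert(has_on_error<with_handler, int>::value, "handler is detected");
static_assert(!has_on_error<without_handler, int>::value, "no handler is detected");
```

Both assertions are checked at compile time, so a successful build is the whole test.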
Ending
At first glance, Boost.Spirit makes C++ syntax completely unrecognizable. In fact, overloading operators is the standard practice when working with expression templates. The technique is also widely used in the UE4 UI framework and in some expression-template-based math libraries. Recursive descent: to iterate is human, to recurse divine. Static polymorphism: the form varies while the spirit stays intact. Expression templates run through both, like the skeleton that holds the two together. However, an expression template built for a particularly complex grammar puts a heavy load on the compiler, slows down compilation, and can even produce type identifiers longer than 4K characters! These issues will be discussed in later articles, together with the runtime efficiency of Spirit. In general, I think Spirit is still elegant.