Self-made dynamic language Medusa interpreter

Source: Internet
Author: User
Tags: arithmetic operators, expression engine

Today I gathered the notes I wrote while studying compiler principles and implementing the Medusa language into a document resembling a language description. What follows is an introduction to Medusa.


Syntax section

Medusa is a dynamic scripting language whose syntax and code layout draw on Python. Code written at the top level is executed directly.

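Medusa does not recognize symbols such as ':' and ';', but it does recognize white space and newline characters, so take care when writing programs. For example, 1 +-9 executes with the result -8, while the same expression laid out with different white space or line breaks (the exact layout is not preserved in this text) is a parsing error.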

Statements fall into five kinds: function definition, if, while, return, and expression. Several statements form a block, delimited by '{' and '}'. A block is the main body of a def, if, or while statement:
if expression {statements} else {statements}
while expression {statements}
def name (paras) {statements}

Variables defined by top-level code are placed in the global environment. Executing a block creates its own local environment, and each local env holds a pointer to its parent environment.
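The environment chain can be sketched in a few lines. This is only an illustration in Python (the interpreter's own source, judging by the file name execute.cc, is not Python), and the names Env, lookup, and assign are assumptions made for the example. It assumes that a name missing from the local env is searched for in the parent env, up to the global one.

```python
class Env:
    """One scope: a table of variables plus a pointer to the parent scope."""

    def __init__(self, parent=None):
        self.parent = parent   # None marks the global environment
        self.vars = {}         # variable name -> value object

    def assign(self, name, value):
        # Assumption: assignment always writes into the current scope.
        self.vars[name] = value

    def lookup(self, name):
        # Walk from the current scope towards the global environment.
        env = self
        while env is not None:
            if name in env.vars:
                return env.vars[name]
            env = env.parent
        raise NameError(f"undefined variable: {name}")
```

A block's env would be created with Env(parent=enclosing_env), so a lookup inside the block sees both its own variables and those of the enclosing scopes.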
A function definition starts with the 'def' keyword. Defining functions inside a block is not currently supported. All functions are stored in the Func_list list. Built-in functions are supported, and at present print is the only one.
Medusa has simple error handling: an error message contains the line number, the character position, the offending token stream, and a hint. My skill being limited, it supports neither error recovery nor predictive handling; the program terminates after an error.

Internal design and structure
Interpreter: The Interpreter class holds the data structures needed during interpretation, such as the token stream, the function table, the global environment, and the stack. Its initialization adds the built-in functions to the function table; currently print is the only built-in function.
Lexical analysis: Uses a self-written regular expression engine (supporting the basic regular operators '*', '+', '|', '(' and ')') to build the state machine and output its state transition matrix. The lexical scanner checks the source text and outputs a token stream; when it meets a string that does not match the regular rules, it exits with an error. The state machine is constructed as follows: first convert the regular expression from infix to postfix form, for example a|b|c|d to a b c d | | |; then use a stack to build a nondeterministic finite automaton (NFA); finally use the closure algorithm to convert the NFA to a DFA.
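The infix-to-postfix step can be sketched as follows. This is a minimal Python illustration, not the engine's actual code. Assumptions: concatenation is made explicit with a '.' marker (so '.' cannot be used as an ordinary character here), there are no escape sequences, and an operator is popped from the stack only when it has strictly higher precedence, which is what reproduces the a b c d | | | ordering given above.

```python
PRECEDENCE = {"|": 1, ".": 2}   # '.' is an assumed explicit concatenation marker
POSTFIX_UNARY = {"*", "+"}      # already postfix in the infix form


def add_concat(regex):
    """Insert '.' wherever two adjacent items are implicitly concatenated."""
    out = []
    for i, ch in enumerate(regex):
        out.append(ch)
        if i + 1 < len(regex):
            nxt = regex[i + 1]
            # Left side ends an operand, right side starts one.
            if ch not in "(|" and nxt not in ")|*+":
                out.append(".")
    return "".join(out)


def to_postfix(regex):
    """Convert an infix regex to postfix form using an operator stack."""
    output, stack = [], []
    for ch in add_concat(regex):
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                      # discard the '('
        elif ch in POSTFIX_UNARY:
            output.append(ch)                # applies to the operand just emitted
        elif ch in PRECEDENCE:
            # Pop only strictly higher-precedence operators.
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] > PRECEDENCE[ch]):
                output.append(stack.pop())
            stack.append(ch)
        else:
            output.append(ch)                # ordinary character
    while stack:
        output.append(stack.pop())
    return " ".join(output)
```

With these rules to_postfix("a|b|c|d") yields a b c d | | |, and to_postfix("(a|b)*c") yields a b | * c . with the explicit concatenation marker at the end.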
Syntax analysis: The parser does not use an automatic generator such as YACC or bison; it uses the recursive descent algorithm. The context-free grammar of a program is as follows:
proc -> def ID (ID ...) {statements} | statement
statement -> return exp
    | exp
    | if exp {statements} else {statements}
    | while exp {statements}
An expression exp uses arithmetic, comparison, and assignment operators to join nodes such as callfunc (function call), variable, and literal (literal constant). Its representation in the parsing module is an AST. Medusa has no bool type, and unlike C and C++ it does not convert numeric values to a bool type; if and while statements only distinguish whether the result of exp is nonzero or zero. The parsing process also adds user-defined functions to the function table.
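A recursive descent parser for the statement rules might look roughly like the Python sketch below. It is an illustration under several assumptions: the input is already a list of string tokens, the def rule is left out, and parse_exp is a placeholder that handles only "operand op operand ..." without operator precedence (the real exp grammar is not spelled out here). All function names and the BINARY_OPS set are made up for the example.

```python
BINARY_OPS = {"=", "+", "-", "*", "/", "<", ">", "=="}   # assumed operator set


def expect(tokens, pos, wanted):
    """Check that tokens[pos] is the wanted token and step past it."""
    if pos >= len(tokens) or tokens[pos] != wanted:
        raise SyntaxError(f"expected {wanted!r} at token {pos}")
    return pos + 1


def parse_exp(tokens, pos):
    """Placeholder exp rule: right-nested binary operations, no precedence."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input in expression")
    left = tokens[pos]
    pos += 1
    if pos < len(tokens) and tokens[pos] in BINARY_OPS:
        op = tokens[pos]
        right, pos = parse_exp(tokens, pos + 1)
        return (op, left, right), pos
    return left, pos


def parse_block(tokens, pos):
    """{statements}: zero or more statements between braces."""
    pos = expect(tokens, pos, "{")
    statements = []
    while pos < len(tokens) and tokens[pos] != "}":
        stmt, pos = parse_statement(tokens, pos)
        statements.append(stmt)
    return statements, expect(tokens, pos, "}")


def parse_statement(tokens, pos):
    """One function per grammar rule; returns (ast_node, next_position)."""
    tok = tokens[pos]
    if tok == "return":
        exp, pos = parse_exp(tokens, pos + 1)
        return ("return", exp), pos
    if tok == "if":
        cond, pos = parse_exp(tokens, pos + 1)
        then_block, pos = parse_block(tokens, pos)
        pos = expect(tokens, pos, "else")
        else_block, pos = parse_block(tokens, pos)
        return ("if", cond, then_block, else_block), pos
    if tok == "while":
        cond, pos = parse_exp(tokens, pos + 1)
        body, pos = parse_block(tokens, pos)
        return ("while", cond, body), pos
    exp, pos = parse_exp(tokens, pos)
    return ("exp", exp), pos
```

Each alternative of the statement rule becomes one branch, and each nonterminal becomes one function, which is the essential shape of recursive descent.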
Semantic analysis: Because Medusa is a dynamically typed language, variables need no declaration and are defined simply by being used, so the type checking of traditional semantic analysis is hard to carry out ahead of time. The interpreter performs it at execution time instead, and the variable symbol table is likewise built at execution time.
Execution: execute.cc executes the program code, statement by statement. The most important part, exp execution, is a post-order traversal of the AST that recursively computes the value of the expression. During execution the interpreter also performs garbage collection; the GC algorithm is mark & sweep.
Symbol table: The program context is represented as pair<string,mdsobject*> entries. Multiple env objects distinguish the scopes of the program; each env object has a parent field pointing to its parent environment, and the topmost env is the global env. When a variable is not found in the current scope, the enclosing environment is searched.
Stack: Used to hold the temporary values of exp calculations, which have no variable names.
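The post-order evaluation with a value stack can be illustrated with a short Python sketch. The node shapes ("literal", "variable", "binop"), the OPS table, and the use of a plain dict as the environment are assumptions for the example, not the structures used in execute.cc.

```python
import operator

# Assumed operator table for the sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, env, stack):
    """Post-order walk: evaluate both children, then combine their values."""
    kind = node[0]
    if kind == "literal":
        stack.append(node[1])
    elif kind == "variable":
        stack.append(env[node[1]])
    elif kind == "binop":
        _, op, left, right = node
        evaluate(left, env, stack)    # left subtree first
        evaluate(right, env, stack)   # then right subtree
        rhs = stack.pop()             # temporaries live only on the stack
        lhs = stack.pop()
        stack.append(OPS[op](lhs, rhs))
    else:
        raise ValueError(f"unknown node kind: {kind}")


def run(node, env):
    """Evaluate one expression tree and return its value."""
    stack = []
    evaluate(node, env, stack)
    return stack.pop()
```

For the tree of (1 + 2) * x with x bound to 4, the stack holds 1, then 1 and 2, then 3, then 3 and 4, and finally the result 12.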
Memory allocation: Uses a two-level allocator model. The lowest layer is a memory pool that wraps malloc and free and is used to request large chunks of memory from the kernel. The second-level allocator is the object allocator, which maintains lists of free areas; the linked lists hold free areas of sizes 8, 16, 24 ... 126 bytes. The interpreter's memory requests are satisfied from here, and when the free areas run short, new space is requested from the memory pool. Each allocated memory block is preceded by a Blockmeta object holding the mark flag bit and the size information. (The memory module is a reinvented wheel, built for my own understanding of memory models and allocation strategies; in practice malloc and free would be enough. This is noted here for clarity.)
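The size-class idea can be simulated in Python with offsets instead of real pointers. This is only a sketch under stated assumptions: CHUNK_SIZE, the ObjectAllocator class, and its methods are hypothetical, the meta dictionary merely stands in for the Blockmeta header, and the header's own size and the upper limit on size classes are ignored.

```python
CHUNK_SIZE = 4096   # assumed size of one chunk requested from the pool
ALIGN = 8           # size classes step by 8 bytes


class ObjectAllocator:
    """Second-level allocator: one free list per size class."""

    def __init__(self):
        self.free_lists = {}   # size class -> offsets of free blocks
        self.meta = {}         # offset -> {"size", "mark"}, stands in for Blockmeta
        self.pool_top = 0      # next unused offset
        self.pool_limit = 0    # end of the memory obtained from the pool so far

    def _grow_pool(self):
        # Stand-in for asking the first-level memory pool for a large chunk.
        self.pool_limit += CHUNK_SIZE

    def alloc(self, nbytes):
        size = (nbytes + ALIGN - 1) // ALIGN * ALIGN   # round up to a size class
        free = self.free_lists.setdefault(size, [])
        if free:
            offset = free.pop()                        # reuse a free block
        else:
            while self.pool_top + size > self.pool_limit:
                self._grow_pool()
            offset = self.pool_top
            self.pool_top += size
        self.meta[offset] = {"size": size, "mark": False}
        return offset

    def free(self, offset):
        size = self.meta.pop(offset)["size"]
        self.free_lists[size].append(offset)           # back onto its size class
```

A request for 10 bytes is served from the 16-byte class; once that block is freed, the next request that rounds to 16 bytes gets the same block back without touching the pool.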
Variable model: Medusa's types discard the primitive-type design of C and Java: the value of a variable is not stored inside the variable itself. Instead, following Python, a reference model is used. Medusa has three built-in types, int, string, and list; to reduce the workload, types such as float, bool, and long are omitted. The three built-in types correspond to the three classes Mdsintobject, Mdsstringobject, and mdslistobject, which inherit from Mdsobject, implement their different behavior through polymorphism, and present a unified interface to the upper layers. Every value object has exactly one copy in memory and is stored in one of three tables, Int_map, String_map, and list_map, each indexed by the hash of the object's value (int, char*, and void*[] respectively). Env maintains an array of pair<string (variable name), mdsobject*>.
The expression "c = a+b" is executed as follows: look up the keys "a" and "b" in the local env and fetch the variables' values, that is, their mdsobject pointers; compute the sum of the two objects; use the hash of the result to look for the object in Int_map, String_map, or List_map, creating and inserting it if it is not found; finally, set the content of the key "c" in env to the address of the result object.
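The steps for "c = a+b" can be sketched in Python for the int case. This is an assumption-laden illustration: a dict plays the role of Int_map, another dict plays the role of env, and MdsIntObject, intern_int, and assign_sum are names invented for the example.

```python
class MdsIntObject:
    """Stand-in for the int value object; holds only the value."""

    def __init__(self, value):
        self.value = value


int_map = {}   # value -> the single shared object for that value


def intern_int(value):
    """Return the unique object for this value, creating it if missing."""
    obj = int_map.get(value)
    if obj is None:
        obj = MdsIntObject(value)
        int_map[value] = obj
    return obj


def assign_sum(env, target, left, right):
    """Perform 'target = left + right' under the reference model."""
    a = env[left]                               # fetch the object references
    b = env[right]
    result = intern_int(a.value + b.value)      # find or create the result object
    env[target] = result                        # the variable stores a reference
```

Because every value has one shared object, two variables holding 5 refer to the same MdsIntObject, and assignment never copies a value.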
GC: Medusa supports garbage collection of objects. The starting points (roots) of mark & sweep are the global environment, the list of function local environments, and the objects in the stack container used during evaluation (which holds temporary values). GC is triggered once every 10 memory allocation requests.
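A toy mark & sweep collector with the "every 10 allocations" trigger might look like the Python sketch below. The Obj and Heap classes are hypothetical, and a single roots list stands in for the global environment, the function local environments, and the value stack. The sketch also assumes the collection runs just before the tenth object is created, so that the new object is not swept.

```python
GC_INTERVAL = 10   # run a collection on every 10th allocation request


class Obj:
    """A heap object that may reference other heap objects."""

    def __init__(self, refs=()):
        self.refs = list(refs)
        self.marked = False


class Heap:
    def __init__(self):
        self.objects = []      # every object currently allocated
        self.roots = []        # stands in for envs and the value stack
        self.alloc_count = 0

    def allocate(self, refs=()):
        self.alloc_count += 1
        if self.alloc_count % GC_INTERVAL == 0:
            self.collect()
        obj = Obj(refs)
        self.objects.append(obj)
        return obj

    def collect(self):
        # Mark phase: flag everything reachable from the roots.
        pending = list(self.roots)
        while pending:
            obj = pending.pop()
            if not obj.marked:
                obj.marked = True
                pending.extend(obj.refs)
        # Sweep phase: keep marked objects and reset their flags.
        self.objects = [o for o in self.objects if o.marked]
        for o in self.objects:
            o.marked = False
```

After one rooted object and eight unrooted ones have been allocated, the tenth request sweeps the eight unreachable objects and leaves only the rooted object and the newly created one.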


