Reprinted from: http://my.oschina.net/xhan/blog/309615
(Continued from the previous article.)
--------------------------------------
Implementation
--------------------------------------
An extension language is always interpreted by the application in some way. Simple extension languages can be interpreted directly from source code. Embedded languages, on the other hand, are often powerful programming languages with complex syntax and semantics. A more efficient implementation technique for embedded languages is to design a virtual machine suited to the needs of the language, compile extension programs into bytecode for that machine, and then simulate the machine by interpreting the bytecode (Betz 1988, 1991; Franks 1991). We chose this hybrid architecture to implement Lua. Compared with executing source code directly, it has the following advantages:
Lexical and syntactic analysis is done only once, which makes execution faster. An external compiler can also be run before the code is actually embedded, which catches simple errors early and shortens development cycles;
If an external compiler is used, extensions can be distributed in precompiled bytecode form only. This makes loading faster, the environment safer, and the runtime smaller (although linking several precompiled extensions can be a daunting task).
This architecture was first used in Smalltalk (Goldberg–Robson 1983; Budd 1987), which is where the term bytecode comes from. It was also used successfully in the UCSD Pascal system, which is based on P-code (Clark–Koehler 1982). In these systems, a bytecode virtual machine reduces complexity and increases portability. The same approach was also used to port the BCPL compiler (Richards–Whitby-Strevens 1980).
The compiler for an extension language can be generated with standard tools such as Lex and Yacc (Levine–Mason–Brown 1992). Good compiler-construction tools were available and widely used from the late 1970s on, and this was a major reason for the proliferation of little languages, especially in UNIX environments. In the Lua implementation we use Yacc for parsing. Initially we also used a scanner generated by Lex. When we profiled the production system, we found that this module took almost half of the time needed to load and execute a program. We then rewrote the scanner directly in C, and the new scanner is more than twice as fast as the old one.
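The following sketch shows what a scanner "written directly in C" can look like. It is not Lua's actual scanner. It is a hand-coded loop that splits a source string into names, integer literals, and single-character operators. The token kinds and the `next_token` function are hypothetical names chosen for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds for a hypothetical, tiny Lua-like syntax. */
typedef enum { TK_EOF, TK_NAME, TK_NUMBER, TK_CHAR } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* first character of the token in the source */
    size_t len;         /* number of characters in the token */
} Token;

/* Scan one token starting at *src and advance *src past it.
   Direct-coded: a loop over characters, no generated transition tables. */
Token next_token(const char **src) {
    const char *p = *src;
    while (isspace((unsigned char)*p)) p++;          /* skip blanks */
    Token t = { TK_EOF, p, 0 };
    if (*p == '\0') { *src = p; return t; }
    if (isalpha((unsigned char)*p) || *p == '_') {   /* identifier */
        t.kind = TK_NAME;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (isdigit((unsigned char)*p)) {         /* integer literal */
        t.kind = TK_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else {                                         /* single-char operator */
        t.kind = TK_CHAR;
        p++;
    }
    t.len = (size_t)(p - t.start);
    *src = p;
    return t;
}
```

Calling `next_token` repeatedly on `a = b + f(c)` yields `a`, `=`, `b`, `+`, `f`, `(`, `c`, `)` and then `TK_EOF`. A generated scanner typically consults transition tables for every character, and a direct-coded loop like this one avoids them. That is one plausible reason for a speedup of the kind reported above.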
-------------------
The Lua Virtual Machine
-------------------
The virtual machine used in Lua is a stack machine. This means that it has no random-access memory: all temporary values and local variables are kept on the stack. It also has no general-purpose registers. It has only a few special control registers that manage the stack and program execution: the stack base, the stack top, and the program counter.
A program for the virtual machine is a sequence of instructions, called bytecode. A program is executed by interpreting its bytecode, and each instruction operates on the top of the stack. For example, the statement
    a = b + f(c)
is compiled as:
    PUSHGLOBAL  "b"
    PUSHGLOBAL  "f"
    PUSHMARK
    PUSHGLOBAL  "c"
    CALLFUNC
    ADJUST      2
    ADD
    STOREGLOBAL "a"
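The listing above can be run on a toy interpreter. The following is a minimal sketch built on several assumptions that are not from the original:
- Globals live in a fixed array addressed by a one-byte slot number instead of by name.
- Every called function is a C function returning exactly one number.
- PUSHMARK pushes a special mark value that CALLFUNC searches for.
- The operand of ADJUST is the desired stack height measured from the stack base. With that reading, ADJUST 2 keeps `b` and the result of `f(c)`.

Opcode numbers and helper names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical opcode numbers for the instructions used in the listing. */
enum { PUSHGLOBAL, PUSHMARK, CALLFUNC, ADJUST, ADD, STOREGLOBAL, HALT };

typedef enum { T_NIL, T_NUMBER, T_CFUNC, T_MARK } Tag;

typedef struct Value {
    Tag tag;
    double n;                                /* payload when tag == T_NUMBER */
    double (*f)(const struct Value *, int);  /* payload when tag == T_CFUNC */
} Value;

/* Hypothetical symbol-table layout: one slot per global name. */
enum { G_A, G_B, G_C, G_F, NGLOBALS };

static Value globals[NGLOBALS];
static Value stack[64];                      /* no overflow checks in this sketch */

/* base, top and pc play the role of the machine's three control registers. */
static void execute(const unsigned char *pc) {
    Value *base = stack, *top = stack;
    for (;;) {
        switch (*pc++) {
        case PUSHGLOBAL:
            *top++ = globals[*pc++];
            break;
        case PUSHMARK:                       /* arguments start above the mark */
            top->tag = T_MARK;
            top++;
            break;
        case CALLFUNC: {
            Value *mark = top - 1;
            while (mark->tag != T_MARK)      /* assumes a mark is present */
                mark--;
            Value *fn = mark - 1;            /* the function lies just below it */
            int nargs = (int)(top - (mark + 1));
            double r = fn->f(mark + 1, nargs);
            fn->tag = T_NUMBER;              /* the result replaces the function */
            fn->n = r;
            top = fn + 1;
            break;
        }
        case ADJUST: {                       /* operand: stack height from base */
            Value *want = base + *pc++;
            while (top < want) { top->tag = T_NIL; top++; }
            top = want;
            break;
        }
        case ADD:                            /* no type checks in this sketch */
            top--;
            top[-1].n += top->n;
            break;
        case STOREGLOBAL:
            globals[*pc++] = *--top;
            break;
        default:                             /* HALT or unknown opcode */
            return;
        }
    }
}

/* Example host function bound to the global f: twice its first argument. */
static double twice(const Value *args, int nargs) {
    return nargs > 0 ? 2 * args[0].n : 0;
}

/* a = b + f(c), assembled by hand from the listing above. */
static const unsigned char code[] = {
    PUSHGLOBAL, G_B, PUSHGLOBAL, G_F, PUSHMARK, PUSHGLOBAL, G_C,
    CALLFUNC, ADJUST, 2, ADD, STOREGLOBAL, G_A, HALT
};

double run_example(double b, double c) {
    globals[G_B] = (Value){ T_NUMBER, b, NULL };
    globals[G_C] = (Value){ T_NUMBER, c, NULL };
    globals[G_F] = (Value){ T_CFUNC, 0, twice };
    execute(code);
    return globals[G_A].n;
}
```

With `f` bound to a function that doubles its argument, `run_example(1, 5)` stores 11 in `a`. Every instruction reads from or writes to the top of the stack, and there is no other working storage.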
Lua's virtual machine has about 60 instructions, so each one can be encoded in an 8-bit bytecode. Many instructions (for example, ADD) take no arguments. They operate directly on the stack and occupy only one byte of compiled code. Other instructions (for example, PUSHGLOBAL and STOREGLOBAL) take arguments and therefore occupy more than one byte. An argument can take one, two, or four bytes, so it may end up misaligned on some architectures. This alignment problem is solved by padding the code with NOP (no-operation) instructions.
Many instructions exist only for optimization. For example, there is a PUSH instruction that takes a number as its argument and pushes it. There are also single-byte versions optimized for pushing commonly used values, such as nil, 0, and 1: PUSHNIL, PUSH0, PUSH1, PUSH2. This optimization reduces both the space taken by bytecode and the time spent executing instructions.
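The padding trick can be sketched as follows. The sketch assumes, hypothetically, that two-byte arguments must start at an even offset in the bytecode array. `emit_op_word` pads with NOP until the byte after the opcode is aligned, then writes the opcode and its argument. The opcode values and the even-offset rule are assumptions for illustration, not Lua's actual encoding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { NOP = 0, ADD = 1, PUSHWORD = 2 };    /* hypothetical opcode numbers */

typedef struct {
    uint8_t code[256];                      /* no overflow check in this sketch */
    size_t len;
} CodeBuf;

static void emit_byte(CodeBuf *b, uint8_t byte) { b->code[b->len++] = byte; }

/* Emit an opcode followed by a 16-bit argument in native byte order.
   NOPs go first so that the argument starts at an even offset. */
static void emit_op_word(CodeBuf *b, uint8_t op, uint16_t arg) {
    while ((b->len + 1) % sizeof(uint16_t) != 0)   /* +1 for the opcode byte */
        emit_byte(b, NOP);
    emit_byte(b, op);
    memcpy(&b->code[b->len], &arg, sizeof arg);
    b->len += sizeof arg;
}
```

Consider emitting `ADD` and then two `PUSHWORD` instructions into an empty buffer. The first `PUSHWORD` needs no padding, because its argument already falls on offset 2. The second gets one NOP in front of it. The interpreter executes a NOP like any other one-byte instruction, so padding costs a byte and a trivial dispatch but no extra decoding logic.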
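A compiler can choose among these variants at code-generation time. The following is a minimal sketch with hypothetical opcode numbers. It also assumes hypothetical `PUSHBYTE` and `PUSHWORD` forms for one- and two-byte integer arguments, and it ignores alignment padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode numbering; PUSH0..PUSH2 are the one-byte short forms. */
enum { PUSHNIL, PUSH0, PUSH1, PUSH2, PUSHBYTE, PUSHWORD };

/* Write the shortest instruction that pushes n (0 <= n <= 65535) into out[]
   and return the number of bytes written. */
size_t emit_push_int(uint8_t *out, unsigned n) {
    if (n <= 2) {                    /* common constants: opcode only */
        out[0] = (uint8_t)(PUSH0 + n);
        return 1;
    }
    if (n <= 255) {                  /* one-byte argument */
        out[0] = PUSHBYTE;
        out[1] = (uint8_t)n;
        return 2;
    }
    out[0] = PUSHWORD;               /* two-byte argument, low byte first */
    out[1] = (uint8_t)(n & 0xFF);
    out[2] = (uint8_t)(n >> 8);
    return 3;
}
```

Pushing 0, 1, or 2 costs one byte and requires no argument decoding at run time. Pushing 300 needs three bytes.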
Recall that Lua supports multiple assignment and multiple return values. Therefore a list of values must sometimes be adjusted at run time to a given length. If there are more values than needed, the extra values are discarded. If there are fewer than needed, the list is extended with nil values. This adjustment is done on the stack by the ADJUST instruction.
Multiple assignment and multiple returns are powerful features of Lua, but they are also an important source of complexity in the compiler and the interpreter. Functions have no type declarations, so the compiler does not know how many values a function will return, and the adjustment must be done at run time. Similarly, the compiler does not know how many parameters a function expects. This number may vary at run time, so the arguments are passed as a list delimited on the stack by the PUSHMARK and CALLFUNC instructions.
One way to let Lua use functions provided by the host is to assign each such function its own bytecode instruction (Betz 1988). This strategy simplifies the interpreter. Its drawback is that fewer than 200 external functions can be added: Lua uses 8-bit bytecodes, and Lua itself already uses about 60 of them for its basic instructions. We chose instead to have the host register external functions explicitly, and these functions are then treated as if they were native Lua functions. Therefore a single CALLFUNC instruction suffices, and the interpreter decides what to do based on the type of the function being called.
A rather different strategy is presented by Franks (1991). There, every function in the host can be called from the embedded language, with no explicit registration. This is done by reading and interpreting the symbol table produced by the linker. This solution is convenient for application programmers, but it is not portable. It depends on the format of the symbol-table file and on the relocation policy of the operating system (Franks used a specific DOS compiler).
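The run-time adjustment amounts to moving the stack top. The sketch below uses a hypothetical two-tag `Value` type. It shows the step ADJUST performs when `x, y, z = g()` runs and `g` returns only two values, and when `x = g()` keeps just the first value.

```c
#include <stddef.h>

typedef enum { T_NIL, T_NUMBER } Tag;
typedef struct { Tag tag; double n; } Value;

/* Make exactly `want` values lie between base and the stack top:
   extra values are dropped, missing ones are filled with nil.
   Returns the new top. */
Value *adjust(Value *base, Value *top, size_t want) {
    Value *goal = base + want;
    while (top < goal) {        /* too few values: pad with nil */
        top->tag = T_NIL;
        top++;
    }
    return goal;                /* too many values: just lower the top */
}
```

A sketch of the call side could reuse the same helper. The number of arguments is known only when CALLFUNC runs, as the distance between the mark and the stack top. The arguments can then be adjusted to the number of parameters the callee declares.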
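Registration can be sketched as a global table that maps names to tagged values, so that a C function becomes just another kind of function value. In this sketch, `register_cfunction`, `lookup_global`, `call_function`, and the one-argument `CFunction` signature are hypothetical and are not the Lua 1.1 API. Interpretation of a Lua function's bytecode is left as a stub.

```c
#include <stddef.h>
#include <string.h>

typedef enum { T_NIL, T_NUMBER, T_FUNCTION, T_CFUNCTION } Tag;

typedef double (*CFunction)(double);      /* hypothetical host-function signature */

typedef struct {
    Tag tag;
    union {
        double n;                         /* T_NUMBER */
        const unsigned char *bytecode;    /* T_FUNCTION: a Lua function */
        CFunction f;                      /* T_CFUNCTION: a registered host function */
    } u;
} Value;

#define MAX_GLOBALS 16
static struct { const char *name; Value value; } globals[MAX_GLOBALS];
static size_t nglobals;

/* Host side: bind a C function to a global name; returns 0 if the table is full. */
int register_cfunction(const char *name, CFunction f) {
    if (nglobals == MAX_GLOBALS) return 0;
    globals[nglobals].name = name;
    globals[nglobals].value.tag = T_CFUNCTION;
    globals[nglobals].value.u.f = f;
    nglobals++;
    return 1;
}

Value *lookup_global(const char *name) {
    for (size_t i = 0; i < nglobals; i++)
        if (strcmp(globals[i].name, name) == 0) return &globals[i].value;
    return NULL;
}

/* What a single CALLFUNC does: branch on the type of the callee.
   Returns 1 on success, 0 for a Lua function (not interpreted here),
   -1 if the value is not callable. */
int call_function(const Value *fn, double arg, double *result) {
    switch (fn->tag) {
    case T_CFUNCTION:
        *result = fn->u.f(arg);           /* control passes into the host program */
        return 1;
    case T_FUNCTION:
        /* a real interpreter would start executing fn->u.bytecode here */
        return 0;
    default:
        return -1;                        /* e.g. calling a number: a run-time error */
    }
}

/* Example host function that an application might register. */
double host_square(double x) { return x * x; }
```

Dispatch happens on the value's tag, so the number of host functions is limited by memory rather than by the 256 possible opcodes.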
-------------------
Internal Data Structures
-------------------
As mentioned earlier, Lua variables have no types. Each value is therefore implemented as a struct with two fields: a type tag and a union that holds the actual value. These structs appear on the stack and in the symbol table, which holds all global symbols.
Numbers are stored directly in the union. Strings are stored in arrays, and a value of type string is a pointer to such an array. A value of type function is a pointer to an array of bytecode. A value of type cfunction is a pointer to the actual C function in the host program, and a value of type userdata is similarly a pointer into the host program.
Tables are implemented as hash tables that resolve collisions by separate chaining, which is why the order of the indices in a table is arbitrary. If a table is created with a given size, that size is used as the size of its hash table. Giving a table a size close to the number of elements it will hold therefore reduces hash collisions and makes indexing more efficient. Moreover, if a table is used as an array, that is, with numeric indices only, choosing an appropriate size at creation guarantees that there are no hash collisions at all.
All of Lua's internal data structures are dynamically allocated arrays. When one of these arrays runs out of free slots, garbage collection is triggered automatically. Lua's garbage collector uses a mark-and-sweep algorithm. If no space is reclaimed because every value is still referenced, the array is reallocated at twice its size.
Garbage collection is convenient for programmers because it frees them from explicit memory management. It is valuable when Lua is used as a standalone language, as it often is. However, when Lua is embedded in a host program (its main purpose), garbage collection creates a new nuisance for application programmers who interact with Lua. They must be careful not to keep Lua tables and strings in C variables, because these values may be reclaimed during garbage collection if Lua holds no other references to them. Instead, the programmer must explicitly copy such values into C variables before returning control to Lua. This is a different discipline, but it is no worse than managing memory with the malloc/free protocol of the standard C library.
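The representation described in the two paragraphs above can be sketched as a tagged union in C. The names `Object` and `Tag`, the `T_*` constants, and the `make_*` and `is_callable` helpers are hypothetical, chosen for this illustration. The table member anticipates the hash tables described in the next paragraph.

```c
/* Hypothetical type tags, one per kind of Lua value. */
typedef enum { T_NIL, T_NUMBER, T_STRING, T_FUNCTION, T_CFUNCTION, T_USERDATA, T_TABLE } Tag;

typedef void (*CFunction)(void);        /* hypothetical host-function signature */
struct Table;                           /* hash table; layout omitted here */

/* A value: a type tag plus a union holding the actual data. */
typedef struct {
    Tag tag;                            /* says which union member is valid */
    union {
        double n;                       /* number: stored directly */
        char *s;                        /* string: pointer to its character array */
        unsigned char *bytecode;        /* function: pointer to its bytecode array */
        CFunction f;                    /* cfunction: pointer into the host program */
        void *u;                        /* userdata: pointer supplied by the host */
        struct Table *t;                /* table */
    } value;
} Object;

Object make_number(double n) {
    Object o;
    o.tag = T_NUMBER;
    o.value.n = n;
    return o;
}

Object make_userdata(void *p) {
    Object o;
    o.tag = T_USERDATA;
    o.value.u = p;
    return o;
}

int is_callable(const Object *o) {
    return o->tag == T_FUNCTION || o->tag == T_CFUNCTION;
}
```

Every `Object` has the same size: one tag plus the largest union member. The stack and the symbol table can therefore both be plain arrays of `Object`.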
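Here is a minimal sketch of a separately chained hash table whose bucket count is fixed at creation. It handles numeric keys only. It assumes a hypothetical hash function, the integer part of the key modulo the bucket count, under which the claim above is easy to check: a table of size n indexed by 1..n never collides. The function names and the collision counter are illustrative additions.

```c
#include <stdlib.h>

/* One entry in a bucket's chain; numeric keys only in this sketch. */
typedef struct Node {
    double key, val;
    struct Node *next;
} Node;

typedef struct {
    size_t size;        /* number of buckets, fixed at creation */
    Node **buckets;     /* each bucket is a linked chain */
    size_t collisions;  /* insertions that landed in an occupied bucket */
} Table;

/* Hypothetical hash: integer part of the key modulo the bucket count
   (assumes the key fits in a long). */
static size_t hash_number(double key, size_t size) {
    return (size_t)((unsigned long)(long)key % size);
}

Table *table_new(size_t size) {            /* allocation failures unchecked */
    Table *t = malloc(sizeof *t);
    t->size = size > 0 ? size : 1;
    t->buckets = calloc(t->size, sizeof *t->buckets);
    t->collisions = 0;
    return t;
}

void table_set(Table *t, double key, double val) {
    size_t h = hash_number(key, t->size);
    for (Node *n = t->buckets[h]; n != NULL; n = n->next)
        if (n->key == key) { n->val = val; return; }   /* update in place */
    if (t->buckets[h] != NULL) t->collisions++;
    Node *n = malloc(sizeof *n);
    n->key = key;
    n->val = val;
    n->next = t->buckets[h];                           /* prepend to the chain */
    t->buckets[h] = n;
}

int table_get(const Table *t, double key, double *out) {
    for (Node *n = t->buckets[hash_number(key, t->size)]; n != NULL; n = n->next)
        if (n->key == key) { *out = n->val; return 1; }
    return 0;
}

void table_free(Table *t) {
    for (size_t i = 0; i < t->size; i++) {
        Node *n = t->buckets[i];
        while (n != NULL) { Node *next = n->next; free(n); n = next; }
    }
    free(t->buckets);
    free(t);
}
```

Under the same assumption, inserting keys 1 to 100 into a 10-bucket table causes 90 collisions, because only the first 10 keys find an empty bucket.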
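The allocation policy can be sketched on a single pool of slots. This is a simplified illustration, not Lua's collector. Each hypothetical `Slot` may reference at most one other slot, and the roots are passed in explicitly. `pool_alloc` first looks for a free slot. If there is none, it runs mark-and-sweep, and only if nothing was reclaimed does it double the array with `realloc`.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    bool in_use;
    bool marked;      /* set by the mark phase */
    int ref;          /* index of one referenced slot, or -1 */
} Slot;

typedef struct {
    Slot *slots;      /* one dynamically allocated array */
    size_t cap;
} Pool;

static const Slot EMPTY = { false, false, -1 };

Pool pool_new(size_t cap) {             /* cap > 0; allocation failure unchecked */
    Pool p = { malloc(cap * sizeof(Slot)), cap };
    for (size_t i = 0; i < cap; i++) p.slots[i] = EMPTY;
    return p;
}

/* Mark: follow references from every root. */
static void mark(Pool *p, const int *roots, size_t nroots) {
    for (size_t i = 0; i < nroots; i++)
        for (int j = roots[i]; j >= 0 && !p->slots[j].marked; j = p->slots[j].ref)
            p->slots[j].marked = true;
}

/* Sweep: free every unmarked slot and clear all marks; returns slots freed. */
static size_t sweep(Pool *p) {
    size_t freed = 0;
    for (size_t i = 0; i < p->cap; i++) {
        if (p->slots[i].in_use && !p->slots[i].marked) {
            p->slots[i] = EMPTY;
            freed++;
        }
        p->slots[i].marked = false;
    }
    return freed;
}

static int find_free(const Pool *p) {
    for (size_t i = 0; i < p->cap; i++)
        if (!p->slots[i].in_use) return (int)i;
    return -1;
}

/* Returns the index of a newly allocated slot, or -1 if realloc fails. */
int pool_alloc(Pool *p, const int *roots, size_t nroots) {
    int i = find_free(p);
    if (i < 0) {                                  /* no free slot: collect */
        mark(p, roots, nroots);
        if (sweep(p) > 0) {
            i = find_free(p);
        } else {                                  /* nothing reclaimed: double */
            size_t old = p->cap;
            Slot *s = realloc(p->slots, 2 * old * sizeof *s);
            if (s == NULL) return -1;
            for (size_t k = old; k < 2 * old; k++) s[k] = EMPTY;
            p->slots = s;
            p->cap = 2 * old;
            i = (int)old;
        }
    }
    p->slots[i] = (Slot){ true, false, -1 };
    return i;
}
```

Take a two-slot pool whose slots are both rooted. The next allocation triggers a collection that frees nothing, so the array grows to four slots. Once that larger pool fills with unrooted objects, the next allocation reclaims them instead of growing again.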
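In C terms, this rule means copying data out of the interpreter before control goes back to Lua. The following is a minimal sketch with a hypothetical helper `copy_lua_string`. How the C code obtains the interpreter-owned pointer in the first place depends on the Lua API and is not shown.

```c
#include <stdlib.h>
#include <string.h>

/* Make a C-owned copy of a string that belongs to the interpreter, so the
   copy survives later garbage collections. The caller must free() it.
   Returns NULL if allocation fails. */
char *copy_lua_string(const char *lua_owned) {
    size_t n = strlen(lua_owned) + 1;   /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL) memcpy(copy, lua_owned, n);
    return copy;
}
```

The copy belongs to the host, so it remains valid whatever the collector does afterwards. The host must `free` it when done, which is the same discipline as for any other `malloc`'d buffer.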
--------------------------------------
Conclusion
--------------------------------------
Lua has been used extensively in production since 1993 for the following tasks:
user configuration of applications;
general data entry, with user-defined validation routines;
description of user interfaces;
programmatic description of application objects;
storage of structured graphical metafiles used for communication between graphical editors and applications.
In addition, Lua is the basis of a visual programming system that is currently under study.
Loading and executing Lua programs at run time makes configuration easy for both users and developers. Moreover, a common embedded language reduces language incompatibilities and encourages better design, because it separates configuration issues from the other main concerns of the application.
The Lua implementation described in this article is available for download at http://www.lua.org/ftp/lua-1.1.tar.gz
-------------------
Acknowledgments
-------------------
We thank all the staff at ICAD and Tecgraf who used and tested Lua. The industrial applications mentioned in this paper were developed in partnership with the research centers of PETROBRAS (CENPES) and ELETROBRAS (CEPEL).
--------------------------------------
References
--------------------------------------
M. Abrash, D. Illowsky, "Roll your own minilanguages with mini-interpreters", Dr. Dobb's Journal (9) (Sep 1989) 52–72.
A. V. Aho, B. W. Kernighan, P. J. Weinberger, The AWK Programming Language, Addison-Wesley, 1988.
B. Beckman, "A scheme for little languages in interactive graphics", Software: Practice & Experience 21 (1991) 187–207.
J. Bentley, "Programming pearls: little languages", Communications of the ACM 29 (1986) 711–721.
J. Bentley, More Programming Pearls, Addison-Wesley, 1988.
D. Betz, "Embedded languages", Byte (12) (Nov 1988) 409–416.
D. Betz, "Your own tiny object-oriented language", Dr. Dobb's Journal (9) (Sep 1991) 26–33.
T. Budd, A Little Smalltalk, Addison-Wesley, 1987.
R. Clark, S. Koehler, The UCSD Pascal Handbook: A Reference and Guidebook for Programmers, Prentice-Hall, 1982.
M. Cowlishaw, The REXX Programming Language, Prentice-Hall, 1990.
L. H. de Figueiredo, C. de Souza, M. Gattass, L. C. Coelho, "Geração de interfaces para captura de dados sobre desenhos", Anais do SIBGRAPI V (1992) 169–175 [in Portuguese].
N. Franks, "Adding an extension language to your software", Dr. Dobb's Journal (9) (Sep 1991) 34–43.
A. Goldberg, D. Robson, Smalltalk-80: The Language and Its Implementation, Addison-Wesley, 1983.
R. Ierusalimschy, L. H. de Figueiredo, W. Celes Filho, "Reference manual of the programming language Lua", Monografias em Ciência da Computação 4/94, Departamento de Informática, PUC-Rio, 1994.
J. R. Levine, T. Mason, D. Brown, Lex & Yacc, O'Reilly & Associates, 1992.
C. Nahaboo, A Catalog of Embedded Languages, available from [email protected]
M. Richards, C. Whitby-Strevens, BCPL: The Language and Its Compiler, Cambridge University Press, 1980.
B. Ryan, "Scripts unbounded", Byte (8) (1990) 235–240.
R. Valdés, "Little languages, big questions", Dr. Dobb's Journal (9) (Sep 1991) 16–25.
Design and Implementation of Lua 1.1 (II)