How to Create a Programming Language
Programming languages are the bridge between people and computers, and their significance is far-reaching. Anyone with programming experience has learned or mastered at least one of them. Hundreds of programming languages exist, and dozens of them are in mainstream use. Each language targets different fields and has different features, but all of them ultimately aim to make communication between people and computers more efficient and to improve productivity. I think many people admire the creators of the mainstream programming languages, and I believe many are curious about how a programming language comes into being. So how can we create a programming language?
In general, creating a programming language involves the following steps:
(1) Design the features of the language.
(2) Define the words (lexical elements), syntax, and semantics of the language.
(3) Implement a compiler or interpreter that translates programs into a low-level representation the computer can execute.
(4) Generate the binary storage format of the translated program.
(5) Build out the runtime environment and standard library of the language.
I. Language Feature Design
Language features are the atomic capabilities that a programming language offers to developers. For example: does the language support arithmetic expressions, string processing, variables, functions, and recursion? Does it support compound statements such as branches and loops? Are its variables strongly, weakly, or dynamically typed? Is it procedural, functional, or object-oriented? Does it support templates, generics, and reflection? Does it support multithreading and concurrency? Does it provide error and exception handling?
Feature design is the most critical part of creating a programming language, because it directly determines the basic character and shape of the language. It is also the most difficult part: a language is designed for a specific problem domain, and its design is the distillation of what its designers have learned from a great deal of programming practice. For example, the designers of C wanted a language that targets the low level of the computer and can directly manipulate the operating system and hardware. The designers of Python wanted to minimize the tedium of managing computer resources in exchange for simplicity, flexibility, and extensibility. The designers of SQL targeted data query and analysis and wanted to help developers retrieve and manipulate data quickly. The designers of Go wanted to build on the strengths of C while adding support for highly concurrent environments, along with garbage collection and fast compilation.
In all these cases, the features of a programming language are designed for a specific problem domain. They form an intermediate layer that the language designers build between developers and computers, an atomic "encapsulation" of the functional logic that would otherwise be repeated during development. The ultimate goal is to make software development in that domain more efficient.
II. Words, Syntax, and Semantics
Like the natural languages used by humans, programming languages have their own words, syntax, and semantics. In compiler terminology these are the lexical tokens, the grammar, and the semantic rules of the language.
Common lexical tokens can be divided into numbers, characters, strings, identifiers, keywords, operators that connect expressions, delimiters that separate statements or blocks, and other symbols. These are the basic units of a program. Every piece of code in a programming language is built from an ordered combination of them.
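To make the token categories above concrete, here is a minimal sketch of a tokenizer for a toy language. The category names, the keyword set, and the tokenize function are all hypothetical and chosen only for illustration; a real language would define its own.

```python
import re

# Hypothetical token categories for a toy language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER",    r"\d+(?:\.\d+)?"),   # integer or decimal literal
    ("STRING",    r'"[^"\n]*"'),       # double-quoted string without escapes
    ("IDENT",     r"[A-Za-z_]\w*"),    # identifier (keywords are split out below)
    ("OPERATOR",  r"[+\-*/=<>]"),      # operators that connect expressions
    ("DELIMITER", r"[;{}(),]"),        # symbols that separate statements or blocks
    ("SKIP",      r"\s+"),             # whitespace carries no meaning here
    ("MISMATCH",  r"."),               # anything else is a lexical error
]
KEYWORDS = {"if", "else", "while", "return"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Turn source text into a list of (category, text) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens

print(tokenize("while x < 10 { x = x + 1; }"))
```

The order of TOKEN_SPEC matters in this sketch: earlier patterns win, so the catch-all MISMATCH entry has to come last.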
The syntax of a programming language is its set of grammar rules. Specifically, it specifies the order in which lexical tokens may be arranged and combined. It describes the basic shape of programs in the language, and any arrangement of tokens that does not fit this shape is not a legal program. Syntax is also the feature that most visibly distinguishes one programming language from another: an experienced developer can usually tell which language a piece of code is written in at a glance.
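As a sketch of how grammar rules separate legal programs from illegal ones, the following recursive-descent parser accepts a small, assumed arithmetic grammar (expr := term (("+" | "-") term)*, term := factor (("*" | "/") factor)*, factor := NUMBER | "(" expr ")"). The grammar and the names Parser and parse are illustrative assumptions, not part of any particular language.

```python
import re

def tokenize(source):
    # Integer literals and single-character symbols only.
    # Note: other characters are silently skipped in this sketch.
    return re.findall(r"\d+|[-+*/()]", source)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (("+" | "-") term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        # term := factor (("*" | "/") factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor := NUMBER | "(" expr ")"
        token = self.advance()
        if token is not None and token.isdigit():
            return int(token)
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

def parse(source):
    """Return a nested-tuple syntax tree, or raise SyntaxError."""
    parser = Parser(tokenize(source))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("1 + 2 * 3"))   # ('+', 1, ('*', 2, 3))
```

An input such as "1 + * 2" uses only valid tokens, yet parse rejects it with a SyntaxError because the token order does not match any rule of the grammar.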
The semantics of a programming language describe the actual meaning of a program that conforms to the language's syntax, that is, the intent and instructions that the developer ultimately wants to convey to the computer. Language semantics must be precise and unambiguous. Guided by the semantics, the compiler translates programs into a form that the computer can recognize.
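A small sketch can show why semantics have to be pinned down separately from syntax: the same syntax tree yields different results under two different sets of rules. The nested-tuple tree format and the two rule tables below are assumptions made for this illustration.

```python
import operator

# Two hypothetical semantic definitions of the same four operators.
# They differ only in what "/" means.
INTEGER_SEMANTICS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,   # "/" discards the remainder
}
REAL_SEMANTICS = {**INTEGER_SEMANTICS, "/": operator.truediv}  # "/" keeps the fraction

def evaluate(node, rules):
    """Give meaning to a tree of (operator, left, right) tuples and numeric leaves."""
    if isinstance(node, tuple):
        op, left, right = node
        return rules[op](evaluate(left, rules), evaluate(right, rules))
    return node

tree = ("+", 1, ("/", 7, 2))              # the syntax tree of 1 + 7 / 2
print(evaluate(tree, INTEGER_SEMANTICS))  # 4
print(evaluate(tree, REAL_SEMANTICS))     # 4.5
```

A language specification has to choose exactly one such meaning for each construct; otherwise two compilers could translate the same legal program differently.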
III. Program Translation
Source programs are written for people to read and modify. Computer hardware cannot understand the ideas and meaning expressed in them. A translation step is therefore required, one that accurately conveys the programmer's intent to the computer so that it knows exactly which instructions to execute. The tool that performs this translation is a compiler or an interpreter.
The input of a compiler is a program written by a person, and its output is a low-level representation that the computer can recognize. First, the compiler identifies the words in the program; this is lexical analysis. Next, it recognizes the program's syntactic structure from the patterns in which the words are combined; this is syntax analysis. Finally, following the semantics attached to each syntactic structure, it converts the program, one syntactic unit at a time, into a sequence of instructions that the computer can recognize; this is semantic analysis and target code generation.
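The last stage described above can be sketched as follows, assuming lexical and syntax analysis have already produced a tree of nested (operator, left, right) tuples. The instruction names PUSH, ADD, SUB, MUL, and DIV belong to a hypothetical stack machine invented for this example, not to any real CPU.

```python
import operator

# Hypothetical stack-machine instruction set (illustrative only).
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
BINARY_OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.floordiv,   # this toy machine only does integer division
}

def generate(node, code=None):
    """Code generation: a post-order walk that emits operands before operators."""
    if code is None:
        code = []
    if isinstance(node, tuple):
        op, left, right = node
        generate(left, code)
        generate(right, code)
        code.append((OPCODES[op],))
    else:
        code.append(("PUSH", node))
    return code

def run(code):
    """A tiny interpreter standing in for the hardware that executes the output."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[instr[0]](left, right))
    return stack.pop()

tree = ("+", 1, ("*", 2, 3))      # the syntax tree of 1 + 2 * 3
code = generate(tree)
print(code)       # [('PUSH', 1), ('PUSH', 2), ('PUSH', 3), ('MUL',), ('ADD',)]
print(run(code))  # 7
```

A production compiler would emit instructions for a real target and usually performs type checking and optimization between the tree and the final code; this sketch skips both.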
As we all know, implementing a compiler is fairly complex. The root causes are the structural flexibility of language syntax and the diversity of low-level machine representations. This is also the core step in creating a programming language.
IV. Binary Storage
After translating a program, the compiler needs to store the result so that the computer can load and execute it when needed. Two questions inevitably arise:
(1) What is the form of the converted result?
(2) Where are the converted results saved?
The first question is what form a program must take for the computer to recognize it. The hardware that actually runs a program is the CPU, so a program can be correctly recognized and executed only after it has been converted into the CPU's binary instruction format. Examples include the CISC instruction format of the common Intel architecture and the RISC instruction format of the ARM architecture.
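The idea of a binary instruction format can be illustrated with a made-up encoding. The sketch below assumes a hypothetical stack machine in which every instruction starts with a one-byte opcode and PUSH is followed by a 4-byte little-endian signed integer; the opcode values are arbitrary and have nothing to do with real x86 or ARM encodings.

```python
import struct

# Hypothetical opcode numbering (illustrative only).
OPCODE_BYTES = {"PUSH": 0x01, "ADD": 0x02, "SUB": 0x03, "MUL": 0x04, "DIV": 0x05}
OPCODE_NAMES = {value: name for name, value in OPCODE_BYTES.items()}

def encode(code):
    """Serialize a list of instruction tuples into raw bytes."""
    out = bytearray()
    for instr in code:
        out.append(OPCODE_BYTES[instr[0]])
        if instr[0] == "PUSH":
            out += struct.pack("<i", instr[1])   # 32-bit little-endian operand
    return bytes(out)

def decode(blob):
    """Recover the instruction tuples from the byte sequence."""
    code, pos = [], 0
    while pos < len(blob):
        name = OPCODE_NAMES[blob[pos]]
        pos += 1
        if name == "PUSH":
            (value,) = struct.unpack_from("<i", blob, pos)
            pos += 4
            code.append((name, value))
        else:
            code.append((name,))
    return code

program = [("PUSH", 1), ("PUSH", 2), ("ADD",)]
blob = encode(program)
print(blob.hex())              # 0101000000010200000002
print(decode(blob) == program) # True
```

Real instruction sets follow the same principle, with each CPU family defining its own opcode values, operand layouts, and instruction lengths.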
The second question is how a program is stored on disk after it has been converted into binary instructions. Most programs are loaded and run by an operating system rather than directly by the hardware, so the binary form of a program must be stored in a file format that the operating system recognizes. Common examples are the PE file format on Windows and the ELF file format on Linux.
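Both formats mentioned above can be recognized from their first bytes: an ELF file starts with the bytes 0x7F 'E' 'L' 'F', and a PE file starts with an "MZ" DOS header whose 4-byte field at offset 0x3C points to the signature "PE\0\0". The following sketch checks only these signatures; the function name detect_format is an illustrative assumption, and real loaders validate much more of the header.

```python
import struct

def detect_format(data):
    """Guess the executable format from the leading bytes of a file."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # The DOS header stores the offset of the PE signature at 0x3C.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    return "unknown"

# Hand-built byte strings stand in for real files in this sketch.
fake_elf = b"\x7fELF" + bytes(12)
fake_pe = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x40) + b"PE\x00\x00"
print(detect_format(fake_elf))   # ELF
print(detect_format(fake_pe))    # PE
print(detect_format(b"hello"))   # unknown
```

To try it on a real file, read the file's first bytes in binary mode and pass them to detect_format.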
V. Runtime Environment and Standard Library
In theory, a programming language can be considered complete once it provides a full set of atomic features for working with the operating system and hardware. Without strong runtime support and a standard library, however, it is hard for a language to become truly useful and popular. Nobody wants to call the operating system's output interface by hand, using only the basic features of the language, just to print a line of text. Java has flourished in part because it offers not only a solid runtime environment and class libraries but also powerful development frameworks and tooling.
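The printing example can be sketched in a few lines. Here the hypothetical helper print_line plays the role of a standard library routine, and os.write stands in for the raw output interface of the operating system; the helper's name and signature are assumptions made for this illustration.

```python
import os

def print_line(text, fd=1):
    """What a standard library does so that callers do not have to:
    encode the text, add the line ending, and retry on partial writes."""
    data = (text + "\n").encode("utf-8")
    written = 0
    while written < len(data):
        # os.write may accept fewer bytes than requested, so loop until done.
        written += os.write(fd, data[written:])
    return written

print_line("hello, world")   # writes the line to standard output (descriptor 1)
```

Without such a helper, every program would have to repeat the encoding, the line ending, and the write loop each time it prints something.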
Therefore, beyond a complete set of language features, giving developers convenient, easy-to-use libraries and frameworks that remove complex and repetitive logic from software construction is an important part of developing a good programming language.
VI. Try It Yourself!
The book "Build Your Own Compilation System: Compilation, Assembly, and Linking" describes in detail how a programming language is built from scratch: from feature design to lexical, syntax, and semantic analysis, and from the design and implementation of the compiler and assembler to linking object files into an executable. It also carefully analyzes compiler implementation, binary instructions, executable file formats, and the concepts behind a language runtime and standard library. I believe that reading this book will be a rewarding learning experience!