A powerful language

Source: Internet
Author: User

I increasingly feel the beauty of language, and its power.

The "language" here is a broad concept: it can be a natural language such as Chinese or English, a programming language such as C, C#, Python, or Lisp, or a domain-specific language (DSL) of your own definition. More broadly, it can even be music or DNA sequences.

A language is a set of strings: sequential chains built from different symbols. And the simpler the model, the more powerful it can be.

Language represents knowledge: E = mc^2.

Language represents beauty: "If only life were as at first meeting, why should the autumn wind grieve the painted fan?"

Language represents power: "You and your names will perish together, yet the rivers will flow on for ten thousand ages."

Language represents wisdom: (define (fib n) (cond ((= n 0) 0) ((= n 1) 1) (else (+ (fib (- n 1)) (fib (- n 2))))))

Language represents pattern: AA BB CC, AAA BBB CCC, ...


Language is a work of art. As a programmer, you should treat your code like a work of art: craft its structure, polish its performance, make it as beautiful as a poem; it is a symbol of wisdom and glory. Unfortunately, most people write code that is not even correct, let alone beautiful!

Language classification

I feel the beauty of language because its expressive power is endless. Drawing on a large amount of reading and study, language can be divided into the following levels, each of which is contained in the next.

LV1: Table

The most basic language is, obviously, the table, which also covers dictionaries, key-value pairs, lists, and so on. It has structure and nothing more. Ordinary conditional statements and decisions can be represented by a lookup table.

Basic operations: sequencing, branching, looping

Tools: list, dict

Processing modes: looping, indexed access

Do not underestimate the power of the table. A typical table merely carries data, but if it can also carry actions and commands, it becomes very powerful; think of Lisp and S-expressions.
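To make this concrete, here is a minimal sketch of a table that carries actions rather than just data: a toy dispatch table mapping command names to functions, loosely in the spirit of Lisp's code-as-data. The command names and functions are invented for illustration.

```python
# A table whose values are actions: looking up a key yields a function.

def add(a, b):
    return a + b

def mul(a, b):
    return a * b

# The table itself: command name -> action.
commands = {"add": add, "mul": mul}

def run(name, *args):
    # Indexed access into the table, then execution of what it holds.
    return commands[name](*args)

print(run("add", 2, 3))  # 5
print(run("mul", 4, 5))  # 20
```

The data structure is still just a dict; what changes is that its values are callable, so the table now encodes behavior.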

LV2: Regular language

Anyone who has used regular expressions knows how powerful they are: a small set of basic operations defines the closed family of regular languages. However, a regular language cannot count, and it cannot express nested structures.

Basic operations: the basic operations of tables, plus concatenation and set operations (union, difference)

Tools: state machines (NFA, DFA)

Processing modes: matching, pattern analysis
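A small sketch of both sides of this claim, using Python's `re` module: matching flat patterns works well, but because a regular language cannot count, a fixed regular expression only handles a bounded nesting depth. The patterns here are illustrative.

```python
import re

# Flat patterns are where regular expressions shine.
pattern = re.compile(r"\d+-\d+")
print(bool(pattern.fullmatch("123-456")))  # True

# A pattern hand-built for parentheses nested at most 2 deep.
# No single regular expression handles arbitrary depth.
shallow = re.compile(r"\((?:[^()]|\([^()]*\))*\)")
print(bool(shallow.fullmatch("(a(b)c)")))    # True: depth 2
print(bool(shallow.fullmatch("(a(b(c))d)")))  # False: depth 3 exceeds the pattern
```

Each extra level of nesting would require rewriting the pattern, which is exactly the limitation that motivates the next level.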

LV3: Context-free language

Because a state machine cannot handle nested structures, the context-free language is introduced. It adds a stack to hold information. In practice, though, the stack is usually implicit in the call stack of a recursive-descent parser rather than implemented explicitly.

Basic operations: the basic operations of regular languages, plus nesting and recursion

Tools: recursive-descent parsing

Processing modes: generating a syntax tree, evaluation
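Here is a minimal recursive-descent evaluator for a toy grammar with +, *, and parentheses. The nesting that defeated the regular expressions above lives in the Python call stack, which plays the role of the explicit stack. The grammar is an assumption chosen for illustration.

```python
import re

def tokenize(s):
    # Split "2*(3+4)" into ['2', '*', '(', '3', '+', '4', ')'].
    return re.findall(r"\d+|[+*()]", s)

def parse(tokens):
    def expr(i):          # expr := term ('+' term)*
        val, i = term(i)
        while i < len(tokens) and tokens[i] == "+":
            rhs, i = term(i + 1)
            val += rhs
        return val, i

    def term(i):          # term := atom ('*' atom)*
        val, i = atom(i)
        while i < len(tokens) and tokens[i] == "*":
            rhs, i = atom(i + 1)
            val *= rhs
        return val, i

    def atom(i):          # atom := number | '(' expr ')'
        if tokens[i] == "(":
            val, i = expr(i + 1)
            return val, i + 1   # skip the closing ')'
        return int(tokens[i]), i + 1

    return expr(0)[0]

print(parse(tokenize("2*(3+4)")))  # 14
```

Each grammar rule becomes one function, and a nested parenthesis simply becomes a recursive call: the "stack" is never written down anywhere.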

LV4: Context-sensitive language

In a context-free language, individual statements are independent, with no environment or context, so it is difficult to save and pass state. Hence the context-sensitive language.

Basic operations: the basic operations of context-free languages, plus reference and assignment

Tools: the tools of context-free languages, plus a symbol table

Processing modes: implementing general-purpose languages; Turing completeness
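A sketch of the smallest possible symbol table: statements are no longer independent, because an assignment stores a value in a shared context and a later reference looks it up. The statement encoding here is invented for illustration.

```python
# The symbol table: the shared context that connects statements.
symbols = {}

def execute(stmt):
    # statements: ("set", name, value) stores, ("get", name) looks up
    op = stmt[0]
    if op == "set":
        symbols[stmt[1]] = stmt[2]
    elif op == "get":
        return symbols[stmt[1]]

execute(("set", "x", 10))
execute(("set", "y", 32))
print(execute(("get", "x")) + execute(("get", "y")))  # 42
```

The second and third statements only make sense because the first two ran before them; that dependence on prior state is exactly what the symbol table adds.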

LV5: Natural language

The languages above express strict logic and conditions; they state intentions with a definite semantic model, without ambiguity or dynamism. Such languages are computable, and explicit semantic models exist to represent them.

But once fuzziness is added, a language becomes far more dynamic and capable. Natural language is the typical case: it has no explicit semantic model, and no existing technology can truly, thoroughly parse it.

It is worth noting that we have mentioned branching, looping, sequencing, concatenation, nesting, recursion, and stored symbols, but not yet feedback. Feedback is not a static language concept; it is self-adjustment at runtime.

LV6: Consciousness and experience

A great deal of experience and knowledge is hard to express in words. Even if you can recite a driving manual, without actual practice you still cannot drive. Other people's experience, even when it can barely be put into words, is always compromised, no matter how clearly it is stated. One might say that even when the language is the same, the most powerful compiler and interpreter, the one in the brain, differs from person to person, yielding completely different understandings. Here, feedback plays a crucial role.

I regard consciousness and experience as the "natural language" of the runtime.

Natural language

Natural language uses characters to build morphemes, expresses fuzzy semantics with phrases and sentences, and replaces fixed symbol tables with fuzzy context. Fuzziness creates beauty; it can describe stories, novels, poems, and all good things.

The grammar of natural language

Because of ambiguity, grammar becomes secondary in natural language. Grammatical sentences are good, but ungrammatical sentences can still work well. Grammar and vocabulary evolve continuously as a language develops: every day there are new expressions, and change is the greatest constant. An interesting question: since grammar is itself a language, can we discover the laws of a natural language's development in the development of its grammar?

As an aside, I think grammar should long ago have been demoted to a secondary role in English teaching; immersing oneself in reading and writing is the right way to learn a language.

Basic operations

Because of the fuzziness of natural language, probability theory has become a powerful tool for analyzing it. Even so, what we can do is very limited. By combining probability theory with certain rules, we can achieve word segmentation, named entity recognition, part-of-speech tagging, syntax-tree generation, sentiment analysis, keyword extraction, automatic summarization... These techniques can only be called "natural language processing"; they cannot be called "natural language parsing".
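As one concrete example of "probability plus rules", here is a toy word segmenter that picks the split of a string maximizing unigram log-probability via dynamic programming. The vocabulary and probabilities are invented for illustration; a real system would estimate them from a corpus.

```python
import math

# Invented unigram probabilities for a tiny vocabulary.
probs = {"自然": 0.3, "语言": 0.3, "自然语言": 0.35, "处理": 0.05}

def segment(text):
    # best[i] = (log-probability, segmentation) for the prefix text[:i]
    best = [(0.0, [])] + [(-math.inf, None)] * len(text)
    for i in range(1, len(text) + 1):
        for j in range(i):
            word = text[j:i]
            if word in probs and best[j][1] is not None:
                score = best[j][0] + math.log(probs[word])
                if score > best[i][0]:
                    best[i] = (score, best[j][1] + [word])
    return best[len(text)][1]

print(segment("自然语言处理"))  # ['自然语言', '处理']
```

With these numbers, one longer word beats two shorter ones because log(0.35) > log(0.3) + log(0.3); the rule is encoded nowhere, it falls out of the probabilities.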

If the corpus is comprehensive enough to cover the whole context of a topic, I believe pure probability theory will win in the end. The problem is that acquiring and analyzing corpora is complex and expensive; it is not something done by scraping a batch of news data from the Internet. At this stage, I believe in a treatment that combines empirical rules with probability theory.

Computability

Can natural language be computed? Some examples:

"If it rains today, take the bus to work." This is a typical computable sentence.

"Buy three jin of apples; if there is watermelon, buy three jin of watermelon." This has two readings, and without context it is hard to determine the real intent.
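The contrast between these two sentences can be made concrete in code. The bus sentence compiles to a single branch, while the watermelon sentence compiles to two different programs, one per reading; both encodings below are my own illustrative assumptions.

```python
def commute(raining):
    # "If it rains today, take the bus to work": one unambiguous branch.
    # (The non-rainy alternative "walk" is an assumed default.)
    return "bus" if raining else "walk"

def shopping_reading_a(has_watermelon):
    # Reading A: always buy apples; add watermelon when available.
    basket = {"apples_jin": 3}
    if has_watermelon:
        basket["watermelon_jin"] = 3
    return basket

def shopping_reading_b(has_watermelon):
    # Reading B: watermelon replaces the apples when available.
    if has_watermelon:
        return {"watermelon_jin": 3}
    return {"apples_jin": 3}

print(commute(True))                 # bus
print(shopping_reading_a(True))      # both items
print(shopping_reading_b(True))      # watermelon only
```

A compiler must pick exactly one of the two functions; a human shopper resolves the ambiguity from context without noticing it.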

And what about the most common cases, poems and stories: can they be computed? What would the result of such a computation even be?

The "computable" here may differ from computability in the standard sense. But since people can process language, natural language is computable. The core of processing natural language is not deduction but association. The brain can connect a multitude of concepts; through association, combined with deduction and learning, it derives new concepts, methods, and experience, and finally puts them into action.

What can we do?

Now that natural language is so magical and complex that current technology cannot build a brain-like association machine, how do we make language go further? Programming languages have, over their development, produced very mature compilation algorithms and tools. So a simple idea: can these powerful techniques be applied to natural language?

Although language is changeable, it is regular, and its morphemes are stable. Take the concept of "1": the number of ways to express it is bound to be limited. A complex combination becomes stable when split into sub-units. So we can always use rules to construct expressions for basic units, such as numbers and times, and on top of that form a rule tree by describing how the units combine.
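A sketch of this idea for the "number" unit: a deliberately tiny closed vocabulary of basic units, plus one composition rule. The word lists and the hyphen convention are assumptions for illustration, not a complete grammar.

```python
# Basic units: each table is a small, enumerable vocabulary.
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}

def parse_number(phrase):
    # Rule 1: a bare unit or tens word.
    if phrase in UNITS:
        return UNITS[phrase]
    if phrase in TENS:
        return TENS[phrase]
    # Rule 2 (composition): tens-unit compound, e.g. "twenty-seven".
    head, _, tail = phrase.partition("-")
    if head in TENS and tail in UNITS:
        return TENS[head] + UNITS[tail]
    raise ValueError(f"no rule matches {phrase!r}")

print(parse_number("twenty-seven"))  # 27
print(parse_number("fifty"))         # 50
```

Larger units (hundreds, dates, durations) would be further rules built on top of these, which is the rule tree described above.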

Some processing is better suited to probabilistic analysis, such as word segmentation, part-of-speech tagging, and sentiment analysis.

My plan is to use a set of DSLs, with rules and probabilities, to circumvent, to some extent, the diversity and ambiguity of natural language, converting text into computable, unambiguous statements. I call this process "text normalization". It serves as the front end of a natural-language compiler, eventually feeding a programming-language compiler in the ordinary sense.

Everything is a pattern: sub-patterns combine into parent patterns, patterns are matched, and new patterns are modified and assembled.

Work in this area has made good progress. It is now possible to compute things like:

  27 + 15*15
  14.5 squared plus 83 divided by 3.5

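A minimal sketch of such a normalizer, assuming a fixed phrase table (the real system's DSL rules would be far richer): English math phrases are rewritten into an ordinary arithmetic expression and then evaluated.

```python
import re

# Assumed phrase table: each English phrase maps to an operator.
REWRITES = [("plus", "+"), ("minus", "-"), ("times", "*"), ("divided by", "/")]

def normalize(text):
    expr = text
    for phrase, op in REWRITES:
        expr = expr.replace(phrase, op)
    # Postfix "squared": rewrite "x squared" into "(x**2)".
    expr = re.sub(r"(\d+(?:\.\d+)?)\s*squared", r"(\1**2)", expr)
    return expr

def calculate(text):
    # eval on the normalized string; fine for a toy, never for untrusted input.
    return eval(normalize(text))

print(calculate("27 plus 15 times 15"))                    # 252
print(calculate("14.5 squared plus 83 divided by 3.5"))
```

Normalization and evaluation stay separate on purpose: the output of `normalize` is exactly the "computable, unambiguous statement" that an ordinary compiler front end can take over.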
Limitations of language

Having said so much about the power of language, language also has innate limitations. A level of knowledge higher than language is consciousness and experience. Language itself is like code: if it is never compiled, it is just a pile of useless strings.

For some problems, language may not be the best medium for human understanding; sometimes a picture is worth a thousand words, and multimedia expression better inspires association. The value of practice may matter far more than language itself, though it is unfair to compare a runtime concept with a compile-time one.

A year and a half into this work, when asked what I was doing, I used to have a hard time answering. In fact, my first year was mainly spent on the text-processing front end of speech synthesis (TTS), and for the past half year I have been interested in compiler principles, grammar inference, and pattern classification. The problem slowly became clear: what I am studying is "language".

Obviously this article contains many mistakes; I have not even finished reading a compilers textbook, so criticism is welcome. Later, I will summarize a series of knowledge and ideas about DSLs.
