How to design a language (1) -- What is a pitfall ()

Last Update:2018-12-08 Source: Internet

Author: User

This series came about because Wang Yin wrote a blog post blasting Go. Why? The reasons are explained in detail at M = 8cc4f95228f942f8886109d876d1b041. The post was then shared on Weibo, and many readers immediately showed the ugly side of human nature:
1. Go's advocates felt personally insulted because Go had been attacked. Before they even reached the link in the last section, they were already up in arms.
2. Wang Yin does not get along with people, so many concluded that nothing he writes "has any reference value".

To be honest, the post is a bit rude, which keeps less mature readers from reading on to the substance in its later part. If every article were written that way, it would actually be nice: narrow-minded people would simply stay narrow-minded. Unless they fix their own attitude, they will never pick up any useful knowledge. They will muddle through their days, using junk languages to write worthless programs for the rest of their lives.

Enough of that. Next I will give my own view on languages. Why design a new language? There are only two reasons: either the existing languages are really unbearable, or you are designing a specialized language for a particular domain. I will not discuss the latter, because such a language will never be good without specific domain knowledge (for example, SQL could never have come from someone who was bad at databases); that is basically not a language design question. So this series only targets the former case: designing a general-purpose language. General-purpose languages do have their own "domains", but there are so many of them that they fade into the background. Throughout history, whenever someone who has worked in only a few domains is asked to design a language without having been systematically trained in programming language theory, the result is garbage. Go is one example: its authors are brilliant, but their brilliance does not extend to "designing languages".

So if you still want to create a language in the 21st century, it can only be because you are dissatisfied with every general-purpose language and want to build your own. Where does the dissatisfaction show? For C#, the complaint is that it is not cool enough; for C++, that one's IQ is too low to handle it; for Haskell, that too few people know it to hire for it; for C, that it really cannot express abstraction, so anyone below Linus's level turns C code into a mess, and you cannot hire Linus. In short, there are all kinds of reasons. Setting aside the demands on the user's IQ, though, there are a few languages I genuinely like: C++, C#, Haskell, Rust, and Ruby. If I had to rank the top five languages in the world, it would be these five, although it would be hard to order them among themselves. Even so, each of them still makes me uncomfortable somewhere. I have always wanted to create a new language (for my own use (?)), and the evidence is "see my blog".

So what makes a good language? For a long time people have believed that a language is good as long as its libraries are easy to use. That gets the causality backwards: without good syntax, how can anyone write a good library? Examples are easy to find; comparing Java and C# is enough. C#'s libraries are easy to use precisely because of the language's expressive power: LINQ (to XML, to SQL, to parser, and so on), WCF (considering usability only), WPF. Can Java express these libraries? You can force them into existence, but you will find that they cannot be made pleasant to use no matter what. That is the fault of Java's poor syntax. Now look back at the five languages listed above. What they share is that their syntax is what makes their libraries so easy to use.

Of course, this does not mean everyone must learn the language well enough to write libraries. Programmers are distributed like a pyramid: it is fine for a few people to write the libraries and for most people to just use them, and the users do not need to learn that much unless they want to become library writers themselves. Recently, however, there has been a bad trend: some people find that a language does not let them [easily] become library writers, and so they start complaining that the language is bad in this way and that. I won't name names; everyone knows who I mean, hahaha.

Besides making libraries easy to write and easy to use, a good language has two other important properties: it is easy to learn and easy to analyze. Easy to learn does not mean you can pick it up without effort, but that once you have mastered the basics you can guess many features you have not seen yet. This is where syntax consistency comes in. Syntax consistency is easily overlooked, and the mistakes caused by poor consistency are obscure and hard to spot at a glance. Here are a few examples to establish the concept.

The first example is everyone's favorite: the definition of pointer variables in C:

int a, *b, **c;

I believe many people have been bitten by this, which is why many textbooks tell us that when defining a variable, the asterisk should be written next to the variable name rather than at the end of the type, to avoid misunderstanding. So many people wonder why it was designed this way; it looks like a pit dug for people to fall into. In fact, this is a good example of syntax consistency. Why it is nevertheless a pitfall is a separate problem, discussed further down.

We all know that when b is a pointer to int, *b yields an int. Declaring a variable int a; also amounts to saying "define a to be an int". Now look at the declaration above: int *b;. What does it say? It really means "define *b to be an int". This "definition and use are consistent" approach deserves respect. In C, function parameters are separated by commas both in the definition and in the call. In Pascal, parameters are separated by semicolons in the definition but by commas in the call, which is less consistent.
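
The "declaration mirrors use" reading can be checked mechanically. The following is a minimal C++ sketch, not part of the original article; the helper name sum_through_all_names and the extra variables p and q are made up for illustration. It asserts that the expression written in each declarator has type int, and it also shows the trap that appears when the asterisk is attached to the type.

```cpp
#include <cassert>
#include <type_traits>

// The declaration from the text: an int, a pointer to int, a pointer to pointer to int.
int a = 1, *b = &a, **c = &b;

// Each name has the type you get by "solving" the declarator.
static_assert(std::is_same<decltype(b), int*>::value, "b is a pointer to int");
static_assert(std::is_same<decltype(c), int**>::value, "c is a pointer to pointer to int");

// The expression that appears in the declarator is an int (an lvalue, hence remove_reference).
static_assert(std::is_same<std::remove_reference<decltype(*b)>::type, int>::value, "*b is an int");
static_assert(std::is_same<std::remove_reference<decltype(**c)>::type, int>::value, "**c is an int");

// The trap: the asterisk binds to the name, not to the type, so q is a plain int.
int* p = nullptr, q = 0;
static_assert(std::is_same<decltype(q), int>::value, "q is an int, not a pointer");

// Hypothetical helper: reads the same int through all three names.
int sum_through_all_names() {
    assert(*b == a && **c == a);
    return a + *b + **c;
}
```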

You may ask: how do you know that this is what the designers of C were thinking? Personally I think that even if they were not, they cannot have been far off, because there is another example:

int F(int a, int b);
int (*f)(int a, int b);

This is also an example of "definition and use are consistent". How should we read "int F(int a, int b);" in the first line? Just as above, it says "the result of F(a, b) is an int". As for what a and b are, it tells you that too: a is an int and b is an int. So the two readings line up. The second line likewise says "the result of (*f)(a, b) is an int". A function type can also be written without parameter names, but I encourage you to write them: Visual Studio's IntelliSense will then list the parameter names when you type "(", and with that hint you often do not need to go back to the source code.

C has one more example of "definition and use are consistent", and it is a lovely one:

int a;
typedef int a;
int (*f)(int a, int b);
typedef int (*f)(int a, int b);

typedef is a keyword that turns the declared symbol from a variable into a type. So whenever you need to give a type a name, first think about how you would declare a variable of that type, write that down, and put typedef in front. Done.
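
The two-step recipe can be sketched in C++ as follows. This is an illustration only; add, f_var, f_type, and call_through_typedef are hypothetical names, and the two declarations use different names because a variable and a type with the same name cannot coexist in one scope.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical function to point at.
int add(int a, int b) { return a + b; }

// Step 1: declare a variable of the type we want to name.
int (*f_var)(int a, int b) = &add;

// Step 2: write the same declaration with typedef in front.
// The declared name is now a type instead of a variable.
typedef int (*f_type)(int a, int b);

// The type named in step 2 is exactly the type of the variable from step 1.
static_assert(std::is_same<decltype(f_var), f_type>::value,
              "typedef names the type the variable would have had");

int call_through_typedef(int x, int y) {
    f_type g = f_var;   // use the new type name like any other type
    assert(g != nullptr);
    return g(x, y);
}
```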

But honestly, that is as far as consistency goes in C. The reason is that the seemingly beautiful "definition and use are consistent" rules above do not compose. For example, look at this line:

typedef int(__stdcall*f[10])(int(*a)(int, int));

Who can tell what this is! What's more, it cannot be read with the method above. The reason is that C's "definition and use are consistent" approach is really a way of solving equations. If int *b; is defined as "*b is an int", then what is b? After reading the declaration we still have to work it out. Human intuition prefers to be told things directly: if we know that int* is a pointer to int, then int* b; says plainly that "b is a pointer to int".

C goes against human intuition here: a good principle implemented in the wrong way, and the result is a pitfall. Everyone is used to writing "int* a;", and then C tells them that the correct form is "int *a;". As soon as you declare two or three variables in a row, the problem shows up and you fall into the trap.

Now look back at the long declaration of an array of function pointers above. Even here, C wants you to read it the same way as "int *b;": f is an array; indexing the array yields a function pointer; calling the function yields an int; and the function's parameter is int (*a)(int, int), which is itself a function pointer.
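
One way to convince yourself of this reading is to rebuild the type one layer at a time and let the compiler compare the results. The sketch below is an assumption-laden illustration: it drops the MSVC-specific __stdcall so that it compiles everywhere, and the names f_c_style, BinaryOp, Callback, f_step_by_step, add, and call_with_3_and_4 are invented for the example.

```cpp
#include <cassert>
#include <type_traits>

// The declaration from the text, without the MSVC-specific __stdcall.
typedef int (*f_c_style[10])(int (*a)(int, int));

// The same type built from the inside out.
using BinaryOp = int (*)(int, int);      // the parameter: pointer to int(int, int)
using Callback = int (*)(BinaryOp);      // pointer to a function taking BinaryOp, returning int
using f_step_by_step = Callback[10];     // array of ten such pointers

static_assert(std::is_same<f_c_style, f_step_by_step>::value,
              "both spellings name the same type");

// Hypothetical functions to store in the array.
int add(int x, int y) { return x + y; }
int call_with_3_and_4(BinaryOp op) {
    assert(op != nullptr);
    return op(3, 4);
}
```

Usage follows the declaration: with `f_c_style table = {};` and `table[0] = &call_with_3_and_4;`, the expression `table[0](&add)` is an int.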

Why is this particular point of C so hard to learn? Because its syntax is designed on two principles at once. So what does a good design look like? Let's see how some other languages do it:

C++: function<int __stdcall(function<int(int, int)>)> f[10];
C#: Func<Func<int, int, int>, int>[] f;
Haskell: f :: [(Int->Int->Int)->Int]
Pascal: var f : array[0..9] of function(a : function(x : integer; y : integer):integer):integer;

These languages do not follow the "definition and use are consistent" principle, but where they beat C is that each of them adopts only one principle, which is much better than mixing a good one with a bad one (Go does even worse than C here).

Of course, the comparison above is unfair to Haskell. Haskell has full type inference: it does not treat a type annotation as a required part of a declaration, but as a kind of "hint". So when you really need a function with such a complicated type, you do not write the type at all; you write a correct function body and let the Haskell compiler infer the type for you. Here is an example:

superApply fs x = (foldr (.) id fs) x

There is a good way to understand foldr: foldr (+) 0 [1,2,3,4] means 1 + (2 + (3 + (4 + 0))). (.) is a function that composes two functions into one: f . g = \x -> f (g x). So the code above means that if I have the following three functions:

add1 x = x + 1
mul2 x = x * 2
sqr x = x * x

When I write the following code:

superApply [sqr, mul2, add1] 1

what actually gets computed is sqr (mul2 (add1 1)) = ((1 + 1) * 2) * ((1 + 1) * 2) = 16. Of course, Haskell lets you write this more directly:

superApply [(\x->x*x), (*2), (+1)] 1

The conciseness of the Haskell code is almost disheartening, because if we write the equivalent in C++ (C cannot pass a parameter of array type that carries its length, so a C version would not really be equivalent), it looks like this:

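#include <functional>
#include <vector>
using namespace std;

// Applies the functions in fs from last to first, starting from x.
template<typename T>
T SuperApply(const vector<function<T(T)>>& fs, const T& x)
{
    T result = x;
    for(int i=fs.size()-1; i>=0; i--)
    {
        result = fs[i](result);
    }
    return result;
}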
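
To check that this function reproduces the Haskell result, here is a small self-contained sketch. It assumes a C++11 compiler: the braced list of lambdas relies on initializer lists, which were not available in older C++. The function super_apply_demo is a hypothetical name, and the loop uses reverse iterators instead of an index, which is only a stylistic variation on the code above.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Same idea as SuperApply above: apply the functions from last to first.
template<typename T>
T SuperApply(const std::vector<std::function<T(T)>>& fs, const T& x)
{
    T result = x;
    for (auto it = fs.rbegin(); it != fs.rend(); ++it)
    {
        result = (*it)(result);
    }
    return result;
}

// Hypothetical demo: the C++11 counterpart of
//   superApply [(\x->x*x), (*2), (+1)] 1
int super_apply_demo()
{
    int result = SuperApply<int>({
        [](int x) { return x * x; },
        [](int x) { return x * 2; },
        [](int x) { return x + 1; }
    }, 1);
    assert(result == 16);   // sqr (mul2 (add1 1))
    return result;
}
```

The template argument is spelled out as SuperApply<int> to keep the call unambiguous; the types still have to be written on every lambda, which is the noise the text complains about.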

C++ not only has to spell out every step but also every type, and the whole thing gets messy. Moreover, without C++11 initializer lists, C++ cannot build a vector of three functions in place, the way Haskell does, and pass it straight to SuperApply. Some may say this is only because Haskell has foldr. So let's see how it looks in C#, which also has a foldr (Reverse + Aggregate = foldr):

T SuperApply<T>(Func<T, T>[] fs, T x)
{
    return (fs
        .Reverse()
        .Aggregate((Func<T, T>)(y => y), (a, b) => y => b(a(y)))
        )(x);
}

C# comes close to describing the same process as Haskell does, and the following call can also be written; only the declaration and call syntax are a bit noisier:

SuperApply(new Func<int, int>[]{
    x=>x*x,
    x=>x*2,
    x=>x+1
    }, 1);

Why bring all this up in a discussion of syntax consistency? Because I want to show Haskell's own take on "definition and use are consistent". All of Haskell is to be understood through pattern matching, so the code above,

superApply fs x = (foldr (.) id fs) x

says that wherever you see something of the form superApply a b, you can read it as (foldr (.) id a) b. For example,

superApply [(\x->x*x), (*2), (+1)] 1

is actually

(foldr (.) id [(\x->x*x), (*2), (+1)]) 1

As long as superApply refers to this function, you can make this substitution in any context and rest assured that the meaning of the program never changes. This is Haskell's principle of consistency. Now let's see how Haskell carries it through. If we have an operator + and want to treat it as a function, we write (+). If we have a function f and want to treat it as an operator, we write `f` (the backtick, the symbol on the key to the left of the ! key). Haskell therefore lets us write a definition in any of the following forms:

(Point x y) + (Point z w) = Point (x+z) (y+w)
(+) (Point x y) (Point z w) = Point (x+z) (y+w)
(Point x y) `add` (Point z w) = Point (x+z) (y+w)
add (Point x y) (Point z w) = Point (x+z) (y+w)
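
For comparison, C++ offers a weaker form of the same correspondence: an overloaded operator is an ordinary function named operator+, and it can be called with either syntax. The sketch below is only an illustration; Point, same_point, and sums_agree are hypothetical names. Unlike Haskell, C++ has no way to use an arbitrary named function in infix position.

```cpp
#include <cassert>

// Hypothetical Point type for the illustration.
struct Point { int x; int y; };

// The operator is defined as a plain function whose name is "operator+".
Point operator+(const Point& p, const Point& q)
{
    return Point{p.x + q.x, p.y + q.y};
}

bool same_point(const Point& p, const Point& q)
{
    return p.x == q.x && p.y == q.y;
}

// Infix use and function-call use go through the same definition.
bool sums_agree(const Point& p, const Point& q)
{
    Point infix = p + q;              // operator syntax
    Point prefix = operator+(p, q);   // function syntax
    assert(same_point(infix, prefix));
    return same_point(infix, prefix);
}
```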

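Even a simple form of the Fibonacci sequence can be written like this (the n+2 on the left is an n+k pattern, which current GHC accepts only with the NPlusKPatterns extension):

f 1 = 1
f 2 = 1
f (n+2) = f(n+1) + f(n)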

Recursion over a list can be written just as directly:

getListLength [] = 0
getListLength (x:xs) = 1 + getListLength xs

By applying the "functions and operators are interchangeable" and "pattern matching" principles everywhere, Haskell makes "definition and use are consistent" its foundation, and ends up with a principle far better than C's muddled one.

Some may ask: since Haskell makes recursion so easy to write, won't it encourage people to fill their programs with recursion, and so cause stack overflows or poor performance? Scroll back up: earlier in this article I said that "besides making libraries easy to write and easy to use, a good language has two other important properties: it is easy to learn and easy to analyze". Haskell embodies this fully.

We know that a loop is just tail recursion, so if we write our code in tail-recursive form, the Haskell compiler will recognize it and turn it into a loop when generating x86 code. Every exit of a tail-recursive function is either an expression that does not call the function itself, or a call to the function itself with different arguments. That sounds like a mouthful, but put plainly it is just this:

getListLength_ [] c = c
getListLength_ (x:xs) c = getListLength_ xs (c+1)
getListLength xs = getListLength_ xs 0

When you write code like this, the Haskell compiler actually emits a loop, and the concerns above vanish.
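
The correspondence between the accumulator-style recursion and a loop can be sketched in C++ as well. This is an illustration under stated assumptions: length_acc and length_loop are hypothetical names, the list is modeled as a pointer range, and C++ compilers are not required to eliminate tail calls, so the loop version shows what an optimizer may produce rather than what it is guaranteed to produce.

```cpp
#include <cassert>
#include <cstddef>

// Accumulator style, mirroring getListLength_: every exit is either a
// plain value (c) or a call to itself with different arguments.
std::size_t length_acc(const int* first, const int* last, std::size_t c)
{
    if (first == last) return c;                 // getListLength_ [] c = c
    return length_acc(first + 1, last, c + 1);   // getListLength_ (x:xs) c = getListLength_ xs (c+1)
}

// The same computation as a loop: the parameters of the tail call
// become variables that are updated on each iteration.
std::size_t length_loop(const int* first, const int* last)
{
    std::size_t c = 0;
    while (first != last)
    {
        ++first;
        ++c;
    }
    assert(c == length_acc(first - c, last, 0));  // both forms agree
    return c;
}
```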

In fact, many benchmarks show that on most platforms Haskell is no more than about twice as slow as C/C++, and much faster than Go. On Windows the fastest functional language is F#, and on Linux it is Scala. Haskell has always come second, but only slightly behind the first.

To keep the article from getting too long, I am splitting it into several installments published a short time apart. So today I will discuss just one more pitfall: C++ pointers. The rest are left for the next installment. If nobody had asked me in my fan group, I would never have known that people write this:

class Base
{
  ...
};

class Derived : public Base
{
  ...
};

Base* bs = new Derived[10];
delete[] bs;

I would say this pitfall exists entirely because C++ is compatible with C, so C dragged it in. In C itself the problem never arises, because C really has only one kind of pointer: char*. Many C functions take char*; void* only came later. malloc and free treat every pointer as if it were char*. So when you malloc something, cast it to the type you need, and finally free it, whether or not the cast happened makes no difference to whether free works correctly.

In C++ things are different. C++ has inheritance, and with inheritance comes implicit conversion between pointer types. Look at the code above: new[] gives us a pointer of type Derived*, which is implicitly converted to Base*, and finally we delete[] it. delete[] has to call destructors, but a Base* cannot be used to compute the correct this pointer for each of the ten destructors in the Derived array. So at this point the code is broken (if it happens to work, that is pure coincidence).

For the sake of C compatibility, the rule "a pointer from new[] must be released with delete[]" and the rule "a pointer to a subclass can be converted to a pointer to the parent class" end up in conflict. How should the type system change to fix this? We could introduce a Derived[] pointer type, as C# does: that is what new[] would return, and C++ could still require delete[] for it, but the difference is that it could no longer be converted to a pointer to Base. Unfortunately, T[] is already taken by C, where it means T* in a function parameter type. C really does waste syntax......
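
The standard library's std::unique_ptr<T[]> behaves roughly like the array pointer type wished for here, which makes the idea easy to sketch. The example below is an illustration, not the author's proposal: Base, Derived, the destroyed counter, and destroy_ten_derived are hypothetical. It checks that a raw Derived* converts to Base*, that the array-owning pointer does not, and that deleting through the correct type runs all ten destructors.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

struct Base { int b = 0; };

struct Derived : Base
{
    int d = 0;
    static int destroyed;        // counts destructor calls for the demo
    ~Derived() { ++destroyed; }
};
int Derived::destroyed = 0;

// The conversion that makes "Base* bs = new Derived[10];" compile.
static_assert(std::is_convertible<Derived*, Base*>::value,
              "raw pointers convert from Derived to Base");

// An array-owning pointer refuses the same conversion.
static_assert(!std::is_convertible<std::unique_ptr<Derived[]>,
                                   std::unique_ptr<Base[]>>::value,
              "unique_ptr<Derived[]> does not convert to unique_ptr<Base[]>");

// The element sizes differ, so a Base* cannot step through Derived elements.
static_assert(sizeof(Derived) > sizeof(Base), "Derived elements are larger");

// delete[] through the correct type destroys every element.
int destroy_ten_derived()
{
    Derived::destroyed = 0;
    {
        std::unique_ptr<Derived[]> ds(new Derived[10]);
        assert(Derived::destroyed == 0);
    }   // ds goes out of scope here and calls delete[] on a Derived*
    return Derived::destroyed;
}
```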

To be continued

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
