Missy late night

Source: Internet
Author: User

Human knowledge has never been as developed as it is today. Recently I read some posts about econometrics and found that people in that circle use a computer language that people in ours may find extremely unfamiliar: R. The predecessor of R is S, a functional scripting language designed specifically for statistics. As I understand it, the team that invented S also won the ACM Software System Award, reportedly the only statistical software to have done so.

After reading this pile of articles, I cannot help feeling that the biggest gap in science today is not one of depth within a single field. I believe the depth problem is being solved well enough. For example, in the early 1990s most software developers in China were probably still using C as their development language, even though C++ had already been around for more than a decade. Even at the beginning of the 21st century, some schools were still teaching C, or even an antique like Pascal. (Note the words "still teaching": these older languages are simple and easy to get started with, but the teaching never moved beyond them.) Of course, many people in China can now keep up with the times. For example, many were already learning and researching .NET 4.0 while it was still in CTP. So depth is really not the problem. After all, it is not hard to keep going deeper within one field.

The real gap lies between different fields, for example when economics needs statistical methods or, conversely, when statistical methods are applied to economics. Specifically, when a pure economist needs a statistical method to solve a problem, that method may belong to a new and relatively unfamiliar field for him, and mastering it takes a lot of time and effort. Worse, it is hard for him to find an expert statistician interested in economics who can teach him the latest and most advanced methods; he is lucky to find a novice who understands the most basic economic terms. So at that stage the only option is to start with the simplest problems and integrate the two fields gradually. Cross-disciplinary development of this kind is possible only when there are people who understand both disciplines. Of course, today's econometrics is no longer in the state I describe, but I am sure that when the discipline first appeared it looked much like this.

I say all this because R is called an advanced language, which in my rough view it is not (feel free to call this a fallacy). Some people claim that the language is "object-oriented", but I do not think it is object-oriented in the modern sense. The language does have a concept of "object", and within statistical circles that may well count as an advanced idea. The key point is that, in the limited material I have seen, few R users seem to understand or even care what kind of language R is. Is it statically or dynamically typed? Strongly or weakly typed? Compiled or interpreted? And so on. What they care about is which problems the language can solve. That is natural and understandable. However, just as "failing to reject does not mean accepting" in statistics, caring about which problems can be solved does not make the properties of a language unimportant. Those properties determine how it performs in every respect, and even decide which kinds of problems it suits and how efficiently they can be programmed. My current impression is that R users feel something like "Thank God, there is a good *language* for me to use". I have to admit that, for now, the language is adequate for most current problems. From a developer's perspective, however, it still has some defects:

1. It is not object-oriented in the modern sense. Its approach is a bit like doing object orientation in C: you have to handle encapsulation and assignment yourself. Members have no public, private, or protected modifiers, methods cannot be overloaded or sealed, and interfaces (or multiple inheritance) are missing. There are also some strange problems caused by the design of the language itself. To take a specific example, the following function is used to calculate the mean:

mean <- function(x, ...)
    UseMethod("mean")

When we call this function (the result is 3):

mean(1:5)

During the call, the system first stores the current function name, mean, in the context variable .Generic, and then stores the type of x, integer, in the variable .Class. In this example, UseMethod combines the name given in UseMethod("mean") with the type name of the first argument, integer, taken from .Class, to find the matching method mean.integer, and then calls it (a little like emulating inheritance in C). As I understand it, you can also write UseMethod("mean", x), UseMethod(, x), or UseMethod(); the missing arguments are taken from .Class and .Generic.

Now there are two very "strange" problems. First, if someone calls mean.integer() directly, there is nothing in .Generic, so a UseMethod call that omits its first argument is dangerous. Second, once UseMethod has been called, any code that follows it in the function is never executed (which is really strange).

From the description above, we can see that this object model is informal and is fused with a generic-function model. As far as I currently understand, every "object-oriented" function must first have a public generic function. And because of the special behavior of UseMethod, every such generic function has to consist of, and only of, that one line of boilerplate.

(Note: I have never used R, so I am not sure what .Class will actually contain in the example above. It may not be integer, but the general idea should be correct. See http://cran.r-project.org/doc/manuals/R-lang.html#UseMethod for details.)
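To make the two oddities above concrete, here is a minimal sketch in R. It assumes a made-up generic called describe (not part of base R) so that the built-in mean is left alone; the method names and the returned strings are purely illustrative. It shows that code placed after UseMethod() is never run, and that .Generic exists only when a method is reached through dispatch.

```r
# Hypothetical generic "describe"; not part of base R.
describe <- function(x, ...) {
  UseMethod("describe")
  # Never evaluated: dispatch does not return to the generic's body.
  stop("unreachable")
}

# Method picked when the (implicit) class of x is "integer".
describe.integer <- function(x, ...) {
  # .Generic is placed in the method's frame only by UseMethod dispatch.
  via <- if (exists(".Generic", inherits = FALSE)) .Generic else "<direct call>"
  paste("integer method, reached via", via)
}

# Fallback used when no class-specific method is found.
describe.default <- function(x, ...) "default method"

dispatched <- describe(1:5)          # goes through UseMethod
direct     <- describe.integer(1:5)  # bypasses the generic entirely
fallback   <- describe("text")       # no describe.character, so default

print(c(dispatched, direct, fallback))
```

Because nothing stops a caller from invoking describe.integer() directly, a method that relies on .Generic has to guard against its absence, as the sketch does with exists().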

 

2. As far as I can tell, there is no namespace concept. If so, this design defect will inevitably lead to name pollution. As with the point above, the problem will not be noticed until enough time has passed.

3. Many other advanced language features are missing. I will not list them here, because this is only natural. After all, we cannot expect a language designed to solve statistical problems to keep pace with general-purpose language development in such a short time.

I should stress that what I have said may give the wrong impression that R is hopelessly backward. That is not so. For one thing, R has its own garbage collector, which, like .NET's, is a generational mark collector (with three generations; perhaps .NET even borrowed the idea from R or similar languages). Its internals are not cutting-edge, but they cannot be called primitive either. The language design described above is not advanced because the statistics field has never really had a good language for its problems, including the corresponding function libraries and modules. My guess is that at its current stage the language should focus on "how many statistical problems can be solved easily" rather than on the language-design defects it runs into while solving them. It may take about ten years for my inference to be confirmed, though of course I hope it happens sooner. From my point of view, the language looks a bit ugly and I expect it would be quite uncomfortable to use. That discomfort is not caused by my lack of statistical knowledge. (Statistics researchers may not see it that way and may insist that it is. I do not want to argue the point much, because I really do lack that knowledge.)

Another sign of the gap between the computing field and the statistics field is that statistical knowledge is extremely scarce in ours. I estimate that at least 50% of the bloggers on Blog Park are unfamiliar with regression analysis (or have never even heard of it), and 75% may not know the least squares method. That includes me. I believe this estimate is quite conservative. I also believe that most bloggers here are far stronger in programming languages and software engineering than the people who use R for statistics. As noted above, an Internet search turns up few people discussing the nature of the R language, and few questions about software engineering in R.

This gives me an interesting idea: could F# take on similar work?

On the plus side, F# is more advanced at the language level and may be faster. I will not give further examples here. At the very least, its object-oriented model is more complete and rigorous, and its underlying core should be more advanced.

Another major benefit of building a statistical framework on F# is that work with little or no connection to statistics can easily be handed to professional programmers or companies: for example, database access, page display, parallel computing, and load balancing.

The downside is that F # does not have many statistical function libraries available, and there are already a lot of off-the-shelf code on R used by scholars around the world. This accumulation problem is not smooth in a day or two, just as the gap between Java and. Net has not yet been smooth. (I am thinking, F # has already appeared at least, but it also has certain challenge capabilities. What about the Java camp ?)

Another pain point is Microsoft's pricing: the IDE costs money, Windows costs money, and so on. I have seen this argument in PHP discussions before and do not want to go into it. To be honest, it is not really a question of money. Many problems in the world come from "fear". How many people actually work out the total cost? Most merely react reflexively to a few sensitive numbers. I will not say much more; just take SRSS as an example. It is said to be licensed by the year, at a cost mostly in the tens of thousands of USD. How does that compare? Relative to running R on Linux you would have to buy a Windows license, but that hardly seems beyond the reach of an ordinary scholar. If you give up the benefits of an IDE and use the free .NET Framework and Notepad, the difference in cost should not be large. (I do admit that, on the whole, free resources on Windows, and even paid ones, are fewer or worse than those on Linux, but for an individual user this should not be a big problem.)

 

P.S.:

Someone has already used F# for statistical work (http://cs.hubfs.net/forums/permalink/326/335/ShowThread.aspx), and the assessment seems positive.

There is also a way to use R from F# (http://cs.hubfs.net/blogs/thepopeofthehub/archive/2007/11/06/FSharpWithR.aspx), and it does not seem to pose much of a problem.

It is very hard to search for anything about R, because the letter "R" is far too common.
