The difference between the Python and R languages

Source: Internet
Author: User

Data mining techniques are becoming more mature and more complex. With the growth of the Internet and the arrival of large volumes of data, the traditional approach of building data mining models with visual tools such as SPSS and SAS increasingly fails to meet day-to-day needs. By the standards applied to data scientists in the United States, being able to implement algorithms and models in code is a prerequisite for being a real data scientist. At present, many people working in data mining come from non-computer-science backgrounds and have a relatively weak programming foundation, so finding a programming language that is quick to pick up and efficient to work in is essential. Good tools and a good language can multiply your results.

The programming languages most commonly used to implement data mining algorithms today are Java, C++, C, Python, and R.

Because my own background is in mathematical statistics, complex, heavyweight languages are not cost-effective for me: starting with Java, C++, or C would cost time and energy out of proportion to the payoff. Python and R are therefore the best choices, and I strongly recommend that data practitioners with a background similar to mine pick one of the two.

There are three reasons:
First: Python and R both have specialized, comprehensive modules for data analysis and data mining. Many common operations, such as matrix and vector operations, have fairly high-level support, so you get more output for the effort you put in.

Second: Both languages are cross-platform and run on Linux and Windows, and code portability is reasonably good.

Third: Most people who studied mathematical statistics have used tools such as MATLAB and Minitab. Python and R are closer to these familiar mathematical tools, so they feel comfortable to use.

As for choosing between Python and R, here is my admittedly superficial understanding:

Both tools are very convenient. Neither requires advanced programming skills, both are suitable for algorithm development, and both offer a large number of packages.

Python is easy to get started with, while R is relatively harder (this is purely my personal impression; it depends on your previous experience and may differ for you).
R is still somewhat weak at text mining. Its advantage is that the functions are already written for you: you only need to know what form the arguments take, and sometimes, even when the argument form is not quite right, R can "intelligently" adapt. Such simple software suits people who want to focus on the business problem.
Python can do almost anything: it covers more functionality than R and, in my experience, runs faster. Python is a programming language, whereas R is more like a piece of software, so Python lets you develop algorithms more flexibly.

Python is suitable for processing large amounts of data, while R tends to struggle in this area. That applies to people with ordinary programming skills, though; experts who make flexible use of vectorized programming can get reasonable speed out of R as well.
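To make "vectorized programming" concrete, here is a minimal sketch in Python. It assumes the NumPy package is installed, and the function names and sample data are made up for illustration. The same idea applies to R's vector operations: the whole computation is expressed as one operation on an array instead of an explicit loop.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Element-by-element loop, the style a beginner typically writes."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """The same computation expressed as a single vector operation."""
    arr = np.asarray(values, dtype=float)
    return float(np.dot(arr, arr))


# Hypothetical sample data, for illustration only.
data = [1.0, 2.0, 3.0, 4.0]
loop_result = sum_of_squares_loop(data)
vector_result = sum_of_squares_vectorized(data)
print(loop_result, vector_result)
```

Both functions return the same value. On large arrays the vectorized form is usually much faster, because the loop runs inside the library's compiled code rather than in the interpreter.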

On performance, Python sits between R and languages such as C, C++, and Java. It is not as fast as those languages, but most everyday data processing can be done in Python, and for people without demanding performance requirements it is enough.

With Python you need to install a series of packages such as NumPy, pandas, SciPy, Cython, statsmodels, and matplotlib, as well as the IPython interactive environment; Python by itself has no functions that directly support the statistics used in econometric analysis. R is built around statistical analysis. Its performance and efficiency are slightly inferior to Python's, but it is superior to Python for statistics and for numerical calculation and analysis.
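As a rough illustration of where the line falls, the sketch below uses only Python's standard-library `statistics` module, which covers basic descriptive statistics. The sales figures are invented for the example. Anything closer to econometric modeling would, as far as I know, require one of the third-party packages listed above, such as statsmodels.

```python
import statistics

# Hypothetical daily sales figures, for illustration only.
daily_sales = [12.0, 15.0, 11.0, 19.0, 13.0]

# Basic descriptive statistics are available without installing anything.
summary = {
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "stdev": statistics.stdev(daily_sales),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```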

Python code is highly readable and clean overall. It is simple and direct by nature, so a small amount of code can implement complex functionality in a short time. R's syntax is quite odd, and its various packages do not follow consistent conventions, which often makes them painful to use.

In terms of breadth, I think Python clearly beats R. Python has a clear advantage in calling other languages, connecting to and reading data sources, interacting with the operating system, regular expressions, and text processing. After all, Python was created as a general-purpose programming language, while R grew out of statistical computing. So in terms of breadth, the difference between the two is significant.
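The regular-expression and text-processing side can be sketched with the standard library alone. This is only an illustrative example: the `word_counts` helper and the sample sentence are hypothetical, and real text mining would need more careful tokenization.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text, extract words with a regular expression, and count them."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, for illustration only.
sample = "Python reads data; Python cleans data. R models data."
counts = word_counts(sample)
top_two = counts.most_common(2)
print(top_two)
```

The design choice here is to lean on `re.findall` for tokenization and `collections.Counter` for counting, which keeps the whole task to a few lines.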

Python is used more by people in the machine learning field. As far as I know, people working in marketing research, econometrics, and statistics almost never use Python.
Of course, learning to program is much simpler now than it used to be. As the saying goes, "I don't produce code, I'm just a porter for StackOverflow" ...
The above is only my personal view. If anything is expressed poorly, please point it out; criticism is welcome.
