The difference and connection between Python and R

Source: Internet
Author: User

Reprint: http://bbs.pinggu.org/thread-3078817-1-1.html

Some people say the difference between Python and R is obvious: R was built for statistics, while Python was designed for programmers. That is somewhat unfair to Python. Around 2012 we still said that R was the mainstream in academia, but Python is now slowly replacing R there. I don't know whether the big data era is the reason.

Python is faster than R. Python can work directly on gigabytes of data; R cannot. To analyze such data in R, you first have to use a database to reduce the big data to small data (with a GROUP BY) and then hand the result to R. As a result, R cannot analyze the row-level behavior records directly and can only analyze aggregated statistics. So when some people say Python = R + SQL/Hive, it is not unreasonable.
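
As a rough illustration of the "Python = R + SQL/Hive" remark, the sketch below does the GROUP BY step in pandas on row-level records, so the aggregation and the later analysis stay in one language. The table and its column names (`user_id`, `action`, `amount`) are made up for the example and do not come from the original post.

```python
import pandas as pd

# Hypothetical row-level behavior log: one row per user action.
events = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 2, 3],
        "action": ["view", "buy", "view", "view", "buy", "view"],
        "amount": [0.0, 25.0, 0.0, 0.0, 40.0, 0.0],
    }
)

# Roughly equivalent to:
#   SELECT user_id, COUNT(action) AS n_events, SUM(amount) AS total_spent
#   FROM events GROUP BY user_id
per_user = (
    events.groupby("user_id")
    .agg(n_events=("action", "count"), total_spent=("amount", "sum"))
    .reset_index()
)

print(per_user)
```

The raw `events` rows remain available after the aggregation, so row-level questions can still be asked without a round trip to a database.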

One of Python's most obvious advantages is its nature as a glue language, which many books also mention. Algorithms written in C underneath are very efficient once they are wrapped in a Python package. (With the Python data mining package Orange Canvas, a decision tree analysis of 500,000 users returned results in 10 seconds; with R it had not finished after several hours, and all 8 GB of memory was full.) Nothing is absolute, though: if you are good at vectorized programming in R (which is a little difficult), both the speed and the length of an R program improve significantly.
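
The same idea of pushing the loop into compiled code applies on the Python side. The sketch below is a Python analogue of the vectorization point, assuming NumPy as the C-backed package; the function names are hypothetical, and this is not the benchmark described above.

```python
import numpy as np


def sum_of_squares_loop(xs):
    """Explicit loop: every iteration runs in the Python interpreter."""
    total = 0.0
    for x in xs:
        total += x * x
    return total


def sum_of_squares_vectorized(xs):
    """Vectorized form: the loop runs in compiled code inside NumPy."""
    return float(np.dot(xs, xs))


values = np.arange(10, dtype=np.float64)
print(sum_of_squares_loop(values), sum_of_squares_vectorized(values))
```

Both functions compute the same quantity; on large arrays the vectorized version is typically much faster, and it is also shorter.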

The advantage of R is its all-encompassing collection of statistical functions. In time series analysis especially, both classical and cutting-edge methods have packages that can be used directly.
Python, by contrast, has been poorly resourced in this area. However, Python now has pandas, which provides a standard set of time series tools and data algorithms. With it you can efficiently process very large time series and easily slice and dice, aggregate, and resample regular or irregular time series. As you might have guessed, most of these tools are especially useful for financial and economic data, but you can also use them to analyze server log data. Thanks to steadily improving libraries (mainly pandas) in recent years, Python has become a major alternative for data processing tasks.
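
A minimal sketch of the slicing and resampling mentioned above, assuming pandas is installed. The timestamps and values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical irregular series: timestamps and values are made up.
stamps = pd.to_datetime(
    [
        "2014-01-01 09:15",
        "2014-01-01 17:40",
        "2014-01-02 08:05",
        "2014-01-04 12:30",
    ]
)
prices = pd.Series([10.0, 12.0, 11.0, 15.0], index=stamps)

# Slice by date string: every observation recorded on 1 January.
first_day = prices.loc["2014-01-01"]

# Resample the irregular observations to a regular daily mean.
# Days with no observations (3 January here) come out as NaN.
daily_mean = prices.resample("D").mean()

print(first_day)
print(daily_mean)
```

The same pattern works for server logs: index the records by timestamp, then resample to whatever interval the analysis needs.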

I have tried a few things myself:
1. I implemented a statistical method in Python using ctypes and multiprocessing.
Later, for a project that compared methods, I went back to R and found that some Bioconductor packages already run in parallel by default. (That package was still very slow, though: it grabbed all the threads at once, so the whole computer became unusable and even browsing web pages lagged badly.)
2. I used Python pandas for some data wrangling, working much as one would in a database: joining and matching two or three tables back and forth. It was very pleasant to use. R could do this work too, but I expect it would be slower, since there were hundreds of thousands of rows.
3. I used Python matplotlib for plotting. Plotting with pyplot is very different from R: in R each command draws something immediately, whereas pyplot prepares everything and renders it all together. Color selection in pyplot is a little awkward: there are few default colors, so I ended up using HTML colors, but their names are rather long. pyplot's legend, which is semi-automatic, is more convenient than R's. pyplot also lets you zoom freely and then save the result as an image, which is more convenient than R.
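
The table matching described in item 2 could look roughly like the sketch below. The two tables and their columns (`sample_id`, `group`, `value`) are hypothetical; the original post does not show its data.

```python
import pandas as pd

# Hypothetical tables; names and values are illustrative only.
samples = pd.DataFrame(
    {"sample_id": ["s1", "s2", "s3"], "group": ["case", "control", "case"]}
)
measurements = pd.DataFrame(
    {"sample_id": ["s1", "s1", "s2", "s4"], "value": [0.5, 0.7, 1.2, 0.9]}
)

# Left join keeps every measurement; indicator=True adds a "_merge"
# column showing whether a matching row was found in `samples`.
merged = measurements.merge(samples, on="sample_id", how="left", indicator=True)

# Measurements whose sample_id has no entry in `samples`.
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(unmatched)
```

Checking the unmatched rows after each join is a cheap way to catch key mismatches when going back and forth between several tables.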

In general, Python is a fairly balanced language that does well in every area. Whether it is calling other languages, connecting to and reading from data sources, operating the system, or handling regular expressions and text processing, Python has a clear advantage, while R stands out more in statistics. But data analysis is not only statistics. It also covers data collection, data processing, sampling, clustering, more complex data mining algorithms, data modeling, and so on. For these tasks, once the data exceeds about 100 MB, R struggles, while Python is basically up to the job.

Combined with its strength in general-purpose programming, Python alone is enough to build data-centric applications.
But there is no best software or language in the world, and few people can get the most out of data mining with a single language. In particular, many people learned R earlier and are reluctant to abandon it completely. So for anyone who wants to learn, combining R and Python is even better. A while ago I read an article, "Let R and Python dance together"; the original post is on the forum, so I won't say more here. Reading it should give you more ideas.

BTW: If you have not learned R before, you can learn Python first and then decide whether to learn R. If you already know R, you will pick up Python more quickly.
