Should I learn Python or R for statistical learning?

Source: Internet
Author: User
Reply to the question about statistical learning: here is a long article by Martijn Theuwissen, an instructor at DataCamp, an overseas training provider. Its conclusion is that you should choose the tool that fits your needs. Source: http://www.kdnuggets.com/

(A quick plug: this article was translated by the data customer team, ID: idacker. Follow us for more on data.)

Python and R are two of the most popular programming languages for statistics. R's features were designed mainly with statisticians in mind (it has powerful visualization capabilities), while Python is valued for its easy-to-understand syntax.

In this article, we will focus on the differences between R and Python and their position in data science and statistics.

Introduction to R
Ross Ihaka and Robert Gentleman created the open-source language R in 1995 as an implementation of the S language. Their aim was a language that offered a better and more user-friendly way to do data analysis, statistics, and graphical modeling.
At first, R was used mainly in academia and research, but more recently the business community has discovered it as well. This makes R one of the fastest-growing statistical languages used by enterprises.
The main advantage of R is its large community, which provides support through mailing lists, user-contributed documentation, and a very active Stack Overflow group. There is also CRAN, a repository of R packages to which users can easily contribute. These packages bundle R functions and data. CRAN mirrors in various regions are identical copies of the main site, so you can pick the mirror closest to you and get the latest techniques and functionality without developing everything from scratch.
If you are an experienced programmer, R may not make you noticeably more productive, and you may find that its learning curve is steep. Fortunately, many learning resources are available.

Introduction to Python
Python was created in 1991 by Guido van Rossum and emphasizes productivity and code readability. Programmers who want to delve into data analysis or apply statistical techniques are among its main users for statistical work.
The closer your work is to an engineering environment, the more likely you are to prefer Python. It is a flexible language that is well suited to doing something new, and because it focuses on readability and simplicity, its learning curve is relatively gentle.
Like R, Python has packages. PyPI, the Python Package Index, is a repository containing many Python libraries written by other users.
Python also has a large community, but it is somewhat more scattered because Python is a general-purpose language. Even so, data science is claiming an increasingly dominant position within the Python world: expectations are growing, and more innovative data science applications are expected to originate there.

R and Python: Comparison of numbers
On the Internet, we often see numbers that compare the popularity of R and Python. Although these numbers indicate how the two languages are evolving within the overall computer science ecosystem, it is hard to compare them side by side. The main reason is that R is used only in a data science context, whereas Python, as a general-purpose language, is widely used in many fields, such as web development. This often skews popularity rankings in favor of Python, while the salary figures for its practitioners come out lower.

How to use R?
R is mainly used when a data analysis task requires standalone computing or analysis on a single server. It is well suited to exploratory work: thanks to its huge number of packages and readily usable tests, it provides the tools needed to get almost any type of data analysis up and running quickly. R can even be part of a big data solution.
When you start using R, it is best to install the RStudio IDE first. We recommend that you look at the following popular packages:
• dplyr, plyr, and data.table for easy data manipulation
• stringr for string manipulation
• zoo for regular and irregular time series
• ggvis, lattice, and ggplot2 for data visualization
• caret for machine learning

How to use Python?

You can use Python when your data analysis tasks need to be integrated with web applications, or when statistics code needs to be incorporated into a production database. As a full-fledged programming language, it is a great tool for implementing algorithms.


Although the immaturity of Python's data analysis packages was a problem in the past, they have improved significantly over the years. To make Python usable for data analysis, install NumPy/SciPy (scientific computing) and pandas (data manipulation). Also take a look at matplotlib for graphics and scikit-learn for machine learning.
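To make the descriptive side of such an analysis concrete, here is a minimal sketch that uses only Python's standard-library statistics module, so it runs without installing anything. The describe helper and the sample values are made up for this illustration and are not part of the original article; pandas offers a comparable DataFrame.describe method that works on whole tables.

```python
import statistics


def describe(values):
    """Return a few descriptive statistics for a sequence of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Population standard deviation (divides by n, not n - 1).
        "pstdev": statistics.pstdev(values),
    }


# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
print(summary)
```

The sketch is meant only to show the shape of the task; for real data sets the libraries named above are the more practical choice.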
Unlike R, Python has no single IDE that clearly stands out. We recommend that you look at Spyder and IPython to see which one suits you best.

R and Python: Performance in the data science industry
In recent polls on the programming languages used for data analysis, R is a clear winner. Even so, more and more people are switching from R to Python, and a growing number of companies use the two languages in combination. If you want to work in the data industry, either language will serve you well: recruitment trends show that demand for both skills is increasing, and salaries are well above average.


R: Advantages and disadvantages
Advantages
Strong visualization ability
Visualized data is usually easier to understand than the raw numbers alone, and R and visualization go hand in hand. Some visualization packages worth a look are ggplot2, ggvis, googleVis, and rCharts.
Complete ecosystem
R has active communities and a rich ecosystem. R packages are available from CRAN, Bioconductor, and GitHub, and you can search across all of them with Rdocumentation.
For data science
R was developed by statisticians, who can exchange ideas and concepts through R code and packages. You do not need a computer science background to get started. In addition, R is increasingly being adopted by the business community.
Disadvantages
R is relatively slow
R was designed to make life easier for statisticians, not for your computer. Although R can feel slow, there are several projects that aim to improve its performance: pqR, renjin, FastR, Riposte, and so on.
R is not easy to learn in depth
R's learning curve is not trivial, especially if you are used to doing statistical analysis through a GUI. If you are not familiar with R, even finding the right package can be very time-consuming.

Python: Advantages and disadvantages
Advantages
IPython Notebook
IPython Notebook makes it easier to use Python for data work. You can easily share notebooks with colleagues, who do not need to install anything. This greatly reduces the overhead of organizing code, output, and notes in separate files, so you can spend more time on actual work.
A general-purpose language
Python is a general-purpose language that is easy and intuitive. It is relatively easy to learn and lets you write programs faster. In addition, Python has a built-in testing framework, which helps ensure that your code is reusable and reliable.
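One testing framework that ships with Python is the standard-library unittest module. The following is a minimal sketch of how a small statistical helper might be tested with it. The sample_mean function, the SampleMeanTest class, and the run_tests helper are hypothetical names made up for this illustration, not something taken from the original article.

```python
import unittest


def sample_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("sample_mean() requires at least one value")
    return sum(values) / len(values)


class SampleMeanTest(unittest.TestCase):
    def test_typical_values(self):
        # (1 + 2 + 3 + 4) / 4 is 2.5
        self.assertAlmostEqual(sample_mean([1.0, 2.0, 3.0, 4.0]), 2.5)

    def test_empty_input_is_rejected(self):
        # An empty sample has no mean, so the helper should raise.
        with self.assertRaises(ValueError):
            sample_mean([])


def run_tests():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SampleMeanTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Running the file directly executes both tests and prints a short report. Keeping the checks next to the code in this way is one low-effort route to the reusability and reliability described here.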
A multi-purpose language
Python brings people from different backgrounds together. Because it is a common, easy-to-understand language, programmers and statisticians can communicate through it easily, and you can build a single tool that integrates with every part of your workflow.
Disadvantages
Visualization
Visualization is an important criterion when choosing data analysis software. Although Python has some good visualization libraries, such as Seaborn, Bokeh, and Pygal, the results are not always as pleasing to the eye as those produced with R.
Python is a challenger
Python is a challenger to R, and it does not yet offer alternatives to many essential R packages. It is catching up, but the gap has not closed.

What should you learn:
It's up to you! As a data worker, you need to select the most suitable language for your work. Asking these questions before learning can help you:
What problems do you want to solve?
What is the net cost of learning the language?
What tools are commonly used in your field?
What other tools are available, and how do they relate to the commonly used ones?



Note: DataCamp is an online interactive education platform that provides data science and R programming courses.

Scala and Rust, of which Rust is the most suitable, because you have to reinvent every wheel yourself, including parallelism...
Looking at your question: learn R first, since it pays off quickly. You can learn them all; learning more never hurts. What kind of work do you do? Is your data measured in GB or MB? Use Python for GB-scale data and R for MB-scale data, but it is best to learn both. For details, see "Python for Data Analysis". In general, R is widely used abroad because it is open source and simple, and Python is used for collecting data. I first came into contact with MATLAB, and when I moved on to R I ran into all kinds of difficulties. I really did not like the strange syntax. R is still annoying.
I came to Python after Java, C, and PHP, and I liked it instantly. What else is there to ask? A few hours is enough to get both languages working. Learn both.

As a Python programmer, I recommend that you learn Python.

R can do everything in Python. R is superior to Python in matrix operations, but Python has simple syntax and is easy to use, and I personally feel that its resources are richer. If you want to learn the basics of R and Python, you can look at the Institute of Data Analysis (http://datacademy.io).
