For statistical research, should I learn R or Python?
Answers:
Here is a long article: Martijn Theuwissen, an instructor at DataCamp, a training organization abroad, wrote a detailed comparison of Python and R. His conclusion is that you should choose your tools based on what you do. Source: http://www.kdnuggets.com/
(A quick plug: this article was translated by the Data Guest team, public account Idacker. You are welcome to follow us.)
Python and R are two of the most popular programming languages for statistics. R was developed mainly with statisticians in mind (and has powerful visualization capabilities), while Python is widely embraced for its easy-to-understand syntax.
In this article, we'll focus on the differences between R and Python and on their respective roles in data science and statistics.
Introduction to R
Ross Ihaka and Robert Gentleman created the open-source language R in 1995 as an implementation of the S language. Their goal was to provide a better and more user-friendly way of doing data analysis, statistics, and graphical models.
At first R was used mainly in academia and research, but the business world has recently discovered R as well. This has made R one of the fastest-growing statistical languages in enterprise use.
The main advantage of R is its huge community, supported by mailing lists, user-contributed documentation, and a very active Stack Overflow group. There is also CRAN, a large repository of R packages to which users can easily contribute. These packages contain R functions and data. CRAN's mirrors are identical copies of the main R site, so users can pick the mirror closest to them and get access to the latest techniques and features without having to develop everything from scratch.
If you are an experienced programmer, you may not feel very productive in R at first, and you may run into obstacles while learning it. Fortunately, there are plenty of learning resources available today.
Introduction to Python
Python was created by Guido van Rossum in 1991 and emphasizes productivity and code readability. In data work, its primary users are programmers who want to go deeper into data analysis or apply statistical techniques.
The more your work takes place in an engineering environment, the more you'll like Python. It is a flexible language that works well for trying out new things, and because it emphasizes readability and simplicity, its learning curve is relatively gentle.
Like R, Python has packages: PyPI, the Python Package Index, is a repository containing many Python libraries written by others.
Python also has a large community, though it is somewhat more scattered because Python is a general-purpose language. Nevertheless, Python claims to be the more dominant language in data science: growth is expected there, and more novel data science applications are expected to originate there.
R and Python: The numbers
Numbers comparing the popularity of R and Python are easy to find online. Although they often reflect how each language is evolving within the wider computer science ecosystem, it is hard to compare them side by side. The main reason is that R is used only in the context of data science, while Python, as a general-purpose language, is widely used in many fields, such as web development. This tends to bias rankings in Python's favor and to lower the reported salaries of its practitioners.
When and how to use R
R is used mainly when a data analysis task requires standalone computing or analysis on a single server. It is great for exploratory work, because its many packages and ready-to-use tests provide the tools needed to get almost any type of data analysis up and running quickly. R can even be part of a big data solution.
When you start using R, it's a good idea to first install the RStudio IDE. After that, we recommend that you look at the following popular packages:
dplyr, plyr, and data.table for easy data manipulation
stringr for string manipulation
zoo for regular and irregular time series
ggvis, lattice, and ggplot2 for data visualization
caret for machine learning
When and how to use Python
If your data analysis tasks need to be integrated with web applications, or your statistical code needs to be incorporated into a production database, Python is a good choice: it is a fully fledged programming language and a great tool for implementing algorithms.
Although Python's data analysis packages were immature in the past, they have improved significantly over the years. To use Python for data analysis, you need to install NumPy/SciPy (scientific computing) and pandas (data manipulation). Also take a look at Matplotlib for graphics and scikit-learn for machine learning.
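As a rough illustration of the kind of work this stack covers, here is a minimal sketch that uses only NumPy: descriptive statistics for a sample plus an ordinary least-squares line fit. It assumes NumPy is installed (for example from PyPI); `summarize` and `fit_line` are hypothetical helpers made up for this example, not functions from any library. Plotting with Matplotlib is left out to keep the sketch small.

```python
import numpy as np


def summarize(values):
    """Basic descriptive statistics for a 1-D sample (hypothetical helper)."""
    arr = np.asarray(values, dtype=float)
    return {
        "n": int(arr.size),
        "mean": float(arr.mean()),
        "std": float(arr.std(ddof=1)),  # ddof=1 gives the sample standard deviation
        "median": float(np.median(arr)),
    }


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    # np.polyfit returns coefficients from the highest power down: [slope, intercept]
    slope, intercept = np.polyfit(np.asarray(x, dtype=float),
                                  np.asarray(y, dtype=float), deg=1)
    return float(slope), float(intercept)


if __name__ == "__main__":
    x = np.arange(10)
    y = 2.0 * x + 1.0  # synthetic data lying exactly on a line
    print(summarize(y))
    print(fit_line(x, y))
```

In practice pandas would typically sit on top of this for loading and reshaping tabular data; NumPy alone is used here only to keep the dependencies minimal.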
Unlike R, Python has no single, clear-cut, excellent IDE. We suggest you look at the Spyder and IPython websites to see which one suits you best.
R and Python in the data science industry
If you look at recent polls, R is the clear winner among programming languages for data analysis. However, there is a growing trend of people switching to Python. In addition, more and more companies are using the two languages in combination. If you are going to work in the data industry, you should learn both languages well. Hiring trends show that demand for both skills is increasing and that salaries are far above average.
R: Pros and cons
Advantages
Strong visualization capability
Visualizations often help us understand the numbers themselves more effectively, and R and visualization are a perfect match. Some must-see visualization packages are ggplot2, ggvis, googleVis, and rCharts.
A rich ecosystem
R has an active community and a rich ecosystem. R packages are available on CRAN, Bioconductor, and GitHub, and you can search all R packages on RDocumentation.
Made for data science
R was developed by statisticians, who can communicate ideas and concepts through R code and packages, so you don't necessarily need a computer science background to use it. In addition, businesses are increasingly adopting R.
Disadvantages
R is relatively slow
R makes life easier for statisticians, but it can make your computer run slowly. Although R can feel slow, several projects aim to improve its performance, such as pqR, Renjin, FastR, and Riposte.
R is not easy to learn in depth
R has a steep learning curve, especially if you are used to doing statistical analysis through a GUI. If you are unfamiliar with it, even finding the right packages can be time-consuming.
Python: Pros and cons
Advantages
IPython Notebook
The IPython Notebook makes it easier to do data work in Python, and you can easily share notebooks with colleagues without them having to install anything. This greatly reduces the overhead of organizing code, output, and notes, so you can spend more time doing the actual work.
General-purpose language
Python is a general-purpose language that is easy and intuitive. It is relatively easy to learn, and it can speed up how quickly you write programs. In addition, Python has a built-in testing framework, which helps ensure that your code is reusable and reliable.
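To make the built-in testing framework concrete, here is a minimal sketch using the standard-library `unittest` module; nothing beyond Python itself needs to be installed. The `mean` function under test is a hypothetical example written for illustration only.

```python
import unittest


def mean(values):
    """Arithmetic mean of a sequence (hypothetical function under test)."""
    if not values:
        raise ValueError("mean() of empty sequence")
    return sum(values) / len(values)


class MeanTest(unittest.TestCase):
    def test_simple_average(self):
        self.assertEqual(mean([1, 2, 3, 4]), 2.5)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            mean([])


if __name__ == "__main__":
    # Passing argv and exit=False lets this also run inside a notebook
    # without unittest parsing the kernel's arguments or exiting the interpreter.
    unittest.main(argv=["first-arg-is-ignored"], exit=False)
```

From a terminal, the same tests can also be discovered and run with `python -m unittest`.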
A multi-purpose language
Python brings together people from different backgrounds. Because it is a common, easy-to-understand language that most programmers know and that statisticians can pick up easily, you can build a single tool that integrates the work of all your collaborators.
Disadvantages
Visualization
Visualization is an important criterion when choosing data analysis software. Python has some good visualization libraries, such as Seaborn, Bokeh, and Pygal, but compared with R, the results are not always as pleasing to the eye.
Python is a challenger
Python is a challenger to R, and it does not yet offer alternatives to many essential R packages. It is catching up, but it is not there yet.
So which should you learn?
It's up to you! As a data professional, you need to choose the language that best fits your needs at work. Asking yourself these questions before you start can help you decide:
What problem do you want to solve?
What is the net cost of learning the language?
What tools are commonly used in your field?
What other tools are available, and how do they relate to the commonly used ones?
Note: DataCamp is an online interactive education platform that offers courses in data science and R programming.
Scala or Rust. Rust works best, because you have to build every wheel yourself, parallelism included, heh...
OP, judging by how you asked, learn R first; it's quick. Learn both, really; there's not much to debate. Learning more never hurts. What are you going to use it for? Is your data measured in GB or MB? For GB, use Python; for MB, R. But it's best to learn both. See "Python for Data Analysis". Abroad, R is generally used more, because it is open source and simple. Python is used for scraping data. I started with MATLAB, and when I came to R I found it awkward in all sorts of ways; I really dislike its strange syntax. I still hate R to this day.
I picked up Python after Java/C/PHP and liked it instantly. Is this even a question? You can get the basics of both languages in just a few hours. Learn both.
I'd recommend that anyone who wants to program learn Python.
Python can do whatever R can. R is better than Python at matrix operations, but Python's syntax is simple and easy to pick up, and personally I feel it has richer resources. If you want to learn the basics of R and Python, you can check out some of the free courses at Datacademy (http://Datacademy.io).