Python "Invincible": About to Replace R

R: Not a real language

Part of the reason R is so difficult to learn is that it is not a real programming language. John Cook, an R expert, said: "R is a statistical interactive environment, not a real programming language. It is more helpful to think of R as an interactive environment containing a programming language."

But as Bob Muenchen emphasizes, R is hard even for people who are proficient in statistics tools such as SAS and SPSS. Whether R reduces complexity for analysts is still disputed, even though R folds in the macro and matrix languages that users of tools like SPSS have to master separately. And those who expect R to behave like Stata are doomed to disappointment.

In general, R is hard to learn.

Python lowers data technology barriers

Python, however, is more approachable. For one thing, a wide variety of developers already know Python and use it in a wide range of applications. Unlike R, which is used almost exclusively for data analysis, Python is something a developer may first encounter when writing a script for a website or some other application.

As companies make every effort to put their data to use, they are also struggling to find qualified data scientists. However, as Gartner's Svetlana Sicular has suggested, it is more efficient to train a company's existing employees in simple big data technologies than to teach newly hired data scientists the complexities of the business.

Python "Invincible"

But one of the biggest benefits of doing data science in Python, beyond tapping the existing pool of Python developers, is the efficiency of using a single programming language across different applications. "It turns out that the benefits of using one language to do all the development and analysis are quite impressive," says Tal Yarkoni, research assistant at the University of Texas at Austin. "For one thing, when you can do everything in the same language, you don't have to keep reminding yourself that Ruby uses blocks instead of comprehensions, and that the size of an array in Python is len(array), not array.length ...

"In addition, you do not need to worry about interfacing between modules of a project that are written in different languages. Nothing is more annoying than parsing some text data in Python, converting it into the format you need internally, and then discovering that it must be written to disk in yet another format so that R or MATLAB can do the analysis. All these costs disappear as long as you use a single language."
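To make the point above concrete, here is a minimal sketch of parsing text and analyzing it in the same Python process, with no intermediate file written for another tool. The sample data and the function names parse_scores and summarize are hypothetical, chosen only for illustration, and the sketch uses nothing beyond the standard library.

```python
import csv
import io
import statistics

# Hypothetical raw text, standing in for a log file or a data export.
RAW_TEXT = """name,score
alice,82
bob,91
carol,77
"""


def parse_scores(text):
    """Parse CSV text into a list of (name, score) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["name"], float(row["score"])) for row in reader]


def summarize(records):
    """Compute simple summary statistics on the parsed records."""
    scores = [score for _, score in records]
    return {
        "count": len(scores),  # len(array), not array.length
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
    }


# Parsing and analysis share the same in-memory objects, so nothing
# has to be converted or written to disk in between.
records = parse_scores(RAW_TEXT)
summary = summarize(records)
print(summary)
```

For heavier analysis the same pattern would typically hand the parsed records to a numerical library, but the idea is unchanged: the data never leaves the language.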

We can praise a technology for solving one problem perfectly, but the winning technology is often a general-purpose tool that solves a whole range of problems. As David Himrod, AppNexus's director of optimization and analytics, points out, "One of the biggest challenges facing AppNexus is how to get different employees to work with the same technology. Python provides a common, easy-to-understand language for employees with different backgrounds, especially engineers, mathematicians, and analysts, so that the company can standardize on it for new functionality."

Data science goes mainstream with Python

Python still falls short of R's rich data analysis capabilities in many respects, but it is rapidly narrowing the gap. Keep in mind that the key to Python's success is not that it can handle more arcane functions than R or other analytics tools, but that it is approachable and general-purpose. Data science is moving out of the realm of the hardcore geek, as was especially clear at O'Reilly's Strata conference in New York last month: past attendees were mostly academic PhDs, whereas now they are mainly business analysts and others whose companies have asked them to figure out big data.

These new, early-stage "data scientists" are more likely to use Python than R. Python is relatively simple to use, and they may already have used it in a project. As in other markets, tools that are familiar or easy to learn tend to win out over tools that are powerful but complex.

