Why do data scientists choose the Python language?

This article was translated by hansir of Bole Online and proofread by Toolate.

English Source: Quora

Bole Online editor's note: The question comes from Quora, where the asker added: "It seems that many programmers who work with data are very good at Python. Why is that?" Below is a reply from Jeff Hammerbacher (693 likes).

Python is an interpreted, dynamic language with a clear and efficient syntax. Python has a good REPL (read-eval-print loop), and a new module can be explored from the REPL using dir() and docstrings. This is one reason why programmers lean toward Python rather than C, C++, or Java.
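As a rough illustration of the REPL-driven exploration described above, the sketch below uses only the standard library. The choice of the json module is an arbitrary assumption made for this example and does not come from the original answer.

```python
import json

# dir() lists the names a module exposes; drop the private ones.
public_names = [name for name in dir(json) if not name.startswith("_")]

# A docstring is reachable through __doc__ (help() shows the same text).
first_doc_line = json.dumps.__doc__.strip().splitlines()[0]

print(public_names)
print(first_doc_line)
```

Typed line by line at the interactive prompt, the same calls let a programmer discover what a module offers without leaving the session.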

The Python community invested in the development of Numeric [1] in the mid-1990s, "an extension of Python to allow it to support numerical analysis as naturally as MATLAB". Numeric later evolved into NumPy [2]. A few years later, MATLAB's plotting capabilities were ported to Python through the Matplotlib library [3]. Libraries for scientific computing were built around NumPy and Matplotlib and bundled into the SciPy package [4], which was commercially supported by Enthought [5]. Python's support for MATLAB-like array manipulation and plotting is the main reason why it is favored over Perl and Ruby.
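To give a sense of what MATLAB-like array manipulation looks like in NumPy, here is a minimal sketch. The array values are invented for illustration, and the MATLAB equivalents in the comments are approximate.

```python
import numpy as np

# A 2x3 matrix and a length-3 vector; the values are made up.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
v = np.array([1.0, 0.0, -1.0])

scaled = a * 2.0             # element-wise scaling, no explicit loop
product = a.dot(v)           # matrix-vector product (MATLAB: a * v for a column v)
col_means = a.mean(axis=0)   # column means (MATLAB: mean(a, 1))
second_col = a[:, 1]         # column slice (MATLAB: a(:, 2)); NumPy is 0-indexed

print(product)
print(col_means)
```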

Today, for data scientists, the most popular alternatives to Python are R, MATLAB/Octave, and Mathematica/Sage. In addition to the earlier work of porting MATLAB features to Python, recent work has ported some of the popular features of R and Mathematica to Python.

The data frames and related operations of the R language (from the plyr and reshape packages) are already implemented by the pandas library [6]. The scikit-learn project [7] presents a common interface for many machine learning algorithms, similar to the caret package in R.
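The data frame operations mentioned above can be sketched with a small pandas example. The column names and values are hypothetical, and the comparison to plyr is only approximate: grouping and summarizing in pandas is comparable in spirit to what plyr's ddply does in R.

```python
import pandas as pd

# Hypothetical measurements; column names and values are invented.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 10.0, 30.0],
})

# Split-apply-combine: mean of "value" within each group.
means = df.groupby("group")["value"].mean()

print(means)
```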

The "notebook" concept from Mathematica/Sage has been implemented by IPython notebooks [8].

In my personal opinion, Python is still lacking in some important areas.

1. First, Python's syntax for array manipulation and formula specification is relatively verbose. MATLAB/Octave's syntax for array manipulation remains more popular (this is why, for example, Stanford University's Machine Learning course used it), and the R language's syntax for specifying formulas is quite good.

2. Second, Python lacks counterparts to the static graphics library ggplot2 and the interactive graphics library D3. The Matplotlib library is not easy to install, is difficult to use, and does not lend itself to building interactive graphics for the web.

3. Third is the scalability of the NumPy and pandas libraries when dealing with large datasets. Continuum [9] is working to solve this problem, but there is still a long way to go to create something coherent and usable.

4. Fourth is the absence of an embedded, declarative language for data manipulation similar to the LINQ project. Pandas is useful as a low-level data manipulation toolkit, but keeping track of the specialized pandas syntax for complex operations can be frustrating.

5. Finally, there is no high-quality IDE for data scientists comparable to RStudio.

Resources:

[1] http://hugunin.net/story_of_jyth ...
[2] http://numpy.scipy.org/
[3] http://matplotlib.sourceforge.net/
[4] http://www.scipy.org/
[5] http://www.enthought.com/
[6] http://pandas.pydata.org
[7] http://scikit-learn.org
[8] http://blog.fperez.org/2012/01/i ...
[9] http://continuum.io/

Written on August 29, 2012.
