This article was translated by hansir for Bole Online and proofread by Toolate.
English Source: Quora
Bole Online editor's note: This question comes from Quora, where the asker added, "It seems that a lot of programmers who work with data are very good at Python. Why is that?" Below is the reply from Jeff Hammerbacher (693 upvotes).
Python is an interpreted, dynamic language with a clear and efficient syntax. Python has a good REPL (read-eval-print loop), and a new module can be explored from the REPL using dir() and docstrings. This is one reason why programmers prefer Python to C, C++, or Java.
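As a minimal sketch of that workflow, the snippet below explores a module the way one might at the REPL: dir() lists the available names, and the docstring describes a function before it is called. The choice of the standard-library statistics module and the helper name public_names are illustrative assumptions, not part of the original answer.

```python
import statistics


def public_names(module):
    """Return the names in a module that do not start with an underscore."""
    return [name for name in dir(module) if not name.startswith("_")]


# List what the module offers, as dir() would show at the REPL.
names = public_names(statistics)
print(names)

# Read the docstring of one function before using it.
doc = statistics.mean.__doc__
print(doc)

# Try the function out immediately on made-up data.
result = statistics.mean([1, 2, 3, 4])
print(result)
```

At an interactive prompt the same steps are usually typed one at a time, with help(statistics.mean) as a more readable alternative to printing the docstring.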
In the mid-1990s, the Python community invested in the development of Numeric [1], "an extension of Python that lets it support numerical analysis as naturally as MATLAB". Numeric later evolved into NumPy [2]. A few years later, the plotting capabilities of MATLAB were ported to Python through the Matplotlib library [3]. Libraries for scientific computing were built around NumPy and Matplotlib and bundled into the SciPy package [4], which was commercially supported by Enthought [5]. Python's support for MATLAB-like array manipulation and plotting is the main reason why it is favored over Perl and Ruby.
Today, the most popular alternatives to Python for data scientists are R, MATLAB/Octave, and Mathematica/Sage. In addition to the earlier work of porting MATLAB features to Python, recent work has ported some popular features of R and Mathematica to Python.
Data frames and the related operations in R (from the plyr and reshape packages) have been implemented by the pandas library [6]. The scikit-learn project [7] provides a common interface to many machine learning algorithms, similar to the caret package in R.
The "notebook" concept from Mathematica/Sage has been implemented by IPython notebooks [8].
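To give a rough idea of what MATLAB-like array manipulation looks like in NumPy, here is a small sketch. The matrix and vector values are made up for illustration, and the MATLAB expressions in the comments are approximate equivalents rather than part of the original answer.

```python
import numpy as np

# A 2x3 matrix and a length-3 vector; the values are made up for illustration.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, 0.0, -1.0])

# Matrix-vector product (MATLAB: A * x).
y = A.dot(x)

# Element-wise operations work on whole arrays without loops (MATLAB: A .^ 2).
squares = A ** 2

# Slicing selects sub-arrays (MATLAB: A(:, 2), which uses 1-based indexing).
second_column = A[:, 1]

# Reductions along an axis (MATLAB: mean(A, 1)).
column_means = A.mean(axis=0)

print(y, squares, second_column, column_means, sep="\n")
```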
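As a hedged illustration of the data frame operations mentioned above, the sketch below builds a small pandas DataFrame and applies a split-apply-combine step of the kind plyr offers in R. The column names and values are hypothetical.

```python
import pandas as pd

# A tiny data frame; column names and values are made up for illustration.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "length": [1.0, 3.0, 2.0, 6.0],
})

# Split-apply-combine: group rows by a key column and average each group.
mean_length = df.groupby("species")["length"].mean()

# Row filtering with a boolean mask.
long_rows = df[df["length"] > 1.5]

print(mean_length)
print(long_rows)
```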
In my personal opinion, Python is still lacking in some important areas.
1. First, Python's syntax for array manipulation and formula specification is relatively verbose. The MATLAB/Octave syntax for array manipulation is still more popular (which is why, for example, Stanford University's Machine Learning course uses it), and R's syntax for specifying formulas is quite good.
2. Second, Python lacks counterparts to the static graphics library ggplot2 and the interactive graphics library D3. Matplotlib is not easy to install, is difficult to use, and does not lend itself to building interactive graphics for the web.
3. Third, there is the scalability of NumPy and pandas when dealing with large datasets. Continuum [9] is working to solve this problem, but there is still a long way to go before something coherent and usable exists.
4. Fourth, there is no embedded, declarative language for data manipulation similar to the LINQ project. pandas is usable as a low-level data manipulation toolkit, but the pandas-specific syntax for expressing complex operations can be frustrating.
5. Finally, there is no high-quality IDE for data scientists comparable to RStudio.
Resources:
[1] http://hugunin.net/story_of_jyth ...
[2] http://numpy.scipy.org/
[3] http://matplotlib.sourceforge.net/
[4] http://www.scipy.org/
[5] http://www.enthought.com/
[6] http://pandas.pydata.org
[7] http://scikit-learn.org
[8] http://blog.fperez.org/2012/01/i ...
[9] http://continuum.io/
Written on August 29, 2012.
Why do data scientists choose the Python language?