Data Analysis for Programmers: The Python Technology Stack

Source: Internet
Author: User

Introduction: Python is a popular scripting language with a scientific computing stack that makes data analysis fast and easy. This series focuses on how to use the Python-based stack to build a set of tools for data analysis. As the saying goes, to do a good job one must first sharpen one's tools (工欲善其事，必先利其器), so let's take a look at these tools.

0. Data analysis and machine learning

The only constant in the information age is change. With the spread of information technology and the large-scale adoption of big data technology, data analysis, data mining, machine learning and even artificial intelligence (AI), once considered out of reach, have begun to appear frequently in all kinds of settings, which signals the arrival of the data age.

For programmers, writing code to implement features is no longer enough. In this time of change they also need to understand and master some data analysis skills and tools, just as they previously picked up Linux and database skills. These skills can give you a real edge, and may help you move into a whole new and broad field.

1. What is Python?

Python is a well-known general-purpose scripting language that can meet full-featured programming needs. The current mainstream versions are 2.7.x and 3.x, and support for the 2.x line ends in 2020. Python's biggest strength is that it is simple and easy to learn, which is why it is widely used in many fields. All the toolkits covered here are built on top of Python.

2. What is IPython?

IPython is an interactive Python shell that works much better than the default Python shell. It supports variable auto-completion and auto-indentation, can run bash shell commands, and has many useful features and functions built in. For many programmers it is an extremely powerful interactive tool, and most of the data analysis operations that follow are carried out in IPython.

IPython can be used in several ways, including a terminal, a graphical console and a web interface, all of which are powerful and easy to use.

Installation Guide: http://ipython.org/install.html

3. NumPy

NumPy is an open-source numerical computing extension for Python. It can be used to store and manipulate large matrices, and is much more efficient than Python's own nested list structure, which can also be used to represent matrices. It is often said that NumPy turns Python into a free and more powerful MATLAB-like system.
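
To make the comparison with nested lists concrete, here is a minimal sketch. The variable names and the small 2x3 matrix are illustrative assumptions, not part of the original article.

```python
import numpy as np

# The same 2x3 matrix as a plain nested list and as a NumPy array.
nested = [[1, 2, 3], [4, 5, 6]]
matrix = np.array(nested)

# Doubling every element of the nested list needs explicit loops.
doubled_list = [[2 * value for value in row] for row in nested]

# The array does the same work in one vectorized expression.
doubled_array = 2 * matrix

# Arrays also know their own shape and can be transposed directly.
print(matrix.shape)    # (2, 3)
print(matrix.T.shape)  # (3, 2)
print(doubled_array.tolist() == doubled_list)  # True
```

On large matrices the vectorized form is typically far faster, because the loop runs in compiled code rather than in the Python interpreter.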

It is fast and powerful, and supports linear algebra, Fourier transforms, random number generation and many other kinds of mathematical computation.
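
The three capabilities named above can be sketched in a few lines. The system of equations, the 5 Hz test signal and the seed value are made-up inputs chosen only for illustration.

```python
import numpy as np

# Linear algebra: solve the system A @ x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)  # approximately [2., 3.]

# Fourier transform: find the dominant frequency of a sampled sine wave.
sample_rate = 64                          # samples per second (illustrative)
t = np.arange(sample_rate) / sample_rate  # one second of sample times
signal = np.sin(2 * np.pi * 5 * t)        # a 5 Hz sine wave
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(sample_rate, d=1 / sample_rate)
dominant = freqs[np.argmax(spectrum)]     # 5.0

# Random numbers: a seeded generator gives reproducible draws.
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

print(x, dominant, samples.shape)
```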

Official website: http://www.numpy.org/

4. Pandas

pandas (the Python Data Analysis Library) is a NumPy-based tool created for data analysis tasks. It incorporates a number of libraries and standard data models, and provides the tools needed to manipulate large datasets efficiently. pandas offers many functions and methods for processing data quickly and easily. You will soon discover that it is one of the main reasons Python is a powerful and efficient data analysis environment.

It provides the DataFrame, a powerful two-dimensional structure, as the basic data structure for data analysis, and the Series as an efficient one-dimensional array structure. pandas combines NumPy's high-performance array computing with the flexible data handling of spreadsheets and relational databases (SQL), making it easy to reshape, slice, dice, aggregate, sort and select subsets of data.
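
As a rough illustration of these operations, the sketch below builds a Series and a DataFrame and then selects, aggregates, sorts and reshapes the data. The sales figures and column names are invented for the example.

```python
import pandas as pd

# A Series is a labeled one-dimensional array.
prices = pd.Series([3.5, 2.0, 4.25], index=["apple", "banana", "cherry"])

# A DataFrame is a two-dimensional table with named columns.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "product": ["apple", "apple", "banana", "banana"],
    "units": [10, 7, 4, 12],
})

# Select a subset of rows, much like a SQL WHERE clause.
north = sales[sales["region"] == "north"]

# Aggregate like SQL GROUP BY, then sort the result.
units_by_product = (
    sales.groupby("product")["units"].sum().sort_values(ascending=False)
)

# Reshape from long to wide form, like a spreadsheet pivot table.
wide = sales.pivot(index="region", columns="product", values="units")

print(prices["banana"])
print(north)
print(units_by_product)
print(wide)
```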

Official website: http://pandas.pydata.org/

5. Matplotlib (plotting package)

Matplotlib is Python's best-known plotting library. It provides a complete set of command-style APIs similar to MATLAB's, which makes it well suited to interactive plotting. It can also be embedded in GUI applications as a plotting widget. Its documentation is fairly complete, it is widely used, and it is an essential tool for data analysis in Python. It is also deeply integrated with pandas and other toolkits: the plotting functions in pandas can be called directly to generate the corresponding charts.
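
The following sketch shows both styles mentioned above: MATLAB-like pyplot commands, and plotting straight from a pandas DataFrame. The temperature data is invented, and the non-interactive "Agg" backend is an assumption made so that the example runs without a display; in an interactive session you would normally call plt.show() instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

temps = pd.DataFrame(
    {"high": [21, 23, 19, 24], "low": [12, 14, 11, 15]},
    index=["Mon", "Tue", "Wed", "Thu"],
)

# MATLAB-style commands through pyplot.
fig = plt.figure()
plt.plot(list(temps.index), temps["high"], marker="o", label="high")
plt.xlabel("day")
plt.ylabel("temperature")
plt.legend()

# pandas integration: DataFrame.plot draws with Matplotlib and returns an Axes.
bar_axes = temps.plot(kind="bar", title="Daily temperatures")
bar_title = bar_axes.get_title()

# Render the first figure to an in-memory PNG instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close("all")
```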

Official site: http://matplotlib.org/

6. SciPy

SciPy is a convenient, easy-to-use Python toolkit for science and engineering. It includes modules for statistics, optimization, integration and linear algebra, as well as Fourier transforms, signal and image processing, ordinary differential equation solvers and more.

SciPy has a stats package that includes standard continuous and discrete probability distributions, a variety of statistical tests, and richer descriptive statistics.

Together, NumPy and SciPy can largely replace the computational functionality of MATLAB (including its add-on toolboxes).
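
A short sketch of a few of the SciPy modules described in this section follows. The distribution parameters, the sample data and the functions being integrated and minimized are arbitrary choices for illustration.

```python
import numpy as np
from scipy import integrate, optimize, stats

# stats: a standard continuous distribution.
normal = stats.norm(loc=0.0, scale=1.0)
p_below_zero = normal.cdf(0.0)  # 0.5 by symmetry

# stats: descriptive statistics for a random sample.
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=5.0, scale=2.0, size=200)
summary = stats.describe(sample)  # nobs, minmax, mean, variance, ...

# stats: a one-sample t-test of whether the sample mean differs from 5.
t_result = stats.ttest_1samp(sample, popmean=5.0)

# integrate: the integral of x**2 from 0 to 3 is 9.
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 3.0)

# optimize: the minimum of (x - 2)**2 + 1 is at x = 2.
best = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)

print(p_below_zero, summary.mean, t_result.pvalue, area, best.x)
```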

Official site: http://www.scipy.org/

7. Common development tools

The following are two very powerful integrated development environments that bundle all the required packages. You can download the appropriate version from the official websites. They support various platforms (Windows, Mac, Linux) and both 32-bit and 64-bit systems.

    • Canopy https://www.enthought.com/products/canopy/

    • Anaconda https://www.continuum.io/downloads

8. Summary

The Python community offers many more tools. Keras, for example, is a powerful machine learning package that can use TensorFlow directly to build convolutional neural networks, which is quite impressive. I hope this article has given you an intuitive understanding of the Python-based data analysis technology stack. In later articles, we will walk through the data analysis process step by step.

---------- End of article ----------

This article is an original work from the notes of CSDN blogger Wood Small Fish. If you reprint it, please keep the original link and author information. Supporting original work benefits everyone.

The author also maintains an account on Toutiao (Today's Headlines) called The Program Gas Station. You are welcome to follow it.

