Data analysis using Python (I): Learning the libraries

Source: Internet
Author: User
Tags: ggplot

A personal summary of the commonly used Python packages: NumPy, pandas, Matplotlib, SciPy, scikit-learn

one. NumPy:

Standard Python uses a list to hold a group of values, and a list can serve as an array. But because list elements can be any object, a list actually stores pointers to objects. To store a simple [1, 2, 3], you need three pointers and three integer objects. For numeric work, this structure clearly wastes memory and CPU time.

Python also provides an array module. Unlike a list, its array object stores the values directly, similar to a one-dimensional array in C. But because it does not support multiple dimensions and offers few computational functions, it is not well suited to numerical work either.

NumPy was created to make up for these shortcomings. It provides two basic objects: ndarray (the n-dimensional array object) and ufunc (the universal function object). An ndarray (referred to simply as an array from here on) is a multidimensional array that stores a single data type, while a ufunc is a function that operates on arrays element by element.
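As a rough illustration of the three containers described above, the sketch below stores the same values in a list, in an array.array, and in an ndarray. The variable names are made up for this example.

```python
import array

import numpy as np

values = [1, 2, 3]

# A list holds pointers to three separate int objects.
py_list = list(values)

# array.array stores the raw values directly, but only in one dimension.
typed_array = array.array("i", values)

# An ndarray stores a single data type and can have several dimensions.
nd = np.array([values, values], dtype=np.int32)

# np.add is a ufunc: it is applied element by element to whole arrays.
doubled = np.add(nd, nd)

print(typed_array.itemsize)   # bytes used per stored value
print(nd.shape, nd.dtype)
print(doubled)
```

The ndarray reports its shape and element type directly, which neither the list nor the array module object can do for two-dimensional data.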

    import numpy as np

NumPy stores and processes large matrices far more efficiently than Python's own nested lists, because its core is written in C. It is a very basic extension, and the other packages here are built on top of it. Its data structure is the ndarray. There are four main things to know:
    • Creating ndarray objects
    • ufunc operations
    • Matrix calculations
    • Saving arrays to files
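The four topics above might look like this in practice. This is a minimal sketch with made-up arrays; it saves to an in-memory buffer rather than a real file so that it leaves nothing on disk.

```python
import io

import numpy as np

# 1. Creating ndarray objects
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.arange(4, dtype=float).reshape(2, 2)   # [[0, 1], [2, 3]]

# 2. ufunc operations work element by element
roots = np.sqrt(a)
total = np.add(a, b)

# 3. Matrix calculations
product = a @ b                # matrix product, not element-wise
inverse = np.linalg.inv(a)

# 4. Saving and loading (a file name such as "a.npy" would also work)
buffer = io.BytesIO()
np.save(buffer, a)
buffer.seek(0)
restored = np.load(buffer)

print(product)
print(np.array_equal(restored, a))
```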


two. Pandas:
A tool built on NumPy, created for data analysis tasks. pandas incorporates a number of libraries and standard data models, and provides the tools needed to manipulate large datasets efficiently. It is the most statistics-oriented toolkit of the five, and in some respects it is better than R. Data structures: the one-dimensional Series, the two-dimensional DataFrame (similar to an Excel sheet or an SQL table; if you dig deeper, you will find that pandas and SQL have a lot in common, such as the merge function), and the three-dimensional Panel (pan(el) + da(ta) + s, which is where the name comes from; note that Panel was deprecated and later removed from pandas). What you have to master when learning pandas:

    1. Summarizing and computing descriptive statistics, handling missing data, hierarchical indexes
    2. Cleaning, transforming, merging, reshaping, and GroupBy techniques
    3. Date and time data types and tools (they make date handling very easy)
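A small sketch of those three areas with pandas, using an invented sales table (the column names, shop labels, and city names are hypothetical and exist only for this example):

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for this example.
sales = pd.DataFrame({
    "date": pd.to_datetime(
        ["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04"]
    ),
    "shop": ["a", "b", "a", "b"],
    "amount": [10.0, np.nan, 30.0, 40.0],
})
cities = pd.DataFrame({"shop": ["a", "b"], "city": ["Hangzhou", "Beijing"]})

# 1. Descriptive statistics and missing data
stats = sales["amount"].describe()          # NaN is skipped in the count
filled = sales.fillna({"amount": 0.0})

# 2. Merge (much like an SQL join) and GroupBy
merged = filled.merge(cities, on="shop", how="left")
per_city = merged.groupby("city")["amount"].sum()

# 3. Date tools: total amount per two-day window
by_window = merged.set_index("date")["amount"].resample("2D").sum()

print(stats)
print(per_city)
print(by_window)
```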


three. Matplotlib:
The best-known plotting system in Python; many other plotting libraries, such as Seaborn (aimed at plotting pandas data), are wrappers built on it. Its creator, John Hunter, died in 2012. The system is quite complicated to operate and is daunting compared with R's ggplot and lattice plotting, which is why I personally have not given up R. Although applying the "ggplot" style gives plots roughly ggplot's color scheme, it still feels of little value. The complexity of Matplotlib does, however, give it strong customizability. It offers an object-oriented interface as well as pyplot, the classic high-level wrapper.
What you need to know is:
    1. Scatter plots, line charts, bar charts, histograms, pie charts, box plots.
    2. The three main ways of plotting: pyplot, pylab (not recommended), and the object-oriented interface
    3. Adjustment of axes, addition of text annotations, area fills, and use of special graphics patches
    4. Note for finance students: you can directly fetch Yahoo Finance data and plot it (real ... )
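To make the difference between the pyplot and object-oriented interfaces concrete, here is a minimal Matplotlib sketch. It assumes the built-in "ggplot" style sheet and a non-interactive backend; the data is just a sine curve chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

plt.style.use("ggplot")  # built-in style with ggplot-like colors

x = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(x)

# pyplot interface: functions act on an implicit "current" figure
plt.figure()
plt.plot(x, y)
plt.title("pyplot interface")

# Object-oriented interface: work with Figure and Axes objects explicitly
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))
ax_line.plot(x, y, label="sin(x)")
ax_line.fill_between(x, y, 0.0, alpha=0.3)             # area fill
ax_line.annotate("peak", xy=(np.pi / 2, 1.0), xytext=(3.0, 0.8),
                 arrowprops={"arrowstyle": "->"})      # text annotation
ax_line.set_xlim(0.0, 2.0 * np.pi)                     # axis adjustment
ax_line.legend()
ax_hist.hist(y, bins=10)
ax_hist.set_title("histogram of sin(x)")
fig.tight_layout()
```

Calling fig.savefig with a file name of your choice would write the second figure to disk.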

four. SciPy:
A handy, easy-to-use Python toolkit designed for science and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ordinary differential equation solvers, and more.

It can basically replace MATLAB, but its use has little to do with data processing; students in mathematics or engineering departments use it relatively more. Details are omitted here.
I recently found statsmodels, which complements scipy.stats and has excellent time series support.
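For a taste of SciPy's integration, optimization, and linear algebra modules, the following sketch solves three small textbook problems. The functions and matrices are chosen only for illustration.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Integration: the integral of x**2 from 0 to 1 is 1/3
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 1.0)

# Optimization: the minimum of (x - 2)**2 is at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

# Linear algebra: solve 3x + y = 9 and x + 2y = 8
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)

print(area)
print(result.x)
print(solution)
```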

five. scikit-learn:
Students interested in machine learning should pay attention to this very popular open-source machine learning toolkit. Other tools in this area, such as TensorFlow, which Google open-sourced at the end of last year, or Theano, Caffe (Jia Yangqing), Keras, and so on, are a separate topic.
Home page: An introduction to machine learning with scikit-learn
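A typical scikit-learn workflow is to split the data, fit a model, and score it. The sketch below does this on the bundled iris dataset; the choice of LogisticRegression and of the split parameters is an assumption made for this example, not something the original post prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a classifier and measure accuracy on the held-out samples
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(accuracy)
```

Most scikit-learn estimators share the same fit, predict, and score methods, so another classifier can usually be swapped in without changing the rest of the code.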

    1. "10 Minutes to pandas", Chinese translation:
    2. By the founder of pandas: Python for Data Analysis (Douban) (recommended)
    3. A comprehensive tutorial: Scipy Lecture Notes (very well written, though regrettably it does not cover pandas)
    4. To improve yourself: Machine Learning in Action (Douban)

    1. NumPy Getting Started: http://www.
    2. Pandas Video Explanation: Pandas Course Introduction
    3. Matplotlib Explanation: Course Introduction and environment construction
    4. SciPy Getting Started: http://www.
