**This article comes from the handout for my (wheat's) "Big Data" public course, which includes three tutorials on Python and the NumPy data analysis packages, tutorials on data analysis with Excel and SPSS, and more. The authors are wheat and classmate Yi Wen, and the material is original. It was originally internal course material and is now open source, for study purposes only. If you want to reprint it, please contact me and respect the copyright.**

Python Data Analysis Fundamentals Tutorial

Python basic syntax and data structures

These have been introduced in another article.

See my blog: http://blog.csdn.net/xiaomai_sysu/article/details/51103070

Python modules

Similar to the header files (.h) of C++, modules let a Python program explicitly use functions and classes defined in other files.

Objectives:

1. Avoid a bloated client program by letting the user load modules on demand;

2. Encourage modular design, so that modules have high cohesion and low coupling.

Method:

At the beginning of the file, use the import statement.

For example, to import the entire contents of a module:

import mymodule

Once imported, you can use the functions and classes defined in that file through the module name.

For example, you can call the function myfunction in the file mymodule.py like this:

mymodule.myfunction()

Note that a plain import statement accepts only module names: import mymodule.myfunction fails when myfunction is a function or a class rather than a submodule.

To import a single function or class from a module, use the from ... import ... statement instead:

from mymodule import myfunction

from mymodule import myclass

This form is more convenient because the name is brought directly into the current namespace.

To call myfunction later, you only need to write myfunction(), not mymodule.myfunction().
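As a minimal sketch of the two import styles, the standard library module math stands in here for the hypothetical mymodule (the choice of math and sqrt is only an assumption for illustration, not part of the original handout):

```python
# Plain import: names are reached through the module name.
import math
print(math.sqrt(16.0))

# from ... import: the name is placed directly in the current namespace.
from math import sqrt
print(sqrt(16.0))

# Both spellings refer to the very same function object.
same_function = sqrt is math.sqrt
print(same_function)
```

Both calls give the same result; the only difference is how much you have to type and which names end up in your namespace.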

The Python scientific distribution Anaconda2

Anaconda2 is a one-click installer that comes with NumPy, SciPy, matplotlib, and other numerical computing packages preinstalled, as well as other extension packages such as networking libraries.

If you have not installed or used Python before, it is highly recommended that you download it from:

https://www.continuum.io/downloads

For Windows, choose the 32-bit or 64-bit Python 2.7 installer. After downloading, double-click it to install; the process is very simple.

After installation, go to the installation directory

Anaconda2\Scripts\

and double-click ipython.exe to enter Python's command-line mode.

(You can right-click it and send a shortcut to the desktop for easier access later.)

The interactive command-line window IPython

IPython is an interactive command line.

It works much better than the default Python shell: it supports auto-completion of variable names and auto-indentation, supports bash shell commands, and includes many useful features and built-in functions.

The input prompt In [1]: indicates that you are now entering the first line.

The biggest benefit of IPython is that you can quickly try out an idea and validate it with interactive commands.

NumPy: support for highly optimized multidimensional arrays

One of the major uses of NumPy is to speed up numerical operations. Because Python is dynamically interpreted, pure Python code often runs slowly. In particular, when a program has a large number of iterations or loops, the speed of native Python is often unsatisfactory.

NumPy was created to address this: its lower layer is implemented in C, which keeps execution efficient.

The most basic data structure in NumPy is the ndarray (array). Native Python does not have this data structure, so you must import it from the NumPy module (import numpy as np).

An example is as follows:

In the second line, a is assigned a value of the NumPy array type, a = np.array([1, 2, 3])

The third line outputs the type of a: print type(a)

The fourth line then outputs a itself: print a

This shows that a is [1 2 3].

If we want to output the first element of a, we can enter print a[0]

If we want to change the first element of a, we can enter a[0] = 2333

At this point a has been changed; output a again and you can see the new value.
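The short session described above can be sketched as the following script. It is written with single-argument print(...) calls so that it should run on Python 3 as well as on the Python 2.7 used in this tutorial; the variable name first is an assumption added for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])   # build an ndarray from a Python list
print(type(a))            # the array type, numpy.ndarray
print(a)                  # [1 2 3]

first = a[0]              # indexing starts at 0
print(first)              # 1

a[0] = 2333               # arrays are mutable: assign in place
print(a)                  # the first element is now 2333
```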

In the same way, we can create another array b = np.array([1, 1, 1]) and add the two arrays: c = a + b

Output c, and you will find that c is the element-wise sum of a and b.

Similar operators are also available:

a+b, a-b, a*b (here * is element-wise multiplication, equivalent to .* in MATLAB)
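A minimal sketch of these element-wise operators, reusing the values of a and b from the text (the names diff and prod are assumptions added for illustration):

```python
import numpy as np

a = np.array([2333, 2, 3])
b = np.array([1, 1, 1])

c = a + b          # element-wise sum
diff = a - b       # element-wise difference
prod = a * b       # element-wise product, like .* in MATLAB

print(c)
print(diff)
print(prod)
```

Each operator works position by position, so both arrays must have compatible shapes.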

As mentioned earlier, NumPy is much faster than a native Python loop, and you can test this yourself.
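One possible way to run such a test is sketched here with the standard timeit module. The problem size n and the sum-of-squares task are assumptions chosen for illustration, and the measured times will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100000                                # hypothetical problem size
values = list(range(n))
array = np.arange(n, dtype=np.int64)      # int64 avoids overflow in the squares

def python_loop():
    """Sum of squares with a plain Python loop."""
    total = 0
    for v in values:
        total += v * v
    return total

def numpy_vectorized():
    """The same sum computed by NumPy in compiled code."""
    return int(np.sum(array * array))

# Run each version 10 times and report the total time in seconds.
loop_time = timeit.timeit(python_loop, number=10)
numpy_time = timeit.timeit(numpy_vectorized, number=10)
print("loop:  %.4f s" % loop_time)
print("numpy: %.4f s" % numpy_time)
```

Both functions return the same number; only the time needed to compute it differs.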

Unlike MATLAB, NumPy generally does not use a dedicated "matrix" type, but instead uses multidimensional arrays to represent matrices.

For example, d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

(Note that the brackets are nested)

is a two-dimensional array, which represents the matrix

1 2 3

4 5 6

7 8 9

You can print d to see the result.
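A minimal sketch of creating and inspecting this two-dimensional array (the shape and ndim attributes are standard NumPy, added here only to show that the array really is a 3 by 3 matrix):

```python
import numpy as np

# Each inner list becomes one row of the matrix.
d = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(d)
print(d.shape)   # (3, 3): three rows, three columns
print(d.ndim)    # 2: a two-dimensional array
```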

If you have studied linear algebra, you know that matrices can be multiplied together.

Suppose we have the matrix d above and another matrix e of a compatible shape.

Then the matrix product of d and e can be computed with np.dot(d, e).

Or you can write d.dot(e), which is equivalent.
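A minimal sketch of the matrix product follows. The matrix e used in the original handout is not shown, so the diagonal matrix e here is a hypothetical stand-in; the last line computes d * e only for comparison with the true matrix product.

```python
import numpy as np

d = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
e = np.array([[1, 0, 0],      # hypothetical second matrix (diagonal 1, 2, 3)
              [0, 2, 0],
              [0, 0, 3]])

product = np.dot(d, e)        # matrix product: rows of d times columns of e
same_product = d.dot(e)       # equivalent method form
elementwise = d * e           # element-wise product, NOT the matrix product

print(product)
print(elementwise)
```

With this e, the matrix product scales the columns of d by 1, 2 and 3, while d * e keeps only the diagonal entries.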

However, you should not use d*e, because d*e is element-wise multiplication, that is, the product of the elements at each corresponding position.

Getting slices of an ndarray

You can use subscripts. For example, for d:

You can use d[0] to get the first row of the matrix (a two-dimensional array).

You can also use d[0:2] to get a new matrix consisting of the first and second rows of the matrix.

Use d[2] or d[-1] to get the last row of the matrix.
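The slicing rules above can be sketched as follows, assuming d is the 3 by 3 matrix used throughout this section (the variable names first_row, first_two and last_row are added for illustration):

```python
import numpy as np

d = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

first_row = d[0]      # [1 2 3]
first_two = d[0:2]    # rows 0 and 1; the end index 2 is excluded
last_row = d[-1]      # [7 8 9], the same row as d[2] for a 3-row array

print(first_row)
print(first_two)
print(last_row)
```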

In addition to creating a new array (or multidimensional array) manually, you can also create one quickly with built-in functions:

For example, use np.arange(x) to create an array of the integers from 0 to x-1.

Use np.linspace(start, end, num) to create an evenly spaced array; for example, np.linspace(0, 5, 10) starts at 0, ends at 5, and contains 10 points.

Use np.eye(x) to create an identity matrix of size x.

Use np.zeros((x, y)) to create a zero matrix of size x*y, for example np.zeros((3, 5)).

Note that there are two pairs of parentheses here: the inner (3, 5) is a tuple, and the outer pair is required by the function call itself.

You may notice that the 0s and 1s above have a decimal point. This is because the default type is float rather than integer; you can set the type manually with the dtype argument.
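A minimal sketch of these creation functions (the sizes 5, 3 and (3, 5) and the variable names are arbitrary choices for illustration):

```python
import numpy as np

r = np.arange(5)                   # integers 0, 1, 2, 3, 4
s = np.linspace(0, 5, 10)          # 10 evenly spaced points from 0 to 5 inclusive
i = np.eye(3)                      # 3x3 identity matrix, float by default
z = np.zeros((3, 5))               # 3x5 matrix of zeros; note the tuple argument
zi = np.zeros((3, 5), dtype=int)   # same shape, integer type set manually

print(r)
print(s)
print(i)
print(z)
print(zi)
```

Printing z and zi side by side shows the difference between the default float zeros (0.) and the integer zeros (0).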

Note: the slicing examples above are only a demonstration; the array d[0] was not assigned to a variable, so it is not saved. In practice you should save the result to a variable, such as new_array = d[-1]

**SciPy: a library of numerical analysis methods**

With np.array arrays in hand, you can use the many numerical functions that SciPy provides.

The SciPy submodules all depend on NumPy, but each is largely independent of the others. The standard way to import NumPy and these SciPy modules is:

import numpy as np

from scipy import stats  # other submodules are imported the same way

The main scipy namespace mostly contains functions that are really NumPy functions (try scipy.cos is np.cos). These exist only for historical reasons, and there is usually no reason to use import scipy in your code.

Each function is used differently, but they are all simple to call and practice with, and they are highly efficient.

http://reverland.org/python/2012/10/22/scipy/

The page above has examples of how the functions are used, which you can refer to.
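As one small illustration of calling a SciPy function on NumPy arrays, the sketch below fits a straight line with stats.linregress. The choice of this function and of the data y = 3x + 8 is an assumption made for illustration; the original handout does not single out any particular SciPy routine.

```python
import numpy as np
from scipy import stats

x = np.linspace(0, 5, 10)
y = 3 * x + 8                      # an exact straight line, no noise

result = stats.linregress(x, y)    # least-squares fit of y against x
slope = result.slope
intercept = result.intercept

print(slope)
print(intercept)
```

Because the data lie exactly on a line, the fitted slope and intercept come out at 3 and 8 up to floating-point rounding.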

Matplotlib: a plotting library

Unlike SciPy, which provides numerical functions that operate on NumPy arrays, Matplotlib is a module for drawing figures.

Of course, Matplotlib's plotting still relies on NumPy's ndarray data structure.

A simple Matplotlib plotting example:

The first step is to import the modules: import numpy as np and import matplotlib.pyplot as plt

The second step is to generate the independent variable x as 10 evenly spaced points between 0 and 5, x = np.linspace(0, 5, 10)

The third step is to generate the dependent variable, which only requires y = f(x), such as y = 3*x + 8

Finally, you can draw with plt.plot(x, y)

At this point the picture is still in memory and has not been displayed. You can add names for the horizontal and vertical axes and a title by entering

`plt.xlabel('xlabel'); plt.ylabel('ylabel'); plt.title('Function y=3x+8')`

After adding all the other parts, enter plt.show() to display the picture.

This gives a picture that includes the axis names and the title.
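Put together, the steps above might look like the following script. The matplotlib.use("Agg") line is an assumption added so that the sketch also runs on a machine without a display; in an interactive IPython session you would leave it out so that plt.show() opens a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 10)     # step 2: 10 points between 0 and 5
y = 3 * x + 8                 # step 3: the dependent variable

plt.figure()
plt.plot(x, y)                # the line exists only in memory so far
plt.xlabel("xlabel")          # name of the horizontal axis
plt.ylabel("ylabel")          # name of the vertical axis
plt.title("Function y=3x+8")
ax = plt.gca()                # keep a handle to the axes for inspection
plt.show()                    # opens a window on interactive backends; no visible effect with Agg
```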

You can also define two dependent variables, y1 and y2.

Then, draw again.

Add the names of the horizontal and vertical axes.

Finally, enter plt.show() to get the image.
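A sketch of drawing two curves on the same axes follows. The functions used in the original screenshots are not shown, so y1 = 3x + 8 and y2 = x squared are hypothetical choices, and the Agg backend line is again an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 10)
y1 = 3 * x + 8        # hypothetical first curve
y2 = x ** 2           # hypothetical second curve

plt.figure()
plt.plot(x, y1)       # each plot call adds one line to the same axes
plt.plot(x, y2)
plt.xlabel("x")
plt.ylabel("y")
ax = plt.gca()        # keep a handle to the axes for inspection
plt.show()            # opens a window on interactive backends; no visible effect with Agg
```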

The values chosen for the independent variable here are not ideal (the range of x is too small, and the points are not dense enough).

You can choose better x and y values and sample the x-axis more densely to draw a better image.
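One way to follow this advice is sketched below: a wider range and many more sample points give a smoother curve. The range from -10 to 10, the 400 points and the function x squared are all assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np

x_coarse = np.linspace(0, 5, 10)       # the sampling used earlier
x_dense = np.linspace(-10, 10, 400)    # hypothetical wider and denser sampling

# Distance between neighbouring sample points in each grid.
step_coarse = x_coarse[1] - x_coarse[0]
step_dense = x_dense[1] - x_dense[0]
print(step_coarse)
print(step_dense)

plt.figure()
plt.plot(x_dense, x_dense ** 2)        # the dense grid makes the curve look smooth
plt.title("y=x^2 on a dense grid")
ax = plt.gca()                         # keep a handle to the axes for inspection
plt.show()                             # opens a window on interactive backends; no visible effect with Agg
```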

That is all for this introduction; with these basics in hand, you can set off on a more difficult journey!

[Python Data Analysis] Basics, Part 1: NumPy, SciPy, Matplotlib Quick Start Guide