This article comes from the handout for my (Wheat's) "Big Data Public" course. The handout includes three tutorials on Python and the NumPy family of data analysis packages, as well as tutorials on data analysis with Excel and SPSS. The authors are Wheat and classmate Yi Wen, and the material is original. It was originally internal course material and is now released openly, for study purposes only. If you want to repost it, please contact me and respect the copyright.
Python Data Analysis Fundamentals Tutorial
Python basic syntax and data structures
These have been introduced in another article.
See my blog: http://blog.csdn.net/xiaomai_sysu/article/details/51103070
Python modules
Similar to header files (.h) in C++, modules let a Python program explicitly use functions and classes defined in other files.
Purpose:
1. Avoid a bloated client program by letting the user load modules on demand;
2. Encourage modular design, so that modules have high cohesion and low coupling.
Method:
At the beginning of the file, use the import statement.
For example:
Import the entire contents of a module:
import mymodule
Once imported, you can use the functions and classes defined in that file.
For example, you can call the function myfunction defined in mymodule.py through the module name:
mymodule.myfunction()
To import only one function or one class from a module, use the from ... import ... statement:
from mymodule import myfunction
from mymodule import MyClass
(Writing import mymodule.myfunction does not work: the dotted form of import only accepts packages and submodules, not functions or classes.)
This form is more convenient because it brings the name directly into the current namespace.
To call myfunction later, you only need to write myfunction, not mymodule.myfunction.
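As a minimal sketch of the two import forms, the example below uses the standard library module math in place of mymodule, which is only a placeholder name in the text above. It calls print(...) with a single argument so that it should run on both Python 2.7 and Python 3.

```python
# Form 1: import the whole module, then reach names through the module.
import math

root_a = math.sqrt(16)   # the "math." prefix is required

# Form 2: import one name from the module into the current namespace.
from math import sqrt

root_b = sqrt(16)        # no prefix needed

print(root_a)
print(root_b)
```

Both forms load the same function; the only difference is the name you type at the call site.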
Anaconda2, a scientific Python distribution
Anaconda2 is a one-click installer that comes with NumPy, SciPy, matplotlib and other numerical computing packages preinstalled, along with other extension packages, for example for networking.
If you have not installed or used Python before, it is highly recommended that you install it. Download it from:
https://www.continuum.io/downloads
For Windows, choose the 32-bit or 64-bit Python 2.7 installer to download. After downloading, double-click it to install; the process is very simple.
After installation, go to the installation directory
Anaconda2\Scripts\
and double-click ipython.exe to enter Python's command-line mode.
(You can right-click it and send a shortcut to the desktop for easier access later.)
IPython, an interactive command-line window
IPython is an interactive command-line shell.
It is much more convenient than the default Python shell: it supports variable auto-completion and auto-indentation, accepts bash shell commands, and includes many useful built-in features.
The input prompt In [1]: indicates that you are entering the first input.
The biggest benefit of IPython is that you can quickly try out an idea and validate it with interactive commands.
NumPy: highly optimized multidimensional arrays
One of the major uses of NumPy is to speed up numerical operations. Because Python is a dynamically typed, interpreted language, code written in pure Python often runs slowly. In particular, when a program contains a large number of iterations or loops, the speed of native Python is often unsatisfactory.
NumPy was created to address this: its core is implemented in C, which keeps array operations efficient.
The most basic data structure in NumPy is the ndarray (N-dimensional array). Native Python does not have this data structure, so you must import it from the NumPy module (import numpy as np).
An example is as follows:
The first line imports NumPy with import numpy as np
The second line assigns a NumPy array to a: a = np.array([1, 2, 3])
The third line outputs the type of a: print type(a)
The fourth line outputs a itself: print a
The output shows that a is [1 2 3]
If we want to output the first element of a, we can enter print a[0]
If we want to change the first element of a, we can enter a[0] = 2333
Now a has been changed; output a again and you can see the new value.
In the same way, we can create another array: b = np.array([1, 1, 1])
Let c = a + b and output c; you will find that c is the element-wise sum of a and b.
Similar operators include
a+b, a-b and a*b (here * is element-wise multiplication, equivalent to .* in MATLAB).
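The steps above can be collected into one short session. This is a sketch that uses the same values as the text; it is written with print(...) and a single argument so that it should run on both Python 2.7 and Python 3.

```python
import numpy as np

a = np.array([1, 2, 3])   # create a one-dimensional array
print(type(a))            # the ndarray type
print(a)
print(a[0])               # first element, currently 1

a[0] = 2333               # arrays are mutable: change the first element
print(a)                  # the first element is now 2333

b = np.array([1, 1, 1])
c = a + b                 # element-wise sum: 2334, 3, 4
print(c)
print(a - b)              # element-wise difference
print(a * b)              # element-wise product, not a dot product
```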
As mentioned earlier, NumPy is much faster than a native Python loop, and you can test this yourself.
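One possible way to run such a test is sketched below with the standard timeit module. The task (sum of squares), the array size n and the function names are assumptions chosen for illustration, and the measured times will vary from machine to machine.

```python
import timeit

import numpy as np

n = 100000
values = list(range(n))                # plain Python list
array = np.arange(n, dtype=np.int64)   # int64 avoids overflow when squaring


def python_loop_sum_of_squares():
    """Sum of squares with an explicit Python loop."""
    total = 0
    for v in values:
        total += v * v
    return total


def numpy_sum_of_squares():
    """Same computation as one vectorized NumPy expression."""
    return int(np.sum(array * array))


# Run each version a few times and report the total elapsed time.
loop_time = timeit.timeit(python_loop_sum_of_squares, number=5)
numpy_time = timeit.timeit(numpy_sum_of_squares, number=5)

print("pure Python loop: %.4f s" % loop_time)
print("NumPy vectorized: %.4f s" % numpy_time)
```

Both functions return the same number; only the time they take differs.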
Unlike MATLAB, NumPy code generally does not use a separate "matrix" type, but instead uses multidimensional arrays to represent matrices.
For example,
d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
(note that the brackets are nested)
is a two-dimensional array, that is, the matrix
1 2 3
4 5 6
7 8 9
You can print d to see the result.
If you have studied linear algebra, you know that matrices can be multiplied together.
Suppose we have the matrix d above and another matrix e of a compatible shape.
Then the matrix product of d and e can be computed with np.dot(d, e).
Alternatively, you can write d.dot(e), which is an equivalent method.
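Here is a small sketch of matrix multiplication on the matrix d from above. The matrix e is a hypothetical example chosen for illustration (a diagonal matrix), since any matrix with three rows would do. The last line computes d * e for comparison.

```python
import numpy as np

d = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Hypothetical second matrix, chosen only for illustration.
e = np.array([[1, 0, 0],
              [0, 2, 0],
              [0, 0, 3]])

product_1 = np.dot(d, e)   # matrix product
product_2 = d.dot(e)       # the same matrix product, written as a method call
elementwise = d * e        # multiplies matching positions, a different result

print(product_1)
print(product_2)
print(elementwise)
```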
However, you should not use d*e for this, because d*e is element-wise multiplication, that is, the product of the elements at each corresponding position.
Getting slices of an ndarray
You can use subscripts. For example, for d:
Use d[0] to get the first row of the matrix (a two-dimensional array).
Use d[0:2] to get a new matrix consisting of the first and second rows.
Use d[2] or d[-1] to get the last row of the matrix.
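A short sketch of these subscripts on the 3x3 matrix d defined earlier. The variable names are only for illustration.

```python
import numpy as np

d = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

first_row = d[0]          # row with index 0: 1, 2, 3
first_two_rows = d[0:2]   # rows 0 and 1; the end index 2 is excluded
last_row = d[-1]          # negative indices count from the end: 7, 8, 9

print(first_row)
print(first_two_rows)
print(last_row)
```

Because d has only three rows, valid row indices are 0, 1 and 2; an index of 3 raises an IndexError.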
Besides creating arrays (including multidimensional arrays) by hand, you can also create them quickly with built-in functions:
For example, use np.arange(x) to create an array of the integers from 0 to x-1.
Use np.linspace(start, stop, num) to create evenly spaced values; for example, np.linspace(0, 5, 10) gives 10 values starting at 0 and ending at 5.
Use np.eye(x) to create an identity matrix of size x.
Use np.zeros((x, y)) to create a zero matrix of size x*y, for example np.zeros((3, 5)).
Note the two pairs of parentheses: the inner (3, 5) is a tuple, and the outer pair is required by the function call itself.
You may notice that the 0s and 1s in these results are shown with a decimal point. That is because the default type is float rather than integer; you can set the type manually with the dtype argument.
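The creation functions above can be tried together as follows. This is a sketch with sizes picked for illustration; the variable names are not from the original handout.

```python
import numpy as np

counting = np.arange(5)                   # integers 0, 1, 2, 3, 4
points = np.linspace(0, 5, 10)            # 10 evenly spaced values from 0 to 5
identity = np.eye(3)                      # 3x3 identity matrix (float by default)
zeros = np.zeros((3, 5))                  # shape is passed as a tuple
int_zeros = np.zeros((3, 5), dtype=int)   # set the element type explicitly

print(counting)
print(points)
print(identity)
print(zeros)
print(int_zeros)
```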
Note: the slicing expressions above are only demonstrations. A result such as d[0] was not assigned to a variable, so it is not saved. In real code, you should store it in a variable, for example new_array = d[-1]
SciPy: a library of numerical analysis methods
Once you have np.array arrays, you can use the many numerical functions that SciPy provides.
The SciPy submodules all depend on NumPy, but they are largely independent of each other. The standard way to import NumPy and these SciPy modules is:
import numpy as np
from scipy import stats  # other submodules are imported the same way
In older SciPy releases, such as the one bundled with Anaconda2, the top-level scipy namespace mostly contains functions that are really NumPy functions (try scipy.cos is np.cos). They are exposed only for historical reasons, and there is usually no reason to write import scipy in your code.
Each function is used differently, but they are all simple to call and try out, and they run efficiently.
http://reverland.org/python/2012/10/22/scipy/
has examples of how the functions are used, which you can refer to.
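As one small illustration of calling a SciPy function on NumPy arrays, the sketch below fits a straight line with scipy.stats.linregress. The data is made up for the example: points on the line y = 3x + 8 with a small, fixed perturbation added, so the fitted slope and intercept should come out close to 3 and 8.

```python
import numpy as np
from scipy import stats

x = np.linspace(0, 5, 10)

# Made-up data: the line y = 3x + 8 plus a small alternating perturbation.
perturbation = 0.1 * np.array([1, -1] * 5)
y = 3 * x + 8 + perturbation

# linregress returns the slope, intercept, correlation coefficient,
# p-value and standard error of the fitted line.
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

print("slope: %.3f" % slope)
print("intercept: %.3f" % intercept)
```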
Matplotlib: the plotting library
Unlike NumPy, which provides arrays, and SciPy, which provides numerical functions, Matplotlib is a module for drawing figures.
Of course, Matplotlib's plotting still relies on NumPy's ndarray data structure.
A simple Matplotlib plotting example:
First, import the modules: import numpy as np and import matplotlib.pyplot as plt
Second, generate the values of the independent variable, 10 points between 0 and 5: x = np.linspace(0, 5, 10)
Third, generate the values of the dependent variable, which only requires y = f(x), such as y = 3*x + 8
Finally, draw the curve with plt.plot(x, y)
At this point the figure exists only in memory and has not been displayed. You can add names for the horizontal and vertical axes and a title by entering
plt.xlabel('xlabel')
plt.ylabel('ylabel')
plt.title('Function y=3x+8')
After adding everything else you need, enter plt.show() to display the figure.
The resulting figure includes the axis names and the title.
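Put together, the steps above might look like the script below. Two things in it are assumptions that are not part of the original walkthrough: it selects the non-interactive "Agg" backend so that it also runs on a machine without a display, and it writes the figure to a file (function_plot.png in the system temporary directory) instead of opening a window. In an interactive IPython session you would drop the matplotlib.use line and call plt.show() instead.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")   # assumption: render without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 10)   # 10 points between 0 and 5
y = 3 * x + 8               # dependent variable

plt.plot(x, y)
plt.xlabel('xlabel')
plt.ylabel('ylabel')
plt.title('Function y=3x+8')

# Save the figure to a file; plt.show() would display it interactively.
output_path = os.path.join(tempfile.gettempdir(), "function_plot.png")
plt.savefig(output_path)
print(output_path)
```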
You can also define two dependent variables, y1 and y2.
Then plot them both.
Add the names of the horizontal and vertical axes.
Finally, enter plt.show() to get the image.
The x values chosen here are not very suitable (the range of x is too small, and the points are not dense enough).
You can choose better x and y values, with more densely spaced points along the x-axis, to draw a better image.
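A sketch of the two-curve version with a wider and denser x range is given below. The two functions (y1 = 3x + 8 and y2 = x squared), the range 0 to 10, the 200 points, the legend and the output file name are all assumptions made for illustration. As before, the "Agg" backend and savefig are used so that the script runs without a display; use plt.show() in an interactive session.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")   # assumption: render without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)   # wider range, 200 densely spaced points
y1 = 3 * x + 8                # hypothetical first function
y2 = x ** 2                   # hypothetical second function

plt.figure()                  # start a fresh figure
plt.plot(x, y1, label='y1 = 3x + 8')
plt.plot(x, y2, label='y2 = x^2')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Two functions on one figure')
plt.legend()                  # show which curve is which

two_curve_path = os.path.join(tempfile.gettempdir(), "two_functions_plot.png")
plt.savefig(two_curve_path)
print(two_curve_path)
```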
That is all for this guide; now you can set out on a more difficult journey!
[Python Data Analysis] Basics 1: NumPy, SciPy, Matplotlib Quick Start Guide