This is the sixth article in a series on Python big data and machine learning. It introduces NumPy, a library you need in order to learn big data and machine learning with Python.
The knowledge you will be able to learn through this article series is as follows:
Use Python for big data and machine learning
Apply Spark to big data analysis
Implement machine learning algorithms
Learn to process numeric data using the NumPy library
Learn to use the Pandas Library for data analysis
Learn to use the Matplotlib library for Python drawing
Learn to use the Seaborn Library for statistical plotting
Create dynamic visualizations using the Plotly library
Use scikit-learn for machine learning tasks
K-means Clustering
Logistic regression
Linear regression
Random Forest and decision tree
Natural language processing and spam filtering
Neural network
Support Vector Machine
In addition, the author will adapt the series and add other useful content in response to reader feedback, for example some related exercises.
What is NumPy
NumPy is a very important numerical computing extension library for Python, and most of the Python big data ecosystem relies on it. Because its core is implemented in C, it is very fast. If you want to learn big data with Python, NumPy is a library you must learn.
Installing NumPy
If you installed Anaconda as described in the previous article, NumPy is already installed by default. If you want to install it separately, read on.
Commands to install using Conda:
conda install numpy
Commands to install using PIP:
pip install numpy
NumPy Array
This series of articles mainly uses NumPy arrays.
NumPy arrays come in two basic forms: vectors and matrices.
Vectors are one-dimensional, while matrices are two-dimensional.
Open Jupyter and enter the following:
import numpy as np
my_list = [1, 2, 3]
arr = np.array(my_list)
arr
Running it gives the following result:
This is the general form of a vector.
Continue to enter the following:
my_mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
np.array(my_mat)
Running it gives the following result:
This is a two-dimensional matrix.
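As a small sketch of the difference between the two forms, the snippet below builds both arrays and inspects their number of dimensions. It assumes `my_list` holds the values 1, 2, 3; the names `arr` and `mat` are only illustrative.

```python
import numpy as np

my_list = [1, 2, 3]                              # assumed example values
arr = np.array(my_list)                          # one-dimensional array (vector)

my_mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mat = np.array(my_mat)                           # two-dimensional array (matrix)

print(arr.ndim, arr.shape)                       # 1 (3,)
print(mat.ndim, mat.shape)                       # 2 (3, 3)
```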
NumPy has its own range function, arange:
np.arange(0, 10)
The results of the operation are as follows:
You can also specify the step:
np.arange(0, 10, 2)
The results of the operation are as follows:
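For reference, a minimal sketch of both arange calls with the values they produce. As with Python's built-in range, the end point is excluded. The variable names `r1` and `r2` are only illustrative.

```python
import numpy as np

r1 = np.arange(0, 10)        # integers from 0 up to, but not including, 10
r2 = np.arange(0, 10, 2)     # same range, stepping by 2

print(r1)                    # [0 1 2 3 4 5 6 7 8 9]
print(r2)                    # [0 2 4 6 8]
```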
Generate a vector whose elements are all 0:
np.zeros(3)
The results of the operation are as follows:
Generate a matrix whose elements are all 0:
np.zeros((5, 5))
The results of the operation are as follows:
In the same way, np.ones(4) and np.ones((2, 3)) generate a vector and a matrix whose elements are all 1.
The results of the operation are as follows:
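A short sketch of these four calls follows. Note that a multi-dimensional shape is passed as a single tuple, which is why the matrix calls have double parentheses. The variable names are only illustrative.

```python
import numpy as np

z = np.zeros(3)              # vector of three zeros
zz = np.zeros((5, 5))        # 5x5 matrix of zeros; the shape is one tuple argument
o = np.ones(4)               # vector of four ones
oo = np.ones((2, 3))         # 2x3 matrix of ones

print(z)                     # [0. 0. 0.]
print(zz.shape)              # (5, 5)
print(o)                     # [1. 1. 1. 1.]
print(oo.shape, oo.dtype)    # (2, 3) float64
```

The trailing dots in the printed values show that both functions produce floating-point numbers by default.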
np.linspace(0, 5, 20)
The first parameter is the start point, the second is the end point, and the third is the number of evenly spaced values to generate between them. By default both the start point and the end point are included.
The results of the operation are as follows:
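To make the meaning of the third parameter concrete, here is a minimal sketch. With 20 points over the interval from 0 to 5 there are 19 gaps, so the spacing is 5/19. The names `pts` and `small` are only illustrative.

```python
import numpy as np

pts = np.linspace(0, 5, 20)            # 20 evenly spaced values, both ends included
print(len(pts), pts[0], pts[-1])       # 20 0.0 5.0

step = pts[1] - pts[0]                 # 20 points leave 19 gaps
print(np.isclose(step, 5 / 19))        # True

small = np.linspace(0, 5, 6)           # an easier case to read by eye
print(small)                           # [0. 1. 2. 3. 4. 5.]
```

Compare this with arange, where the third parameter is the step size rather than the number of points.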
np.eye(4) generates a 4×4 matrix with 1 on the main diagonal and 0 elsewhere (an identity matrix).
The results of the operation are as follows:
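As a brief sketch of why this matrix is useful: multiplying any 4×4 matrix by it leaves that matrix unchanged. The matrix `mat` below is just an illustrative example.

```python
import numpy as np

eye = np.eye(4)                              # 4x4 identity matrix
mat = np.arange(16).reshape(4, 4)            # illustrative 4x4 matrix with values 0..15

print(eye.shape)                             # (4, 4)
print(eye.diagonal())                        # [1. 1. 1. 1.]
print(np.array_equal(mat @ eye, mat))        # True
```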
np.random.rand(5) generates a vector of 5 random numbers drawn uniformly from the interval [0, 1).
The results of the operation are as follows:
np.random.rand(5, 5) generates a 5×5 random matrix.
The results of the operation are as follows:
np.random.randn(2) generates a vector of 2 random numbers drawn from the standard normal distribution.
The results of the operation are as follows:
Called with two arguments, for example np.random.randn(2, 2), it generates a two-dimensional array of numbers drawn from the standard normal distribution.
The results of the operation are as follows:
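The sketch below contrasts rand (uniform on [0, 1)) with randn (standard normal). The fixed seed and the 2×2 shape are assumptions chosen for illustration, so the actual numbers will differ from any run without the same seed.

```python
import numpy as np

np.random.seed(0)                    # assumed seed, only so reruns give the same numbers

u = np.random.rand(5)                # 5 uniform samples in [0, 1)
u2 = np.random.rand(5, 5)            # 5x5 matrix of uniform samples
n = np.random.randn(2)               # 2 standard normal samples
n2 = np.random.randn(2, 2)           # 2x2 matrix of standard normal samples

print(u.shape, u2.shape, n.shape, n2.shape)    # (5,) (5, 5) (2,) (2, 2)
print(u.min() >= 0.0, u.max() < 1.0)           # True True

# With many samples, the mean approaches 0 and the standard deviation approaches 1.
big = np.random.randn(100_000)
print(big.mean(), big.std())
```

Unlike zeros and ones, these two functions take the dimensions as separate arguments rather than as a tuple.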
Tips:
Press the Tab key in a Jupyter input cell to bring up the autocompletion menu, and press Shift+Tab to show a function's usage.
np.random.randint(1, 100) generates 1 random integer from 1 up to, but not including, 100.
The results of the operation are as follows:
np.random.randint(1, 100, 10) generates 10 random integers from 1 up to, but not including, 100.
The results of the operation are as follows:
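A minimal sketch of both calls follows, checking that the upper bound is excluded. The values are random, so only the shape and the range are shown as expected output. The names `one` and `ten` are only illustrative.

```python
import numpy as np

one = np.random.randint(1, 100)        # a single integer in [1, 100)
ten = np.random.randint(1, 100, 10)    # an array of 10 integers in [1, 100)

print(one)
print(ten.shape)                               # (10,)
print(ten.min() >= 1 and ten.max() < 100)      # True
```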
Some of the methods and attributes supported by arrays:
The reshape method returns the array's data arranged in new dimensions. For example:
arr = np.arange(25)
arr.reshape(5, 5)
The results of the operation are as follows:
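One detail worth sketching: the new shape must hold exactly as many elements as the array has, otherwise NumPy raises a ValueError. The variable name `grid` is only illustrative.

```python
import numpy as np

arr = np.arange(25)              # 25 elements: 0..24
grid = arr.reshape(5, 5)         # 5 * 5 == 25, so this works

print(grid.shape)                # (5, 5)
print(grid[1])                   # [5 6 7 8 9]
print(arr.shape)                 # (25,)  reshape did not change arr itself

try:
    arr.reshape(5, 4)            # 5 * 4 != 25, so this fails
except ValueError as err:
    print("reshape failed:", err)
```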
max method: returns the maximum value
min method: returns the minimum value
argmax method: returns the index of the maximum value
argmin method: returns the index of the minimum value
ranarr = np.random.randint(1, 100, 10)
ranarr.max()
ranarr.min()
ranarr.argmax()
ranarr.argmin()
The results of the operation are as follows:
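Because a random array gives different answers on every run, the sketch below uses a fixed array instead, so the four results can be checked by eye. The values are made up for illustration.

```python
import numpy as np

# Fixed, made-up values in place of np.random.randint(1, 100, 10)
ranarr = np.array([10, 12, 41, 17, 49, 2, 46, 3, 19, 39])

print(ranarr.max())       # 49
print(ranarr.min())       # 2
print(ranarr.argmax())    # 4  (49 sits at index 4)
print(ranarr.argmin())    # 5  (2 sits at index 5)
```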
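shape: an attribute (not a function, so it is used without parentheses) that returns the dimensions of the array as a tuple
dtype: an attribute that returns the data type of the array's elements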
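A short sketch of both attributes. The exact integer type reported by dtype depends on the platform and NumPy version, so no fixed output is given for it here.

```python
import numpy as np

arr = np.arange(25)

print(arr.shape)                    # (25,)
print(arr.reshape(5, 5).shape)      # (5, 5)
print(arr.dtype)                    # a platform-dependent integer type, commonly int64
print(np.zeros(3).dtype)            # float64
```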
Simplifying calls:
from numpy.random import randint
Now we can use randint directly.
randint(2, 10)
The results of the operation are as follows:
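As a minimal sketch, the imported name behaves exactly like np.random.randint, including the excluded upper bound. The variable name `value` is only illustrative.

```python
from numpy.random import randint

value = randint(2, 10)        # a single integer in [2, 10)

print(value)
print(2 <= value < 10)        # True
```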
Python Big Data and machine learning NumPy first Experience