Scikit-learn in Practice: Iris Dataset Classification
1. Introduction to the Iris dataset
The Iris dataset is a widely used benchmark for classification experiments, published by Fisher in 1936. Also known as the Iris flower dataset, it is a multivariate dataset containing 150 samples divided into 3 classes, with 50 samples per class and 4 features per sample. The four features (sepal length, sepal width, petal length, and petal width) are used to predict which of three species an iris flower belongs to: setosa, versicolor, or virginica.
2. Use Scikit-learn to load the Iris data and split it into a training set and a test set
Because the Iris dataset is so widely used, sklearn ships with it. The load_iris function loads the dataset, and the train_test_split function splits the original data into two parts, one for training and one for testing. By default, train_test_split puts 25% of the samples in the test set and the remaining 75% in the training set. Passing a fixed random_state makes the random sampling reproducible.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the dataset and hold out 25% of it (the default) as the test set
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris['data'], iris['target'], random_state=0)
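The figures quoted above (150 samples, 4 features, 50 samples per class, a 75/25 split) can be checked directly. The following is a minimal sketch; the names X, y, and X_train_again are illustrative and not part of the original listing.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris['data'], iris['target']  # illustrative names

# 150 samples with 4 features each, 50 samples per class
print(X.shape)         # (150, 4)
print(np.bincount(y))  # [50 50 50]

# With no test_size given, train_test_split holds out 25% of the samples
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)

# Calling it again with the same random_state reproduces the same split
X_train_again, _, _, _ = train_test_split(X, y, random_state=0)
print(np.array_equal(X_train, X_train_again))  # True
```

Since 25% of 150 is 37.5, the test set size is rounded up to 38 samples, which leaves 112 for training.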
3. Next, use a pair plot to visualize the relationships between the features
# figsize=(15, 15) is an assumed value; adjust it as needed
fig, ax = plt.subplots(3, 3, figsize=(15, 15))
plt.suptitle("iris_pairplot")
for i in range(3):
    for j in range(3):
        # Plot feature j (x-axis) against feature i + 1 (y-axis), colored by class
        ax[i, j].scatter(X_train[:, j], X_train[:, i + 1], c=y_train, s=60)
        ax[i, j].set_xticks(())
        ax[i, j].set_yticks(())
        if i == 2:
            ax[i, j].set_xlabel(iris['feature_names'][j])
        if j == 0:
            ax[i, j].set_ylabel(iris['feature_names'][i + 1])
        # Hide the redundant panels above the diagonal
        if j > i:
            ax[i, j].set_visible(False)
plt.show()
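As a numeric complement to the pair plot, the pairwise correlation between features can also be printed. This is only a sketch and not part of the original walkthrough; the use of np.corrcoef and the name corr are choices made here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris['data'], iris['target'], random_state=0)

# rowvar=False treats each column (feature) as a variable,
# so the result is a 4 x 4 matrix of feature-to-feature correlations
corr = np.corrcoef(X_train, rowvar=False)

for name, row in zip(iris['feature_names'], corr):
    print(f"{name:20s}", np.round(row, 2))
```

Feature pairs whose scatter panels look close to a straight line, such as petal length against petal width, should show a coefficient near 1 here.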
4. Train a KNN model, then make predictions and evaluate them
from sklearn.neighbors import KNeighborsClassifier
# Fit a classifier that uses only the single nearest neighbor
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
# Predict the species of one new, unseen flower
X_new = np.array([[5, 2.9, 1, 0.2]])
prediction = knn.predict(X_new)
print(iris['target_names'][prediction])
# Evaluate on the test set; both lines compute the same accuracy
y_pred = knn.predict(X_test)
print(np.mean(y_pred == y_test))
print(knn.score(X_test, y_test))
0.973684210526
0.973684210526
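To make clear what the classifier does when n_neighbors=1, the prediction step can be written by hand with NumPy. This is a minimal sketch, assuming plain Euclidean distance (the default metric of KNeighborsClassifier); predict_1nn is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris['data'], iris['target'], random_state=0)

def predict_1nn(X_query):
    """Label each query row with the class of its closest training sample."""
    # Pairwise differences via broadcasting: (n_query, n_train, n_features)
    diff = X_query[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    # Squared Euclidean distances: (n_query, n_train)
    dist = np.sum(diff ** 2, axis=2)
    # Index of the nearest training sample for each query row
    nearest = np.argmin(dist, axis=1)
    return y_train[nearest]

manual_pred = predict_1nn(X_test)
manual_acc = np.mean(manual_pred == y_test)
print(manual_acc)
```

The accuracy should be close to the score reported by knn.score. Small differences are possible when two training samples are equally distant from a test sample, because ties may be resolved differently.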