The datasets that ship with sklearn fall into the following categories:
# Small datasets bundled with the package
# sklearn.datasets.load_<name>
# Datasets downloaded from the internet
# sklearn.datasets.fetch_<name>
# Computer-generated datasets
# sklearn.datasets.make_<name>
# Datasets in SVMLight/LIBSVM format
# sklearn.datasets.load_svmlight_file(path)
# Datasets downloaded from mldata.org
# sklearn.datasets.fetch_mldata(dataname)  (removed in scikit-learn 0.22; fetch_openml replaces it)
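As a rough sketch of how these families differ in practice, the snippet below calls one loader from three of them. The choice of make_blobs, the temporary file name iris.svmlight, and the use of dump_svmlight_file to create a file to read back are assumptions made for illustration only. The fetch_<name> loaders are left out because they need a network connection.

```python
import os
import tempfile

from sklearn.datasets import (
    dump_svmlight_file,
    load_iris,
    load_svmlight_file,
    make_blobs,
)

# load_<name>: a small dataset bundled with the package, no download needed
iris = load_iris()

# make_<name>: synthetic data generated on the fly
X_blobs, y_blobs = make_blobs(n_samples=100, n_features=2, centers=3, random_state=0)

# load_svmlight_file: read a file in SVMLight/LIBSVM format.
# The file is written first (hypothetical name) so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    svm_path = os.path.join(tmp_dir, "iris.svmlight")
    dump_svmlight_file(iris.data, iris.target, svm_path)
    X_svm, y_svm = load_svmlight_file(svm_path)

print(iris.data.shape)  # (150, 4)
print(X_blobs.shape)    # (100, 2)
print(X_svm.shape)      # X_svm is a SciPy sparse matrix
```

Note that load_<name> returns a dictionary-like object, while make_<name> and load_svmlight_file return the feature matrix and the labels as a pair.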
Take the Iris data as an example of how to use the built-in datasets.
Basic use:
import sklearn.datasets
import matplotlib.pyplot as plt

# Load the dataset
iris = sklearn.datasets.load_iris()  # Iris data

# Print the keys available in the dataset
print(iris.keys())
# dict_keys(['target', 'data', 'feature_names', 'DESCR', 'target_names'])
# target: labels
# data: feature values
# feature_names: feature names, a list in the same order as the columns of data
# target_names: label names, a list in the same order as the values of target

print(iris.target.shape)
print(iris.data.shape)
print(iris.feature_names)
print(iris.target_names)
# (150,)
# (150, 4)
# ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
# ['setosa' 'versicolor' 'virginica']
Draw a histogram of a single feature:
x_index = 3
colors = ['blue', 'red', 'green']
for label, color in zip(range(len(iris.target_names)), colors):
    plt.hist(iris.data[iris.target == label, x_index],
             label=iris.target_names[label],
             color=color)
plt.xlabel(iris.feature_names[x_index])
plt.legend(loc='upper right')
plt.show()
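The histogram code above picks out the rows of one class with a boolean mask on iris.target. A minimal sketch of that selection on its own, without plotting, may make the indexing easier to follow. The variable names mask and values are only illustrative.

```python
from sklearn.datasets import load_iris

iris = load_iris()
x_index = 3  # column 3 is 'petal width (cm)'

for label, name in enumerate(iris.target_names):
    mask = iris.target == label        # boolean array, True for rows of this class
    values = iris.data[mask, x_index]  # one feature column, restricted to those rows
    print(name, values.shape, values.min(), values.max())
```

Each class contributes 50 rows, so values has shape (50,) on every pass through the loop.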
Draw a scatter plot of two features:
x_index = 0
y_index = 1
colors = ['blue', 'red', 'green']
for label, color in zip(range(len(iris.target_names)), colors):
    plt.scatter(iris.data[iris.target == label, x_index],
                iris.data[iris.target == label, y_index],
                label=iris.target_names[label],  # legend content
                color=color)
plt.xlabel(iris.feature_names[x_index])
plt.ylabel(iris.feature_names[y_index])
plt.legend(loc='upper right')  # show the legend
plt.show()
The other small datasets (the load_<name> functions) work the same way, so there is no need to panic when they turn up in later tutorials.
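To illustrate, the short sketch below loads two other bundled datasets the same way as Iris. The choice of load_wine and load_digits, and the shapes dictionary, are assumptions made for the example; any load_<name> function for a classification dataset should behave similarly.

```python
from sklearn.datasets import load_digits, load_wine

shapes = {}
for loader in (load_wine, load_digits):
    dataset = loader()  # same dictionary-like object as load_iris() returns
    shapes[loader.__name__] = (dataset.data.shape, dataset.target.shape)
    print(loader.__name__, dataset.data.shape, dataset.target.shape,
          len(dataset.target_names))
```

Both objects expose data, target, target_names and DESCR, so the plotting code shown for Iris can be reused by swapping in the new dataset.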
"Sklearn" comes with DataSet API