Classification with HDF5 data
1. Import the libraries
    import os
    import h5py
    import shutil
    import sklearn
    import tempfile
    import numpy as np
    import pandas as pd
    import sklearn.datasets
    import sklearn.linear_model
    import matplotlib.pyplot as plt
    %matplotlib inline
2. Generating data
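sklearn.datasets.make_classification generates the test data: 10,000 samples, each with a 4-dimensional feature vector.
train_test_split (sklearn.cross_validation.train_test_split in older scikit-learn releases, sklearn.model_selection.train_test_split since version 0.18) then splits the data into a training set and a test set. This is a single hold-out split rather than full cross-validation.
With the default test size of 25%, the data is split 7,500:2,500.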
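    import sklearn.datasets
    # sklearn.cross_validation was removed in scikit-learn 0.20;
    # train_test_split now lives in sklearn.model_selection.
    from sklearn.model_selection import train_test_split

    X, y = sklearn.datasets.make_classification(
        n_samples=10000, n_features=4, n_redundant=0, n_informative=2,
        n_clusters_per_class=2, hypercube=False, random_state=0
    )

    # Split into train and test (default: 75% train, 25% test)
    X, Xt, y, yt = train_test_split(X, y)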
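The split sizes can be checked directly. The following is a minimal sketch, assuming scikit-learn 0.18 or later; the explicit test_size=0.25 and random_state=0 arguments and the names X_train, X_test, y_train, y_test are additions for illustration and reproducibility, not part of the original code.

```python
import sklearn.datasets
from sklearn.model_selection import train_test_split

# Same synthetic data set as above: 10,000 samples, 4 features
X, y = sklearn.datasets.make_classification(
    n_samples=10000, n_features=4, n_redundant=0, n_informative=2,
    n_clusters_per_class=2, hypercube=False, random_state=0
)

# test_size=0.25 is also the default; it is spelled out here for clarity.
# random_state is an assumption added so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)
```

This should print (7500, 4) (2500, 4), matching the 7,500:2,500 split described above.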
3. Visualization of data
    # Visualize a sample of the data.
    # np.random.permutation returns a randomly shuffled sequence.
    # X.shape[0] = 7500, so this generates a random ordering of 0-7499
    # and takes the first 1000 indices.
    ind = np.random.permutation(X.shape[0])[:1000]
    df = pd.DataFrame(X[ind])
    # diagonal: 'kde' for kernel density estimation, 'hist' for a histogram.
    # pd.scatter_matrix moved to pd.plotting.scatter_matrix in pandas 0.20.
    _ = pd.plotting.scatter_matrix(df, figsize=(9, 9), diagonal='kde',
                                   marker='o', s=40, alpha=.4, c=y[ind])
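The subsampling step relies on the fact that the first k entries of a random permutation are k distinct, uniformly chosen indices. A small sketch of just that step follows; the seeded RandomState and the names n_rows and sample_idx are assumptions made for the illustration.

```python
import numpy as np

# Seeded generator, used only so this sketch is repeatable
rng = np.random.RandomState(0)

n_rows = 7500  # number of training rows after the split
# Shuffle 0..n_rows-1 and keep the first 1000: a sample without replacement
sample_idx = rng.permutation(n_rows)[:1000]

# Every index is valid and none is repeated
print(len(sample_idx), len(np.unique(sample_idx)))
```

Because no index repeats, X[sample_idx] and y[sample_idx] stay aligned row for row, which is what lets the scatter plot color each point by its label.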
pd.scatter_matrix function description
    def scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False,
                       diagonal='hist', marker='.', density_kwds=None,
                       hist_kwds=None, range_padding=0.05, **kwds):
        """
        Draw a matrix of scatter plots.

        Parameters
        ----------
        frame : DataFrame
        alpha : float, optional
            amount of transparency applied
        figsize : (float,float), optional
            a tuple (width, height) in inches
        ax : Matplotlib axis object, optional
        grid : bool, optional
            setting this to True will show the grid
        diagonal : {'hist', 'kde'}
            pick between 'kde' and 'hist' for
            either Kernel Density Estimation or Histogram
            plot in the diagonal
        marker : str, optional
            Matplotlib marker type, default '.'
        hist_kwds : other plotting keyword arguments
            To be passed to hist function
        density_kwds : other plotting keyword arguments
            To be passed to kernel density estimate plot
        range_padding : float, optional
            relative extension of axis range in x and y
            with respect to (x_max - x_min) or (y_max - y_min),
            default 0.05
        kwds : other plotting keyword arguments
            To be passed to scatter function

        Examples
        --------
        >>> df = DataFrame(np.random.randn(1000, 4), columns=['A','B','C','D'])
        >>> scatter_matrix(df, alpha=0.2)
        """
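To see how the parameters above behave in practice, here is a hedged usage sketch. It assumes pandas 0.20 or later (where the function is exposed as pd.plotting.scatter_matrix) and uses the non-interactive Agg backend so it runs without a display; the random demo_df data and the bins setting are illustrative choices only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Synthetic stand-in data (illustrative only): 200 rows, 4 columns
rng = np.random.RandomState(0)
demo_df = pd.DataFrame(rng.randn(200, 4), columns=["A", "B", "C", "D"])

# diagonal="hist" draws a histogram on the diagonal; hist_kwds is forwarded
# to the histogram call, and alpha goes to the scatter plots.
axes = pd.plotting.scatter_matrix(
    demo_df, figsize=(6, 6), diagonal="hist", alpha=0.2,
    hist_kwds={"bins": 20},
)

# One Axes object per pair of columns
print(axes.shape)
```

The function returns a NumPy array of Matplotlib axes, one per column pair, so individual panels can be adjusted afterwards. Choosing diagonal="kde" instead requires SciPy to be installed.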
4.
Caffe note-taking routine learning (II.)