The logistic regression model understands "logic" as right versus wrong — only the 0 and 1 cases. I learned this from a great post: https://blog.csdn.net/zouxy09/article/details/20319673.
Logistic regression is used for classification, and it also tells you which factors dominate, so that an event can be predicted.
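As a small sketch of the idea (my own illustration, not taken from the linked post): logistic regression squashes a linear combination of the features through the sigmoid function into a probability between 0 and 1, which is then thresholded at 0.5 to give the 0/1 class.

```python
import math

def sigmoid(z):
    # Maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(weights, features, bias=0.0):
    # Linear combination of the features, squashed into a probability,
    # then thresholded at 0.5 to yield a 0/1 label
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0
```

This is exactly the "only 0 or 1" behaviour described above: the probability is continuous, but the final decision is binary.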
I downloaded a 2017 high-school science first mock exam score sheet from the internet, with the names and schools anonymized; it looks like this:
The last column records whether the student made the second-tier cutoff. I looked up that year's second-tier line, 480 points, so sum > 480 gives 1, otherwise 0. There are 10,002 records in total.
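Building that 0/1 label is a one-liner in pandas. A minimal sketch with a hypothetical two-row table (the column names are my own invention; the real file has seven score columns and 10,002 rows):

```python
import pandas as pd

# Hypothetical mini-version of the transcript table
df = pd.DataFrame({
    "chinese": [110, 95],
    "math": [130, 70],
    "english": [120, 80],
    "science_comp": [200, 150],
})

# Total score, then the binary label: 1 if above the 480 cutoff, else 0
df["sum"] = df.sum(axis=1)
df["over_line"] = (df["sum"] > 480).astype(int)
```

Casting the boolean comparison with `.astype(int)` gives exactly the 0/1 column described above.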
Steps:
1. Read the data.
2. Put the features (influencing factors) and the result into matrix form.
3. Import RandomizedLogisticRegression from sklearn.linear_model and instantiate it.
4. Train the model with fit().
5. Screen out the effective features with get_support() — this is also a dimensionality-reduction step.
6. Train the simplified model.
Note that the line Y = dataf.iloc[:, 7].as_matrix() must not be written as Y = dataf.iloc[:, 7:8].as_matrix(): the former is a one-dimensional array, while the latter (a slice) is a two-dimensional column vector, and passing the two-dimensional form triggers "DataConversionWarning: A column-vector y was passed when a 1d array was expected".
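The difference is easy to see by checking the shapes on a toy DataFrame (a standalone illustration; note that .as_matrix() was removed in pandas 1.0, so this sketch uses the equivalent .values):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

# Selecting one column by integer index gives a Series -> 1-D array
y_1d = df.iloc[:, 2].values    # shape (2,)

# Selecting a slice gives a DataFrame -> 2-D array, even for one column
y_2d = df.iloc[:, 2:3].values  # shape (2, 1)

print(y_1d.shape)  # (2,)
print(y_2d.shape)  # (2, 1)
```

scikit-learn estimators expect the 1-D form for y, which is why the slice version raises the warning.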
Code attached:
import pandas as pda

fname = "D:/xx/xx/2017yimo.xls"
dataf = pda.read_excel(fname)

# Slice the data frame and turn it into matrices
# (.as_matrix() was removed in pandas 1.0; use .values on newer versions)
X = dataf.iloc[:, 0:7].as_matrix()
Y = dataf.iloc[:, 7].as_matrix()

from sklearn.linear_model import LogisticRegression as LR
# RandomizedLogisticRegression was deprecated and removed in scikit-learn 0.21
from sklearn.linear_model import RandomizedLogisticRegression as RLR

r1 = RLR()
r1.fit(X, Y)
r1.get_support(indices=True)
print(dataf.columns[r1.get_support(indices=True)])
t = dataf[dataf.columns[r1.get_support(indices=True)]].as_matrix()

r2 = LR()
r2.fit(t, Y)
print("End of training")
print("Model accuracy: " + str(r2.score(t, Y)))
Because the amount of data is decent, the accuracy comes out quite high.
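Once trained, the same model can score a new student. Since I don't have the original spreadsheet, here is a hedged sketch on synthetic stand-in data (made-up score ranges and cutoff), using plain LogisticRegression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic stand-in for the transcript: 200 students, 3 subject scores
X = rng.uniform(40, 150, size=(200, 3))
y = (X.sum(axis=1) > 300).astype(int)  # 1 = over a made-up 300-point cutoff

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Score a hypothetical new student (total 375, well over the cutoff)
new_student = np.array([[120, 130, 125]])
print(model.predict(new_student))        # the 0/1 class label
print(model.predict_proba(new_student))  # [P(class 0), P(class 1)]
```

predict_proba is what makes logistic regression useful for prediction: it returns the probability behind the 0/1 decision, not just the label.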