']
X_new_counts = count_vect.transform(docs_new)
# Use transform, not fit_transform, so the IDF weights learned on the training set are reused
X_new_tfidf = tfidf_transformer.transform(X_new_counts)
predicted = clf.predict(X_new_tfidf)
for doc, category in zip(docs_new, predicted):
    print('%r => %s' % (doc, twenty_train.target_names[category]))
Categorize the 2,257 documents in fetch_20newsgroups
Count the occurrences of each word
With TF-IDF statistics, TF is the number of occurrences of a word in a document divided by the total number of words in that document; IDF is the total
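The counting, TF-IDF weighting, and prediction steps described above can be sketched end to end as follows. This is a minimal illustration, not the original author's script: the four-document toy corpus, its labels, and the choice of MultinomialNB as the classifier are assumptions made so the example runs without downloading the 20 newsgroups data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus standing in for the 20 newsgroups training set
train_docs = [
    "the gpu renders the image quickly",
    "graphics card and image rendering",
    "god and faith in the church",
    "the church teaches faith",
]
train_labels = [0, 0, 1, 1]
target_names = ["comp.graphics", "soc.religion.christian"]

# Step 1: count the occurrences of each word
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(train_docs)

# Step 2: reweight the counts with TF-IDF
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

# Step 3: train a classifier (assumed here: multinomial naive Bayes)
clf = MultinomialNB().fit(X_train_tfidf, train_labels)

# Step 4: new documents are only transformed, never re-fitted
docs_new = ["faith in god", "image rendering on the gpu"]
X_new_counts = count_vect.transform(docs_new)
X_new_tfidf = tfidf_transformer.transform(X_new_counts)
predicted = clf.predict(X_new_tfidf)

for doc, category in zip(docs_new, predicted):
    print("%r => %s" % (doc, target_names[category]))
```

Calling transform rather than fit_transform on the new documents keeps the vocabulary and IDF weights that were learned from the training data.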
regression or nonlinear regression, is not as rich as the information contained in the model tree, so the model tree has higher prediction accuracy.
Scikit-learn Implementation
#!/usr/bin/python
# Created by Lixin 20161118
import numpy as np
from numpy import *
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt

def plotfigure(X, X_test, y, yp):
    plt.figure()
    plt.sca
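Since the snippet above is cut off, here is a minimal sketch of fitting a DecisionTreeRegressor. The noisy sine data, the max_depth value, and the variable names are assumptions for illustration. Note that scikit-learn's DecisionTreeRegressor is a regression tree that predicts a constant in each leaf; it does not fit a linear model per leaf the way a model tree does.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data: a noisy sine curve on [0, 5]
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# Evenly spaced test points at which to evaluate the fitted tree
X_test = np.linspace(0.0, 5.0, 50).reshape(-1, 1)

# max_depth=5 is an assumed value that limits how finely the tree splits
reg = DecisionTreeRegressor(max_depth=5, random_state=0)
reg.fit(X, y)

yp = reg.predict(X_test)      # piecewise-constant predictions
train_r2 = reg.score(X, y)    # R^2 on the training data
print("training R^2: %.3f" % train_r2)
```

The arrays X, X_test, y, and yp have the shapes that a plotting helper such as plotfigure(X, X_test, y, yp) would expect.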
GridSearchCV function to automatically find the optimal alpha value:
# In scikit-learn 0.18 and later, GridSearchCV lives in sklearn.model_selection (formerly sklearn.grid_search)
from sklearn.model_selection import GridSearchCV
gscv = GridSearchCV(Model(), dict(alpha=alphas), cv=3).fit(X, y)
Scikit-learn also provides models with built-in cross-validation, such as
from sklearn.linear_model import RidgeCV, LassoCV
model = RidgeCV(alphas=alphas, cv=3).fit(X, y)
This method can get the same result as GridSearchCV, but if it
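The two approaches can be compared side by side in a small runnable sketch. The synthetic data from make_regression, the alpha grid, and the use of Ridge in place of the placeholder Model are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import GridSearchCV

# Hypothetical synthetic regression problem
X, y = make_regression(n_samples=60, n_features=10, noise=10.0, random_state=0)

# Candidate regularization strengths (assumed grid)
alphas = np.logspace(-3, 2, 6)

# Generic grid search over alpha with 3-fold cross-validation
gscv = GridSearchCV(Ridge(), dict(alpha=alphas), cv=3).fit(X, y)

# Estimator with cross-validation built in
model = RidgeCV(alphas=alphas, cv=3).fit(X, y)

print("GridSearchCV best alpha:", gscv.best_params_["alpha"])
print("RidgeCV best alpha:     ", model.alpha_)
```

On the same folds and the same grid, both searches would be expected to select the same alpha.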
function, except that kernel='sigmoid' performs poorly; the other kernels give broadly similar results.
Then comes the training and testing stage, where all the data is divided into two parts: half for the training set and half for the test set.
Next, the evaluation metrics. The first four are precision, recall, F1-score, and support.
F1-score is computed from precision and recall: F1 = 2 * precision * recall / (precision + recall).
Support indicates the number of
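The half-and-half split, the kernel comparison, and the four metrics can be sketched as below. The iris dataset, the random seed, and the choice of the rbf kernel for the final report are assumptions; the original article's data is not shown in this excerpt.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical dataset; half for training, half for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

# Compare test accuracy across the built-in kernels
scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    scores[kernel] = SVC(kernel=kernel).fit(X_train, y_train).score(X_test, y_test)
print(scores)

# Per-class precision, recall, F1-score and support for one kernel
clf = SVC(kernel="rbf").fit(X_train, y_train)
y_pred = clf.predict(X_test)
report = classification_report(y_test, y_pred)
print(report)

# The same numbers as arrays, to check the F1 formula by hand
precision, recall, f1, support = precision_recall_fscore_support(y_test, y_pred)
f1_by_hand = 2 * precision * recall / (precision + recall)
```

In the printed report, the support column counts how many test samples truly belong to each class, so it sums to the size of the test set.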
Without further ado, here is the code.
1. Code implementation and screenshot of the results
# coding: utf-8
# Use skflow's built-in LR and DNN, and scikit-learn's ensemble regression models, to predict "US Boston house prices"
from sklearn import datasets, metrics, preprocessing, cross_validation
# Load the data
boston = datasets.load_boston()
# Get the house-price features and the corresponding prices
X, y = boston.data, boston.target
# Split the data; 25% for testing.
X_train, X_test, y_train, y_test = cross_validati
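Because the listing above is truncated, the following is only a sketch of the same workflow: a 25% test split, feature scaling, and an ensemble regressor. It makes several assumptions that differ from the original: it uses load_diabetes as a stand-in dataset (load_boston is not available in recent scikit-learn releases), train_test_split from sklearn.model_selection, and GradientBoostingRegressor as the ensemble model. The skflow part is not reproduced.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in regression dataset (assumption: not the Boston data)
X, y = load_diabetes(return_X_y=True)

# Split the data; 25% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=33
)

# Standardize features using statistics from the training set only
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Fit an ensemble regressor and evaluate it on the held-out data
gbr = GradientBoostingRegressor(random_state=33).fit(X_train_s, y_train)
y_pred = gbr.predict(X_test_s)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("MSE: %.2f  R^2: %.3f" % (mse, r2))
```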
I recently decided to work through the book "Machine Learning in Action". Doing so in Python will likely require the NumPy, matplotlib, and scikit-learn libraries, so I searched online for how to install them. After looking at several methods and trying them out, I found I was lucky: the installation was finished quickly and was not complicated. Let's get down to business!
1, to th
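Whichever installation method is used, a quick check like the following can confirm that the three libraries are visible to the Python interpreter. This is a small sketch, not part of the original article; the helper name installed is hypothetical. Note that scikit-learn is imported under the module name sklearn.

```python
import importlib.util


def installed(names):
    """Return a dict mapping each module name to whether Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# scikit-learn is imported as "sklearn"
status = installed(["numpy", "matplotlib", "sklearn"])
for name, ok in status.items():
    print(name, "found" if ok else "missing")
```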