Sklearn datasets - "Old Fish Learns Sklearn"

Source: Internet
Author: User

Machine learning needs data to train on, and fortunately Sklearn provides a number of well-labeled datasets for practice.
This section looks at which datasets are available for training in Sklearn.

These datasets live in the sklearn.datasets module, documented at: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets

House Price Data

The Boston house price data can be used for linear regression (note that load_boston was removed in scikit-learn 1.2, so the code in this section needs an older version):
sklearn.datasets.load_boston: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html#sklearn.datasets.load_boston
It is loaded as follows:

from sklearn.datasets import load_boston
boston = load_boston()
print(boston.data.shape)

The shape of this data set is:

(506, 13)

That is, 506 rows and 13 columns. The 13 columns are the 13 attributes that affect the house price; the following code prints their names:

print(boston.feature_names)

The output is:

['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO' 'B' 'LSTAT']

As for what each name means, you can guess or look it up online; I won't explain them all. A couple of guesses: RM is the number of rooms in the house, and AGE is the age of the house. I'm not sure these guesses are right, so check them yourself.

You may ask how I knew this dataset has a feature_names attribute. I didn't; I simply printed the whole boston object loaded above and saw that the attribute was there.
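The "print it and look" approach can be made a little more targeted. The sketch below is only an illustration: it uses load_iris as a stand-in (an assumption, chosen because load_boston is no longer shipped with scikit-learn 1.2 and later), and lists the keys of the returned object instead of printing everything.

```python
from sklearn.datasets import load_iris

# load_iris is a stand-in here (assumption); the original text uses load_boston
dataset = load_iris()

# The returned object behaves like a dictionary, so its keys list the available attributes
print(list(dataset.keys()))

# Each key is also reachable as an attribute
print(dataset.feature_names)

# DESCR holds a text description of the dataset, including what each feature means
print(dataset.DESCR[:300])
```

Reading DESCR is usually a quicker way to find out what each feature means than guessing from the names.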

House price prediction example
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# Load the house price data
boston = load_boston()
data_X = boston.data
data_y = boston.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(data_X, data_y, test_size=0.3)
# Create a linear regression model
model = LinearRegression()
# Train the model
model.fit(X_train, y_train)
# Print the first 5 predicted prices
print("First 5 predicted prices:")
print(model.predict(X_test)[:5])
# Print the first 5 actual prices in the test set
print("First 5 actual prices in the test set:")
print(y_test[:5])

Output:

First 5 predicted prices:
[17.44807408  27.78251433  18.8344117   17.85437188  34.47632703]
First 5 actual prices in the test set:
[14.3  22.3  22.6  20.6  34.9]

Taking the first pair in this output as an example, the predicted house price is about 17,400, while the actual price is 14,300 (the target values are in units of $1,000).
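Comparing one pair of numbers by eye only goes so far; the same comparison can be summarized over the whole test set. The following is a minimal sketch, not the author's code: it assumes load_diabetes as a stand-in regression dataset (load_boston is unavailable in scikit-learn 1.2 and later), and the fixed random_state is added only so that repeated runs give the same split.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Stand-in regression dataset (assumption): load_diabetes instead of load_boston
X, y = load_diabetes(return_X_y=True)

# random_state fixes the split so repeated runs give the same numbers
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Mean absolute error: average size of the gap between prediction and true value
mae = mean_absolute_error(y_test, predictions)
print("Mean absolute error on the test set:", mae)

# R^2 score on the test set (1.0 would be a perfect fit)
r2 = model.score(X_test, y_test)
print("R^2 on the test set:", r2)
```

A single summary number such as the mean absolute error makes it easier to compare two models than reading individual predictions.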

To be honest, though, this house price data is only good for testing algorithms. When we really want to predict prices, the raw data is never this complete and regular, so data collection and cleaning are also a very important part of machine learning. The dirty, tedious work has to be done too; an algorithm alone is not enough.

The iris flower data has already been covered earlier, so it is not repeated here.

Handwritten digit recognition data

The handwritten digit recognition data is also very commonly used: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits
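The article gives no code for this dataset, so here is a short sketch of loading it in the same way as the house price data, using load_digits and printing the shapes of what comes back.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# Each row of data is one 8x8 image flattened to 64 pixel values
print(digits.data.shape)    # (1797, 64)

# images holds the same pixels in their original 8x8 layout
print(digits.images.shape)  # (1797, 8, 8)

# target holds the digit (0 to 9) that each image represents
print(digits.target[:10])
```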

Create Sample Data

You can also generate synthetic data with the functions listed in the Samples generator section of the official API documentation.

The example source code is:

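from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
# Create 100 samples with 1 feature and 1 target value, with noise added
X, y = make_regression(n_samples=100, n_features=1, n_targets=1, noise=10)
print(X.shape)
print(y.shape)
# Draw a scatter plot of X and y to see what the data looks like
plt.scatter(X, y)
plt.show()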

The output data is:

(100, 1)
(100,)

That is, X has 100 rows and 1 column, and y has 100 values.

The output graph (figure not reproduced here) shows points that lie close to a straight line.
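The "close to a straight line" impression can also be checked numerically. The sketch below is an illustration under two assumptions that are not in the original call: coef=True, which makes make_regression also return the slope it used to generate the data, and random_state=0, which makes the run repeatable. It fits a LinearRegression to the generated points and compares the recovered slope with the true one.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# coef=True also returns the slope used to generate the data;
# random_state (an assumption, not in the original call) makes the run repeatable
X, y, true_coef = make_regression(
    n_samples=100, n_features=1, n_targets=1, noise=10, coef=True, random_state=0
)

model = LinearRegression()
model.fit(X, y)

true_slope = float(true_coef)
fitted_slope = float(model.coef_[0])
print("Slope used to generate the data:", true_slope)
print("Slope recovered by the fit:", fitted_slope)
```

The two slopes should be close but not identical, because the noise argument scatters the points around the underlying line.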
