Sklearn datasets: "Old Fish Learns Sklearn"

Last Update:2017-12-18 Source: Internet

Author: User

Machine learning needs data to train on, and fortunately Sklearn provides a number of well-labeled datasets for this purpose.
This section looks at which datasets are available for training in Sklearn.

These datasets live in the sklearn.datasets module, documented at: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
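As a rough way to see what the module offers without opening the documentation, the sketch below groups the public names of sklearn.datasets by prefix. It assumes the usual naming convention: load_* functions mostly return small datasets shipped with the library (a few, such as load_files, read from disk instead), fetch_* functions download larger datasets, and make_* functions generate synthetic data. The exact lists depend on the installed version.

```python
import sklearn.datasets as datasets

# Public names only; private names start with an underscore.
names = [n for n in dir(datasets) if not n.startswith("_")]

# Group by the naming convention assumed in the text above.
loaders = sorted(n for n in names if n.startswith("load_"))
fetchers = sorted(n for n in names if n.startswith("fetch_"))
generators = sorted(n for n in names if n.startswith("make_"))

print("load_*: ", loaders)
print("fetch_*:", fetchers)
print("make_*: ", generators)
```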

House Price Data

The Boston house price data can be used for linear regression:
sklearn.datasets.load_boston: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html#sklearn.datasets.load_boston
Note that load_boston was removed in scikit-learn 1.2, so the code in this section needs an older release.
It is loaded like this:

from sklearn.datasets import load_boston
boston = load_boston()
print(boston.data.shape)

The shape of this data set is:

(506, 13)

That is, 506 rows and 13 columns. The 13 columns are the 13 attributes that affect the house price. The following code prints their names:

print(boston.feature_names)

The output is:

['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO' 'B' 'LSTAT']

I won't explain what each name means; you can guess or look it up online. Here are a couple of my guesses: RM is the number of rooms in the house, and AGE is the age of the house. I don't know whether these guesses are right, so check them yourself.

You may ask how I knew this dataset has a feature_names attribute. I didn't; I simply printed the whole boston object and saw that the attribute was there.
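Instead of printing the whole object, the attributes of a loaded dataset can be listed directly. The sketch below is an illustration rather than the author's code: it uses load_diabetes as a stand-in, on the assumption that load_boston is not available in the installed scikit-learn release. The loaders return a dictionary-like Bunch object, so keys() shows what is there, and DESCR holds a text description of the columns.

```python
from sklearn.datasets import load_diabetes

# load_diabetes is a stand-in dataset chosen for this sketch.
diabetes = load_diabetes()

# The returned Bunch behaves like a dict, so its attributes can be listed.
print(diabetes.keys())

# Column names, in the same order as the columns of diabetes.data.
print(diabetes.feature_names)

# DESCR is a plain-text description; print only the beginning here.
print(diabetes.DESCR[:300])
```

Reading DESCR is usually quicker than guessing what a column name stands for.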

House Price Prediction Example

from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the house price data
boston = load_boston()
data_X = boston.data
data_y = boston.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(data_X, data_y, test_size=0.3)

# Create a linear regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Print the first 5 predicted prices
print("First 5 predicted prices:")
print(model.predict(X_test)[:5])

# Print the first 5 actual prices in the test set
print("First 5 actual prices in the test set:")
print(y_test[:5])

Output:

First 5 predicted prices:
[17.44807408  27.78251433  18.8344117   17.85437188  34.47632703]
First 5 actual prices in the test set:
[14.3  22.3  22.6  20.6  34.9]

Taking the first value in this result as an example, the model predicts a price of about 17,400, while the actual price is 14,300 (the target values are in units of $1,000).
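Comparing single values by eye only goes so far, so it can help to summarize the error over the whole test set. The following is a minimal sketch, not part of the original article: it assumes load_diabetes as a stand-in dataset (in case load_boston is unavailable in the installed release), fixes random_state so the split is repeatable, and reports the mean absolute error and the R^2 score from sklearn.metrics.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in regression dataset chosen for this sketch.
X, y = load_diabetes(return_X_y=True)

# A fixed random_state makes the split the same on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Average size of the prediction error, in the units of the target.
print("Mean absolute error:", mean_absolute_error(y_test, y_pred))

# Fraction of the target variance explained by the model (1.0 is perfect).
print("R^2 score:", r2_score(y_test, y_pred))
```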

To be honest, though, the price data above is only good for testing algorithms. When we really want to predict prices, the raw data will not be this complete and regular. So in machine learning, collecting and cleaning data is also very important work. The dirty, tedious jobs have to be done too; an algorithm alone is useless.

The iris flower data has already been covered earlier, so it is not repeated here.

Handwritten Digit Recognition Data

There is also the handwritten digit recognition data, which is very commonly used: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits
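The digits data loads the same way as the other bundled datasets. The short sketch below is an added illustration, not from the original article; it prints the shape of the flattened data, the shape of the same samples as 8x8 images, and the first few labels.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# One row per image, 64 pixel values per row.
print(digits.data.shape)    # (1797, 64)

# The same samples kept as 8x8 pixel grids.
print(digits.images.shape)  # (1797, 8, 8)

# The digit each image represents (0 to 9).
print(digits.target[:10])
```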

Create Sample Data

You can also generate synthetic data. The generators are described in the Samples generator section of the API documentation on the official website.

The example source code is:

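from sklearn.datasets import make_regression
import matplotlib.pyplot as plt

# Create 100 samples with 1 feature and 1 target value, with some noise added
X, y = make_regression(n_samples=100, n_features=1, n_targets=1, noise=10)
print(X.shape)
print(y.shape)

# Draw a scatter plot of X and y to see what the data looks like
plt.scatter(X, y)
plt.show()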

The output data is:

(100, 1)
(100,)

That is, X has 100 rows and 1 column, and y holds 100 values.

The scatter plot (image not reproduced here) looks close to a straight line.
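One way to check the straight-line impression without a plot is to fit a line and compare it with the slope used to generate the data. This is a sketch under stated assumptions, not the author's code: it passes coef=True so that make_regression also returns the underlying coefficient, and random_state=0 so the data is the same on each run.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# coef=True additionally returns the coefficient used to generate y.
X, y, true_coef = make_regression(
    n_samples=100, n_features=1, n_targets=1,
    noise=10, coef=True, random_state=0,
)

# Fit an ordinary least squares line to the noisy samples.
model = LinearRegression().fit(X, y)

print("Slope used to generate the data:", float(true_coef))
print("Slope recovered by the fit:     ", model.coef_[0])
```

With noise=10 the two slopes should be close but not identical; a larger noise value scatters the points further from the line.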

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
