The data set lists the properties of each house, that is, its characteristics (features).
To get a better feel for these characteristics, we display them as graphs.
We use GraphLab Canvas, as described earlier, and redirect its output so the graphs are displayed in the current page.
Next, let's build a regression model.
The data used to fit the model is called the training set.
The data held back to test the model's predictions is called the test set.
Steps:
- Split the data into a training set and a test set
Note: the simplest way is to call the random_split method of the SFrame.
Its first argument is the fraction of rows assigned to the first part (the training set), and its second argument is the random seed. Fixing the seed (here, 0) makes the split reproducible: every later run produces exactly the same two parts as the first run.
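To show what a seeded random split does, here is a plain-Python sketch. The function seeded_random_split is hypothetical and is not the SFrame method; it assigns each row independently to the first part with the given probability, so the part sizes are only approximately proportional to the fraction, while reusing the same seed reproduces the identical split.

```python
import random

def seeded_random_split(rows, fraction, seed):
    """Hypothetical helper: split rows into two lists, about `fraction` in the first.

    A fixed seed makes the split reproducible: the same seed always
    yields the same two parts.
    """
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    first, second = [], []
    for row in rows:
        # Each row independently lands in the first part with probability `fraction`.
        (first if rng.random() < fraction else second).append(row)
    return first, second

houses = list(range(1000))  # stand-in for 1000 house records
train, test = seeded_random_split(houses, 0.8, seed=0)
```

Calling it again with seed=0 returns the same train and test lists; a different seed gives a different split.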
- Build a regression model
We build the model by calling GraphLab's linear_regression.create with the training set, the target y ('price'), and a single feature x. Training prints the following log:
PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

Linear regression:
--------------------------------------------------------
Number of examples          : 16480
Number of features          : 1
Number of unpacked features : 1
Number of coefficients      : 2
Starting Newton Method
--------------------------------------------------------
+-----------+--------+--------------+--------------------+----------------------+---------------+-----------------+
| Iteration | Passes | Elapsed Time | Training-max_error | Validation-max_error | Training-rmse | Validation-rmse |
+-----------+--------+--------------+--------------------+----------------------+---------------+-----------------+
| 1         | 2      | 1.060747     | 4337721.604860     | 1987870.095446       | 264377.961084 | 235343.331161   |
+-----------+--------+--------------+--------------------+----------------------+---------------+-----------------+
SUCCESS: Optimal solution found.
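For intuition, fitting a one-feature linear model amounts to ordinary least squares, sketched here in plain Python rather than GraphLab (the function name and toy numbers are hypothetical). The solver in the log uses Newton's method, which on a quadratic objective such as squared error reaches the optimum in a single step, consistent with the single iteration shown. This sketch ignores any regularization the library may apply, so its coefficients need not match GraphLab's exactly.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: co-variation of x and y; denominator: variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)
    return intercept, slope

# Hypothetical toy data: one feature x per house and its price y
xs = [1000, 1500, 2000, 2500, 3000]
ys = [200000, 290000, 410000, 500000, 590000]
intercept, slope = fit_simple_linear_regression(xs, ys)  # (2000.0, 198.0)
```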
- Evaluate the regression model built above
First, view the mean house price in the test set.
Then evaluate the regression model built above on the test set.
As you can see, the error is still very large.
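The two metrics reported in the training log, max_error and rmse, are simple to compute by hand. The plain-Python sketch below uses hypothetical prices, not the tutorial's data; comparing the RMSE with the mean price is one way to judge how large the error is.

```python
import math

def evaluate(predictions, targets):
    """Return the maximum absolute error and the root-mean-square error."""
    errors = [p - t for p, t in zip(predictions, targets)]
    max_error = max(abs(e) for e in errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"max_error": max_error, "rmse": rmse}

# Hypothetical test-set prices and model predictions
test_prices = [300000, 450000, 520000, 610000]
predicted = [320000, 400000, 560000, 610000]

mean_price = sum(test_prices) / len(test_prices)  # 470000.0
metrics = evaluate(predicted, test_prices)        # max_error 50000, rmse about 33541
```

Here the RMSE is roughly 7% of the mean price; a single large miss drives max_error, while RMSE averages over all houses.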
- Visualize the predictions graphically
We use the third-party plotting library matplotlib.
The magic command %matplotlib inline draws the figures in the current page, which is likewise a form of redirection.
To draw the plot, we define the values for the x and y axes and mark each (x, y) pair with a dot.
The figure combines a scatter plot of the original data with a plot of the predictions.
Two (x, y) series are plotted. The first is the scatter plot of the original test set, drawn as dots.
The second is the values predicted by our regression equation, drawn as a line (the '-' style).
Next, we look at the two parameters of the fitted regression equation: the intercept and the slope.
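For a single feature x, the fitted equation and the standard (unregularized) least-squares estimates of its two parameters are shown below; the symbols w_0 (intercept) and w_1 (slope) are chosen here for illustration.

```latex
\hat{y} = w_0 + w_1 x, \qquad
w_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
w_0 = \bar{y} - w_1 \bar{x}
```

The slope is the change in predicted price per unit increase of x; the intercept is the predicted price at x = 0, which often has no practical meaning on its own.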
5. Explore other features in the data
Define a custom list of features, my_features.
Then build the model with these features:
my_features_model = graphlab.linear_regression.create(train_data, target='price', features=my_features)
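GraphLab solves the multi-feature least-squares problem internally. As a plain-Python sketch of that underlying problem (not GraphLab's own solver), the hypothetical function below forms the normal equations (X^T X) w = X^T y and solves them by Gaussian elimination; the toy data is also hypothetical.

```python
def fit_linear_regression(rows, target):
    """Ordinary least squares for several features via the normal equations.

    rows:   list of feature lists (one list per house)
    target: list of target values (prices)
    Returns [intercept, w_1, ..., w_d].
    """
    # Prepend a constant 1 to every row so the first weight is the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    d = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(d)]
         + [sum(r[i] * y for r, y in zip(X, target))] for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, d):
            factor = A[r][col] / A[col][col]
            for c in range(col, d + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    w = [0.0] * d
    for i in range(d - 1, -1, -1):
        w[i] = (A[i][d] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w

# Hypothetical houses with two features each; prices follow 10 + 2*a + 3*b exactly
rows = [[1, 1], [2, 0], [0, 3], [3, 2], [1, 4]]
prices = [15, 14, 19, 22, 24]
weights = fit_linear_regression(rows, prices)  # approximately [10.0, 2.0, 3.0]
```

For real data with many or nearly collinear features, a library solver based on QR decomposition or an iterative method is numerically safer than forming the normal equations.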
Compare the single-feature model with the multi-feature model.
Here, the model with multiple features performs better than the single-feature model.
Apply the learned models to predict house prices.
First, take a house chosen at random from the original data set.
Its actual price is 0.62 million.
Now predict its price with both models:
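A prediction is just the learned equation evaluated on one house: the intercept plus each feature value times its weight. The plain-Python sketch below uses made-up coefficients, feature names, and a made-up house purely to show the arithmetic; they are not the model's actual values.

```python
def predict(coefficients, house):
    """Evaluate intercept + sum(weight * feature value) for a single house.

    coefficients: dict with an "(intercept)" entry plus one weight per feature name
    house:        dict mapping feature names to values
    """
    total = coefficients["(intercept)"]
    for name, weight in coefficients.items():
        if name != "(intercept)":
            total += weight * house[name]
    return total

# Hypothetical coefficients and house, for illustration only
coefficients = {"(intercept)": 50000.0, "sqft_living": 250.0, "bedrooms": -20000.0}
house = {"sqft_living": 2000, "bedrooms": 3}
price = predict(coefficients, house)  # 50000 + 250*2000 - 20000*3 = 490000.0
```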
Analysis:
A model with more features is not necessarily more accurate than one with fewer features.
Similarly, we predict the price of another house.
Actual price: 2.2 million
Prediction:
Conclusion: for a particular house, the single-feature model may predict better than the multi-feature model, or the multi-feature model may predict better; which one wins depends on the data.
Build a house prediction regression model