1 Learning Goals:
- Learn the basic TensorFlow concepts
- Use the LinearRegressor class in TensorFlow to predict the median house value of each city block based on a single input feature
- Evaluate the accuracy of model predictions using root mean squared error (RMSE)
- Improve model accuracy by adjusting the model's hyper-parameters
Note: The data is based on 1990 census data from the State of California.
2 Setup
You first need to load the necessary libraries.
from __future__ import print_function

import math

from IPython import display
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import metrics
import tensorflow as tf
from tensorflow.python.data import Dataset

# tf.logging is a TensorFlow 1.x API; only show error-level log messages.
tf.logging.set_verbosity(tf.logging.ERROR)
# Show at most 10 rows and one decimal place when displaying DataFrames.
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format
Next, load the dataset.
california_housing_dataframe = pd.read_csv("https://download.mlcc.google.cn/mledu-datasets/california_housing_train.csv", sep=",")
We shuffle the data so that no pathological ordering effects remain, since these could harm the performance of stochastic gradient descent. We also scale median_house_value to units of thousands, so that the model can learn it more easily with learning rates in the commonly used range.
california_housing_dataframe = california_housing_dataframe.reindex(
    np.random.permutation(california_housing_dataframe.index))
california_housing_dataframe["median_house_value"] /= 1000.0
california_housing_dataframe
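The shuffle-and-rescale step can be tried on a small scale without downloading anything. The sketch below uses a made-up four-row DataFrame (the values and the fixed seed are assumptions for illustration, not part of the housing data) and applies the same two operations: reindexing with a random permutation of the index, then dividing the label by 1000.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the housing DataFrame.
toy = pd.DataFrame({
    "total_rooms": [100.0, 200.0, 300.0, 400.0],
    "median_house_value": [50000.0, 150000.0, 250000.0, 350000.0],
})

# Shuffle the rows by reindexing with a random permutation of the index.
# The fixed seed is only there to make this sketch repeatable.
rng = np.random.RandomState(0)
shuffled = toy.reindex(rng.permutation(toy.index))

# Rescale the label from dollars to thousands of dollars.
shuffled["median_house_value"] /= 1000.0

print(shuffled)
```

Each row keeps its original index label after shuffling, so a feature value and its label always stay together; only the row order changes.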
3 Checking data
The following output is a quick summary of some useful statistics for each column: the number of examples, mean, standard deviation, maximum, minimum, and various quantiles.
california_housing_dataframe.describe()
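To see which statistics describe() returns, here is a minimal sketch on a made-up single-column DataFrame (the values are assumptions for illustration only). For numeric columns, the summary rows are the count, mean, standard deviation, minimum, the 25%, 50%, and 75% quantiles, and the maximum.

```python
import pandas as pd

# Hypothetical toy column, not the real housing data.
toy = pd.DataFrame({"total_rooms": [100.0, 200.0, 300.0, 400.0]})

summary = toy.describe()
print(summary)

# The row labels name each statistic.
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
```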
4 Building a first model
In this exercise, we will try to predict median_house_value, which will be our label (sometimes also called the target). We will use total_rooms as our input feature.
Note: The data is at the city block level, so this feature represents the total number of rooms in that block.
To train the model, we will use the LinearRegressor interface provided by the TensorFlow Estimator API. This API handles much of the low-level model building work and provides convenient methods for model training, evaluation, and inference.
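To make concrete what a linear regressor with a single input feature learns, here is a minimal NumPy sketch. It is not the Estimator API and not how LinearRegressor is implemented; it simply fits y = w * x + b with plain full-batch gradient descent and reports the RMSE. The data, learning rate, and step count are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data that follows y = 2x + 1 exactly.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0          # model parameters, starting from zero
learning_rate = 0.05     # assumed value, small enough to converge here

for _ in range(2000):
    error = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

predictions = w * x + b
rmse = float(np.sqrt(np.mean((predictions - y) ** 2)))
print("w = %.3f, b = %.3f, RMSE = %.6f" % (w, b, rmse))
```

Because the toy labels lie exactly on a line, the fitted weight and bias approach 2 and 1 and the RMSE approaches zero; real housing data will leave a nonzero error.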
4.1 Step 1: Define features and configure feature columns
To import our training data into TensorFlow, we need to specify what type of data each feature contains. In this and future exercises, we will mainly use the following two types of data:
Categorical data: Data that is textual. In this exercise, our housing dataset does not contain any categorical features, but examples you might see include the home style or the words in a real estate ad.
Numerical data: Data that is a number (integer or floating point) and that you want to treat as a number. As discussed in more detail later, sometimes you might want to treat numerical data, such as a postal code, as if it were categorical.
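The distinction between the two types can be sketched with pandas alone, before any TensorFlow feature columns are involved. In the example below, the column names home_style and postal_code and all values are made up for illustration; the point is that a postal code consists of digits but behaves like a label, so it is converted to a categorical type.

```python
import pandas as pd

# Hypothetical listings; none of these values come from the housing dataset.
homes = pd.DataFrame({
    "total_rooms": [5, 7, 6],                      # numerical: arithmetic is meaningful
    "home_style": ["ranch", "colonial", "ranch"],  # categorical text
    "postal_code": [94105, 90210, 94105],          # digits, but really a label
})

# Averaging room counts makes sense; averaging postal codes would not.
print(homes["total_rooms"].mean())

# Treat the text column and the postal code as categories.
homes["home_style"] = homes["home_style"].astype("category")
homes["postal_code"] = homes["postal_code"].astype(str).astype("category")

print(homes.dtypes)
print(homes["postal_code"].cat.categories.tolist())
```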
In TensorFlow, we use a structure called a "feature column" to indicate the data type of a feature. A feature column stores only a description of the feature data; it does not contain the feature data itself.
To start, we will use just one numeric input feature, total_rooms. The following code extracts the total_rooms data from california_housing_dataframe and defines the feature column using numeric_column, which specifies that its data is numeric:
# Define the input feature: total_rooms.
my_feature = california_housing_dataframe[["total_rooms"]]

# Configure a numeric feature column for total_rooms.
feature_columns = [tf.feature_column.numeric_column("total_rooms")]
Note: The shape of the total_rooms data is a one-dimensional array (a list of the total number of rooms in each block). This is the default shape for numeric_column, so we don't have to pass it as an argument.
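The shapes involved can be checked with pandas alone. The sketch below uses a made-up three-row DataFrame (the values are assumptions for illustration): selecting with double brackets yields a one-column DataFrame, and the total_rooms column inside it is a one-dimensional series with one value per block.

```python
import pandas as pd

# Hypothetical toy data standing in for the housing DataFrame.
toy = pd.DataFrame({
    "total_rooms": [100.0, 200.0, 300.0],
    "median_house_value": [50.0, 150.0, 250.0],
})

# Double brackets select a list of columns, so the result is a DataFrame.
my_feature = toy[["total_rooms"]]
print(type(my_feature).__name__, my_feature.shape)  # DataFrame (3, 1)

# The column itself is one-dimensional: one scalar per block.
column = my_feature["total_rooms"]
print(column.shape)        # (3,)
print(column.values.ndim)  # 1
```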