Model and Cost Function: Model Representation
To establish notation for future use, we'll use \(x^{(i)}\) to denote the "input" variables (living area in this example), also called input features, and \(y^{(i)}\) to denote the "output" or target variable that we are trying to predict (price). A pair \((x^{(i)}, y^{(i)})\) is called a training example, and the dataset that we'll be using to learn, a list of m training examples \((x^{(i)}, y^{(i)}),\ i = 1, \dots, m\), is called a training set. Note that the superscript "(i)" in the notation is simply an index into the training set and has nothing to do with exponentiation. We'll also use X to denote the space of input values, and Y to denote the space of output values. In this example, X = Y = ℝ.
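As a concrete illustration of the notation, here is a minimal Python sketch of a training set; the living areas and prices are made-up values, not an actual dataset:

```python
# A minimal sketch of the notation above: each training example is a pair
# (x_i, y_i); the training set is the list of all m such pairs.
# The living areas (sq ft) and prices (in $1000s) below are made-up values.
training_set = [
    (2104, 400),  # (x^(1), y^(1))
    (1600, 330),  # (x^(2), y^(2))
    (2400, 369),  # (x^(3), y^(3))
    (1416, 232),  # (x^(4), y^(4))
]

m = len(training_set)        # number of training examples
x_1, y_1 = training_set[0]   # the superscript (i) is just an index into the
                             # training set, not exponentiation
print(f"m = {m}, first example: x = {x_1}, y = {y_1}")
```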
To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function \(h : X \to Y\) so that \(h(x)\) is a "good" predictor for the corresponding value of \(y\). For historical reasons, this function h is called a hypothesis. Seen pictorially, the process is therefore like this:
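A minimal sketch of one possible hypothesis for the housing example, assuming the linear form \(h_\theta(x) = \theta_0 + \theta_1 x\) that goes with the two parameters in the cost function below; the parameter values here are arbitrary placeholders, not learned values:

```python
# A hypothetical linear hypothesis h_theta(x) = theta_0 + theta_1 * x.
theta_0 = 50.0    # intercept (placeholder value)
theta_1 = 0.15    # slope (placeholder value)

def h(x):
    """Hypothesis: maps an input living area x to a predicted price."""
    return theta_0 + theta_1 * x

print(h(2104))  # predicted price for a 2104 sq-ft house under these parameters
```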
When the target variable that we're trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict whether a dwelling is a house or an apartment, say), we call it a classification problem.
Cost Function
We can measure the accuracy of our hypothesis function by using a cost function. This takes an average difference (actually a fancier version of an average) of all the results of the hypothesis with inputs from the x's and the actual output y's.
\(J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2 = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x_i) - y_i\right)^2\)
To break it apart, it is \(\frac{1}{2}\bar{x}\), where \(\bar{x}\) is the mean of the squares of \(h_\theta(x_i) - y_i\), or the difference between the predicted value and the actual value.
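Putting the formula into code, a minimal sketch of this cost function, reusing the made-up training set and placeholder linear hypothesis from the earlier sketches, so the numbers are purely illustrative:

```python
# A minimal sketch of the squared-error cost J(theta_0, theta_1),
# computed over a small made-up training set.
training_set = [(2104, 400), (1600, 330), (2400, 369), (1416, 232)]

def cost(theta_0, theta_1):
    m = len(training_set)
    total = 0.0
    for x_i, y_i in training_set:
        prediction = theta_0 + theta_1 * x_i   # h_theta(x_i)
        total += (prediction - y_i) ** 2       # squared difference
    return total / (2 * m)                     # halved mean, matching the formula

print(cost(50.0, 0.15))  # J for one arbitrary choice of parameters
```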
This function is otherwise called the "squared error function", or "mean squared error". The mean is halved \(\left(\frac{1}{2}\right)\) as a convenience for the computation of gradient descent, as the derivative term of the squared function will cancel out the \(\frac{1}{2}\). The following image summarizes what the cost function does: