Machine Learning's Neural Network 3

Organized from Week 6 of Andrew Ng's machine learning course.

Directory:

    • Advice for applying machine learning (deciding what to do next)
      • Debugging a learning algorithm
      • Machine learning diagnostics
      • Evaluating a hypothesis
      • Model selection and train/validation/test sets
    • Bias and variance
      • Diagnosing bias and variance
      • Regularization and bias/variance
    • Learning curves
      • High bias
      • High variance
    • Summary: deciding what to do next
    • Diagnosing neural networks
    • Model complexity effects
    • Building a spam classifier
      • Prioritizing what to work on
      • Error analysis
    • Error metrics for skewed classes
      • Precision/recall
      • Trading off precision and recall
    • Data for machine learning

1. Advice for applying machine learning (deciding what to do next)

1.1. Debugging a learning algorithm

Suppose you use regularized linear regression to predict house prices, but when you apply the model to new data you find that the predictions have large errors. What should you try next?

    • Get more training data?
    • Try a smaller set of features?
    • Try additional features?
    • Try increasing $\lambda$?
    • Try decreasing $\lambda$?

Which of these should you try? Should you rely on intuition?

In practice, people often pick an option by intuition, for example collecting more training data, only to find after spending a great deal of time on it that the model's performance does not improve. Relying on intuition alone is clearly not a cost-effective approach.

1.2. Machine learning diagnostics

A test that you can run to gain insight into what is or isn't working with a learning algorithm, and to gain guidance as to how best to improve its performance.

In other words, a machine learning diagnostic is a test that tells us whether a particular change to the algorithm is likely to help, and guides us in how to improve the model's performance.

A diagnostic can take time to implement, but it is worth it compared with spending a lot of time on a strategy only to find that the results are no better.

1.3. Evaluating a hypothesis

How do we evaluate whether a model is good or bad? As mentioned before, doing well on the training data does not necessarily mean the model is good; the model may well be overfitting, in which case it will do poorly on new data.

Therefore the error on the training data alone cannot be used to evaluate the model.

The usual practice is to divide the dataset into training data (70%) and test data (30%), and then:

    • Train the model on the training data to obtain the model parameters
    • Use the trained model to predict on the test set and compute the test error

Use the error on the test set to assess the quality of the model.

For linear regression, the test error is the average squared error on the test set:

$J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2$

For logistic regression, besides the logistic cost on the test set, a common choice is the misclassification (0/1) error:

$err(h_\theta(x), y) = \begin{cases} 1 & \text{if } h_\theta(x) \geq 0.5 \text{ and } y = 0, \text{ or } h_\theta(x) < 0.5 \text{ and } y = 1 \\ 0 & \text{otherwise} \end{cases}$

$\text{Test error} = \frac{1}{m_{test}} \sum_{i=1}^{m_{test}} err(h_\theta(x_{test}^{(i)}), y_{test}^{(i)})$
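
As a rough illustration, here is a minimal numpy sketch of the 70/30 evaluation procedure for linear regression; the synthetic data, split, and variable names are illustrative assumptions, not from the course materials:

```python
# Minimal sketch: 70/30 train/test split and test-set error for linear regression.
# The data here is synthetic; in practice X, y come from your own dataset.
import numpy as np

rng = np.random.default_rng(0)
m = 100
X = rng.uniform(0, 10, size=(m, 1))
y = 2.5 * X[:, 0] + 3 + rng.normal(0, 1, size=m)

# 70% train / 30% test split
perm = rng.permutation(m)
split = int(0.7 * m)
train_idx, test_idx = perm[:split], perm[split:]

# Fit theta by least squares on the training set only
X_train = np.c_[np.ones(len(train_idx)), X[train_idx]]
theta = np.linalg.lstsq(X_train, y[train_idx], rcond=None)[0]

# Test error: J_test = (1 / (2 * m_test)) * sum of squared errors
X_test = np.c_[np.ones(len(test_idx)), X[test_idx]]
m_test = len(test_idx)
J_test = np.sum((X_test @ theta - y[test_idx]) ** 2) / (2 * m_test)
print(f"test error J_test = {J_test:.4f}")
```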

1.4. Model selection and train/validation/test sets

Suppose there are multiple candidate models alg1, alg2, alg3, ..., for example polynomial models of degree 1 through 10 (representing different model complexities), and we need to choose the best one. By what criterion should we choose?

If we followed the practice in 1.3, we would train each model on the training data, compute its error on the test data, and choose the model with the smallest test error as our final model.

Think about it, though: doing so is effectively the same mistake as evaluating a model by its training error. The chosen model is merely the one with the smallest error on this particular test set; on a different test set it might not be the best, so this test error no longer honestly reflects the model's quality.

The usual practice is to split the dataset into a training set (60%), a cross-validation set (20%), and a test set (20%), and then:

    • Train each candidate model on the training set to obtain its parameters
    • Evaluate each trained model on the validation set, compute the validation error, and select the model with the smallest validation error
    • Compute the error of the selected model on the test set and report it as the evaluation of that model

The error on each set is calculated as follows (each without the regularization term):

$J_{train}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

$J_{cv}(\theta) = \frac{1}{2m_{cv}} \sum_{i=1}^{m_{cv}} \left( h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)} \right)^2$

$J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2$
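
A minimal sketch of this selection procedure, assuming synthetic one-dimensional data and polynomial models of degree 1 through 10 (the data, split sizes, and function names are illustrative):

```python
# Sketch of model selection with a 60/20/20 split: pick the polynomial degree d
# that minimizes the cross-validation error, then report the test error once.
import numpy as np

def poly_design(x, d):
    """Design matrix [1, x, x^2, ..., x^d]."""
    return np.vander(x, d + 1, increasing=True)

def squared_error(Xd, y, theta):
    return np.sum((Xd @ theta - y) ** 2) / (2 * len(y))

rng = np.random.default_rng(1)
m = 200
x = rng.uniform(-3, 3, size=m)
y = np.sin(x) + 0.3 * rng.normal(size=m)

perm = rng.permutation(m)
tr, cv, te = perm[:120], perm[120:160], perm[160:]   # 60% / 20% / 20%

best_d, best_cv_err, best_theta = None, np.inf, None
for d in range(1, 11):
    theta = np.linalg.lstsq(poly_design(x[tr], d), y[tr], rcond=None)[0]
    cv_err = squared_error(poly_design(x[cv], d), y[cv], theta)
    if cv_err < best_cv_err:
        best_d, best_cv_err, best_theta = d, cv_err, theta

test_err = squared_error(poly_design(x[te], best_d), y[te], best_theta)
print(f"chosen degree d = {best_d}, J_cv = {best_cv_err:.4f}, J_test = {test_err:.4f}")
```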

2. Bias and variance

The concepts of bias and variance came up in an earlier article; here is a quick review.

The training error and cross-validation error are computed as defined above (the squared error on each set, without the regularization term).

2.1. Diagnosing bias and variance

As d (the polynomial degree) increases, the training error and cross-validation error change as follows:

When the model is very simple it is undertrained and underfits (high bias); both the training error and the test (or validation) error are large. As d increases, the model fits the training data better and both errors decrease. Beyond a certain point the model becomes too complex: the training error keeps decreasing, but the model starts to overfit (high variance) and the test error rises again. In short (a small diagnostic sketch follows this list):

    • High bias (underfit): training error and test error are both high and close to each other
    • High variance (overfit): training error is small, test error is much larger than the training error
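
As a rough rule of thumb, the comparison can be coded directly; the `gap_ratio` threshold below is an arbitrary illustrative choice, not part of the course:

```python
# Rule-of-thumb sketch: given training and cross-validation error, flag the
# likely problem. `target_err` is the error level you would consider acceptable.
def diagnose(j_train, j_cv, target_err, gap_ratio=2.0):
    if j_train > target_err and j_cv > target_err and j_cv < gap_ratio * j_train:
        return "high bias (underfitting): both errors are high and close together"
    if j_cv >= gap_ratio * j_train:
        return "high variance (overfitting): J_cv is much larger than J_train"
    return "looks reasonable: both errors are near the target"

print(diagnose(j_train=0.9, j_cv=1.0, target_err=0.2))   # -> high bias
print(diagnose(j_train=0.05, j_cv=0.8, target_err=0.2))  # -> high variance
```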

2.2. Regularization and bias/variance

$\lambda$ is the regularization (penalty) parameter in the regularized cost function $J(\theta) = \frac{1}{2m}\left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$:

    • When $\lambda$ is very large, the parameters are heavily penalized and the model degenerates to roughly a constant; the fitted curve is close to a horizontal line and the model underfits (high bias)
    • When $\lambda$ is very small, the penalty has almost no effect and the model tends to overfit (high variance)
    • We need to pick an appropriate $\lambda$ so that the model is just right

So the question becomes: how do we choose a proper $\lambda$? We can try the following (a code sketch follows the list):

    • Define a series of candidate values, e.g. the 12 values $\lambda$ = [0, 0.01, 0.02, 0.04, 0.08, ..., 10.24];
    • Construct a set of different models (with different polynomial degrees or other variants);
    • For each model and each $\lambda$, train on the training data to obtain the model parameters;
    • Compute the error of each trained model on the validation set;
    • Select the model/$\lambda$ combination with the smallest validation error;
    • Compute the error of the selected model on the test set as the final evaluation.
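
A minimal sketch of this $\lambda$-selection loop, assuming synthetic data, degree-8 polynomial features, and a closed-form ridge solver; the intermediate $\lambda$ values are filled in by doubling, an assumption consistent with the 12 values listed above:

```python
# Sketch of choosing lambda with a validation set: train regularized linear
# regression for each candidate lambda, keep the one with the lowest
# validation error, then evaluate once on the test set.
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form regularized least squares; the intercept is not penalized."""
    n = X.shape[1]
    reg = lam * np.eye(n)
    reg[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + reg, X.T @ y)

def err(X, y, theta):
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

rng = np.random.default_rng(2)
m = 150
x = rng.uniform(-2, 2, size=m)
y = x ** 3 - x + 0.5 * rng.normal(size=m)
X = np.vander(x, 9, increasing=True)          # degree-8 polynomial features

perm = rng.permutation(m)
tr, cv, te = perm[:90], perm[90:120], perm[120:]

lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
lam_best, theta_best = min(
    ((lam, ridge_fit(X[tr], y[tr], lam)) for lam in lambdas),
    key=lambda t: err(X[cv], y[cv], t[1]),
)
print(f"lambda = {lam_best}, J_cv = {err(X[cv], y[cv], theta_best):.4f}, "
      f"J_test = {err(X[te], y[te], theta_best):.4f}")
```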

Over this range of values we can see how the training error and CV (cross-validation) error change as $\lambda$ increases:

    • When $\lambda$ is very small, the training error is very small and the CV error is very large; the model is overfitting;
    • As $\lambda$ increases, the training error grows while the CV error decreases, and the overfitting eases;
    • Once $\lambda$ grows beyond a certain point, both the training error and the CV error start to increase, because the model is now underfitting.

3. Learning curves

This section assesses the relationship between the size of the training set and the model error (a code sketch follows the list below). For the training error:

    • When the training set is very small (say one to three examples), the training error is close to 0, because with so few points it is easy to find a curve that fits them almost perfectly;
    • As the training set grows, the training error grows correspondingly;
    • Once the training set reaches a certain size, the training error levels off.
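
A minimal sketch of how a learning curve can be computed, assuming synthetic data and a plain linear model: train on the first i examples and report both the training error on those i examples and the error on a fixed validation set.

```python
# Learning-curve sketch: training error is measured on the i examples used for
# training; cross-validation error is always measured on the full CV set.
import numpy as np

def fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def err(X, y, theta):
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

rng = np.random.default_rng(3)
x = rng.uniform(0, 5, size=120)
y = 1.5 * x + 2 + rng.normal(0, 0.5, size=120)
X = np.c_[np.ones(120), x]

tr, cv = np.arange(0, 90), np.arange(90, 120)
for i in range(2, 91, 10):
    theta = fit(X[tr[:i]], y[tr[:i]])
    print(f"m={i:3d}  J_train={err(X[tr[:i]], y[tr[:i]], theta):.3f}  "
          f"J_cv={err(X[cv], y[cv], theta):.3f}")
```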

3.1. High bias

The red line represents the desired (ideal) error level; with high bias, both the training error and the validation error plateau far above it and close to each other.

This indicates that the model is too simple; collecting more training data will not help in this situation.

3.2. High variance

With high variance, the validation error is much larger than the training error; the model is overfitting.

This indicates that the model is too complex or under-regularized, which is why it overfits; in this case, increasing the amount of training data is likely to help.

4. Summary: deciding what to do next

Returning to the questions raised at the beginning of the article, after working through these diagnostics we can now answer them:

    • Get more training data? Fixes high variance (overfitting)
    • Try a smaller set of features? Fixes high variance (overfitting)
    • Try additional features? Fixes high bias (underfitting)
    • Try increasing $\lambda$? Fixes high variance (overfitting)
    • Try decreasing $\lambda$? Fixes high bias (underfitting)

5. Diagnosing neural networks

    • A neural network with fewer parameters is prone to underfitting, but is computationally cheaper;
    • A neural network with more parameters is prone to overfitting and is computationally more expensive (regularization can then be used to address the overfitting);
    • A single hidden layer is a reasonable default, but you can also use the CV set to select the most appropriate number of hidden layers.

6. Model complexity effects

    • A lower-order polynomial (simple model) has high bias and low variance; the errors on both the training set and the test set are large;
    • A higher-order polynomial (complex model) has high variance and low bias; the training error is small but the test error is large;
    • What we ultimately want is a model in between: one that fits the data well and also generalizes well.

7. Building a spam classifier

7.1. Prioritizing what to work on

System design: for a given set of emails, represent each email as a feature vector. Each element of the vector corresponds to one word, chosen from the words that occur most frequently in the emails; the element is set to 1 if the word appears in the email and 0 otherwise.

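A minimal sketch of this feature construction, assuming a tiny made-up vocabulary and example email (a real system would build the vocabulary from the most frequent words in the training emails):

```python
# Sketch of the email feature vector: a fixed vocabulary of frequent words,
# with x_j = 1 if word j appears in the email and 0 otherwise.
import re

vocab = ["buy", "deal", "discount", "now", "meeting", "project", "password"]

def email_to_features(text, vocabulary):
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if w in words else 0 for w in vocabulary]

email = "Huge DISCOUNT!!! Buy now and get the best deal."
print(email_to_features(email, vocab))   # -> [1, 1, 1, 1, 0, 0, 0]
```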

So, how can we improve the accuracy of the classifier?

    • Collect more data?
    • Add more sophisticated features? (e.g., features based on the email header/routing information)
    • Process the input text differently? (e.g., detect deliberate misspellings)

7.2. Error Analysis

The recommended approach to a machine learning problem is:

    • Start with a simple algorithm, implement it quickly, and test it on the cross-validation data
    • Plot learning curves to decide whether strategies such as more data or more features are likely to help
    • Manually examine the examples the algorithm gets wrong on the CV set, identify the most common types of error, and analyze them

For example, suppose that out of 5000 cross-validation emails, 100 are misclassified. Examine those 100 emails and ask:

    • What category do these emails belong to (e.g., mostly password-stealing/phishing emails, or emails with deliberate misspellings)?
    • What would have helped classify them correctly (e.g., new features)? Then add that new strategy to the model.

Once you have a new strategy, add it to the model and check whether the result improves. For that you need an evaluation criterion, and it should be a single, simple numerical metric.

That is, when evaluating the model we want one real-number error value: if the error decreases after adding the new strategy, it is worth pursuing; otherwise there is no need to continue with it.

8. Error Metrics for skewed classes

Consider the cancer classification problem: a trained classifier achieves a 1% error rate on the test set, which at first glance looks very good.

However, only 0.5% of the patients in the test set actually have cancer. A "classifier" that simply predicts that nobody has cancer achieves an error rate of only 0.5%. Can we say this classifier is better than the previous one with its 1% error rate? Obviously not, because it is not a classifier at all!

This kind of class distribution is called skewed classes, and for skewed classes the plain error rate is not a reliable way to tell whether the model is improving.
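
A tiny numeric sketch of this point, using the 0.5% figure from the text (the dataset here is fabricated for illustration):

```python
# Sketch of why the plain error rate misleads on skewed classes: with 0.5%
# positives, a "classifier" that always predicts y = 0 reaches a 0.5% error
# rate yet catches none of the actual cancer cases.
import numpy as np

m = 1000
y_true = np.zeros(m, dtype=int)
y_true[:5] = 1                      # 0.5% of patients actually have cancer

y_pred = np.zeros(m, dtype=int)     # always predict "no cancer"

error_rate = np.mean(y_pred != y_true)
caught = np.sum((y_pred == 1) & (y_true == 1))
print(f"error rate = {error_rate:.3%}, cancer cases detected = {caught} of {np.sum(y_true)}")
```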

8.1. Precision/recall

For skewed classes, the commonly used evaluation metrics are precision and recall.

$\text{precision} = \frac{\text{true positives}}{\text{no. of predicted positives}} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}$

Precision is the proportion of patients who actually have cancer among all patients predicted to have cancer.

$\text{recall} = \frac{\text{true positives}}{\text{no. of actual positives}} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$

Recall is the proportion of patients who actually have cancer that the classifier correctly identifies.
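
A minimal sketch of these two formulas, with arbitrary example counts:

```python
# Precision and recall from the confusion counts, following the formulas above.
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

p, r = precision_recall(tp=30, fp=10, fn=20)
print(f"precision = {p:.2f}, recall = {r:.2f}")   # 0.75 and 0.60
```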

8.2. Trading off precision and recall

Precision and recall are two quantities that must be traded off against each other according to the actual application.

For example, in the cancer problem, using logistic regression:

If, to avoid alarming patients, we only predict cancer (y = 1) when we are very confident (the probability is high), we set the classification threshold high, for example 0.7 or 0.9. In that case we get higher precision but lower recall.

If, for the sake of patients' health, we want to flag cancer even when it is only somewhat likely, so that treatment can start early, we set the classification threshold low, for example 0.3. In that case we get higher recall but lower precision.

The precision-recall curve looks roughly like a downward-sloping trade-off curve, but its exact shape depends on the classifier and is not fixed.

Now that there are two numbers, how do we choose between models, and how do we decide whether a change has improved the model?

This is where the $F_1$ score comes in: $F_1 = 2\frac{PR}{P+R}$, where P is precision and R is recall. The $F_1$ score is a single value that takes both precision and recall into account (a threshold-selection sketch follows the list below):

    • When P = 0 or R = 0, the $F_1$ score is 0
    • When P = 1 and R = 1, the $F_1$ score is 1
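
A minimal sketch of using the $F_1$ score to compare thresholds, with made-up predicted probabilities and labels:

```python
# F1 score and a threshold sweep: for each candidate threshold, compute
# precision, recall and F1 on held-out predictions, then keep the threshold
# with the best F1.
import numpy as np

def f1_score(p, r):
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

probs  = np.array([0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05])
labels = np.array([1,    1,    0,    1,    0,    1,    0,    0,    0,    0])

best = (None, -1.0)
for thresh in [0.3, 0.5, 0.7, 0.9]:
    pred = (probs >= thresh).astype(int)
    tp = np.sum((pred == 1) & (labels == 1))
    fp = np.sum((pred == 1) & (labels == 0))
    fn = np.sum((pred == 0) & (labels == 1))
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_score(p, r)
    print(f"threshold {thresh}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
    if f1 > best[1]:
        best = (thresh, f1)
print("best threshold by F1:", best[0])
```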

9. Data for machine learning

Someone once said: "It's not who has the best algorithm that wins, it's who has the most data." Consider two examples:

    • The features provide enough information to predict y, with $x \in \mathbb{R}^n$;

For example: "For breakfast I ate _____ eggs." (two, too, to)

    • Predicting house prices when the only available feature is the house area;

Think about it: given the same input, could a human expert in the field give the answer? For the first problem, probably yes; for the second, obviously not.

So, back to the original question: does having the most data guarantee a win?

Not necessarily. If the features do not carry enough information, more data will not help.
