So far I've been discussing why machines can learn. Starting with this lesson, we move on to some basic machine learning algorithms, i.e. how machines learn.
This lesson is about linear regression. It starts with minimizing Ein, then introduces the Hat Matrix to explain the geometric meaning of the solution. Finally, it compares linear regression with binary classification and explains why linear regression can be used for binary classification. The contents of the whole lesson can be summarized in the following diagram:
This course treats linear regression more theoretically than other courses do, and working through it gives a deeper understanding of the subject.
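To make the three topics of this lesson concrete, here is a minimal NumPy sketch. It assumes the usual squared-error form Ein(w) = (1/N)·||Xw − y||² and the pseudo-inverse solution w = X⁺y. The toy data, the seed, and names such as w_true, w_lin, ein_sqr and ein_01 are made up for illustration and do not come from the course material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): N points in 2-D with labels in {-1, +1}.
N = 200
X_raw = rng.normal(size=(N, 2))
X = np.hstack([np.ones((N, 1)), X_raw])   # prepend the constant feature x0 = 1
w_true = np.array([0.5, 2.0, -1.0])       # assumed target, for illustration only
y = np.sign(X @ w_true + 0.3 * rng.normal(size=N))
y[y == 0] = 1.0                           # guard against an exact zero score

# 1) Minimize Ein(w) = (1/N) * ||Xw - y||^2 with the pseudo-inverse.
X_pinv = np.linalg.pinv(X)
w_lin = X_pinv @ y

# 2) Hat matrix H = X X^+ : it "puts the hat on y" by projecting y
#    onto the column space of X.
H = X @ X_pinv
y_hat = H @ y

# 3) Use the regression weights as a binary classifier via sign(w^T x).
ein_sqr = np.mean((X @ w_lin - y) ** 2)    # squared error
ein_01 = np.mean(np.sign(X @ w_lin) != y)  # 0/1 classification error

print("trace(H):", np.trace(H))
print("Ein (squared):", ein_sqr)
print("Ein (0/1):", ein_01)
```

With labels in {-1, +1}, the squared error of each point is at least as large as its 0/1 error, so ein_01 can never exceed ein_sqr. That bound is the usual argument for why a small regression error also gives a reasonable classifier. The trace of H should come out close to d + 1 = 3 here, which matches the geometric picture of projecting onto a (d + 1)-dimensional subspace.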
Reference: http://beader.me/mlnotebook/section3/linear-regression.html
This blog post is very good and instructive.
This article is also good: http://www.cnblogs.com/ymingjingr/p/4314665.html
Coursera NTU Machine Learning course, note 8: Linear Regression for Binary Classification