Let's read the two books together. I still think the Spark practice material is really good; there is another series of articles that discusses Spark.
/users/baidu/documents/data/interview/Machine Learning-Data Mining/"machine learning_Zhou Zhihua.pdf"
442 pages in total. Maybe I can finish reading it this weekend. Haha.
P1 Generally, a "model" refers to a global result (such as a decision tree), while a "pattern" refers to a local result (such as a single rule).
P3 If the value to be predicted is discrete, the task is called classification; if it is continuous, it is called regression.
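A minimal sketch of this distinction (pure Python, with hypothetical toy data and a hypothetical helper name): the same nearest-neighbour idea serves both tasks, and only the type of the target decides whether it is classification or regression.

```python
def nearest_neighbor_predict(train, x):
    """Return the target of the training point whose input is closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# Classification: the targets are discrete class labels.
clf_data = [(1.0, "ripe"), (2.0, "ripe"), (8.0, "unripe")]
print(nearest_neighbor_predict(clf_data, 1.5))   # a class label

# Regression: the targets are continuous values.
reg_data = [(1.0, 0.9), (2.0, 0.8), (8.0, 0.1)]
print(nearest_neighbor_predict(reg_data, 7.0))   # a real number
```

The predictor is identical in both cases; the discrete-vs-continuous nature of the output is what names the task.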
P3 Clustering starts with no labels. Depending on whether labeled data is available, learning is divided into supervised learning (with labels) and unsupervised learning (without labels). Classification and regression are the former; clustering is the latter.
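A minimal sketch of unsupervised learning (pure Python, hypothetical 1-D values): a tiny k-means (Lloyd's algorithm). No labels go in, yet groups come out.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm in one dimension."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]      # two obvious groups, no labels
centers, clusters = kmeans_1d(data, [0.0, 5.0])
print(centers)                               # roughly [1.0, 9.0]
```

Contrast this with the classification setting, where each training point would have carried a label supplied by a teacher.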
The ability of a learned model to apply to new samples is called generalization; the stronger the generalization ability, the better. We usually assume that all samples in the sample space obey an unknown distribution, and that each sample we obtain is drawn independently from that distribution, i.e., the samples are independent and identically distributed (i.i.d.). Generally, the more samples we get, the more information we have about the distribution.
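A minimal sketch of the i.i.d. point (pure Python, standard library only): each sample is drawn independently from one fixed distribution, and the empirical mean tends to approach the true mean as the sample size grows, i.e., more samples carry more information about the distribution.

```python
import random

random.seed(0)           # fixed seed so the run is reproducible
true_mean = 0.5          # the uniform distribution on [0, 1]

errors = []
for n in (10, 1000, 100000):
    sample = [random.random() for _ in range(n)]   # n i.i.d. draws
    errors.append(abs(sum(sample) / n - true_mean))
    print(n, errors[-1])                           # error tends to shrink
```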
P4 Hypothesis space
Induction and deduction are the two major means of scientific reasoning. The former is "generalization" from the specific to the general; the latter is "specialization" from the general to the specific. In a system of mathematical axioms, deriving theorems from the axioms and inference rules is deduction, while learning from examples is induction, also known as inductive learning.
Narrow-sense inductive learning requires learning concepts from examples, but that is too difficult and has seen few applications; most learning today produces black-box models.
P6 The preference of a learning algorithm for a certain type of hypothesis during learning is called its "inductive bias", or simply bias.
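A minimal sketch of why a bias is needed (pure Python; both hypotheses and the data are hypothetical): two hypotheses agree on every training point, so the data alone cannot choose between them. Something like "prefer the smoother curve" must break the tie, and the choice matters because the two hypotheses disagree off the training set.

```python
import math

train = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]   # all on the line y = x

def smooth(x):
    """The 'simple' hypothesis: the straight line."""
    return x

def wiggly(x):
    """Fits the same training points exactly, but wiggles in between."""
    return x + math.sin(math.pi * x)

# Both hypotheses are consistent with all the training data...
for x, y in train:
    assert smooth(x) == y and abs(wiggly(x) - y) < 1e-9

# ...but they disagree on a new sample; the learner's bias must decide.
print(smooth(0.5), wiggly(0.5))
```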
P9 The NFL (No Free Lunch) theorem: if all problems occur with equal probability, then no matter which algorithm is used, the expected performance is the same. Its most important implication is that whether an algorithm is good or bad can only be discussed with respect to a specific problem.
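The "equal chance over all problems" claim can be made concrete on a tiny domain (pure Python, hypothetical predictors): average each fixed predictor's accuracy on two unseen binary points over all four possible target functions, and every predictor scores exactly the same.

```python
from itertools import product

unseen = [0, 1]                      # indices of two unseen inputs

def accuracy(predict, target):
    """Fraction of unseen points where the prediction matches the target."""
    return sum(predict[i] == target[i] for i in unseen) / len(unseen)

# Three fixed predictors with very different "strategies".
always_zero = {0: 0, 1: 0}
always_one  = {0: 1, 1: 1}
alternating = {0: 0, 1: 1}

# All 2**2 possible target functions on the unseen points, equally likely.
targets = [dict(zip(unseen, bits)) for bits in product([0, 1], repeat=2)]

def average_accuracy(predictor):
    return sum(accuracy(predictor, t) for t in targets) / len(targets)

for name, predictor in [("always 0", always_zero),
                        ("always 1", always_one),
                        ("alternating", alternating)]:
    print(name, average_accuracy(predictor))   # 0.5 for every predictor
```

Once some targets are more likely than others, i.e., once we fix a concrete problem, the averages separate, which is exactly why algorithms must be judged per problem.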
"Reading Notes" machine learning-Zhou Zhihua & Machine learning Combat (Python)