Lec1-4
Components of machine learning
Case: should the bank issue a credit card to a customer?
Data: customer information (input)
Expected result: issue a card to the customer, or do not issue one (output)
Machine learning: learn how the bank should decide on issuing cards, so that its future decisions earn the most profit.
Underlying pattern: certain indicators help determine whether a customer should be issued a card. When the decision is made manually, all of the customer's information is weighed together so that issuing (or not issuing) a card maximizes the bank's revenue.
Notation
input: x∈X (customer information)
output: y∈Y (issue a card / do not issue a card)
The latent pattern that the machine needs to learn (target function): f: X -> Y, the ideal mapping from X to Y.
Starting from the simplest case of one-dimensional data, suppose the training data is D = {(x1, y1), (x2, y2), ..., (xN, yN)}, where each yn is produced by the target, yn = f(xn). The real f(x) is unique, but it is unknown to us. What machine learning does is start from the data and find a hypothesis function g(x) such that g(x) ≈ y, that is, g(x) is as close as possible to f(x).
You can then assume that g(x) takes some form you might think of, such as a linear or a quadratic function. For a given machine learning algorithm, the form of g(x) is chosen roughly in advance, and training only adjusts the function's parameters to fit the real data, so that y = g(x) keeps approaching the real target function y = f(x). In other words, g(x) tries to imitate f(x) so that its outputs match those of f(x) on the known data.
There are many possible g(x), but as the amount of data grows, the range of plausible choices for g(x) shrinks. The learning algorithm A uses the data to select, from the hypothesis set H, a function g that approximates the target function f. If the machine has learned the underlying pattern and improved its skill, g should be as similar to f as possible: the more similar g and f, the better.
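As a concrete illustration of fixing the form of g and then adjusting only its parameters, here is a minimal Python sketch. It assumes g is linear, g(x) = w*x + b, and that "fitting" means minimizing squared error on D (ordinary least squares, one common choice rather than the only one). The data points are invented for illustration.

```python
# Hypothetical one-dimensional training data D = {(x_n, y_n)}.
# The learner never sees the true f; it only sees these pairs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]


def fit_linear(xs, ys):
    """Choose w, b of g(x) = w*x + b minimizing squared error on D (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form 1-D least squares: w = cov(x, y) / var(x), b = mean_y - w * mean_x.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


# The form of g is fixed (a line); training only picks its parameters.
w, b = fit_linear(xs, ys)


def g(x):
    return w * x + b


print(f"g(x) = {w:.2f} * x + {b:.2f}")  # prints: g(x) = 1.96 * x + 0.14
```

Choosing a quadratic form instead would mean fitting three parameters rather than two; the idea of "fix the form, tune the parameters" stays the same.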
Possible g(x) in the credit card problem
g is some hk selected from the hypothesis set H; each hk is a candidate for g.
h1: annual income greater than 800,000
h2: debt greater than 100,000
h3: fewer than two years in the current job
...
The hypothesis set H may contain both good and bad hypotheses. What machine learning does is use algorithm A to select the best one from H as g.
The learning model consists of the algorithm A together with the hypothesis set H.
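Complete process: the unknown target function f: X -> Y generates the training data D; the learning algorithm A uses D to choose, from the hypothesis set H, a final hypothesis g ≈ f.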
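The whole pipeline (data D, hypothesis set H, algorithm A, final hypothesis g) can be sketched in a few lines of Python. Everything here is an illustrative assumption: the customer records are invented, h1 is read as "issue a card" while h2 and h3 are read as "do not issue a card", and the hypothetical algorithm_A simply picks the hypothesis with the fewest mistakes on D, which is only one possible choice of A.

```python
# Hypothetical training data D: (customer features, label), +1 = issue card, -1 = do not issue.
# Feature names and values are made up for illustration, not real bank data.
D = [
    ({"income": 900_000, "debt": 20_000, "years_in_job": 5}, +1),
    ({"income": 300_000, "debt": 150_000, "years_in_job": 1}, -1),
    ({"income": 850_000, "debt": 50_000, "years_in_job": 3}, +1),
    ({"income": 400_000, "debt": 120_000, "years_in_job": 4}, -1),
    ({"income": 500_000, "debt": 10_000, "years_in_job": 1}, -1),
]

# Hypothesis set H: each hypothesis maps a customer to +1 or -1.
H = {
    # h1: issue if annual income is greater than 800,000
    "h1": lambda c: +1 if c["income"] > 800_000 else -1,
    # h2: do not issue if debt is greater than 100,000
    "h2": lambda c: -1 if c["debt"] > 100_000 else +1,
    # h3: do not issue if the customer has worked fewer than two years
    "h3": lambda c: -1 if c["years_in_job"] < 2 else +1,
}


def training_errors(h, data):
    """Number of examples in data that hypothesis h classifies wrongly."""
    return sum(1 for x, y in data if h(x) != y)


def algorithm_A(hypotheses, data):
    """Hypothetical learning algorithm: name of the hypothesis with the fewest training errors."""
    return min(hypotheses, key=lambda name: training_errors(hypotheses[name], data))


g_name = algorithm_A(H, D)
g = H[g_name]
print(f"algorithm A picked {g_name}")  # prints: algorithm A picked h1
```

On this made-up D, h1 makes no mistakes while h2 and h3 each make one, so A returns h1; with different data it could just as well pick another hypothesis.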
Lec1-5 Machine Learning vs. Data Mining / Artificial Intelligence / Statistics
Machine learning
Use the data to find a hypothesis g(x) that is similar to the desired target function f(x).
Data mining
Use data to find interesting properties. (For example, in a supermarket: if a customer buys one item, are they likely to buy another? That is, find the associations between goods.)
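The supermarket example is the classic association-rule setting. A minimal Python sketch, assuming "correlation between goods" is measured by rule confidence (the fraction of baskets containing item a that also contain item b); the baskets and the 0.6 threshold are invented for illustration.

```python
from itertools import permutations

# Hypothetical shopping baskets; item names are illustrative only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]


def confidence(baskets, a, b):
    """Fraction of baskets containing a that also contain b, an estimate of P(buy b | bought a)."""
    with_a = [basket for basket in baskets if a in basket]
    if not with_a:
        return 0.0
    return sum(1 for basket in with_a if b in basket) / len(with_a)


def rules_above(baskets, min_confidence):
    """All rules 'a -> b' over ordered item pairs whose confidence reaches min_confidence, strongest first."""
    items = sorted(set().union(*baskets))
    rules = [(a, b, confidence(baskets, a, b)) for a, b in permutations(items, 2)]
    strong = [rule for rule in rules if rule[2] >= min_confidence]
    # sorted() is stable, so ties keep alphabetical order.
    return sorted(strong, key=lambda rule: rule[2], reverse=True)


rules = rules_above(baskets, min_confidence=0.6)
for a, b, c in rules:
    print(f"{a} -> {b}: {c:.2f}")
```

Here "beer -> chips" and "butter -> bread" reach confidence 1.0. Note that nothing here is a target function f being approximated; the output is just an interesting pattern, which is the distinction the notes draw below.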
Machine learning vs. data mining: the same or related
If the interesting thing is exactly a hypothesis g close to the target function f (as in prediction), then machine learning and data mining are the same.
If the interesting thing is related to, but not the same as, finding a hypothesis g close to the target function f, the two fields help each other: data mining can help machine learning do better, and machine learning can help data mining uncover interesting things.
Slightly different
Traditional data mining also emphasizes efficient computation on large amounts of data.
Very close
These two fields are very close, and it is difficult to find researchers who do only one of them.
Machine learning vs. artificial intelligence: machine learning is one way to realize artificial intelligence
Artificial intelligence aims to make computers do intelligent things, such as playing chess or driving. Prediction is one such intelligent task, and finding a g very close to the target f we want is what makes good prediction possible. From this perspective, machine learning is one way to realize artificial intelligence; there are many other ways as well.
Chess case
Traditional AI: build a game tree and analyze the advantages and disadvantages of each possible next move;
Machine-learning-based AI: learn how to play chess from records of human players' games, or by playing against itself.
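The traditional tree-search approach can be made concrete with minimax, one standard way to evaluate a game tree (the notes do not name a specific algorithm, so this choice is an assumption). The tiny tree and its leaf scores are invented; a real chess program would generate positions from the rules and score them with a hand-written evaluation function.

```python
# Hypothetical toy game tree: an internal node is a list of child positions,
# a leaf is a score from the point of view of the player to move at the root
# (higher is better for that player).
tree = [
    [3, 12, 8],   # move 0: the opponent's replies lead to these scores
    [2, 4, 6],    # move 1
    [14, 5, 2],   # move 2
]


def minimax(node, maximizing):
    """Value of a position assuming both sides play optimally."""
    if isinstance(node, (int, float)):
        return node  # leaf: already scored
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)


def best_move(tree):
    """Index of the root move with the highest minimax value (the opponent replies next)."""
    scores = [minimax(child, maximizing=False) for child in tree]
    return max(range(len(scores)), key=lambda i: scores[i])


print(best_move(tree))  # prints: 0
```

Move 2 contains the largest leaf (14), but the opponent can answer it with 2, so minimax prefers move 0, whose worst case is 3. A machine-learning player would instead learn such evaluations from game records or self-play.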
Machine learning vs. statistics: statistics is one way to realize machine learning
Statistics uses data to make inferences about unknown things. For example, we do not know the probability that a tossed coin lands heads, but we can infer it from observed tosses.
Here g is the result of the inference and f is the unknown quantity we want to know. From this perspective, statistics is actually one way to realize machine learning.
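The coin example can be sketched as follows, assuming the inference is the usual sample-frequency estimate of the heads probability. The true p = 0.7 is a hypothetical value used only to simulate tosses; it plays the role of the unknown f, and the estimate p_hat plays the role of g.

```python
import random


def simulate_tosses(p, n, seed=0):
    """Generate n hypothetical coin tosses (1 = heads) from a coin with heads probability p."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [1 if rng.random() < p else 0 for _ in range(n)]


def estimate_heads_probability(tosses):
    """Sample frequency of heads: the standard point estimate of the unknown probability."""
    return sum(tosses) / len(tosses)


tosses = simulate_tosses(p=0.7, n=10_000)  # the learner sees only these tosses
p_hat = estimate_heads_probability(tosses)
print(f"estimated P(heads) = {p_hat:.3f}")
```

With 10,000 tosses the estimate typically lands within about 0.01 of the true value; with only a handful of tosses it can be far off, which is exactly the kind of guarantee statistics studies.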
Slightly different
Machine learning uses many traditional statistical tools, but statistics is grounded in mathematics: it typically writes down its assumptions explicitly and then proves which conclusions can be drawn under them. Traditional statistics is therefore mostly mathematical inference, whereas machine learning starts from computation on data, and many of its algorithms care more about how to compute than about provable mathematical results.