Association analysis
Association analysis is a kind of unsupervised learning, and Apriori is an algorithm mainly used for _association analysis_. The results of _association analysis_ can take two forms: frequent itemsets or association rules. For example, consider the following trading orders:
| Serial Number | Product Name |
| --- | --- |
| 1 | Books, computer |
| 2 | Mug, cell phone, phone case, plate |
| 3 | Guzheng, cell phone, phone case, glass |
| 4 | Cell phone, glass |
| 5 | TV, cell phone, phone case |
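Frequent itemsets: sets of items that often appear together in the same order. {guzheng, cell phone, phone case, glass} is an example of an itemset; {cell phone, phone case}, which appears in 3 of the 5 orders, is a frequent one.
Association rules: cell phone -> phone case, meaning that a customer who buys a cell phone is very likely to buy a phone case as well.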
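To make "how often an itemset appears" concrete, here is a minimal sketch that stores the five example orders as Python sets and measures the fraction of orders containing a given itemset (usually called its support). The names `ORDERS` and `support` are illustrative choices, not part of any library.

```python
# The five example orders, one set of items per order.
ORDERS = [
    {"books", "computer"},
    {"mug", "cell phone", "phone case", "plate"},
    {"guzheng", "cell phone", "phone case", "glass"},
    {"cell phone", "glass"},
    {"tv", "cell phone", "phone case"},
]


def support(itemset, orders):
    """Return the fraction of orders that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for order in orders if itemset <= order)
    return hits / len(orders)


print(support({"cell phone"}, ORDERS))                # 0.8
print(support({"cell phone", "phone case"}, ORDERS))  # 0.6
print(support({"guzheng", "cell phone", "phone case", "glass"}, ORDERS))  # 0.2
```

Whether an itemset counts as frequent depends on the minimum support you choose: with a threshold of 0.6, {cell phone, phone case} qualifies, while the four-item set from order 3 does not.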
The ideas used in association analysis
- Whether we are looking for frequent itemsets or association rules, we need to look at how often things occur. For example, take the probability that an order containing a cell phone also contains a phone case: if that ratio reaches 75%, the rule is accepted. Here, probability(phone case | cell phone) = probability({cell phone, phone case}) / probability({cell phone}) = (3/5) / (4/5) = 0.75.
- The above is only one combination. In principle, the probability has to be calculated for every possible combination of items, so the amount of computation grows exponentially as the data (in particular, the number of distinct items) grows. The Apriori algorithm is a way to reduce that amount of computation.
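The two points above can be checked with a short sketch. It recomputes the 0.75 ratio for cell phone -> phone case and counts how many non-empty itemsets a brute-force search would have to consider. The names `ORDERS`, `count`, `confidence`, `ITEMS` and `N_CANDIDATES` are hypothetical, chosen only for this illustration. The ratio is computed from integer counts (3 / 4) rather than from two floating-point probabilities, so that a comparison against a 0.75 threshold is not upset by rounding.

```python
# The five example orders, one set of items per order.
ORDERS = [
    {"books", "computer"},
    {"mug", "cell phone", "phone case", "plate"},
    {"guzheng", "cell phone", "phone case", "glass"},
    {"cell phone", "glass"},
    {"tv", "cell phone", "phone case"},
]


def count(itemset, orders):
    """Number of orders that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for order in orders if itemset <= order)


def confidence(antecedent, consequent, orders):
    """Estimate P(consequent | antecedent) from order counts."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    return count(both, orders) / count(antecedent, orders)


print(confidence({"cell phone"}, {"phone case"}, ORDERS))  # 0.75

# Every non-empty subset of the distinct items is a candidate itemset,
# so a brute-force search has 2**n - 1 itemsets to check.
ITEMS = sorted(set().union(*ORDERS))
N_CANDIDATES = 2 ** len(ITEMS) - 1
print(len(ITEMS), N_CANDIDATES)  # 9 511
```

Even this tiny example has 511 candidate itemsets; each additional distinct item roughly doubles that number.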
The principle of Apriori
Proposition: if an itemset is infrequent, then every itemset that contains it (every superset) is also infrequent.
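The proposition is what allows whole families of candidates to be skipped. Below is a minimal sketch of a level-wise search built on it, not a reference implementation: it grows itemsets one item at a time and discards any candidate that has an infrequent subset before counting it. The function name `apriori` and the parameter `min_count` (the minimum number of orders an itemset must appear in) are assumptions made for this example.

```python
from itertools import combinations

# The five example orders, one set of items per order.
ORDERS = [
    {"books", "computer"},
    {"mug", "cell phone", "phone case", "plate"},
    {"guzheng", "cell phone", "phone case", "glass"},
    {"cell phone", "glass"},
    {"tv", "cell phone", "phone case"},
]


def apriori(orders, min_count):
    """Return {itemset: count} for itemsets found in at least `min_count` orders."""

    def count(itemset):
        return sum(1 for order in orders if itemset <= order)

    # Level 1: single items that are frequent on their own.
    current = {frozenset([item]) for item in set().union(*orders)}
    current = {s for s in current if count(s) >= min_count}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = count(itemset)
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: by the proposition, a candidate with any infrequent
        # (k-1)-subset cannot be frequent, so it is never counted.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if count(c) >= min_count}
    return frequent


for itemset, n in sorted(apriori(ORDERS, 3).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), n)
```

With `min_count=3` only {cell phone}, {phone case} and {cell phone, phone case} survive. With `min_count=2`, the candidate {cell phone, phone case, glass} is pruned without being counted, because its subset {phone case, glass} appears in only one order.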
Todo
Pros and cons and scenarios
- Advantages: easy to code
- Cons: may be slow on large datasets
- Applicable data: numeric or nominal data