Modeling Method:
1: Data Summarization:
Eg: PageRank
The PageRank of a Web page summarizes its importance: roughly, the probability that a random surfer, following links at random, is on that page at a given moment.
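A minimal power-iteration sketch of the random-surfer idea in Python. The damping factor `beta` (the surfer teleports to a random page with probability 1 - beta), the way dead-end pages are handled, and the fixed iteration count are assumptions, not part of the note above.

```python
def pagerank(links, beta=0.85, iterations=100):
    """Power iteration for PageRank.

    links maps each page to the list of pages it links to; every page,
    including link targets, must appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(iterations):
        # With probability 1 - beta the surfer jumps to a random page.
        new_rank = {p: (1.0 - beta) / n for p in pages}
        for page, out_links in links.items():
            if out_links:
                # Otherwise the surfer follows one out-link uniformly at random.
                share = beta * rank[page] / len(out_links)
                for target in out_links:
                    new_rank[target] += share
            else:
                # Assumed dead-end handling: spread the rank over all pages.
                for target in pages:
                    new_rank[target] += beta * rank[page] / n
        rank = new_rank
    return rank


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(ranks)
```

The ranks always sum to 1, so each value can be read as the long-run probability of finding the surfer on that page.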
2: Clustering
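Clustering summarizes data by grouping nearby points and describing each group compactly, for example by its centroid. The sketch below is plain k-means in Python, assuming numeric points given as tuples, a known number of clusters `k`, and (for simplicity) the first `k` points as initial centroids; none of these choices come from the note.

```python
import math


def kmeans(points, k, iterations=20):
    """Return (centroids, clusters) after a fixed number of k-means rounds."""
    # Simplifying assumption: use the first k points as starting centroids.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (Euclidean distance).
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


centroids, clusters = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(centroids)
```

Four points are summarized by two centroids and their cluster sizes, which is the sense in which clustering is a form of summarization.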
3: Feature Extraction:
1: Frequent itemsets:
Eg: most customers buy items A and B together, so when a customer buys A, B is recommended to them.
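A minimal sketch of the frequent-itemset idea restricted to pairs: count how many baskets contain each pair of items, keep the pairs whose count reaches a support threshold, and recommend an item's frequent partners. The function names and the `support` threshold are hypothetical; large-scale systems use algorithms such as A-Priori rather than counting every pair in memory like this.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, support):
    """Return {(item1, item2): count} for pairs found in at least `support` baskets."""
    counts = Counter()
    for basket in baskets:
        # Count each unordered pair of distinct items once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= support}


def recommend(item, frequent):
    """Items that form a frequent pair with `item`."""
    recs = set()
    for a, b in frequent:
        if a == item:
            recs.add(b)
        elif b == item:
            recs.add(a)
    return recs


baskets = [["A", "B"], ["A", "B", "C"], ["B", "C"], ["A", "B"]]
print(recommend("A", frequent_pairs(baskets, support=3)))
```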
2: Similar items: e.g., collaborative filtering
Find users (or products) that are similar to one another and make recommendations on that basis.
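One common way to make "similar" concrete is the Jaccard similarity of the sets of items two users bought. The sketch below assumes that measure and a deliberately simple rule (recommend what the single most similar user bought that the target has not); both are illustrative choices, not the only form of collaborative filtering.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, purchases):
    """Recommend items bought by the user most similar to `target`.

    purchases maps each user to the set of items they bought.
    """
    others = [u for u in purchases if u != target]
    if not others:
        return set()
    nearest = max(others, key=lambda u: jaccard(purchases[target], purchases[u]))
    # Items the most similar user has that the target user does not.
    return purchases[nearest] - purchases[target]


purchases = {"alice": {"A", "B", "C"}, "bob": {"A", "B", "D"}, "carol": {"E"}}
print(recommend("alice", purchases))
```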
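Bonferroni's principle: if you search data for a particular kind of feature (event), occurrences of it will turn up even if the data is completely random, and the number of such spurious occurrences grows as the data grows. If the expected number of occurrences under pure randomness is much larger than the number of real instances you hope to find, then almost anything you find will be bogus.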
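A back-of-the-envelope check in the spirit of Bonferroni's principle: compute how many "suspicious" events pure randomness would produce. The scenario (two people staying at the same hotel on two different days) and the numbers (10^9 people, 10^5 hotels, 1000 days, a 1% chance of visiting a hotel on any given day) are illustrative assumptions.

```python
from math import comb


def expected_random_pairs(people, hotels, days, p_visit):
    """Expected number of (pair of people, pair of days) such that both people
    stay at the same hotel on both days, if everyone behaves at random."""
    # Both people visit a hotel on a given day, and it happens to be the same one.
    p_same_hotel_one_day = (p_visit * p_visit) / hotels
    # Days are assumed independent, so square it for two specific days.
    p_same_hotel_two_days = p_same_hotel_one_day ** 2
    return comb(people, 2) * comb(days, 2) * p_same_hotel_two_days


# Illustrative (assumed) inputs.
print(expected_random_pairs(10**9, 10**5, 1000, 0.01))
```

With these inputs the expected count is roughly 250,000 purely random "suspicious" pairs, so finding, say, a few dozen such pairs in real data would tell us almost nothing.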
TF.IDF: the TF.IDF score of term i in document j is $TF_{ij} \times IDF_i$.
TF: term frequency
$TF_{ij} = f_{ij} / \max_k f_{kj}$
That is, the term frequency of term i in document j is the number of occurrences $f_{ij}$ of term i in document j, divided by the number of occurrences of the most frequent term k in that document (possibly excluding stop words). This normalizes the most frequent term to TF = 1.
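A minimal sketch of the normalized term frequency for a single tokenized document. The tiny `STOP_WORDS` set is a hypothetical example; a real system would use a fuller stop-word list and its own tokenizer.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and"}


def term_frequencies(tokens, stop_words=STOP_WORDS):
    """TF_ij = f_ij / max_k f_kj for every term i in one document j."""
    # Count occurrences, ignoring stop words.
    counts = Counter(t for t in tokens if t not in stop_words)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {term: f / max_count for term, f in counts.items()}


print(term_frequencies("the cat and the dog and the cat".split()))
```

Here "cat" occurs twice and "dog" once, so "cat" gets TF = 1.0 and "dog" gets TF = 0.5.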
IDF: inverse document frequency
Suppose the collection contains N documents. If term i appears in $n_i$ of them, the IDF of term i is:
$IDF_i = \log_2 (N / n_i)$
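A minimal sketch of IDF and the combined TF.IDF score in Python, assuming each document is represented as a set of its terms and that the TF value has already been computed. The function names are hypothetical. As a worked case of the formula: with N = 2^20 documents and a term appearing in n_i = 2^10 of them, IDF_i = log2(2^10) = 10.

```python
import math


def idf(term, documents):
    """IDF_i = log2(N / n_i): N documents in total, n_i of them contain the term."""
    n_docs = len(documents)
    n_containing = sum(1 for doc in documents if term in doc)
    if n_containing == 0:
        # Undefined for terms that appear nowhere (division by zero).
        raise ValueError(f"term {term!r} does not appear in any document")
    return math.log2(n_docs / n_containing)


def tf_idf(tf_ij, idf_i):
    """TF.IDF score of term i in document j."""
    return tf_ij * idf_i


docs = [{"apple", "banana"}, {"apple"}, {"cherry"}, {"date"}]
print(idf("apple", docs), idf("cherry", docs))
```

Rare terms get a high IDF, so a term scores highest when it is frequent in one document but appears in few documents overall.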
Not complete...