Recommendation System Notes IV: Content-Based Recommender Systems

Source: Internet
Author: User
Tags: idf
First, overview:

A content-based recommender system (CBRS) extracts item content features and user preferences from the content descriptions of items and users. It then recommends items based on the user's rating history and the semantic (content) similarity between items.

The high-level structure of a content-based recommender system, as shown in the figure, consists of three parts (the rounded rectangles in the picture): the Content Analyzer, which extracts structured data (such as word vectors) from unstructured data (such as text); the Profile Learner (user profile learning), which learns a user preference model from the user's rating history of items; and the Filtering Component, which recommends items to the user by matching the user's preferences against item properties.

Second, the keyword vector space model:

A document collection (corpus) is represented with the vector space model (VSM). This first requires constructing a dictionary $T=\{t_1, \ldots, t_n\}$, the set of keywords obtained by segmenting the documents in the corpus into words, removing stop words, and applying similar operations. With the dictionary, every document $d_j$ in the corpus can be expressed as a keyword vector $[w_{1j}, \ldots, w_{nj}]$, where $w_{ij}$ is the weight of the word $t_i$ in the document $d_j$. The next two problems to solve are the choice of weighting method and the choice of similarity measure.
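As a rough illustration of the dictionary and keyword-vector construction described above, here is a minimal Python sketch. The toy corpus, the stop-word list, the regex-based tokenizer, and the helper names (`tokenize`, `build_dictionary`, `count_vector`) are all assumptions made for this example. Raw term counts stand in for the weights $w_{ij}$ until a weighting method is chosen.

```python
import re
from collections import Counter

# Hypothetical toy corpus and stop-word list, for illustration only.
corpus = {
    "d1": "The cat sat on the mat",
    "d2": "The dog chased the cat",
    "d3": "Dogs and cats are pets",
}
STOP_WORDS = {"the", "on", "and", "are"}

def tokenize(text):
    """Lowercase the text, split it into alphabetic words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def build_dictionary(docs):
    """Return the sorted keyword list T = [t_1, ..., t_n] over all documents."""
    return sorted({w for text in docs.values() for w in tokenize(text)})

def count_vector(text, dictionary):
    """Represent one document as [w_1j, ..., w_nj], using raw counts as placeholder weights."""
    counts = Counter(tokenize(text))
    return [counts[t] for t in dictionary]

T = build_dictionary(corpus)
vectors = {name: count_vector(text, T) for name, text in corpus.items()}
```

Every vector has the same length as the dictionary, so documents can be compared position by position. No stemming is applied in this sketch, so "cat" and "cats" remain separate keywords.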

TF-IDF (term frequency-inverse document frequency) is one of the most commonly used weighting schemes and is calculated as follows:
\begin{equation} \mathrm{TF\text{-}IDF}(t_k, d_j) = \mathrm{TF}(t_k, d_j) \cdot \mathrm{IDF}(t_k, d_j) = \frac{f_{kj}}{\max_z f_{zj}} \cdot \log\frac{N}{n_k} \end{equation}
Here $N$ is the number of documents in the corpus and $n_k$ is the number of documents in which the word $t_k$ occurs at least once.

$f_{kj}$ is the number of times the word $t_k$ occurs in document $d_j$.
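The TF-IDF weighting can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation. The numerator of the IDF fraction is taken to be the number of documents in the corpus, and $n_k$ the number of documents containing $t_k$ (the standard reading). The logarithm is assumed to be natural, since the text does not specify a base. The toy corpus and the function names `tf`, `idf`, and `tf_idf` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, already-tokenized toy corpus (stop words assumed removed).
docs = {
    "d1": ["cat", "sat", "mat", "cat"],
    "d2": ["dog", "chased", "cat"],
    "d3": ["dog", "barked"],
}

def tf(term, tokens):
    """TF(t_k, d_j) = f_kj / max_z f_zj: the term's count divided by the
    count of the most frequent term in the same document."""
    counts = Counter(tokens)
    return counts[term] / max(counts.values())

def idf(term, docs):
    """IDF = log(N / n_k), with N the number of documents and n_k the number
    of documents containing the term. The term must come from the dictionary,
    otherwise n_k is 0 and the division fails."""
    n_k = sum(1 for tokens in docs.values() if term in tokens)
    return math.log(len(docs) / n_k)

def tf_idf(term, doc_name, docs):
    """Weight of one dictionary term in one document."""
    return tf(term, docs[doc_name]) * idf(term, docs)

weight_cat_d1 = tf_idf("cat", "d1", docs)  # frequent in d1, but found in 2 of 3 documents
weight_sat_d1 = tf_idf("sat", "d1", docs)  # rarer in d1, but found only in d1
```

In this toy corpus "sat" ends up with a higher weight in d1 than "cat", even though "cat" occurs more often there, because "cat" also appears in another document. A term that occurs in every document would get weight 0, since $\log 1 = 0$.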
