Recommendation System Algorithms: A Commentary

Source: Internet
Author: User

Extended recommendation examples

7.1. Reading recommendation

First, consider this passage (excerpted from 36Kr):

"Beijing Shifen Technology is also very optimistic about reading-recommendation applications. They have put a great deal of effort into it (a 60-person team for a year), and today launched the iPhone version of 'Cool Cloud Reading'.

Why put so many people on a reading application? CEO Li told me that more than half of the team works on back-end matters, including semantic analysis, machine learning and other algorithms. Their goal is to make the internet 'semantic', work out what each person is interested in, and finally recommend each piece of content to the people it is relevant to. On the iPhone, Cool Cloud's general approach is similar to Zite's iPad version: users like or dislike items and tap the media sources or related tags to tell Cool Cloud what they want to see more of in the future.

That goal is shared by most reading-recommendation applications, but Cool Cloud's approach seems more extreme. Besides crawling more than 100,000 articles a day from the internet, they also index the video content broadcast by 200 television stations nationwide, so that users can search for videos and have related content recommended. The general approach is to record the programs first, then convert the audio to text, and finally build summaries and indexes."

Are the algorithms behind a typical recommendation application as complicated as the collaborative filtering described above? The following is quoted from what I posted on Weibo on January 21:

    1. Most reading-recommendation applications tag each article according to its content, for example "algorithm" or "iPhone" (tapping a tag is equivalent to adding weight to it), and invite the reader to rate the article: like or dislike. Every click is recorded by the recommendation system, which gradually builds up a tag cloud for the user. (Users with the same or similar tags can also be found this way, which allows user-based recommendation.) Then, each time the system fetches a new article, it extracts the article's keywords, matches them against the user's tag preferences, and pushes the article if they match.
    2. Current mobile news apps sort news into categories such as technology or education, but generally do not ask for a rating on the page the way the applications above do. They therefore cannot record the user's behavior, and cannot recommend new articles as they arrive. That gap is what gave rise to a number of mobile reading-recommendation apps, such as @Cool Cloud Reading and Zhiyue.
    3. But the typical user's habit is to read a news item and be done with it, coming back another day only if they feel like it. How many users would be willing to sign up for an account just to rate an article? How to get users to pay that extra cost to use this kind of reader, and so change their habits, is, I personally think, the key.

I have some doubts about one sentence in the excerpt: "record the programs first, then convert the audio to text". We already know that for music, a service like Douban FM probably works as follows:

    1. You like some songs and I like some songs. If many of the songs we like are the same or similar, the system treats us as friends, that is, similar users. This is user-based collaborative filtering: what your friends like, you may like too.
    2. The other is recommendation by song. You like song A, and song B is similar to song A (say both are sentimental love songs), so the system guesses that you may also like B and recommends B to you. This is item-based collaborative filtering.
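
The two approaches above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how Douban FM actually works: the `likes` data, the function names, and the use of Jaccard overlap as the similarity measure are all hypothetical choices made for the example. In particular, the item-based function treats two songs as similar when the same users like both, which is one common option and not something the original text specifies.

```python
from collections import defaultdict

# Hypothetical data: user -> set of songs the user likes.
likes = {
    "you": {"A", "C"},
    "me": {"A", "C", "E"},
    "other": {"B", "F"},
}

def jaccard(s, t):
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0

def recommend_user_based(user, likes):
    """Suggest the songs liked by the most similar other user (the 'friend')."""
    others = [u for u in likes if u != user]
    if not others:
        return []
    friend = max(others, key=lambda u: jaccard(likes[user], likes[u]))
    if jaccard(likes[user], likes[friend]) == 0.0:
        return []  # nobody shares any taste with this user
    return sorted(likes[friend] - likes[user])

def recommend_item_based(user, likes):
    """Score unseen songs by their similarity to songs the user already likes.

    Assumption: two songs are similar when the same users like both.
    """
    fans = defaultdict(set)  # song -> users who like it
    for u, songs in likes.items():
        for song in songs:
            fans[song].add(u)
    scores = defaultdict(float)
    for liked in likes[user]:
        for song in fans:
            if song not in likes[user]:
                scores[song] += jaccard(fans[liked], fans[song])
    ranked = [s for s in scores if scores[s] > 0]
    return sorted(ranked, key=lambda s: (-scores[s], s))

print(recommend_user_based("you", likes))  # → ['E']
print(recommend_item_based("you", likes))  # → ['E']
```

Both functions agree on this tiny data set because "me" is the only user who overlaps with "you"; on real data the two methods usually rank candidates differently.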

Treating listeners whose liked songs overlap as friends gives user-based collaborative filtering; treating certain songs as similar gives item-based collaborative filtering. But here a problem appears. Overlap is easy to detect (the same song, the same singer), but how do we decide that two different songs are similar? Have the system analyze each song's spectrum? Distinguish the tempo and pitch of every track? That may seem workable, but it is impractical to implement.

I think the answer is to tag the music, for example with tags such as "love" or "sentimental", so that songs carrying the same tags can be judged similar. (The same goes for video, which makes it easy to index and search later; recognizing the full video directly does not feel reliable yet.) But the key question is how to apply the tags. Speech recognition?

7.2. How to apply tags

Early on, tagging can be done by hand, with crawlers, or by buying a database; once traffic grows, UGC (user-generated content) becomes an option. But users are generally unwilling to tag music themselves, because it is too much trouble. (Sina Weibo recently added a "tagging" prompt to each post, for instance, but how many users pay any attention to it?) Of course, some systems generate tags for you automatically (and let you add your own), such as Sina Blog. How is that done? My guess is:
    1. The system scans your article in the background and extracts some keywords as candidate tags for you to choose from. Which keywords? The high-frequency words, of course: scan the whole article and count how often each word appears.
    2. Then take the top K. For example, if "algorithm" appears in the article 4 times and "blog" 3 times, the system automatically suggests these as tags.
    3. What data structure or method is used to count keyword frequencies? Generally a hash table plus a heap (see "11. A thorough analysis of hash table algorithms, from beginning to end") or a trie (see "From trie to suffix tree") will do. But a trie is more troublesome when it has to handle Chinese characters, so hash table plus heap is the better choice.
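
The hash table plus heap scheme in the steps above can be sketched as follows. This is a minimal illustration, not the method Sina Blog actually uses: the function name `top_k_keywords`, the `stopwords` parameter, and the regex tokenizer are assumptions made for the example. The tokenizer only handles space-separated Latin-alphabet words; Chinese text would need a word-segmentation step first.

```python
import heapq
import re

def top_k_keywords(text, k, stopwords=frozenset()):
    """Return the k most frequent words in text as (word, count) pairs."""
    if k <= 0:
        return []

    # Step 1: count every word with a hash table (a Python dict).
    counts = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in stopwords:
            counts[word] = counts.get(word, 0) + 1

    # Step 2: keep only the k largest counts in a min-heap.
    # heap[0] is always the weakest of the current top k.
    heap = []
    for word, n in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (n, word))
        elif (n, word) > heap[0]:
            heapq.heapreplace(heap, (n, word))

    # Most frequent first.
    return [(word, n) for n, word in sorted(heap, reverse=True)]

article = "algorithm blog algorithm hash algorithm blog algorithm blog heap"
print(top_k_keywords(article, 2))  # → [('algorithm', 4), ('blog', 3)]
```

The heap never holds more than K entries, so selecting the top K from N distinct words costs O(N log K) instead of the O(N log N) of a full sort, which matters when K is small and the vocabulary is large.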
Video should work in much the same way:

    1. A system or machine reads the video content and converts it to text, then extracts the high-frequency keywords, which become the video's tags. (How to extract keywords involves a key problem, word segmentation, which this blog will cover later.)
    2. An index is then built over these tags. What kind of index? An inverted index. (For what an inverted index is, see The Art of Programming, chapters 23 and 24: Young tableau search, and inverted-index keyword hash non-repetition coding practice.) This makes it easy for users or the system to search later. (This section was worked out in discussion with friends from the Art of Programming group.)

Details will follow in a later post.

8. References
    1. My Weibo posts of January 7 and January 21 (pinned in the left sidebar of this blog).
    2. Exploring the Secrets Inside Recommendation Engines, Zhao Chenting and Ma Chun'e.
    3. Programming Collective Intelligence, Toby Segaran.
    4. An overview of collaborative filtering in recommendation systems.
    5. http://www.cnblogs.com/leoo2sk/.
    6. Mitchell, Tom M. Machine Learning. McGraw-Hill, 1997 (a landmark text in the field of machine learning).
    7. http://zh.wikipedia.org/wiki/%E5%86%B3%E7%AD%96%E6%A0%91.
    8. http://www.36kr.com/p/75415.html.
    9. Algorithms of the Intelligent Web, chapter 3, recommendation systems (implements user and item similarity calculations; worth a look).
