Having gone from knowing nothing about recommendation, to building a few recommendation systems, to designing a complete recommendation ecosystem framework, I think it is time to let some of that knowledge settle before moving on to look for new treats. The "Casual Talk on Recommendation Systems" series will start with the simplest possible recommendation system and make the design more sophisticated as business needs and our understanding grow, occasionally sharing other people's design ideas along the way.
When it comes to recommendation, the examples everyone knows best are Amazon's "customers who viewed this item also viewed" and "customers who bought this item also bought", which are simple enough to grasp at once. Recommendation can in fact be seen as a complement to search. Search is an active behavior: the user has a clear goal but does not know the answer and needs the search engine to supply it. Recommendation tells users what they might like when they do not yet know their specific needs. Readers with experience building search engines or recommendation engines may have noticed that the two are closely related; we will get to those more complex topics later.
Back to the Amazon example. We can easily build a similar recommendation engine ourselves, and it will serve as system v1.0 of this series. All we need to do is count, for each product, how many times it was purchased or browsed together with every other product, then sort by that count in descending order to get the list of products most often bought together with it (even the auntie at the mall could do this!). Yes, for us beginners the most brute-force solution is simply to do what the description literally says.
# item_co_occurrence[x][y] = number of times x and y were bought or browsed together
item_co_occurrence = {}
item_co_occurrence["A"] = {}
item_co_occurrence["A"]["B"] = 2
item_co_occurrence["A"]["C"] = 1
item_co_occurrence["A"]["D"] = 3
item_co_occurrence["B"] = {}
item_co_occurrence["B"]["A"] = 2
item_co_occurrence["B"]["D"] = 1
item_co_occurrence["C"] = {}
item_co_occurrence["C"]["A"] = 1
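To make the counting step concrete, here is a minimal Python sketch of how such a table could be filled from raw orders and then sorted. The function names build_co_occurrence and also_bought and the sample orders are made up for illustration; the orders were chosen so that the counts for A, B and C match the ones above.

```python
from collections import defaultdict
from itertools import permutations


def build_co_occurrence(transactions):
    """Count, for every ordered pair (a, b), how many transactions contain both."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in transactions:
        # Deduplicate so an item that appears twice in one order is counted once.
        for a, b in permutations(set(items), 2):
            counts[a][b] += 1
    return counts


def also_bought(counts, item, top_n=10):
    """Return the items most often seen together with `item`, most frequent first."""
    neighbours = counts.get(item, {})
    # Sort by count descending, then by name so that ties are deterministic.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical orders, one list of items per transaction.
transactions = [
    ["A", "B", "D"],
    ["A", "B"],
    ["A", "C"],
    ["A", "D"],
    ["A", "D"],
]

item_co_occurrence = build_co_occurrence(transactions)
print(also_bought(item_co_occurrence, "A"))  # [('D', 3), ('B', 2), ('C', 1)]
```

The table is kept symmetric (both A→B and B→A are stored) so that a lookup for any product is a single dictionary access, at the cost of storing every pair twice.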
But if you do it this way, a problem appears... The auntie fumes to the manager: "According to my results, the product most often bought by users who bought XXX is a plastic bag! Damn it, what a rip-off, this is @&#¥%..." With simple co-occurrence counts, the top of the candidate list is easily taken over by popular items, or by items that are bought all the time but are useless as recommendations.
A slightly cleverer auntie says: "I'll add a preprocessing step and delete the popular items and the common-but-useless items, such as plastic bags." That sounds right, but how do you decide what to delete?
At this point a colleague who has worked on search engines stands up and says: can't we reuse the keyword-extraction ideas from text processing? A "common but useless item" is really a stop word, and an item that is popular but contributes little to telling things apart can be down-weighted with TF-IDF. Indeed, look up TF-IDF on Baidu and you will find that counting is only one of its steps. So the pairwise relatedness of items can be computed with more principled measures, such as mutual information, TF-IDF-like weighting, or cosine distance. Of course, you also need to check whether your own data has an unusual distribution.
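As a rough illustration of why reweighting helps, the sketch below compares the raw count with two of the measures mentioned: cosine similarity between the items' transaction vectors, and pointwise mutual information (PMI). The function name pair_scores and the sample shopping baskets are invented for this example, and the formulas are one common choice, not necessarily what a production system would use.

```python
import math
from collections import Counter
from itertools import combinations


def pair_scores(transactions):
    """Score every unordered item pair by raw count, cosine similarity and PMI."""
    item_count = Counter()  # number of transactions containing each item
    pair_count = Counter()  # number of transactions containing both items
    for items in transactions:
        unique = sorted(set(items))
        item_count.update(unique)
        pair_count.update(combinations(unique, 2))

    n = len(transactions)
    scores = {}
    for (a, b), co in pair_count.items():
        # Cosine similarity of the two binary "appears in transaction" vectors.
        cosine = co / math.sqrt(item_count[a] * item_count[b])
        # PMI: log of P(a, b) / (P(a) * P(b)); 0 means "no more than chance".
        pmi = math.log(co * n / (item_count[a] * item_count[b]))
        scores[(a, b)] = {"count": co, "cosine": cosine, "pmi": pmi}
    return scores


# Hypothetical baskets in which every order includes a plastic bag.
transactions = [
    ["camera", "memory card", "plastic bag"],
    ["camera", "memory card", "plastic bag"],
    ["camera", "plastic bag"],
    ["milk", "plastic bag"],
    ["bread", "plastic bag"],
    ["bread", "milk", "plastic bag"],
]

scores = pair_scores(transactions)
for pair in [("camera", "plastic bag"), ("camera", "memory card")]:
    print(pair, scores[pair])
```

By raw count the plastic bag beats the memory card as a companion for the camera (3 versus 2), but both cosine similarity and PMI rank the memory card higher, because the plastic bag appears in every basket and therefore says nothing specific about cameras.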
After adding preprocessing and adjusting the formula several times, the auntie confidently hands her tool to the long-suffering programmer and says: "I got good results out of it, go ahead and use it!" But when the programmer actually tries it, he finds that although the results look fine to the naked eye, there is a problem: "We are an e-commerce business, and a single transaction may contain many products. How am I supposed to run all of this on a single machine?!" The auntie flips the table (╯‵-′) ╯︵┻━┻ and says: "Damn it, you have to care about both quality and efficiency! Just run it on a few more cheap machines." With one remark the auntie has revealed the true meaning, and the programmer thinks of the magic weapon, Hadoop! He rewrites the auntie's program as a map-reduce job: the map phase enumerates the item pairs in each transaction and sorts each pair alphabetically, and the reduce phase merges the counts.
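The shape of that job can be sketched in plain Python, without Hadoop itself. This is only a single-process simulation under assumed names (map_phase, reduce_phase, run_job); in a real cluster the grouping step in the middle is done by the framework's shuffle, and the map and reduce functions run on different machines.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction):
    """Emit ((a, b), 1) for each item pair in one transaction, with a < b."""
    # Sorting puts every pair in alphabetical order, so ("A", "B") and
    # ("B", "A") become the same key and end up at the same reducer.
    for pair in combinations(sorted(set(transaction)), 2):
        yield pair, 1


def reduce_phase(pair, values):
    """Merge all the partial counts emitted for one pair."""
    return pair, sum(values)


def run_job(transactions):
    # Stand-in for the shuffle: group the mapped values by key.
    grouped = defaultdict(list)
    for transaction in transactions:
        for key, value in map_phase(transaction):
            grouped[key].append(value)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical orders; item order inside a transaction does not matter.
transactions = [
    ["B", "A", "D"],
    ["A", "B"],
    ["C", "A"],
    ["D", "A"],
    ["A", "D"],
]

pair_counts = run_job(transactions)
print(pair_counts[("A", "D")])  # 3
```

Because each transaction is mapped independently and each pair is reduced independently, the work can be split across machines by transaction in the map phase and by pair key in the reduce phase.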
Well, the first version of the recommendation system is complete! Both its quality and its performance are still pretty rough, but thanks to the auntie for her guidance! We look forward to the auntie solving the problem in a better way next time.
Casual Talk on Recommendation Systems (1)