Implement your own recommendation engine based on Lucene

Building a recommendation engine with data-mining algorithms is the most common approach on e-commerce websites and social networking (SNS) communities. Recommendation engines commonly use content-based recommendation algorithms and collaborative filtering algorithms (item-based and user-based); e-commerce recommendation systems were already introduced in detail in "E-commerce Recommendation System Entry v2.0". In practice, however, it is very difficult for most small and medium-sized enterprises to adopt these algorithms in their e-commerce systems.

1. Problems with common recommendation engine algorithms

1) Few mature, complete, off-the-shelf open-source solutions exist

Roughly speaking, open-source projects related to data mining and recommendation engines fall into the following categories:

Data mining: mainly Weka, R-project, KNIME, RapidMiner, Orange, etc.

Text mining: mainly OpenNLP, LingPipe, FreeLing, GATE, etc.; LingPipe's competition list covers more.

Recommendation engines: mainly Apache Mahout, the Duine framework, and singular value decomposition (SVD) packages; for other packages, see "Open Source Collaborative Filtering Written in Java".

Search engines: Lucene, Solr, Sphinx, Hibernate Search, etc.

2) The commonly used recommendation algorithms are relatively complex, so the barrier to entry is high

3) The commonly used recommendation algorithms perform poorly and are not suited to mining massive data sets

Apart from Lucene/Solr, which are relatively mature, most of these packages and algorithms are still used mainly for academic research and cannot be applied directly to large-scale Internet data mining or to production recommendation engines.

2. Advantages of implementing a recommendation engine with Lucene

Many small and medium-sized websites have limited development capacity, so an integrated solution that covers both search and recommendation is naturally attractive to them. Implementing the recommendation engine with Lucene has the following advantages:

1) Lucene has a low barrier to entry, and most websites already use it for on-site search

2) Compared with collaborative filtering algorithms, Lucene performs well

3) Lucene offers many ready-made solutions for text mining, similarity calculation, and related algorithms

Among open-source projects, Mahout and the Duine framework are relatively complete recommendation-engine solutions. Mahout's core in particular makes use of Lucene, so its architecture is well worth studying. However, Mahout's functionality is not yet complete, and building an e-commerce site's recommendation engine directly on it is not yet a mature option. Still, Mahout's implementation shows that using Lucene to implement a recommendation engine is feasible.

3. The core problem when implementing a recommendation engine with Lucene

Lucene is good at text mining, and its contrib package provides MoreLikeThis, which makes content-based recommendation relatively easy to implement. However, Lucene currently has no good solution for results that involve users' collaborative behavior (so-called relevance feedback). We need to fold users' collaborative behavior into Lucene's content-similarity algorithm by converting the results of that behavior into a model Lucene supports.

4. Data sources for the recommendation engine

E-commerce websites typically show recommendations such as "customers who bought this product also bought", "customers who viewed this product also viewed", "more products similar to this one", "customers who like this product also like", and "average user rating of this product".

A Lucene-based recommendation engine therefore deals mainly with the following two kinds of data:

1) Content similarity

For example: product name, author/translator/manufacturer, product category, description, comments, user tags, and system tags

2) Similarity of users' collaborative behavior

For example: tagging, purchases, click streams, searches, recommendations, favorites, ratings, writing comments, Q&A, page dwell time, groups, etc.

5. Implementation scheme

5.1 Content similarity

Implemented with Lucene's MoreLikeThis.
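Roughly, MoreLikeThis scores the terms of a source document by tf·idf, keeps the top few as "interesting terms", and runs them as an OR query against the index. The following is a minimal plain-Java sketch of that idea over an in-memory list of product texts, so it runs without Lucene; the class and method names (MltSketch, interestingTerms, similar), the regex tokenizer, and the smoothed idf formula are illustrative assumptions, not Lucene's API.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Plain-Java illustration of the MoreLikeThis idea; NOT Lucene's API. */
final class MltSketch {
    private final List<String> docs;                      // one text per product (title, tags, summary)
    private final Map<String, Integer> docFreq = new HashMap<>();

    MltSketch(List<String> docs) {
        this.docs = docs;
        // Count in how many documents each term appears.
        for (String d : docs) {
            for (String t : new HashSet<>(tokenize(d))) {
                docFreq.merge(t, 1, Integer::sum);
            }
        }
    }

    /** Naive lowercase, split-on-non-word tokenizer (assumption; Lucene would use an Analyzer). */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Smoothed idf (assumption): rarer terms get larger weights. */
    double idf(String term) {
        int df = docFreq.getOrDefault(term, 0);
        return 1.0 + Math.log((docs.size() + 1.0) / (df + 1.0));
    }

    /** Top-k terms of the source text ranked by tf * idf ("interesting terms"). */
    Map<String, Double> interestingTerms(String source, int k) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(source)) tf.merge(t, 1, Integer::sum);
        return tf.entrySet().stream()
            .map(e -> Map.entry(e.getKey(), e.getValue() * idf(e.getKey())))
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .limit(k)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                                      (a, b) -> a, LinkedHashMap::new));
    }

    /** Scores other docs by summing the weights of matched query terms, like a weighted OR query. */
    List<Integer> similar(int sourceId, int k, int maxResults) {
        Map<String, Double> query = interestingTerms(docs.get(sourceId), k);
        Map<Integer, Double> scores = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            if (i == sourceId) continue;                  // never recommend the source itself
            Set<String> terms = new HashSet<>(tokenize(docs.get(i)));
            double s = 0;
            for (Map.Entry<String, Double> q : query.entrySet()) {
                if (terms.contains(q.getKey())) s += q.getValue();
            }
            if (s > 0) scores.put(i, s);
        }
        return scores.entrySet().stream()
            .sorted(Map.Entry.<Integer, Double>comparingByValue().reversed())
            .limit(maxResults)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```

With a real index, the same selection is tuned through MoreLikeThis settings such as setMinTermFreq and setMaxQueryTerms instead of the hand-written ranking above.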

5.2 Handling users' collaborative behavior

1) Index each collaborative behavior of a user with Lucene, one record per behavior

2) Each index record contains the following important information:

Product name, product ID, product category, product description, tags, and other important features; other products related to the user's behavior; the product thumbnail URL; the collaborative behavior type (purchase, click, favorite, rating, etc.); and the boost value (the weight passed to setBoost for that behavior)

3) Characterize ratings, favorites, clicks, and other collaborative behaviors by the features of the product involved (tags, title, summary)

4) Set different boost values via setBoost for different types of collaborative behavior, such as purchases, ratings, and clicks

5) At search time, use Lucene's MoreLikeThis algorithm, which turns users' collaborative behavior into content similarity
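Steps 1) to 5) can be illustrated with a self-contained plain-Java sketch: each behavior is stored as one record carrying the product's features and a behavior type, each type carries a boost, and at query time a user's records are folded into a weighted feature profile that is matched against products the user has not touched, in the spirit of a boosted MoreLikeThis query. An in-memory list stands in for the Lucene index, and the boost values (purchase 3.0, rating 2.0, favorite 1.5, click 1.0) and all names (BehaviorRecommender, Action, BehaviorRecord, profile, recommend) are hypothetical. Recent Lucene releases (7.0 and later) no longer support index-time boosts, so applying the weight at query time, as the sketch does, is the more portable choice.

```java
import java.util.*;

/** Plain-Java sketch of the behavior-to-content scheme; names and boosts are hypothetical. */
final class BehaviorRecommender {
    /** Behavior types with hypothetical boost values; tune these per site. */
    enum Action {
        PURCHASE(3.0), RATE(2.0), FAVORITE(1.5), CLICK(1.0);
        final double boost;
        Action(double boost) { this.boost = boost; }
    }

    /** One record per behavior: who did what to which product, plus the product's features. */
    static final class BehaviorRecord {
        final String userId;
        final String productId;
        final Set<String> features;   // tags, title words, category of the product
        final Action action;
        BehaviorRecord(String userId, String productId, Set<String> features, Action action) {
            this.userId = userId;
            this.productId = productId;
            this.features = features;
            this.action = action;
        }
    }

    private final List<BehaviorRecord> index = new ArrayList<>();   // stands in for the Lucene index

    void add(BehaviorRecord r) { index.add(r); }

    /** Folds a user's behavior into feature weights, each feature weighted by the behavior boost. */
    Map<String, Double> profile(String userId) {
        Map<String, Double> weights = new HashMap<>();
        for (BehaviorRecord r : index) {
            if (!r.userId.equals(userId)) continue;
            for (String f : r.features) weights.merge(f, r.action.boost, Double::sum);
        }
        return weights;
    }

    /** Ranks products the user has not interacted with by their summed profile weights. */
    List<String> recommend(String userId, Map<String, Set<String>> catalog, int n) {
        Map<String, Double> query = profile(userId);
        Set<String> seen = new HashSet<>();
        for (BehaviorRecord r : index) {
            if (r.userId.equals(userId)) seen.add(r.productId);
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> p : catalog.entrySet()) {
            if (seen.contains(p.getKey())) continue;
            double s = 0;
            for (String f : p.getValue()) s += query.getOrDefault(f, 0.0);
            if (s > 0) scores.put(p.getKey(), s);
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));   // highest score first
        return ids.subList(0, Math.min(n, ids.size()));
    }
}
```

For example, a user who bought a product tagged "java" and "lucene" and clicked a cooking product would be shown an unseen Lucene book ahead of an unseen cooking book, because the purchase boost outweighs the click boost.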

The scheme above is only the simplest recommendation engine that can be built on Lucene; its accuracy and a more detailed design will be discussed later.

For a more detailed implementation, Mahout's algorithms can serve as a reference for optimization.

Source: http://www.yeeach.com