Opening
I recommend several articles on recommendation engines that I personally found very helpful for getting started. They come from IBM, for example:
Exploring the secrets of the recommendation engine, Part 1: A first look at recommendation engines
Exploring the secrets of the recommendation engine, Part 2: Recommendation engine algorithms in depth: collaborative filtering
Exploring the secrets of the recommendation engine, Part 3: Recommendation engine algorithms in depth: clustering
I also recommend two books:
Xiang Liang: "Recommendation System Practice"
Lanci: "Recommendation System"
What is a recommendation system?
Recommendation means putting the products you may like in front of you. Building a recommendation system is the process of building a product that does exactly that.
It is often said that recommendation is all about algorithms. From a certain point of view, this is not correct. Before getting hands-on with a recommendation system, it is better not to start by studying algorithms: the word "algorithm" sounds advanced and intimidating, it inspires a kind of awe, and the whole subject starts to seem magical.
For those of us working on projects without much recommendation theory behind us, it is better to get started in practice first. We have no shortage of practical opportunities: first understand recommendation through hands-on work, then deepen that understanding by reading books and studying algorithms, and then raise our level by trying different recommendation approaches and evaluating their effect objectively.
As a first step, once we have worked through a recommendation system from end to end and reached entry level, that is, once we can independently build a recommendation system for a site with a modest PV (page view) volume, the main takeaways are probably these:
(1) Recommendation is an end-to-end calculation process; in the coding work, the algorithm part may account for less than 1% of the workload;
(2) Choosing each recommendation approach is itself an end-to-end calculation process.
Building a recommendation system at this PV scale is relatively easy: a day's log is only a few hundred MB, and the data used in the calculation fits in the memory of a single machine. When PV reaches hundreds of millions or billions, somewhat more complex distributed computing becomes necessary.
There are many ways to compute recommendations, and the effect of any particular choice is hard to predict; a choice is only meaningful after repeated horizontal and vertical comparisons of the results.
As understanding deepens and perspective broadens, our knowledge and views will keep being adjusted ...
The recommendation calculation process
Data sources for the calculation
Web access logs, purchases, and favorites: these are all user behavior data;
Users: basic data for the analysis;
Products: basic data for the analysis;
Planning the log storage format
How do we mark the same user across requests when they are not logged in, and how do we identify which requests come from not-logged-in users and which from logged-in users?
This is very important: it is the basis for all later log analysis.
A sample log line looks like this:
27.189.237.91 - - [27/Jun/2014:15:00:01 +0800] "GET a URL HTTP/1.1" 200 75 "Previous URL" "95907011.390482691.1402709325.1403851977.1403852394.7" "95907011.8a8a8aeb385a8c6b013860df24501310" [---] [image/webp,*/*;q=0.8] "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36"
In the web log line above, 95907011.390482691.1402709325.1403851977.1403852394.7 and 95907011.8a8a8aeb385a8c6b013860df24501310 are recorded by the Google Analytics JS code; they hold the ID of the not-logged-in user and the ID of the logged-in user, respectively.
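As a rough sketch of how the two IDs might be pulled out of such a line, the snippet below reads the quoted fields with a regular expression. The field order is read off a cleaned-up copy of the sample line. The function name `parse_ids`, the placeholder URLs, and the assumption that the part after the first dot identifies the visitor or user are illustrative, not taken from the original project.

```python
import re

# Cleaned-up copy of the sample line; "a-url" and "previous-url" are
# placeholders for the URLs the article leaves out.
SAMPLE = (
    '27.189.237.91 - - [27/Jun/2014:15:00:01 +0800] "GET a-url HTTP/1.1" 200 75 '
    '"previous-url" "95907011.390482691.1402709325.1403851977.1403852394.7" '
    '"95907011.8a8a8aeb385a8c6b013860df24501310" [---] [image/webp,*/*;q=0.8] '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64)"'
)

QUOTED = re.compile(r'"([^"]*)"')


def parse_ids(line):
    """Return (visitor_id, user_id) from one log line, or None if it does not fit."""
    fields = QUOTED.findall(line)
    # Assumed order of quoted fields:
    # request, referer, visitor cookie, user cookie, user agent.
    if len(fields) < 4:
        return None
    visitor_parts = fields[2].split(".")
    user_parts = fields[3].split(".", 1)
    # A missing cookie (for example "-") yields None for that ID.
    visitor_id = visitor_parts[1] if len(visitor_parts) > 1 else None
    user_id = user_parts[1] if len(user_parts) > 1 else None
    return visitor_id, user_id


print(parse_ids(SAMPLE))
# ('390482691', '8a8a8aeb385a8c6b013860df24501310')
```

Keeping both IDs on every line is what later allows behavior recorded before login to be attached to the same person after login.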
As for what the Google Analytics JS code is doing here: it is in fact entirely possible to build a third-party traffic analysis system on the same idea, as follows:
(1) The site whose traffic is to be measured embeds the code, which records cookies and so on and triggers a request to the server (this can be a request for a non-existent image);
(2) When the server receives the request, it takes the visit information out of the request headers and records it as traffic data; a simple servlet is enough for the server-side program.
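The two steps above can be sketched on the server side as follows. The article mentions a servlet; this is a Python stand-in using only the standard library, and the names `build_record` and `BeaconHandler`, the `event` query parameter, and the `traffic.log` file are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlsplit


def build_record(path, headers, client_ip):
    """Collect the traffic-related fields of one beacon request into a dict."""
    query = parse_qs(urlsplit(path).query)
    return {
        "ip": client_ip,
        "page": headers.get("Referer", ""),       # page that embedded the code
        "user_agent": headers.get("User-Agent", ""),
        "cookie": headers.get("Cookie", ""),
        # Hypothetical query parameter carrying extra data from the JS snippet.
        "event": query.get("event", [""])[0],
    }


class BeaconHandler(BaseHTTPRequestHandler):
    """Answers every GET with 204 and appends one JSON record per request."""

    def do_GET(self):
        record = build_record(self.path, self.headers, self.client_address[0])
        with open("traffic.log", "a", encoding="utf-8") as out:
            out.write(json.dumps(record) + "\n")
        self.send_response(204)  # no body: the requested "image" need not exist
        self.end_headers()
```

To try it locally, one could run `HTTPServer(("", 8000), BeaconHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) and request `/beacon.gif?event=view` from a page.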
The first step in the calculation process
From the user behavior data, work out the relationships between users and products: user <--> viewed, user <--> purchased, user <--> favorited, and so on.
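A minimal sketch of this first step, assuming the behavior data has already been reduced to (user, action, product) tuples. The event list and the name `build_relations` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical behavior events: (user_id, action, product_id).
events = [
    ("u1", "view", "p1"),
    ("u1", "view", "p2"),
    ("u2", "view", "p1"),
    ("u2", "buy", "p3"),
    ("u1", "favorite", "p2"),
]


def build_relations(events):
    """Group events into {action: {user_id: set(product_ids)}}."""
    relations = defaultdict(lambda: defaultdict(set))
    for user_id, action, product_id in events:
        relations[action][user_id].add(product_id)
    return relations


relations = build_relations(events)
print(sorted(relations["view"]["u1"]))  # ['p1', 'p2']
```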
The second step in the calculation process
From the data computed in the first step, compute the commonly used recommendation results: for example, "viewed this, also viewed" from the browsing data, "bought this, also bought" from the purchase data, and so on.
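One simple way to get such a result from browsing data is to count, for each pair of products, how many users viewed both. The sketch below assumes the first step produced a user-to-products mapping; the data and the name `also_viewed` are hypothetical, and this is only one of many possible calculations.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical output of step one: user -> set of viewed products.
viewed = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p1", "p2"},
    "u3": {"p2", "p3"},
}


def also_viewed(viewed):
    """Return {product: Counter(other_product -> number of shared viewers)}."""
    result = defaultdict(Counter)
    for products in viewed.values():
        # Every ordered pair (a, b) viewed by the same user counts once for a.
        for a, b in permutations(sorted(products), 2):
            result[a][b] += 1
    return result


co_views = also_viewed(viewed)
print(co_views["p1"].most_common())  # [('p2', 2), ('p3', 1)]
```

The same function applied to purchase data would give the "also bought" counts.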
Algorithms (or rules) in the calculation process
Algorithms are general-purpose mathematical formulas; rules are narrower, complex business rules that a company defines for its own scenarios. In the second step of the calculation process, the final recommendation results are mostly computed with self-defined business rules.
Take "viewed this, also viewed" as an example: given one product, how do we recommend other products?
We can follow the basic meaning of this recommendation type: a product ---> the many people who viewed this product ---> the many other products those people also viewed. Those products are the candidate results, but there are a great many of them, so which ones should be recommended?
We can recommend the ones viewed the most times, recommend the most recent ones, recommend the ones whose viewing audience is most similar to this product's ...
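The three rules just listed can be sketched as three sort orders over the same candidate set. Everything here is illustrative: the `views` data is made up, "latest" is interpreted as the most recent view by a shared viewer, and Jaccard similarity is just one assumed way to measure how similar two viewing audiences are.

```python
# Hypothetical data: product -> {viewer: timestamp of that viewer's latest view}.
views = {
    "p1": {"u1": 100, "u2": 110, "u3": 120},
    "p2": {"u1": 105, "u2": 115, "u5": 116, "u6": 117, "u7": 118, "u8": 119},
    "p3": {"u3": 50},
    "p4": {"u4": 400},
}


def candidates(product):
    """Other products that share at least one viewer with `product`."""
    audience = set(views[product])
    return [p for p in views if p != product and audience & set(views[p])]


def by_count(product):
    """Rule 1: most shared viewers first."""
    audience = set(views[product])
    return sorted(candidates(product),
                  key=lambda p: (-len(audience & set(views[p])), p))


def by_recency(product):
    """Rule 2: most recently viewed by a shared viewer first."""
    audience = set(views[product])

    def latest(p):
        return max(t for u, t in views[p].items() if u in audience)

    return sorted(candidates(product), key=lambda p: (-latest(p), p))


def by_audience_similarity(product):
    """Rule 3: highest Jaccard similarity between the two audiences first."""
    audience = set(views[product])

    def jaccard(p):
        other = set(views[p])
        return len(audience & other) / len(audience | other)

    return sorted(candidates(product), key=lambda p: (-jaccard(p), p))


print(by_count("p1"))                # ['p2', 'p3']
print(by_recency("p1"))              # ['p2', 'p3']
print(by_audience_similarity("p1"))  # ['p3', 'p2']
```

Even on this tiny data set the rules disagree about what to show first, which is why the choice has to be judged by measured effect rather than by the formula alone.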
Interface for the recommendation results
Nothing special here; it is all generic.
The core of the recommendation system
On the business side: an evaluation system based on recommendation effect;
On the technical side: distributed computing for large data volumes.
Code description
Prerequisite projects: there are quite a few related projects; the site, product, and order projects are all relevant.
Latest source code: git clone [email protected]:pumadong/cl-recommend.git
The development of recommendation
Large-scale data computing, real-time stream computing, user behavior analysis, finer-grained user clustering, personalized recommendation.
Perhaps exploring recommendation at a higher level does need the support of recommendation theory, which is a different matter from implementation-level work; this may belong to a different level of understanding, something one only knows after getting there ...
Extension: log analysis and traffic statistics
Analyzing the logs also lets us count the site's traffic, but URLs of static resources such as js/css/img must be filtered out so that only real, effective visits remain.
During a single page visit, the browser sends many requests to the server, downloading the HTML/CSS/img/JS and so on and rendering them into a nice page for the visitor; in the process, a web server such as Nginx actually records many log lines.
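A small sketch of that filtering step, assuming static resources can be recognized by file extension. The extension list, the sample URLs, and the name `is_page_view` are assumptions to adapt to the actual site.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed list of static-resource extensions; adjust to the site.
STATIC_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".webp"}


def is_page_view(url):
    """True if the URL path does not end in a static-resource extension."""
    path = urlsplit(url).path  # drops the query string, e.g. "?v=3"
    _, ext = posixpath.splitext(path)
    return ext.lower() not in STATIC_EXTENSIONS


requested = [
    "/product/123.html",
    "/static/app.js?v=3",
    "/static/site.css",
    "/img/logo.PNG",
    "/cart",
]
page_views = [u for u in requested if is_page_view(u)]
print(page_views)  # ['/product/123.html', '/cart']
```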
For traffic statistics, a very common alternative is to embed tracking code in the page. The industry-standard tracking code is Google's GA. The advantage of embedded code is that it can record more information than the log contains: you can define many events and collect much additional information.
At present, for special reasons, Google cannot be accessed directly from within the country, but statistics via the GA code work without problems; the address is: http://www.google-analytics.com/ga.js.
Comparing the two approaches: with log analysis, a log line exists for every request, even if the visitor closes the page before it is displayed; with embedded code, traffic is recorded only once the inserted JS code actually runs. In other words, the former emphasizes access, while the latter emphasizes effective access.
Traffic analysis based on logs needs to filter out crawler IP addresses, whereas embedded code does not, because a crawler typically only fetches the page content and does not run JS; the JS is actually executed for us by the browser's JS engine.
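For the log-based approach, the crawler filter might look like the sketch below. The network ranges are placeholder documentation addresses (RFC 5737), not real crawler ranges; a real list would have to come from the crawler operators or from the site's own observations.

```python
import ipaddress

# Hypothetical crawler address ranges (documentation-only networks).
CRAWLER_NETWORKS = [
    ipaddress.ip_network(net) for net in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_crawler(ip):
    """True if the address falls inside one of the known crawler networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CRAWLER_NETWORKS)


hits = ["27.189.237.91", "192.0.2.15", "198.51.100.7", "27.189.237.91"]
human_hits = [ip for ip in hits if not is_crawler(ip)]
print(len(human_hits))  # 2
```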
In addition, third-party traffic analysis has to use embedded code; log analysis is not an option, because the third party has no access to the site's server logs.
Recommendation system--uncover the magical veil of recommendation