Although I also work in technical research and development, algorithms are an "advanced" technology closely tied to mathematics, and truly understanding them is genuinely difficult for me. But after taking part in several events on algorithms and recommendation systems, I found that this advanced knowledge is already widely used by friends working in software development. With e-commerce booming and recommendation-driven sites springing up everywhere, it is only natural that algorithms have entered everyday life. This is one of the reasons InfoQ made algorithms and recommendation systems a key topic in 2013, alongside mobile development and cloud computing.
As an editor, I need at least a rough working understanding of a technical field before I can offer readers worthwhile content about it. For this reason I talked with several experts in the industry, including Ku Wendong (Sina Weibo @clickstone), initiator of the Resyschina forum, and Lio Jo (Sina Weibo @riso-), former chairman of Baidu's Technical Committee. This article shares what I learned from them, and I hope it helps readers who want to gain some understanding of algorithms.
Why algorithms and recommendation are attracting attention
At the end of 2012, InfoQ took part in a Baidu Technology Salon (Sina Weibo @ Baidu Technology Salon) on the theme of algorithms. The invited speakers were Baidu R&D engineer Zhao and Zhang Shaofeng, COO and technical vice president of Baifendian. A venue meant for 160 people drew nearly 260, a very strong response. The recommendation system conference organized by Resyschina was just as popular: algorithm experts from Hulu (Sina Weibo @hulu_beijing), Facebook, Baidu, Taobao, Tencent and other companies shared their experience, and the event sparked heated discussion on Weibo. The reasons algorithms and recommendation attract this much attention can be summed up in the following points:
From the perspective of industry trends. The information explosion has made information overabundant, and traditional ways of obtaining information no longer meet people's needs well in this environment. Recommendation and personalization technology, as one way of coping with the explosion, has produced good results. It is also being applied more and more widely in industry, which in turn has fed the enthusiasm for recommendation.
From the perspective of the technology itself. The depth and complexity of recommendation technology often put it at the forefront of computer science. For example, a recommendation system often has to handle data on the scale of petabytes (PB, 1 PB = 1024 TB) while responding within seconds, which poses a serious challenge for both architecture and algorithms.
Recommendation is an interaction between a system and people, so a recommender first needs to understand people, and understand its users, better. Lio Jo, former chairman of Baidu's Technical Committee, sees this as the direction in which a more intelligent Internet is developing: systems will become smarter and smarter, which is in itself a very attractive prospect.
Algorithms are not abstruse: the examples prove it
As I said at the beginning of this article, because algorithms are so closely tied to mathematics and seem to come up rarely in everyday work, many people are still somewhat afraid of them. But are algorithms really that deep? In the broad sense, an algorithm is a method that people have distilled for solving a class of problems. A computer algorithm, for the sake of precision and rigor, usually describes the problem and its solution in the formal language of mathematics, and this formal language is not easy for the general public to understand. But an algorithm is in itself only a way of solving a problem; in practice we use many algorithms, some simple and some complex.
To help ordinary engineers understand algorithms better, Lio Jo gave the example of a feature found on many e-commerce sites: the "hot list." A hot list is in fact a simple and effective algorithm: sort products directly by sales volume. On this basis we can analyze further and refine the strategy to address specific practical problems. If we sort purely by sales, popular products can sometimes occupy the list for a long time even after they have gone "stale." For example, a product that sold in large volume in winter is no longer bought in summer, yet because of its historical sales it still holds its place on the list, which is clearly inappropriate. In that case we need a targeted improvement to the algorithm: adding a "time" dimension.
We can take a simple, discrete, one-size-fits-all cutoff, for example counting only the last month's sales. Or we can use a continuous function so that historical sales decay over time, for example by borrowing Newton's law of cooling. (Note: this is the law governing how an object hotter than its surroundings gradually cools as it transfers heat to the surrounding medium. When there is a temperature difference between the object's surface and its environment, the heat lost per unit time per unit area is proportional to that difference; the constant of proportionality is called the heat transfer coefficient.) When designing and improving an algorithm, we need not always invent a one-off solution for each specific practical problem. We can also abstract the problem, turn it into a variant of a known problem, and solve the new problem with existing knowledge and mature algorithms.
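The three variants of the hot list described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales records, the function names, the 30-day window, and the 30-day half-life are all made up for the example. The continuous version relies on the fact that Newton's law of cooling leads to exponential decay of the temperature difference, so each sale is weighted by exp(-rate * age).

```python
import math
from datetime import date

# Hypothetical sales records: (product, sale date, units sold).
SALES = [
    ("down jacket", date(2012, 12, 10), 500),
    ("down jacket", date(2013, 1, 5), 300),
    ("sun hat", date(2013, 6, 20), 120),
    ("sun hat", date(2013, 7, 1), 150),
]


def hot_list_raw(sales):
    """Plain hot list: rank products by total historical sales."""
    totals = {}
    for product, _day, units in sales:
        totals[product] = totals.get(product, 0) + units
    return sorted(totals, key=totals.get, reverse=True)


def hot_list_window(sales, today, window_days=30):
    """Discrete cutoff: count only sales from the last `window_days` days."""
    totals = {}
    for product, day, units in sales:
        age = (today - day).days
        if 0 <= age <= window_days:
            totals[product] = totals.get(product, 0) + units
    return sorted(totals, key=totals.get, reverse=True)


def hot_list_decay(sales, today, half_life_days=30.0):
    """Continuous decay: a sale's weight halves every `half_life_days` days."""
    rate = math.log(2) / half_life_days  # plays the role of the cooling coefficient
    scores = {}
    for product, day, units in sales:
        age = (today - day).days
        if age < 0:
            continue  # ignore records dated in the future
        scores[product] = scores.get(product, 0.0) + units * math.exp(-rate * age)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    today = date(2013, 7, 10)
    print(hot_list_raw(SALES))
    print(hot_list_window(SALES, today))
    print(hot_list_decay(SALES, today))
```

With this sample data and a summer date, the raw list still puts the down jacket first, the one-month window drops it entirely, and the decayed list keeps it but ranks it below the sun hat. The decay version avoids the sharp edge of the cutoff at the cost of one extra parameter to tune.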
In fact, the basic principles of many algorithms come from common sense. Suppose you want to buy a mobile phone, but so many phones meet your basic requirements that you cannot decide. So you look at which phones your co-workers and friends are using and ask them what they think of those phones, to help you make up your mind. You do this because you believe their habits and needs are closer to yours than those of other people (say, your elders or juniors), so their opinions are more useful to you. This is the famous collaborative filtering algorithm, the most classic algorithm in the recommendation field. Its core idea is to estimate how much you will like something from how much the people most similar to you like it.
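The phone example can be turned into a minimal sketch of user-based collaborative filtering. Everything here is an assumption made for illustration: the ratings, the names, the use of cosine similarity over commonly rated items, and the choice of the two nearest neighbours. Real systems use far larger data and more careful similarity measures.

```python
import math

# Hypothetical ratings on a 1-5 scale; "you" have not rated phone C yet.
RATINGS = {
    "you": {"phone A": 5, "phone B": 1},
    "coworker": {"phone A": 5, "phone B": 1, "phone C": 5},
    "friend": {"phone A": 4, "phone B": 2, "phone C": 4},
    "elder": {"phone A": 1, "phone B": 5, "phone C": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity over the items that both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(a[item] ** 2 for item in shared))
    norm_b = math.sqrt(sum(b[item] ** 2 for item in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings, user, item, k=2):
    """Estimate `user`'s rating of `item` from the k most similar users who rated it."""
    neighbours = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    top = sorted(neighbours, key=lambda pair: pair[0], reverse=True)[:k]
    total_similarity = sum(similarity for similarity, _ in top)
    if total_similarity == 0:
        return None  # nobody similar enough has rated this item
    # Similarity-weighted average of the neighbours' ratings.
    return sum(similarity * rating for similarity, rating in top) / total_similarity


if __name__ == "__main__":
    print(predict(RATINGS, "you", "phone C"))
```

In this toy data the co-worker and the friend rate phones much as "you" do, so their high opinion of phone C dominates the estimate, while the elder, whose tastes run the other way, is left out of the two nearest neighbours.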