Big data is the map of the physical world in the network world, an unprecedented effort by humanity to draw its own portrait on the network. The network world and the physical world are not isolated from each other: the network world is a reflection of the physical world. Data is the DNA that seamlessly connects the two. Discovering and recombining this data DNA is a continuous process through which people come to understand, explore, and practice big data.
Figure 1 Big Data development path
Chen Wan divides network portraits into eight categories (behavioral portraits, health portraits, corporate credit portraits, personal credit portraits, static product portraits, rotating equipment portraits, social portraits, and economic portraits) and explains each through practical cases.
In the future, every stage of life's journey will be driven by data.
Figure 2 Data-driven life
In the future, the entire life cycle of a device will also be data-driven.
Figure 3 Data-driven automotive full life cycle (SEA-view consulting)
Dr. Liu started from Percent's recommendation engine and explored its four engines in depth.
Scene engine: the core of personalization. It determines which stage of the shopping process the user is in and what the user's shopping goal is.
Rule engine: the core of the business. It combines user data, scene data, and algorithm output with business KPIs to decide what content to recommend to the user.
Algorithm engine: computes the similarity between users, the similarity between products, users' ratings of products, user groupings, popularity rankings, and so on.
Presentation engine: presents the recommended content to the user in the best possible way.
The core of the recommendation engine is data about the shopping process, which in turn rests on data about the user. How should user data be organized? The answer is the user portrait.
Dr. Liu vividly illustrated what a user portrait is with several examples of portraits from everyday life.
A user portrait has several characteristics: its object, mode, organization, standard, and verification.
He argues that "user profiling" may be a more accurate term than "user portrait", because we describe a person with a limited amount of information rather than capturing the person as a holographic camera would.
From a technical point of view, a person in cyberspace is a bit stream. The way people come to know one another has changed greatly, from "face reading" in physical space to bit-stream analysis in cyberspace. More importantly, people teach machines to recognize individuals automatically from these bit streams according to human-defined rules. This makes it possible to identify financial fraudsters, terrorists, and others among tens of millions of users.
How is this achieved? Just as an image is built from pixels, a portrait needs a basic unit to characterize a person: the label.
A big data user portrait is in fact a mathematical model of a real user. The core question of the model is how to describe the business knowledge system. That knowledge system is an ontology, and ontologies are very complex; labels turned out to be a particularly simple way to implement one. Once the model is built, it is tested in business practice and continually improved and enriched, so that bit streams yield an increasingly accurate understanding of people. The user portrait is not a math game or a purely technical problem; it is a business problem, because the core is how you go about understanding your users. It is where technology and business combine best, and it is also a best practice for linking reality and data.
Li Haifeng shared Percent's practice and examples in user portraits.
He first showed his own portrait as an example. From it one can tell that he is in Beijing, is male, works at Percent, prefers the categories men's shoes and sneakers, and likes brands such as Nike and Adidas. Each label has a weight value. As the portrait shows, Nike has a higher weight than Adidas.
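The weighted labels in this example can be pictured as a small data structure. The following is a minimal sketch, not the system described in the talk: the class name, the dimension names, and all weight values are assumptions chosen for illustration; only the labels themselves and the ordering "Nike above Adidas" come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class UserPortrait:
    """A user portrait as weighted labels grouped by dimension (illustrative only)."""

    user_id: str
    # dimension -> {label: weight}; the weights used below are made-up values
    labels: dict = field(default_factory=dict)

    def set_label(self, dimension: str, label: str, weight: float) -> None:
        """Store or overwrite the weight of one label within a dimension."""
        self.labels.setdefault(dimension, {})[label] = weight

    def top_labels(self, dimension: str, n: int = 3) -> list:
        """Return up to n labels of a dimension, highest weight first."""
        weighted = self.labels.get(dimension, {})
        return sorted(weighted, key=weighted.get, reverse=True)[:n]


portrait = UserPortrait(user_id="u-001")  # hypothetical user id
portrait.set_label("city", "Beijing", 1.0)
portrait.set_label("gender", "male", 1.0)
portrait.set_label("category", "men's shoes", 0.7)
portrait.set_label("category", "sneakers", 0.5)
portrait.set_label("brand", "Nike", 0.8)
portrait.set_label("brand", "Adidas", 0.6)

print(portrait.top_labels("brand"))
```

Keeping the weight next to each label lets downstream applications rank or filter a user's preferences without knowing how the weights were produced.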
This picture is a feature portrait of Shomen-Guchi, Percent's founder, Chairman and CEO, rendered as a tag cloud.
Percent's portrait labeling system covers demographic attributes, Internet features, marketing features, content preferences, and interest preferences.
Take the attributes of a mobile phone product as an example: they include brand, category, model, time to market, price, color, network, operating system, resolution, screen size, and so on.
The label management system has the following characteristics.
There are many ways to identify a user. Just as an identity card number identifies a person in social life, in cyberspace the identifier is a mobile phone number, cookie, IMEI, email address, Weibo account, or other account. During processing this information is encrypted, so that machines can use it but people cannot read it.
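One common way to get identifiers that "machines know but people do not" is to replace each raw value with a keyed hash before it enters the pipeline. The sketch below uses HMAC-SHA256 from the Python standard library. This is an assumption for illustration: the text only says the information is encrypted and does not name a scheme, and the key, function name, and sample values are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key store, not source code.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(id_type: str, raw_value: str, key: bytes = SECRET_KEY) -> str:
    """Map a raw identifier to a stable token that is not human-readable."""
    normalized = raw_value.strip().lower()
    # Prefix with the identifier type so equal strings of different types differ.
    message = f"{id_type}:{normalized}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


token_a = pseudonymize("email", "Alice@Example.com")
token_b = pseudonymize("email", " alice@example.com ")
token_c = pseudonymize("cookie", "alice@example.com")

print(token_a == token_b)  # same user, same token
print(token_a == token_c)  # different identifier type, different token
```

Because the same input always yields the same token, behavior records can still be joined per user, while the raw phone number or email address never appears in the portrait data.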
Percent's user portrait logical architecture is as shown: data is collected from e-commerce, community, mobile application, Weibo, and other categories of data sources, turned into user portraits, and finally applied in personalized recommendation, user insight, precision marketing, and other areas. Percent's data sources are numerous and large: it serves more than 1,500 customers covering more than 40 industries.
This breadth creates a problem, because each e-commerce provider has its own knowledge system. For example, a netizen may visit e-commerce site A and also e-commerce site B. If the user views a pair of shoes, its category on site A may be shoes - men's shoes - sneakers, while on site B it may be sports - outdoor - men's shoes, so the category descriptions differ. Percent therefore created what it calls the commodity portrait system. Through this system, all tags are mapped onto one label scheme, and a user portrait that spans the whole Web can then be built on those labels. The user portrait is just a starting point, not an end. Based on it, a series of services can be created, such as precision marketing and personalized recommendation.
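The idea of mapping each site's own category path onto one label scheme can be sketched as a lookup table. This is only an illustration under stated assumptions: the site names, the unified path, and the table itself are hypothetical, and a production commodity portrait system would presumably rely on trained models rather than a hand-written dictionary.

```python
from typing import Optional

# Hypothetical table: (site, site-specific category path) -> unified label path.
CATEGORY_MAP = {
    ("site_a", ("shoes", "men's shoes", "sneakers")): ("footwear", "men", "sneakers"),
    ("site_b", ("sports", "outdoor", "men's shoes")): ("footwear", "men", "sneakers"),
}


def unify_category(site: str, path) -> Optional[tuple]:
    """Translate a site-specific category path into the unified scheme.

    Returns None when the path has not been mapped yet.
    """
    return CATEGORY_MAP.get((site, tuple(path)))


label_a = unify_category("site_a", ["shoes", "men's shoes", "sneakers"])
label_b = unify_category("site_b", ["sports", "outdoor", "men's shoes"])

print(label_a == label_b)  # both visits land on the same unified label
```

Once both visits resolve to the same unified label, the behavior on site A and site B can contribute to one label in the user's portrait instead of two unrelated ones.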
The figure shows the technical architecture of the user portrait. It has five layers. The first is the data sources. The second is the collection services, a set of data collection services, including our probe, that capture user behavior in real time. The third is data preprocessing, which mainly structures the data. The fourth is the product portrait and our user portrait services. The user portrait is split into two parts: real-time processing puts more emphasis on predicting the user's immediate needs, while offline processing puts more emphasis on the user's long-term preferences. The fifth is a unified data interface backed by a cluster, on top of which a variety of applications can be connected.
The figure shows an example of the user label output process.
User behavior on the Internet falls mainly into e-commerce, social, and media categories. The behaviors differ greatly: e-commerce behavior includes browsing, searching, adding to the shopping cart, adding to favorites, and paying, while social behavior includes liking, forwarding, and commenting.
The next step is to extract page labels. Before that, a model has to be trained: first prepare the training data through labeling and rule generation, then serialize the sample set. Training first yields a weak model and finally a strong model, whose parameters are saved. A check is added at this point: if the results are not good enough, we proceed to another round of optimization. Once the model is in place, it can make predictions. Prediction is divided into four parts: input, input preprocessing, prediction, and output.
At this point the user already has the tag, but is the confidence of this label for the user 1 or 0? This is where user behavior modeling comes in. It rests on two main ideas. First, the higher the cost of a behavior, the higher its weight: placing an order weighs more than viewing. Second, the more recent a behavior, the higher its weight: a mobile phone I looked at today must weigh more than a computer I looked at a week ago. Behaviors can also be divided by scene: first the need arises, then comes the decision, then the conclusion. Based on business considerations, Percent implements a mechanism that accumulates tag weights.
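The two ideas behind behavior modeling, that costlier behaviors weigh more and that recent behaviors weigh more, can be combined into an accumulated tag weight. The following is a minimal sketch under stated assumptions: the behavior weights, the daily decay factor, and the multiplicative form are all invented for illustration and are not taken from the talk.

```python
from datetime import date

# Assumed weights: the more a behavior costs the user, the higher its weight.
BEHAVIOR_WEIGHT = {"view": 1.0, "add_to_cart": 3.0, "order": 5.0}
# Assumed recency factor: each day of age multiplies the score by 0.9.
DAILY_DECAY = 0.9


def accumulate_tag_weights(events, today: date) -> dict:
    """Sum behavior weight times recency factor per tag.

    events is an iterable of (tag, behavior, day) tuples.
    """
    weights = {}
    for tag, behavior, day in events:
        age_days = (today - day).days
        score = BEHAVIOR_WEIGHT[behavior] * DAILY_DECAY ** age_days
        weights[tag] = weights.get(tag, 0.0) + score
    return weights


today = date(2024, 1, 8)
events = [
    ("mobile phone", "view", date(2024, 1, 8)),   # viewed today
    ("computer", "view", date(2024, 1, 1)),       # viewed a week ago
    ("sneakers", "order", date(2024, 1, 1)),      # ordered a week ago
]
weights = accumulate_tag_weights(events, today)

print(weights["mobile phone"] > weights["computer"])  # recency wins
print(weights["sneakers"] > weights["computer"])      # order outweighs view
```

With these assumed numbers, the phone viewed today outranks the computer viewed a week ago, and an order from a week ago still outranks a view from the same day, which matches the two rules stated above.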
This is a case from one of our clients, an airline. The project aims to identify high-value passengers and to optimize capacity resources by analyzing passengers' travel preferences. In the end Percent helped the airline build 5 label categories, 75 subcategories, and tens of thousands of fine-grained labels; some of the results are shown here.
The above covers only what Percent has done so far, and it is still far from enough. Four areas call for deeper thinking and practice. The first is scenarios: the preferences a user shows at home and in the office are not the same. The second is the user's state of mind: when a user looks at women's clothing, whether she is browsing out of boredom or with a purpose should be reflected in different label weights. The third is letting users actively report what they dislike. We put a lot of emphasis on what users like but do too little about what they do not like. We should let users tell us on their own initiative what they dislike, for example that someone does not like to eat onions, so that predictions become much better. The fourth is capturing shifts in user interest quickly. At first we used a half-life, broken down by frequency; could we also break it down by person, for example by scoring according to visits? Take the label for the mobile phone category: a mobile phone enthusiast may still be interested in phones a year later, but someone like me looks only when I want to buy, and if I have not looked for two days, my interest may already have decayed to zero.
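The question of a per-person half-life can be made concrete with the standard exponential half-life formula, weight = initial * 0.5 ** (days / half_life). The sketch below is an illustration under stated assumptions: the segment names, the half-life values, and the cutoff below which interest is treated as zero are all hypothetical.

```python
# Assumed half-lives in days for one label ("mobile phone") per user segment.
HALF_LIFE_DAYS = {"enthusiast": 365.0, "occasional_buyer": 0.5}
# Assumed cutoff: weights below this are treated as "decayed to zero".
INTEREST_THRESHOLD = 0.1


def decayed_weight(initial_weight: float, days_elapsed: float, half_life_days: float) -> float:
    """Exponential decay: the weight halves every half_life_days."""
    return initial_weight * 0.5 ** (days_elapsed / half_life_days)


def still_interested(weight: float, threshold: float = INTEREST_THRESHOLD) -> bool:
    """Treat a weight under the threshold as no remaining interest."""
    return weight >= threshold


enthusiast_after_year = decayed_weight(1.0, 365, HALF_LIFE_DAYS["enthusiast"])
occasional_after_two_days = decayed_weight(1.0, 2, HALF_LIFE_DAYS["occasional_buyer"])

print(still_interested(enthusiast_after_year))      # enthusiast still counts
print(still_interested(occasional_after_two_days))  # occasional buyer does not
```

Exponential decay never reaches exactly zero, so the sketch uses a cutoff to model "attenuated to zero"; choosing the half-life per segment is what lets the same label fade slowly for an enthusiast and within days for an occasional buyer.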
Article Source: http://www.199it.com/archives/337393.html
User Insight in the Era of Big Data: User Portrait Creation (PPT version)