Making Big Data "Live" in Practice: Crowd Labels and Association Mining Between Labels


Introduction

At the beginning of 2013, the 85th Academy Awards ceremony was held in Hollywood. Before the ceremony, David Rothschild, an economist at Microsoft Research in New York, predicted the Oscar winners through big data analysis. His predictions were correct for every award except Best Director. This was not the first time he had made accurate predictions: in the 2012 US presidential election, he correctly predicted the results in 50 of the 51 electoral districts, an accuracy of more than 98%.

The arrival of the big data era has made data central to prediction, analysis, and optimization across industries. How to make big data deliver its fundamental value, so that it is genuinely usable, is a technical problem that data and algorithm scientists around the world are working hard to solve.

Finding the relationships between data

In his 1980 book The Third Wave, Alvin Toffler predicted: "If IBM's mainframes raised the curtain on the information revolution, then 'big data' is the crowning movement of the Third Wave."

Now that data is measured in zettabytes (1 ZB = 10^21 bytes), extracting and filtering valuable relational information from massive data is a challenge for every data practitioner. Establishing relationships between data is the only way to make big data "live".

In daily life we often see this: after searching Google or Baidu for keywords such as "mascara", "smudge-proof", "thick", and "slender", the results page frequently shows ads for mascara. It seems as if these search engines know exactly what we want and what we are interested in.

It isn't magic, just algorithms. Through data collection, modeling, and analysis, users, their search terms, and the ads related to those search terms are linked together. So when we search, it is not surprising to see a matching ad.
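
The article does not say how search terms are linked to ads. As a minimal sketch, assuming each ad is tagged with a set of keywords and that ads are ranked by Jaccard similarity (shared keywords divided by all distinct keywords) with the query, the matching could look like this. The ad inventory and the names `tokenize`, `jaccard`, and `best_ads` are hypothetical, not any search engine's actual method.

```python
from __future__ import annotations

# Hypothetical ad inventory: ad name -> keywords the ad is associated with.
ADS = {
    "mascara_promo": {"mascara", "smudge-proof", "thick", "slender"},
    "lipstick_promo": {"lipstick", "matte", "long-lasting"},
    "car_promo": {"car", "suv", "test-drive"},
}


def tokenize(query: str) -> set[str]:
    # Lowercase and split on whitespace; real engines use proper word segmentation.
    return set(query.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    # Shared keywords divided by all distinct keywords in either set.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_ads(query: str, ads: dict[str, set[str]], top_n: int = 2) -> list[str]:
    # Score each ad against the query, drop non-matches, return the best top_n.
    words = tokenize(query)
    scored = [(jaccard(words, keywords), name) for name, keywords in ads.items()]
    matches = sorted((pair for pair in scored if pair[0] > 0), key=lambda p: (-p[0], p[1]))
    return [name for _, name in matches[:top_n]]


print(best_ads("smudge-proof thick mascara", ADS))  # ['mascara_promo']
```

Jaccard similarity is used here only because it is simple and bounded between 0 and 1; a production system would weight keywords and combine many more signals.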

Recently, the US PRISM program has drawn worldwide attention, and topics such as personal privacy keep coming up. Amid the controversy, as IT giants were dragged in by Snowden's revelations, the pioneering concept of "big data" has once again been pulled into the spotlight.

Some people have even "helpfully" suggested that anyone planning to study in the United States should frequently mention sensitive phrases such as "how to make a bomb with a pressure cooker" or "how to make TNT" when phoning family or friends, to increase the workload of US intelligence analysts. But does this approach really work?

In fact, data without regularity and structure is meaningless, and US data analysts are clearly aware of this. Collecting data from phone records, web footprints, and similar sources only completes the "big" part of big data. Data generates real value, and becomes "live", only when these fragments are analyzed and matched back to personal information such as a person's real identity, personality, consumption habits, and needs.

According to US data analysts, the time of a phone call, its duration, and the location of the number alone are enough to estimate the likelihood that the call is connected to a terrorist attack. This conclusion comes from analyzing a huge volume of user call data and establishing links between that data and terrorist attacks.

Reliable data models

"The model we created is one that can predict the future, not just describe what happened in the past," says David Rothschild. "The science is the same, but proving which data is most useful is very different."

Traditionally, data was collected and tallied by hand. In the Internet era, most data comes from machines, which capture and store it automatically and import it into databases in batches for later analysis and use.

For example, a camera recording traffic on a street monitors road conditions in real time, 24 hours a day, and its records are collated into statistics for later analysis. Recording road conditions with a camera is clearly more effective than the traditional approach of traffic police counting vehicles from a sentry post, but it also places higher demands on data analysis.

In the big data age, as data grows exponentially, the leading role in data processing and analysis has shifted from statisticians and analysts to programmers and algorithm engineers. They build numerous complex mathematical models and keep tuning them to find subtle links between data and apply those links across channels.

Setting aside the controversy over whether the PRISM program should exist at all, from a purely technical point of view PRISM is inseparable from the development of the big data age.

Admittedly, each individual's behavior may differ, but all behavior follows patterns. By acquiring and analyzing massive data, we can extract useful information about people's behavioral habits. Once enough data has accumulated, scientists build models to find relationships in it, then infer each person's habits and provide analysis. The PRISM program works the same way: it collects, models, and analyzes massive data to find links between a single individual and events such as terrorist attacks or latent crimes, and then plans an appropriate response.

Similar approaches appear elsewhere. The LAPD has analyzed decades of crime records to predict the patterns and frequency of criminal behavior and then deploy officers accordingly; advertisers analyze the purchasing behavior of huge numbers of customers to understand them and run targeted marketing that grows their business. Easy Media's crowd labeling algorithm is a data analysis model that helps advertisers understand users and improve ad effectiveness, and a real-world practitioner of "live" big data.

A practitioner of "live" big data

American journalist Wallace once said: "If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck."

Easy Media's crowd labeling algorithm helps advertisers find the "ducks".

In marketing, the story of beer and diapers is well known. Most people would not understand why placing diapers and beer, two very different products, together boosts sales of both. It turns out that mothers often ask their husbands to buy diapers for the children on the way home from work, and the husbands pick up beer for themselves at the same time. This discovery brought businesses considerable profit. In the vast but disorganized data of the Internet, finding links like the one between beer and diapers is the core value of the crowd labeling algorithm.
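
Patterns like beer and diapers are conventionally quantified with the standard association-rule measures support, confidence, and lift. The article does not say which measures Easy Media uses, so the following is the textbook formulation, for a set of transactions T of size N and itemsets X and Y:

```latex
\begin{aligned}
\operatorname{supp}(X) &= \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{N} \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
  = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}
\end{aligned}
```

Lift above 1 means X and Y appear together more often than they would if they were independent. As a purely hypothetical illustration: if 100 of 1,000 baskets contain diapers, 200 contain beer, and 50 contain both, then supp = 0.05, conf(diapers ⇒ beer) = 0.05 / 0.10 = 0.5, and lift = 0.5 / 0.2 = 2.5.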

The crowd labeling algorithm first partitions Internet users into groups based on their online behavior, then analyzes the attributes that different groups have in common, establishes relationships between the groups, and applies them to subsequent ad delivery.
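
The article does not name its partitioning method. One common choice for grouping users by behavior is k-means clustering, sketched below under the assumption that each user is summarized as a vector of weekly visit counts to a few site categories. The feature columns, the sample users, and the deterministic first-k initialization are all illustrative assumptions.

```python
# Minimal k-means sketch for partitioning users by behavior.
# Assumptions: features, sample users, and first-k initialization are
# illustrative; the article does not name its partitioning method.
from __future__ import annotations

import math


def kmeans(points: list[list[float]], k: int, iters: int = 20) -> list[int]:
    # Deterministic start: the first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical weekly visit counts: [car sites, cosmetics sites, game sites].
users = ["u1", "u2", "u3", "u4", "u5"]
behavior = [[7, 0, 1], [0, 8, 1], [6, 1, 0], [1, 7, 0], [7, 1, 1]]

labels = kmeans(behavior, k=2)
for cluster in sorted(set(labels)):
    members = [u for u, lab in zip(users, labels) if lab == cluster]
    print(f"segment {cluster}: {members}")
```

Real systems would normalize features and choose k and the initial centroids more carefully (for example with k-means++); first-k initialization is used here only to keep the output deterministic.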

It is like a supermarket discovering that Old Zhang bought 2 bottles of beer and 4 bags of peanuts. For the supermarket, understanding the drinking habits of one Old Zhang after another is meaningless. What the store needs to know is how many Old Zhangs there are, and how many Old Lis with different drinking habits. Only by separating the Old Zhangs, who drink beer with peanuts, from the Old Lis, who drink dry white wine with cashews, into different customer groups does the data become meaningful. For example, it is enough to know that among 100 drinking customers, 30 are Old Zhangs who drink beer with peanuts, 10 are Old Lis who drink dry white wine with cashews, and 20 are Old Wangs who drink rice wine with dried tofu. From this we can see that beer is related to peanuts, dry white wine to cashews, and rice wine to dried tofu, so these products can be promoted together or displayed near each other.
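
The supermarket example can be reproduced by counting how many customers share each purchase profile. The counts 30, 10, and 20 come from the text; the remaining 40 of the 100 customers are not described there, so this sketch assigns them a placeholder profile purely so the total is 100.

```python
from collections import Counter

# The article's example: 100 drinking customers, each reduced to the set of
# items they buy together. The 40 customers the text does not describe get
# a placeholder profile (an assumption) so that the total is 100.
baskets = (
    [frozenset({"beer", "peanuts"})] * 30               # the "Old Zhangs"
    + [frozenset({"dry white wine", "cashews"})] * 10   # the "Old Lis"
    + [frozenset({"rice wine", "dried tofu"})] * 20     # the "Old Wangs"
    + [frozenset({"unspecified"})] * 40                 # not described
)

# Group identical purchase profiles into customer segments.
segments = Counter(baskets)
total = len(baskets)

for profile, count in segments.most_common():
    label = " + ".join(sorted(profile))
    print(f"{label:<26} {count:>3} customers ({count / total:.0%})")
```

The items shared within each segment (beer with peanuts, and so on) are the co-purchased products the store could promote together or shelve side by side.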

Easy Media's crowd labeling algorithm segments the "Old Zhangs", "Old Lis", and "Old Wangs" of the Internet, finds what they really care about, turns that into labels, and then analyzes and associates the relationships between the labels. For example, suppose we find that an Old Zhang who visits car websites every day also often searches for "LED TV", so we mark him with both the "car" and "LED TV" crowd labels. When we find that thousands of Old Zhangs carry both labels, the two labels appear to be connected, and we can show LED TV ads to these Old Zhangs. In the past, when such judgments relied only on experience, cars and LED TVs, like beer and diapers, seemed to be unrelated products, so it is not surprising that no one saw the link between them.
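
One way to test whether two labels such as "car" and "LED TV" are linked is to count, across users, how often each label and each pair of labels occurs, then keep pairs that are both frequent and have lift above 1, where lift is P(a and b) / (P(a) P(b)). This is a sketch under stated assumptions: the user-label data is invented, and `label_associations`, `min_users`, and `min_lift` are hypothetical names, not Easy Media's API.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def label_associations(
    user_labels: dict[str, set[str]], min_users: int = 2, min_lift: float = 1.0
) -> list[tuple[str, str, int, float]]:
    n = len(user_labels)
    single = Counter()  # users carrying each label
    pair = Counter()    # users carrying each unordered pair of labels
    for labels in user_labels.values():
        single.update(labels)
        # sorted() gives each unordered pair a stable (a, b) key.
        pair.update(combinations(sorted(labels), 2))
    results = []
    for (a, b), both in pair.items():
        if both < min_users:
            continue  # too few users to trust the pair
        # lift = P(a and b) / (P(a) * P(b)); above 1 means more than chance.
        lift = (both / n) / ((single[a] / n) * (single[b] / n))
        if lift > min_lift:
            results.append((a, b, both, round(lift, 2)))
    return sorted(results, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical crowd labels for six users.
USER_LABELS = {
    "u1": {"car", "led_tv"},
    "u2": {"car", "led_tv", "travel"},
    "u3": {"car", "led_tv"},
    "u4": {"cosmetics", "travel"},
    "u5": {"cosmetics"},
    "u6": {"cosmetics", "travel"},
}

for a, b, both, lift in label_associations(USER_LABELS):
    print(f"{a} & {b}: {both} users, lift {lift}")
```

With these sample users, "car" and "led_tv" co-occur for 3 of 6 users against an expected 1.5 under independence, giving a lift of 2.0; on real data, `min_users` would be set far higher, in line with the "thousands of Old Zhangs" in the text.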

The vast amount of data on the Internet can not only be refined and classified into practical system tools; in actual implementation, data is everywhere and can be put to wider use. The crowd labeling algorithm, which gives data vitality, is a typical example of making big data "live". It collects online behavior data, extracts and marks product-oriented interest tags, clusters the crowd labels, and analyzes how crowd behavior and interests trend and shift over time.
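
The tag-extraction and trend-analysis steps described above can be sketched as follows (the clustering step is omitted here). The keyword-to-tag rule table, the event format `(week, user, text)`, and the function names are hypothetical; a production system would use far richer extraction than substring matching.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical rule table: keyword found in a URL or search term -> interest tag.
KEYWORD_TAGS = {
    "suv": "car",
    "sedan": "car",
    "led tv": "led_tv",
    "mascara": "cosmetics",
}


def extract_tags(text: str) -> set[str]:
    # Mark every tag whose keyword appears in the raw behavior text.
    lowered = text.lower()
    return {tag for keyword, tag in KEYWORD_TAGS.items() if keyword in lowered}


def weekly_tag_users(events: list[tuple[int, str, str]]) -> dict[str, dict[int, int]]:
    # events: (week number, user id, raw text such as a search query).
    seen = defaultdict(lambda: defaultdict(set))
    for week, user, text in events:
        for tag in extract_tags(text):
            seen[tag][week].add(user)
    # Count distinct users so repeated searches by one user count once.
    return {
        tag: {week: len(users) for week, users in sorted(weeks.items())}
        for tag, weeks in seen.items()
    }


def trend(weekly: dict[int, int], week: int) -> int:
    # Change in distinct tagged users compared with the previous week.
    return weekly.get(week, 0) - weekly.get(week - 1, 0)


EVENTS = [
    (1, "u1", "best SUV 2013"),
    (1, "u1", "suv review"),
    (1, "u2", "sedan prices"),
    (1, "u5", "waterproof mascara"),
    (2, "u1", "SUV loan"),
    (2, "u2", "sedan test drive"),
    (2, "u3", "42-inch LED TV deals"),
    (2, "u4", "compact SUV"),
]

counts = weekly_tag_users(EVENTS)
print(counts)
print("car trend in week 2:", trend(counts["car"], 2))
```

Counting distinct users rather than raw events keeps one very active user from inflating a tag's apparent popularity.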

Through continuous, multi-channel collection and management of massive data, Easy Media covers audiences from offline to online and from online to mobile, segments them at an extremely fine ("nano-scale") granularity, helps advertisers find their most likely customers, and supports powerful audience management. It provides a three-tier audience tag structure with more than 13,000 tags in total, covering 26 categories of demographic attributes; behavioral interests divided into 20 major and 159 minor categories; and 3 major product-industry categories with thousands of industry product-intent subdivisions.
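
A three-tier tag structure like the one described maps naturally onto a nested mapping: top-level group, then category, then concrete tags. The entries below are invented placeholders, not Easy Media's actual taxonomy, and only a handful of tags are shown instead of more than 13,000.

```python
from __future__ import annotations

# Hypothetical three-tier audience-tag taxonomy:
# tier 1 (top-level group) -> tier 2 (category) -> tier 3 (concrete tags).
TAXONOMY: dict[str, dict[str, list[str]]] = {
    "population attributes": {
        "gender": ["male", "female"],
        "age": ["18-24", "25-34", "35-44"],
    },
    "behavioral interest": {
        "automotive": ["suv", "sedan"],
        "home electronics": ["led tv"],
    },
    "product intent": {
        "auto purchase": ["suv purchase intent"],
    },
}


def count_tags(taxonomy: dict[str, dict[str, list[str]]]) -> int:
    # Total number of tier-3 tags across all groups and categories.
    return sum(len(tags) for cats in taxonomy.values() for tags in cats.values())


def tag_path(
    taxonomy: dict[str, dict[str, list[str]]], tag: str
) -> tuple[str, str, str] | None:
    # Return (group, category, tag) for a tier-3 tag, or None if absent.
    for group, cats in taxonomy.items():
        for category, tags in cats.items():
            if tag in tags:
                return (group, category, tag)
    return None


print(count_tags(TAXONOMY))
print(tag_path(TAXONOMY, "led tv"))
```

For a taxonomy with tens of thousands of tags, a reverse index from tag to path would avoid the linear scan in `tag_path`; the nested form is kept here because it mirrors the three tiers directly.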

In the big data age, the biggest innovation is that algorithm scientists and data analysts can keep adjusting and optimizing data models to understand relationships between data that the human brain cannot handle. We are surrounded by data, and our daily lives continually generate data that is collected to guide and optimize computing.

By combining the relatively rational analysis of big data with the brain's intuitive way of thinking, we can reach more cost-effective conclusions and more efficient solutions when we face decisions and judgments at a crossroads. All of this is the wealth and value that big data brings us. Easy Media will spare no effort to keep advancing the use of "live" big data in Internet advertising.

(Responsible editor: The good of the Legacy)
