President Obama has big data mining to thank. Because it had a firm command of big data, his campaign team was confident of victory long before the results of last November's election were announced. How did this happen?
When the Obama campaign began, it set up a technical team of data scientists. By analyzing historical data and many kinds of input factors, the team used data mining techniques to build a precise preference model for each voter. The model estimated how likely each voter was to turn out on election day and which candidate he or she was likely to favor. The team also updated its models continuously so that it could always track shifts in voters' intentions. The models, based on analysis of voter preferences and behavioural data, drew on thousands of data sources, including past polling records, feedback on campaign issues, thousands of telephone and online interviews, and the effect that shifts among voters could have on the outcome.
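The article does not say what kind of model the campaign used. As a minimal sketch, assuming logistic regression (a common choice for estimating a yes/no probability such as turnout), the code below fits a per-voter turnout probability from two hypothetical features, `voted_last_election` and `contacted_by_volunteer`. The records in `HISTORY` are invented for illustration, and the names `train_turnout_model` and `turnout_probability` are hypothetical.

```python
import math

# Hypothetical historical records, NOT real campaign data:
# (voted_last_election, contacted_by_volunteer) -> turned_out (1) or stayed home (0)
HISTORY = [
    ((1, 1), 1), ((1, 1), 1), ((1, 1), 0),
    ((1, 0), 1), ((1, 0), 0),
    ((0, 1), 1), ((0, 1), 0),
    ((0, 0), 0), ((0, 0), 0), ((0, 0), 1),
]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_turnout_model(data, epochs: int = 5000, lr: float = 0.5):
    """Fit logistic-regression weights with full-batch gradient descent on log-loss."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in data:
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = p - label  # derivative of log-loss with respect to the logit
            grad_b += error
            for i, x in enumerate(features):
                grad_w[i] += error * x
        # Step against the average gradient over all records.
        bias -= lr * grad_b / len(data)
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
    return weights, bias


def turnout_probability(weights, bias, features) -> float:
    """Estimated probability that a voter with these features turns out."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


if __name__ == "__main__":
    weights, bias = train_turnout_model(HISTORY)
    for voter in [(1, 1), (1, 0), (0, 0)]:
        print(voter, round(turnout_probability(weights, bias, voter), 3))
```

Retraining on fresh records as they arrive is one simple way to keep such a model current, in the spirit of the continuous updating the article describes. A real campaign model would use far more features and data sources.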
Volunteers not only recorded weekly updates to voters' personal preferences, but also assessed the factors that might change their views, such as the content of speeches, campaign themes and certain key issues.
The team also used statistical models to help volunteers persuade undecided voters effectively. For example, the models might show that, for voters concerned about a specific issue, a volunteer from California would be more persuasive than volunteers from any other state.
The story may sound remarkable, but big data mining has in fact become commonplace all around us, and at its core is data.
The data in question is big data: all the digital records that surround us, such as social networks, the tools we use, the videos we watch, the transactions we make, the web searches we run, the mobile apps we use, online college courses and so on.
The oil industry offers a useful analogy. We can compare this data to crude oil: to become useful, it must be explored, extracted and refined. Unlike crude oil, however, what data needs is not a machine for extracting and processing oil but data mining, a multidisciplinary technique that combines statistics, machine learning and data management. Likewise, this "refinery" is operated not by engineers but by data scientists. Data scientists form a new profession drawn from many fields, including computer scientists and artificial intelligence researchers, statisticians, data storage specialists and social scientists, among others.
Politicians, scientists, educators and business managers can use the knowledge extracted from data to make decisions.
Today, data mining has become part of our daily lives. When we use Google, a powerful data mining engine sits behind the search button. By mining users' data, Google can infer who you are and what you intend to do with the information you find, and decide how to display ads that will catch your attention.
When we pay for goods with a credit card, a powerful data mining engine runs in the background to determine whether the card is being used fraudulently. The data model behind it is built from billions of previous consumer transactions.
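The article does not describe how card issuers' models work. As a deliberately simple sketch, assuming the check compares a new transaction against the cardholder's own spending history, the hypothetical function `is_suspicious` below flags an amount that lies more than `threshold` standard deviations from the historical mean (a z-score test). The amounts in `past_amounts` are invented.

```python
import statistics


def is_suspicious(history, amount, threshold=3.0):
    """Flag a transaction whose amount deviates from the cardholder's past
    spending by more than `threshold` standard deviations (a z-score test)."""
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        # Every past amount was identical, so any different amount is an outlier.
        return amount != mean
    return abs(amount - mean) / spread > threshold


# Hypothetical spending history for one card.
past_amounts = [42.0, 38.5, 55.0, 47.2, 40.0, 51.3]
print(is_suspicious(past_amounts, 49.0))
print(is_suspicious(past_amounts, 2500.0))
```

Production fraud detection combines many more signals, such as merchant, location and time, and learns from very large numbers of transactions rather than a single fixed rule.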
When we cross the border at the Luohu checkpoint, the machine that scans our fingerprints relies on a model built by data mining algorithms, which quickly confirms whether the person standing in front of it is really you.
We are in the midst of a new wave of big data mining, which is still at an early stage. Even so, Hong Kong's academic and industrial sectors have long been at the forefront of this field.
Scholars at Hong Kong universities have studied many aspects of data mining, from designing accurate algorithms for data such as web pages, video and speech to protecting user privacy during the mining process. The newly established Huawei Noah's Ark Laboratory is also carrying out several research projects on big data mining for the future.
Author: Yang Qiang, director of Huawei Noah's Ark Laboratory
(Responsible editor: The good of the Legacy)