Keywords: data mining, big data, social networks
Summary: Today's big data is forcing researchers to find new technologies for knowledge discovery and data mining.
Top scientists from companies such as Google and Yahoo have gathered in San Diego for the 17th Association for Computing Machinery (ACM) Conference on Knowledge Discovery and Data Mining (KDD). They are presenting the latest techniques for extracting insight from today's deluge of data and for making sense of information sources that are more diverse than ever before.
Twenty years ago, the only people who cared about "big data" (huge volumes of data and the attempts to deal with them) were in the scientific community, said Usama Fayyad, executive director of the ACM Special Interest Group on Knowledge Discovery and Data Mining and Yahoo's former chief data officer. Even then, the results of data mining were impressive. "We can solve some of the major scientific problems that have existed in this field for more than 30 years," Fayyad said.
However, the explosive growth of the internet has changed everything. Whether they like it or not, companies find that they are operating online and accumulating large amounts of data about customers and their behavior. Fayyad said the economic incentives for investing in the field are also growing as the capabilities of data mining become clearer.
Netflix, for example, offered $1 million to the team that could mine its information about its users and build a recommendation system more accurate than its existing one. Such high-profile examples of data mining applications only scratch the surface.
"Businesses are increasingly interested in the role they have gained through business processes," said Chid Apte, who heads analytics research at IBM and is the general chair of the conference. This is especially true, he points out, in health care, social media, and anything that happens online.
Today, internet giants are making money from the information they collect and the insights they mine from it. Retailers can build complex models of shoppers' behavior to help them run their stores better. Industry researchers can predict traffic patterns based on congestion, weather, and time of year, and suggest the best route.
However, today's data does not come in the form of the databases we are familiar with. "Information is not presented to you in a clear form," Apte said. "It is being presented to you in the form of a web." "Usually in the form of graphs," he explains, such as those used by social media. These graphs often record not only the complex connections between nodes but also other new types of information, such as videos, images, and comments that people submit on social networks.
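To make the idea of a graph concrete, here is a minimal Python sketch of how such data might be held in memory. It is only an illustration: the node names, the "kind" attribute, and the edge labels are all invented for this example and do not come from any system described at the conference.

```python
from collections import defaultdict

# Hypothetical toy data: people and the content they submit are all nodes.
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
    "photo_1": {"kind": "image"},
    "comment_1": {"kind": "comment"},
}

# Each edge is (source, target, label); the labels are assumptions.
edges = [
    ("alice", "bob", "friend"),
    ("bob", "carol", "friend"),
    ("alice", "photo_1", "posted"),
    ("carol", "comment_1", "posted"),
    ("comment_1", "photo_1", "about"),
]


def build_adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighbouring nodes."""
    adjacency = defaultdict(set)
    for source, target, _label in edge_list:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def degree(adjacency, node):
    """Number of connections a node has (0 if the node has no edges)."""
    return len(adjacency.get(node, set()))


def friends_of(edge_list, person):
    """People linked to `person` by an edge labelled 'friend'."""
    result = set()
    for source, target, label in edge_list:
        if label != "friend":
            continue
        if source == person:
            result.add(target)
        elif target == person:
            result.add(source)
    return result


adjacency = build_adjacency(edges)
print(degree(adjacency, "alice"))
print(sorted(friends_of(edges, "bob")))
```

A plain dictionary like this is fine for a handful of nodes. Graphs with millions of nodes call for the specialized algorithms and storage the researchers are discussing.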
Social networking may have started the trend of analyzing such graphs, Apte said, but there are other sources of network data, for example complex engineering systems such as power grids, water distribution systems, and traffic management systems. The distributed sensor networks in these systems generate connections across datasets that are as important as friendships between individuals in a social network. Understanding these linkages is key to optimizing such systems and making them sustainable.
People have been working with graphs for hundreds of years, but graphs based on social networks or sensor networks now have an unprecedented scale, Apte said. "These are huge graphs," he said. "You're talking about millions of nodes and tens of millions of connections."
To deal with graphs of that size and scope, and to apply modern analysis tools to them, you need better algorithms and other new ideas. One of the goals of the conference, Apte said, is to make companies aware of cutting-edge techniques from academic and industrial research laboratories so that they can put them to use more quickly. At the same time, the organizers hope the academic community will become aware of the most urgent business challenges.
Fayyad says the strong business interest in data has changed the field of data mining. Scientists, he says, mainly deal with data stored in neat, structured forms, but most businesses produce data that is chaotic.
"While scientists are doing a good job of avoiding it, companies are being forced to face it," Fayyad said. "It drives companies to develop technologies that have never been tried before."
Challenges remain, of course, Fayyad says, but "people can put forward many more predictive models and, more importantly, evaluate them (determine how well they work) ... This brings data analysis to a level that really transcends the understanding of the human brain."
This article is authorized content from the United States Marvell review; reprinting it without written permission is prohibited.