The phenomenon of human-generated big data has today's enterprises producing thousands upon thousands of megabytes of structured and unstructured data. The big question about big data remains the same: will this be an "oil rush" in which a handful of players win and most fail, or will there be plenty for all of us?
Human-generated content includes all the files and emails we create every day: presentations, word-processing documents, spreadsheets, audio files, and the other documents we produce all the time. For most organizations, these files take up a large portion of digital storage space, have to be kept for a long time, and carry a great deal of associated metadata.
The volume of human-generated content is enormous, and the volume of its metadata is even greater. Metadata is information about a file: who created it, when and how it was created, which folder it is stored in, who has read it, and who holds the appropriate access rights. Together, the content and the metadata make up human-generated big data.
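To make this concrete, the metadata attached to a single file can be pictured as a simple record. The following is a minimal sketch in Python; the class and field names are hypothetical, chosen only to illustrate the kinds of attributes described above, not taken from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Hypothetical record illustrating typical file metadata.
@dataclass
class FileMetadata:
    path: str                    # folder where the file is stored
    creator: str                 # who created the file
    created_at: datetime         # when it was created
    content_type: str            # e.g. "presentation", "spreadsheet"
    readers: List[str] = field(default_factory=list)        # who has read it
    access_rights: List[str] = field(default_factory=list)  # who may access it

example = FileMetadata(
    path="/shares/finance/q3-forecast.xlsx",
    creator="alice",
    created_at=datetime(2012, 9, 14, 10, 30),
    content_type="spreadsheet",
    readers=["alice", "bob"],
    access_rights=["alice", "bob", "finance-team"],
)
```

Even in this toy form, the record already carries more fields than the file's content has pages, which is why the metadata quickly outgrows the content itself.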
Data avalanche
The problem is that most large organizations are not equipped with tools for mining human-generated big data. A recent survey of more than 1,000 Internet experts and other Internet users, conducted by the Pew Research Center and the Imagining the Internet Center, suggests that the world may not be ready to handle and understand big data properly.
The experts concluded that the huge quantities of data, often called "digital exhaust," that will have been created by 2020 could significantly improve productivity, increase organizational transparency, and broaden the boundaries of the "predictable future." However, they also worried about who has access to this information, who controls that access, and whether governments or businesses will use the information appropriately.
"By 2020, the analysis of people and machines for large data could improve social, political and economic intelligence," the survey found. The increase of the so-called large data will promote the realization of the following ideas: real-time prediction of events, the development of ' inference software ' for data patterns used to evaluate project results, and the creation of an advanced association algorithm to help acquire a new understanding of the world.
In the survey, 39% of the Internet experts took a negative view of big data's benefits, agreeing that "by 2020, human and machine analysis of big data will cause more problems than it solves." The existence of huge datasets for analysis will give us false confidence in our predictive powers and lead us into many serious and damaging errors. Moreover, big data analysis will be misused by powerful people and institutions with selfish agendas, who will use its findings to do whatever they want.
Bryan Trogdon is one of the entrepreneurs who took part in the survey. "Big data is the new oil field," he said. "Companies, governments, and organizations that can mine this resource will have an enormous advantage over those that cannot." Because speed, agility, and innovation determine the winners and losers, big data shifts our mindset from "measure twice, cut once" to "place lots of small bets quickly."
"The media and regulators have demonized big data and its hypothetical threat to privacy," said Jeff Jarvis, a professor and blogger. Such moral panics that have occurred are often attributed to technological change. But the moral of the story is still the same: the value of the data remains to be found, and the value lies in our newly discovered ability to share. ”
"Google's founders have urged government regulators not to allow them to quickly delete search results, because they found themselves able to track the outbreak of flu earlier than health officials in their model and anomaly data, and believe that millions of lives could be saved by tracking epidemics in a similar way." Jarvis continued, "It is no wise act to demonize large data or small data by demonizing it." ”
Sean Mead is director of analytics at Mead, Mead & Clark, Interbrand. "Large, publicly available datasets, easier-to-use tools, wider distribution of analytical skills, and early-stage AI software will lead to a surge in economic activity and a significant increase in productivity, comparable to the Internet and PC revolutions of the mid-to-late 1990s," Mead said. "Social movements will arise to free up access to large data repositories, to restrict the development and use of AI, and to liberate AI."
Beyond analysis
These are all interesting arguments, and they do begin to get at the core of the problem: our datasets have grown too large for us to analyze and process without advanced automation. We must rely on technology to analyze and process this much content and metadata.
Analyzing human-generated big data holds great potential, and harnessing the power of its metadata has become the key to managing and protecting human-generated content. File shares, email, and intranets make it so easy for business users to save and share files that most organizations now have more human-generated content than they can manage and protect with "small data" thinking.
Many companies face a real problem: they can no longer answer questions that, 15 years ago, could be answered against a much smaller and more static data set. Questions such as: Where does the critical data live? Who has access to it? Who should have access to it? As a result, industry research firm IDC estimates that only about half of the data that should be protected actually is.
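As a rough illustration of why answering such questions now requires automation, here is a minimal sketch, assuming a POSIX file system and only the Python standard library, that walks a file share and flags files readable by every user on the system. A real deployment would pull permissions and ownership from directory services rather than raw mode bits, but the shape of the problem, scanning millions of files and aggregating their metadata, is the same.

```python
import os
import pwd
import stat

def world_readable_files(root):
    """Walk a directory tree and yield (path, owner) for every file
    whose permission bits let any user on the system read it."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that vanished or cannot be stat'ed
            if st.st_mode & stat.S_IROTH:  # "other" read bit is set
                try:
                    owner = pwd.getpwuid(st.st_uid).pw_name
                except KeyError:
                    owner = str(st.st_uid)  # unknown uid, report it raw
                yield path, owner

if __name__ == "__main__":
    # "/shares" is a hypothetical file-share mount point.
    for path, owner in world_readable_files("/shares"):
        print(f"{owner}\t{path}")
```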
On top of this comes the issue of cloud-based file sharing. These services create yet another growing store of human-generated content that needs to be managed and protected. Because cloud content sits outside the enterprise infrastructure, and its control and management processes differ from those inside the enterprise, the complexity only increases.
"We are just beginning to understand the scope of the problems that big data can solve, even if it means admitting that we are less predictive, casual and reckless than we think we are," said David Weinberger of Harvard University's Berkman Center. Organizations would be delighted to use the power of artificially generated large data to reduce the unpredictability, randomness and recklessness of data protection and management. ”
Over the next few years, human-generated big data will undoubtedly bring both opportunities and challenges to the enterprise.