To address big data challenges, enterprise CIOs should first answer three big questions
Source: Internet
Author: User
Keywords: big data, big data challenges, solutions
Nowadays, the arrival of big data has become an inescapable fact of real life: whenever we make decisions, big data is there. As the term has spread, its importance has gradually come to be understood. Big data is opening up huge opportunities for academia, industry and government. At the same time, it poses great challenges to all of these parties, starting with three important technical questions:
First, how to use information technology and other means to deal with unstructured and semi-structured data
In big data, structured data accounts for only about 15%; the remaining 85% is unstructured data, found in areas such as social networks, the Internet and e-commerce. Viewed another way, perhaps 90% of the data comes from open data sources, and the rest is stored in databases. The uncertainty of big data shows up as high dimensionality, variability and strong randomness. Streams of stock trading data are a typical example of highly uncertain big data.
Big data has raised many research problems. The individual manifestations, general features and basic principles of unstructured and semi-structured data are not yet clear, and they need to be studied through interdisciplinary research spanning mathematics, economics, sociology, computer science and management science. Given a piece of semi-structured or unstructured data such as an image, how can it be transformed into a multidimensional data table, an object-oriented data model, or a direct image-based data model? It is worth noting that each form of representation shows only one facet of the data, not the whole picture.
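As a rough illustration of the transformation question above, the following Python sketch flattens nested, semi-structured records into a single data table. The record layout, the field names and the helper names `flatten` and `to_table` are assumptions made for this example only; they do not come from any particular system.

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path as a prefix.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def to_table(records: list) -> tuple:
    """Build (columns, rows); fields missing from a record become None."""
    flat_records = [flatten(r) for r in records]
    columns = sorted({name for r in flat_records for name in r})
    rows = [[r.get(name) for name in columns] for r in flat_records]
    return columns, rows


# Hypothetical semi-structured records, e.g. posts from a social network.
records = [
    {"user": {"id": 1, "city": "Beijing"}, "text": "hello"},
    {"user": {"id": 2}, "text": "big data", "likes": 3},
]
columns, rows = to_table(records)
for row in rows:
    print(dict(zip(columns, row)))
```

The resulting table is only one view of the data: the nesting is lost and absent fields turn into empty cells, which matches the point that each representation shows one facet rather than the whole picture.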
If the process of extracting "rough knowledge" through data mining is called the "first mining", then the process of producing "intelligent knowledge" can be called the "second mining": it combines rough knowledge with quantified subjective knowledge, including specific experience, common sense, instinct, situational knowledge and user preference. Going from the "first mining" to the "second mining" is like a leap from "quantity" to "quality".
Because big data is semi-structured and unstructured, the structured "rough knowledge" (hidden patterns) that data mining produces from it also shows some new features. This structured rough knowledge can be processed and transformed using subjective knowledge to generate semi-structured and unstructured intelligent knowledge. The search for "intelligent knowledge" reflects the core value of big data research.
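The step from "rough knowledge" to "intelligent knowledge" can be pictured with a small Python sketch. It assumes that rough knowledge arrives as mined rules with a data-driven confidence, and that subjective knowledge has already been quantified as a score between 0 and 1 per rule. The linear blend, the `alpha` parameter, the neutral default of 0.5 and all rule names are illustrative assumptions, not a method given in this article.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A mined pattern ("rough knowledge") with its data-driven score."""
    name: str
    confidence: float  # objective score from data mining, in [0, 1]


def blend_with_subjective(rules, expert_weight, alpha=0.5):
    """Re-rank mined rules using quantified subjective knowledge.

    expert_weight maps a rule name to a score in [0, 1]; rules the
    expert has not rated get a neutral 0.5 (an assumption of this sketch).
    alpha is the share of the final score given to the mined confidence.
    """
    scored = []
    for rule in rules:
        subjective = expert_weight.get(rule.name, 0.5)
        score = alpha * rule.confidence + (1 - alpha) * subjective
        scored.append((rule.name, round(score, 3)))
    # Highest combined score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical mined rules and expert ratings.
rules = [
    Rule("diapers -> beer", 0.80),
    Rule("bread -> milk", 0.90),
    Rule("umbrella -> sunscreen", 0.70),
]
expert_weight = {"diapers -> beer": 0.9, "bread -> milk": 0.2}
ranked = blend_with_subjective(rules, expert_weight)
for name, score in ranked:
    print(name, score)
```

In this toy run the rule with the highest mined confidence drops to last place because the expert rates it as uninteresting, which is the kind of reordering that subjective knowledge is meant to contribute.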
Second, how to describe the complexity and uncertainty of big data, and how to model big data as a system
A breakthrough on this problem is the precondition for, and the key to, knowledge discovery in big data. In the long run, the challenge posed by the individual complexity and randomness of big data should lead to a mathematical structure for big data and, from there, to a complete and unified theory of big data. In the short term, academia encourages the development of general principles for converting between structured data and semi-structured or unstructured data, to support cross-industry applications of big data. Management science, especially optimization-based theory, will play an important role in developing general methods and regularities for big data knowledge discovery.
The complex forms of big data raise many research questions about how to measure and evaluate "rough knowledge". Established tools from management science, such as optimization, data envelopment analysis, expectation theory and utility theory, can be applied to the study of how to integrate subjective knowledge into the "second mining" of the rough knowledge produced by data mining. Human-computer interaction will play a vital role here.
Third, how the relationship between data heterogeneity and decision heterogeneity influences big data knowledge discovery and management decisions
Because big data is itself so complex, this is undoubtedly an important research subject, and it poses new challenges to traditional data mining theory and technology. In a big data environment, management decisions face two "heterogeneity" problems: "data heterogeneity" and "decision heterogeneity". The traditional mode of management decision-making depends on learned business knowledge and accumulated practical experience, whereas management decisions in a big data environment are based on data analysis.
Big data has changed the traditional structure of management decision-making. How big data affects that structure will become an open research problem. In addition, the change in decision structure requires us to explore how to carry out the "second mining" to support higher-level decisions. Whatever data heterogeneity big data brings, the "rough knowledge" in big data can still be regarded as belonging to the "first mining". The "intelligent knowledge" produced by the "second mining" is needed as a bridge between data heterogeneity and decision heterogeneity. Exploring how decision structures change in a big data environment is equivalent to studying how decision-makers' subjective knowledge takes part in the decision-making process.
Big data is a kind of man-made nature governed by hidden laws, and finding scientific models of big data would give us a general approach to exploring the beauty behind it. That search is difficult, but if we can find a way to transform unstructured and semi-structured data into structured data, known data mining methods will become tools for mining big data.
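To make the last point concrete, here is a minimal Python sketch in which free text is first turned into a structured form (a set of terms per document) and a familiar mining step, counting frequently co-occurring pairs, is then applied. The tokenizer, the tiny stopword list, the sample texts and the support threshold are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from itertools import combinations

STOPWORDS = {"the", "a", "is", "and", "of", "to", "in"}  # illustrative only


def to_transaction(text):
    """Turn free text into a structured record: the set of its terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return frozenset(w for w in words if w not in STOPWORDS)


def frequent_pairs(transactions, min_support):
    """Keep term pairs that co-occur in at least min_support records."""
    counts = Counter()
    for terms in transactions:
        # Sorting gives every pair one canonical (a, b) ordering.
        counts.update(combinations(sorted(terms), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical unstructured inputs.
texts = [
    "Stock prices fall in heavy trading",
    "Heavy trading and stock volatility",
    "The weather is sunny",
]
transactions = [to_transaction(t) for t in texts]
pairs = frequent_pairs(transactions, min_support=2)
for pair, n in sorted(pairs.items()):
    print(pair, n)
```

Once the text has been reduced to sets of terms, the second function knows nothing about text at all; it is an ordinary co-occurrence count over structured records, which is the sense in which existing mining methods can be reused.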
These are some of my thoughts on three important technical questions about big data, and they are only a starting point for studying big data challenges. There are further problems in data science as well, including the axiom systems that may exist for obtaining data and generating rules from data, knowledge discovery rules based on databases versus those based on open data sources, and the existence of global and/or local solutions in big data mining. I believe these problems will need careful study in the near future if we are to achieve breakthroughs in research and application.