The age of "data innovation": Big Data has great wisdom

Alongside the "Internet of Things" and "cloud computing", the IT industry has a new term: big data. Big data has attracted close attention in business and financial circles as well. It is widely expected to support data applications and decision-making effectively, and to become the core of, and an inevitable direction for, the Internet and cloud computing. There is not yet a uniform definition of big data; it is usually taken to mean data that is large in volume, diverse in form, and mostly unstructured.








First, a few concepts: structured, semi-structured, and unstructured data. Structured data lives in relational databases and has dominated IT applications for years. Semi-structured data includes e-mail, word-processing files, and the large volume of news published on the web; it is organized around content, which is why search engines such as Google and Baidu exist. Unstructured data is found widely in social networks, the Internet of Things, and e-commerce. With the emergence of new technologies such as social networks, mobile computing, and sensors, more than 85% of all data is reported to be unstructured.
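To make the three categories concrete, here is a minimal Python sketch. The table, the e-mail, and the social-media post are invented for illustration and do not come from any real system. It shows how each kind of data is typically queried: a fixed schema with SQL, labeled headers plus a free-form body, and plain text with no schema at all.

```python
import sqlite3
from collections import Counter
from email import message_from_string

# Structured: rows with a fixed schema in a relational table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 30.0), (2, "bob", 12.5), (3, "alice", 7.5)],
)
total_alice = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ?", ("alice",)
).fetchone()[0]

# Semi-structured: a few labeled fields (headers) plus a free-form body.
raw_email = (
    "From: alice@example.com\n"
    "Subject: Late delivery\n"
    "\n"
    "My order arrived two days late."
)
msg = message_from_string(raw_email)
subject = msg["Subject"]
body = msg.get_payload()

# Unstructured: plain text with no schema; any structure has to be inferred.
post = "great phone great camera but battery is weak"
word_counts = Counter(post.split())

print(total_alice)          # sum over a typed column
print(subject, "|", body)   # header lookup plus raw body text
print(word_counts.most_common(1))
```

The further down the list one goes, the less the storage format tells us and the more analysis is needed to extract meaning, which is where the tooling discussed later in the article comes in.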








Many people believe that this vast body of heterogeneous data contains huge wealth: companies that can mine unstructured data and integrate it with their business will have a more comprehensive and accurate basis for decision-making. Science, sports, advertising, public health, and other fields are likewise moving toward data-driven discovery and decision-making.




The main drivers of big data are large IT companies such as Google, Amazon, China Mobile, and Alibaba, which need to store and analyze data in more efficient ways. Industries such as health care, geospatial remote sensing, and digital media also have big data requirements. According to market research firms, the total amount of digital information is expected to grow 44-fold between 2009 and 2020, with global data volume reaching about 35.2 ZB (1 ZB = 1 billion TB).
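The figures quoted above can be checked with a little arithmetic. The sketch below assumes decimal SI units (1 ZB = 10^21 bytes, 1 TB = 10^12 bytes) and steady compound growth; both are assumptions made here, not statements from the forecast. Under them, it derives the implied 2009 volume and the average yearly growth rate.

```python
# Unit conversion under decimal SI prefixes (an assumption about the units used).
ZB_IN_BYTES = 10**21
TB_IN_BYTES = 10**12
tb_per_zb = ZB_IN_BYTES // TB_IN_BYTES  # terabytes in one zettabyte

# Figures quoted in the text.
volume_2020_zb = 35.2
growth_factor = 44
years = 2020 - 2009

# Implied starting volume and compound annual growth rate,
# assuming growth is spread evenly over the period.
volume_2009_zb = volume_2020_zb / growth_factor
annual_growth = growth_factor ** (1 / years) - 1

print(f"1 ZB = {tb_per_zb:,} TB")
print(f"Implied 2009 volume: {volume_2009_zb:.2f} ZB")
print(f"Implied annual growth: {annual_growth:.1%}")
```

In other words, a 44-fold increase over eleven years corresponds to data volume growing by roughly two fifths every year.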





Big data exhibits the "4V+1C" characteristics:
(1) Variety: big data comes from a wide range of sources that differ in encoding, data format, application characteristics, and many other respects, so multiple information sources produce large amounts of heterogeneous data.
(2) Volume: the data generated by all kinds of devices is massive, far exceeding the information currently flowing over the Internet, and petabyte-scale data sets will become the norm.
(3) Velocity: big data involves an open loop of perception, transmission, decision-making, and control, which places very high demands on real-time processing. A "current result" obtained through a traditional database query may already be worthless by the time it arrives.
(4) Vitality: data arrives continuously and is meaningful only within a specific time and place.
(5) Complexity: the traditional approach of writing data to persistent storage and then processing it in a database no longer suits big data. New methods are needed to provide unified access to heterogeneous data and to process it in real time.
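The "unified access to heterogeneous data" requirement can be illustrated with a small sketch. Two hypothetical sensor feeds, one in CSV with Fahrenheit readings and one in JSON with millisecond timestamps, are normalized into a single record type. The feed contents, field names, and the `Reading` class are all invented for this example.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Common schema that every source is converted into."""
    sensor_id: str
    timestamp: int   # seconds
    celsius: float


# Two hypothetical sources with different formats, field names, and units.
CSV_FEED = "id,ts,temp_f\ns1,1000,68.0\ns2,1005,212.0\n"
JSON_FEED = '[{"sensor": "s3", "time_ms": 1010000, "temp_c": 21.5}]'


def from_csv(text):
    # CSV source: Fahrenheit readings, timestamps already in seconds.
    for row in csv.DictReader(io.StringIO(text)):
        celsius = (float(row["temp_f"]) - 32) * 5 / 9
        yield Reading(row["id"], int(row["ts"]), celsius)


def from_json(text):
    # JSON source: Celsius readings, timestamps in milliseconds.
    for item in json.loads(text):
        yield Reading(item["sensor"], item["time_ms"] // 1000, float(item["temp_c"]))


# Downstream code sees one uniform, time-ordered stream of readings.
readings = sorted(
    [*from_csv(CSV_FEED), *from_json(JSON_FEED)],
    key=lambda r: r.timestamp,
)
for r in readings:
    print(r)
```

Each new source only needs its own small adapter; everything after the adapters works against the common schema.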








Apache Hadoop has become the driving force behind the big data industry, and related technologies such as Hive and Pig are frequently mentioned alongside it. At the same time, software tools designed to extract knowledge and insight from the vast store of unstructured data are evolving rapidly. These tools build on advances in artificial intelligence techniques such as natural language processing, pattern recognition, and machine learning.
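Hadoop's batch processing is built around the MapReduce programming model. The sketch below imitates that model in a single Python process to show its three steps (map, shuffle, reduce) on a word count. It does not use Hadoop's actual API, and the function names are chosen here for illustration only.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit a (key, value) pair for every word in one input record.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse each key's values into a single result.
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


docs = ["big data needs new tools", "new tools process big data"]
counts = word_count(docs)
print(counts)
```

In a real cluster the map and reduce functions run in parallel on many machines and the framework performs the shuffle over the network; the logic of each step stays the same.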








We can expect a large number of tools and platforms for handling large-scale unstructured data to appear in the next year or two. Beyond Hadoop's batch processing, methods based on stream processing will also play a role in real-time data analysis. The big data boom will also challenge how visualization is understood and what is required of it. Within the data workflow, visualization will serve both explanation and exploration: data scientists will use it to find problems and to explore new features of a dataset.
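The contrast between batch and stream processing can be sketched in a few lines. Instead of storing everything and computing afterwards, a stream processor updates its answer as each event arrives and discards data that has aged out. The class below is a hypothetical illustration rather than part of any streaming framework; it keeps a running average over a sliding time window and assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingWindowAverage:
    """Running average of values seen within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, value) pairs still in the window
        self.total = 0.0

    def add(self, timestamp, value):
        # Incorporate the new event immediately.
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


# Hypothetical event stream of (timestamp in seconds, measured value).
stream = [(0, 10.0), (1, 20.0), (2, 30.0), (12, 40.0)]
window = SlidingWindowAverage(window_seconds=10)
averages = [window.add(ts, value) for ts, value in stream]
print(averages)
```

Memory use is bounded by the window rather than by the total amount of data, which is what makes this style of processing suitable for data that keeps arriving and loses its value quickly.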








Because the technical threshold for big data is high, most of the current competitors are vendors with established strengths in data storage, analytics, and related fields. Oracle's Big Data Appliance was officially released in January 2012. IBM's strength in big data is across the board, and the victory of its "Watson" system in a man-versus-machine contest counts in favor of its big data analytics solution.








The Chinese market is very important in this emerging field. China has a large population and a fairly mature IT infrastructure, and the amount of data it generates is almost unimaginable. Optimists see an opportunity here: whether the motive is upgrading systems to cope with massive amounts of data or the urge to extract value from that data, either could usher in an era of intelligent "data innovation". (Author: Liuyu, Institute of Automation, Chinese Academy of Sciences)





(Responsible editor: Lu Guang)
