Relying on Cloud Computing to Mine the Value Behind Big Data

Source: Internet
Author: User
Keywords: cloud computing, big data

Cloud computing is the inevitable result of the development of information technology and a response to the demands of the information society. Innovation in cloud computing technology has given rise to successful new business models and has profoundly reshaped the existing electronic information industry and its application models. IDC predicts that the global cloud computing sector will generate $800 billion in revenue over the next three years, and over the course of China's Twelfth Five-Year Plan the country's cloud computing industry is expected to reach a scale of 750 billion to 1 trillion yuan. The world's major IT vendors are now racing into the cloud computing field to seize the commanding heights of next-generation information technology.

Cloud computing needs to avoid two major pitfalls

Driven by both government and industry, cloud computing has become the hottest area among emerging industries. This shows that cloud computing has moved from being poorly understood to being widely accepted, but worries and problems remain. They fall into two main areas:

On the one hand, there are concerns about a "cloud bubble." Surveys show that many localities have invested heavily in so-called "cloud" systems whose resource utilization is below 20%; some cloud computing centers have become image projects, or even disguised commercial real estate ventures. Cloud computing is meant to be green computing, and its worth is not measured by the scale of its equipment or the size of its facilities. Its development must not become a simple land-and-money grab; instead, duplicated construction and wasted resources must be avoided so that the industry takes root and consumers actually benefit from it. Innovative applications are therefore the touchstone of a healthy cloud computing industry.

On the other hand, "cloud" is being overused as a universal label, as if anything on the Internet were cloud computing, leaving consumers and investors struggling to tell real "clouds" from false ones. What, then, are the essential features of cloud computing? First, cloud computing is an Internet-based, participatory computing model: its applications face the Internet directly, and the resources they use come from the network rather than the client, providing enterprises and individuals with computing power, storage space, software functions, and information services over the network. Second, cloud services must be highly scalable: service resources can be adjusted dynamically with demand, automatically growing within minutes or even seconds to handle peak traffic, and shrinking again as demand falls.
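The elastic scaling described above can be sketched as a simple threshold-based autoscaler. Everything here is illustrative, not any real cloud provider's API: the function just computes how many instances are needed to bring per-instance load back toward a target utilization.

```python
import math

# A minimal sketch of elastic scaling: a hypothetical autoscaler that grows
# or shrinks a pool of service instances based on measured load. The names
# and thresholds are assumptions for illustration only.
def scale(current_instances, load_per_instance, target_load=0.6,
          min_instances=1, max_instances=100):
    """Return the instance count needed to bring average load per
    instance back toward the target utilization."""
    total_load = current_instances * load_per_instance
    desired = math.ceil(total_load / target_load)
    return max(min_instances, min(max_instances, desired))

# Traffic peak: 10 instances each at 90% load -> scale out.
print(scale(10, 0.9))   # 15
# Demand falls: 15 instances each at 20% load -> scale back in.
print(scale(15, 0.2))   # 5
```

Real autoscalers add smoothing and cooldown periods so the pool does not oscillate, but the core "grow with demand, shrink when demand falls" logic is the same.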

Cloud computing supports big data development

The concept of big data has been mentioned by more and more people in recent years, and it is often associated with cloud computing. Big data will undoubtedly bring great value to human society: research institutions can use it to support exploration in fields such as the environment, resources, energy, meteorology, aerospace, and the life sciences. So what exactly is the relationship between cloud computing and big data? Broadly speaking, without the Internet there would be no cloud computing model, and without the cloud computing model there would be no big data processing technology.

However, the cloud computing environment also presents new challenges for big data processing. Chief among them is that traditional relational databases cannot meet its requirements: highly concurrent reads and writes from massive numbers of users, efficient storage of and access to massive data, and high system availability and scalability. Vendors have therefore developed a batch of new technologies to address these problems, including distributed data caches, distributed file systems, non-relational (NoSQL) databases, and new relational databases.

Similarly, because the data is both massive and distributed, traditional data processing techniques are unsuited to it. This poses a new challenge for the distributed parallel processing of massive data, and a series of new techniques has emerged in response, including the data-parallel processing represented by MapReduce, incremental processing, and stream computing.
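The MapReduce model named above can be illustrated with a single-machine word-count sketch. Real frameworks such as Hadoop run many map and reduce tasks in parallel across a cluster; the data flow (map emits key-value pairs, a shuffle groups them by key, reduce aggregates each group) is the same.

```python
# A single-machine sketch of the MapReduce programming model.
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    # map: emit (key, value) pairs -- here, (word, 1) for each word
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    # shuffle: group pairs by key (groupby requires sorted input),
    # then reduce each group by summing its counts
    pairs = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

docs = ["big data needs cloud computing", "cloud computing supports big data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = dict(reduce_phase(pairs))
print(counts["big"], counts["cloud"])   # 2 2
```

Because each map call touches only one document and each reduce call touches only one key's group, both phases can be distributed across machines without changing the program's logic, which is what makes the model suitable for massive data.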

In the cloud computing era, ever more data is stored in computing centers. Data is an asset, and the cloud is both the vault where that asset is kept and the channel through which it is accessed. Processing and analyzing big data must rely on cloud computing for the computing environment and capacity needed to mine valid datasets for specific scenarios and topics. For example, The New York Times used cloud computing to convert more than 400,000 scanned images from 1851 to 1922; by distributing the task across hundreds of computers, the work was completed within 36 hours. The credit card company Visa analyzed two years of records comprising 73 billion transactions and 36 TB of data: traditional methods would have taken a month, while Hadoop-based processing took only 13 minutes.

Mining the value behind the data

In the Internet age, and especially the mobile Internet age, the potential value of data can only be found by mining masses of low-value-density data. Big data mining in the mobile Internet era mainly means mining unstructured data in the network environment: fresh, fragmented, heterogeneous data in its original state. What characterizes such unstructured data? It is often low in value, heterogeneous, and redundant, and some of it is never even read out of storage. The focus of mining has also changed greatly: attention goes first to small audiences, and the broader public is served as an aggregation of many small audiences. An important idea in mobile-Internet-era data mining is therefore "bottom-up" discovery rather than "top-down" design: it emphasizes the authenticity and timeliness of the mined data, discovering correlations, discovering anomalies, discovering trends, and ultimately discovering value.

In fact, the interacting public on the Internet are not only consuming services but also supplying information. The public's online behavior can no longer be characterized merely as browsing, searching, or mining; it is evolving toward rapid content creation and emergent swarm intelligence. Characteristics accumulated locally in small groups can form "public" characteristics over a wider range; minorities become the foundation of the masses. Understanding the public and its small audiences gives us an opportunity to recognize so-called micro-, meso-, and macro-level human group behavior at different scales. Big data mining should therefore pay attention to network-oriented methods, that is, community discovery. For example, Threadless is an online T-shirt retailer and creative community that lets users share their own T-shirt designs; the designs receiving the most user votes are produced, and the winning designers receive a fee. Threadless has become a winning model for both commerce and community, receiving more than 800 new designs every week, with more than 1,000 new users registering each day to discuss design and art and to submit music and videos inspired by the designs.
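The "community discovery" idea above can be sketched in its simplest form: build a graph of who interacts with whom, and treat each connected component as one community. The user names and edges below are made up for illustration; practical community detection uses richer algorithms (e.g. label propagation or modularity optimization) on the same kind of graph.

```python
# A toy sketch of community discovery: connected components over an
# interaction graph. Names and edges are hypothetical example data.
from collections import defaultdict

def communities(edges):
    """Group users into communities: each connected component of the
    undirected interaction graph is treated as one community."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:                      # depth-first traversal
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups

# Two interaction clusters: {ann, bo, cai} and {dee, eli}.
edges = [("ann", "bo"), ("bo", "cai"), ("dee", "eli")]
print(sorted(len(g) for g in communities(edges)))   # [2, 3]
```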

Today, Internet bandwidth doubles roughly every 6 months, faster than storage capacity, which doubles about every 9 months, and computing power, which doubles about every 18 months. The rapid growth of bandwidth has brought humanity into an interactive era, and interaction in turn drives computing and storage forward at an accelerating pace.
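The doubling periods quoted above imply very different growth over the same span. A quick calculation over three years (36 months) makes the gap concrete:

```python
# Growth factor over 36 months for each quoted doubling period:
# factor = 2 ** (months / doubling_period)
for name, doubling_months in [("bandwidth", 6), ("storage", 9), ("compute", 18)]:
    factor = 2 ** (36 / doubling_months)
    print(f"{name}: x{factor:.0f} in 3 years")
# bandwidth grows 64x, storage 16x, compute 4x
```

At these rates bandwidth outgrows compute by a factor of 16 every three years, which is why the text argues that bandwidth, not computation, is what pulls the interactive era forward.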

Big data marks a new era, one characterized not only by the pursuit of abundant material resources and the convenient, diverse information services of a ubiquitous Internet, but also by the mining and conversion of a kind of value, found in data, that is distinct from material value. With the support of cloud computing and related technologies, big data will yield ever more of that value.
