On the Development of Big Data: Problems and Challenges

Last Update:2014-12-18 Source: Internet

Author: User

At present, almost all world-class Internet companies have extended their business into the big data industry.

Whether in social platforms, price wars, or competition among Web portals, big data is at work behind the scenes. It is turning from a technical buzzword into a social wave that affects every aspect of social life.

What is big data? Big data, also called massive data, refers to data sets so large that current mainstream software tools cannot capture, manage, process, and organize them within a reasonable time into information that helps businesses make better decisions. In "The Big Data Age", Viktor Mayer-Schönberger and Kenneth Cukier describe big data as an approach that uses all of the data instead of taking the shortcut of random analysis (sampling). Big data is usually summarized by four features, the 4Vs: Volume (mass), Velocity (high speed), Variety, and Value. From this definition and these four features we can get a rough sense of what big data means: large data volume, many data types, low value density, and time-sensitive data.

With the development of portable devices, the Internet, and cloud storage technologies, nearly every trace left by people and objects can be recorded. The core node of the mobile Internet is now the person, no longer the Web page. Against this backdrop of explosive data growth, big data also faces many challenges.

Challenges from data storage: Big data comes from many different sources, follows different standards, is large in volume, takes multiple structural forms, and often has to be handled in real time. These characteristics undoubtedly make data acquisition and integration harder, so the architecture of block-based and file-based storage systems needs to be redesigned to cope with them.

Challenges from data security: The continued growth of data brings security problems with it. First, a big data store is a large target and is therefore easier to discover on the network. Second, it holds more sensitive and valuable data, which makes it more attractive to potential attackers. In addition, the exposure of personal information can threaten personal safety.

Challenges from data display: Many users care more about how results are displayed than about the analysis itself. Outputting results as text or showing them directly on a terminal may work well for small amounts of data, but it is not feasible for massive data with complex structure. This calls for visualization technology that can present the final results, and even intermediate results, graphically. It also calls for human-computer interaction or data provenance technology, so that users can see the results and at the same time better understand where they came from.

Challenges from data cost control: Cost control is a key issue for organizations that run big data environments. Controlling costs means making each device more efficient while also cutting back on expensive components. Technologies such as data deduplication have already entered the primary storage market and can now handle more data types, which brings more value to big data storage applications and improves storage efficiency. When data volumes keep growing, reducing back-end storage consumption by even a few percent is worthwhile. Boot drives are another cost factor: the traditional boot drives used in data centers today not only have high failure rates but also carry high maintenance and replacement costs. Replacing the standalone boot drive in each server with a more reliable alternative (the text does not name one) is said to improve reliability by up to 100 times. Such a replacement is transparent to the host system and provides a unique boot image for each additional server, which simplifies system management, improves reliability, and saves up to 60% in energy, genuinely addressing the cost problem.
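The deduplication idea mentioned above can be illustrated with a minimal Python sketch. It is not the method of any particular storage product: the fixed chunk size, the use of SHA-256, and the names `dedup_chunks` and `restore` are all assumptions made for illustration. Each distinct chunk is stored once, and an ordered list of digests records how to rebuild the original data.

```python
import hashlib


def dedup_chunks(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and store each distinct chunk once.

    Hypothetical helper for illustration; real systems typically use much
    larger (and often variable-size) chunks.
    """
    store = {}   # digest -> chunk bytes (the unique chunks actually kept)
    recipe = []  # ordered digests needed to rebuild the original data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy of a chunk
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original bytes from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


data = b"ABCDABCDABCDWXYZ"  # 16 bytes, with the chunk "ABCD" repeated 3 times
store, recipe = dedup_chunks(data)
stored_bytes = sum(len(chunk) for chunk in store.values())
print(len(data), stored_bytes)  # original size vs. bytes kept after dedup
```

In this toy input three of the four chunks are identical, so only two chunks are kept. The saving on real data depends entirely on how much repetition it contains.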

Challenges from data analysis: Data analysis is the heart of the big data processing pipeline, because the value of big data comes from analysis, but it also poses great challenges. First, a large volume of data brings more noise along with more value, so data cleaning and other preprocessing must be done with more care. If the cleaning is too fine-grained, useful information is easily filtered out; if it is too coarse, the cleaning does not achieve the desired effect. Quality and quantity therefore have to be weighed carefully against each other, which is also a severe test for both hardware and algorithms. Second, traditional data warehouse systems place no high demands on processing time, but many big data applications do.
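The cleaning trade-off described above can be sketched in a few lines of Python. The readings, the median-based rule, and the three thresholds are made up for illustration and are not taken from the article. A strict threshold also discards plausible values, while a loose one lets obvious noise through.

```python
from statistics import median


def clean(values, max_deviation):
    """Keep only values within max_deviation of the median (hypothetical rule)."""
    center = median(values)
    return [v for v in values if abs(v - center) <= max_deviation]


# Made-up sensor readings: mostly around 20, plus two obvious outliers.
readings = [20.1, 19.8, 20.4, 22.0, 18.5, 95.0, -40.0]

strict = clean(readings, 0.5)      # too fine: also drops 22.0 and 18.5
moderate = clean(readings, 3.0)    # drops only the two outliers
loose = clean(readings, 100.0)     # too coarse: keeps everything, noise included

print(len(strict), len(moderate), len(loose))
```

Choosing the threshold is exactly the quality-versus-quantity decision: each setting above is defensible only relative to what counts as noise in the application.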

The significance of big data grows with the increasing prevalence of human activity on the network. Extracting the useful information from massive data is a huge undertaking, and it is a major challenge of the current big data age. After several years of criticism, questioning, discussion, and speculation, the development of big data still has a long way to go.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
