On the development of big data: problems and challenges
Source: Internet
Author: User
Keywords: big data
At present, almost all world-class Internet companies have extended their business into the big data industry.
Whether in social platforms, price wars, or competition among Web portals, big data casts its shadow everywhere. Big data has turned from a technical buzzword into a social wave that affects every aspect of social life.
What is big data? Big data, also called massive data, refers to data sets so large that current mainstream software tools cannot capture, manage, process, and organize them within a reasonable time into information that helps enterprises make better business decisions. In The Age of Big Data, Viktor Mayer-Schönberger and Kenneth Cukier argue for analyzing all of the data rather than relying on random sampling, and characterize big data by four "V" features: Volume (massive scale), Velocity (high speed), Variety (diverse types), and Value (low value density). From these four features of the definition given in The Age of Big Data, we can roughly perceive its nature: the volume of data is huge, the data types are diverse, the value density of the data is low, and the data is time-sensitive.
With the development of portable devices, the Internet, and cloud storage technologies, every trace that people and objects leave behind can be recorded. The core node of the mobile Internet is now the person, no longer the Web page. Against this backdrop of explosive data growth, big data also faces many challenges.
Challenges from data storage: The data that big data development must handle comes from different places, follows different standards, is huge in volume, takes many structural forms, and often arrives in real time. These properties undoubtedly increase the difficulty of data acquisition and integration, so the architecture of block- and file-based storage systems needs to be redesigned to overcome the existing problems.
Challenges from data security: The continued growth of data brings security problems with it. First, big data is easier to find on a network simply because it presents a large target; second, the more sensitive and valuable the data, the more attractive it is to potential attackers. In addition, the exposure of personal information can also endanger personal safety.
Challenges from data display: Many users care more about how results are presented than about the analysis itself. The traditional approach of outputting results as text or printing them directly to a terminal may be acceptable for small amounts of data, but it is not feasible for massive data with complex structure. This calls for visualization technology to render the final and even the intermediate results, and additionally for human-computer interaction or data provenance technology, so that users not only obtain the results but also better understand where the results come from.
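As a minimal sketch of this idea (the data set, column names, and aggregation below are hypothetical, not from the article), aggregating a large table and plotting the summary communicates far more than dumping raw rows as text:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical example: half a million raw event rows are unreadable as text,
# so we compute an intermediate summary and visualize that instead.
events = pd.DataFrame({
    "category": ["social", "portal", "e-commerce", "social", "portal"] * 100_000,
    "bytes":    [120, 340, 560, 90, 410] * 100_000,
})

# Intermediate result: total traffic per category (small enough to display).
summary = events.groupby("category")["bytes"].sum().sort_values()

# Final result: a bar chart conveys the distribution at a glance,
# where printing 500,000 rows to a terminal would not.
summary.plot(kind="barh", title="Traffic volume by category (hypothetical data)")
plt.xlabel("total bytes")
plt.tight_layout()
plt.show()
```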
The challenge from data cost control: Cost control is a key issue for organizations running big data environments. Controlling costs means making every device more "efficient" while cutting out the expensive parts. Technologies such as data deduplication have already entered the primary storage market and can handle more data types, which brings more value to big data storage applications and improves storage efficiency; in an environment where data volumes keep growing, even a reduction of a few percent in back-end storage consumption is worthwhile. Meanwhile, the traditional boot drives used in data centers not only have high failure rates but also carry high maintenance and replacement costs. Replacing the standalone server boot drives in the data center can increase reliability by up to 100 times; such a replacement is transparent to the host system, provides each additional server with a unique boot image, simplifies system management, improves reliability, and achieves energy savings of up to 60 percent, genuinely reducing cost.
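To make the deduplication idea concrete, here is a toy sketch (a generic illustration, not any particular product's implementation) of content-hash-based chunk deduplication, where identical chunks are stored only once no matter how many files reference them:

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: each unique chunk is kept exactly once."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}     # chunk hash -> chunk data
        self.files: dict[str, list[str]] = {}  # file name -> ordered chunk hashes

    def write(self, name: str, data: bytes) -> None:
        hashes = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk seen before costs nothing extra: only a reference is kept.
            self.chunks.setdefault(digest, chunk)
            hashes.append(digest)
        self.files[name] = hashes

    def read(self, name: str) -> bytes:
        return b"".join(self.chunks[h] for h in self.files[name])

store = DedupStore()
store.write("a.log", b"repeated block " * 10_000)
store.write("b.log", b"repeated block " * 10_000)  # identical content
assert store.read("a.log") == store.read("b.log")

logical = sum(len(store.chunks[h]) for hs in store.files.values() for h in hs)
physical = sum(len(c) for c in store.chunks.values())
print(f"logical bytes: {logical}, physical bytes after dedup: {physical}")
```

The second file adds no new chunks, so physical storage stays roughly half the logical size, which is the efficiency gain the paragraph above describes.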
The challenge from data analysis: Data analysis is the heart of big data processing, because the value of big data emerges during analysis, and that is precisely where the greatest challenges lie. First, a larger volume of data brings not only more value but also more noise, so data cleaning and other preprocessing must be done with extra care: if the cleaning is too fine-grained, useful information is easily filtered out, while if it is too coarse-grained, the desired cleaning effect cannot be achieved. A careful tradeoff between quality and quantity is therefore required, and this is also a stern test of both machine hardware and algorithms. Second, traditional data warehouse systems place few demands on processing time, whereas many big data application domains require timely, even near-real-time, results.
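The granularity tradeoff in cleaning can be illustrated with a small sketch (the readings and thresholds here are hypothetical): a filter that is too strict discards a genuine burst along with the noise, while one that is too loose keeps the noise.

```python
import statistics

# Hypothetical sensor readings: mostly ~10, one genuine burst (55.0),
# and one obvious glitch (-999.0) that is pure noise.
readings = [10.1, 10.3, 9.8, 10.0, 55.0, 10.2, -999.0, 10.1]

center = statistics.median(readings)  # median is robust against the glitch

def clean(values, tolerance):
    """Keep values within `tolerance` of the median; drop the rest."""
    return [v for v in values if abs(v - center) <= tolerance]

# Too fine-grained: the glitch goes, but the genuine burst at 55.0 is lost too.
print(clean(readings, tolerance=5))       # [10.1, 10.3, 9.8, 10.0, 10.2, 10.1]

# Too coarse-grained: the burst survives, but so does the -999.0 glitch.
print(clean(readings, tolerance=10_000))  # everything kept, noise included

# A middle setting keeps the burst and drops the glitch.
print(clean(readings, tolerance=100))     # 55.0 retained, -999.0 removed
```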
The significance of big data goes hand in hand with the increasing ubiquity of human activity on networks. "Purifying" useful information out of massive data is an enormous project, and it is also a major challenge of the current big data age. After several years of criticism, questioning, discussion, and speculation about big data, its development still has a long road ahead.