Laxcus Big Data Management System (2) - Chapter I Basic overview, 1.1 Some thoughts based on the present situation

Source: Internet
Author: User

Chapter I Basic overview

1.1 Some thoughts based on the present situation

Over the past ten-plus years, as the Internet industry has spread and developed rapidly, Internet data in various formats has grown explosively. At the same time, in another important area of data application, commercial and scientific computing, the demand for data storage and computation keeps increasing, driven by a variety of emerging technologies and industries, and the accuracy and precision required of the computed data are much higher than for Internet data. Behind these phenomena, data volumes have long since passed the MB scale: GB is the norm, TB is common, and PB is approaching. Faced with such a huge amount of data, the questions are how to manage and use it, how to meet a variety of computing needs, and how to discover and filter out the valuable information. The usual practices of improving chip performance and adding memory and disks have become increasingly unsustainable, even impractical. In this context, it becomes necessary to process data at large scale by using network and network communication technology to connect computers scattered across different geographical locations into computer clusters that are spatially dispersed but logically unified.

The advantage of a computer cluster is that it emphasizes overall processing power. Each computer participates as a node and takes on part of the processing tasks, and the cluster's processing power is determined jointly by all of its nodes. This mode of work makes full use of the collective, so that the processing performance of any single computer is no longer important. And because the nodes are connected over a network, each computer can join or leave the computation at any time. This capability, similar to hot plugging, lets a running cluster adjust its computing power dynamically and lets cluster computing power grow almost without limit, which traditional centralized computing cannot match. At the same time, because the performance of a single computer is no longer the goal, hardware can be purchased according to actual need, which leaves room to save costs.

But it must be recognized that, like the two sides of a coin, cluster computing brings many inherent problems along with its unprecedented processing power.

First, because the nodes are numerous and dispersed, the structure of the cluster becomes very large. The quality of individual hardware varies; network lines, communication equipment, and the connections and communication between computers carry a great deal of uncertainty; and the hardware itself, one piece of equipment and another, and the equipment and its external environment all affect each other. Under such conditions it is impossible to guarantee that every piece of equipment runs with complete stability, so achieving stable storage and computation on a cluster that is itself in an unstable state becomes the primary requirement.

In addition, the fundamental difference from data processing on a single computer is that data processing on a cluster is a decentralized computation. The cluster accepts a large number of task requests at the front end and then assigns these tasks to many back-end computers for execution. An efficient and reasonable distributed computing algorithm therefore becomes necessary. Such an algorithm must address task assignment, process scheduling, fault tolerance, data filtering, data balancing, data summarization, and so on, and must finally produce the same results as a centralized calculation. The process is complex.
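As a rough illustration of the front-end/back-end split described above, the following Python sketch assigns chunks of a task to workers, retries a chunk that fails, and summarizes the partial results. It is only a toy model under stated assumptions: threads stand in for back-end computers, and every name in it (`split_into_chunks`, `run_with_retry`, `scatter_gather`, `filtered_sum`) is hypothetical and is not taken from Laxcus.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Task assignment: deal the records round-robin into n_chunks lists."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def run_with_retry(worker, chunk, max_attempts=3):
    """Fault tolerance: re-run a failed chunk a bounded number of times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return worker(chunk)
        except RuntimeError as err:  # stands in for a node or network failure
            last_error = err
    raise RuntimeError(f"chunk failed after {max_attempts} attempts") from last_error


def scatter_gather(records, worker, combine, n_workers=4):
    """Front end: assign chunks to workers, then summarize the partial results."""
    chunks = [c for c in split_into_chunks(records, n_workers) if c]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: run_with_retry(worker, c), chunks))
    return combine(partials)


def filtered_sum(chunk):
    """Data filtering plus a local computation: sum the even values only."""
    return sum(x for x in chunk if x % 2 == 0)


data = list(range(1, 101))
distributed = scatter_gather(data, filtered_sum, sum)
centralized = filtered_sum(data)
print(distributed == centralized)
```

The comparison at the end reflects the requirement in the text that the distributed result match the centralized one. A real cluster would also have to handle process scheduling and data balancing across machines, which this sketch leaves out.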

Data management also becomes important. When data queues are processed in parallel batches, no single piece of data may be omitted if the results are to be correct. This requires knowing that each piece of data exists, determining its physical location, and verifying its availability and correctness; even in a fault state, the calculation process must continue to run normally. This is the basic requirement for data processing.
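One way to picture these requirements is a small catalog that records where each block of data lives and what its checksum should be, then skips replicas that are offline or corrupted. The sketch below is an assumption made for illustration, not a description of how Laxcus manages data: the `Catalog` class, the node names, and the in-memory `cluster` dictionary are all hypothetical.

```python
import hashlib


def checksum(payload: bytes) -> str:
    """Correctness check: SHA-256 digest of a data block."""
    return hashlib.sha256(payload).hexdigest()


class Catalog:
    """Hypothetical catalog: which blocks exist, where, and their checksums."""

    def __init__(self):
        self._entries = {}  # block_id -> (expected checksum, replica node names)

    def register(self, block_id, payload, nodes):
        """Record that a block exists and which nodes hold a copy."""
        self._entries[block_id] = (checksum(payload), list(nodes))

    def locate(self, block_id, cluster):
        """Return (node, payload) for the first reachable, intact replica."""
        expected, nodes = self._entries[block_id]
        for node in nodes:
            store = cluster.get(node)  # None models an offline node
            if store is None:
                continue
            payload = store.get(block_id)
            if payload is not None and checksum(payload) == expected:
                return node, payload
        raise LookupError(f"no intact replica of {block_id!r} is available")


catalog = Catalog()
catalog.register("block-1", b"order records", ["node-a", "node-b", "node-c"])

cluster = {
    # node-a is offline, so it has no entry here
    "node-b": {"block-1": b"order recXrds"},  # corrupted copy
    "node-c": {"block-1": b"order records"},  # intact copy
}

node, payload = catalog.locate("block-1", cluster)
print(node)  # node-c
```

Keeping several replicas per block is what lets the lookup succeed while one node is down and another holds a damaged copy. When no intact replica remains, `locate` raises an error rather than returning bad data.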

However, as data reserves continue to grow, another phenomenon is emerging that cannot be overlooked: data is increasingly moving beyond its digital content and becoming an asset whose underlying and potential value is sometimes immeasurable. How to ensure that this data belongs only to its owner and is not violated from outside has become a problem that must be solved.

A more important issue is the user experience. Nobody likes a complex, cumbersome, hard-to-maintain system. On the contrary, a product that is user-friendly and easy to operate and manage wins users' favor more easily. This requires a lot of work in the product design phase: comprehensively considering the product's application scope, processing efficiency, and operating costs, as well as user behavior and habits, making the necessary trade-offs, and backing them with technical implementation, in order to deliver a good user experience.

When the hardware infrastructure that can be provided is already fixed while application needs keep evolving and changing, how to adapt to this trend, stay close to users' needs, and develop products that satisfy users: all of these are questions that big data software designers need to think about.
