Even in the early 21st century, computing still looks more like an art than a science, and I am afraid even the most senior members of the industry cannot clearly explain why. The most likely reason is that people think about final products, applications, and data from very different perspectives. It is like the parable of the blind men and the elephant: because each touches a different part, each forms a different picture of the elephant.
In computing, we need to integrate a variety of skills and resources into one system to meet business needs. Infrastructure operations must manage the facilities, hardware, and software needed to run applications. Engineering must create a set of components that provide end users with a variety of features. Security analysis of the environment is required to reduce the risk of data loss and intrusion and to ensure availability. The production environment needs to be monitored, and users need support when the business is not working properly. All of this must be combined, rather than merely delivering an application environment; and most businesses require 30 to 40 applications.
Needless to say, deploying and operating an application and computing infrastructure for an enterprise is a daunting task. Add the need to dynamically meet a variety of load requirements and manage shared resource pools, and the complexity of the problem increases exponentially. Here, these concerns are mapped onto a big data solution that brings them all together. Big data is chosen because it exercises the full breadth of computing facilities: massive parallelism, massive storage, considerable network bandwidth, multi-level information security, scalability, and more.
Traditional methods
It may seem easiest to apply the same approach used to build data warehouses to big data problems:
Review the data domains and deliver an information architecture
Estimate the amount of data that needs to be stored, and add 25% for growth
Design a Hadoop deployment with enough nodes to handle the peak business load within the expected time frame
Acquire the necessary hardware and install it in the data center
Deploy the Hadoop environment
Load the data into the allocated storage
Pray that the business does not produce more data than the environment can process or load
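The sizing arithmetic in the steps above can be sketched as a quick back-of-the-envelope calculation. The 25% growth headroom comes from the plan itself; the replication factor, per-node disk capacity, and fill ratio below are illustrative assumptions, not recommendations:

```python
import math

# Back-of-the-envelope Hadoop capacity sizing for the planning steps above.
# The 25% growth headroom comes from the plan; the replication factor,
# per-node disk, and fill ratio are illustrative assumptions.

def required_nodes(raw_tb, growth=0.25, replication=3,
                   node_capacity_tb=48, fill_ratio=0.7):
    """Estimate data nodes needed to store raw_tb terabytes of raw data."""
    effective_tb = raw_tb * (1 + growth) * replication
    usable_per_node_tb = node_capacity_tb * fill_ratio
    return math.ceil(effective_tb / usable_per_node_tb)

# 500 TB raw -> 625 TB with headroom -> 1875 TB after 3x replication
print(required_nodes(500))  # -> 56
```

Note that the capacity is fixed at planning time: if the business later produces more data than estimated, the cluster must be physically expanded.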
This traditional approach uses big data tools to produce a one-off application stack designed to meet a single business requirement. It is neither extensible nor reusable, and the capacity allocated to the fixed function may or may not be fully used, so capital expenditure ends up needing adjustment.
Of course, you could also consider external cloud service providers, but moving many petabytes of data to a public cloud provider generates considerable transfer traffic and costs that are difficult to anticipate.
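To see why petabyte-scale transfers are hard to plan for, a rough transfer-time estimate helps. The link speed, dataset size, and efficiency factor here are illustrative assumptions:

```python
# Rough estimate of how long it takes to push a petabyte-scale dataset
# to a public cloud. The link speed, dataset size, and efficiency
# factor are illustrative assumptions.

def transfer_days(data_pb, link_gbps, efficiency=0.8):
    """Days needed to move data_pb petabytes over a link_gbps link."""
    bits = data_pb * 8 * 10**15                # petabytes -> bits (decimal)
    effective_bps = link_gbps * 10**9 * efficiency
    return bits / effective_bps / 86400        # 86400 seconds per day

# Moving 2 PB over a 10 Gb/s link at 80% efficiency takes roughly 23 days:
print(round(transfer_days(2, 10), 1))
```

Multiply that by per-gigabyte egress or ingress pricing, which varies by provider and changes over time, and the cost becomes equally hard to pin down.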
The way of the cloud
So what does a cloud-based design bring that goes far beyond the traditional IT approach?
Elasticity: when customers want to add new data elements to an analysis or feed more data into the results, getting the infrastructure to supply and respond is no longer a struggle.
Virtualization makes the most of existing resources, so that capacity does not sit idle unused.
The Hadoop environment can be deployed automatically and shut down when it is not in use.
A readily available set of common services, including authentication, network security, configuration, and management, that do not need to be acquired externally, reducing operational costs.
With self-service provisioning, data scientists and business users can use the environment on demand without long waits.
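The on-demand lifecycle described above, where a Hadoop environment is provisioned automatically, used, and then shut down so unused capacity is reclaimed, can be sketched as follows. `CloudClient` and its methods are hypothetical placeholders for whatever provisioning API or automation tool the platform actually exposes:

```python
# Sketch of an on-demand Hadoop environment lifecycle: provision, run
# the work, then tear down so unused capacity is not left idling.
# `CloudClient` is a hypothetical stand-in for a real provisioning API.
import contextlib

class CloudClient:
    """Hypothetical provisioning API stand-in; records each action."""
    def __init__(self):
        self.events = []

    def create_cluster(self, name, nodes):
        self.events.append(f"create:{name}")
        return name

    def run_job(self, cluster, job):
        self.events.append(f"run:{job}")

    def delete_cluster(self, cluster):
        self.events.append(f"delete:{cluster}")

@contextlib.contextmanager
def hadoop_environment(client, name, nodes):
    """Provision a cluster and guarantee teardown, even if the job fails."""
    cluster = client.create_cluster(name, nodes)
    try:
        yield cluster
    finally:
        client.delete_cluster(cluster)

client = CloudClient()
with hadoop_environment(client, "analytics", nodes=12) as cluster:
    client.run_job(cluster, "daily-aggregation")
```

The context manager makes the key design point explicit: teardown is automatic and unconditional, so the environment only consumes resources while work is actually running.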
To achieve these benefits, appropriate planning and architecture are required so that the infrastructure can meet these objectives. The following diagram shows the main areas of focus that a cloud environment supporting big data must address.
Although the diagram is largely self-explanatory, it is worth exploring the impact of each branch on the desired outcome.