Big data is in full swing and has become the next buzzword after cloud computing. It is characterized by three features: unprecedented volume, velocity, and variety. These features are driving many new technologies that help enterprises deal with the new problems big data brings.
Among the many emerging solutions, Hadoop and MapReduce are seen as two promising ways to manage and analyze big data efficiently. However, the current runtime engine for MapReduce applications does not yet provide enough functionality to meet the practical needs of enterprises that want to deploy MapReduce applications in production.
For an enterprise IT department, an advanced runtime engine should be a management tool that supports the business units within the enterprise while meeting demanding service level agreements (SLAs). Such a tool should support mixed workloads on a shared grid, including MapReduce applications and the other applications that business units submit.
In addition, today's IT departments are undergoing a major shift. In many cases, IT is no longer just a cost center; it is viewed as a service provider that supports the business units of the enterprise. Budget constraints and growing workloads put IT departments under great pressure to get more out of the existing infrastructure and maximize overall resource utilization.
A quick return on infrastructure investment is one of the most important considerations for an IT department making procurement decisions. A solution that makes full use of the existing infrastructure, meets higher workload requirements, responds dynamically to demand in real time, reduces costs, and improves return on investment is undoubtedly what companies are looking for.
To support new applications such as MapReduce applications, IT departments urgently need a shared service runtime platform for better performance, higher resource utilization, automatic monitoring and diagnostics, as well as more advanced application lifecycle support from the IT infrastructure.
What is a shared service model?
A shared services model is an infrastructure platform that allows multiple applications, regardless of their type and requirements, to be executed in parallel on a shared infrastructure. The shared services model is often managed by an advanced application scheduling and resource management engine, providing a guaranteed service to multiple business units in the enterprise it supports.
The shared services IT model provides several important benefits:
Do more work with less money
Quickly gain ROI on infrastructure
Provide better manageability
Better scalability and flexibility to support changing application requirements
Use chargeback policies to transform the IT department from a cost center into a profit center
Existing Hadoop MapReduce runtime engine challenges
Unfortunately, the current Hadoop implementation of the MapReduce runtime engine does not provide the shared service functionality described above. This stems from the basic architectural design of the Hadoop job tracker (JobTracker), the management layer that provides the services MapReduce jobs need at run time.
The current Hadoop job tracker does not separate job scheduling logic from resource management logic, which leads directly to the following major flaws:
Lack of enterprise-class capabilities. Only one MapReduce application can run on the cluster at any one time. As a result, resources are statically assigned and dedicated to a single use, and applications execute serially rather than in parallel. This leads to inefficient resource use, a siloed IT environment, and limited scalability.
The job tracker is a single point of failure. If the job tracker fails, all running jobs stop.
It is clear that the current Hadoop job tracker is limited in its ability to provide the shared service capabilities that IT departments need to deploy MapReduce applications in production-level environments.
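The cost of serial execution can be illustrated with a little arithmetic. The following Python sketch is not Hadoop or Platform Symphony code; the `Job` class, the slot counts, and the first-fit start rule are all assumptions made for illustration. It compares cluster utilization when jobs run one at a time with utilization when jobs that fit may run side by side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    slots: int    # slots the job occupies while it runs (hypothetical figure)
    minutes: int  # run time once the job has its slots


def serial_utilization(jobs, cluster_slots):
    """Utilization when only one job may run on the cluster at a time."""
    makespan = sum(j.minutes for j in jobs)
    used = sum(j.slots * j.minutes for j in jobs)
    return used / (cluster_slots * makespan)


def shared_utilization(jobs, cluster_slots):
    """Utilization when any waiting job starts as soon as enough slots are free."""
    pending = list(jobs)
    running = []  # (finish_time, slots) of jobs in progress
    now, free = 0, cluster_slots
    while pending or running:
        waiting = []
        for job in pending:  # first fit, in submission order
            if job.slots <= free:
                running.append((now + job.minutes, job.slots))
                free -= job.slots
            else:
                waiting.append(job)
        pending = waiting
        if not running:
            raise ValueError("a job needs more slots than the cluster has")
        running.sort()
        now, released = running.pop(0)  # jump to the next completion
        free += released
    used = sum(j.slots * j.minutes for j in jobs)
    return used / (cluster_slots * now)


jobs = [Job("a", 40, 30), Job("b", 30, 30), Job("c", 30, 30)]
print(round(serial_utilization(jobs, 100), 2))  # 0.33
print(shared_utilization(jobs, 100))            # 1.0
```

With these made-up numbers, the three jobs together fill a 100-slot cluster, so running them in parallel finishes in 30 minutes instead of 90. Real workloads rarely pack this neatly, so the gain in practice depends on the job mix.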
Sharing services with Platform Symphony MapReduce
Platform Symphony MapReduce is a production-grade, distributed runtime engine for managing large-scale data applications. It provides the following unique benefits for enterprises running big data applications:
Brings a shared service platform to the IT department
Increases resource utilization and the return on infrastructure investment
Makes it possible to meet service level agreements across the enterprise
Provides higher performance and a shorter time to results
Simplifies IT management and reduces the total cost of managing complex IT environments
Enhances IT agility
How do I deploy a shared service model?
Organizations can deploy a shared service model in one of the following ways, depending on their business requirements:
1. The "chimney-style" (siloed) sharing model provides guaranteed resources for different business units. The IT department provisions guaranteed resources according to the specific needs of each business unit. Each business unit requests a certain amount of resources from the centralized IT department for its own use. Those requests are then defined as policies and added to the resource allocation plan. Resources are not shared between business units. The centralized IT department is responsible for managing resource allocations, system monitoring, and troubleshooting.
2. The "proxy sharing model" shares resources across functional areas within the enterprise. Different functional departments share a common pool of IT resources. Allocations are defined according to the specific needs of each business unit and then added to the resource allocation plan. Instead of assigning static resources to a single business unit, the IT department provides guaranteed resources to users by dynamically sharing the entire infrastructure.
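The difference between the two models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the business unit names, the slot figures, and the rule for lending idle capacity (in policy order, up to each unit's unmet demand) are all hypothetical and are not taken from Platform Symphony. Reclaiming lent resources when the owner needs them back is not modeled.

```python
def allocate_siloed(entitlements, demand):
    """Each unit gets at most its own entitlement; idle capacity is never lent."""
    return {unit: min(demand.get(unit, 0), quota)
            for unit, quota in entitlements.items()}


def allocate_shared(entitlements, demand):
    """Each unit is guaranteed its entitlement; idle capacity is lent to units that need more."""
    grant = allocate_siloed(entitlements, demand)
    spare = sum(entitlements.values()) - sum(grant.values())
    for unit in entitlements:  # assumed lending order: order of the policy table
        unmet = demand.get(unit, 0) - grant[unit]
        extra = min(unmet, spare)
        if extra > 0:
            grant[unit] += extra
            spare -= extra
    return grant


# Hypothetical policy: slots each business unit is entitled to.
entitlements = {"risk": 60, "marketing": 40}
# Hypothetical demand at one moment in time.
demand = {"risk": 90, "marketing": 10}

print(allocate_siloed(entitlements, demand))  # {'risk': 60, 'marketing': 10}
print(allocate_shared(entitlements, demand))  # {'risk': 90, 'marketing': 10}
```

In the first model, 30 of the 100 slots sit idle even though one unit has work waiting. In the second, the same 30 slots are lent out, while each unit can still count on its own entitlement whenever it asks for it.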