How to deploy OpenStack to Hadoop

Last Update:2014-10-26 Source: Internet

Author: User

Tags hadoop mapreduce


With the rapid development of the information age, big data technology and private cloud environments have each proven very useful, and enterprises that combine them can gain substantial benefits. Although the combination makes the environment more complex, enterprises can still see significant synergy from combining an OpenStack private cloud with an Apache Hadoop environment. How can the two best be combined?

Solution 1. Swift, Nova + Apache Hadoop MapReduce

Enterprises that want more flexibility, scalability, and autonomy in their big data environment can leverage the inherent capabilities of the open-source products provided by Apache and OpenStack. To do so, they need to use both technology stacks to the fullest extent, which requires designing the environment with a different mindset from the other solutions described here.

To obtain a fully scalable and flexible big data environment, you must run it in a private cloud that provides both storage and compute nodes. An enterprise must therefore build the private cloud first and then add big data. In this case, Swift, Nova, and RabbitMQ are used, and a controller node manages and maintains the environment. The open question is whether the enterprise needs to divide the environment into several parts for different systems and business departments (for example, non-big-data virtual machines or customer server instances). If you want to commit fully to the private cloud, you should add Quantum (OpenStack's networking component, since renamed Neutron) to separate the different environments at the network level.
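As a rough illustration of separating environments at the network level, the sketch below carves one non-overlapping subnet per environment out of a parent address block. It uses only Python's standard `ipaddress` module and does not call Quantum or any OpenStack API; the address range and the environment names are hypothetical.

```python
import ipaddress


def plan_subnets(cloud_cidr, environments, prefix=24):
    """Assign one non-overlapping subnet per environment from a parent block."""
    parent = ipaddress.ip_network(cloud_cidr)
    candidates = parent.subnets(new_prefix=prefix)  # lazily yields child subnets
    plan = {}
    for name in environments:
        try:
            plan[name] = next(candidates)
        except StopIteration:
            raise ValueError(f"{cloud_cidr} has no room left for {name!r}") from None
    return plan


# Hypothetical environments: the Hadoop cluster, general-purpose VMs,
# and customer server instances.
plan = plan_subnets("10.20.0.0/16", ["hadoop", "general-vms", "customer-servers"])
for name, network in plan.items():
    print(name, network)
```

The resulting plan could then be used as input when creating the actual tenant networks with whatever networking tooling the cloud provides.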

Solution 2. Swift + Apache Hadoop MapReduce

In a private cloud environment, one common big data deployment model is to use OpenStack's Swift storage technology as the storage layer for an Apache Hadoop MapReduce cluster. The advantage of this architecture is that the enterprise gains scalable storage nodes for the data it accumulates. According to an IDC survey, data is growing at an annual rate of 60%. This solution meets that growing demand for storage and, at the same time, lets the organization start a pilot project for deploying a private cloud.
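To make the 60% figure concrete, the following sketch projects storage needs under compound annual growth. The starting capacity of 100 TB and the three-year horizon are illustrative assumptions, not figures from the survey.

```python
def projected_capacity(current_tb, annual_growth=0.60, years=3):
    """Return the expected data volume (in TB) for year 0 through `years`."""
    return [round(current_tb * (1 + annual_growth) ** year, 1)
            for year in range(years + 1)]


# Hypothetical starting point: 100 TB of data today.
projection = projected_capacity(100.0)
for year, terabytes in enumerate(projection):
    print(f"year {year}: {terabytes} TB")
```

At this rate the data volume roughly quadruples in three years, which is why a storage pool that scales by adding nodes is attractive.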

This deployment model is best suited to enterprises that want to adopt private cloud technology through a storage pool while using big data technology internally. Best practice is to deploy big data technology in the production data warehouse environment first, and then build and configure the private cloud storage solution. Once Apache Hadoop MapReduce has been successfully integrated into the data warehouse environment and the private cloud storage pool has been built and is running correctly, you can integrate the private cloud storage data with the Hadoop MapReduce environment that is already in place.
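One way to wire the two together is Hadoop's Swift file system connector (the `hadoop-openstack` module shipped with some Hadoop 2.x releases). The sketch below generates the corresponding `core-site.xml` properties and a `swift://` URL. The property names are written as the author of this sketch recalls them from that module's documentation and should be checked against the Hadoop version in use; the service name, endpoint, and credentials are hypothetical placeholders.

```python
from xml.etree import ElementTree as ET


def swift_properties(service, auth_url, tenant, username, password, public=True):
    """Build Hadoop properties for one Swift service definition.

    Property names are assumed from the hadoop-openstack connector; verify
    them against the documentation for your Hadoop release.
    """
    prefix = f"fs.swift.service.{service}"
    return {
        "fs.swift.impl": "org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem",
        f"{prefix}.auth.url": auth_url,
        f"{prefix}.tenant": tenant,
        f"{prefix}.username": username,
        f"{prefix}.password": password,
        f"{prefix}.public": str(public).lower(),
    }


def to_core_site_xml(props):
    """Render a property dict as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def swift_url(container, service, path):
    """Form the swift://container.service/path URL used by Hadoop jobs."""
    return f"swift://{container}.{service}/{path.lstrip('/')}"


# Hypothetical values for a private cloud named "privatecloud".
props = swift_properties(
    service="privatecloud",
    auth_url="https://keystone.example.internal:5000/v2.0/tokens",
    tenant="analytics",
    username="hadoop",
    password="change-me",
)
xml_text = to_core_site_xml(props)
input_url = swift_url("warehouse", "privatecloud", "/logs/2014")
print(xml_text)
print(input_url)
```

A MapReduce job would then read its input from a URL such as the one printed above instead of an `hdfs://` path. Storing the password in plain text is done here only for brevity.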

Solution 3. Swift + Cloudera Apache Hadoop distribution

Enterprises that do not want to build big data capabilities from scratch can use the big data appliances provided by Cloudera and other solution providers. Cloudera's Distribution Including Apache Hadoop (CDH) spares enterprises from having to recruit or train employees who understand every nuance of Hadoop, so they can achieve a higher return on investment (ROI) from big data. This is especially attractive for enterprises that lack big data or private cloud skill sets and want to integrate the technology into their product portfolios slowly and progressively.

Big data and cloud computing are relatively new technologies, and many enterprises want to use them to save costs, yet many are hesitant to adopt them fully. A vendor-supported big data distribution lets these enterprises proceed with more confidence while they learn how to turn the technologies to their own advantage. In addition, enterprises that analyze large datasets with big data software and manage those datasets on private cloud storage nodes can achieve higher utilization. To integrate this strategy into the enterprise, first install, configure, and manage CDH to analyze the enterprise's data warehouse environment, and then add the data stored in Swift where it is needed.

After setting up and testing the private cloud environment, you can merge the Apache Hadoop components into it. At this point, Nova instances can host NoSQL or SQL data stores (yes, they can coexist) as well as Pig and MapReduce instances, while Hadoop itself can run on separate non-Nova machines to provide processing. In the near future, Hadoop is expected to run on Nova instances as well, so that the entire private cloud can be contained in Nova instances.

Solution 4. GFS, Nova, Pig, and MapReduce

From an architectural perspective, there are options other than using OpenStack's Swift for scalable storage. This example uses Google File System (GFS), the Nova component, and Apache Hadoop components, specifically Pig and MapReduce. It lets the enterprise concentrate on developing private cloud nodes for computing only, while using Google's public storage cloud for data storage. With this hybrid cloud, the enterprise can focus on its core computing and processing capabilities and leave storage to a third party. The model can also take advantage of storage solutions from other vendors, such as Amazon Simple Storage Service (S3). However, before using any external storage, the enterprise should build the solution internally on a scalable file system such as XFS, test it, and only then extend it to the public cloud. In addition, depending on data sensitivity, the enterprise may need data protection mechanisms such as obfuscation, anonymization, encryption, or hashing.
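As one example of the hashing option, the sketch below replaces sensitive fields with keyed hashes (HMAC-SHA256) before a record leaves the private cloud. This is a minimal illustration using Python's standard library; the field names, the sample record, and the key handling are assumptions, and a real deployment would load the key from a secrets store and might need encryption rather than hashing if the original values must be recoverable.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Return a keyed SHA-256 digest so the raw value never leaves the site."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect_record(record, sensitive_fields, secret_key):
    """Copy a record, replacing only the sensitive fields with their digests."""
    return {
        field: pseudonymize(str(value), secret_key) if field in sensitive_fields else value
        for field, value in record.items()
    }


# Hypothetical key and record, for illustration only.
key = b"replace-with-a-key-from-your-secrets-store"
record = {"customer_id": "C-1042", "email": "user@example.com", "order_total": 129.5}
safe_record = protect_record(record, {"customer_id", "email"}, key)
print(safe_record)
```

Because the same input and key always produce the same digest, records can still be joined or grouped on the protected fields after they are uploaded to external storage.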

Tips

When integrating cloud computing and big data technologies into the enterprise environment, you must build skill sets for both technology platforms. Once you understand the technologies, you can build a lab to test how the two platforms work in combination. Because the combination involves many different components, follow the proven paths described above during implementation. Enterprises may still encounter setbacks when trying to merge the two models; after several unsuccessful attempts, they should turn to alternatives such as appliances and hybrid clouds.

Obstacles and traps

Because these are relatively new technologies, most enterprises need to test them with existing resources before committing to large capital expenditure (CAPEX). However, if the enterprise does not provide a reasonable budget and staff training for these technologies, the pilot and testing work will fail. Similarly, an enterprise without a complete private cloud deployment should implement big data technology first and the private cloud afterward.

Finally, enterprises need to develop a strategic road map for their private cloud and big data plans. A successful deployment requires additional analysis and preparatory work, which may delay the rollout. To reduce this risk, adopt an iterative project management method and deploy the technology to business departments in phases. Enterprises need to confirm how to connect


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
