The horse and saddle of big data and cloud computing - [Software and Information Services] 2014.08

Source: Internet
Author: User
Tags hadoop ecosystem

Since VMware launched vSphere Big Data Extensions (BDE) at its 2013 global user conference, big data has become increasingly popular. BDE mainly targets Hadoop big data applications, but big data is more than Hadoop, and even Hadoop itself comes in several distributions. Whichever Hadoop distribution or big data platform you choose, it needs a suitable platform underneath it, just as a good horse needs a good saddle. So which cloud computing platform is a good saddle for big data?

Runtime platform: multi-tenancy, resource provisioning, and management

In conversations with customers over the past few months, I have heard of big data running on different platforms, including Mesos at Twitter, virtualization at FedEx, and YARN at Yahoo. Different cloud computing platforms have features that solve different problems of big data applications. For example, YARN aims to support non-MapReduce applications on Hadoop, while Mesos, as used by Twitter, supports mixed workloads and relies on operating-system-level virtualization. Because enterprise big data scenarios are often diverse, you need a platform that suits different application scenarios. Its requirements include:

  • Simple deployment: new big data applications can be deployed through automation and self-service;

  • Support for varied workloads: the platform can run many kinds of big data applications, not only MapReduce but also Hadoop ecosystem applications, SQL services, and other general applications;

  • Reliable security isolation: when sensitive information must be isolated, the platform can ensure the security of data sets and the environment;

  • Reliable resource isolation: the platform provides sufficient resources to meet overall SLA requirements and isolates noisy neighbors to guarantee performance;

  • Multi-version support: the platform can run multiple runtime environments of different versions to meet the needs of different users and developers;

  • Enterprise availability: the platform ensures the robustness of the entire system and provides enterprise-level availability.
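
The list above can be treated as a checklist when comparing candidate platforms. The following is only a minimal sketch: the requirement keys, the `missing_requirements` helper, and the example capability set are all hypothetical names invented for illustration and do not describe any real product.

```python
# Hypothetical keys, one per requirement in the list above.
REQUIREMENTS = (
    "self_service_deployment",
    "mixed_workloads",
    "security_isolation",
    "resource_isolation",
    "multi_version",
    "enterprise_availability",
)


def missing_requirements(capabilities):
    """Return the requirements a candidate platform does not cover,
    in the same order as the checklist."""
    return [req for req in REQUIREMENTS if req not in capabilities]


# An invented candidate platform, described only by what it supports.
candidate = {"self_service_deployment", "mixed_workloads", "resource_isolation"}

print(missing_requirements(candidate))
# prints ['security_isolation', 'multi_version', 'enterprise_availability']
```

An empty result would mean the candidate covers every item on the checklist; it says nothing about how well each item is covered.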

Network challenges

For the network, challenges and opportunities coexist. Today's two-tier core/aggregation switching networks cannot provide sufficient bandwidth across racks. Bandwidth within a rack is usually not a problem and can often reach several hundred Gbit per second, but bandwidth between racks is often very limited. We therefore often need to keep traffic local, that is, to co-locate data and computing as fully as possible. Fortunately, new network topologies, including Clos and spine-and-leaf designs, provide a good solution. With these topologies, you can ensure sufficient bandwidth while latency across the entire cluster stays roughly constant, so bandwidth is no longer a problem either within a rack or between racks.
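
The gap between in-rack and cross-rack bandwidth can be made concrete with a little arithmetic on the oversubscription ratio, which is the total host-facing bandwidth of a rack divided by its uplink bandwidth. The sketch below is illustrative only: the function names and all link counts and speeds are assumed values chosen for the example, not figures from this article.

```python
def oversubscription_ratio(hosts_per_rack, host_link_gbps,
                           uplinks_per_rack, uplink_gbps):
    """Host-facing bandwidth of a rack divided by its uplink bandwidth."""
    downlink = hosts_per_rack * host_link_gbps
    uplink = uplinks_per_rack * uplink_gbps
    return downlink / uplink


def cross_rack_gbps_per_host(hosts_per_rack, host_link_gbps,
                             uplinks_per_rack, uplink_gbps):
    """Cross-rack bandwidth each host gets when every host in the rack
    sends off-rack at once (capped by the host's own link speed)."""
    fair_share = uplinks_per_rack * uplink_gbps / hosts_per_rack
    return min(host_link_gbps, fair_share)


# Assumed rack with few uplinks: 20 hosts at 10 Gbit/s, 2 x 10 Gbit/s uplinks.
narrow = (20, 10, 2, 10)
# Assumed rack in a Clos-style fabric: same hosts, 4 x 40 Gbit/s uplinks.
wide = (20, 10, 4, 40)

print(oversubscription_ratio(*narrow), cross_rack_gbps_per_host(*narrow))
# prints 10.0 1.0
print(oversubscription_ratio(*wide), cross_rack_gbps_per_host(*wide))
# prints 1.25 8.0
```

With the assumed numbers, the first rack leaves each host only a tenth of its link speed for cross-rack traffic, which is why keeping computation next to the data matters there; in the second rack the ratio is close to 1:1, so data placement is much less constrained.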

Storage platform selection

As storage technology develops, the storage options for big data are becoming more abundant. Hadoop's HDFS is of course at the core, but other storage platforms can also offer Hadoop compatibility and plug-and-play use while providing unique value of their own. The main storage options are as follows:

Traditional SAN or NAS: this is arguably the best storage option for supporting big data applications, because a large number of data centers already provide such storage, together with storage services such as snapshots, archiving, and replication;

Software-defined storage built on the servers' internal disks: HDFS is the main representative here. Other options include Ceph, Gluster, and MapR. All of them can provide a file system that meets the needs of big data applications;

Scale-out storage: many emerging companies offer distinctive solutions that replace HDFS with scale-out storage, which effectively addresses cost and bandwidth problems. For example, the Isilon scale-out storage solution scales from 3 to 144 nodes and can expand to 15 PB of capacity with throughput measured in GB per second, making it a typical example of scale-out storage.

Running big data on the vSphere platform

The BDE solution released by VMware keeps improving and provides strong support for running different versions of Hadoop. BDE can currently be combined with vCloud Automation Center to offer self-service creation of Hadoop clusters. With the vSphere platform, end users can quickly create applications on their own, which solves the big data deployment problem. Because the platform provides automation and self-service, big data is no longer the exclusive domain of geeks. Developers and administrators of any big data application need only focus on their own applications and do not need to care about the underlying architecture.

Note: This article was published in the August 2014 issue of the journal Software and Information Services. For @yunjie's latest views on cloud computing, subscribe to the "China Cloud Dream" public account. After subscribing, reply "20007" to read this article.

This article is from the "China Cloud Dream" blog; please retain this attribution when reposting.
