GlusterFS to Integrate with Hadoop

Source: Internet
Author: User
Tags: glusterfs, hadoop, mapreduce, gluster

Big data requires a big file system, and that is the design goal of the open-source GlusterFS file system in its upcoming 3.3 release.

The Gluster project released the second beta of GlusterFS 3.3 this week. The final release is expected by the end of this year. The new release provides integration points with Apache Hadoop, allowing Hadoop users to use Gluster storage. Gluster's file system is compatible with Hadoop's HDFS (Hadoop Distributed File System), and Gluster says it offers additional benefits, including improvements in scalability and performance.

"GlusterFS 3.3 adds two protocols to its file system," said Periasamy, CTO and director of Gluster. "One of them is the object protocol, so you can access data as an object, which is similar to the Amazon S3 protocol."

Periasamy pointed out that the second protocol is an API compatible with HDFS.

"So you can run big data applications and MapReduce on Gluster," said Periasamy.

Periasamy gave several reasons why Gluster has added support for Hadoop. He said the market trend is the convergence of the entire stack. Previously there were storage pools such as SAN and NAS, but they served only specific types of applications.

"We can see that object storage is another option for storing long-term unstructured data," said Periasamy. "We can now easily scale and access storage over the Internet."

Turning to HDFS and Hadoop, Periasamy pointed out that in Hadoop the data comes first and the applications follow. He explained that the Hadoop MapReduce framework now enables a large number of applications, and that its growth has produced a strong ecosystem.

Periasamy said: "The storage engine was initially designed to handle certain workloads, and the metadata server is one of the bottlenecks."

He used HDFS metadata as an example. All metadata is held centrally in the memory of a single system, which becomes a performance bottleneck for horizontal scaling. Periasamy pointed out that Gluster already has a powerful storage engine without such a metadata bottleneck.
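The difference between a central metadata lookup and a metadata-free lookup can be illustrated with a small Python sketch. This is a toy model only: the node names and file paths are hypothetical, and the plain modulo hashing used here is an assumption made for brevity, not the placement algorithm that GlusterFS actually uses.

```python
import hashlib


class CentralMetadataServer:
    """Toy model: one node keeps the whole file -> storage-node map in memory."""

    def __init__(self):
        self.table = {}
        self.lookups = 0

    def register(self, path, node):
        self.table[path] = node

    def locate(self, path):
        # Every client request has to pass through this single server.
        self.lookups += 1
        return self.table[path]


def locate_by_hash(path, nodes):
    """Toy model: any client derives the location from the path alone."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


# Hypothetical cluster and file names, for illustration only.
nodes = ["node-a", "node-b", "node-c"]
paths = [f"/data/part-{i:05d}" for i in range(1000)]

server = CentralMetadataServer()
for p in paths:
    server.register(p, locate_by_hash(p, nodes))

central = [server.locate(p) for p in paths]      # 1000 trips to one server
hashed = [locate_by_hash(p, nodes) for p in paths]  # no shared server involved
```

Both approaches find the same node for every file, but the central server must hold every entry and answer every lookup, while the hash-based lookup is computed independently by each client.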

Periasamy said: "GlusterFS looks at big data from a storage perspective, while the Hadoop project looks at big data from an analysis perspective."

Therefore, Periasamy believes that the Hadoop community stands to gain many benefits from using Gluster as the back-end store for big data. He explained that Hadoop itself is modular enough to allow the Gluster file system to be plugged in.
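The idea of plugging a different file system in underneath an unchanged application can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: `InMemoryStore`, `FileSystemRegistry`, `word_count`, and the URIs are invented names, and the sketch does not reproduce Hadoop's or Gluster's actual interfaces. It only shows the general pattern of choosing a storage backend by URI scheme.

```python
from urllib.parse import urlparse


class InMemoryStore:
    """Hypothetical storage backend; stands in for any real file system."""

    def __init__(self):
        self._blobs = {}

    def write(self, path, data):
        self._blobs[path] = data

    def read(self, path):
        return self._blobs[path]


class FileSystemRegistry:
    """Maps a URI scheme to a backend, so applications never name one directly."""

    def __init__(self):
        self._backends = {}

    def register(self, scheme, backend):
        self._backends[scheme] = backend

    def resolve(self, uri):
        parsed = urlparse(uri)
        return self._backends[parsed.scheme], parsed.path


def word_count(registry, uri):
    """An 'application' written only against the registry interface."""
    backend, path = registry.resolve(uri)
    return len(backend.read(path).split())


registry = FileSystemRegistry()
registry.register("hdfs", InMemoryStore())
registry.register("glusterfs", InMemoryStore())

# Store the same document in each backend, then run the same application on both.
for scheme in ("hdfs", "glusterfs"):
    backend, path = registry.resolve(f"{scheme}://cluster/logs/day1.txt")
    backend.write(path, "big data needs big storage")

counts = {
    scheme: word_count(registry, f"{scheme}://cluster/logs/day1.txt")
    for scheme in ("hdfs", "glusterfs")
}
```

The `word_count` function never changes; only the scheme in the URI decides which backend serves the data.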

Periasamy also pointed out that Gluster's replication model can extend Hadoop further. He explained that Gluster has a replication module that keeps multiple sites in sync. Data changes are synchronized as they occur, rather than through a snapshot-based replication model.

"When data changes, we have the ability to synchronize by location, so we can have continuous geographic replication," Periasamy said.

Looking ahead, Periasamy said that geographic replication will continue to be enhanced in the next version of Gluster.

"In GlusterFS 3.4, we will be able to fail over from one site to another," Periasamy said.

 

References:

1. http://www.chineselinuxuniversity.net/news/87589.shtml

2. http://www.businesswire.com/news/home/20110823005899/en/Gluster-Announces-Apache-Hadoop-Storage-Compatibility-Latest

 

 
