DFS & KV

Source: Internet
Author: User
Tags couchbase hypertable mapr glusterfs gluster

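| System | Name server | Online scalability | Performance | Hadoop / Small files / Index (as listed) |
|---|---|---|---|---|
| GlusterFS | None | N/remount | S | Map, Y * |
| FastDFS | Multiple | Y | F | Y * |
| MogileFS | Multiple | Y | N | Y * |
| TFS | Active/standby | Y | F | Y |
| MFS | Multiple | Y | F | Y |
| GridFS | N/1 | Y | N | Y |
| HDFS | Active/standby | Y | N | Y |
| MongoDB | 3/1 | Y | N | Map, Y, Y |
| Hypertable | Active/standby | Y | F | Y, Y, Y |
| Voldemort | None | N | F | Y * |
| Couchbase | Multiple | Y | H | Y *, Y |
| Cassandra | None | Y | N | Y |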

GlusterFS

Stackable design: users layer modules on top of one another to add features, for example software RAID, volume management, FUSE, and NFS access. These features are all provided through translators.

No metadata server. Files are located by hashing, and the hash metadata for a directory is stored in that directory's own information.

With a non-Gluster client such as NFS, performance is mediocre, and reads and writes go through a single node at a time. The native Gluster client connects to every node, which strictly limits the number of clients.

Nodes cannot be removed, only replaced.
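
The metadata-free lookup described above can be sketched as follows. This is only an illustration, not GlusterFS code: the brick names, the even split of a 32-bit hash space, and the use of CRC32 as the hash function are all assumptions made for the example.

```python
import zlib

HASH_SPACE = 2**32  # assumed 32-bit hash space


def build_layout(bricks):
    """Give each brick an equal, contiguous slice of the hash space.

    This models per-directory hash ranges; the real on-disk format differs.
    """
    step = HASH_SPACE // len(bricks)
    layout = []
    for i, brick in enumerate(bricks):
        start = i * step
        # The last brick absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == len(bricks) - 1 else start + step - 1
        layout.append((start, end, brick))
    return layout


def locate(layout, file_name):
    """Return the brick whose range contains the hash of the file name."""
    h = zlib.crc32(file_name.encode("utf-8"))  # stand-in hash function
    for start, end, brick in layout:
        if start <= h <= end:
            return brick
    raise ValueError("layout does not cover the hash space")


layout = build_layout(["brick-a", "brick-b", "brick-c"])  # hypothetical bricks
print(locate(layout, "photo.jpg"))
```

Because every client can compute the same answer from the file name and the directory's ranges, no central server has to be asked where a file lives.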

MogileFS

Trackers + storage servers

Tracker metadata is kept in MySQL.

A storage server is a web server: either MogileFS's own mogstored, or nginx or Apache with some extensions. All uploads and downloads go over REST-style HTTP (WebDAV).

FastDFS

Trackers + storage servers

A relatively lightweight, compact file system.

Replication: replication is organized by group, and every server in a group holds a full copy. In addition, for a certain period after upload a file is served only from its source server in the group, which avoids reading a file that has not yet been synchronized to the other group members.

Metadata is encoded in the file name:

The file ID contains the group name, disk, directory, and file name, so a file can be located directly on the storage server; for example, group2, m00, 00/00, and cgrgqe7msrxakxx9aazodqcbvvc047.jpg. The file creation time and the IP address of the source storage server can be decoded from the file name. As a result, a file less than 30 minutes old can be fetched directly from its source server, and the tracker lookup is skipped.

With this naming scheme the metadata is very light: only the groups and the load on each group need to be maintained.
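
As a rough illustration of reading the location out of a file ID, the sketch below splits an ID into the four parts listed above. It assumes the parts are joined with slashes; the FileId type and parse_file_id function are hypothetical names, and decoding the creation time or source IP from the file name is not attempted.

```python
from typing import NamedTuple


class FileId(NamedTuple):
    group: str
    disk: str
    directory: str
    name: str


def parse_file_id(file_id: str) -> FileId:
    """Split a slash-separated file ID into group, disk, directory, name."""
    parts = file_id.split("/")
    if len(parts) != 5:
        raise ValueError(f"unexpected file ID layout: {file_id!r}")
    group, disk, dir1, dir2, name = parts
    # The directory is two levels deep, e.g. "00/00".
    return FileId(group, disk, f"{dir1}/{dir2}", name)


fid = parse_file_id("group2/m00/00/00/cgrgqe7msrxakxx9aazodqcbvvc047.jpg")
print(fid.group, fid.disk, fid.directory, fid.name)
```

No database lookup is involved: everything needed to find the file travels with the ID itself.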

TFS

Like FastDFS, TFS packs metadata into file names, which reduces the amount of metadata the system has to maintain and lets it support a large number of small files.

DataServer (DS): the block ID portion of the file name identifies the DS, and the file ID portion identifies the file within that DS. For each file, the DS records the disk file that holds it and the offset and length within that disk file.
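
The per-file bookkeeping on a DataServer can be sketched roughly as below. This is a toy model under stated assumptions, not TFS code: the Block and DataServer classes are hypothetical, an in-memory bytearray stands in for the large disk file, and the step that maps a block ID to a particular DataServer is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One large disk file holding many small files back to back."""
    data: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)  # file_id -> (offset, length)

    def write(self, file_id: int, payload: bytes) -> None:
        offset = len(self.data)  # append at the current end of the block
        self.data.extend(payload)
        self.index[file_id] = (offset, len(payload))

    def read(self, file_id: int) -> bytes:
        offset, length = self.index[file_id]
        return bytes(self.data[offset:offset + length])


class DataServer:
    def __init__(self):
        self.blocks = {}  # block_id -> Block

    def write(self, block_id: int, file_id: int, payload: bytes) -> None:
        self.blocks.setdefault(block_id, Block()).write(file_id, payload)

    def read(self, block_id: int, file_id: int) -> bytes:
        return self.blocks[block_id].read(file_id)


ds = DataServer()
ds.write(block_id=7, file_id=1, payload=b"first")
ds.write(block_id=7, file_id=2, payload=b"second")
print(ds.read(7, 2))
print(ds.blocks[7].index)
```

Many small files share one block, so the file system underneath only sees a few large files while the index stays a small map of offsets and lengths.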

MFS

MapR's HDFS-compatible file system, written in C++. It makes full use of hardware features such as raw devices.

GridFS

Use MongoDB to store files.

MongoDB

Uses multiple config servers, but if one of them has a problem the cluster metadata becomes read-only.

Couchbase

Memcached + virtual memory.

Storage is organized into buckets. Each bucket is partitioned into multiple vBuckets.

Supports views, and data search through Lucene.
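
To make the bucket and vBucket relationship concrete, here is a simplified sketch in which the keys of one bucket are spread over a fixed number of vBuckets, and each vBucket is assigned to a server. The vBucket count, the CRC32-modulo hash, the round-robin placement, and the node names are all assumptions for illustration; Couchbase's actual hash and cluster map are more involved.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed vBucket count for one bucket


def vbucket_for(key: str) -> int:
    """Map a key to a vBucket ID (CRC32 modulo is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_vbucket_map(servers):
    """Assign vBuckets to servers round-robin (hypothetical placement)."""
    return [servers[i % len(servers)] for i in range(NUM_VBUCKETS)]


servers = ["node-1", "node-2", "node-3"]  # hypothetical node names
vb_map = build_vbucket_map(servers)
vb = vbucket_for("user:42")
print(vb, vb_map[vb])
```

The indirection matters for rebalancing: moving a vBucket to another server changes one entry in the map, while the key-to-vBucket mapping stays fixed.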

Voldemort

Documentation is sparse. Supports read-repair. Storage is pluggable, with engines such as BDB, MySQL, in-memory, and read-only.

Consistent hashing: for replication, a key is stored on the node at the key's position on the ring and on the R-1 nodes that follow it.
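
The replica selection rule can be sketched as below. This is a simplified model, not Voldemort's implementation: it places one ring position per node (the real system maps partitions to nodes), uses MD5 as a stand-in hash, and the node names are hypothetical.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Hash a string to a position on the ring (MD5 is a stand-in)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(nodes):
    """Sorted list of (position, node) pairs, one position per node."""
    return sorted((ring_position(n), n) for n in nodes)


def replicas_for(ring, key, r):
    """Node at the key's position plus the following r-1 nodes."""
    if r > len(ring):
        raise ValueError("replication factor exceeds node count")
    positions = [pos for pos, _ in ring]
    # First node at or after the key's position, wrapping around the ring.
    start = bisect.bisect_left(positions, ring_position(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(r)]


ring = build_ring(["node-a", "node-b", "node-c", "node-d"])  # hypothetical
print(replicas_for(ring, "user:42", r=3))
```

A reader looking for a key walks the same R nodes, which is what makes read-repair possible when one replica is stale.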


Hypertable

1. Metadata nodes support HA.

2. Supports HQL, scans, and regular-expression matching on keys and values (row_regexp, value_regexp).

3. Supports MapReduce data processing.

4. Better performance: when both run on HDFS, it is more than 30% faster than HBase.

5. The underlying storage can also be the local file system or MapR. MapR has no single-point NameNode problem, and its performance is claimed to be three times that of Hadoop.

OS: CentOS 6.1
CPU: 2 x AMD C32 Six Core Model 4170 HE 2.1 GHz
RAM: 24 GB 1333 MHz DDR3
Disk: 4 x 2 TB SATA Western Digital RE4-GP WD2002FYPS
12 machines


Write

| Value Size | Key Count | Hypertable Throughput MB/s | HBase Throughput MB/s |
|---|---|---|---|
| 10,000 | 500,041,347 | 188 | 93.5 |
| 1,000 | 4,912,173,058 | 183 | 84 |
| 100 | 41,753,471,955 | 113 | ? |
| 10 | 167,013,888,782 | 34 | |

Scan

| Value Size | Keys Submitted | Hypertable Keys Returned | HBase Keys Returned | Hypertable Throughput MB/s | HBase Throughput MB/s |
|---|---|---|---|---|---|
| 10,000 | 500,041,347 | 500,041,347 | 500,026,388 | 478 | 397 |
| 1,000 | 4,912,173,058 | 4,912,173,058 | 4,912,184,933 | 469 | 371 |
| 100 | 41,753,471,955 | 41,753,471,955 | * | 413 | * |
| 10 | 167,013,888,782 | 167,013,888,782 | * | 292 | |

Random read

| Dataset Size | Hypertable Queries/s | HBase Queries/s | Hypertable Latency (ms) | HBase Latency (ms) |
|---|---|---|---|---|
| 0.5 TB | 7901.02 | 4254.81 | 64.764 | 120.299 |
| 5 TB | 5842.37 | 3113.95 | 87.532 | 164.366 |

Uniform read

| Dataset Size | Hypertable Queries/s | HBase Queries/s | Hypertable Latency (ms) | HBase Latency (ms) |
|---|---|---|---|---|
| 0.5 TB | 3256.42 | 2969.52 | 157.221 | 172.351 |
| 5 TB | 2450.01 | 2066.52 | 208.972 | 247.680 |
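
To put the benchmark figures side by side, the short script below computes the Hypertable-to-HBase ratio for each row where both systems reported a number. The values are copied from the write, scan, random read, and uniform read results above; rows without an HBase result are left out, and the speedups helper is just a name chosen for this example.

```python
# (Hypertable, HBase) pairs copied from the benchmark results above.
write_mb_s = {"10,000": (188, 93.5), "1,000": (183, 84)}
scan_mb_s = {"10,000": (478, 397), "1,000": (469, 371)}
random_qps = {"0.5 TB": (7901.02, 4254.81), "5 TB": (5842.37, 3113.95)}
uniform_qps = {"0.5 TB": (3256.42, 2969.52), "5 TB": (2450.01, 2066.52)}


def speedups(table):
    """Return {row label: Hypertable / HBase}, rounded to two decimals."""
    return {label: round(ht / hb, 2) for label, (ht, hb) in table.items()}


for name, table in [
    ("write throughput", write_mb_s),
    ("scan throughput", scan_mb_s),
    ("random read queries/s", random_qps),
    ("uniform read queries/s", uniform_qps),
]:
    print(name, speedups(table))
```

On these numbers the gap is widest for writes and random reads (roughly 2x and 1.9x) and narrowest for uniform reads (roughly 1.1x to 1.2x).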
