| System | Name Server | Online scalability | Performance | Hadoop | Small files | Index |
| --- | --- | --- | --- | --- | --- | --- |
| GlusterFS | None | N/remount | S | Map | Y * | |
| FastDFS | Multiple | Y | F | | Y * | |
| MogileFS | Multiple | Y | N | | Y * | |
| TFS | Active/standby | Y | F | | Y | |
| MFS | Multiple | Y | F | Y | | |
| GridFS | N/1 | Y | N | | Y | |
| HDFS | Active/standby | Y | N | Y | | |
| MongoDB | 3/1 | Y | N | Map | Y | Y |
| Hypertable | Active/standby | Y | F | Y | Y | Y |
| Voldemort | None | N | F | | Y * | |
| CouchBase | Multiple | Y | H | | Y * | Y |
| Cassandra | None | Y | N | | Y | |
GlusterFS
Stackable design: users can layer different modules to keep adding features, for example software RAID, volume management, FUSE, and NFS. These features are all provided through translators (on both the client and server sides).
No metadata server: an elastic hash is used to locate files, and a directory's mapping metadata is stored with the directory itself.
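As a rough illustration of the no-metadata idea (not GlusterFS's actual elastic hash algorithm), a client can compute a file's location purely from its path, so no name server is ever consulted; the brick list and hash choice below are made-up assumptions:

```python
import hashlib

# Hypothetical brick list; real GlusterFS stores layout ranges in directory
# extended attributes and uses its own elastic hash algorithm.
BRICKS = ["server1:/brick1", "server2:/brick1", "server3:/brick1"]

def locate(path: str) -> str:
    """Map a file path to a brick purely by hashing -- no metadata server needed."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return BRICKS[int.from_bytes(digest[:4], "big") % len(BRICKS)]

# Every client computes the same placement independently.
print(locate("/photos/2023/cat.jpg"))
```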
With a non-Gluster client such as NFS, performance is mediocre, and reads and writes go through a single node at a time. The native Gluster client is fully connected to every node, so the number of clients has to be strictly controlled.
Nodes cannot be removed; they can only be replaced.
MogileFS
Trackers + storage servers
Tracker metadata is maintained in MySQL.
A storage server is a web server: either the bundled mogstored, or nginx/Apache with some extensions. All uploads and downloads go over HTTP (REST/WebDAV).
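To illustrate that HTTP data path, here is a minimal sketch of a client upload, assuming the tracker has already handed back a destination path on a storage node; the host, port, and path below are hypothetical:

```python
import http.client

# Hypothetical values: in MogileFS the tracker's create_open reply tells the
# client which storage node and path to use; nothing is hard-coded in reality.
STORAGE_HOST = "storage1.example.com"
STORAGE_PORT = 7500                      # assumed mogstored HTTP port
DEST_PATH = "/dev1/0/000/000/0000000123.fid"

def upload(local_file: str) -> int:
    """PUT the file body to the storage node over plain HTTP (WebDAV-style)."""
    with open(local_file, "rb") as f:
        body = f.read()
    conn = http.client.HTTPConnection(STORAGE_HOST, STORAGE_PORT)
    conn.request("PUT", DEST_PATH, body=body)
    status = conn.getresponse().status
    conn.close()
    return status  # a 2xx status means the storage node accepted the file
```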
FastDFS
Trackers + storage servers
A relatively lightweight and compact file system.
Replication: replication is controlled by group (full replication within a group). In addition, for a period of time after upload, file access is directed only to the source storage server of the group, which avoids reading a file that has not yet been synchronized to the other group members.
Metadata is packed into the file name:
The group name, disk (store path), directory, and file name can be read directly out of the file ID and used to locate the file on the storage server, for example group2/M00/00/00/cgrgqe7msrxakxx9aazodqcbvvc047.jpg. The file creation time and the IP address of the source storage server can also be decoded from the file name, so accesses within 30 minutes of creation can go straight to the source server, skipping the tracker lookup.
With this naming scheme the metadata is very lightweight: the trackers only need to track the groups and their load.
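Because the file ID carries all of this, a client can split it without any metadata lookup. A rough sketch of that parsing, using the example ID above (the field layout is a simplification and the helper is hypothetical):

```python
def parse_file_id(file_id: str) -> dict:
    """Hypothetical helper: split a FastDFS-style file ID into the pieces a
    client needs, e.g. 'group2/M00/00/00/<encoded-name>.jpg'."""
    group, store_path, dir1, dir2, filename = file_id.split("/", 4)
    return {
        "group": group,              # which replication group to ask
        "store_path": store_path,    # which base path/disk on the storage server (M00, M01, ...)
        "directory": f"{dir1}/{dir2}",
        "filename": filename,        # also encodes the source server IP and create time
    }

print(parse_file_id("group2/M00/00/00/cgrgqe7msrxakxx9aazodqcbvvc047.jpg"))
```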
TFS
Like FastDFS, metadata is packed into the file name to reduce the amount of metadata the system has to maintain and to support very large numbers of small files.
DataServer: the block-ID part of the file name locates the DataServer (and the block), and the file-ID part identifies the file within that DataServer. The DataServer maintains the mapping from the file name to the corresponding disk file, plus the offset and length within that disk file.
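A toy sketch of the same idea, assuming a simplified name layout and an in-memory index rather than TFS's real on-disk format: the block ID picks the DataServer and block, and the file ID picks the entry inside the block:

```python
from dataclasses import dataclass

@dataclass
class FileLocation:
    offset: int   # where the small file starts inside the block file on disk
    length: int   # how many bytes it occupies

# Hypothetical DataServer-side index: block_id -> file_id -> location in the block.
block_index = {
    1001: {42: FileLocation(offset=0, length=4096),
           43: FileLocation(offset=4096, length=1024)},
}

def read_small_file(block_id: int, file_id: int, block_bytes: bytes) -> bytes:
    """Resolve (block_id, file_id) -- both decoded from the TFS file name --
    to the bytes stored inside the block."""
    loc = block_index[block_id][file_id]
    return block_bytes[loc.offset:loc.offset + loc.length]
```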
MFS
MapR's HDFS-compatible product, written in C++; it makes full use of hardware features such as raw devices.
GridFS
Stores files in MongoDB.
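For example, with the Python driver (pymongo) a file can be stored and read back through GridFS roughly like this; the connection string and database name are placeholders:

```python
from pymongo import MongoClient
import gridfs

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
db = client["files_demo"]                          # placeholder database name
fs = gridfs.GridFS(db)

# put() splits the data into chunk documents plus one metadata document.
file_id = fs.put(b"hello, gridfs", filename="hello.txt")

# Read the file back by its _id.
print(fs.get(file_id).read())
```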
MongoDB
Multiple config servers, but the entire system becomes read-only if any of them has a problem.
CouchBase
Memcached plus virtual memory (data can exceed RAM and is persisted to disk).
Storage is organized into buckets; each bucket is split into multiple vBuckets (virtual buckets).
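A minimal sketch of the key-to-node lookup implied by that design, assuming 1024 vBuckets and a made-up vBucket-to-node map (in Couchbase the cluster manager publishes and rebalances this map):

```python
import zlib

NUM_VBUCKETS = 1024  # common Couchbase setting; treated as an assumption here

# Made-up vBucket -> node map spread over four nodes.
vbucket_map = {vb: f"node{vb % 4}" for vb in range(NUM_VBUCKETS)}

def node_for_key(key: str) -> str:
    """Hash the key to a vBucket, then look up which node currently owns it."""
    vbucket = zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS
    return vbucket_map[vbucket]

print(node_for_key("user::1234"))
```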
Supports views and data search using Lucene.
Voldemort
Documentation is sparse. Uses read-repair. The storage layer supports a variety of engines, such as BDB, MySQL, in-memory, and read-only stores.
Consistent hashing: for replication, the key's position on the ring selects the current node plus the next R-1 nodes.
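A small sketch of that replica-selection rule, with a simplified ring (no virtual nodes) and an arbitrary hash function: find the first node at or after the key's hash, then walk clockwise for the next R-1 distinct nodes:

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Arbitrary 64-bit hash for illustration; Voldemort's real partitioning differs.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((_hash(n), n) for n in nodes)   # nodes placed on the ring
        self.points = [point for point, _ in self.ring]

    def preference_list(self, key: str):
        """Node owning the key's position plus the next R-1 distinct nodes clockwise."""
        i = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        chosen = []
        while len(chosen) < min(self.replicas, len(self.ring)):
            node = self.ring[i][1]
            if node not in chosen:
                chosen.append(node)
            i = (i + 1) % len(self.ring)
        return chosen

print(Ring(["nodeA", "nodeB", "nodeC", "nodeD"]).preference_list("some-key"))
```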
Hypertable
1. Metadata nodes support HA
2. Supports HQL, traversal/scans, and regular-expression matching on keys and values (row_regexp, value_regexp).
3. Supports MapReduce data processing.
4. Better performance: running on the same HDFS, it is more than 30% faster than HBase.
5. The underlying storage supports local files and MapR. MapR has no single-point NameNode problem, and its performance is claimed to be three times that of Hadoop.
The following benchmark compares Hypertable and HBase on this test setup:
OS: CentOS 6.1
CPU: 2x AMD C32 Six Core Model 4170 HE, 2.1 GHz
RAM: 24 GB 1333 MHz DDR3
Disk: 4x 2 TB SATA Western Digital RE4-GP WD2002FYPS
12 machines
Write
| Value Size (bytes) | Key Count | Hypertable Throughput (MB/s) | HBase Throughput (MB/s) |
| --- | --- | --- | --- |
| 10,000 | 500,041,347 | 188 | 93.5 |
| 1,000 | 4,912,173,058 | 183 | 84 |
| 100 | 41,753,471,955 | 113 | ? |
| 10 | 167,013,888,782 | 34 | |
Scan
| Value Size (bytes) | Keys Submitted | Hypertable Keys Returned | HBase Keys Returned | Hypertable Throughput (MB/s) | HBase Throughput (MB/s) |
| --- | --- | --- | --- | --- | --- |
| 10,000 | 500,041,347 | 500,041,347 | 500,026,388 | 478 | 397 |
| 1,000 | 4,912,173,058 | 4,912,173,058 | 4,912,184,933 | 469 | 371 |
| 100 | 41,753,471,955 | 41,753,471,955 | * | 413 | * |
| 10 | 167,013,888,782 | 167,013,888,782 | * | 292 | |
Zipfian random read
| Dataset Size | Hypertable Queries/s | HBase Queries/s | Hypertable Latency (ms) | HBase Latency (ms) |
| --- | --- | --- | --- | --- |
| 0.5 TB | 7901.02 | 4254.81 | 64.764 | 120.299 |
| 5 TB | 5842.37 | 3113.95 | 87.532 | 164.366 |
Uniform random read
| Dataset Size | Hypertable Queries/s | HBase Queries/s | Hypertable Latency (ms) | HBase Latency (ms) |
| --- | --- | --- | --- | --- |
| 0.5 TB | 3256.42 | 2969.52 | 157.221 | 172.351 |
| 5 TB | 2450.01 | 2066.52 | 208.972 | 247.680 |