HDFS (Hadoop Distributed File System) is the core of the Hadoop project and the foundation of data storage management in distributed computing. HDFS is a good distributed file system with many advantages, but it also has drawbacks: it is not suited to low-latency data access, it cannot efficiently store large numbers of small files, and it does not support multiple concurrent writers or arbitrary modification of files.
Since its establishment under the Apache Software Foundation, the HDFS project has been looking for ways to improve performance and availability. HDFS may be well suited to pilot projects, unconventional projects, and less demanding environments, but some Hadoop users have strict requirements for performance, availability, and enterprise-class features, and are concerned about the focus on a direct-attached storage (DAS) architecture. Especially because older versions of Hadoop lack a high-performance master node, the following 8 products are strong alternatives to HDFS.
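To make the small-files drawback concrete, here is a rough back-of-the-envelope sketch of how much master-node memory the file metadata might consume. It assumes about 150 bytes per namespace object (file or block) and a 128 MiB block size. Both figures are commonly quoted rules of thumb rather than numbers from this article, and the function name `namenode_bytes` is made up for illustration.

```python
# Rough estimate of master-node memory consumed by file-system metadata.
# ASSUMPTION: ~150 bytes per namespace object (file entry or block),
# a commonly quoted rule of thumb, not a figure from this article.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024  # ASSUMPTION: 128 MiB blocks


def namenode_bytes(num_files: int, file_size: int) -> int:
    """Estimate metadata bytes for num_files files of file_size bytes each."""
    blocks_per_file = max(1, -(-file_size // BLOCK_SIZE))  # ceiling division
    objects = num_files * (1 + blocks_per_file)  # one file entry plus its blocks
    return objects * BYTES_PER_OBJECT


TOTAL = 1024 ** 4  # the same 1 TiB of data, stored two different ways
small = namenode_bytes(TOTAL // 1024 ** 2, 1024 ** 2)  # as 1 MiB files
large = namenode_bytes(TOTAL // 1024 ** 3, 1024 ** 3)  # as 1 GiB files
print(f"1 MiB files: {small / 1e6:.1f} MB of metadata")
print(f"1 GiB files: {large / 1e6:.1f} MB of metadata")
```

Under these assumptions the same volume of data needs far more metadata when it is split into many small files, which is why a single master node becomes the bottleneck.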
1. Cassandra (DataStax)
Cassandra is not a full file system but an open source NoSQL key-value store. It gives web applications that rely on fast data access an alternative to HDFS. In short, DataStax blends Hadoop with Cassandra, so that web applications can quickly access data through Hadoop, and Hadoop can quickly access data flowing into Cassandra.
2. Ceph
Ceph is an open source, multi-pronged storage system. Because of its high-performance parallel file system, some consider it a possible successor to HDFS in Hadoop environments, and researchers have been exploring this possibility since 2010.
3. Cleversafe: Dispersed Storage Network
This week Cleversafe announced the integration of Hadoop's parallel programming technology with its own Dispersed Storage Network. The principle is to distribute metadata across the entire cluster, relying neither on a single master node nor on replication. Cleversafe says the result is faster, more stable, and more scalable than HDFS.
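As a generic illustration of spreading metadata over a cluster without a single master node, the sketch below uses rendezvous (highest-random-weight) hashing so that any client can compute which node owns the metadata for a given path. This is not Cleversafe's actual design, which the article does not describe. The node names and the function `metadata_owner` are hypothetical.

```python
import hashlib

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def metadata_owner(path: str, nodes: list[str]) -> str:
    """Pick the node that holds metadata for `path` via rendezvous hashing."""
    def score(node: str) -> int:
        # Hash the (node, path) pair; the node with the highest score wins.
        digest = hashlib.sha256(f"{node}:{path}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(nodes, key=score)


owner = metadata_owner("/data/logs/part-0001", NODES)
print(f"metadata for /data/logs/part-0001 lives on {owner}")
```

Every client computes the same owner independently, so no central lookup is needed. Removing a node only moves the metadata that node owned, and all other paths keep their owner.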
4. GPFS (IBM)
IBM has long sold its parallel file system to high-performance computing users, including operators of the world's fastest supercomputers. In 2010 it launched a version of GPFS for Hadoop and announced that the GPFS shared-nothing cluster edition is much faster than Hadoop with HDFS, because GPFS runs at the kernel level rather than on top of the operating system as HDFS does.
5. Isilon (EMC)
EMC has been shipping its own Hadoop distribution for a year, but in January 2012 it introduced a new enterprise-class approach to HDFS: the OneFS file system from Isilon. Because Isilon can speak the NFS, CIFS, and HDFS protocols, a single Isilon NAS system can ingest, process, and analyze data.
6. Lustre
The HPC storage provider Xyratex wrote in a 2011 report that Lustre clusters are faster and cheaper than HDFS-based clusters.
7. MapR File System
The MapR file system has earned a reputation in the industry. MapR claims its file system is 2 to 5 times faster than HDFS (and has in fact claimed 20 times), and it also offers mirroring, snapshots, and high performance, features that enterprise users like.
8. NetApp Hadoop Open Solution
NetApp has reworked the physical Hadoop architecture by placing HDFS on disk arrays, which it says makes Hadoop faster, more stable, and more secure.
(Responsible editor: The good of the Legacy)