A question was recently posted on Quora about the differences between the Hadoop Distributed File System (HDFS) and OpenStack Object Storage.
The original question reads as follows:
"Both HDFS (Hadoop Distributed File System) and OpenStack Object Storage seem to share a similar objective: to achieve redundant, fast, networked storage. What technical features make these two systems so different? Would it be significant for these two storage systems to eventually converge?"
After the question was posted, OpenStack developers quickly replied. This article excerpts the first two answers for reference.
The first answer is from Chuck Thier, an OpenStack Swift developer at Rackspace:
Although there are some similarities between HDFS and OpenStack Object Storage (Swift), the overall designs of the two systems are quite different.
1. HDFS uses a central system to maintain file metadata (the NameNode). In Swift, metadata is distributed and replicated across the cluster. A central metadata system makes the NameNode a single point of failure for HDFS, and it also makes HDFS harder to scale to very large environments.
2. Swift was designed with a multi-tenant architecture in mind, while HDFS was not.
3. HDFS is optimized for larger files (as is typical in data processing), while Swift is designed to store files of any size.
4. In HDFS, a file is written once, and only one writer can write to it at a time. In Swift, a file can be written many times; in a concurrent environment, the last write wins.
5. HDFS is written in Java, while Swift is written in Python.
In addition, HDFS was designed to store a moderate number of large files in support of data processing, while Swift was designed as a general-purpose storage solution that can reliably store very large numbers of files of varying sizes.
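The first point above, that Swift distributes metadata rather than keeping it on a central NameNode, rests on consistent hashing: any node can compute where an object's replicas live from the object name alone. The sketch below is a minimal illustration of that idea, not Swift's actual ring implementation (which adds partitions, zones, and weighted devices); the node names and replica count are made up for the example.

```python
import hashlib
from bisect import bisect

class HashRing:
    """Minimal consistent-hashing ring: locate replicas with no central metadata."""

    def __init__(self, nodes, vnodes=100):
        # Place several virtual points per node on the hash ring so data
        # spreads evenly and adding/removing a node only moves a little data.
        self._points = sorted(
            (self._hash(f"{node}-{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self._keys = [point[0] for point in self._points]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def replicas(self, obj_name, count=3):
        # Walk clockwise from the object's hash, collecting distinct nodes.
        idx = bisect(self._keys, self._hash(obj_name))
        found = []
        while len(found) < count:
            node = self._points[idx % len(self._points)][1]
            if node not in found:
                found.append(node)
            idx += 1
        return found

ring = HashRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.replicas("container/photo.jpg"))  # three distinct nodes, same answer every time
```

Because every proxy computes the same placement independently, there is nothing equivalent to the NameNode to fail or to become a scaling bottleneck.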
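The fourth point, "last write wins" under concurrency, can be illustrated with a toy store in which every write carries a timestamp and a replica only accepts an update newer than what it already holds. This is a sketch of the general idea only; it does not reflect Swift's actual on-disk format or replication protocol.

```python
class LastWriteWinsStore:
    """Toy object store: concurrent writes are resolved by timestamp."""

    def __init__(self):
        self._objects = {}  # name -> (timestamp, data)

    def put(self, name, data, timestamp):
        current = self._objects.get(name)
        if current is None or timestamp > current[0]:
            self._objects[name] = (timestamp, data)
            return True   # write accepted
        return False      # stale write ignored

    def get(self, name):
        entry = self._objects.get(name)
        return entry[1] if entry else None

store = LastWriteWinsStore()
store.put("obj", b"first", timestamp=1.0)
store.put("obj", b"second", timestamp=2.0)
store.put("obj", b"late-but-stale", timestamp=1.5)  # arrives late, ignored
print(store.get("obj"))  # b'second'
```

Contrast this with HDFS, where a file is written once by a single writer and the question of conflicting concurrent updates does not arise.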
The second answer is from Joshua McKenty. He was the chief architect of NASA's Nebula cloud computing project and one of the early developers of OpenStack Nova. He currently serves on the OpenStack project governance board and is the founder of Piston Cloud Computing, a company built on OpenStack.
Chuck gave a detailed introduction to the technical differences between the two systems, but did not discuss their possible convergence, a topic that was raised at the OpenStack Design Summit. In short, HDFS is designed to enable MapReduce processing, via Hadoop, across the objects in the storage environment. For many OpenStack companies (including my own), supporting such processing directly in Swift is a roadmap goal, but not everyone believes MapReduce is the right solution.
We have discussed writing a wrapper for HDFS that supports the OpenStack internal storage API, allowing users to run Hadoop queries against their data. Another approach is to use HDFS inside Swift. However, neither of these methods seems ideal.
The OpenStack community is also working on this, carefully studying alternative MapReduce frameworks (such as Riak and CouchDB).
Finally, there are other storage projects "affiliated with" the OpenStack community (such as Sheepdog and HC2) that are expected to take full advantage of data locality and make object storage more intelligent.