MongoDB vs. HBase

Source: Internet
Author: User
Tags: mongodb, mongodb support

Reprint: http://hi.baidu.com/i1see1you/blog/item/a8038399d9a777286e068c8a.html

1. MongoDB is a BSON document database; the entire dataset lives on disk as documents. HBase is a column-oriented database: in a cluster deployment, each column family is stored in its own set of HDFS files.
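The difference in data models can be sketched with plain Python dictionaries (no database required); the field and family names below are invented for illustration:

```python
# MongoDB: one self-contained BSON-like document.
mongo_doc = {
    "_id": "user123",
    "name": "Alice",
    "address": {"city": "Hangzhou", "zip": "310000"},
}

# HBase: a row key plus cells grouped by column family; each family
# ("info", "addr" here) would live in its own HDFS store files.
hbase_row = {
    "row_key": b"user123",
    "info": {b"name": b"Alice"},
    "addr": {b"city": b"Hangzhou", b"zip": b"310000"},
}
```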

2. MongoDB's primary key is `_id`; it is indexed automatically (that index cannot be dropped), and records are stored in insertion order. HBase's primary key is the row key, which can be any string (up to 64 KB, though 10-100 bytes is typical in practice); internally, HBase stores the row key as a byte array. Data is stored sorted lexicographically by row key (byte order), so when designing keys you should exploit this sorted-storage property and place rows that are frequently read together next to each other.

Dictionary order for integers yields 1, 10, 100, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, ..., 9, 91, 92, 93, 94, 95, 96, 97, 98, 99. To preserve the natural integer order, row keys must be left-padded with zeros.
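This sorting behavior is easy to demonstrate; the sketch below compares raw and zero-padded string keys:

```python
# Lexicographic vs. zero-padded ordering of integer row keys.
ids = [1, 2, 9, 10, 11, 20, 100]

naive = sorted(str(i) for i in ids)            # byte/dictionary order
padded = sorted(str(i).zfill(4) for i in ids)  # left-padded with zeros

# naive loses the natural order; padded preserves it.
```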

3. MongoDB supports secondary indexes; HBase itself does not.

4. MongoDB supports collection queries, regular-expression queries, range queries, skip and limit, and so on; it is the NoSQL database that most resembles MySQL. HBase supports only three kinds of lookup: access by a single row key, a scan over a row-key range, and a full table scan.
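The three HBase access patterns can be sketched over an in-memory sorted key list (a stand-in for a region's sorted store); the row keys and values below are invented:

```python
import bisect

# Rows kept sorted by row key, as a real HBase region stores them.
rows = {b"row-%03d" % i: b"value-%d" % i for i in range(10)}
keys = sorted(rows)

# 1. Get by a single row key.
single = rows.get(b"row-003")

# 2. Scan a row-key range [start, stop).
lo = bisect.bisect_left(keys, b"row-002")
hi = bisect.bisect_left(keys, b"row-005")
ranged = [rows[k] for k in keys[lo:hi]]

# 3. Full table scan.
everything = [rows[k] for k in keys]
```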

5. MongoDB's update is update-in-place: the record is modified where it sits, unless the updated record no longer fits in its original location. In HBase, modify and insert are the same command, Put: if the row key passed to Put already exists, the existing record is "updated". In reality HBase does not update in place; it simply stores the data as a new version, and by default it keeps 3 versions.
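A minimal sketch of this version-keeping Put semantics, assuming the default of 3 retained versions (the class and names are illustrative, not the HBase API):

```python
from collections import defaultdict, deque

MAX_VERSIONS = 3  # HBase's assumed default

class Row:
    def __init__(self):
        # Each cell keeps only the newest MAX_VERSIONS versions.
        self.versions = defaultdict(lambda: deque(maxlen=MAX_VERSIONS))

    def put(self, column, value, timestamp):
        # A "Put" never overwrites in place; it appends a new version.
        self.versions[column].append((timestamp, value))

    def get(self, column):
        # A plain read returns the newest version.
        return self.versions[column][-1][1]

row = Row()
for ts in range(1, 6):          # five puts to the same cell
    row.put(b"cf:qual", b"v%d" % ts, ts)
```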

6. A MongoDB delete marks the record as deleted rather than actually removing it from memory or the data files: the record's space is blanked (overwritten with zeros or a special marker) and its address is placed on a free list. The benefit is that a later insert can first look in the free list for a suitably sized deleted record's slot and reuse its address, which improves performance by avoiding a fresh allocation. MongoDB maintains a bucket-size array defining multiple free lists of different sizes, and a deleted record is placed on the appropriate list according to its size. An HBase delete instead writes a new tombstone marker; reads merge the data with the tombstone markers, and the deleted records are only physically removed when a major compaction occurs.
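MongoDB's free-list reuse can be sketched as follows; the bucket sizes and addresses are invented for illustration:

```python
import bisect

# Slots of deleted records are kept in per-size buckets and reused by
# later inserts instead of allocating fresh space.
BUCKETS = [32, 64, 128, 256]          # illustrative bucket sizes (bytes)
free_lists = {b: [] for b in BUCKETS}

def bucket_for(size):
    # Smallest bucket that can hold `size` bytes (None if too big).
    i = bisect.bisect_left(BUCKETS, size)
    return BUCKETS[i] if i < len(BUCKETS) else None

def delete(address, slot_size):
    # File the freed slot under the largest bucket it fully covers, so
    # any record that fits the bucket also fits the slot.
    i = bisect.bisect_right(BUCKETS, slot_size) - 1
    if i >= 0:
        free_lists[BUCKETS[i]].append(address)

def insert(record_size):
    # Try to reuse a freed slot first; fall back to fresh allocation.
    b = bucket_for(record_size)
    if b is not None and free_lists[b]:
        return free_lists[b].pop()
    return "fresh"

delete(0x1000, 80)    # an 80-byte slot lands in the 64-byte bucket
reused = insert(60)   # a 60-byte record reuses that slot's address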

7. Both MongoDB and HBase support MapReduce, but MongoDB's MapReduce support is weak: unless MongoDB sharding is used, the MapReduce job does not actually execute in parallel.

8. MongoDB supports sharding, and HBase load-balances automatically by row key. In both cases, choose a shard key / row key that is not monotonically increasing and whose values are evenly distributed: partitioning assigns key ranges to servers, so a monotonically increasing key concentrates writes on one hotspot server, and an unevenly distributed key leaves some ranges unsplittable and the load unbalanced.
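The hotspot effect of a monotonically increasing key under range partitioning can be sketched like this (four hypothetical servers and a simplistic range assignment, purely for illustration):

```python
import hashlib

def server_for(key, n_servers=4, key_space=10_000):
    # Naive range partitioning: each server owns a contiguous key range.
    return min(int(key) * n_servers // key_space, n_servers - 1)

# Monotonically increasing keys: the 100 most recent inserts all fall
# into the last range, i.e. onto a single hotspot server.
recent = [str(9_900 + i) for i in range(100)]
hot = {server_for(k) for k in recent}

# Hashing the key first spreads the same inserts across servers.
hashed = {server_for(int(hashlib.md5(k.encode()).hexdigest(), 16) % 10_000)
          for k in recent}
```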

9. MongoDB reads are more efficient than writes. HBase is tuned by default for write-heavy, read-light workloads, but this can be adjusted via hfile.block.cache.size, which sets the percentage of the heap used by the StoreFile read cache (0.2 means 20%). This value directly affects read performance. If writes are far fewer than reads, raising it to 0.4-0.5 is fine; if reads and writes are roughly balanced, use about 0.3; if writes outnumber reads, keep the default 0.2. When setting this value, also consider hbase.regionserver.global.memstore.upperLimit, the maximum percentage of the heap that MemStores may occupy: one parameter affects reads, the other writes. If the two values together exceed 80-90% of the heap, there is a risk of OOM, so set them carefully.
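An illustrative hbase-site.xml fragment for a read-heavy cluster, using the two parameters above (the values are examples for discussion, not recommendations):

```xml
<!-- hbase-site.xml: illustrative values for a read-heavy workload -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.4</value>   <!-- 40% of heap for the StoreFile read cache -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.35</value>  <!-- cache + memstore stay well under ~80% of heap -->
</property>
```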

10. HBase uses the LSM-tree (Log-Structured Merge-tree) idea: changes are accumulated in memory until they reach a specified threshold, then flushed to disk in one batch, turning many individual writes into a single batch write and greatly improving write speed. The trade-off falls on reads: data on disk must be merged with the modified data still in memory, which clearly reduces read performance. MongoDB follows a memory-mapped-file-plus-journal approach: if a record is not in memory it is loaded first, the change is logged, and the data files are written out in batches; this places higher demands on memory, which must hold at least the hot data and the indexes.
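A minimal LSM-style sketch: an in-memory memtable is flushed to immutable sorted segments, and reads merge the memtable with the segments, newest first (the threshold and names are illustrative):

```python
THRESHOLD = 3      # assumed flush threshold, in number of entries

memtable = {}      # in-memory buffer of recent writes
segments = []      # flushed, immutable, sorted "disk" segments (newest last)

def write(key, value):
    memtable[key] = value
    if len(memtable) >= THRESHOLD:
        # Batch flush: one sorted segment instead of many single writes.
        segments.append(dict(sorted(memtable.items())))
        memtable.clear()

def read(key):
    if key in memtable:              # newest data wins
        return memtable[key]
    for seg in reversed(segments):   # then newer segments before older
        if key in seg:
            return seg[key]
    return None

for i in range(7):
    write("k%d" % i, i)
write("k0", 99)    # an overwrite lands in the memtable, not on "disk"
```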
