HBase vs. Oracle comparison (column and row database)

Last Update:2018-04-04 Source: Internet

Author: User


1 Main differences

HBase is suited to workloads with a very large volume of inserts and reads.
The bottleneck of HBase is disk transfer speed, whereas Oracle's bottleneck is disk seek time.

HBase essentially has only one operation, insert. An update inserts a row with a new timestamp, and a delete inserts a row carrying a delete marker.
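The insert-only model can be illustrated with a minimal Python sketch. This is a toy model, not the HBase API: the class `AppendOnlyTable`, the `TOMBSTONE` marker, and the logical clock are all hypothetical names introduced for illustration.

```python
import itertools

TOMBSTONE = object()  # hypothetical delete marker


class AppendOnlyTable:
    """Toy model: every mutation is an appended (row_key, timestamp, value) entry."""

    def __init__(self):
        self.entries = []                 # append-only list of entries
        self._clock = itertools.count(1)  # logical timestamps instead of wall-clock time

    def put(self, row_key, value):
        # Insert and update are the same operation: append a new version.
        self.entries.append((row_key, next(self._clock), value))

    def delete(self, row_key):
        # A delete is also an append; its value is the delete marker.
        self.entries.append((row_key, next(self._clock), TOMBSTONE))

    def get(self, row_key):
        # The newest version wins; a delete marker hides all older versions.
        versions = [(ts, v) for k, ts, v in self.entries if k == row_key]
        if not versions:
            return None
        _, value = max(versions, key=lambda tv: tv[0])
        return None if value is TOMBSTONE else value


table = AppendOnlyTable()
table.put("user1", "alice")
table.put("user1", "alice-v2")   # update: a second version of the same row
latest = table.get("user1")
table.delete("user1")            # delete: a third entry, nothing is erased
after_delete = table.get("user1")
```

After these calls the table holds three entries for `user1`, even though a read returns nothing: the old versions are still there, hidden by the marker.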

Its main mode of operation is to collect a batch of data in memory and then write it to disk in bulk, so its write speed depends mainly on the disk transfer rate.
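A rough sketch of this buffer-then-flush pattern, in Python. The class `BufferedWriter` and its `flush_threshold` parameter are hypothetical, and a list of sorted batches stands in for files on disk.

```python
class BufferedWriter:
    """Toy write buffer: collect rows in memory, flush them to 'disk' in sorted batches."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical size limit, in rows
        self.buffer = {}                        # in-memory batch
        self.flushed_files = []                 # each flush yields one sorted, immutable batch

    def put(self, row_key, value):
        self.buffer[row_key] = value
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.buffer:
            # One large sequential write instead of one random write per row.
            self.flushed_files.append(sorted(self.buffer.items()))
            self.buffer = {}


writer = BufferedWriter(flush_threshold=3)
for key in ["r5", "r1", "r3", "r2", "r4", "r6", "r7"]:
    writer.put(key, "v-" + key)
```

Seven puts here produce only two bulk writes, with one row still waiting in memory.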

Oracle is different: it often has to read and write randomly, so the disk head must constantly seek to locate data, and the bottleneck is therefore disk seek time.
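The gap between seek-bound and transfer-bound access can be estimated with some back-of-the-envelope arithmetic. The figures below (10 ms per seek, 100 MB/s transfer, 1,000-byte rows) are assumptions chosen for illustration, not measurements of either system.

```python
SEEK_TIME_S = 0.010            # assumed average seek time: 10 ms
TRANSFER_BYTES_PER_S = 100e6   # assumed sequential transfer rate: 100 MB/s
ROW_BYTES = 1000               # assumed row size
ROWS = 1_000_000


def random_access_seconds(rows):
    # One seek per row, plus the time to transfer the row itself.
    return rows * (SEEK_TIME_S + ROW_BYTES / TRANSFER_BYTES_PER_S)


def sequential_seconds(rows):
    # One initial seek, then a single continuous transfer.
    return SEEK_TIME_S + rows * ROW_BYTES / TRANSFER_BYTES_PER_S


random_s = random_access_seconds(ROWS)
sequential_s = sequential_seconds(ROWS)
```

Under these assumptions the random pattern is dominated almost entirely by seek time, while the sequential pattern is dominated by transfer time.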

HBase is well suited to queries that find the top N rows sorted by time.
The two systems index data differently, which leads to different behavior.
Oracle can handle both OLTP and OLAP, but in some extreme cases (very heavy load) it is not a good fit.
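One way to picture the top-N-by-time case is a store that keeps rows sorted by key, with the time encoded in the key. The sketch below assumes a reversed-timestamp key so that the newest row sorts first; that key design, `MAX_TS`, and the class `SortedTable` are illustrative assumptions rather than anything stated above.

```python
import bisect

MAX_TS = 10**13 - 1  # hypothetical upper bound used to reverse millisecond timestamps


def make_row_key(ts_millis):
    # Zero-padded so string order matches numeric order; newest timestamps sort first.
    return f"{MAX_TS - ts_millis:013d}"


class SortedTable:
    """Toy table that keeps rows ordered by row key, as a key-ordered store would."""

    def __init__(self):
        self.keys = []
        self.values = {}

    def put(self, ts_millis, value):
        key = make_row_key(ts_millis)
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def top_n_latest(self, n):
        # The first n keys in sort order are the n most recent rows.
        return [self.values[k] for k in self.keys[:n]]


events = SortedTable()
for ts, name in [(1000, "a"), (3000, "c"), (2000, "b"), (4000, "d")]:
    events.put(ts, name)
latest_two = events.top_n_latest(2)
```

Because the data is already stored in the required order, the top N rows are simply the first N rows of a scan; no separate sort step is needed.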

2 Limitations of HBase:

It can only do simple key-value queries; complex SQL-style statistics are not supported.
Fast queries are possible only on the row key.
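The difference between a row-key lookup and a query on any other column can be sketched as follows. The data set and the helpers `get_by_row_key` and `find_by_column` are hypothetical; the point is only that a lookup by key can use the sort order, while a filter on another column has to examine every row.

```python
import bisect

# Hypothetical data: 1000 rows sorted by row key, each with one "city" column.
rows = [(f"user{i:04d}", {"city": "city%d" % (i % 7)}) for i in range(1000)]
rows.sort(key=lambda r: r[0])
row_keys = [k for k, _ in rows]


def get_by_row_key(key):
    # Binary search over the sorted row keys.
    i = bisect.bisect_left(row_keys, key)
    if i < len(row_keys) and row_keys[i] == key:
        return rows[i][1]
    return None


def find_by_column(column, wanted):
    # No index on non-key columns: every row is examined.
    examined = 0
    matches = []
    for key, cols in rows:
        examined += 1
        if cols.get(column) == wanted:
            matches.append(key)
    return matches, examined


hit = get_by_row_key("user0042")
matches, examined = find_by_column("city", "city0")
```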

3 Row-oriented storage in traditional databases

In data analysis, we often use one column as the query condition, and the results usually contain only a few columns rather than all of them.

The I/O performance of a row database is poor in this case.

Take Oracle as an example. It stores a table in a large data file.

The data file is divided into a number of blocks, and rows are placed in each block.
Rows are stored one after another, packed together, until the block is full; some space is, of course, reserved for future updates.

The disadvantages of this structure are:

When we need only one column (for example, the column marked in red), we cannot read just that part of the data. The entire block has to be read into memory, and the values of that column are then picked out of it.

In other words, to read the data of a few columns in a table, the entire table has to be read first.

If those columns hold very little data, say 100 MB out of a 1 TB table, then reading 1 TB into memory to get at 100 MB is clearly not cost-effective.
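The read amplification of a row layout can be sketched in a few lines of Python. The block size, the sample rows, and the helper names are assumptions made for illustration; real database blocks hold bytes, not Python tuples.

```python
BLOCK_SIZE = 4  # hypothetical: rows per block


def pack_rows_into_blocks(rows, block_size=BLOCK_SIZE):
    # Rows are stored whole, one after another, block by block.
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def read_column(blocks, column_index):
    values, fields_read = [], 0
    for block in blocks:               # every block must be loaded
        for row in block:
            fields_read += len(row)    # the whole row comes along with the wanted field
            values.append(row[column_index])
    return values, fields_read


rows = [(i, f"name{i}", f"addr{i}", i * 10) for i in range(10)]
blocks = pack_rows_into_blocks(rows)
amounts, fields_read = read_column(blocks, 3)
amplification = fields_read / len(amounts)
```

With four fields per row, reading one column touches four times as many fields as are actually needed; the wider the table, the worse the ratio.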

3.1 B+ tree index

The data access technique used in Oracle is primarily the B-tree index:

Starting from the root and following the branch nodes, you reach a leaf node, which records the key value and the location of the corresponding row.

Operations on a B-tree:

B-tree insertion: split nodes

B-tree deletion: merge nodes
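The split and merge operations can be sketched for a single leaf node. This is a simplified illustration in the B+ tree style, where the separator key is copied up to the parent; `MAX_KEYS` and both helper functions are hypothetical, and a full tree (parent updates, rebalancing, minimum fill checks) is left out.

```python
import bisect

MAX_KEYS = 4  # hypothetical node capacity


def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf, splitting it when it overflows.

    Returns (left, separator, right); separator and right are None if no split occurred.
    """
    bisect.insort(leaf, key)
    if len(leaf) <= MAX_KEYS:
        return leaf, None, None
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    # The separator is copied up to the parent and also stays in the right leaf.
    return left, right[0], right


def merge_leaves(left, right):
    # After deletions a leaf may become under-full; it is then merged with a sibling.
    return left + right


left, separator, right = insert_into_leaf([10, 20, 30, 40], 25)
merged = merge_leaves(left, right)
```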

4 Column storage

Data from the same column is packed together, for example in the same blocks. To read a column, only the relevant files or blocks need to be read into memory, and the whole column is read out at once, so far less I/O is needed.
Values in the same column have a similar format, so they compress very well. This saves both storage space and I/O, because compressed data means less data to read.
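Both points can be shown with a small Python sketch: rows are transposed into per-column lists, and one column is then compressed with run-length encoding. Run-length encoding is used here only as a simple stand-in for whatever compression a real column store applies; the sample rows and function names are hypothetical.

```python
def to_columns(rows):
    # Transpose row tuples into one list per column.
    return [list(col) for col in zip(*rows)]


def run_length_encode(values):
    # Collapse runs of equal neighbouring values into [value, count] pairs.
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded


rows = [(1, "CN", "ok"), (2, "CN", "ok"), (3, "CN", "fail"),
        (4, "US", "ok"), (5, "US", "ok"), (6, "US", "ok")]
ids, countries, statuses = to_columns(rows)
encoded_countries = run_length_encode(countries)
encoded_statuses = run_length_encode(statuses)
```

Reading `countries` touches only that list, and its six values shrink to two pairs. In a row layout the same values are interleaved with ids and statuses, so no such runs exist.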

A row database is suitable for OLTP, whereas a column database is not suitable for OLTP.

4.1 BigTable's LSM (Log-Structured Merge) index

In HBase, the log is the data and the data is the log; the two are one and the same.

Why? Because an update in HBase inserts a row, and a delete also inserts a row, marked with a delete tag. Isn't that exactly what a log is?

In HBase there is the MemStore (in memory) and the store file (on disk). Each MemStore and each store file is in effect a sorted, B+ tree-like structure attached to a column family (a bit like an index-organized table in Oracle, where data and index are integrated). That is, the column family sits underneath and the B+ tree on top of it. A query first searches the tree in the MemStore and, if the data is not found there, goes on to search the store files.
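The read path described above, memory first and then the store files, can be sketched as follows. This is a simplified model, not HBase code: plain dictionaries stand in for the sorted structures, and the class `ColumnFamilyStore` and its `flush_threshold` are hypothetical. It also assumes that newer store files take precedence over older ones.

```python
class ColumnFamilyStore:
    """Toy store for one column family: an in-memory store plus flushed store files."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # hypothetical limit, in rows
        self.memstore = {}
        self.store_files = []                   # oldest first, newest last

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            # Flush the in-memory batch as one sorted, immutable store file.
            self.store_files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        # 1. Look in memory first.
        if key in self.memstore:
            return self.memstore[key], "memstore"
        # 2. Then search the store files, newest to oldest.
        for store_file in reversed(self.store_files):
            if key in store_file:
                return store_file[key], "store file"
        return None, "not found"


store = ColumnFamilyStore(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)   # reaches the threshold: "a" and "b" are flushed to a store file
store.put("a", 3)   # newer version of "a" stays in memory
from_memory = store.get("a")
from_file = store.get("b")
missing = store.get("z")
```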

If the data for a row is scattered across several column families, how is the row found? Several B+ trees have to be searched, which is less efficient. So try to keep each inserted row sparse across column families: only one column family has values, and the other column families have none.
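The cost of spreading a row over several column families can be sketched with one store per family. The class `Table`, the family names, and the counter `stores_with_data` are hypothetical; the sketch only counts how many per-family stores contribute to a single row read.

```python
class Table:
    """Toy table with one independent store per column family."""

    def __init__(self, families):
        self.stores = {family: {} for family in families}

    def put(self, row_key, family, column, value):
        self.stores[family].setdefault(row_key, {})[column] = value

    def get_row(self, row_key):
        result, stores_with_data = {}, 0
        for family, store in self.stores.items():
            if row_key in store:
                # Each family holding part of the row needs its own lookup.
                stores_with_data += 1
                for column, value in store[row_key].items():
                    result[f"{family}:{column}"] = value
        return result, stores_with_data


table = Table(["info", "stats"])
table.put("r1", "info", "name", "x")
table.put("r1", "stats", "clicks", 5)   # r1 is spread over two families
table.put("r2", "info", "name", "y")    # r2 lives in a single family
row1, stores1 = table.get_row("r1")
row2, stores2 = table.get_row("r2")
```

Row `r1` has to be assembled from two stores, while `r2` is served from one, which is the layout the advice above aims for.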

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.