HBase vs. Oracle comparison (column and row database)
1 Main differences
- HBase suits workloads with a large volume of inserts alongside reads.
- The bottleneck of HBase is hard drive transfer speed, whereas Oracle's bottleneck is hard drive seek time.
HBase essentially has only one operation: insert. An update inserts a row with a new timestamp, and a delete inserts a row carrying a delete marker.
Writes are collected as a batch in memory and then written to disk in bulk, so write speed depends mainly on the disk's transfer rate.
Oracle is different: it frequently reads and writes at random locations, so the drive head must keep seeking for data, and the bottleneck is disk seek time.
- HBase is well suited to scenarios that look up the top N records ordered by time.
- Different index structures lead to different behavior.
- Oracle can handle both OLTP and OLAP, but it is not appropriate in some extreme cases (when the load is very large).
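The insert-only model described above can be sketched as follows. This is a minimal illustration, not HBase's actual code: the class `AppendOnlyTable`, the `TOMBSTONE` marker and the integer clock are all hypothetical names chosen for the example.

```python
# Hypothetical delete marker; a real store would use its own marker type.
TOMBSTONE = object()


class AppendOnlyTable:
    """Toy table in which put, update and delete are all appends."""

    def __init__(self):
        self.log = []    # list of (row_key, timestamp, value)
        self.clock = 0   # stand-in for a real timestamp

    def _append(self, row_key, value):
        self.clock += 1
        self.log.append((row_key, self.clock, value))

    def put(self, row_key, value):
        # An update is just another insert with a newer timestamp.
        self._append(row_key, value)

    def delete(self, row_key):
        # A delete is an insert that carries the delete marker.
        self._append(row_key, TOMBSTONE)

    def get(self, row_key):
        # Return the newest version, unless it is a delete marker.
        latest = None
        for key, ts, value in self.log:
            if key == row_key and (latest is None or ts > latest[0]):
                latest = (ts, value)
        if latest is None or latest[1] is TOMBSTONE:
            return None
        return latest[1]


table = AppendOnlyTable()
table.put("row1", "v1")
table.put("row1", "v2")   # update: appended, nothing overwritten
table.put("row2", "x")
table.delete("row2")      # delete: appended as a marker
print(table.get("row1"), table.get("row2"), len(table.log))
```

Nothing is ever modified in place, which is why the writes can be batched and flushed sequentially.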
2 Limitations of HBase:
- It supports only simple key-value queries; complex SQL-style statistics are not supported.
- Fast queries are possible only on the row key.
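To illustrate why only row-key queries are fast, here is a small sketch that uses a sorted Python list as a stand-in for a store sorted by row key. The key format `user#date` and the function names are assumptions made for this example only.

```python
import bisect

# Rows kept sorted by row key (hypothetical "user#date" key format).
rows = sorted(
    [
        ("user1#2024-01-03", {"city": "Paris"}),
        ("user1#2024-01-01", {"city": "Rome"}),
        ("user2#2024-01-02", {"city": "Paris"}),
    ],
    key=lambda row: row[0],
)
keys = [key for key, _ in rows]


def get_by_key(row_key):
    """Point lookup on the row key: binary search over sorted keys."""
    i = bisect.bisect_left(keys, row_key)
    if i < len(keys) and keys[i] == row_key:
        return rows[i][1]
    return None


def scan_prefix(prefix):
    """Range scan: seek to the first matching key, then read sequentially."""
    i = bisect.bisect_left(keys, prefix)
    found = []
    while i < len(keys) and keys[i].startswith(prefix):
        found.append(keys[i])
        i += 1
    return found


def find_by_city(city):
    """Query on a non-key column: every row has to be examined."""
    return [key for key, columns in rows if columns["city"] == city]


print(get_by_key("user2#2024-01-02"))
print(scan_prefix("user1#"))
print(find_by_city("Paris"))
```

The first two functions touch only the rows they return, while `find_by_city` has to walk the whole table.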
3 Row-oriented storage in traditional databases
In data analysis, we often use one column as the query condition, and the result usually contains only some of the columns, not all of them.
The I/O performance of a row database is poor in this case.
Oracle, for example, stores a table in a large data file:
- The data file is divided into blocks, and rows are placed in each block.
- Each row is stored contiguously, and rows are packed one after another until the block is full; some space is reserved for future updates.
The disadvantage of this structure is:
When we only need some columns (for example, the column marked in red), we cannot read just that part of the data. We have to read the entire block into memory and then extract those columns.
In other words, to read some columns of a table, we have to read entire rows first.
If those columns are a small share of the data, say 100 MB out of 1 TB, then reading 1 TB into memory to get 100 MB is clearly not cost-effective.
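The cost can be worked through with a little arithmetic. The sketch below is only illustrative: the row size, block size and column width are assumed numbers, and decimal units (1 TB = 10^12 bytes) are used for the 1 TB versus 100 MB case from the text.

```python
def row_store_bytes_read(num_rows, row_bytes, block_bytes):
    """Bytes read when every block holding the rows must be loaded."""
    rows_per_block = block_bytes // row_bytes
    num_blocks = -(-num_rows // rows_per_block)  # ceiling division
    return num_blocks * block_bytes


def useful_bytes(num_rows, column_bytes):
    """Bytes that the query actually needs from one column."""
    return num_rows * column_bytes


# Assumed example: 1,000,000 rows of 1,000 bytes in 8,000-byte blocks,
# while the query needs a single 4-byte column.
read = row_store_bytes_read(1_000_000, 1_000, 8_000)
needed = useful_bytes(1_000_000, 4)
print(read, needed, read / needed)

# The case from the text: 100 MB of wanted columns inside a 1 TB table.
TB = 10**12
MB = 10**6
amplification = (1 * TB) // (100 * MB)
print(amplification)
```

Under these assumptions the row layout reads hundreds to thousands of times more data than the query needs.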
3.1 B+ tree index
The data access technique used in Oracle is primarily the B-tree index:
Starting from the root and descending through the branch nodes, you reach a leaf node, which records the location of the row for each key value.
Operations on a B-tree:
B-tree insertion: split nodes
B-tree deletion: merge nodes
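As an illustration of how insertion splits nodes, here is a minimal textbook B-tree (not Oracle's implementation, and not a full B+ tree). The class names and the minimum degree `t` are assumptions for the sketch; deletion with node merging is omitted to keep it short.

```python
import bisect


class Node:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


class BTree:
    def __init__(self, t=2):
        self.t = t          # minimum degree: a node holds at most 2t - 1 keys
        self.root = Node()

    def _split_child(self, parent, i):
        """Split the full child parent.children[i] around its median key."""
        t = self.t
        full = parent.children[i]
        right = Node(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]
        full.keys = full.keys[:t - 1]
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)          # median moves up
        parent.children.insert(i + 1, right)   # new sibling on the right

    def insert(self, key):
        if len(self.root.keys) == 2 * self.t - 1:
            # A full root is split first, which makes the tree one level taller.
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_nonfull(self.root, key)

    def _insert_nonfull(self, node, key):
        if node.leaf:
            bisect.insort(node.keys, key)
            return
        i = bisect.bisect_right(node.keys, key)
        if len(node.children[i].keys) == 2 * self.t - 1:
            self._split_child(node, i)
            if key > node.keys[i]:
                i += 1
        self._insert_nonfull(node.children[i], key)

    def in_order(self, node=None):
        node = self.root if node is None else node
        if node.leaf:
            return list(node.keys)
        out = []
        for i, key in enumerate(node.keys):
            out += self.in_order(node.children[i])
            out.append(key)
        out += self.in_order(node.children[-1])
        return out


tree = BTree(t=2)
for k in [10, 20, 5, 6, 12, 30, 7, 17]:
    tree.insert(k)
print(tree.root.keys, tree.in_order())
```

Each split touches a different block, which is part of why updates to such an index involve random I/O.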
4 Column-oriented storage
- Data from the same column is stored together, for example packed into the same blocks. To read a column, we only need to read the relevant files or blocks into memory and the whole column is available, so there is far less I/O.
- Values in the same column have a similar format, so they compress very well. This saves both storage space and I/O, because compressed data means less data to read.
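The two points above can be sketched with a tiny in-memory table. Run-length encoding is used here only as one simple example of compression that benefits from similar adjacent values; the text does not say which scheme a given column store uses, and the sample data and function names are made up for illustration.

```python
from itertools import groupby

# Row layout: each tuple is one row (id, country, status).
rows = [
    (1, "CN", "active"),
    (2, "CN", "active"),
    (3, "CN", "inactive"),
    (4, "US", "inactive"),
    (5, "US", "inactive"),
]

# Column layout: one list per column, so a column can be read on its own.
columns = [list(column) for column in zip(*rows)]


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


country = columns[1]            # reading one column touches no other data
encoded = rle_encode(country)
print(encoded)
print(rle_decode(encoded) == country)
```

In the row layout the same values are interleaved with ids and statuses, so runs like these never form.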
A row database is suitable for OLTP, whereas a column database is not suitable for OLTP.
4.1 BigTable's LSM (Log-Structured Merge) index
In HBase, the log is the data and the data is the log; the two are one and the same.
Why? Because an update in HBase inserts a row, and a delete also inserts a row that carries a delete marker. Isn't that exactly a log?
HBase has the MemStore (in memory) and StoreFiles (on disk). Each MemStore and each StoreFile is effectively a sorted, B+ tree-like index attached to a column family (a bit like Oracle's index-organized table, where data and index are integrated): the column family sits below and the B+ tree above. On a query, HBase first searches the B+ tree in the MemStore and, if the key is not found there, searches the StoreFiles.
If the data for a row is scattered across several column families, how is the row found? Several B+ trees have to be searched, which is less efficient. So try to keep each inserted row sparse across column families: only one column family has values and the others have none.
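The MemStore-then-StoreFile lookup described in this section can be sketched as follows for a single column family. This is a simplified model under stated assumptions: a dict stands in for the in-memory tree, sorted lists with binary search stand in for the on-disk trees, and `ColumnFamilyStore` and `flush_threshold` are hypothetical names, not HBase APIs.

```python
import bisect


class ColumnFamilyStore:
    """Toy LSM store for one column family."""

    def __init__(self, flush_threshold=2):
        self.memstore = {}        # newest writes, held in memory
        self.store_files = []     # immutable (keys, values) pairs, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memstore out as one sorted, immutable store file."""
        if not self.memstore:
            return
        items = sorted(self.memstore.items())
        keys = [k for k, _ in items]
        values = [v for _, v in items]
        self.store_files.append((keys, values))
        self.memstore = {}

    def get(self, key):
        # 1. Look in memory first.
        if key in self.memstore:
            return self.memstore[key]
        # 2. Then search the store files, newest first.
        for keys, values in reversed(self.store_files):
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


store = ColumnFamilyStore(flush_threshold=2)
store.put("a", "1")
store.put("b", "2")   # threshold reached: flushed to a store file
store.put("a", "3")   # newer version stays in the memstore
print(store.get("a"), store.get("b"), store.get("z"))
```

A row that has values in several column families would need one such lookup per family, which is the inefficiency noted above.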