HBase
HBase is a distributed, column-oriented, open source database. The technology derives from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Chang et al.
Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.
HBase started as a subproject of the Apache Hadoop project and is now a top-level Apache project.
HBase differs from general relational databases in two ways: it is suited to storing unstructured data, and it is column-based rather than row-based.
Bigtable ideas:
Take a table that stores student information as an example:
Following the Bigtable idea, it becomes one big table with three columns: student ID (the key), attribute (name, age, and so on), and value.
In principle, any table can be represented as a big table with three columns:
Row key: the ID of the object,
Attribute,
Value
Bigtable query: fast key-value lookup
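The three-column idea above can be sketched in a few lines of Python. This is only an illustration: the student records, the `to_triples` helper, and the `lookup` helper are made up for this example and are not part of HBase.

```python
# Hypothetical student records in a conventional, wide layout.
students = {
    "s001": {"name": "Alice", "age": "20"},
    "s002": {"name": "Bob"},  # no age stored, so no triple is created for it
}

def to_triples(table):
    """Flatten a wide table into (row key, attribute, value) triples."""
    return [
        (row_key, attribute, value)
        for row_key, attributes in sorted(table.items())
        for attribute, value in sorted(attributes.items())
    ]

def lookup(triples, row_key, attribute):
    """Key-value style query: the key is the (row key, attribute) pair."""
    index = {(r, a): v for r, a, v in triples}
    return index.get((row_key, attribute))

triples = to_triples(students)
name = lookup(triples, "s001", "name")
```

Note that a missing attribute simply produces no triple, which is why this layout copes well with sparse data.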
HBase logical model:
Data is stored in the form of tables;
A table consists of rows and columns. Each column belongs to a column family. The storage unit identified by a row and a column is called an element (a cell in HBase terminology);
Each element stores multiple versions of the same data, which are identified by timestamps;
HBase can hold multiple tables;
Column families must be defined in advance, but the columns within a column family need not be. A column name is written as the column family and the qualifier separated by a delimiter;
The same row key can appear in many stored entries, one for each column and version of that row;
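The logical model can be pictured as nested maps: row key, then column, then timestamp, then value. The sketch below is a toy in-memory model under that assumption; `TinyTable` and its methods are hypothetical names and do not correspond to the HBase client API.

```python
class TinyTable:
    """Toy model: row key -> (family, qualifier) -> timestamp -> value."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed up front
        self.rows = {}

    def put(self, row_key, column, value, timestamp):
        family, qualifier = column.split(":", 1)  # "family:qualifier"
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = self.rows.setdefault(row_key, {}).setdefault((family, qualifier), {})
        versions[timestamp] = value  # each timestamp is a separate version

    def get(self, row_key, column):
        """Return the newest version of one element, or None if absent."""
        family, qualifier = column.split(":", 1)
        versions = self.rows.get(row_key, {}).get((family, qualifier), {})
        return versions[max(versions)] if versions else None

table = TinyTable(families=["info"])
table.put("s001", "info:name", b"Alice", timestamp=1)
table.put("s001", "info:name", b"Alicia", timestamp=2)  # second version
table.put("s001", "info:email", b"alice@example.com", timestamp=3)  # new qualifier, no schema change
latest_name = table.get("s001", "info:name")
```

Writing to a family that was not declared (for example `grades:math` here) raises an error, while a new qualifier inside a declared family is accepted without any prior definition.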
HBase works around the fact that files in HDFS cannot be modified in place:
Deletions are recorded as markers rather than applied in place;
Writes and modifications (appends) are first buffered and reorganized in memory;
After data has accumulated in memory for a certain period, it is written to disk as a file in one block;
At intervals, the data is reorganized and merged: small files are merged into larger ones, and data marked as deleted is physically removed at that point.
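The mechanism above (buffer in memory, flush immutable files, mark deletions, merge later) can be sketched as follows. This is a simplified model, not HBase's implementation: `TinyStore`, `TOMBSTONE`, and the count-based `flush_threshold` are assumptions made for the example, and real flushes are triggered by size and time rather than by entry count.

```python
TOMBSTONE = object()  # delete marker written instead of modifying a file

class TinyStore:
    def __init__(self, flush_threshold=2):
        self.memstore = {}   # in-memory buffer of recent writes
        self.files = []      # immutable flushed "files", oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)  # mark deletion, nothing is rewritten

    def flush(self):
        if self.memstore:
            self.files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        # Newest data wins: check memory first, then files from newest to oldest.
        for layer in [self.memstore] + self.files[::-1]:
            if key in layer:
                value = layer[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        """Merge all files into one and drop deleted entries for good."""
        merged = {}
        for f in self.files:  # oldest first, so newer values overwrite
            merged.update(f)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.files = [dict(sorted(live.items()))] if live else []

store = TinyStore(flush_threshold=2)
store.put("a", "1")
store.put("b", "2")   # threshold reached: first file is flushed
store.delete("a")
store.put("c", "3")   # second file is flushed, containing the delete marker
files_before = len(store.files)
store.compact()
```

Before compaction the old value of `a` still sits in the first file and is only hidden by the marker in the second; after compaction a single file remains and the deleted entry is gone.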
Time-oriented queries: suitable for social networking websites and similar application scenarios
Row key:
The row key is the unique identifier of a data row in the table and serves as the primary key for record retrieval;
There are only three ways to access the rows in a table:
Access through a single row key
Range access over a given span of row keys
Full table scan
The row key can be any string of at most 64 KB, and rows are stored in lexicographic order of their row keys;
Rows that are frequently read together need carefully designed row keys so that they are stored next to each other;
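The three access paths and the effect of key design can be illustrated with a sorted list of keys. This is a minimal sketch: `SortedRows` is a hypothetical helper, plain Python strings stand in for row keys, and the `user#date` key layout is just one possible design.

```python
import bisect

class SortedRows:
    def __init__(self, rows):
        self.rows = dict(rows)
        self.keys = sorted(self.rows)  # lexicographic order of row keys

    def get(self, row_key):
        """Access through a single row key."""
        return self.rows.get(row_key)

    def scan(self, start=None, stop=None):
        """Range access over [start, stop); with no bounds, a full table scan."""
        lo = 0 if start is None else bisect.bisect_left(self.keys, start)
        hi = len(self.keys) if stop is None else bisect.bisect_left(self.keys, stop)
        return [(k, self.rows[k]) for k in self.keys[lo:hi]]

events = SortedRows({
    "u2#2024-01-01": "login",
    "u1#2024-01-02": "post",
    "u1#2024-01-01": "login",
})

one_row = events.get("u1#2024-01-02")
# "$" is the character right after "#", so this range covers every key starting with "u1#".
u1_rows = events.scan(start="u1#", stop="u1$")
all_rows = events.scan()
```

Because the user ID is the key prefix, all rows for one user are adjacent in sort order and can be read with a single range scan instead of a full table scan.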
Column families and columns:
A column is written as <column family>:<qualifier>
The column family is defined in advance, and columns within a column family can be added at any time.
HBase stores data on disk by column family. This columnar design is well suited to data analysis;
It is recommended that the elements in a column family share the same read/write pattern and similar characteristics (for example, all long strings) to improve performance;
Column-oriented storage: within the same row key, columns of the same family are stored together.
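As a rough picture of storage by column family, the sketch below groups elements into one sorted store per family. The sample data and the `group_by_family` helper are invented for illustration, and the real on-disk file format is considerably more involved.

```python
from collections import defaultdict

# (row key, column family, qualifier, value) entries in arrival order.
cells = [
    ("s001", "info", "name", "Alice"),
    ("s001", "grades", "math", "90"),
    ("s002", "info", "name", "Bob"),
    ("s001", "info", "age", "20"),
]

def group_by_family(cells):
    """Keep one store per column family, sorted by row key and then qualifier."""
    stores = defaultdict(list)
    for row_key, family, qualifier, value in cells:
        stores[family].append((row_key, qualifier, value))
    return {family: sorted(entries) for family, entries in stores.items()}

stores = group_by_family(cells)
info_store = stores["info"]
```

A query that only needs the `info` family reads `info_store` and never touches the `grades` data, which is the benefit for analytical workloads mentioned above.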
Timestamp:
The timestamp attached to each data operation can be generated automatically by the system or assigned explicitly by the user;
HBase supports two ways of recycling old data versions:
1. Keep only the most recent N versions of each data unit, where N is configurable.
2. Keep only the versions written within a specified length of time.
Common time-based client queries are "the latest data as of a certain point in time" or "give me all versions of the data";
An element is identified by the row key, the column (family and qualifier), and a timestamp;
Elements are stored as uninterpreted bytes and have no type;
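The two recycling rules and the two common time queries can be sketched on a single element, modelled here as a map from timestamp to bytes. The function names, the integer timestamps, and the `ttl`/`now` parameters are assumptions for this example rather than HBase settings.

```python
def prune_versions(versions, max_versions=None, ttl=None, now=None):
    """Apply the recycling rules to one element (timestamp -> bytes)."""
    kept = sorted(versions.items(), reverse=True)  # newest first
    if ttl is not None:
        kept = [(ts, v) for ts, v in kept if now - ts <= ttl]  # rule 2: age limit
    if max_versions is not None:
        kept = kept[:max_versions]  # rule 1: latest N versions
    return dict(kept)

def read_as_of(versions, timestamp):
    """Latest data as of a certain point in time."""
    older = [ts for ts in versions if ts <= timestamp]
    return versions[max(older)] if older else None

element = {100: b"v1", 200: b"v2", 300: b"v3"}

latest_two = prune_versions(element, max_versions=2)
recent_only = prune_versions(element, ttl=100, now=320)
as_of_250 = read_as_of(element, 250)
all_versions = sorted(element.items())  # "give me all versions of the data"

# Values are untyped bytes; interpreting them is up to the client.
age = int.from_bytes(b"\x00\x00\x00\x14", "big")
```

The last line shows the consequence of untyped storage: the same four bytes could be read as an integer, as text, or as anything else, and only the application knows which.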
HBase physical model:
[Figure: hbase.png]
This article comes from the "Linux _ ant" blog; please keep this source link: http://onlyoulinux.blog.51cto.com/7941460/1548558
Hadoop learning notes: HBase theory