I. HBase (NoSQL) data model
1.1 Table: the unit in which HBase stores and manages data.
1.2 Row key: similar to the primary key in MySQL. Every HBase table comes with a row key, so it does not need to be declared when the table is created.
1.3 Column family: a collection of columns.
A table contains many rows, and each row key identifies one record. A column family is similar to a column in MySQL, except that it is a collection of columns.
Column families must be specified when the table is defined, while columns are added dynamically as records are inserted.
When the data of an HBase table is written to disk, each column family is stored separately in its own files.
A row of a table in HBase is represented differently from a row in a relational database.
In a relational database, each column in a row can hold only one value, for example:
UserId UserName
1      jchubby
In HBase, a single column in a row can hold multiple values, for example:
UserId UserName
1      jchubby
       Looky
The timestamp column is omitted above. When this row is read from HBase, the result should be the same as the one read from the relational database.
The timestamp column identifies the version of the column data. When no timestamp is specified, the most recent version of the column data is returned by default.
1.4 All stored data is a byte array.
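The versioning behaviour described in this section can be sketched as a toy in-memory model. This is only an illustration, not HBase's API: the class VersionedRow, the column name info:UserName and the timestamps 1 and 2 are all assumptions made up for the example. jchubby is given the newer timestamp so that a read without a timestamp matches the relational row.

```python
from typing import Dict, Optional


class VersionedRow:
    """Toy model of one HBase row: column -> {timestamp: value as bytes}."""

    def __init__(self) -> None:
        self._cells: Dict[str, Dict[int, bytes]] = {}

    def put(self, column: str, value: bytes, timestamp: int) -> None:
        # A write adds a new version; older versions are kept.
        self._cells.setdefault(column, {})[timestamp] = value

    def get(self, column: str, timestamp: Optional[int] = None) -> Optional[bytes]:
        versions = self._cells.get(column, {})
        if not versions:
            return None
        if timestamp is None:
            # No timestamp given: return the most recent version.
            return versions[max(versions)]
        return versions.get(timestamp)


row = VersionedRow()
# Hypothetical column name and timestamps; values are stored as byte arrays.
row.put("info:UserName", b"Looky", timestamp=1)
row.put("info:UserName", b"jchubby", timestamp=2)

latest = row.get("info:UserName")               # newest version
older = row.get("info:UserName", timestamp=1)   # explicit older version
print(latest, older)
```

Reading without a timestamp behaves like the single-valued relational column, while passing a timestamp reaches an older version of the same column.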
II. Physical model of HBase
2.1 HBase is a database that supports simple queries with second-level response times over massive data sets, for example 20 PB.
2.2 The records in an HBase table are split by row key into regions.
For example, a table with 10,000 rows might be divided into regions of 2,000 rows each, stored on different nodes. Each region records the start and end of its row key range, [startKey, endKey].
A region server (a separate physical machine) hosts many regions.
In this way, an operation on the table translates into parallel queries against multiple region servers.
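The 10,000-row example can be sketched in a few lines to show how a row key is mapped to the region whose key range contains it. This is a simplified model under stated assumptions: the key format row00000, the server names and the round-robin placement are invented for the example, and each range is treated as start-inclusive and end-exclusive.

```python
from bisect import bisect_right

TOTAL_ROWS = 10_000
ROWS_PER_REGION = 2_000


def make_key(n: int) -> str:
    # Zero-padded so that string order matches numeric order,
    # since row keys are compared as sorted byte arrays.
    return f"row{n:05d}"


# Each region covers [start_key, end_key) and lives on some region server.
regions = []
for i, start in enumerate(range(0, TOTAL_ROWS, ROWS_PER_REGION)):
    regions.append({
        "start_key": make_key(start),
        "end_key": make_key(start + ROWS_PER_REGION),
        "server": f"regionserver{i % 3}",  # made-up server names
    })

start_keys = [r["start_key"] for r in regions]


def find_region(row_key: str) -> dict:
    """Return the region whose key range contains row_key."""
    index = bisect_right(start_keys, row_key) - 1
    if index < 0 or row_key >= regions[index]["end_key"]:
        raise KeyError(row_key)
    return regions[index]


region = find_region(make_key(4321))
print(region["start_key"], region["end_key"], region["server"])
```

Because the start keys are sorted, a binary search is enough to find the owning region, which is why a lookup stays cheap even when a table has many regions.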
HBase has two special tables, -ROOT- and .META.
The .META. table records the start and end of each region. When .META. itself grows large, it is split into regions by the same rules, and those regions are recorded in the -ROOT- table.
When querying data, the client first finds the relevant .META. region in the -ROOT- table, then looks up the target region in that .META. region, and finally queries the data in that region on the node that actually hosts it.
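The two-step lookup through the catalog tables can be illustrated with a small sketch. It is a deliberately simplified model, not the real catalog layout: the region names, server names and split points are hypothetical, and each catalog is reduced to a sorted list of (start key, location) pairs.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# A catalog is a list of (start_key, location) entries sorted by start_key.
Catalog = List[Tuple[str, str]]


def locate(catalog: Catalog, row_key: str) -> str:
    """Return the location of the last entry whose start_key <= row_key."""
    starts = [start for start, _ in catalog]
    index = bisect_right(starts, row_key) - 1
    if index < 0:
        raise KeyError(row_key)
    return catalog[index][1]


# The root catalog maps key ranges to meta regions (names are made up).
root: Catalog = [("", "meta-region-A"), ("row05000", "meta-region-B")]

# Each meta region maps key ranges to the server hosting the user region.
meta: Dict[str, Catalog] = {
    "meta-region-A": [("", "regionserver1"), ("row02000", "regionserver2")],
    "meta-region-B": [("row05000", "regionserver3"), ("row08000", "regionserver1")],
}


def find_server(row_key: str) -> str:
    meta_region = locate(root, row_key)         # step 1: root -> meta region
    return locate(meta[meta_region], row_key)   # step 2: meta -> region server


server = find_server("row06123")
print(server)
```

Both steps use the same range lookup, which mirrors the text: the meta table is split by the same rules as ordinary tables, so one more level of the same index is enough to find it.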
III. System structure of HBase
3.1 HBase uses a master-slave architecture: HMaster is the master and HRegionServer is the slave.
IV. HBase pseudo-distributed installation
HBase is installed on top of Hadoop and ZooKeeper clusters.
During installation, make sure that the Hadoop and ZooKeeper clusters have been installed successfully and are running.
4.1 Unpack, rename, and set environment variables
Copy hbase-0.94.2-security.tar.gz to /home/hadoop.
Unpack hbase-0.94.2-security.tar.gz and rename the resulting directory:
#cd /home/hadoop
#tar -zxvf hbase-0.94.2-security.tar.gz
#mv hbase-0.94.2-security hbase
Modify the /etc/profile file:
#vi /etc/profile
Add:
export HBASE_HOME=/home/hadoop/hbase
Modify:
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HBASE_HOME/bin
Save and exit, then reload the profile:
#source /etc/profile
4.2 Modify $HBASE_HOME/conf/hbase-env.sh as follows:
export JAVA_HOME=/usr/java/jdk1.6.0_45
export HBASE_MANAGES_ZK=true
The first line sets the Java environment variable.
The second line tells HBase to start and use its own ZooKeeper instance on this machine.
4.2 Modify $HBASE_HOME/conf/hbase-site.xml as follows:
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
hbase.rootdir: the path where HBase stores its data on the HDFS file system
hbase.cluster.distributed: whether HBase runs in distributed mode
hbase.zookeeper.quorum: the node on which ZooKeeper runs
dfs.replication: the number of replicas
Note: the host and port in hbase.rootdir must match the host and port of fs.default.name in the Hadoop configuration file core-site.xml.
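The consistency rule in the note can be checked mechanically. The sketch below is one possible way to do it, assuming both files use the usual Hadoop layout of property elements inside a configuration root element; the core-site.xml content shown is a hypothetical example, and the helper names are made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_properties(xml_text: str) -> dict:
    """Parse Hadoop-style <property> elements into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }


def same_namenode(hbase_site_xml: str, core_site_xml: str) -> bool:
    """True if hbase.rootdir and fs.default.name share host and port."""
    rootdir = urlparse(read_properties(hbase_site_xml)["hbase.rootdir"])
    default_fs = urlparse(read_properties(core_site_xml)["fs.default.name"])
    return (rootdir.hostname, rootdir.port) == (default_fs.hostname, default_fs.port)


HBASE_SITE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
</configuration>"""

# Hypothetical core-site.xml content, assumed for this example.
CORE_SITE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>"""

consistent = same_namenode(HBASE_SITE, CORE_SITE)
print(consistent)
```

In practice the two strings would be read from $HBASE_HOME/conf/hbase-site.xml and from Hadoop's core-site.xml; a False result usually explains an HMaster that exits shortly after start-up.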
4.3 (Optional) Set the content of the regionservers file to master. This file records the hostname of each region server node. Because this is a pseudo-distributed installation there is only one entry, and either localhost or the hostname can be used.
4.4 Start HBase by running start-hbase.sh in the bin directory.
Important: before starting HBase, make sure that Hadoop is running normally and that files can be written to it.
4.5 Verify that the installation succeeded:
(1) Run jps. Three new Java processes should appear: HMaster, HRegionServer, and HQuorumPeer.
(2) Open http://master:16010 in a browser to reach a web management page similar to Hadoop's. If that port does not respond, try 60010, which was the default port of the master web UI before HBase 1.0.
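The jps check in step (1) can also be scripted. The sketch below only parses text in the usual "pid name" form that jps prints; the sample output, including the process IDs and the Hadoop daemons listed, is made up for illustration.

```python
EXPECTED = {"HMaster", "HRegionServer", "HQuorumPeer"}


def missing_processes(jps_output: str) -> set:
    """Return the expected HBase processes that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # Each jps line has the form "<pid> <main class name>".
            running.add(parts[1])
    return EXPECTED - running


# Made-up sample output, assumed for this example.
sample = """\
2817 NameNode
2954 DataNode
3401 HQuorumPeer
3460 HMaster
3688 HRegionServer
3790 Jps
"""

missing = missing_processes(sample)
print(sorted(missing))
```

An empty result means all three processes are present; any name that is returned points to the daemon whose log under $HBASE_HOME/logs is worth reading first.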
HBase basics and pseudo-distributed installation and configuration