HBase Environment Configuration and Usage


--------------------------------------------------------------------------------------
[Copyright: this is the author's original article; please credit the source when reproducing it]
Article source: http://blog.csdn.net/sdksdk0/article/details/51680296
Author ID: sdksdk0

-----------------------------------------------------------------------------------


I. Introduction to HBase

1.1 Introduction

HBase is an open-source clone of Google's Bigtable. It is built on HDFS and provides a highly reliable, high-performance, column-oriented, scalable database system with real-time reads and writes.
It sits between NoSQL and RDBMS: data can be retrieved only by primary key (the row key) or by a range of row keys, and only single-row transactions are supported (complex operations such as multi-table joins can be implemented with Hive). It is mainly used to store unstructured and semi-structured loose data. Like Hadoop, HBase scales out: it increases compute and storage capacity by adding more inexpensive commodity servers.

HBase stores data in tables. A table is made up of rows and columns, and the columns are grouped into column families.
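To make this access model concrete, here is a minimal in-memory sketch in plain Java (JDK only). It is not the HBase client API: the class ToyTable and its methods are hypothetical, and values are Strings rather than raw bytes. Rows sit in a map sorted by row key, each row maps "family:qualifier" to a value, and the only two queries are an exact row-key lookup and a row-key range scan (start inclusive, stop exclusive, as in an HBase scan).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model of an HBase table (NOT the HBase client API).
// Rows are kept sorted by row key; each row maps "family:qualifier" to a value.
class ToyTable {
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Write one cell; the row is created on its first write.
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Point lookup by exact row key; returns a copy (empty if the row is absent).
    NavigableMap<String, String> get(String rowKey) {
        NavigableMap<String, String> row = rows.get(rowKey);
        return row == null ? new TreeMap<String, String>() : new TreeMap<String, String>(row);
    }

    // Range lookup by row key: startRow inclusive, stopRow exclusive.
    List<String> scanRowKeys(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        ToyTable t = new ToyTable();
        t.put("rk001", "base_info:name", "a");
        t.put("rk002", "base_info:name", "b");
        t.put("rk003", "extra_info:hobby", "c");
        System.out.println(t.get("rk002"));                  // {base_info:name=b}
        System.out.println(t.scanRowKeys("rk001", "rk003")); // [rk001, rk002]
    }
}
```

Anything that is not a row-key lookup or a row-key range would have to walk every row, which is why the choice of row key matters so much in HBase.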

1.2 Comparison of HBase with traditional databases

First, look at a table in a traditional relational database:


Then compare it with an HBase table: the structure of an HBase table differs greatly from that of a traditional relational database.



We can see several differences:

HBase does not support SQL statements; it is a NoSQL database. If you have not learned NoSQL or Ruby (the HBase shell is built on JRuby), you can use the help command.


1. No fields (columns) are specified when a table is defined.
2. Only the column family names are needed when a table is defined, and there is no fixed limit on the number of column families.
3. Every row has one fixed field, the row key, which is unique.
4. When a value is modified, the original value is retained, so each value can keep multiple versions. Queries return the most recent version by default, and by default only one version is kept.

1.3 Important concepts in HBase

Column family: every column in an HBase table belongs to a column family. Column families are part of the table's schema (columns are not) and must be defined before the table is used. Column names are prefixed with their column family; for example, courses:history and courses:math both belong to the courses family.
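The split between "declared family" and "free-form qualifier" can be sketched as follows. ColumnName is a hypothetical helper written for illustration (JDK only, not part of HBase): it splits "family:qualifier" at the first colon and accepts any qualifier, but only under a family that was declared in the schema.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of HBase): splits a column name "family:qualifier"
// and checks that the family was declared in the table schema. Qualifiers are
// never declared; any qualifier is accepted under a declared family.
class ColumnName {
    final String family;
    final String qualifier;

    private ColumnName(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    // Split at the first ':'; everything after it is the qualifier.
    static ColumnName parse(String column) {
        int i = column.indexOf(':');
        if (i <= 0) {
            throw new IllegalArgumentException("expected family:qualifier, got " + column);
        }
        return new ColumnName(column.substring(0, i), column.substring(i + 1));
    }

    // Reject writes to a family that is not part of the schema.
    static ColumnName validate(String column, Set<String> declaredFamilies) {
        ColumnName c = parse(column);
        if (!declaredFamilies.contains(c.family)) {
            throw new IllegalArgumentException("unknown column family: " + c.family);
        }
        return c;
    }

    public static void main(String[] args) {
        Set<String> schema = new HashSet<>(Arrays.asList("courses"));
        ColumnName c = validate("courses:history", schema);
        System.out.println(c.family + " / " + c.qualifier); // courses / history
    }
}
```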

Access control and disk and memory usage accounting are all done at the column-family level. In practice, column-family permissions help us manage different kinds of applications: some applications may be allowed to add new base data, some may read base data and create derived column families, and some may only browse data (and perhaps not even all of it, for privacy reasons).
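A minimal sketch of family-level permissions, assuming a simple in-memory grant table. FamilyAcl and its Perm values are hypothetical and are not HBase's security API; the sketch only shows that a permission check needs nothing finer than the pair (application, column family).

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (NOT HBase's security API): permissions are granted per
// (application, column family), so each kind of application can be limited to
// the families it needs.
class FamilyAcl {
    enum Perm { READ, WRITE, CREATE }

    private final Map<String, Map<String, Set<Perm>>> grants = new HashMap<>();

    void grant(String app, String family, Perm... perms) {
        Set<Perm> set = grants
                .computeIfAbsent(app, a -> new HashMap<>())
                .computeIfAbsent(family, f -> EnumSet.noneOf(Perm.class));
        for (Perm p : perms) {
            set.add(p);
        }
    }

    // No grant for this application and family means no access.
    boolean allowed(String app, String family, Perm perm) {
        return grants.getOrDefault(app, new HashMap<>())
                .getOrDefault(family, EnumSet.noneOf(Perm.class))
                .contains(perm);
    }

    public static void main(String[] args) {
        FamilyAcl acl = new FamilyAcl();
        acl.grant("loader", "base_info", Perm.READ, Perm.WRITE);
        acl.grant("browser", "base_info", Perm.READ);
        System.out.println(acl.allowed("browser", "base_info", Perm.WRITE)); // false
    }
}
```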

Timestamp: in HBase, a storage unit identified by a row and a column is called a cell. Each cell holds multiple versions of the same piece of data, indexed by timestamp. A timestamp is a 64-bit integer. It can be assigned by HBase automatically when the data is written, in which case it is the current system time in milliseconds, or it can be set explicitly by the client. An application that must avoid version conflicts has to generate its own unique timestamps. Within each cell, versions are sorted in reverse chronological order, so the newest data comes first.

To avoid the management burden (storage and indexing) of keeping too many versions, HBase provides two ways to reclaim old versions: keep only the last n versions of the data, or keep only the versions from a recent period (for example, the last seven days). Both can be set per column family.
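The two retention rules can be modeled for a single cell like this. VersionedCell is a hypothetical class (JDK only, not HBase code): versions are kept newest first, and a read returns at most maxVersions values, skipping anything older than ttlMillis. In real HBase, expired and excess versions are also physically removed during compaction, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the versions of ONE cell (not HBase code). Versions are
// kept newest first, and a read applies both retention rules: at most
// maxVersions versions, and nothing older than ttlMillis.
class VersionedCell {
    private final TreeMap<Long, String> versions = new TreeMap<Long, String>(Collections.reverseOrder());
    private final int maxVersions;
    private final long ttlMillis;

    VersionedCell(int maxVersions, long ttlMillis) {
        this.maxVersions = maxVersions;
        this.ttlMillis = ttlMillis;
    }

    // A write with an explicit timestamp (HBase would default to the current time in ms).
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Values that survive both rules at time 'now', newest first.
    List<String> visibleVersions(long now) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : versions.entrySet()) {
            // Entries are newest first: once either limit is hit, no later entry qualifies.
            if (out.size() == maxVersions || now - e.getKey() > ttlMillis) {
                break;
            }
            out.add(e.getValue());
        }
        return out;
    }

    // A default read returns only the newest surviving version (or null).
    String latest(long now) {
        List<String> v = visibleVersions(now);
        return v.isEmpty() ? null : v.get(0);
    }

    public static void main(String[] args) {
        VersionedCell name = new VersionedCell(3, 7L * 24 * 60 * 60 * 1000); // 3 versions, 7 days
        name.put(1000L, "angelabby");
        name.put(2000L, "yangying");
        name.put(3000L, "baobao");
        System.out.println(name.visibleVersions(5000L)); // [baobao, yangying, angelabby]
        System.out.println(name.latest(5000L));          // baobao
    }
}
```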

Cell: the unit uniquely identified by {row key, column (= <family> + <qualifier>), version}. Data in a cell has no type; it is all stored as raw bytes.
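The cell coordinate and the sort order described above (rows and columns in dictionary order, versions newest first) can be expressed as a comparator. CellKey is a hypothetical class for illustration; HBase itself compares raw bytes, and String.compareTo gives the same order only for ASCII keys, which is assumed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical cell coordinate {row key, column, version} (not an HBase class).
// ORDER sorts rows and columns in dictionary order and versions newest first.
// Assumes ASCII keys, where String.compareTo matches byte-wise ordering.
class CellKey {
    final String row;
    final String column; // "family:qualifier"
    final long timestamp;

    CellKey(String row, String column, long timestamp) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
    }

    static final Comparator<CellKey> ORDER =
            Comparator.comparing((CellKey k) -> k.row)
                    .thenComparing((CellKey k) -> k.column)
                    .thenComparing(Comparator.comparingLong((CellKey k) -> k.timestamp).reversed());

    @Override
    public String toString() {
        return row + "/" + column + "/" + timestamp;
    }

    public static void main(String[] args) {
        List<CellKey> cells = new ArrayList<>(Arrays.asList(
                new CellKey("rk2", "base_info:name", 5L),
                new CellKey("rk1", "base_info:name", 1L),
                new CellKey("rk1", "base_info:age", 1L),
                new CellKey("rk1", "base_info:name", 9L)));
        cells.sort(ORDER);
        System.out.println(cells);
        // [rk1/base_info:age/1, rk1/base_info:name/9, rk1/base_info:name/1, rk2/base_info:name/5]
    }
}
```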


II. HBase architecture




1. A table is divided by row into multiple regions, and each region is assigned to a specific RegionServer, which manages it.
2. Inside each region, the data is further divided by column family into multiple HStores.
3. The data in each HStore is persisted in several HFiles.
4. A region keeps growing as data is inserted, and it splits when it reaches a certain threshold.
5. As regions split, each RegionServer manages more and more regions.
6. The HMaster balances load according to the number of regions each RegionServer manages.
7. Data in a region has an in-memory cache, the MemStore, and reads check the MemStore first.
8. Because MemStore space is limited, its data is periodically flushed to StoreFiles; each flush creates a new StoreFile.
9. The number of StoreFiles keeps growing over time, so the RegionServer periodically merges (compacts) them.
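Points 1, 4 and 5 can be sketched as a sorted map from region start key to the rows in that region. RegionMap is hypothetical (JDK only): a row is routed to the region with the greatest start key not exceeding the row key, and a region splits at its middle row key once it exceeds maxRowsPerRegion rows. Real HBase decides splits by store size rather than row count, so the threshold here is only illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch (not HBase code). Each region covers the row keys from its
// start key up to the next region's start key. When a region holds more than
// maxRowsPerRegion rows it splits at its middle row key.
class RegionMap {
    private final TreeMap<String, TreeSet<String>> regions = new TreeMap<>(); // start key -> rows
    private final int maxRowsPerRegion;

    RegionMap(int maxRowsPerRegion) {
        this.maxRowsPerRegion = maxRowsPerRegion;
        regions.put("", new TreeSet<>()); // the first region starts at the empty key
    }

    // Start key of the region responsible for rowKey.
    String regionFor(String rowKey) {
        return regions.floorKey(rowKey);
    }

    void insert(String rowKey) {
        TreeSet<String> rows = regions.get(regionFor(rowKey));
        rows.add(rowKey);
        if (rows.size() > maxRowsPerRegion) {
            // Split: rows from the middle key upward move to a new region.
            String mid = new ArrayList<>(rows).get(rows.size() / 2);
            TreeSet<String> upper = new TreeSet<>(rows.tailSet(mid, true));
            rows.tailSet(mid, true).clear();
            regions.put(mid, upper);
        }
    }

    List<String> startKeys() {
        return new ArrayList<>(regions.keySet());
    }

    public static void main(String[] args) {
        RegionMap table = new RegionMap(2);
        for (String rk : new String[] {"a", "b", "c", "d"}) {
            table.insert(rk);
        }
        System.out.println(table.startKeys());     // [, b, c]  (first start key is "")
        System.out.println(table.regionFor("zz")); // c
    }
}
```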


The design of the row key has a great impact on query efficiency. HBase scales well: if storage capacity is insufficient, simply add DataNodes or RegionServers.
HBase can serve as the underlying storage system for an online system.

The HMaster balances load and monitors data storage across the nodes.
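A simplified, count-based version of the balancing described in point 6 can be sketched as below. CountBalancer is hypothetical and deliberately naive: it repeatedly moves one region from the most-loaded server to the least-loaded one until region counts differ by at most one. HBase's own balancer takes more factors into account, so treat this only as an illustration of the goal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical count-based balancer (not HBase's balancer): move regions from the
// most-loaded RegionServer to the least-loaded one until region counts differ by
// at most one.
class CountBalancer {
    // assignment: server name -> regions it serves (lists must be mutable).
    // Returns the number of region moves performed.
    static int balance(Map<String, List<String>> assignment) {
        int moves = 0;
        while (true) {
            String most = null;
            String least = null;
            for (Map.Entry<String, List<String>> e : assignment.entrySet()) {
                int n = e.getValue().size();
                if (most == null || n > assignment.get(most).size()) most = e.getKey();
                if (least == null || n < assignment.get(least).size()) least = e.getKey();
            }
            if (most == null || assignment.get(most).size() - assignment.get(least).size() <= 1) {
                return moves;
            }
            List<String> from = assignment.get(most);
            assignment.get(least).add(from.remove(from.size() - 1));
            moves++;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> a = new LinkedHashMap<>();
        a.put("ubuntu1", new ArrayList<>(Arrays.asList("r1", "r2", "r3", "r4", "r5")));
        a.put("ubuntu2", new ArrayList<>());
        a.put("ubuntu3", new ArrayList<>(Arrays.asList("r6")));
        int moves = balance(a);
        System.out.println(moves + " moves: " + a);
    }
}
```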
Each Store (column family) has an in-memory cache holding some of the hottest (most recently accessed) data, which makes reads much faster.

The files are indexed, so lookups are faster.
Each region periodically merges its StoreFiles.
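Points 7 to 9 (MemStore, flush, compaction) can be modeled for a single Store. ToyStore is hypothetical (JDK only, not HBase code): writes go to a sorted in-memory map; when it holds flushThreshold entries it becomes a new StoreFile that is never modified again; once compactionTrigger StoreFiles exist they are merged into one. Reads check the MemStore first, then StoreFiles from newest to oldest. Both thresholds are illustrative, not HBase defaults, and deletes are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of ONE Store (column family); not HBase code.
class ToyStore {
    private TreeMap<String, String> memStore = new TreeMap<>();
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // oldest first
    private final int flushThreshold;
    private final int compactionTrigger;

    ToyStore(int flushThreshold, int compactionTrigger) {
        this.flushThreshold = flushThreshold;
        this.compactionTrigger = compactionTrigger;
    }

    void put(String key, String value) {
        memStore.put(key, value);
        if (memStore.size() >= flushThreshold) {
            flush();
        }
    }

    // Each flush turns the current MemStore into a new sorted StoreFile.
    void flush() {
        if (memStore.isEmpty()) {
            return;
        }
        storeFiles.add(memStore);
        memStore = new TreeMap<>();
        if (storeFiles.size() >= compactionTrigger) {
            compact();
        }
    }

    // Merge all StoreFiles; applying them oldest to newest lets newer values win.
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : storeFiles) {
            merged.putAll(file);
        }
        storeFiles.clear();
        storeFiles.add(merged);
    }

    // MemStore first, then StoreFiles from newest to oldest.
    String get(String key) {
        if (memStore.containsKey(key)) {
            return memStore.get(key);
        }
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            String v = storeFiles.get(i).get(key);
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    int storeFileCount() {
        return storeFiles.size();
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore(2, 3);
        store.put("rk1", "a");
        store.put("rk2", "b"); // second entry triggers a flush: 1 StoreFile
        store.put("rk1", "c");
        store.put("rk3", "d"); // another flush: 2 StoreFiles
        System.out.println(store.get("rk1") + " " + store.storeFileCount()); // c 2
    }
}
```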

III. Setting up the HBase environment

1. First, download an HBase release from http://hbase.apache.org/ and extract it into the directory where you want to install it. If you have come this far in learning HBase, these basic installation steps should be familiar.
2. In the conf directory under the HBase installation directory, find hbase-env.sh, hbase-site.xml and regionservers, and configure them as follows. The whole configuration process is simple.
In hbase-env.sh, the main settings are the Java environment variable and the ZooKeeper option: change HBASE_MANAGES_ZK from its default of true to false, so that HBase does not start its own bundled ZooKeeper but uses the ZooKeeper we installed ourselves.

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
export HBASE_MANAGES_ZK=false


In hbase-site.xml, the main settings are the HDFS address (the NameNode that hbase.rootdir points to) and the ZooKeeper quorum: ubuntu1, ubuntu2 and ubuntu3 below are the ZooKeeper hostnames, on port 2181. Adjust them to match your own machines.

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ubuntu2:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ubuntu1:2181,ubuntu2:2181,ubuntu3:2181</value>
  </property>
</configuration>

3. Next, edit regionservers and replace the default localhost with the hostnames of your nodes. This file lists the worker (RegionServer) nodes, much like the slaves file in the Hadoop cluster we configured earlier.

ubuntu1
ubuntu2
ubuntu3


4. Copy core-site.xml and hdfs-site.xml from Hadoop into HBase's conf directory.
5. Send the configured files to the other two nodes with scp.

Finally, be careful not to type any stray characters into these configuration files, or HBase will report errors.

1. Start all HBase processes
Start the ZooKeeper cluster first:
./zkServer.sh start
Start the HDFS cluster:
start-dfs.sh
Then start HBase by running this on the master node:
start-hbase.sh
2. Open the HBase administration page in a browser:
192.168.44.131:60010
3. To improve the cluster's reliability, start additional HMasters:
hbase-daemon.sh start master

Running jps on the master node shows that both HRegionServer and HMaster have started.

The other nodes start only the HRegionServer process.

You can check the startup status on the web page at 192.168.44.131:60010, that is, the IP address or hostname of your master node followed by port 60010.



IV. Using the HBase shell

4.1 Starting the HBase shell
The shell starts as soon as you run:
bin/hbase shell
First, let's try to show the databases. As the screenshot shows, this produces an error because, as mentioned earlier, HBase does not support SQL syntax. Enter the help command instead to see the basic HBase command syntax.


4.2 Creating a table
The official examples for creating a table are:
Examples:
  hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
  hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
  hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
  hbase> # Optionally pre-split the table into NUMREGIONS, using
  hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}

Following these examples, we create a user information table named user-info with two column families, base_info and extra_info, where base_info keeps 3 versions.
create 'user-info', {NAME => 'base_info', VERSIONS => 3}, {NAME => 'extra_info'}

4.3 Inserting data
The official example of put is:
hbase> put 'ns1:t1', 'r1', 'c1', 'value', ts1

Following this syntax, we write:
put 'user-info', 'rk-100001', 'base_info:name', 'Zhang S'
put 'user-info', 'rk-100001', 'base_info:age', '…'
put 'user-info', 'rk-100001', 'base_info:address', 'Changsha, Hunan'

In HBase, each put inserts only one cell, so age and address each need their own put.
4.4 Querying data
1. Query with scan:
scan 'user-info'


The results are sorted by key: within a row, the column names are sorted in dictionary order, each shown with its value.

Now insert another row:
put 'user-info', 'rk100003', 'base_info:name', 'angelabby'

When HBase stores a row, it sorts all of the row's column names and values by column name in dictionary order, and it stores all rows in dictionary order of their row keys.
This ordering determines which data ends up stored contiguously.
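Because row keys are compared as bytes in dictionary order, digits sort as characters rather than as numbers. The sketch below shows the effect in plain Java, assuming ASCII row keys (for which String.compareTo matches byte order). RowKeyOrder and its padKey helper are hypothetical; zero-padding numeric ids to a fixed width is one common way to make dictionary order match numeric order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of row-key ordering (assumes ASCII keys).
class RowKeyOrder {
    // Sort a copy of the keys in dictionary order.
    static List<String> sorted(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    // Zero-pad a numeric id to a fixed width, e.g. padKey("rk", 9, 6) gives "rk000009".
    static String padKey(String prefix, long id, int width) {
        return prefix + String.format("%0" + width + "d", id);
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList("rk9", "rk10", "rk100")));
        // [rk10, rk100, rk9]
        System.out.println(sorted(Arrays.asList(padKey("rk", 9, 6), padKey("rk", 10, 6), padKey("rk", 100, 6))));
        // [rk000009, rk000010, rk000100]
        System.out.println(sorted(Arrays.asList("rk100003", "rk-100001")));
        // [rk-100001, rk100003]  ('-' sorts before '1')
    }
}
```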
2. get retrieves data, but only one row at a time:

get 'user-info', 'rk100003'

4.5 Updating a value to create three versions
put 'user-info', 'rk100003', 'base_info:name', 'yangying'
put 'user-info', 'rk100003', 'base_info:name', 'baobao'

To view the earlier versions:
scan 'user-info', {VERSIONS => 10}


4.6 Deleting a table
A table must be disabled before it can be dropped:
disable 'user-info'
drop 'user-info'

V. Using HBase in Eclipse

Open Eclipse and add all the JAR files in hbase/lib to the project's build path. Then you can start writing code. Here is an example of creating a table and inserting data into HBase from Eclipse:


// Create a table (DDL operation)
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class HBaseCreateTable {
    public static void main(String[] args) throws IOException {
        // Loads hbase-default.xml and hbase-site.xml from the classpath
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "ubuntu1:2181,ubuntu2:2181,ubuntu3:2181");
        HBaseAdmin admin = new HBaseAdmin(conf);

        TableName name = TableName.valueOf("user-info");
        HTableDescriptor tableDescriptor = new HTableDescriptor(name);
        // Describe the column family and keep up to 3 versions per cell
        HColumnDescriptor baseInfo = new HColumnDescriptor("base_info");
        baseInfo.setMaxVersions(3);
        // Add the column family to the table descriptor
        tableDescriptor.addFamily(baseInfo);
        // Create the table described by tableDescriptor, then close the connection
        admin.createTable(tableDescriptor);
        admin.close();
    }
}


Then check in the HBase shell whether the table has been created: enter list to see the tables.
Next, insert data:

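// Insert data (DML operation)
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

public class HBasePutTest {
    @Test
    public void put() throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "ubuntu1:2181,ubuntu2:2181,ubuntu3:2181");
        HTable htable = new HTable(conf, "user-info");
        // One Put per row key; each add() sets one cell (family, qualifier, value)
        Put put = new Put(Bytes.toBytes("rk-10001"));
        put.add(Bytes.toBytes("base_info"), Bytes.toBytes("name"), Bytes.toBytes("wangming"));
        put.add(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes(""));
        htable.put(put);
        htable.close();
    }
}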

Finally, check in the HBase shell whether the data has been inserted into the table.



This completes the HBase environment configuration and basic usage. If you want to learn more, feel free to follow the blog, and if you have any questions about HBase, please leave a comment!

HBase is well suited to storing very large amounts of data: a single table can be enormous, columns can be added freely within its column families, and the cluster can keep scaling out. Traditional relational databases such as MySQL and Oracle cannot match this.




