HBase is the Open-source implementation of Google BigTable, using Hadoop HDFs as its file storage system, using Hadoop MapReduce to handle the massive data in HBase, using zookeeper as a collaborative service.
1. Introduction
HBase is a distributed, column-oriented open source database, rooted in a Google paper, BigTable: A distributed storage system with struc
What is hbase?
Hbase is a sub-project in Apache hadoop. hbase relies on hadoop's HDFS as the basic storage unit. By using hadoop's DFS tool, you can see the structure of these data storage folders, you can also use the MAP/reduce framework (Algorithm) Perform operations on hbase
be encountered while the HBase server is running.The tables on the HBase logic may be partitioned into multiple hregion and then stored on the hregion server.
HBase does not involve the direct deletion and updating of data, and the merge operation is triggered when the number of StoreFile in store exceeds the threshold
The main task of Hmaster is to
StoreFile is HFile.
2. HLog File: The storage format of WAL (Write Ahead Log) in HBase. It is a Hadoop Sequence File physically.HFile
Is the storage format of HFile:
First, the HFile file is not long, and the fixed length is only two of them: Trailer and FileInfo. As shown in the middle, the Trailer has a pointer to the starting point of other data blocks. File Info records the Meta information of the File, such as AVG_KEY_LEN, AVG_VALUE_LEN, LAST_K
This article focuses on people who do not know hbase. I want to answer the following questions based on my personal understanding:
What is hbase?
When to use hbase?
What is the difference with hive and pig?
Hbase Structure
Why is
socorrade systems are built on hbase. [1]
Hbase allows the basic analysis to use much more data than before. This analysis guides Mozilla developers to focus more on developing versions with the least bugs.
Trend Micro provides Internet security and intrusion management services for enterprise customers. An important part of security is perception. Log collection and analysis are crucial to providing this
command to view the current userHBase (main) > WhoAmIManagement of Tables1) all created tables (except the-root table and the. Meta table (filtered out) can be listed by listHBase (main) > list2) Create the table, where T1 is the table name, F1, F2 is the column family of T1. Tables in HBase have at least one column family. Among them, the column families directly affect the physical characteristics of the HBase
HBase from getting started to masteringCourse Study Address: http://www.xuetuwuyou.com/course/188The course out of self-study, worry-free network: http://www.xuetuwuyou.comCourse IntroductionIn the face of large-scale data storage and real-time query, traditional RDBMS has been unable to meet, based on the HDFs of HBase came into being, each table of data can reach millions of columns and billions of, data
Document directory
Hfile
Hlogfile
Http://www.searchtb.com/2011/01/understanding-hbase.htmlHbase Introduction
Hbase-hadoop database is a highly reliable, high-performance, column-oriented, and Scalable Distributed Storage System. hbase technology can be used to build large-scale structured storage clusters on low-cost PC servers.
Hbase is an open-source impl
1. What are the basic features of hbase?
2. What are the issues that hbase can solve relative to a relational database?
3. What is the data model for HBase? How to express. What are the forms of operation.
4. Some concepts and principles of the HBase schema schema design
5. What is the topological
HBase is an open-source NoSQL Scalable Distributed Database. It is column-oriented and suitable for storing very large volumes of loose data. HBase is suitable for real-time business environments that perform random read/write operations on Big data. For more information about HBase, see the HBase project website.
The
Hbase is an open-source nosql Scalable Distributed Database. It is column-oriented and suitable for storing very large volumes of loose data. Hbase is suitable for real-time business environments that perform random read/write operations on big data. For more information about hbase, see the hbase project website.
The
row key in the vicinity:
1
hbase(main):012:0> put ‘PageViews‘, ‘srowkey2‘, ‘info:page‘, ‘/myotherpage‘
Now, if I add a point limit and want to query the record of the row key between R and S, you can use the following structure:
123456789
hbase(main):014:0> scan ‘PageViews‘, { STARTROW => ‘r‘, ENDROW => ‘s‘ }ROW C
column family.
3. Basic usage of hbase Shell
Hbase provides a shell terminal for user interaction. Run the hbase shell command to enter the command interface. Run the HELP command to view the help information of the command.
The usage of hbase is demonstrated using an example of an online student sequence table.
A. HBase Data ModelThe logical entities in HBase mode include:(1) table: HBase uses tables to organize data. The table name is a string, consisting of characters that can be used in the file system path.(2) row: In the table, the data is stored by row. Rows are uniquely identified by row keys (Rowkey). The row key has no data type and is always treated as a byte
Transferred from: http://support.huawei.com/ecommunity/bbs/10242721.htmlThe application of zookeeper in HBaseThe HBase deployment is relatively a larger action that relies on zookeeper Cluster,hadoop HDFS.The Zookeeper function is:1, HBase Regionserver to zookeeper registration, provide hbase regionserver status information (whether online).2, Hmaster start time
: Avg_key_len, Avg_value_len, Last_key, COMPARATOR, Max_seq_id_key, and so on. The data index and Meta index blocks record the starting point for each data block and meta block.The Data block is the basic unit of HBase I/O, and in order to improve efficiency, the hregionserver is based on the LRU block cache mechanism. The size of each data block can be specified by parameters when creating a table, the large block facilitates sequential scan, and the
operations do not need to go through the master cluster, so master generally does not need a high configuration can.
Regionserver Group: Regionserver Group is the real data storage place, each regionserver is composed of several regions, while a region maintains a certain interval rowkey values of data, the entire structure such as:
HBase structure diag
Recently in the study of the use of hbase, and carefully read an official recommended blog, here on the side of the translation as a summary of the way and everyone together to comb the HBase data model and basic table design ideas.
Official recommended Blog Original address: http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_ Khurana.pdf Click on the Open link
This article is based on the example mentioned above after reading the hbase authoritative guide, but it is slightly different.
The integration of hbase and mapreduce is nothing more than the integration of mapreduce jobs with hbase tables as input, output, or as a medium for sharing data between mapreduce jobs.
This article will explain two examples:
1. Read TXT
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.