HBase is a distributed, column-oriented, open-source database built on the design of Google's Bigtable and running on top of the Hadoop HDFS file system. HBase differs from a typical relational database (RDBMS): it is suited to storing unstructured data, and it is organized by column families rather than by rows. The following content assumes Hadoop and HBase are already installed.

One, HBase shell introduction

The HBase shell is one of the interfaces through which users interact with HBase; there are of course other ways as well, such as the Java API. The following table lists the basic HBase shell operations, followed by a short example session:
| Operation | Command expression | Notes |
| --- | --- | --- |
| Create a table | create 'table_name', 'family1', 'family2', ..., 'familyN' | |
| Add a record | put 'table_name', 'rowkey', 'family:column', 'value' | |
| View a record | get 'table_name', 'rowkey' | Queries a single record; the most commonly used HBase command |
| Count the records in a table | count 'table_name' | This command is slow, and there is currently no faster way to count rows from the shell |
| Delete a record | delete 'table_name', 'rowkey', 'family:column' or deleteall 'table_name', 'rowkey' | The first form deletes a single column of a record; the second deletes the entire record |
| Delete a table | 1. disable 'table_name' 2. drop 'table_name' | |
| View all records | scan 'table_name', {LIMIT => 10} | LIMIT => 10 returns only 10 records; otherwise all records are shown |
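To make these concrete, here is a minimal example session, assuming a hypothetical table named 'blog' with a single column family 'info' (all names here are illustrative, not from the table above):

```
# create a table with one column family
create 'blog', 'info'

# insert two cells into the same row
put 'blog', 'row1', 'info:title', 'hello hbase'
put 'blog', 'row1', 'info:author', 'tom'

# read back a single row
get 'blog', 'row1'

# scan at most 10 rows
scan 'blog', {LIMIT => 10}

# count the rows (slow on large tables)
count 'blog'

# delete one column of the row, then the whole row
delete 'blog', 'row1', 'info:author'
deleteall 'blog', 'row1'

# a table must be disabled before it can be dropped
disable 'blog'
drop 'blog'
```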
The basic commands above are enough for everyday HBase operation. The shell options below play an important role in later HBase work, mainly when creating tables; consider the following create properties (a combined create example follows the compression comparison table below).

1. BLOOMFILTER, default NONE

Whether to use a Bloom filter. A Bloom filter can be enabled separately for each column family with HColumnDescriptor.setBloomFilterType(NONE | ROW | ROWCOL). The default NONE means no Bloom filter. With ROW, a hash of the row key is added to the Bloom filter each time a row is inserted. With ROWCOL, a hash of row key + column family + column qualifier is added each time a row is inserted. Usage: create 'table', {BLOOMFILTER => 'ROW'}. Enabling the filter can avoid unnecessary disk reads and so help reduce read latency.

2. VERSIONS, default 3

This parameter means that three versions of the data are retained. If we do not need that many versions, the data is updated all the time, and old versions are of no value to us, then setting this parameter to 1 can save 2/3 of the space. Usage (keeping two versions): create 'table', {VERSIONS => '2'}

3. COMPRESSION, default NONE, i.e. no compression

This parameter controls whether the column family is compressed and with which compression algorithm. Usage: create 'table', {NAME => 'info', COMPRESSION => 'SNAPPY'}. I recommend the Snappy compression algorithm. There are many comparisons of compression algorithms online; I extracted one table from the web as a reference (the installation of Snappy will be covered in a separate follow-up section). It is a set of test data released by Google a few years ago, and actual tests of Snappy come out close to what is listed in the table below. Before Snappy was released (Google open-sourced Snappy in 2011), HBase used the LZO algorithm, whose goal is compression and decompression that is as fast as possible while keeping CPU consumption low. After the Snappy release, the Snappy algorithm is recommended (see HBase: The Definitive Guide); in practice, do a detailed comparison of LZO and Snappy for your own situation and then make the choice.
| Algorithm | % remaining | Encoding | Decoding |
| --- | --- | --- | --- |
| GZIP | 13.4% | 21 MB/s | 118 MB/s |
| LZO | 20.5% | 135 MB/s | 410 MB/s |
| Zippy/Snappy | 22.2% | 172 MB/s | 409 MB/s |
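As a hedged illustration, the three properties above can be combined in a single create statement; the table and family names below are placeholders, and Snappy only works if the native Snappy libraries are installed:

```
# one column family 'info' with a row-level Bloom filter,
# a single retained version, and Snappy compression
create 'table', {NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', COMPRESSION => 'SNAPPY'}
```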
If a table was created without compression and you later want to add a compression algorithm, HBase has another command for that: alter.

4. ALTER

Usage: for example, to change the compression algorithm, run disable 'table', then alter 'table', {NAME => 'info', COMPRESSION => 'SNAPPY'}, then enable 'table'. To delete a column family, run disable 'table', then alter 'table', {NAME => 'info', METHOD => 'delete'}, then enable 'table'. After such a modification, however, you may find that the table data is still just as large and has hardly changed. What to do? The alter command by itself does not rewrite the existing data; the actual work only happens after the major_compact 'table' command.
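For reference, a sketch of the full alter workflow just described (the table name 'table' and family 'info' are placeholders; the major_compact step is what actually rewrites the store files):

```
disable 'table'
# switch the 'info' column family to Snappy compression
alter 'table', {NAME => 'info', COMPRESSION => 'SNAPPY'}
enable 'table'

# check that the new setting was applied
describe 'table'

# existing store files only shrink after a major compaction runs
major_compact 'table'
```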
5. TTL, default 2147483647, i.e. Integer.MAX_VALUE seconds, roughly 68 years

This parameter describes the time-to-live of the column family's data, i.e. the life cycle of the data. The unit is seconds; writing it as if the unit were milliseconds is a mistake. The value can be set according to your specific requirements: data that has lived longer than the TTL is no longer shown in the table, and it is completely deleted at the next major compaction (why the data is only deleted at the next major compaction will be described in detail later). Note the interaction with MIN_VERSIONS: with MIN_VERSIONS => '0', once the TTL expires all data under the column family is completely erased; if MIN_VERSIONS is not 0, the latest MIN_VERSIONS versions of the data are kept and everything else is deleted. For example, MIN_VERSIONS => '1' keeps one current version of the data, and other versions are no longer retained. A short example covering TTL and the commands below follows section 8.

6. DESCRIBE

describe 'table' shows the parameters the table was created with, or their default values.

7. disable_all

disable_all 'toplist.*' supports regular expressions and lists the tables matching the pattern, for example:

toplist_a_total_1001
toplist_a_total_1002
toplist_a_total_1008
toplist_a_total_1009
toplist_a_total_1019
toplist_a_total_1035
...
and then asks for confirmation: Disable the above tables (y/n)?

8. drop_all

This command is used in the same way as disable_all.
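A small sketch tying together TTL, MIN_VERSIONS, describe and the bulk commands above (the table name and the one-day TTL are assumptions for illustration; remember that TTL is in seconds):

```
# keep data in 'info' for one day; with MIN_VERSIONS => '0' everything
# older than the TTL is gone after the next major compaction
create 'ttl_table', {NAME => 'info', TTL => '86400', MIN_VERSIONS => '0'}

# show the parameters the table was actually created with
describe 'ttl_table'

# disable and then drop every table matching the regular expression
disable_all 'toplist.*'
drop_all 'toplist.*'
```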
9. HBase table pre-partitioning, i.e. manual region splitting

By default, one region is created automatically when an HBase table is created, and all HBase clients write to that single region until it is large enough to split. One way to speed up bulk writes is to pre-create some empty regions, so that when data is written to HBase the load is balanced across the cluster according to the region boundaries.

Usage from the shell: create 't1', 'f1', {NUMREGIONS => 10, SPLITALGO => 'HexStringSplit'}

Using the RegionSplitter utility from the command line: hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f info. The parameters are easy to read: test_table is the table name, HexStringSplit is the split algorithm, -c 10 splits into 10 regions, and -f info is the column family.

This pre-splits the table into 10 regions, reducing the time lost to automatic splitting as the data reaches the store file size. It has another advantage: a well-designed rowkey lets the concurrent requests be distributed across the regions evenly (or close to evenly), which maximizes IO efficiency. Pre-partitioning, however, usually means setting the region file size to a larger value. The relevant parameter is hbase.hregion.max.filesize, which defaults to 10G, meaning a single region defaults to 10G; this default has gone from 256M to 1G to 10G across versions 0.90, 0.92 and 0.94.3. Modify this value according to your own requirements.

But if the MapReduce input format is TableInputFormat, using HBase as input, be aware that each region is one map: if the data is much smaller than 10G, only one map will be launched, wasting a lot of resources. In that case consider lowering the value of this parameter appropriately, or pre-allocate the regions and set hbase.hregion.max.filesize to a value large enough that it is not easily reached, such as 1000G, monitor whether the value is reached, and then split the regions manually. A sketch of the pre-split create commands follows below.
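Two hedged ways to pre-split a table from the shell; the region count and the split keys below are purely illustrative and should be derived from your own rowkey distribution:

```
# let HBase compute 10 evenly spaced split points over hex-encoded rowkeys
create 't1', 'f1', {NUMREGIONS => 10, SPLITALGO => 'HexStringSplit'}

# or supply explicit split keys when the rowkey distribution is known in advance
create 't2', 'f1', SPLITS => ['10', '20', '30', '40']
```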
Earlier I mentioned that, with a TTL set, data past its lifetime only disappears at a compaction. How does it disappear? Is it deleted, and which parameters control the deletion? Next we will talk about HBase compaction.