HBase Learning Note Two: The Shell

HBase is a distributed, column-oriented, open-source database built on Google's Bigtable design and running on the Hadoop HDFS file system. HBase differs from an ordinary relational database (RDBMS): it is suited to storing unstructured data, and it organizes data by column family rather than by row. The content below assumes that Hadoop and HBase are already installed.

One. Introduction to the HBase shell

The HBase shell is one of the interfaces through which users interact with HBase (others, such as the Java API, exist as well). The following table lists the basic HBase shell operations:
Operation | Command expression | Notes
Create a table | create 'table_name', 'family1', 'family2', ..., 'familyN' |
Add a record | put 'table_name', 'rowkey', 'family:column', 'value' |
View a record | get 'table_name', 'rowkey' | Queries a single record; the most commonly used HBase read command
Count the records in a table | count 'table_name' | This command is not fast; there is currently no quicker way to count rows
Delete one column of a record | delete 'table_name', 'rowkey', 'family:column' |
Delete an entire record | deleteall 'table_name', 'rowkey' |
Delete a table | 1. disable 'table_name'  2. drop 'table_name' | A table must be disabled before it can be dropped
View all records | scan 'table_name', {LIMIT => 10} | LIMIT => 10 returns only 10 records; without it, all records are shown
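The following is a minimal sketch of a shell session exercising the commands above; the table name 'test_table', the column family 'info', and the row keys and values are illustrative, not taken from the original text.

      create 'test_table', 'info'
      put 'test_table', 'row1', 'info:name', 'value1'
      get 'test_table', 'row1'
      scan 'test_table', {LIMIT => 10}
      count 'test_table'
      delete 'test_table', 'row1', 'info:name'     # remove one column of the row
      deleteall 'test_table', 'row1'               # remove the entire row
      disable 'test_table'
      drop 'test_table'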
The basic commands above are enough to complete routine HBase operations. The shell options below play a very important role in later HBase work, mainly when tables are created; consider the following create properties.

1. BLOOMFILTER: the default is NONE, i.e. whether to use a Bloom filter. A Bloom filter can be enabled separately for each column family, for example with HColumnDescriptor.setBloomFilterType(NONE | ROW | ROWCOL) in the Java API. With the default NONE, no Bloom filter is used. With ROW, the hash of the row key is added to the Bloom filter each time a row is inserted. With ROWCOL, the hash of row key + column family + column qualifier is added each time a row is inserted. Usage: create 'table', {NAME => 'info', BLOOMFILTER => 'ROW'}. Enabling the filter can avoid unnecessary disk reads and thus help reduce read latency.

2. VERSIONS: the default is 3, meaning three versions of the data are retained. If you do not need that many versions, the data is updated all the time, and old versions are of no value to you, setting this parameter to 1 can save 2/3 of the space. Usage: create 'table', {NAME => 'info', VERSIONS => '2'}.

3. COMPRESSION: the default is NONE, i.e. no compression. This parameter controls whether the column family is compressed and which compression algorithm is used. Usage: create 'table', {NAME => 'info', COMPRESSION => 'SNAPPY'}. I recommend the SNAPPY compression algorithm; there are many comparisons of compression algorithms online, and I have extracted one table from the Internet as a reference (installing Snappy will be described in a separate, later section). It is a set of test data released by Google a few years ago, and actual tests of Snappy come out close to the figures listed below. Before Snappy was released (Google open-sourced Snappy in 2011), HBase commonly used the LZO algorithm, whose goal is compression and decompression that are as fast as possible while consuming little CPU. After Snappy's release, Snappy became the recommended algorithm (see HBase: The Definitive Guide); in practice, run a more detailed comparison of LZO and Snappy for your own workload and then choose.
Algorithm | % remaining | Encoding | Decoding
GZIP | 13.4% |  | 118 MB/s
LZO | 20.5% | 135 MB/s | 410 MB/s
Zippy/Snappy | 22.2% | 172 MB/s | 409 MB/s
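For reference, the three properties discussed above can be combined in a single create statement. This is a minimal sketch, assuming a table named 'test_table' with a column family 'info' (both names are illustrative); the COMPRESSION setting only works once the Snappy libraries are installed.

      # enable a row-level Bloom filter, keep 2 versions, and compress with Snappy
      create 'test_table', {NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => 2, COMPRESSION => 'SNAPPY'}
      describe 'test_table'    # confirm the column-family attributes took effect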

4. ALTER. If a table was created without compression and you later want to add a compression algorithm, HBase has another command, alter. Usage, for example to change the compression algorithm:
      disable 'table'
      alter 'table', {NAME => 'info', COMPRESSION => 'SNAPPY'}
      enable 'table'
To delete a column family:
      disable 'table'
      alter 'table', {NAME => 'info', METHOD => 'delete'}
      enable 'table'
After such a change, however, you will find that the table data is still just as large and has hardly changed. That is because alter only modifies the table schema; no actual rewriting of the data takes place until the major_compact 'table' command is run.
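Putting the steps above together, here is a short sketch of the workflow using the illustrative table 'test_table' and family 'info'; describe and major_compact are used to verify the change and to rewrite the existing store files.

      disable 'test_table'
      alter 'test_table', {NAME => 'info', COMPRESSION => 'SNAPPY'}
      enable 'test_table'
      describe 'test_table'        # check that COMPRESSION is now SNAPPY
      major_compact 'test_table'   # rewrite existing data so the change takes effect on disk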
5. TTL: the default is 2147483647, i.e. Integer.MAX_VALUE, which is roughly 68 years. This parameter describes how long the column family's data survives, i.e. the life cycle of the data; the unit is seconds (it is a mistake to assume the unit is milliseconds). Set the time-to-live according to your actual requirements: data stored longer than the TTL is no longer shown in the table and is completely deleted at the next major compact (why the deletion only happens at the next major compact is described in detail later). Note that after setting a TTL, with MIN_VERSIONS => '0' all data under the family is completely erased once the TTL timestamp expires; if MIN_VERSIONS is not 0, the newest MIN_VERSIONS versions of the data are kept and all other versions are deleted. For example, MIN_VERSIONS => '1' keeps one current version of the data and no longer saves any other versions.

6. describe 'table': this command shows the parameters the table was created with, or their default values.

7. disable_all 'toplist.*': disable_all supports regular expressions and lists the tables matching the current pattern, for example:
      toplist_a_total_1001
      toplist_a_total_1002
      toplist_a_total_1008
      toplist_a_total_1009
      toplist_a_total_1019
      toplist_a_total_1035
      ...
      Disable the above tables (y/n)?
and then asks for confirmation.

8. drop_all: this command is used in the same way as disable_all.
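As a sketch of the TTL and MIN_VERSIONS options above: the 30-day value (2592000 seconds) and the table and family names are illustrative only.

      # expire data in 'info' after 30 days; keep no versions once the TTL has passed
      create 'test_table', {NAME => 'info', TTL => 2592000, MIN_VERSIONS => '0'}
      describe 'test_table'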
9. Pre-splitting an HBase table, i.e. manual region partitioning.
      By default, only one region is created automatically when an HBase table is created, and as data is imported all HBase clients write to that single region until it grows large enough to split. One way to speed up bulk writes is to pre-create a number of empty regions, so that as data is written to HBase the load is balanced across the cluster according to the region boundaries.
      Shell usage: create 't1', 'f1', {NUMREGIONS => 10, SPLITALGO => 'HexStringSplit'}
      Using the RegionSplitter utility: hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f info
      The parameters are easy to read: test_table is the table name, HexStringSplit is the split algorithm, -c 10 splits the table into 10 regions, and -f gives the column family.
      This pre-splits the table into 10 regions, reducing the time otherwise spent waiting for data to reach the store-file size that triggers automatic splitting. It has a further advantage: with a well-designed rowkey, requests can be spread (roughly evenly) across every region, which makes I/O efficiency highest. Pre-splitting does, however, require setting hbase.hregion.max.filesize to a larger value. That parameter defaults to 10 GB, meaning a single region is 10 GB by default; the default went from 256 MB to 1 GB to 10 GB across versions 0.90, 0.92, and 0.94.3, and you should modify it according to your own requirements.
      Note also that if a MapReduce job uses TableInputFormat to take HBase as its input, each region becomes one map task; if the table holds less than 10 GB of data, only one map will be started, which wastes a lot of resources. In that case consider lowering the parameter appropriately, or pre-allocate the regions and set hbase.hregion.max.filesize to a value large enough that it is never reached in practice (say 1000 GB), monitor whether any region approaches that value, and then split regions manually.
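Besides HexStringSplit, the shell also accepts an explicit list of split keys when creating a table; a minimal sketch with illustrative keys (four split points give five regions):

      create 'test_table', 'info', SPLITS => ['1000', '2000', '3000', '4000']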
Earlier, when discussing the TTL, I deferred the question of why data that has outlived its TTL only disappears at compaction: how does it disappear, is it really deleted, and which parameters control the removal? Next we will talk about HBase compaction.