HBase shell basics and common commands

Source: Internet
Author: User
1. Introduction

HBase is a distributed, column-oriented open source database. It originated from the Google paper "Bigtable: A Distributed Storage System for Structured Data" and is an open-source implementation of Google Bigtable. It uses Hadoop HDFS as its file storage system, Hadoop MapReduce to process massive data in HBase, and ZooKeeper as a coordination service.

2. HBase table structure

HBase stores data in the form of tables. A table consists of rows and columns, and the columns are grouped into column families.

    Row Key   column-family1        column-family2                  column-family3
              column1     column2   column1   column2   column3    column1
    key1      t1: abc
              t2: gdxdf
    key2
    key3

As shown above, key1, key2, and key3 are the unique row keys of three records. column-family1, column-family2, and column-family3 are three column families, each containing several columns. For example, column-family1 contains two columns named column1 and column2. "t1: abc, t2: gdxdf" is the cell uniquely identified by row key1 and column-family1:column1. The cell holds two values, abc and gdxdf, with different timestamps t1 and t2; HBase returns the value with the latest timestamp to the requester.

The specific meanings of these terms are as follows:

(1) Row Key

As in other NoSQL databases, the row key is the primary key used to retrieve records.
There are only three ways to access rows in an HBase table:

(1.1) access through a single row key
(1.2) access through a range of row keys
(1.3) full table scan

A row key can be any string (the maximum length is 64 KB; in practice it is usually 10-100 bytes). Inside HBase the row key is stored as a byte array.

Data is stored in the lexicographic order (byte order) of the row key. When designing keys, take advantage of this sorted storage and place rows that are frequently read together next to each other (locality).

Note: integers sorted in lexicographic order come out as 1, 10, 100, 11, 12, ..., 19, 2, 20, 21, ..., 9, 91, 92, ..., 99. To keep the natural order of integers, the row key must be left-padded with zeros.

A single read or write of one row is an atomic operation, no matter how many columns are read or written at a time. This design decision makes it easy for users to understand the program's behavior when concurrent updates hit the same row.

(2) Column family

Each column in an HBase table belongs to a column family. Column families are part of the table's schema (individual columns are not) and must be defined before the table is used. All column names are prefixed with the column family name. For example, courses:history and courses:math both belong to the courses column family.

Access control and disk and memory usage statistics are all performed at the column family level. In practice, permissions on column families help manage different kinds of applications: some applications are allowed to add new basic data, some may read the basic data and create derived column families, and some may only browse the data (and, for privacy reasons, perhaps not all of it).

(3) Cell

The storage unit identified by a row and a column in HBase is called a cell. A cell is uniquely determined by {row key, column (= <family> + <label>), version}. The data in a cell has no type; everything is stored as raw bytes.

(4) Timestamp

Each cell stores multiple versions of the same data.
Versions are indexed by timestamps. The timestamp is a 64-bit integer. It can be assigned by HBase automatically when the data is written, in which case it is the current system time in milliseconds. The timestamp can also be assigned explicitly by the client. An application that needs to avoid version conflicts must generate unique timestamps itself. Within each cell, versions are sorted in reverse chronological order, so the latest data comes first.

To limit the management burden (storage and indexing) caused by too many versions, HBase provides two ways of recycling old versions. The first keeps the last n versions of the data; the second keeps the versions from a recent period (for example, the last seven days). Both can be set per column family.

3. Basic usage of HBase shell

HBase provides a shell terminal for user interaction. Run the hbase shell command to enter the command interface, and run the help command to view help for the available commands.

The usage of HBase is demonstrated with an example of a student score table:

    name   grade   course
                   math   art
    Tom    5       97     87
    Jim    4       89     80

Here grade is a column family with a single column, and course is a column family with two columns, math and art. More columns can be added to the course column family as needed, such as computer or physics.

(1) Create a table scores with two column families, grade and course

    hbase(main):001:0> create 'scores', 'grade', 'course'

Use the list command to view the tables in the current HBase instance, and the describe command to view a table's structure. (Remember to put quotation marks around all table names and column names.)
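Before inserting data, it may help to see the data model from section 2 in miniature. The following is a plain-Python sketch, not HBase client code: the names `padded_key`, `TinyTable`, and `max_versions` are hypothetical, and the older values 95 and 96 are made up for illustration. It shows zero-padded row keys, newest-first version ordering, and the "keep the last n versions" recycling rule.

```python
def padded_key(n, width=3):
    """Left-pad an integer row key with zeros so byte order matches numeric order."""
    return str(n).zfill(width)


class TinyTable:
    """Toy in-memory model of an HBase-style table (illustration only)."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)     # column families are fixed by the schema
        self.max_versions = max_versions  # keep only the newest n versions per cell
        self.rows = {}                    # row key -> column -> [(timestamp, value), ...]

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cell = self.rows.setdefault(row, {}).setdefault(column, [])
        # A write with an existing timestamp replaces that version.
        cell[:] = [tv for tv in cell if tv[0] != ts]
        cell.append((ts, value))
        cell.sort(key=lambda tv: tv[0], reverse=True)  # newest version first
        del cell[self.max_versions:]                   # recycle the oldest versions

    def get(self, row, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        return self.rows.get(row, {}).get(column, [])[:versions]


# Unpadded integer keys sort lexicographically, not numerically.
plain = sorted(str(n) for n in (1, 2, 9, 10, 11, 21))
padded = sorted(padded_key(n) for n in (1, 2, 9, 10, 11, 21))
print(plain)   # ['1', '10', '11', '2', '21', '9']
print(padded)  # ['001', '002', '009', '010', '011', '021']

table = TinyTable(["grade", "course"], max_versions=2)
table.put("Tom", "course:math", "95", ts=1)  # hypothetical older value
table.put("Tom", "course:math", "96", ts=2)  # hypothetical older value
table.put("Tom", "course:math", "97", ts=3)
print(table.get("Tom", "course:math"))              # [(3, '97')]
print(table.get("Tom", "course:math", versions=5))  # [(3, '97'), (2, '96')]
```

With `max_versions=2` the value written at timestamp 1 is gone after the third write, which mirrors the first recycling method described above. A real table additionally persists data and can expire versions by age.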
(2) Insert values according to the designed table structure

    put 'scores', 'Tom', 'grade:', '5'
    put 'scores', 'Tom', 'course:math', '97'
    put 'scores', 'Tom', 'course:art', '87'
    put 'scores', 'Jim', 'grade', '4'
    put 'scores', 'Jim', 'course:math', '89'
    put 'scores', 'Jim', 'course:art', '80'

With this, the table is populated. Columns can be added freely within a column family. If a column family has no sub-columns, the trailing colon may be included or omitted.

The put command is relatively simple and has only one form:

    hbase> put 't1', 'r1', 'c1', 'value', ts1

t1 is the table name, r1 the row key, c1 the column name, and value the cell value. ts1 is the timestamp, which is generally omitted.

(3) Query data by row key

    get 'scores', 'Jim'
    get 'scores', 'Jim', 'grade'

You may have noticed a pattern in HBase shell operations: the operation keyword comes first, followed by the table name, the row key, and the column name; any further conditions are added in curly braces.

get can be used in the following ways:

    hbase> get 't1', 'r1'
    hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
    hbase> get 't1', 'r1', {COLUMN => 'c1'}
    hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
    hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
    hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
    hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
    hbase> get 't1', 'r1', 'c1'
    hbase> get 't1', 'r1', 'c1', 'c2'
    hbase> get 't1', 'r1', ['c1', 'c2']

(4) Scan all data

    scan 'scores'

Modifiers can also be specified: TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH, or COLUMNS. Without modifiers, as in the example above, all rows are displayed.

Examples:

    hbase> scan '.META.'
    hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
    hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
    hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
    hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter (123, 456))"}
    hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}

A filter can be specified in two ways:

a. Using a filter string. More information on this is available in the Filter Language document attached to the HBASE-4176 JIRA.
b. Using the entire package name of the filter.

There is also a CACHE_BLOCKS modifier that switches block caching for the scan. It is enabled by default (CACHE_BLOCKS => true) and can be disabled (CACHE_BLOCKS => false).

(5) Delete specified data

    delete 'scores', 'Jim', 'grade'

The delete command has few variations either. There is only one form:

    hbase> delete 't1', 'r1', 'c1', ts1

There is also a deleteall command, which deletes an entire row. Use it with caution!

    deleteall 'scores', 'Jim'

To delete all data in a table, use the truncate command. There is in fact no direct command for clearing a whole table; truncate is itself a combination of the disable, drop, and create commands.

(6) Modify the table structure

    disable 'scores'
    alter 'scores', NAME => 'info'
    enable 'scores'

The alter command is used as follows (in some versions the table must be disabled first, otherwise the command fails):

a. Change or add a column family:

    hbase> alter 't1', NAME => 'f1', VERSIONS => 5

b. Delete a column family:

    hbase> alter 't1', NAME => 'f1', METHOD => 'delete'
    hbase> alter 't1', 'delete' => 'f1'

c. Modify table attributes such as MAX_FILESIZE, MEMSTORE_FLUSHSIZE, READONLY, and DEFERRED_LOG_FLUSH:

    hbase> alter 't1', METHOD => 'table_att', MAX_FILESIZE => '134217728'

d. Add a table coprocessor:

    hbase> alter 't1', METHOD => 'table_att', 'coprocessor' => 'hdfs://foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'

A table can be configured with multiple coprocessors; a sequence number is appended automatically to identify each one. The coprocessor specification must follow this format:

    [coprocessor jar file location] | class name | [priority] | [arguments]

e. Remove a table attribute or a coprocessor:

    hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'MAX_FILESIZE'
    hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'coprocessor$1'

f. Execute multiple alter operations in one command:

    hbase> alter 't1', {NAME => 'f1'}, {NAME => 'f2', METHOD => 'delete'}

(7) Count the number of rows

    hbase> count 't1'
    hbase> count 't1', INTERVAL => 100000
    hbase> count 't1', CACHE => 1000
    hbase> count 't1', INTERVAL => 10, CACHE => 1000

count is generally time-consuming; for large tables a MapReduce job can be used for counting instead. The scanner cache (CACHE) defaults to 10 rows, and the running count is displayed every 1000 rows by default (INTERVAL).

(8) disable and enable

Many operations require the table to be taken offline first, for example the alter operation mentioned above and dropping a table. disable_all and enable_all operate on multiple tables at once.

(9) Delete a table

Take the table offline first, then run the drop command:

    disable 't1'
    drop 't1'

The above explains some of the common commands in detail. All HBase shell commands are listed below, divided into command groups. The names give a rough idea of their usage; use help "cmd" for detailed usage.
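The row-range modifiers of scan from step (4), STARTROW, STOPROW, and LIMIT, can be pictured as a slice over the sorted row keys. The following is a plain-Python sketch rather than HBase code; the function name `scan_rows` is hypothetical, and it assumes the usual HBase convention that the start row is inclusive and the stop row is exclusive.

```python
from bisect import bisect_left


def scan_rows(rows, startrow="", stoprow=None, limit=None):
    """Return (row key, columns) pairs with startrow <= key < stoprow, in key order."""
    keys = sorted(rows)                   # rows are kept in lexicographic key order
    lo = bisect_left(keys, startrow)      # first key >= startrow (inclusive start)
    hi = len(keys) if stoprow is None else bisect_left(keys, stoprow)  # exclusive stop
    selected = keys[lo:hi]
    if limit is not None:
        selected = selected[:limit]       # LIMIT caps the number of rows returned
    return [(key, rows[key]) for key in selected]


# The scores table from this section, as a dictionary of rows.
scores = {
    "Tom": {"grade:": "5", "course:math": "97", "course:art": "87"},
    "Jim": {"grade:": "4", "course:math": "89", "course:art": "80"},
}

print([key for key, _ in scan_rows(scores)])                  # ['Jim', 'Tom']
print([key for key, _ in scan_rows(scores, startrow="Tom")])  # ['Tom']
print([key for key, _ in scan_rows(scores, stoprow="Tom")])   # ['Jim']
print([key for key, _ in scan_rows(scores, limit=1)])         # ['Jim']
```

Because rows are stored in key order, a range scan only has to locate the start key and read forward, which is why row key design matters so much for read performance.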
Command groups:

    Group name: general
    Commands: status, version

    Group name: ddl
    Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, is_disabled, is_enabled, list, show_filters

    Group name: dml
    Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate

    Group name: tools
    Commands: assign, balance_switch, balancer, close_region, compact, flush, hlog_roll, major_compact, move, split, unassign, zk_dump

    Group name: replication
    Commands: add_peer, disable_peer, enable_peer, list_peers, remove_peer, start_replication, stop_replication

    Group name: security
    Commands: grant, revoke, user_permission

4. HBase shell scripts

Since hbase shell is itself a shell command, you can write HBase shell commands into a file and have them executed in sequence, just like a Linux shell script. Put all the commands in a file, then run:

    $ hbase shell test.hbaseshell

This is convenient and easy to use.
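A script file like the one above can also be generated rather than typed by hand. The sketch below, in plain Python, builds the create, put, and scan commands for the scores table as text. The function names `put_commands` and `build_script` are hypothetical, values are not escaped (a value containing a single quote would break the command), and the trailing exit line is an assumption: whether the shell quits on its own after running a script depends on the HBase version.

```python
def put_commands(table, rows):
    """Build one put command per cell, in row key and column order."""
    lines = []
    for row_key in sorted(rows):
        for column, value in sorted(rows[row_key].items()):
            lines.append(f"put '{table}', '{row_key}', '{column}', '{value}'")
    return lines


def build_script(table, families, rows):
    """Return the text of an HBase shell script that creates and fills a table."""
    quoted_families = ", ".join(f"'{family}'" for family in families)
    lines = [f"create '{table}', {quoted_families}"]
    lines += put_commands(table, rows)
    lines.append(f"scan '{table}'")
    lines.append("exit")  # assumption: leave the shell once the script has run
    return "\n".join(lines) + "\n"


scores = {
    "Tom": {"grade:": "5", "course:math": "97", "course:art": "87"},
    "Jim": {"grade:": "4", "course:math": "89", "course:art": "80"},
}

script = build_script("scores", ["grade", "course"], scores)
print(script)
```

Saving the printed text as test.hbaseshell and passing that file to hbase shell would then replay the whole example from section 3.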
