NetEase Video Cloud technology share: parsing the HBase CREATE TABLE statement

Source: Internet
Author: User
Tags: md5, hash

A technical article shared by the NetEase Video Cloud technical experts: parsing the HBase CREATE TABLE statement.

  Like every other database, HBase has the concept of a table, and therefore a CREATE TABLE statement, which largely determines how data is stored and how well reads and writes perform. Take the familiar case of MySQL: the data types in the CREATE TABLE statement determine the storage format, and the primary key and indexes strongly affect read and write performance. HBase has no primary key or index in that sense, but some table settings are just as important in the HBase world.

  Without further ado, here is an HBase CREATE TABLE statement, which we will break down piece by piece:

create 'newsclickfeedback', {NAME => 'Toutiao', VERSIONS => 1, BLOCKCACHE => true, BLOOMFILTER => 'ROW', COMPRESSION => 'SNAPPY', TTL => '259200'}, {SPLITS => ['1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f']}

  The statement above creates a table named "newsclickfeedback" that contains a single column family, "Toutiao". The rest of this article focuses on what the other fields mean and how to set them correctly. Note: because space is limited, this article does not explain the underlying mechanisms; follow-up articles will cover them.

  VERSIONS

  The number of data versions. The HBase data model allows a cell to hold multiple versions of its data, each with a different timestamp. The VERSIONS parameter specifies the maximum number of versions to keep; the default is 1. To keep two historical versions of the data, set VERSIONS to 2, and then use the following scan command to retrieve all the historical data:

scan 'newsclickfeedback', {VERSIONS => 2}
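
  The retention rule described above can be sketched with a toy model. This is only an illustration in Python, not HBase code: the class `VersionedCell` and its methods are hypothetical names, and the model prunes old versions immediately on write, whereas HBase itself drops surplus versions later (at read and compaction time).

```python
class VersionedCell:
    """Toy model of one cell that keeps at most max_versions values."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self._values = {}  # timestamp -> value

    def put(self, timestamp, value):
        self._values[timestamp] = value
        # Drop everything older than the newest max_versions timestamps.
        for ts in sorted(self._values, reverse=True)[self.max_versions:]:
            del self._values[ts]

    def get(self, versions=1):
        # Return up to `versions` (timestamp, value) pairs, newest first.
        newest_first = sorted(self._values.items(), reverse=True)
        return newest_first[:versions]


cell = VersionedCell(max_versions=2)
cell.put(100, "a")
cell.put(200, "b")
cell.put(300, "c")  # the value written at timestamp 100 is now dropped

history = cell.get(versions=2)
print(history)  # [(300, 'c'), (200, 'b')]
```

  With `max_versions=1`, the same sequence of writes would leave only the value written at timestamp 300.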

  BLOOMFILTER

  The Bloom filter, used to optimize HBase read performance. The possible values are NONE | ROW | ROWCOL; the default is NONE in the version described here (later HBase releases default to ROW), and the filter can be enabled separately for each column family. With the filter enabled, get operations and some scan operations can skip store files that cannot contain the requested data, which reduces the number of actual I/O operations and improves random read performance. The ROW type applies to lookups by row only, whereas the ROWCOL type applies to combined lookups by row + column, as follows:

  ROW type applies to: get 'newsclickfeedback', 'row1'

  ROWCOL type applies to: get 'newsclickfeedback', 'row1', {COLUMN => 'Toutiao'}

  For workloads with random reads, it is recommended to enable the ROW type filter, trading space for time to improve random read performance.
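
  To illustrate why a Bloom filter lets a get skip store files, here is a minimal sketch in Python. It is not HBase's implementation: the `BloomFilter` class, the file names, and the sizing (1024 bits, 3 hash functions) are all assumptions chosen for illustration. The key property it shows is that a filter can answer "definitely not here" or "maybe here", never a false "not here". The keys are row keys, which corresponds to the ROW type; a ROWCOL filter would use row + column as the key instead.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_store_file(row_keys):
    """Return (bloom_filter, row_set) for one hypothetical store file."""
    bloom = BloomFilter()
    for key in row_keys:
        bloom.add(key)
    return bloom, set(row_keys)


store_files = {
    "file_a": build_store_file(["row1", "row2"]),
    "file_b": build_store_file(["row7", "row8"]),
    "file_c": build_store_file(["row9"]),
}


def files_to_read(row_key):
    # Only files whose filter says "maybe" need an actual disk read.
    return [name for name, (bloom, _) in store_files.items()
            if bloom.might_contain(row_key)]


candidates = files_to_read("row1")
print(candidates)
```

  The file that really holds "row1" is always among the candidates; the other files are skipped unless the filter happens to give a false positive. The space cost of the filter is what buys the saved reads.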

  COMPRESSION

  The data compression method. HBase supports several forms of data compression, which on the one hand reduce storage space and on the other hand reduce network transfer and improve read efficiency. HBase currently supports three compression algorithms: GZIP | LZO | SNAPPY. The table below compares the compression ratio and the encoding/decoding speed of the three:

  Snappy has the lowest compression ratio, but its encoding/decoding speed is the highest and its CPU consumption is the lowest, so Snappy is generally recommended.

  TTL

  The data expiration time, in seconds; by default data is kept permanently. Many workloads do not need to keep some data forever. Keeping it permanently makes the data set grow without bound, which consumes storage space and also makes queries less efficient. If an expiration time is set, HBase checks for expired data during compaction and deletes it. Users can set the value to one month or three months, depending on the business scenario. In the example, TTL => '259200' sets the expiration time to three days.
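
  The TTL value is easy to get wrong by a factor of 60 or 1000, so a small sketch of the arithmetic may help. The helper names `ttl_seconds` and `is_expired` are hypothetical; the sketch assumes cell timestamps in milliseconds (HBase's default timestamp unit) and a TTL in seconds, and treats a cell as expired once its age exceeds the TTL.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_seconds(days):
    """Convert a retention period in days to a TTL value in seconds."""
    return days * SECONDS_PER_DAY


def is_expired(cell_timestamp_ms, now_ms, ttl_s):
    """A cell is expired once its age exceeds the TTL."""
    return now_ms - cell_timestamp_ms > ttl_s * 1000


three_days = ttl_seconds(3)
print(three_days)  # 259200, the value used in the example statement

now_ms = 1_000_000_000_000
two_days_old = now_ms - 2 * SECONDS_PER_DAY * 1000
four_days_old = now_ms - 4 * SECONDS_PER_DAY * 1000
print(is_expired(two_days_old, now_ms, three_days))   # False
print(is_expired(four_days_old, now_ms, three_days))  # True
```

  By the same conversion, 30 days corresponds to a TTL of 2592000 seconds.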

  IN_MEMORY

  Whether the data stays resident in memory; the default is false. HBase provides a cache area for frequently accessed data. It is typically used for data that is small and frequently accessed, and a common use is storing metadata. By default, the size of this cache area equals JVM heap size * 0.2 * 0.25; with a JVM heap size of 70G, the area is approximately 3.5G. Note that the HBase meta table's metadata is stored in this area. If too much business data is set to IN_MEMORY => true, the meta data may be evicted from the cache, which degrades the performance of the whole cluster, so be extra careful when setting this parameter.
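
  Multiplying out the stated formula is a quick sanity check. The sketch below is plain arithmetic in Python; the function name `in_memory_cache_gb` is hypothetical, and the two factors 0.2 and 0.25 are taken from the formula in the text, not read from any real cluster configuration.

```python
def in_memory_cache_gb(heap_gb, block_cache_fraction=0.2, in_memory_fraction=0.25):
    """Size of the in-memory cache area, per the formula in the text:
    heap size * block cache fraction * in-memory fraction."""
    return heap_gb * block_cache_fraction * in_memory_fraction


size_gb = in_memory_cache_gb(70)
print(round(size_gb, 2))  # about 3.5 for a 70G heap
```

  So only about 5% of the heap is available to all IN_MEMORY data, including the meta table, which is why the setting should be reserved for small, hot data.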

  BLOCKCACHE

  Whether to enable the block cache; it is enabled by default.

  SPLITS

  The region pre-split policy. Pre-splitting regions spreads the data evenly across multiple machines, which to some extent avoids the performance problems caused by automatic splits when a hot application's data volume explodes. HBase stores data in ascending rowkey order, so to avoid hotspots, regions are usually pre-split with a hash + partition scheme. In the example, the rowkey is first hashed with MD5 and then partitioned into 16 parts by its first character, so 16 regions can be pre-allocated.
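
  The hash + partition scheme can be sketched as follows. This is an illustration in Python, not HBase client code: `hashed_rowkey` and `region_index` are hypothetical helpers, the raw keys are made up, and the sketch assumes the rowkey is simply the lowercase hex MD5 digest of the raw key. The 15 split points from the example statement produce 16 regions, and a rowkey lands in the region whose range covers its first hex character.

```python
import bisect
import hashlib
from collections import Counter

# The 15 split points from the example statement; they delimit 16 regions.
SPLITS = list("123456789abcdef")


def hashed_rowkey(raw_key):
    """Assumed rowkey layout: the lowercase hex MD5 digest of the raw key."""
    return hashlib.md5(raw_key.encode("utf-8")).hexdigest()


def region_index(rowkey):
    """Index (0..15) of the region whose [start, end) range holds rowkey.
    Region 0 covers everything below '1'; region 15 everything from 'f' up."""
    return bisect.bisect_right(SPLITS, rowkey)


# Distribute some made-up sequential keys, which would otherwise all hit
# the same region because they sort next to each other.
raw_keys = [f"user{i:06d}" for i in range(1000)]
counts = Counter(region_index(hashed_rowkey(k)) for k in raw_keys)

for region in sorted(counts):
    print(region, counts[region])
```

  Because MD5 output is close to uniform, the 1000 sequential keys end up spread over the regions instead of piling into one. The trade-off is that hashed rowkeys no longer support meaningful range scans over the original key order.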

For more technical articles, please follow the NetEase Video Cloud official website (vcloud.163.com) and official account (vcloud163).
