Document directory
- Concepts
- Tall-narrow vs. Flat-wide tables
- Partial key scans
- Pagination
- Time Series
- Time ordered relations
- Implicit Versioning
- Custom Versioning
This chapter mainly describes how to design an HBase schema. We strongly recommend the presentations referenced below; they cover the topic clearly.
First, let me stress once again that NoSQL cannot replace SQL. For non-big-data systems or scenarios, SQL is without doubt more useful; we should not insist on replacing SQL with NoSQL, but simply put the big data that SQL cannot handle (often data with weak relationships) on NoSQL.
For example, when designing nested relationships (multi-level one-to-many relationships), SQL is very troublesome: queries require multi-table joins, while HBase or CouchDB makes both the design and the queries convenient.
For complex relationships, such as many-to-many relationships, designing with HBase is a headache, because HBase can only express relationships through nested entities.
This is because the core design of HBase is the DDI that Lars describes: Denormalization, Duplication and Intelligent keys.
Denormalization avoids joins and other complex relational operations by duplicating data. Therefore, if the data is updated frequently, this design becomes very troublesome.
Therefore, in NoSQL design the golden rule is: Design for the questions, not the answers.
First, we need to understand that NoSQL is a compromise. Any NoSQL design only optimizes for the current scenario, just to answer that one question.
Forget about perfection; you cannot come up with a NoSQL design that fits all needs. In that sense NoSQL is a transitional technology: it is not revolutionary, and it is not beautiful...
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
http://www.slideshare.net/cloudera/5-h-base-schemahbasecon2012
The presentation has a lot of content; here we list the final points it summarizes about HBase schema design.
0: The row key design is the single most important decision you will make.
The row key design is the most important. What aspects should we consider?
1: Design for the questions, not the answers.
If you aren't 100% sure what the questions are going to be, use a relational DB for that! Or a document database like CouchDB.
When the problem is unclear and the scenario requirements are unclear, a general-purpose DB schema is needed, so HBase is not suitable; choose SQL or a document database such as CouchDB instead.
2: There are only two sizes of data: too big, and not too big.
If the data volume is not big enough to require HBase, use SQL.
3: Be compact. You can squeeze a lot into a little space.
A typical example is how OpenTSDB optimizes its rowkey and column design; for more information, see Lessons Learned from OpenTSDB below.
4: Use row atomicity as a design tool.
HBase guarantees row atomicity, which is very important: if you need a transactional operation, put the data in the same row.
If the data is split into two tables, or stored as two rows, atomicity cannot be guaranteed.
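As an illustration, here is a minimal sketch of relying on single-row atomicity (the table, family, and column names are invented for this example): all cells written by one Put to the same row are applied atomically, so readers never see a partial update.

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class AtomicRowWrite {
  // Writes an order's amount and status in one Put to one row:
  // HBase applies all cells of a single Put atomically.
  static void saveOrder(Connection conn, String orderId) throws IOException {
    try (Table table = conn.getTable(TableName.valueOf("orders"))) {
      Put put = new Put(Bytes.toBytes(orderId));
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes("99.95"));
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("PAID"));
      table.put(put); // both cells become visible together
    }
  }
}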
5: attributes can move into the row key.
Even if it's not "identifying" (part of the uniqueness of an entity), adding an attribute into the row key can make access more efficient in some cases.
This effectively reduces the storage granularity; it is the basis of the tall-narrow design.
6: If you nest entities, you can transactionally pre-aggregate data.
You can recalculate aggregates on write, or periodically with a map/reduce job.
Advanced HBase Schema Design
Berlin Buzzwords, June 2012
Lars George
http://berlinbuzzwords.de/sites/berlinbuzzwords.de/files/slides/hbase-lgeorge-bbuzz12.pdf
It basically uses the content of the Definitive Guide as a presentation.
DDI
• Stands for Denormalization, Duplication and Intelligent keys
• Needed to overcome the shortcomings of the architecture
• Denormalization -> replacement for joins
• Duplication -> design for reads
• Intelligent keys -> implement indexing and sorting, optimize reads
HBaseCon 2012 | Lessons Learned from OpenTSDB - Benoit Sigoure, StumbleUpon
http://www.slideshare.net/cloudera/4-opentsdb-hbasecon
Key Design
Concepts
Figure 9.1, "The cells are stored self-contained and moving pertinent information is nearly free"
Refer to Advanced HBase Schema Design for an explanation of how HBase converts a logical data model into a physical implementation.
In the figure, the logical model is first folded by column family, and each column family is stored in its own storage files.
The last step, the shift, is very important: through key design you can move information between the rowkey and the column key to achieve better read and write efficiency.
Figure 9.2. From left to right the performance of the retrieval decreases
The rowkey determines the region, while the column family determines the HFile.
In addition, due to HBase's multi-version nature, different HFiles cover different timestamp ranges.
Therefore a query must of course provide the rowkey, which determines the region to read.
Unlike in a relational database, specifying the column family in a query greatly improves efficiency, because it determines how many HFiles are read.
If you can also specify a timestamp (or time range), you can filter the HFiles further and improve query efficiency.
Filtering on the column qualifier or the value, however, requires reading the data cell by cell, which is relatively inefficient.
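A minimal sketch of this narrowing using the HBase 2.x client API (table and family names are invented for illustration): the more selective coordinates you give up front, the fewer files and cells the server has to touch.

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class NarrowedRead {
  static Result read(Connection conn) throws IOException {
    try (Table table = conn.getTable(TableName.valueOf("inbox"))) {
      Get get = new Get(Bytes.toBytes("12345"));        // rowkey -> picks the region
      get.addFamily(Bytes.toBytes("data"));             // column family -> limits the HFiles
      get.setTimeRange(1307000000000L, 1308000000000L); // time range -> skips non-overlapping HFiles
      // qualifier/value predicates are the least selective; they are applied cell by cell
      return table.get(get);
    }
  }
}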
Tall-narrow vs. Flat-wide tables
At this point you may ask yourself where and how you should store your data. The two choices are tall-narrow and flat-wide.
The former is a table with few columns but many rows, while the latter has fewer rows but many columns.
For the example of email inbox,
The flat-wide design uses rowkey: userid, columnkey: emailid
12345: data: 5fc38314-e290-ae5da5fc375d: 1307097848: "Hi Lars, ..."
The tall-narrow design uses rowkey: userid-emailid
12345-5fc38314-e290-ae5da5fc375d: data: : 1307097848: "Hi Lars, ..."
Each design has its own advantages and disadvantages: flat-wide keeps everything in one row and therefore preserves atomicity, but the row can grow too large; tall-narrow is the more common design.
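A minimal sketch of the two layouts (the names are illustrative): the flat-wide variant keys the row by user and puts the email id in the column qualifier, while the tall-narrow variant folds the email id into the rowkey.

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class InboxLayouts {
  static final byte[] DATA = Bytes.toBytes("data");

  // Flat-wide: one row per user, one column per email.
  static Put flatWide(String userId, String emailId, String body) {
    return new Put(Bytes.toBytes(userId))
        .addColumn(DATA, Bytes.toBytes(emailId), Bytes.toBytes(body));
  }

  // Tall-narrow: one row per email, the email id is part of the rowkey.
  static Put tallNarrow(String userId, String emailId, String body) {
    return new Put(Bytes.toBytes(userId + "-" + emailId))
        .addColumn(DATA, Bytes.toBytes(""), Bytes.toBytes(body));
  }
}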
Partial key scans
For the tall-narrow design, the composite key must support partial key scans: without providing the full userid-emailid, I give only the userid and scan all the messages of that user.
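A minimal sketch of such a partial key scan with the HBase 2.x client (names invented): it scans every row whose key starts with the userid prefix.

import java.io.IOException;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PartialKeyScan {
  // Returns a scanner over all rows "userid-..." without knowing any emailid.
  static ResultScanner scanUser(Table inbox, String userId) throws IOException {
    Scan scan = new Scan()
        .withStartRow(Bytes.toBytes(userId + "-"))
        .withStopRow(Bytes.toBytes(userId + ".")); // '.' sorts right after '-', closing the prefix range
    return inbox.getScanner(scan);
  }
}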
Pagination
Using the above approach of partial key scans it is possible to iterate over subsets of rows.
A common application, such as giving a userid and scanning all the messages of that user, requires pagination; it is impossible to display all the messages at once.
In addition to the start and stop key, you also need to provide offset and limit parameters.
This approach works well for a low number of pages. If you were to page through thousands of them, a different approach would be required.
I personally think this is like MySQL: although you set an offset, the scan still has to read from the beginning, so when the page number is large it is relatively inefficient.
The solution is simple: change the rowkey to userid-sequentialid, so that you can set the start and stop keys directly to address a page, for example userid-500 to userid-550.
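A minimal sketch of both approaches (names are illustrative; the sequential id is assumed to be zero-padded so that it sorts numerically):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class InboxPaging {
  // Offset/limit style: skip 'offset' rows client-side, then return 'limit' rows.
  // Works for small offsets only, because the skipped rows are still scanned.
  static void pageByOffset(Table inbox, String userId, int offset, int limit) throws IOException {
    Scan scan = new Scan()
        .withStartRow(Bytes.toBytes(userId + "-"))
        .withStopRow(Bytes.toBytes(userId + "."));
    try (ResultScanner scanner = inbox.getScanner(scan)) {
      int seen = 0;
      for (Result r : scanner) {
        if (seen++ < offset) continue;          // wasted work, like MySQL OFFSET
        if (seen > offset + limit) break;
        System.out.println(Bytes.toStringBinary(r.getRow()));
      }
    }
  }

  // Sequential-id style: jump straight to the page via the rowkey range.
  static ResultScanner page(Table inbox, String userId, long from, long to) throws IOException {
    Scan scan = new Scan()
        .withStartRow(Bytes.toBytes(String.format("%s-%010d", userId, from)))
        .withStopRow(Bytes.toBytes(String.format("%s-%010d", userId, to)));
    return inbox.getScanner(scan);
  }
}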
Time Series
When dealing with stream processing of events, the most common use-case is time series data.
These could be coming from a sensor in a power grid, a stock exchange, or a monitoring system for computer systems. Their salient feature is that their row key represents the event time.
The key issue is hotspotting: a contiguous range of rowkeys is assigned to a single region server, so all current writes hit that one server. How can this problem be solved?
Simple: randomize the row key. There are several randomization methods, with different degrees of randomness.
Salting
You can add a salting prefix to the key that guarantees a spread of rows across all region servers.
0myrowkey-1, 1myrowkey-2, 2myrowkey-3, 0myrowkey-4, 1myrowkey-5,
Keys 0myrowkey-1 and 0myrowkey-4 would be sent to one region, while 1myrowkey-2 and 1myrowkey-5 are sent to another.
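A minimal sketch of such a salting scheme (the bucket count is an assumption; it should roughly match the number of region servers you want to spread over):

import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKeys {
  static final int BUCKETS = 8; // illustrative; pick roughly the number of region servers

  // Prefixes the original key with a deterministic salt so that consecutive
  // keys are spread across BUCKETS different key ranges.
  static byte[] saltedKey(String originalKey) {
    int salt = Math.floorMod(originalKey.hashCode(), BUCKETS);
    return Bytes.toBytes(salt + "-" + originalKey);
  }
}

Note that reads then have to fan out over all BUCKETS prefixes to reassemble one logical scan; that is the price paid for the spread-out writes.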
Use-case: Mozilla Socorro
The Mozilla organization has built a crash reporter named Socorro [91] for Firefox and Thunderbird, which stores all the pertinent details when a client asks its user to report a program anomaly.
Field swap/promotion
In case you already have a row key with more than one field, you can swap those. If you have only the timestamp as the current row key, then you need to promote another field from the column keys, or even from the value, into the row key.
Use-case: OpenTSDB
The OpenTSDB [92] project provides a time series database used to store metrics about servers and services, gathered by external collection agents.
<Metric-ID> <base-timestamp>...
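A minimal sketch of field promotion in this spirit (the field widths and the hour alignment are assumptions for illustration, not OpenTSDB's exact layout): the metric id leads the key so that writes for different metrics spread out, while the base timestamp keeps each metric's data time-ordered.

import java.nio.ByteBuffer;

public class PromotedKey {
  // <metric-id (4 bytes)><base-timestamp (4 bytes, aligned to the hour)>
  static byte[] rowKey(int metricId, long epochSeconds) {
    int baseTime = (int) (epochSeconds - (epochSeconds % 3600)); // hour boundary
    return ByteBuffer.allocate(8).putInt(metricId).putInt(baseTime).array();
  }
}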
Randomization
Using a hash function like MD5 gives you a random distribution of the keys across all available region servers.
Using salted or promoted-field keys can strike a good balance of distribution for write performance, and sequential subsets of keys for read performance. If you are only doing random reads, then it makes most sense to use random keys: this will avoid creating region hot spots.
Summary: as the randomness of the rowkey increases, write efficiency keeps improving, but sequential read efficiency keeps decreasing, so a balance is required.
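For completeness, a minimal sketch of the fully randomized variant (this gives up meaningful range scans over the original key order entirely):

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomizedKeys {
  // Replaces the natural key with its MD5 hash: perfectly spread writes,
  // but sequential scans over the original key order are no longer possible.
  static byte[] randomizedKey(String originalKey) {
    try {
      return MessageDigest.getInstance("MD5").digest(Bytes.toBytes(originalKey));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is required by this sketch", e);
    }
  }
}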
Time ordered relations
Since all of the columns are sorted per column family you can treat this sorting as a replacement for a secondary index, as available in RDBMSs.
Multiple secondary indexes can be emulated by using multiple column families-although that is not the recommended way of designing a schema.
But for a small number of indexes this might be what you need.
Because the columns within a column family are sorted, we can use the column key as a secondary index in the flat-wide design.
Consider the earlier example of the user inbox, which stores all of the emails of a user in a single row.
If you want to sort by receipt time, use the receipt time as the column key, and the emails will automatically be sorted by time.
If you want to sort by subject, add an index column family and use the subject as the column key.
For example, there are two column families in total.
Data family,
12345: Data: 725aae5f-d72e-f90f3f070419: 1307099848: "Welcome, and ..."
12345: Data: cc6775b3-f249-c6dd2b1a7467: 1307101848: "to whom it ..."
12345: Data: dcbee495-6d5e-6ed48124632c: 1307103848: "Hi, how are ..."
From index family,
12345: Index: idx-from-asc-paul@foobar.com: 1307103848: dcbee495-6d5e...
12345: Index: idx-from-asc-pete@foobar.com: 1307097848: 5fc38314-e290...
12345: Index: idx-from-asc-sales@ignore.me: 1307101848: cc6775b3-f249...
Subject index family,
12345: Index: idx-subject-desc-\xA8\x90\x8D\x93\x9B\xDE: 1307103848: dcbee495-6d5e-6ed48124632c
12345: Index: idx-subject-desc-\xB7\x9A\x93\x93\x90\xD3: 1307099848: 725aae5f-d72e-f90f3f070419
Here all the indexes are placed in one family and differentiated by a prefix; I think it would be clearer to give different indexes different column families.
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce, page 157, with diagrams
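A minimal sketch of maintaining such an in-row index together with the data (the names follow the inbox example above; whether you keep one index family with prefixes or one family per index is a design choice): because both cells live in the same row, the data and its index entry are updated atomically.

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class InRowIndex {
  static final byte[] DATA = Bytes.toBytes("data");
  static final byte[] INDEX = Bytes.toBytes("index");

  // One Put writes the email body and its "from" index entry into the same user row.
  static Put storeEmail(String userId, String emailId, String from, long receivedAt, String body) {
    return new Put(Bytes.toBytes(userId))
        .addColumn(DATA, Bytes.toBytes(emailId), Bytes.toBytes(body))
        .addColumn(INDEX, Bytes.toBytes("idx-from-asc-" + from), receivedAt, Bytes.toBytes(emailId));
  }
}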
Secondary Indexes
Although HBase has no native support for secondary indexes, there are use-cases that need them. The requirement is usually that you can look up a cell not just with the primary coordinates - the row key, column family name, and qualifier - but with an alternative one. In addition, you can scan a range of rows from the main table but ordered by the secondary index.
Similar to an index in RDBMSs, secondary indexes store a mapping between the new coordinates and the existing ones. Here is a list of possible solutions:
HBase does not support secondary indexes, but many scenarios require them: HBase only lets you look up data by row key, family, and qualifier, so for other lookups you need to implement secondary indexes yourself.
In fact, the in-row method described above can also serve as an index, but the problem is that the row becomes too large, so a more general solution is needed.
Client managed
Moving the responsibility completely into the application layer, this approach is typically a combination of a data table and one (or more) lookup/mapping tables.
Whenever the code writes into the data table it also updates the lookup tables. Reading data is then either a direct lookup in the main table or, in case the key is from a secondary index, a lookup of the main row key followed by retrieval of the data in a second operation.
There are advantages and disadvantages to this approach.
First, since the entire logic is handled in the client code, you have all the freedom to map the keys exactly the way needed.
The list of shortcomings is longer though: since you have no cross-row atomicity, for example in the form of transactions, you cannot guarantee consistency of the main and the lookup tables.
This solution puts data and index in two tables, which naturally cannot guarantee atomicity and can lead to inconsistency between data and index.
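A minimal sketch of a client-managed index (table and column names are invented): two separate Puts, so a crash between them can leave the index and the data out of sync, which is exactly the drawback described above.

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientManagedIndex {
  static final byte[] CF = Bytes.toBytes("d");

  // Writes the user row, then an index row keyed by email address pointing back
  // at the main row key. The two Puts are NOT atomic across rows/tables.
  static void saveUser(Table users, Table usersByEmail, String userId, String email) throws IOException {
    users.put(new Put(Bytes.toBytes(userId))
        .addColumn(CF, Bytes.toBytes("email"), Bytes.toBytes(email)));
    usersByEmail.put(new Put(Bytes.toBytes(email))
        .addColumn(CF, Bytes.toBytes("user"), Bytes.toBytes(userId)));
  }
}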
Indexed-Transactional HBase
A different solution is offered by the open-source Indexed-Transactional HBase (in short ITHBase) project.
The core extension is the addition of transactions, which are used to guarantee that all secondary index updates are consistent.
The drawback is that it may not support the latest version of HBase, as it is not tied to its release cycle. It also adds a considerable amount of synchronization overhead that results in decreased performance, so you need to benchmark carefully.
Indexed HBase
Another solution for adding secondary indexes to HBase is Indexed HBase (in short IHBase).
It forgoes separate tables for each index and instead maintains the indexes purely in memory.
These are generated when the region is opened for the first time, or when the memstore is flushed to disk, which involves a scan of the entire region to build the index.
Only the on-disk information is indexed; the in-memory data is searched as-is.
The disadvantage is obvious: it trades memory for time, so it is relatively memory-consuming.
Search Integration
A very common use-case is to combine the arbitrary nature of keys with a search based lookup, often backed by a full search engine integration.
Client managed
A prominent implementation of a client-managed solution is Facebook inbox search.
The schema is built roughly like this:
• Every row is a single inbox, i.e., every user has a single row in the search table,
• The columns are the terms indexed from the messages, the versions are the message IDs,
• The values contain additional information, such as the position of the term in the document.
The column key is the term, the versions are the message IDs, and the values hold additional information such as the positions of the term, so that a term search within a single inbox can be implemented.
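A minimal sketch along those lines (names invented; the cell timestamp carries the message id, as the bullets describe):

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class InboxSearchIndex {
  static final byte[] TERMS = Bytes.toBytes("terms");

  // One cell per (user, term, messageId): qualifier = term, version = message id,
  // value = positions of the term inside that message.
  static Put indexTerm(String userId, String term, long messageId, String positionsCsv) {
    return new Put(Bytes.toBytes(userId))
        .addColumn(TERMS, Bytes.toBytes(term), messageId, Bytes.toBytes(positionsCsv));
  }
}

For this scheme the search table's column family needs a high max-versions setting, so that one cell version per message is retained instead of being pruned.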
Lucene
Using Lucene - or a derived solution - separately from HBase involves building the index with a MapReduce job.
An externally hosted project [99] provides a BuildTableIndex class, which was formerly part of the contrib modules shipping with HBase.
It scans an entire table and builds the Lucene indexes, which ultimately end up as directories on HDFS; their count depends on the number of reducers used. These indexes can be downloaded to a Lucene-based server and accessed locally using, for example, a MultiSearcher class provided by Lucene.
This simply means converting the data into Lucene index files with MapReduce; you can then use Lucene to find the rowkeys and retrieve the data from HBase.
For efficiency, you can also store the required data directly in the index files when building the index, so that you do not need to go back to HBase at all.
HBasene
The approach chosen by HBasene is to build the entire search index directly inside HBase, while exposing the well-established Lucene API to its users. The schema stores each document field, aka term, in a separate row, with the documents containing the term stored as columns inside that row.
This is similar to the client-managed solution: a full-text search implemented on top of HBase. The nice part is that it exposes the well-established Lucene API, so to the user it looks like plain Lucene.
Transactions
Transactions offer ACID-compliant operations across more than one row and more than one table. This is necessary in lieu of a matching schema pattern in HBase; for example, updating the main data table and a secondary index table requires transactions to be reliably consistent.
Often transactions are not needed, as normalized data schemas can be folded into a single table and row design that does not need the overhead of distributed transaction support. If you cannot do without this extra control, here are a few possible solutions:
For HBase, the best "transaction" solution is to put the related data into a single row.
If you still need cross-row transactions, consider the following solutions.
Transactional HBase
The Indexed-Transactional HBase project comes with a set of extended classes that replace the default client- and server-side ones, while adding support for transactions across row and table boundaries.
ZooKeeper
HBase requires a ZooKeeper ensemble to be present, acting as the seed, or bootstrap mechanism, for cluster setup.
There are templates, or recipes, available that show how ZooKeeper can also be used as a transaction control back-end.
For example, the Cages project offers an abstraction to implement locks across multiple resources, and is scheduled to add a specialized transactions class, using ZooKeeper as the distributed coordination system.
Bloom Filters
To put it simply, HBase's files (HFiles) are organized by column family, and one column family may be stored across many HFiles. If I want to retrieve a row's data for a column family, the naive method is to check every HFile, which is very inefficient. A Bloom filter can provide a rowkey filter for each HFile: before opening an HFile we can check whether the file may contain the row at all. This greatly reduces the number of opened files and greatly improves efficiency.
The book also mentions a row+column Bloom filter, which is only useful when the column family of a row is large and its data is regularly spread across multiple HFiles.
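A minimal sketch of enabling a row-level Bloom filter on a column family with the HBase 2.x admin API (table and family names are illustrative; in recent HBase versions a ROW Bloom filter is already the default):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;

public class BloomFilterSetup {
  static void createTable(Connection conn) throws IOException {
    try (Admin admin = conn.getAdmin()) {
      ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
          .newBuilder(Bytes.toBytes("data"))
          .setBloomFilterType(BloomType.ROW) // per-HFile rowkey filter; ROWCOL for row+column
          .build();
      admin.createTable(TableDescriptorBuilder
          .newBuilder(TableName.valueOf("inbox"))
          .setColumnFamily(cf)
          .build());
    }
  }
}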
Versioning
Implicit Versioning
It was pointed out before that you should ensure that the clocks on your servers are synchronized. One of the issues is that when you store data in multiple rows across different servers, using the implicit timestamps, you end up with completely different times being set.
HBase uses the server time as the version timestamp by default, but the problem is that the clocks of the servers may not be synchronized, which can cause problems.
This can be avoided by setting an agreed, or shared, timestamp when storing these values.
The put operation allows you to set a client-side timestamp that is used instead, therefore overriding the server time.
Obviously, the better approach is to rely on the servers doing this work for you, but you might be required to use this approach in some circumstances.
Of course, it is best to keep the servers' clocks synchronized automatically. If that is not possible, we can specify a client-side timestamp in the put operation to avoid the synchronization issue.
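A minimal sketch of overriding the server timestamp with a client-supplied one (the row, family, and column names are invented):

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientTimestampPut {
  // The explicit timestamp replaces the region server's clock for this cell,
  // so writes from differently-skewed servers still agree on the version order.
  static Put withAgreedTimestamp(String rowKey, long agreedTimestamp, String value) {
    return new Put(Bytes.toBytes(rowKey))
        .addColumn(Bytes.toBytes("d"), Bytes.toBytes("v"), agreedTimestamp, Bytes.toBytes(value));
  }
}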
Another issue with servers not being aligned by time is exposed by region splits.
Assume you have saved a value on a server that is one hour ahead of all other servers in the cluster, using the implicit timestamp of the server. Ten minutes later the region is split and the half with your update is moved to another server. Five minutes later you insert a new value for the same column, again using the automatic server time. The new value is now considered older than the initial one, because the first version has a timestamp one hour ahead of the current server's time. If you do a standard get call to retrieve the newest version of the value, you would get the one that was stored first.
This is another issue caused by unsynchronized server clocks, exposed by region splits:
a region on a server with a fast clock is moved to a slow server after the split, and new data is then written with the slow server's timestamp, which scrambles the update order.
Custom Versioning
Since you can specify your own timestamp values - and therefore create your own versioning scheme - while overriding the server-side timestamp generation based on the synchronized server time, you are free to not use epoch-based versions at all.
For example, you could use the timestamp with a global number generator [104] that supplies you with ever-increasing, sequential numbers starting at 1. Every time you insert a new value you retrieve a new number and use that when calling the put function.
You must do this for every put operation, or the server will insert an epoch-based timestamp instead. There is no flag in the table or column descriptors that indicates your use of custom timestamp values, or in other words your own versioning; if you fail to set the value, it is silently replaced with the server timestamp.
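A minimal sketch of custom versioning driven by a sequence generator (here a counter cell in HBase itself serves as the generator; that is an illustrative choice, not the book's prescription):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CustomVersioning {
  static final byte[] CF = Bytes.toBytes("d");

  // Atomically draws the next sequence number and uses it as the cell version.
  // Every put MUST go through this path, or the server clock sneaks back in.
  static void putWithSequence(Table counters, Table data, String rowKey, String value) throws IOException {
    long seq = counters.incrementColumnValue(
        Bytes.toBytes("global"), CF, Bytes.toBytes("seq"), 1L);
    data.put(new Put(Bytes.toBytes(rowKey))
        .addColumn(CF, Bytes.toBytes("v"), seq, Bytes.toBytes(value)));
  }
}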