HBase Overview: Big Data and the Origins of NoSQL
Traditional relational database processing rests on comprehensive ACID guarantees, the SQL-92 standard's table design patterns (normal forms) and data types, and DML data interaction through the SQL language. For a long time, information systems built on relational databases developed well, but they are constrained by the data model a relational database provides: for data sets without a pre-defined schema, relational databases cannot work well. More and more business systems must accommodate varied data formats and data sources without up-front normalized schema definitions. The data is often unstructured or semi-structured (for example, the logs of users accessing a web site), and its volume is several orders of magnitude beyond what traditional relational databases handle (usually terabyte to petabyte scale). Traditional relational databases can be scaled to a certain extent (for example, Oracle RAC and IBM PureScale), but this usually means high software licensing fees and complex application logic.
Driven by these sweeping changes in system requirements, data-technology pioneers had to rethink the database, and the dawn of big-data-oriented NoSQL arrived. Big data and NoSQL were used first at Google, Facebook, and other Internet companies, followed by the financial and telecom industries, and many open-source big data projects in the Hadoop and NoSQL ecosystems have sprung up and been adopted by Internet-scale companies to handle massive volumes of unstructured data. Some projects focus on fast key-value storage; some focus on built-in data structures or document-based abstractions; some NoSQL frameworks sacrifice immediate data persistence for performance and do not support strict ACID; and some open-source frameworks even avoid writing data to disk at all in pursuit of performance.
HBase is a remarkable member of the NoSQL family. HBase provides a key-value API and promises strong consistency, so clients can see data as soon as it is written. HBase relies on Hadoop's underlying distributed storage mechanism, so it can run on clusters of many nodes transparently to client code, making it easy for developers to design and build big data projects on HBase. HBase is designed to handle terabytes to petabytes of data and is optimized for this kind of massive data volume and highly concurrent access. As part of the Hadoop ecosystem, it relies on important features provided by other Hadoop components, such as DataNode data redundancy and MapReduce batch processing.
Introduction to the HBase Architecture and Framework
This section briefly describes the architecture and framework of HBase, a database designed specifically for semi-structured data and horizontal scalability. It stores data in tables organized by a four-dimensional coordinate system: row key, column family, column qualifier, and time version. HBase is a schemaless database that requires only the column families to be defined in advance; column qualifiers need not be specified. It is also an untyped database: all data is stored as binary bytes. There are five basic operations for manipulating and accessing HBase: Get, Put, Delete, Scan, and Increment. The only way to query HBase on something other than the row key is a Scan with filters.
Because it must serve petabyte- and terabyte-scale storage, massive tables of billions of rows, and extreme query performance under high concurrency, HBase is designed as a fully distributed storage cluster. Physically it relies on Hadoop HDFS for storage and on Hadoop's MapReduce compute framework to support high-throughput data access, availability, and reliability, as shown in the overall architecture:
Figure 1. HBase overall architecture diagram
From the figure we can see the constituent parts of HBase. Each table in HBase is divided by row key ranges into multiple sub-tables (HRegions); by default, an HRegion that grows beyond 256 MB is split into two. HRegions are managed by HRegionServers, and the assignment of HRegions to HRegionServers is handled by the HMaster.
When a sub-table is accessed, the HRegionServer creates an HRegion object and, for each column family of the table, a Store instance. Each Store has zero or more StoreFiles, and each StoreFile corresponds to one HFile, the actual storage file. As a result, an HRegion has as many Stores as the table has column families. In addition, each Store has a MemStore in-memory cache instance.
HBase stores its data on the Hadoop HDFS distributed file system. All HBase data files live on HDFS in two main formats:

(1) HFile: the storage format for KeyValue data in HBase. HFile is a Hadoop binary-format file; a StoreFile is in fact a lightweight wrapper around an HFile, so underneath every StoreFile is an HFile.

(2) HLog File: the storage format of HBase's WAL (Write-Ahead Log), which is physically a Hadoop SequenceFile.
HBase is a column-oriented distributed storage system modeled on Google's Bigtable, and its storage design follows the memtable/SSTable pattern. Storage is divided into two main parts: the in-memory MemStore (the memtable) and the HFiles on HDFS (the SSTables). There is also the WAL log storage, implemented primarily by the HLog class.

(3) MemStore: the MemStore is an in-memory map holding key/value pairs. When a MemStore fills up (64 MB by default), it is flushed to disk (that is, to Hadoop HDFS).
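For reference, the flush threshold is controlled by the `hbase.hregion.memstore.flush.size` property in hbase-site.xml. The snippet below is an illustrative sketch only; the default value differs between HBase versions (64 MB in older releases, 128 MB in later ones), so verify the default for your version before changing it:

```xml
<!-- hbase-site.xml: illustrative setting, not a recommended value.
     Flush a MemStore to an HFile once it reaches 64 MB (64 * 1024 * 1024 bytes). -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>67108864</value>
</property>
```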
To help the reader understand, here is a brief look at the HFile data structure in more detail. HFile is based on Hadoop's TFile file type, and its structure is shown below:
Figure 2. HFile structure diagram
As shown, the length of an HFile is variable; only the FileInfo and Trailer sections have fixed lengths. The Trailer holds pointers to the starting points of the other blocks, and the Index blocks record the starting offsets of each Data block and Meta block. Both Data blocks and Meta blocks are optional, but most HFiles contain Data blocks.
HLog stores HBase's log files. As in a traditional relational database, to guarantee read consistency and support undo/redo-style data recovery, HBase performs a write-ahead-log (WAL) operation before writing data. Each HRegionServer corresponds to one HLog instance, and an HRegion receives the HRegionServer's HLog as a constructor argument when it is initialized.

An HLog file is a SequenceFile that can only be appended to at the end. Apart from the file header, an HLog file is composed of a series of HLog.Entry records. The Entry is the basic component of an HLog and also its fundamental unit of reading and writing.
Readers interested in further details of the HBase architecture can consult the official HBase web site for the overall architecture and the underlying physical storage mechanisms.
Time Complexity of HBase Retrieval
Since the purpose of HBase is to access large volumes of unstructured data efficiently, reliably, and with high concurrency, the time complexity of HBase data retrieval matters greatly to the design of business systems built on HBase. This section briefly analyzes, from the mathematical standpoint of algorithmic complexity, how fast HBase operations are, so that the reader can follow the modeling and design-pattern considerations in the project example that follows.
First, define the following variables for an HBase table:

- n = number of KeyValue entries in the table (including Put results and the tombstones left by Deletes)
- b = number of blocks in an HFile
- e = average number of KeyValue entries in an HFile (computable if you know the row size)
- c = average number of columns per row
HBase has two special tables: -ROOT- and .META. The .META. table records region partition information, and .META. itself can be split across more than one region. The -ROOT- table records the region information of the .META. table, but -ROOT- has only one region, and the location of the -ROOT- table is recorded by the HBase cluster's coordination framework, ZooKeeper.

The details of the -ROOT- and .META. tables are not described further here; interested readers can read up on HBase's -ROOT- and .META. tables to understand HBase I/O and the timing of data retrieval.
The process by which HBase retrieves a piece of data is shown below.

Figure 3. HBase retrieval process
As the figure shows, for HBase to retrieve one piece of data the processing is roughly as follows:
(1) If you do not know the row key and search directly for a column's key-value, you must search the entire region or even the entire table, and the time complexity is O(n). This is the most expensive kind of operation and is usually unacceptable to client programs. We therefore mainly analyze the time complexity of row-key-based lookups, which is steps 2 through 4 below.

(2) The client locates the correct RegionServer and region. Three fixed lookups find the correct region: querying ZooKeeper, reading the -ROOT- table, and reading the .META. table. This is an O(1) operation.

(3) Within the specified region, the row being read may live in two places: if it has not yet been flushed to disk, it is in the MemStore; if it has been flushed, it is in an HFile. Assuming there is only one HFile, the row is either in that HFile or in the MemStore.

(4) For the MemStore case, the cost is fairly fixed at O(log e). For the HFile case, the analysis is more involved: finding the correct data block inside the HFile is an O(log b) operation, and then locating the row and scanning through the column family to the target KeyValue object is a linear scan (columns of the same column family are usually in the same data block), which costs O(e/b). If the column family's data does not all fit in one data block, several contiguous blocks must be read, costing O(c). The overall cost is therefore the larger of the two possibilities: O(max(c, e/b)).
In summary, the time cost of finding one row in HBase is:

- O(1) to locate the region
- + O(log e) to locate the KeyValue within the region if it is still in the MemStore
- + O(log b) to find the correct data block within the HFile
- + O(max(c, e/b)) to scan to the target KeyValue within the HFile
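To make these terms concrete, the sketch below plugs illustrative numbers into the cost model above. The values (b = 1,000 blocks, e = 1,000,000 KeyValue entries, c = 50 columns) are assumptions for demonstration, not measurements from any real cluster:

```java
// Illustrative arithmetic for the HBase row-lookup cost model described above.
// All numbers are assumptions for demonstration, not measurements.
public class HBaseLookupCost {
    // O(log e): cost of locating a KeyValue that is still in the MemStore
    static double memstoreCost(long e) {
        return Math.log(e) / Math.log(2);
    }

    // O(log b): cost of locating the right data block inside an HFile
    static double blockIndexCost(long b) {
        return Math.log(b) / Math.log(2);
    }

    // O(max(c, e/b)): linear scan inside the block(s) to reach the column
    static long inBlockScanCost(long c, long e, long b) {
        return Math.max(c, e / b);
    }

    public static void main(String[] args) {
        long e = 1_000_000, b = 1_000, c = 50;
        System.out.printf("log2(e) = %.1f, log2(b) = %.1f, max(c, e/b) = %d%n",
                memstoreCost(e), blockIndexCost(b), inBlockScanCost(c, e, b));
    }
}
```

With these numbers, e/b = 1,000 dominates c = 50, so the linear in-block scan is the largest term of the lookup cost; row key designs that keep rows narrow shrink exactly this term.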
Readers unfamiliar with notation such as O(1) and log can consult the algorithm references listed in Resources for the relevant mathematical notation and the formulas of computational time complexity.
HBase Design in Practice
From the overall HBase architecture and the retrieval time-complexity analysis above, we can see that the design of the row key, the column families, and the data layout determines the overall performance of HBase and the efficiency of queries. Many projects and engineers using HBase can skillfully use the HBase shell or the SDK API to access HBase, perform DDL such as table creation and deletion, and run Put/Delete/Scan operations, yet never dig into the key design questions: how many column families a table needs, how many columns each column family should have, what data should go into the column names, and what data should be stored in the cells.

Designing and developing a system on HBase requires considering factors that differ from those of a relational database. The HBase schema itself is very simple but gives you a lot of room to adjust: some schemas have good write performance but read poorly, or just the opposite. Analogous to modeling in a traditional database, when considering an HBase design pattern in a real-world project we need to start with the following elements:
- How many column families the table should have
- What data goes in each column family
- How many columns each column family should have
- What the column names should be; although column names need not be defined when the table is created, they are required when data is read or written
- What data the cells should hold
- How many time versions each cell should store
- What the row key structure should be and what information it should include
The following real customer case using HBase illustrates HBase design patterns in practice. Through different table designs, it shows how the schema affects the table structure, how the table is read and written, and how client-side query performance is affected.
Customer Scenario Introduction
Customer profile: the customer is an Internet mobile-game platform that needs statistical analysis of its large player base. It must store each player's attention to each mobile-game product (game heat), along with the time dimension of that attention, in order to mine player preferences and run precision marketing such as game recommendations, push notifications, and advertising, thereby growing the platform's user base and improving user stickiness.

The platform carries many product categories, more than 500 in total; registered players (user accounts) number 2 million; concurrent online players exceed 50,000; daily game usage peaks above 100,000 sessions; and the annual growth rate exceeds 10%.

Given these requirements: the set of mobile-game products grows dynamically, so it cannot be determined in advance which products need to be stored, and storing a column for every product would mean hundreds of columns and a great deal of wasted space. Players use games of varying categories and frequencies every day; with millions of registered users, the daily heat data exceeds 10 million rows. In a relational database, such volumes would require extensive clustering and SQL optimization for table queries and business analysis, with poor efficiency, so a traditional relational database does not suit this kind of analytical workload. In this project we decided to use HBase for the data storage and analysis layer.
Wide Table Design
Returning to the design elements above for this customer case: we need to store player information, usually a phone number, QQ number, or account registered on the mobile platform; we need to store which mobile-game products each user pays attention to; and a user may play one or more games each day, each once or several times. A user's attention to a game (number of uses) is thus a dynamic per-day value, and the user-to-game relationship is a many-to-many set of key-value pairs. The platform vendor cares about business questions such as "Which games does customer XXX follow?" and "Which users follow game YYY?"

Suppose each player's daily attention to each product is stored in the table. One possible design is one row per user per day: the user ID plus the day's timestamp forms the row key, and a column family holds the game-usage information, with each column representing the user's usage of one product that day.

In this case we design only a single column family. A column family's data within a region is served by one Store, under which there may be multiple HFiles on HDFS; a column family keeps all of its columns together on disk. This property allows different types of column data to be placed in different column families for isolation, and it is also why HBase is called column-oriented storage. In this table, because the game products have no clear classification and the table's access pattern does not need to separate product types, there is no need for multiple column families. One thing to be aware of: once a table is created, any change to the table's column families usually requires taking the table offline first.
We can create the table using the HBase shell or the HBase SDK API. An HBase shell example follows:
Listing 1. HBase shell script example
```shell
$ hbase shell
Version 0.92.0, r1231986

hbase(main):001:0> create 'prodfocus', 'degreeinfo'
0 row(s) in 0.1200 seconds

hbase(main):008:0> describe 'prodfocus'
DESCRIPTION                                                          ENABLED
 'prodfocus', {NAME => 'degreeinfo', DATA_BLOCK_ENCODING => 'NONE',  true
 BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1',
 COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.0740 seconds
```
The table now contains sample data, as shown below.
Table 1. prodfocus table example

| Rowkey: user ID $ date | degreeinfo (column family: game heat info) |
|---|---|
| qq121102645$20141216 | degreeinfo:3DARPG:6, degreeinfo:DTLegend:1 |
| weixin_295y603765de8$20140928 | degreeinfo:DTLegend:3 |
| chaochenyy$20141109 | degreeinfo:3CountryBattle:1, degreeinfo:ForgetXian:1 |
| qq5863976645$20141214 | degreeinfo:Frus3D:2 |
| hexaoyang$20140907 | degreeinfo:SpaceHunter:1, degreeinfo:3CountryBattle:2, degreeinfo:Frus3D:1 |
| fengke_tony$20150216 | degreeinfo:DTLegend:1 |
| junping_jeff$20141204 | degreeinfo:Frus3D:2 |
| xiaofenxia$20150716 | degreeinfo:ForgetXian:3 |
The table design is explained as follows:
The row key qq121102645$20141216 denotes the December 16, 2014 play record of the player whose platform account is the federated QQ login QQ121102645. The degreeinfo column family records that account's click heat (number of plays) for each product that day; for example, degreeinfo:SpaceHunter:1 means the user played (or opened) Space Hunter once that day.
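As a sketch of this row key convention, the user ID, a "$" separator, and the day's yyyyMMdd stamp can be built and split with a small helper. The class and method names here are illustrative, not taken from the project's actual code:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Illustrative helper for the wide-table row key: userId + "$" + yyyyMMdd.
public class WideRowKey {
    // Build the row key for a given user and day.
    static String rowKey(String userId, Date day) {
        return userId + "$" + new SimpleDateFormat("yyyyMMdd").format(day);
    }

    // Split a row key back into its user-ID and date parts.
    static String[] parse(String rowKey) {
        return rowKey.split("\\$", 2);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("qq121102645", new Date()));
    }
}
```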
Now we need to verify that the table satisfies the requirements. The most important step is to define the access patterns, that is, how the application will access the data in the HBase table; this should be done as early as possible in the HBase design and development process.

Let us check whether the table can answer the customer's questions, for example: "Which games does the user with account QQ121102645 follow?" Thinking further along this line, there are related business-analysis questions: "Has user QQ121102645 played 3CountryBattle?", "Which users follow DTLegend?", and "Has 3CountryBattle been followed at all?"

With the current prodfocus design, answering "Which games does the user with account QQ121102645 follow?" is a simple Scan on the table, which returns all rows prefixed with QQ121102645; traversing the columns of each row yields the list of games the user follows.
The code example is as follows:
Listing 2. Client query of a user's followed-games list
```java
static {
    Configuration HBASE_CONFIG = new Configuration();
    HBASE_CONFIG.set("hbase.zookeeper.quorum", "192.168.2.6");
    HBASE_CONFIG.set("hbase.zookeeper.property.clientPort", "2181");
    cfg = new HBaseConfiguration(HBASE_CONFIG);
}

HTablePool pool = new HTablePool();
HTableInterface prodTable = pool.getTable("prodfocus");
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("degreeinfo"));
scan.setStartRow(Bytes.toBytes("qq121102645"));
ResultScanner results = prodTable.getScanner(scan);
List<String> followedGames = new ArrayList<String>();
for (Result r : results) {
    for (KeyValue kv : r.list()) {
        // the column qualifier holds the game name
        followedGames.add(Bytes.toString(kv.getQualifier()));
    }
}
prodTable.close();
```
Code explanation: first the configuration sets the HBase ZooKeeper quorum host and client port, and an HTableInterface instance connects to the prodfocus table. Because the prodfocus row key is designed as user ID + "$" + the day's timestamp, we create a Scan that uses the user "qq121102645" as the search prefix. The ResultScanner returns all row data for that user; traversing each column of the degreeinfo column family in every row yields all the games the user has followed (played).

The HBase API calls are not detailed further here; interested readers can consult the HBase SDK to familiarize themselves with table creation and the Put, Scan, and Delete code.
The second question, "Has user QQ121102645 played 3CountryBattle?", is similar to the first: the client code can Scan all rows prefixed with QQ121102645, collect the returned results, and traverse them to check whether 3CountryBattle appears as a column name, which tells us whether the user follows that game. The code is similar to that of question 1 above:
Listing 3. Client check of whether a user follows a given game
```java
HTablePool pool = new HTablePool();
HTableInterface prodTable = pool.getTable("prodfocus");
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("degreeinfo"));
scan.setStartRow(Bytes.toBytes("qq121102645"));
ResultScanner results = prodTable.getScanner(scan);
String gameNm = "3CountryBattle";
for (Result r : results) {
    for (KeyValue kv : r.list()) {
        // compare the column qualifier (game name) with the target game
        if (gameNm.equals(Bytes.toString(kv.getQualifier()))) {
            prodTable.close();
            return true;
        }
    }
}
prodTable.close();
return false;
```
Code explanation: again a Scan with the prefix "qq121102645" performs the table retrieval. Each KeyValue in the returned results is one column of the degreeinfo column family, that is, one game the user follows (has played); checking whether its qualifier contains the game name "3CountryBattle" tells us whether the user follows that game.
This table design looks simple and practical, but consider the third and fourth business questions: "Which users follow DTLegend?" and "Has 3CountryBattle been followed at all?"

As you can see, the existing design stores each game in its own column, so as a user's product preferences diversify (there can be many product key-value pairs), the row in the table's column family grows wider and wider. That is not a big problem in itself, but it affects the client's read pattern and complicates the client application code.

Moreover, for the third and fourth questions, with every additional game key-value the client code must read out entire user rows and then traverse every column of every row. From the HBase indexing principles and internal retrieval mechanism described above, we know the row key is the determinant of all HBase indexing: without a known row key, the scan cannot be limited to a few HFile data blocks, and worse, if the data has not yet been read from HDFS into the block cache, reading HFiles from disk is expensive. From the retrieval time-complexity analysis above, this design must examine every column in the region, at a cost of roughly (number of columns) × O(max(c, e/b)), which is theoretically the most expensive retrieval pattern.

The third and fourth questions are exactly the ones where the customer cares most about real-time analysis performance, so the design pattern should be revised to improve HBase's retrieval efficiency and reduce the overhead of reading wide rows.
Tall Table Design
The simplicity and flexibility of HBase schema design allow optimizations that can greatly simplify client code and significantly improve retrieval performance without much work. Let us look at another design pattern for the prodfocus table. The previous design is a wide table (wide-table) pattern: one row holds many columns, each column representing the heat of one game. The same information can instead be stored as a tall table (tall-table). The new tall-table structure of the product-attention table is shown in Table 2.
Table 2. prodFocusV2 table example

| Rowkey: followed product $ user | cf (column family: date-stamped attention data) |
|---|---|
| 3DARPG$qq121102645 | 20141224:6 |
| DTLegend$qq121102645 | 20141216:1 |
| DTLegend$weixin_295y603765de8 | 20141212:3 |
| 3CountryBattle$chaochenyy | 20141214:2 |
| Frus3D$qq5863976645 | 20150906:2 |
| SpaceHunter$hexaoyang | 20140907:1 |
| 3CountryBattle$hexaoyang | 20140907:2 |
| Frus3D$hexaoyang | 20140907:1 |
| DTLegend$fengke_tony | 20150216:1 |
| Frus3D$junping_jeff | 20141204:2 |
| ForgetXian$xiaofenxia | 20150716:3 |
Table explanation: the relationship "product followed by a user on a given day" is designed into the row key, and the attention data is stored in a single key-value. The row key concatenates two values, the product name and the user's account; thus what the original design stored as one user's daily row is converted into "product followed by user" relationships, a typical tall-table design.
KeyValue objects in an HFile store the column family name, so using short column family names helps reduce disk and network I/O; this optimization can also be applied to row keys, column names, and even cells. Compact row keys that carry the business data mean a much lighter I/O load at retrieval time. With this new design, the earlier business questions "Which users follow product XXXX?" and "Has product XXXX been followed?" can be answered directly with a Get(); there is only one cell in the column family per row, so the multiple-KeyValue traversal of the first design disappears, and accessing a narrow row resident in the BlockCache is the fastest read HBase can perform. From an I/O perspective, scanning these rows versus executing a Get on a wide row and then traversing all its cells reads the same amount of data from the RegionServer, but the index access efficiency increases significantly.
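For questions like "Which users follow DTLegend?", the client scans all rows whose key begins with the product-name prefix, e.g. DTLegend$. A common way to bound such a prefix scan is to use the prefix as the start row and the prefix with its last byte incremented as the exclusive stop row. The sketch below shows that byte arithmetic, independent of any HBase client API version; the class name is illustrative:

```java
import java.util.Arrays;

// Compute the exclusive stop row for a prefix scan: the prefix with its
// last non-0xFF byte incremented, so [startRow, stopRow) covers exactly
// the rows that begin with the prefix.
public class PrefixScanBounds {
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                // drop any trailing 0xFF bytes that could not be incremented
                return Arrays.copyOf(stop, i + 1);
            }
        }
        // prefix is all 0xFF bytes: no upper bound, scan to end of table
        return new byte[0];
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("DTLegend$".getBytes());
        System.out.println(new String(stop)); // '$' + 1 == '%'
    }
}
```

The resulting start/stop pair would then be handed to the Scan via setStartRow and setStopRow.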
For example, to answer "Does user QQ121102645 follow 3CountryBattle?", the client code example is as follows:
Listing 4. Client check of whether a user follows a specific game
```java
HTablePool pool = new HTablePool();
HTableInterface prodTable = pool.getTable("prodFocusV2");
String userNm = "qq121102645";
String gameNm = "3CountryBattle";
// the tall-table row key is "product$user"
Get g = new Get(Bytes.toBytes(gameNm + "$" + userNm));
g.addFamily(Bytes.toBytes("cf"));
Result r = prodTable.get(g);
prodTable.close();
if (!r.isEmpty()) {
    return true;
}
return false;
```
Code explanation: because prodFocusV2's row key was changed to the tall-table pattern "followed product $ user ID", the game and user information is stored directly in the row key. The code therefore uses the bytes of the game name "3CountryBattle" + "$" + the user account "qq121102645" as the Get key and performs a Get directly on the table; whether the returned result set is empty tells us whether the user follows that game.

We ran stress tests of concurrent access against the two HBase table designs. Under data volumes from millions to tens of millions of rows, using the wide-table and tall-table designs respectively, the response times for the query "users who follow 3CountryBattle" are shown in the table below:
Table 3. Tall table vs. wide table retrieval performance

| Data volume | Wide table design (prodFocusV1) | Tall table design (prodFocusV2) |
|---|---|---|
| 5 million rows | 0.237s | 0.079s |
| 10 million rows | 0.418s | 0.112s |
| 20 million rows | 0.83s | 0.283s |
As the results show, for the customer's product-attention queries the tall table outperforms the wide table by more than 50%. This demonstrates how row key and column family design in an HBase schema determines whether HBase's index retrieval is used successfully. Mastering HBase's data storage and internal retrieval mechanisms matters in large part because exploiting those mechanisms is the opportunity to improve performance.
Other Tuning considerations
There are, of course, other optimization techniques. You can use an MD5 hash as the row key, which gives you row keys of fixed length. Hash keys have further benefits: with MD5 you can drop the "$" delimiter, which brings two gains. First, row keys are uniform in length, helping you predict read and write performance better. Second, with no delimiter needed, the start and stop key values of a Scan become simpler to define. In this case, MD5 hashes based on product name + user can be used to set a Scan's start and stop rows (startRow and stopRow) to find the latest heat information for a game.
Using hash keys also helps distribute data more evenly across regions. In this case, if customer attention is evenly spread (that is, different customers play different games every day), data distribution is not a problem. But some customers may be naturally concentrated (that is, a user favors just one or two products, and all their daily heat lands on those products); then you can encounter loads that are not spread across the HBase cluster but concentrate on a particular hot region, which becomes a bottleneck for overall performance. If you compute the MD5 of the "product$user" value and use the result as the row key, you achieve an even distribution across all regions.
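A minimal sketch of deriving such a hashed row key with the JDK's MessageDigest follows. The class and method names are illustrative; note that a full MD5 digest is 32 hex characters, while the sample table that follows shows shortened 16-character keys:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: derive a fixed-length row key by hashing "product$user" with MD5.
public class HashedRowKey {
    static String md5Hex(String rowKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(rowKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is guaranteed to exist in every JDK
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // every key comes out 32 hex chars, regardless of name lengths
        System.out.println(md5Hex("DTLegend$qq121102645"));
    }
}
```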
An example of the prodFocusV2 table using MD5-hashed row keys follows:
Table 4. MD5 row key table example

| Rowkey: MD5(followed product $ user) | cf (column family: date-stamped attention data) |
|---|---|
| 3b2c85f43d6410d6 | 20141224:6 |
| 82c85c2cdf16dcee | 20141216:1 |
| 8480986fd88c1a39 | 20141212:3 |
| 3671c0efbe01ae88 | 20141214:2 |
| baf933cac7dd2814 | 20141109:2 |
| 65ae48cfaae57972 | 20140907:1 |
| 732106051f4a2ef8 | 20140907:1 |
| f3b59010d3f8fb2d | 20140907:2 |
| 402480df0adfbcf9 | 20150216:1 |
| 9171607fa5190507 | 20141204:2 |
| 296be556a86dd505 | 20150716:3 |
Conclusion
This article introduced the overall architecture and basic principles of HBase, a typical NoSQL platform on the Hadoop big data stack, and analyzed HBase table design patterns in light of HBase's physical model and retrieval mechanism. Using a real mobile-game customer case, it described techniques for matching HBase table design to customer access patterns and performance requirements, and provided a detailed best-practice reference through the code implementations and test comparisons of the different design patterns.