A detailed description of HBase, a simple database in Hadoop

Source: Internet
Author: User

Data Model

HBase uses a data model very similar to Bigtable's. A table stores many data rows, and each row consists of a sortable row key and an arbitrary number of columns. The table is sparse, so rows in the same table can have widely varying columns if the user wishes.

Write operations are row-locked; you cannot lock multiple rows at once. All writes to a row are atomic by default.

Every update carries a timestamp. For each cell, HBase keeps only a specified number of the most recent versions. A client can ask for the most recent value as of a certain time, or fetch all available versions at once.
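As a rough illustration of the model described above, the following Python sketch keeps a sparse table whose cells retain only a fixed number of timestamped versions. The class name `ToyTable`, the `max_versions` parameter, and the sample keys and values are assumptions made for this example; it is a toy model, not HBase's API.

```python
from collections import defaultdict


class ToyTable:
    """Toy sparse table: row key -> column -> [(timestamp, value)], newest first."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # hypothetical per-cell version limit
        self.rows = defaultdict(dict)     # only columns that were written exist

    def put(self, row, column, timestamp, value):
        versions = self.rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]                   # drop the oldest extras

    def get(self, row, column, as_of=None):
        """Return the newest value, or the newest value at or before `as_of`."""
        for timestamp, value in self.rows.get(row, {}).get(column, []):
            if as_of is None or timestamp <= as_of:
                return value
        return None

    def get_all_versions(self, row, column):
        return list(self.rows.get(row, {}).get(column, []))


table = ToyTable(max_versions=2)
for ts in (1, 2, 3):
    table.put("row1", "info:a", ts, f"v{ts}")
table.put("row2", "info:b", 1, "other")  # a different column in another row
```

With `max_versions=2`, the version written at timestamp 1 is discarded once the third write arrives, so a query as of timestamp 1 finds nothing, while a query without a timestamp returns the newest value.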

Conceptual Model

A table is a set of rows. Each row has a row key (and an optional timestamp) and a number of columns, some of which may hold no data (the table is sparse). The following example illustrates this:

Physical Model

Conceptually, a table is a sparse row/column matrix, but physically it is stored by column family. This is an important design consideration.

The "conceptual" table above is physically stored as follows:

Note that no empty cells are stored. Therefore, a query for "contents:" at timestamp t8 returns null. Similarly, a query for the "anchor:" value "my.look.ca" at timestamp t9 also returns null.

However, if no timestamp is specified, the latest value of the specified column is returned. The latest value is also the first one found, because entries are sorted by time in descending order. Therefore, querying "contents:" without a timestamp returns the data from time t6, and querying "anchor:my.look.ca" without a timestamp returns the data from time t8.
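The lookup behaviour described here can be sketched in Python as follows. The row key `"row"`, the qualifier `"other.site"`, and the cell values are placeholders invented for this example, and the integers 6, 8 and 9 merely stand in for t6, t8 and t9. The sketch assumes that a lookup with a timestamp matches only a cell written at exactly that time, which is how the examples in the text behave.

```python
# Column-family store: family -> list of (row, timestamp, qualifier, value),
# kept newest first. Only cells that were actually written are present.
T6, T8, T9 = 6, 8, 9  # stand-ins for the timestamps t6, t8, t9

stores = {
    "contents:": [("row", T6, "", "<page body>")],
    "anchor:": [
        ("row", T9, "other.site", "link text A"),
        ("row", T8, "my.look.ca", "link text B"),
    ],
}


def lookup(family, qualifier, row, timestamp=None):
    """Return the stored value, or None when no cell exists.

    With a timestamp, only a cell written at exactly that time matches;
    without one, the first (newest) matching cell wins.
    """
    for cell_row, cell_ts, cell_qualifier, value in stores[family]:
        if cell_row != row or cell_qualifier != qualifier:
            continue
        if timestamp is None or cell_ts == timestamp:
            return value
    return None
```

Because empty cells are never stored, `lookup("contents:", "", "row", T8)` finds no cell and returns `None`, while the same call without a timestamp returns the value written at t6.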

Example

To demonstrate how data is stored on a disk, consider the following example:

The program first writes rows "[0-9]" with column "anchor:foo", then writes rows "[0-9]" with column "anchor:bar", and finally writes rows "[0-9]" with column "anchor:foo" again. After the memcache is flushed to disk and the store is compacted, the resulting file might look like this:

Note that the column "anchor:foo" is stored twice (with different timestamps) and that the newer timestamp is listed first, so the latest value is always found first.
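A small Python sketch can reproduce this ordering. It assumes that "[0-9]" means the ten rows "0" through "9", that timestamps are a simple counter, and that entries are sorted by row, then column, then newest timestamp first. The tuple layout is invented for this example and is not the real file format.

```python
# Each write is (row, column, timestamp). Timestamps are a simple counter here.
writes = []
clock = 0
for column in ("anchor:foo", "anchor:bar", "anchor:foo"):
    for row in "0123456789":          # rows "0" through "9"
        clock += 1
        writes.append((row, column, clock))

# Assumed on-disk order: by row, then column, then newest timestamp first.
store_file = sorted(writes, key=lambda w: (w[0], w[1], -w[2]))

for entry in store_file[:3]:
    print(entry)
```

For row "0" this lists "anchor:bar" once and "anchor:foo" twice, with the later "anchor:foo" write ahead of the earlier one.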

HRegion (Tablet) Server

To the user, a table is a set of data tuples sorted by row key. Physically, a table is divided into multiple HRegions (that is, sub-tables, or tablets). A sub-table is identified by its table name plus a first/last key pair. A sub-table with first and last keys <start> and <end> holds the rows in the range [<start>, <end>). The whole table is made up of its sub-tables, each stored in an appropriate place.

All physical data is stored on Hadoop DFS, and a set of sub-table servers serves it. Typically a machine runs only one sub-table server process, and at any given time a sub-table is managed by exactly one sub-table server.

When a client needs to update a table, it connects to the relevant sub-table server and commits the change to the sub-table. The committed data is added to the sub-table's HMemcache and to the sub-table server's HLog. HMemcache keeps recent updates in memory and acts as a cache. HLog is a log file on disk that records every update. The client's commit() call does not return until the update has been written to the HLog.

When serving a read, the sub-table checks HMemcache first. If the data is not there, it checks the HStores on disk. Each column family in a sub-table has one HStore, and an HStore consists of multiple HStoreFiles on disk. Each HStoreFile has a B-tree-like structure that allows fast lookups.
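The read path can be sketched in a few lines of Python. Plain dictionaries stand in for HMemcache and for each HStoreFile (a real store file is an indexed, sorted file on disk), and the function name `read` and the sample data are assumptions made for this example.

```python
def read(memcache, store_files, key):
    """Look in the in-memory cache first, then in store files, newest first."""
    if key in memcache:
        return memcache[key]
    for store_file in reversed(store_files):  # last flushed file is the newest
        if key in store_file:
            return store_file[key]
    return None


memcache = {"row1": "fresh"}
store_files = [{"row1": "old", "row2": "a"}, {"row2": "b"}]  # oldest -> newest
```

Here `read(memcache, store_files, "row1")` is answered from memory, while `"row2"` has to be looked up on "disk" and may touch more than one store file.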

HRegion.flushcache() is called periodically to write the contents of HMemcache to the HStore files on disk, which adds one new HStoreFile to each HStore. HMemcache is then emptied, and a special marker is appended to the HLog to indicate that HMemcache has been flushed.

At startup, each sub-table checks whether the HLog contains writes made after the last flushcache() call that have not been applied. If not, all of the sub-table's data is already in the HStore files on disk. If so, the sub-table reapplies the updates from the HLog, writes them into HMemcache, and then calls flushcache(). Finally, the sub-table deletes the HLog and starts serving data.
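The flush and recovery steps can be sketched as a toy Python class. `ToyRegion`, `FLUSH_MARKER` and the `recover` method are names invented for this example, a list stands in for the HLog, and a single column family is assumed, so each flush adds exactly one store file.

```python
FLUSH_MARKER = "FLUSHED"  # hypothetical marker record


class ToyRegion:
    def __init__(self, log=None, store_files=None):
        self.log = log if log is not None else []          # stands in for HLog
        self.store_files = store_files if store_files is not None else []
        self.memcache = {}                                  # stands in for HMemcache

    def put(self, key, value):
        self.log.append((key, value))   # logged first, then cached
        self.memcache[key] = value

    def flushcache(self):
        if self.memcache:
            self.store_files.append(dict(self.memcache))  # one new store file
        self.memcache.clear()
        self.log.append(FLUSH_MARKER)

    @classmethod
    def recover(cls, log, store_files):
        """Replay log entries written after the last flush marker, then flush."""
        region = cls(log=[], store_files=store_files)
        last_flush = max(
            (i for i, rec in enumerate(log) if rec == FLUSH_MARKER), default=-1
        )
        for key, value in log[last_flush + 1:]:
            region.memcache[key] = value
        if region.memcache:
            region.flushcache()
        region.log.clear()  # the old log is no longer needed
        return region


region = ToyRegion()
region.put("a", 1)
region.flushcache()
region.put("b", 2)          # not flushed before the simulated crash
recovered = ToyRegion.recover(region.log, region.store_files)
```

Only the write made after the last marker is replayed, so recovery time grows with the amount of unflushed data in the log.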

Therefore, the less often flushcache() is called, the less work is spent on flushing, but the more memory HMemcache occupies and the longer HLog takes to restore data at startup. If flushcache() is called more often, HMemcache uses less memory and HLog recovery is faster, but the cost of flushcache() itself has to be taken into account.

Each flushcache() call adds one HStoreFile to each HStore. Reading from an HStore may require accessing all of its HStoreFiles, which is time-consuming, so multiple HStoreFiles need to be merged into one periodically by calling HStore.compact().
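A minimal Python sketch of such a merge follows. It simplifies heavily: each store file is a dictionary, only the newest value per key survives, and the function name `compact` is reused here purely for illustration.

```python
def compact(store_files):
    """Merge store files (ordered oldest to newest) into a single file."""
    merged = {}
    for store_file in store_files:   # later (newer) files overwrite older values
        merged.update(store_file)
    return [dict(sorted(merged.items()))]


store_files = [{"b": 1, "a": 1}, {"a": 2}, {"c": 3}]
compacted = compact(store_files)
```

After the merge, a read needs to consult one file instead of three.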

Google's Bigtable paper describes major compactions and minor compactions. We have chosen to note only two things:

1. flushcache() writes all updates from memory to disk. With flushcache(), the log reconstruction time at startup can be reduced to zero. Each flushcache() call adds one HStoreFile to each HStore.

2. compact() merges all HStoreFiles into one.

Unlike Bigtable, Hadoop HBase reduces the interval between an update being "committed" and being "written to the log" to zero (that is, a committed update has always been written to the log). Allowing such an interval would not be hard to implement if it were really needed.

HRegion.closeAndMerge() merges two sub-tables into one. In the current version, both sub-tables must be "offline" before they can be merged.

When a sub-table grows beyond a specified size, the sub-table server calls HRegion.closeAndSplit() to split it into two new sub-tables. The new sub-tables are reported to the master, which decides which sub-table server takes over each one. Splitting is very fast, mainly because the new sub-tables only hold references to the old sub-table's HStoreFiles: one references the first half of each HStoreFile and the other references the second half. Once the references are created, the old sub-table is marked "offline". It persists until compactions in the new sub-tables have cleared all references to it, at which point it is deleted.
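The reason a split can be fast is easier to see in code. In the Python sketch below, each new sub-table holds only a reference to the parent's file plus a split key, and no data is copied. `HalfReference`, `split` and the choice of the middle key as split point are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HalfReference:
    """Hypothetical pointer to one half of a parent's store file."""
    parent_file: tuple      # sorted (key, value) pairs, never copied
    split_key: str
    top: bool               # True: keys >= split_key, False: keys < split_key

    def items(self):
        return [(k, v) for k, v in self.parent_file
                if (k >= self.split_key) == self.top]


def split(parent_file):
    """Return two references that together cover the parent's file."""
    split_key = parent_file[len(parent_file) // 2][0]   # middle key
    return (HalfReference(parent_file, split_key, top=False),
            HalfReference(parent_file, split_key, top=True))


parent = (("a", 1), ("b", 2), ("c", 3), ("d", 4))
lower, upper = split(parent)
```

Both halves still point at the same parent data, which is why the parent has to be kept until a later compaction rewrites each half into its own file.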

Summary:

1. The client accesses the data in the table.

2. The table is divided into many sub-tables.

3. Each sub-table is maintained by a sub-table server. A client connects to a sub-table server to access the rows within a sub-table's key range.

4. A sub-table consists of:

A. HMemcache: an in-memory buffer of the most recent updates.

B. HLog: a log of the most recent updates.

C. HStore: a group of efficient disk files. Each column family has one HStore.

Master server of HBase

Each sub-table server stays in contact with the single master server. The master server tells each sub-table server which sub-tables it should load and serve.

The master server continuously tracks which sub-table servers are alive. If the connection between the master server and a sub-table server times out:

A. The sub-table server "kills" itself and restarts in a blank state.

B. The master server assumes that the sub-table server is "dead" and reallocates its sub-tables to other sub-table servers.

Note that this differs from Google's Bigtable, where a tablet server can keep serving even after its connection to the master server is broken. We must tie the sub-table server to the master server because we do not have an additional lock management system as Bigtable does. In Bigtable, the master server allocates tablets, and the lock manager (Chubby) guarantees atomic access to tablets by tablet servers. HBase uses a single central point to manage all sub-table servers: the master server.

This is no worse than Bigtable. Both systems depend on a central network service (HMaster or Chubby), and the entire system can run only as long as that service is running. Chubby may have some particular advantages, but that is beyond the scope of HBase's current goals.

When a sub-table server checks in with a new master server, the master server tells it to load zero or more sub-tables. When a sub-table server dies, the master server marks its sub-tables as "unallocated" and then tries to hand them to other sub-table servers.
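The master's bookkeeping can be sketched as a toy Python class. `ToyMaster`, the `lease_seconds` timeout, the round-robin reassignment and the server and sub-table names are all assumptions made for this example; the text does not specify how the master picks a new server.

```python
class ToyMaster:
    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds   # hypothetical timeout
        self.last_seen = {}                  # server -> time of last report
        self.assignment = {}                 # sub-table -> server (None = unallocated)

    def report(self, server, now):
        self.last_seen[server] = now

    def assign(self, region, server):
        self.assignment[region] = server

    def expire_and_reassign(self, now):
        dead = {s for s, seen in self.last_seen.items()
                if now - seen > self.lease_seconds}
        for server in dead:
            del self.last_seen[server]
        live = sorted(self.last_seen)
        for region, server in self.assignment.items():
            if server in dead:
                self.assignment[region] = None          # mark "unallocated"
        unallocated = sorted(r for r, s in self.assignment.items() if s is None)
        for i, region in enumerate(unallocated):
            if live:
                self.assignment[region] = live[i % len(live)]


master = ToyMaster(lease_seconds=30)
master.report("rs1", now=0)
master.report("rs2", now=0)
master.assign("t,a", "rs1")
master.assign("t,m", "rs2")
master.report("rs2", now=40)        # rs1 never reports again
master.expire_and_reassign(now=45)
```

In this run `rs1` misses its lease, so its sub-table is marked unallocated and handed to `rs2`, the only live server.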

Each sub-table is identified by its table name and key range. Because key ranges are contiguous, and the first key of the first range and the last key of the last range are NULL, the first key alone is enough to identify a key range.

However, things are not quite that simple. Because of merge() and split(), we may (temporarily) have two entirely different sub-tables with the same name. If the system crashes at such an unlucky moment, both sub-tables may exist on disk at the same time, and the metadata is the arbiter of which one is "correct". To distinguish different versions of the same sub-table, we add a unique region Id to the sub-table name.

The final form of a sub-table identifier is therefore: table name + first key + region Id. In the following example the table name is hbaserepository, the first key is w-nk5YNZ8TBb2uWFIRJo7V==, and the region Id is 6890601455914043877, so the unique identifier is:

hbaserepository,w-nk5YNZ8TBb2uWFIRJo7V==,6890601455914043877
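Building such an identifier is a one-line operation. The Python sketch below assumes the three parts are joined by commas without spaces, and the helper name `region_name` is invented for this example.

```python
def region_name(table, first_key, region_id):
    """Join table name, first key and region Id with commas."""
    return f"{table},{first_key},{region_id}"


name = region_name("hbaserepository", "w-nk5YNZ8TBb2uWFIRJo7V==", 6890601455914043877)
print(name)
```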

Metadata table

The metadata table can itself grow and be split into multiple sub-tables. To locate each part of the metadata table, the metadata of all metadata sub-tables is stored in the ROOT table. The ROOT table always consists of a single sub-table.

At startup, the master server immediately scans the root sub-table (there is only one, so its name is hard-coded). It may have to wait for the root sub-table to be allocated to a sub-table server first.

Once the root sub-table is available, the master server scans it to obtain the locations of all metadata sub-tables and then scans the metadata table. Again, the master server may have to wait until all metadata sub-tables have been allocated to sub-table servers.

Finally, once the master server has scanned the metadata sub-tables, it knows the locations of all sub-tables and assigns them to sub-table servers.

The master server keeps the set of currently available sub-table servers in memory. There is no need to store this information on disk, because if the master server crashes, the entire system goes down with it.

Bigtable stores the mapping from tablets to tablet servers in Chubby, Google's distributed lock server. We store this information in the metadata table instead, because Hadoop has no equivalent of Chubby.

Consequently, the "info:" column family of each row in the metadata and root sub-tables contains three members:

1. info:regioninfo contains a serialized HRegionInfo object.

2. info:server contains the serialized string output of HServerAddress.toString(). This string can be passed to the HServerAddress constructor.

3. info:startcode is a serialized long integer generated when the sub-table server starts. The sub-table server sends this integer to the master server, which uses it to determine whether the information in the metadata and root tables is out of date.

As a result, a client that knows the location of the root sub-table does not need to contact the master server. The load on the master server is relatively light: it handles timed-out sub-table servers, scans the root and metadata sub-tables at startup, and serves the location of the root sub-table (it also balances load across sub-table servers).

The HBase client is fairly complex and often has to navigate both the root and metadata sub-tables to satisfy a user's request to scan a table. If a sub-table server is down, or a sub-table that should be on it is missing, the client can only wait and retry. During startup, or shortly after a sub-table server has crashed, the mapping from sub-tables to sub-table servers may be incorrect.
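The two-step lookup a client performs can be sketched in Python. The sketch assumes a single user table, represents each catalog sub-table as a sorted list of (first key, payload) pairs, and uses made-up server addresses; `find_entry` and `locate` are hypothetical helper names, not part of the HBase client.

```python
import bisect


def find_entry(entries, key):
    """entries: list of (first_key, payload) sorted by first_key; '' sorts first.

    Returns the payload of the last entry whose first_key <= key.
    """
    first_keys = [first_key for first_key, _ in entries]
    index = bisect.bisect_right(first_keys, key) - 1
    return entries[index][1]


# Hypothetical contents: the root sub-table maps key ranges to metadata
# sub-tables, and each metadata sub-table maps key ranges to servers.
meta_1 = [("", "server-a:60020"), ("g", "server-b:60020")]
meta_2 = [("n", "server-c:60020"), ("t", "server-a:60020")]
root = [("", meta_1), ("n", meta_2)]


def locate(row_key):
    meta = find_entry(root, row_key)      # step 1: root -> metadata sub-table
    return find_entry(meta, row_key)      # step 2: metadata -> sub-table server
```

Neither step involves the master server. A real client would also cache the results and retry when a cached location turns out to be stale.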

Conclusion:

1. Sub-table servers provide access to sub-tables. A sub-table is managed by only one sub-table server.

2. Sub-table servers must "report" to the master server.

3. If the master server goes down, the entire system goes down.

4. Only the master server knows the current set of sub-table servers.

5. The mapping between sub-tables and sub-table servers is stored in two special sub-tables, which are allocated to sub-table servers like any other sub-table.

6. The root sub-table is special, and the master server always knows its location.

7. Integrating these things is a client task.

http://www.cnblogs.com/wycg1984/archive/2009/08/03/1537490.html
