HBase -ROOT- and .META. Table Structure

Source: Internet
Author: User

In HBase, most operations are carried out on RegionServers. To insert, delete, or query data, a client must first find the corresponding RegionServer, meaning the RegionServer that manages the region the client wants to operate on. The client does not know in advance which RegionServer manages which region, so how does it find the right one? This article walks through that process, based on a reading of the source code.

The previous article, "HBase storage Schema", discussed the basic storage architecture of HBase. Building on that, this article introduces two special concepts: -ROOT- and .META.. What are they? They are two built-in HBase tables. In terms of storage structure and operations they are no different from any other HBase table, so you can treat them as two ordinary tables to which normal table operations apply. What sets them apart is that HBase uses them to store important system information: how regions are distributed and the details of each region.

Since -ROOT- and .META. can be treated as two ordinary tables, they should have a table structure like any other table. They do, and both tables share the same structure. After analyzing the source code, I drew the structure roughly as follows:

-ROOT- and .META. table structure

Let's take a closer look at the structure, where each row records a region's information.

First, the rowkey. It consists of three parts: the table name, the start key, and a timestamp. The rowkey is what we call the region's name. As mentioned in the previous article, the folder that stores a region is named with the hash of the region name, because the region name may contain illegal characters. Now you can see why: the start key is allowed to contain any value. The three parts are joined with commas to form the whole rowkey, and the timestamp is written as a decimal string. Here is an example of a rowkey:

Java code
    table1,rk10000,12345678
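The comma-joined layout can be sketched in a few lines of Java. This is only an illustration of the format described above, not HBase's own code: the class `RegionNameSketch` and its methods are hypothetical names, and the parser assumes that the table name contains no comma and that the timestamp is the part after the last comma, so that a start key containing commas still parses.

```java
import java.util.Arrays;

final class RegionNameSketch {
    // Joins table name, start key and timestamp with commas.
    static String regionName(String table, String startKey, long timestamp) {
        return table + "," + startKey + "," + timestamp;
    }

    // Splits on the first and the last comma only, because the start key
    // is allowed to contain any value, commas included.
    static String[] parse(String regionName) {
        int first = regionName.indexOf(',');
        int last = regionName.lastIndexOf(',');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException("not a region name: " + regionName);
        }
        return new String[] {
            regionName.substring(0, first),        // table name
            regionName.substring(first + 1, last), // start key
            regionName.substring(last + 1)         // timestamp
        };
    }

    public static void main(String[] args) {
        System.out.println(regionName("table1", "rk10000", 12345678L));
        // prints table1,rk10000,12345678
        System.out.println(Arrays.toString(parse("t,a,b,42")));
        // prints [t, a,b, 42]
    }
}
```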

Next is the table's main column family, info. It contains three columns: regioninfo, server, and serverstartcode. regioninfo holds the details of the region, including its start key, end key, and information about each column family. server stores the address of the RegionServer that manages the region.

So whenever a region is split, merged, or reassigned, the contents of these tables must be updated.
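To make the row layout concrete, here is a minimal in-memory model of one catalog row and of what a reassignment changes. It is a sketch under assumptions: the class `CatalogRowSketch`, the host names, the port, and the string stored under regioninfo are all placeholders, and serverstartcode is assumed here to be a start timestamp of the server process. Real HBase stores a serialized region descriptor rather than a readable string.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class CatalogRowSketch {
    // rowkey (region name) -> columns of the "info" family -> value
    static final TreeMap<String, Map<String, String>> META = new TreeMap<>();

    // Writes one catalog row with the three columns described above.
    static void putRegion(String regionName, String regionInfo,
                          String server, String startCode) {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("regioninfo", regionInfo);
        info.put("server", server);
        info.put("serverstartcode", startCode);
        META.put(regionName, info);
    }

    // A reassignment leaves the region itself unchanged and only
    // rewrites the columns that say where it is served.
    static void reassign(String regionName, String newServer, String newStartCode) {
        Map<String, String> info = META.get(regionName);
        if (info == null) {
            throw new IllegalArgumentException("unknown region: " + regionName);
        }
        info.put("server", newServer);
        info.put("serverstartcode", newStartCode);
    }

    public static void main(String[] args) {
        putRegion("table2,,12345678", "start='' end='rk20000'",
                  "rs1.example.com:60020", "1300000000000");
        reassign("table2,,12345678", "rs2.example.com:60020", "1300000099999");
        System.out.println(META.get("table2,,12345678").get("server"));
        // prints rs2.example.com:60020
    }
}
```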

With this background in place, we can turn to how a client finds a RegionServer. I will work through an imaginary example, so I first constructed a hypothetical -ROOT- table and .META. table.

Let's look at the .META. table first. Assume HBase holds only two user tables, Table1 and Table2. Table1 is very large and is divided into many regions, so the .META. table has many rows recording them. Table2 is small and is divided into only two regions, so it takes up only two rows in the .META. table. The contents of the table look like this:

.META. row structure

Now suppose we want to find the row in Table2 whose rowkey is rk10000. We should take the following steps:

1. Query the .META. table to find which region contains this row.

2. Get the address of the RegionServer that manages this region.

3. Connect to that RegionServer and read the row.

Take the first step. The problem is that .META. is itself an ordinary table, so we need to know which RegionServer manages it. One option is to store the address of the RegionServer that manages the .META. table in ZooKeeper, so that everyone knows who manages it.

That seems to solve the problem, but the example raises a new one. Because Table1 is so large and has so many regions, .META. needs a lot of space to store their information and may itself have to be divided into multiple regions. That means several RegionServers may be managing the .META. table. What then? Store the addresses of all of them in ZooKeeper and let the client go through them one by one? HBase does not do that.

Instead, HBase uses another table to record the region information of .META., in exactly the same way that .META. records the region information of user tables. That table is the -ROOT- table. This also explains why -ROOT- and .META. have the same table structure: they work on the same principle.

Assume the .META. table is divided into two regions. The contents of -ROOT- then look like this:

-ROOT- row structure

The client therefore has to access the -ROOT- table first, so it needs the address of the RegionServer that manages -ROOT-. This address is stored in ZooKeeper. The default path is:

Java code
    /hbase/root-region-server

But what if the -ROOT- table grows so large that it has to be divided into multiple regions? HBase assumes that -ROOT- will never grow that large, so -ROOT- always has exactly one region, and the information about this region is kept inside HBase itself.

Now let's start again from the beginning: we want to query the row in Table2 whose rowkey is rk10000. The main code for the whole routing process is in org.apache.hadoop.hbase.client.HConnectionManager.TableServers:

Java code
    private HRegionLocation locateRegion(final byte[] tableName,
        final byte[] row, boolean useCache) throws IOException {
      if (tableName == null || tableName.length == 0) {
        throw new IllegalArgumentException("table name cannot be null or zero length");
      }
      if (Bytes.equals(tableName, ROOT_TABLE_NAME)) {
        synchronized (rootRegionLock) {
          // This block guards against two threads trying to find the root
          // region at the same time. One will go do the find while the
          // second waits. The second thread will not do find.
          if (!useCache || rootRegionLocation == null) {
            this.rootRegionLocation = locateRootRegion();
          }
          return this.rootRegionLocation;
        }
      } else if (Bytes.equals(tableName, META_TABLE_NAME)) {
        // A .META. region is located through the -ROOT- table.
        return locateRegionInMeta(ROOT_TABLE_NAME, tableName, row, useCache, metaRegionLock);
      } else {
        // Region not in the cache, have to go to the meta RS.
        return locateRegionInMeta(META_TABLE_NAME, tableName, row, useCache, userRegionLock);
      }
    }

This is a recursive process:

Java code
    Get the RegionServer for Table2, rowkey rk10000
    => get the RegionServer for .META., rowkey table2,rk10000,99999999999999
    => get the RegionServer for -ROOT-, rowkey .META.,table2,rk10000,99999999999999,99999999999999
    => the RegionServer of -ROOT-
    => get the RegionServer of -ROOT- from ZooKeeper
    => in the -ROOT- table, find the row whose rowkey is closest to (and less than) .META.,table2,rk10000,99999999999999,99999999999999, and get the RegionServer of .META.
    => in the .META. table, find the row whose rowkey is closest to (and less than) table2,rk10000,99999999999999, and get the RegionServer of Table2
    => read the row rk10000 of Table2 from that RegionServer
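The recursion can be played through with an in-memory model. The sketch below is not HBase client code: `RoutingChainSketch`, the server names, and the sample rows are hypothetical, a `HashMap` stands in for ZooKeeper, each server is assumed to host exactly one catalog region, and Java's `String` ordering stands in for HBase's own key comparison.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class RoutingChainSketch {
    static final String ROOT = "-ROOT-";
    static final String META = ".META.";
    static final String MAX_TS = "99999999999999";
    static final String ROOT_ZNODE = "/hbase/root-region-server";

    // Stand-in for ZooKeeper: znode path -> address stored there.
    static final Map<String, String> ZOOKEEPER = new HashMap<>();
    // Catalog region hosted by each server: rowkey -> info:server value.
    static final Map<String, TreeMap<String, String>> CATALOG_ON = new HashMap<>();

    static {
        ZOOKEEPER.put(ROOT_ZNODE, "rs-root:60020");

        TreeMap<String, String> root = new TreeMap<>();
        root.put(".META.,,1", "rs-meta-1:60020");
        root.put(".META.,table1,rk50000,12345678,2", "rs-meta-2:60020");
        CATALOG_ON.put("rs-root:60020", root);

        TreeMap<String, String> meta1 = new TreeMap<>();
        meta1.put("table1,,12345678", "rs1:60020");
        meta1.put("table1,rk25000,12345678", "rs2:60020");
        CATALOG_ON.put("rs-meta-1:60020", meta1);

        TreeMap<String, String> meta2 = new TreeMap<>();
        meta2.put("table1,rk50000,12345678", "rs3:60020");
        meta2.put("table2,,12345678", "rs4:60020");
        meta2.put("table2,rk20000,12345678", "rs5:60020");
        CATALOG_ON.put("rs-meta-2:60020", meta2);
    }

    // Returns the address of the server holding `row` of `table`.
    static String locate(String table, String row) {
        if (table.equals(ROOT)) {
            // -ROOT- has a single region; its address comes from ZooKeeper.
            return ZOOKEEPER.get(ROOT_ZNODE);
        }
        // .META. regions are listed in -ROOT-, user regions in .META.
        String catalog = table.equals(META) ? ROOT : META;
        String probe = table + "," + row + "," + MAX_TS;
        String catalogServer = locate(catalog, probe); // recursive step
        Map.Entry<String, String> hit =
            CATALOG_ON.get(catalogServer).floorEntry(probe);
        if (hit == null || !hit.getKey().startsWith(table + ",")) {
            throw new IllegalStateException("no region found for " + probe);
        }
        return hit.getValue();
    }

    public static void main(String[] args) {
        System.out.println(locate("table2", "rk10000")); // prints rs4:60020
    }
}
```

In this model, `locate("table2", "rk10000")` first resolves the .META. region through -ROOT-, which lands on `rs-meta-2:60020`, and then finds the row `table2,,12345678` there.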

At this point the client has completed the whole process of routing to a RegionServer. Throughout the process it relies on two techniques: appending the "99999999999999" suffix and finding the row with the closest (smaller) rowkey. If you look closely at this approach, it is not hard to see why it works.
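A small experiment shows why the suffix matters. In the sketch below, `ProbeSuffixSketch` and its sample rows are hypothetical, and a `TreeMap` with plain `String` ordering stands in for a sorted catalog table. When the wanted row is exactly the start key of a region, a probe without the suffix sorts before that region's rowkey and the "closest smaller" search lands on the previous region. With the suffix, the probe sorts after any decimal timestamp of up to 14 digits, so the search lands on the right row.

```java
import java.util.TreeMap;

final class ProbeSuffixSketch {
    static final String MAX_TS = "99999999999999";

    // Two regions of table2: one with an empty start key, one starting at rk10000.
    static TreeMap<String, String> sampleMeta() {
        TreeMap<String, String> meta = new TreeMap<>();
        meta.put("table2,,12345678", "rs4:60020");
        meta.put("table2,rk10000,12345678", "rs5:60020");
        return meta;
    }

    // Greatest rowkey that is less than or equal to the probe, or null.
    static String regionFor(TreeMap<String, String> meta, String probe) {
        return meta.floorKey(probe);
    }

    public static void main(String[] args) {
        TreeMap<String, String> meta = sampleMeta();
        System.out.println(regionFor(meta, "table2,rk10000," + MAX_TS));
        // prints table2,rk10000,12345678 (the region that starts at rk10000)
        System.out.println(regionFor(meta, "table2,rk10000,"));
        // prints table2,,12345678 (the previous region, which is wrong)
    }
}
```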

Finally, note two things:

1. The master server is not involved in the routing process at all, which means that HBase's everyday data operations do not need the master server and place no load on it.

2. The client does not go through the whole routing process for every data operation; much of the information is cached. How the cache works is beyond the scope of this article.
