HBase Source Code Analysis: KeyValue

Source: Internet
Author: User

In HBase, a cell is implemented as KeyValue, the in-memory representation of one cell of an HBase row. A KeyValue consists of four parts: key length, value length, key, and value. The key itself is made up of seven parts: row length, row, column family length, column family, column qualifier, timestamp, and key type. In HBase 1.0.2, its structure is as follows.


From left to right, the fields are:

1. Key length: stores the length of the key, 4 B;

2. Value length: stores the length of the value, 4 B;

3. Key: made up of row length, row, column family length, column family, column qualifier, timestamp, and key type:

3.1 Row length: stores the length of the row, i.e. of the rowkey, 2 B;

3.2 Row: stores the actual row content, i.e. the rowkey; its size is given by row length;

3.3 Column family length: stores the length of the column family, 1 B;

3.4 Column family: stores the actual column family content; its size is given by column family length;

3.5 Column qualifier: stores the column qualifier. Because the total key length and the sizes of all other key fields are known, the qualifier's size can be derived, so its length does not need to be stored;

3.6 Timestamp: stores the timestamp, 8 B;

3.7 Key type: stores the key type, 1 B. The type, such as Put, Delete, DeleteColumn, DeleteFamilyVersion, or DeleteFamily, marks what kind of KeyValue this is;

4. Value: stores the actual value of the cell.
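To make the layout concrete, here is a minimal sketch that serializes the fields above into one byte[] in the listed order. It is not HBase's KeyValue constructor: the class and method names are hypothetical, tags are ignored, and it assumes big-endian integers (the byte order of HBase's Bytes utility). The Put type code 4 is taken from KeyValue.Type in HBase 1.x.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder for the layout above (no tags); not HBase's own code.
final class KeyValueLayoutSketch {
  // Type code for Put, as in KeyValue.Type in HBase 1.x
  static final byte TYPE_PUT = 4;

  // Key = row length (2) + row + family length (1) + family + qualifier
  //       + timestamp (8) + key type (1)
  static int keyLength(byte[] row, byte[] family, byte[] qualifier) {
    return 2 + row.length + 1 + family.length + qualifier.length + 8 + 1;
  }

  static byte[] encode(byte[] row, byte[] family, byte[] qualifier,
                       long timestamp, byte type, byte[] value) {
    int keyLength = keyLength(row, family, qualifier);
    // ByteBuffer writes big-endian by default
    ByteBuffer buf = ByteBuffer.allocate(4 + 4 + keyLength + value.length);
    buf.putInt(keyLength);            // key length, 4 B
    buf.putInt(value.length);         // value length, 4 B
    buf.putShort((short) row.length); // row length, 2 B
    buf.put(row);                     // row
    buf.put((byte) family.length);    // column family length, 1 B
    buf.put(family);                  // column family
    buf.put(qualifier);               // column qualifier, length not stored
    buf.putLong(timestamp);           // timestamp, 8 B
    buf.put(type);                    // key type, 1 B
    buf.put(value);                   // value
    return buf.array();
  }

  public static void main(String[] args) {
    byte[] kv = encode(
        "row1".getBytes(StandardCharsets.UTF_8),
        "cf".getBytes(StandardCharsets.UTF_8),
        "q".getBytes(StandardCharsets.UTF_8),
        1000L, TYPE_PUT,
        "v".getBytes(StandardCharsets.UTF_8));
    // 4 + 4 + key (2 + 4 + 1 + 2 + 1 + 8 + 1 = 19) + value (1)
    System.out.println(kv.length); // 28
  }
}
```

Note that the qualifier is written without a length prefix, exactly as item 3.5 describes.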


Next, let's look at how KeyValue is implemented in HBase. KeyValue has three very important instance variables:

  // KeyValue core instance fields.
  // The immutable byte[] array holding the KeyValue's actual content
  protected byte [] bytes = null;  // an immutable byte array that contains the KV
  // Start position of this KeyValue within bytes
  protected int offset = 0;  // offset into bytes buffer KV starts at
  // Length of this KeyValue, counted from offset within bytes
  protected int length = 0;  // length of the KV starting from offset.

The KeyValue content is stored in bytes, an immutable byte[] array; offset and length mark where this KeyValue starts within that array and how many bytes it occupies.
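Because a KeyValue is only a window (offset, length) onto bytes, several KeyValues can share one array, for example a buffer of cells serialized back to back. The hypothetical sketch below (class name made up, tags ignored, big-endian lengths assumed) walks such an array using only the two 4 B length fields. Key and value contents are zero-filled placeholders; 12 B is the smallest possible key (2 + 1 + 8 + 1 with an empty row, family, and qualifier).

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: KeyValues (no tags) packed back to back in one shared byte[],
// each identified only by its offset and length, like KeyValue's three fields.
final class PackedKeyValues {
  // Returns {offset, length} pairs for each KeyValue found in buf[0, end)
  static List<int[]> split(byte[] buf, int end) {
    List<int[]> result = new ArrayList<>();
    ByteBuffer bb = ByteBuffer.wrap(buf);
    int pos = 0;
    while (pos < end) {
      int keyLength = bb.getInt(pos);       // first 4 B
      int valueLength = bb.getInt(pos + 4); // next 4 B
      int length = 4 + 4 + keyLength + valueLength;
      result.add(new int[] {pos, length});
      pos += length;
    }
    return result;
  }

  public static void main(String[] args) {
    ByteBuffer bb = ByteBuffer.allocate(64);
    bb.putInt(12).putInt(3).put(new byte[12 + 3]); // KV #1: 23 B
    bb.putInt(13).putInt(0).put(new byte[13]);     // KV #2: 21 B
    for (int[] kv : split(bb.array(), bb.position())) {
      System.out.println("offset=" + kv[0] + " length=" + kv[1]);
    }
  }
}
```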

Next, let's look at how KeyValue reads the key length, value length, row length, column family, value, and other fields, to verify the structure listed above.

1. Key Length

  /**
   * @return Length of key portion.
   */
  public int getKeyLength() {
    // Read an int (4 B) from the backing byte[] array bytes, starting at offset
    return Bytes.toInt(this.bytes, this.offset);
  }

getKeyLength() returns the key length of the KeyValue. It reads an int (4 B) from the backing array bytes starting at offset, which confirms that the first field of a KeyValue is the key length and that it occupies 4 B.

2. Value Length

  /**
   * @return Value length
   */
  @Override
  public int getValueLength() {
    // Read an int (4 B) from bytes starting at offset + 4,
    // i.e. the 4 B right after the key length hold the value length
    int vlength = Bytes.toInt(this.bytes, this.offset + Bytes.SIZEOF_INT);
    return vlength;
  }

getValueLength() returns the value length of the KeyValue. It reads an int (4 B) from bytes starting at offset + 4, which confirms that the 4 B following the key length hold the value length.
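The two header reads above can be mimicked without HBase. This hypothetical sketch (class name made up) decodes big-endian ints by hand, which is assumed to match the byte order of Bytes.toInt, and places the header at a non-zero offset to show why offset matters.

```java
// Hypothetical sketch of getKeyLength()/getValueLength(); class name is made up.
final class KvHeader {
  static final int SIZEOF_INT = 4;

  // Big-endian decode of 4 bytes starting at off
  static int toInt(byte[] b, int off) {
    return ((b[off] & 0xFF) << 24) | ((b[off + 1] & 0xFF) << 16)
        | ((b[off + 2] & 0xFF) << 8) | (b[off + 3] & 0xFF);
  }

  static int keyLength(byte[] bytes, int offset) {
    return toInt(bytes, offset);              // first 4 B of the KeyValue
  }

  static int valueLength(byte[] bytes, int offset) {
    return toInt(bytes, offset + SIZEOF_INT); // 4 B right after the key length
  }

  public static void main(String[] args) {
    // Two unrelated bytes, then a header: key length 300 (0x0000012C), value length 5
    byte[] buf = {9, 9, 0, 0, 1, 44, 0, 0, 0, 5};
    System.out.println(keyLength(buf, 2));   // 300
    System.out.println(valueLength(buf, 2)); // 5
  }
}
```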

3. Key start position

  /**
   * @return Key offset in backing buffer..
   */
  public int getKeyOffset() {
    // ROW_OFFSET skips over the key length and value length fields
    return this.offset + ROW_OFFSET;
  }

ROW_OFFSET is the distance from the start of the KeyValue to the position just past the key length and value length fields. It is defined as follows:

  // How far into the key the row starts at. First thing to read is the short
  // that says how long the row is.
  public static final int ROW_OFFSET =
    Bytes.SIZEOF_INT /*keylength*/ +
    Bytes.SIZEOF_INT /*valuelength*/;

getKeyOffset() returns the start position of the key: the start position of the whole KeyValue (offset) plus ROW_OFFSET, the combined size of the key length and value length fields (8 B). This confirms that the key immediately follows the key length and value length.

4. Value start position

  /**
   * @return the value offset
   */
  @Override
  public int getValueOffset() {
    // Start of the key plus the key length gives the start of the value
    int voffset = getKeyOffset() + getKeyLength();
    return voffset;
  }

getValueOffset() returns the start position of the value: the key start position from getKeyOffset() plus the key length from getKeyLength(). This confirms that a KeyValue is laid out, in order, as key length, value length, key, value.
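The offset arithmetic of getKeyOffset() and getValueOffset() reduces to two additions. A minimal hypothetical sketch (names made up), using a KeyValue that starts at position 100 of its backing array and has a 19 B key:

```java
// Hypothetical sketch of the arithmetic in getKeyOffset()/getValueOffset().
final class KvOffsets {
  static final int SIZEOF_INT = 4;
  // Same definition as ROW_OFFSET: key length field + value length field
  static final int ROW_OFFSET = SIZEOF_INT + SIZEOF_INT;

  static int keyOffset(int offset) {
    return offset + ROW_OFFSET;
  }

  static int valueOffset(int offset, int keyLength) {
    return keyOffset(offset) + keyLength;
  }

  public static void main(String[] args) {
    System.out.println(keyOffset(100));       // 108
    System.out.println(valueOffset(100, 19)); // 127
  }
}
```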

5. Row Length

  /**
   * @return Row length
   */
  @Override
  public short getRowLength() {
    // Read a short (2 B) at the start of the key in bytes.
    // getKeyOffset() returns the position just past the key length and value length,
    // i.e. the start of the key, and the first 2 B of the key are the row length
    return Bytes.toShort(this.bytes, getKeyOffset());
  }

getRowLength() returns the row length. It reads a short (2 B) from bytes at the start of the key, which shows that the row length is the first field of the key.

6. Row start position

  /**
   * @return Row offset
   */
  @Override
  public int getRowOffset() {
    // Key start position plus 2 B: the row follows the row length
    return getKeyOffset() + Bytes.SIZEOF_SHORT;
  }

getRowOffset() returns the start position of the row: the key start position plus 2 B. In other words, the row immediately follows the row length, consistent with the layout above.
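The row length and row offset reads can be sketched with java.nio.ByteBuffer, whose getShort(int) reads a big-endian short at an absolute index. The class name is hypothetical, and in main only the fields this sketch reads are filled in (the two length headers are left as 0).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of getRowLength()/getRowOffset(); class name is made up.
final class KvRowReader {
  static final int ROW_OFFSET = 4 + 4; // key length + value length
  static final int SIZEOF_SHORT = 2;

  static short rowLength(byte[] bytes, int offset) {
    // The first 2 B of the key (big-endian short) hold the row length
    return ByteBuffer.wrap(bytes).getShort(offset + ROW_OFFSET);
  }

  static int rowOffset(int offset) {
    // The row starts right after the 2 B row length
    return offset + ROW_OFFSET + SIZEOF_SHORT;
  }

  public static void main(String[] args) {
    byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
    ByteBuffer bb = ByteBuffer.allocate(32);
    bb.putInt(0).putInt(0).putShort((short) row.length).put(row);
    byte[] bytes = bb.array();
    int len = rowLength(bytes, 0);
    String decoded = new String(bytes, rowOffset(0), len, StandardCharsets.UTF_8);
    System.out.println(len + " " + decoded); // 4 row1
  }
}
```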

7. Row

  /**
   * Primarily for use client-side.  Returns the row of this KeyValue in a new
   * byte array.<p>
   *
   * If server-side, use {@link #getBuffer()} with appropriate offsets and
   * lengths instead.
   * @return Row in a new byte array.
   */
  @Deprecated // use CellUtil.getRowArray()
  public byte [] getRow() {
    return CellUtil.cloneRow(this);
  }

getRow() returns the row content. It calls CellUtil.cloneRow(), passing the KeyValue instance itself, and gets back a byte[]. cloneRow() is implemented as follows:

  public static byte[] cloneRow(Cell cell) {
    // The output byte[] array is exactly row length in size
    byte[] output = new byte[cell.getRowLength()];
    // Copy the row from the cell into output
    copyRowTo(cell, output, 0);
    return output;
  }

cloneRow() first allocates a byte[] array output whose size is the row length, so the size of the row is determined by the row length field that precedes it. It then calls copyRowTo() to copy the row stored in the KeyValue into output and returns output. copyRowTo() copies row length bytes from the cell's backing array (the KeyValue's bytes), starting at row offset, into the destination array (output) starting at position 0, which fills destination exactly. The code is as follows:

  public static int copyRowTo(Cell cell, byte[] destination, int destinationOffset) {
    // Copy row length bytes from the cell's backing array, starting at row offset,
    // into destination starting at destinationOffset (0 here), filling destination
    System.arraycopy(cell.getRowArray(), cell.getRowOffset(), destination, destinationOffset,
      cell.getRowLength());
    // Return the position just past the copied data
    return destinationOffset + cell.getRowLength();
  }

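The cloneRow()/copyRowTo() pattern can be reproduced outside HBase. In this sketch RowSource is a hypothetical stand-in for the three row accessors of HBase's Cell interface (it is not the real Cell), and the class name is made up; the copy logic follows the code above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the row accessors of HBase's Cell interface
interface RowSource {
  byte[] getRowArray();
  int getRowOffset();
  short getRowLength();
}

final class CloneRowSketch {
  // Same copy pattern as copyRowTo: arraycopy out of the backing array
  static int copyRowTo(RowSource cell, byte[] destination, int destinationOffset) {
    System.arraycopy(cell.getRowArray(), cell.getRowOffset(), destination,
        destinationOffset, cell.getRowLength());
    // Position just past the copied bytes
    return destinationOffset + cell.getRowLength();
  }

  // Same pattern as cloneRow: allocate exactly row length, then copy
  static byte[] cloneRow(RowSource cell) {
    byte[] output = new byte[cell.getRowLength()];
    copyRowTo(cell, output, 0);
    return output;
  }

  public static void main(String[] args) {
    byte[] backing = "xxrow1yy".getBytes(StandardCharsets.UTF_8);
    RowSource cell = new RowSource() {
      public byte[] getRowArray() { return backing; }
      public int getRowOffset() { return 2; }
      public short getRowLength() { return 4; }
    };
    System.out.println(new String(cloneRow(cell), StandardCharsets.UTF_8)); // row1
  }
}
```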
8. Family start position

  /**
   * @return Family offset
   */
  @Override
  public int getFamilyOffset() {
    return getFamilyOffset(getRowLength());
  }

  /**
   * @return Family offset
   */
  private int getFamilyOffset(int rlength) {
    // Family start = KeyValue start offset + ROW_OFFSET (key length + value length)
    //   + 2 B (row length) + actual row size rlength + 1 B (family length)
    return this.offset + ROW_OFFSET + Bytes.SIZEOF_SHORT + rlength + Bytes.SIZEOF_BYTE;
  }

getFamilyOffset() returns the start position of the family: the KeyValue start position offset, plus ROW_OFFSET (the size of the key length and value length), plus 2 B for the row length, plus the actual row size rlength from getRowLength(), plus 1 B for the family length. This shows that within the key, the row length and row are followed by the family length and the family, and that the family length occupies 1 B.

9. Family length

  /**
   * @return Family length
   */
  @Override
  public byte getFamilyLength() {
    return getFamilyLength(getFamilyOffset());
  }

  /**
   * @return Family length
   */
  public byte getFamilyLength(int foffset) {
    // The 1 B just before the family start position is the family length
    return this.bytes[foffset-1];
  }

getFamilyLength() returns the length of the column family. It reads the byte at the position returned by getFamilyOffset() minus 1, consistent with the finding above that the 1 B before the family is the family length.
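A minimal hypothetical sketch of the family offset and family length reads (class name made up, big-endian row length assumed). In main, only the row length, row, family length, and family are filled in.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of getFamilyOffset()/getFamilyLength(); class name is made up.
final class KvFamilyReader {
  static final int ROW_OFFSET = 4 + 4; // key length + value length
  static final int SIZEOF_SHORT = 2;   // row length field
  static final int SIZEOF_BYTE = 1;    // family length field

  static int familyOffset(byte[] bytes, int offset) {
    short rlength = ByteBuffer.wrap(bytes).getShort(offset + ROW_OFFSET);
    return offset + ROW_OFFSET + SIZEOF_SHORT + rlength + SIZEOF_BYTE;
  }

  static byte familyLength(byte[] bytes, int foffset) {
    // The family length is the single byte just before the family
    return bytes[foffset - 1];
  }

  public static void main(String[] args) {
    byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
    byte[] family = "cf".getBytes(StandardCharsets.UTF_8);
    ByteBuffer bb = ByteBuffer.allocate(32);
    bb.putInt(0).putInt(0)
        .putShort((short) row.length).put(row)
        .put((byte) family.length).put(family);
    byte[] bytes = bb.array();
    int foffset = familyOffset(bytes, 0);
    int flength = familyLength(bytes, foffset);
    // 8 + 2 + 4 + 1 = 15
    System.out.println(foffset + " "
        + new String(bytes, foffset, flength, StandardCharsets.UTF_8)); // 15 cf
  }
}
```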

10. Qualifier start position

  /**
   * @return Qualifier offset
   */
  @Override
  public int getQualifierOffset() {
    return getQualifierOffset(getFamilyOffset());
  }

  /**
   * @return Qualifier offset
   */
  private int getQualifierOffset(int foffset) {
    // Family start position plus the family length
    return foffset + getFamilyLength(foffset);
  }

getQualifierOffset() returns the start position of the qualifier: the family start position plus the family length. This shows that the qualifier immediately follows the family.

11. Qualifier length

  /**
   * @return Qualifier length
   */
  @Override
  public int getQualifierLength() {
    return getQualifierLength(getRowLength(), getFamilyLength());
  }

  /**
   * @return Qualifier length
   */
  private int getQualifierLength(int rlength, int flength) {
    // Key length minus the row, the family, the row length field,
    // the family length field, the timestamp and the key type
    return getKeyLength() - (int) getKeyDataStructureSize(rlength, flength, 0);
  }

getQualifierLength() returns the qualifier length. KeyValue does not store the qualifier length directly; it computes it by subtracting from the total key length the sizes of every other key component: the row, the family, the row length field, the family length field, the timestamp, and the key type.
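The subtraction above can be written out as qualifier length = key length - (2 + 1 + 8 + 1 + row length + family length). The fixed 12 B corresponds to what HBase names KEY_INFRASTRUCTURE_SIZE. The hypothetical sketch below (names made up, big-endian fields assumed) locates a qualifier whose length is never stored.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of getQualifierOffset()/getQualifierLength(); names are made up.
final class KvQualifierReader {
  static final int ROW_OFFSET = 4 + 4; // key length + value length
  // Row length (2) + family length (1) + timestamp (8) + key type (1) = 12
  static final int KEY_INFRASTRUCTURE_SIZE = 2 + 1 + 8 + 1;

  static int familyOffset(byte[] bytes, int offset) {
    int rlength = ByteBuffer.wrap(bytes).getShort(offset + ROW_OFFSET);
    return offset + ROW_OFFSET + 2 + rlength + 1;
  }

  static int qualifierOffset(byte[] bytes, int offset) {
    int foffset = familyOffset(bytes, offset);
    return foffset + bytes[foffset - 1]; // family start + family length
  }

  static int qualifierLength(byte[] bytes, int offset) {
    ByteBuffer bb = ByteBuffer.wrap(bytes);
    int keyLength = bb.getInt(offset);
    int rlength = bb.getShort(offset + ROW_OFFSET);
    int flength = bytes[familyOffset(bytes, offset) - 1];
    // The qualifier is whatever is left of the key after every other field
    return keyLength - (KEY_INFRASTRUCTURE_SIZE + rlength + flength);
  }

  public static void main(String[] args) {
    byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
    byte[] family = "cf".getBytes(StandardCharsets.UTF_8);
    byte[] qualifier = "qual".getBytes(StandardCharsets.UTF_8);
    int keyLength = KEY_INFRASTRUCTURE_SIZE + row.length + family.length + qualifier.length;
    ByteBuffer bb = ByteBuffer.allocate(4 + 4 + keyLength); // empty value
    bb.putInt(keyLength).putInt(0)
        .putShort((short) row.length).put(row)
        .put((byte) family.length).put(family)
        .put(qualifier).putLong(1000L).put((byte) 4);
    byte[] bytes = bb.array();
    int qoffset = qualifierOffset(bytes, 0);
    int qlength = qualifierLength(bytes, 0);
    System.out.println(qoffset + " " + qlength + " "
        + new String(bytes, qoffset, qlength, StandardCharsets.UTF_8)); // 17 4 qual
  }
}
```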

12. Timestamp start position

  /**
   * @return Timestamp offset
   */
  public int getTimestampOffset() {
    return getTimestampOffset(getKeyLength());
  }

  /**
   * @param keylength Pass if you have it to save on an int creation.
   * @return Timestamp offset
   */
  private int getTimestampOffset(final int keylength) {
    // Key start position plus key length, minus the size of the timestamp and key type
    return getKeyOffset() + keylength - TIMESTAMP_TYPE_SIZE;
  }

getTimestampOffset() returns the start position of the timestamp: the key start position plus the key length, minus the combined size of the timestamp and key type (TIMESTAMP_TYPE_SIZE, 8 B + 1 B). So the timestamp is the second-to-last field of the key, after the qualifier and before the key type, which comes last.

13. Get timestamp

  /**
   *
   * @return Timestamp
   */
  @Override
  public long getTimestamp() {
    return getTimestamp(getKeyLength());
  }

  /**
   * @param keylength Pass if you have it to save on an int creation.
   * @return Timestamp
   */
  long getTimestamp(final int keylength) {
    // Get the timestamp start position tsOffset
    int tsOffset = getTimestampOffset(keylength);
    // Read a long (8 B) from bytes starting at tsOffset
    return Bytes.toLong(this.bytes, tsOffset);
  }

getTimestamp() returns the timestamp. It first gets the timestamp start position tsOffset, then reads a long (8 B) from bytes at that position, consistent with the 8 B timestamp described above.
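A hypothetical sketch of the timestamp lookup (class name made up, big-endian long assumed). It works backwards from the end of the key, as getTimestampOffset() does, and uses the smallest possible key (empty row, family, and qualifier) so the positions are easy to check by hand.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of getTimestampOffset()/getTimestamp(); class name is made up.
final class KvTimestampReader {
  static final int ROW_OFFSET = 4 + 4;          // key length + value length
  static final int TIMESTAMP_TYPE_SIZE = 8 + 1; // timestamp + key type

  static int timestampOffset(byte[] bytes, int offset) {
    int keyLength = ByteBuffer.wrap(bytes).getInt(offset);
    // End of the key, minus the trailing timestamp and key type
    return offset + ROW_OFFSET + keyLength - TIMESTAMP_TYPE_SIZE;
  }

  static long timestamp(byte[] bytes, int offset) {
    // Read a big-endian long (8 B) at the timestamp offset
    return ByteBuffer.wrap(bytes).getLong(timestampOffset(bytes, offset));
  }

  public static void main(String[] args) {
    // Smallest possible key: empty row, family and qualifier, so key length = 12
    ByteBuffer bb = ByteBuffer.allocate(4 + 4 + 12);
    bb.putInt(12).putInt(0)
        .putShort((short) 0) // row length
        .put((byte) 0)       // family length
        .putLong(1234567890123L)
        .put((byte) 4);      // key type (Put)
    byte[] bytes = bb.array();
    System.out.println(timestampOffset(bytes, 0)); // 11
    System.out.println(timestamp(bytes, 0));       // 1234567890123
  }
}
```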

14. Get key type

  /**
   * @return Type of this KeyValue.
   */
  @Deprecated
  public byte getType() {
    return getTypeByte();
  }

  /**
   * @return KeyValue.TYPE byte representation
   */
  @Override
  public byte getTypeByte() {
    // Position = KeyValue start offset + key length - 1 + ROW_OFFSET (the size of the
    // key length and value length fields), i.e. the key type is the last 1 B of the key
    return this.bytes[this.offset + getKeyLength() - 1 + ROW_OFFSET];
  }

getType() and getTypeByte() return the key type. getTypeByte() reads one byte from bytes at offset + key length - 1 + ROW_OFFSET, where ROW_OFFSET is the space taken by the key length and value length fields. In other words, the key type is the last 1 B of the key, consistent with the layout above.
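The type byte read can be paired with a name lookup. In this hypothetical sketch (names made up), the codes 4, 8, 10, 12, and 14 are those defined by KeyValue.Type in HBase 1.x; any other code, such as the Minimum and Maximum search markers, falls into the default branch.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of getTypeByte(); class and method names are made up.
final class KvTypeReader {
  static final int ROW_OFFSET = 4 + 4; // key length + value length

  static byte typeByte(byte[] bytes, int offset) {
    int keyLength = ByteBuffer.wrap(bytes).getInt(offset);
    // Same expression as getTypeByte(): the last byte of the key
    return bytes[offset + keyLength - 1 + ROW_OFFSET];
  }

  // Type codes as defined by KeyValue.Type in HBase 1.x
  static String typeName(byte code) {
    switch (code) {
      case 4:  return "Put";
      case 8:  return "Delete";
      case 10: return "DeleteFamilyVersion";
      case 12: return "DeleteColumn";
      case 14: return "DeleteFamily";
      default: return "Other(" + code + ")";
    }
  }

  public static void main(String[] args) {
    // Smallest possible key (12 B) whose key type is 12
    ByteBuffer bb = ByteBuffer.allocate(4 + 4 + 12);
    bb.putInt(12).putInt(0).putShort((short) 0).put((byte) 0)
        .putLong(0L).put((byte) 12);
    System.out.println(typeName(typeByte(bb.array(), 0))); // DeleteColumn
  }
}
```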

15. Get value

  /**
   * Returns value in a new byte array.
   * Primarily for use client-side. If server-side, use
   * {@link #getBuffer()} with appropriate offsets and lengths instead to
   * save on allocations.
   * @return Value in a new byte array.
   */
  @Deprecated // use CellUtil.getValueArray()
  public byte [] getValue() {
    return CellUtil.cloneValue(this);
  }

getValue() returns the value, i.e. the actual content of the cell. It calls CellUtil.cloneValue(), passing in the KeyValue instance itself. Let's look at cloneValue():

  public static byte[] cloneValue(Cell cell) {
    // Create a byte[] array output of value length size
    byte[] output = new byte[cell.getValueLength()];
    // Copy the value from the cell into output
    copyValueTo(cell, output, 0);
    return output;
  }

cloneValue() first creates a byte[] array output whose size is the value length, then calls copyValueTo() to copy the cell's value into output. copyValueTo() is simple: it copies value length bytes from the backing array, starting at value offset, into destination. The code is as follows:

  public static int copyValueTo(Cell cell, byte[] destination, int destinationOffset) {
    // Copy value length bytes from the backing array, starting at value offset,
    // into destination starting at destinationOffset
    System.arraycopy(cell.getValueArray(), cell.getValueOffset(), destination, destinationOffset,
        cell.getValueLength());
    return destinationOffset + cell.getValueLength();
  }
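Putting getValueOffset(), getValueLength(), and cloneValue() together, a hypothetical sketch (class name made up, tags ignored, big-endian lengths assumed) can copy the value out of a raw KeyValue array. Arrays.copyOfRange is used here as a swapped-in shorthand for allocating value length bytes and calling System.arraycopy; the effect is the same.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of getValueOffset() + cloneValue(); class name is made up.
final class KvValueReader {
  static final int ROW_OFFSET = 4 + 4; // key length + value length

  static byte[] cloneValue(byte[] bytes, int offset) {
    ByteBuffer bb = ByteBuffer.wrap(bytes);
    int keyLength = bb.getInt(offset);
    int valueLength = bb.getInt(offset + 4);
    int valueOffset = offset + ROW_OFFSET + keyLength;
    // Same effect as new byte[valueLength] followed by System.arraycopy
    return Arrays.copyOfRange(bytes, valueOffset, valueOffset + valueLength);
  }

  public static void main(String[] args) {
    byte[] value = "hello".getBytes(StandardCharsets.UTF_8);
    // Smallest possible key (12 B) followed by the value
    ByteBuffer bb = ByteBuffer.allocate(4 + 4 + 12 + value.length);
    bb.putInt(12).putInt(value.length).putShort((short) 0).put((byte) 0)
        .putLong(0L).put((byte) 4).put(value);
    byte[] copy = cloneValue(bb.array(), 0);
    System.out.println(new String(copy, StandardCharsets.UTF_8)); // hello
  }
}
```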