LevelDB Source Analysis, Part One: Encoding

Source: Internet
Author: User
LevelDB stores integers in little-endian order by default: the low-order bytes are placed at the lower memory addresses and the high-order bytes at the higher memory addresses.
The encoding comes in two kinds, the variable-length EncodeVarint and the fixed-size EncodeFixed, and each has a 32-bit and a 64-bit version.
I. EncodeFixed
void EncodeFixed32(char* buf, uint32_t value) {
#if __BYTE_ORDER == __LITTLE_ENDIAN
  // Host is little-endian: the in-memory layout is already correct.
  memcpy(buf, &value, sizeof(value));
#else
  // Otherwise write the bytes one by one, least significant first.
  buf[0] = value & 0xff;
  buf[1] = (value >> 8) & 0xff;
  buf[2] = (value >> 16) & 0xff;
  buf[3] = (value >> 24) & 0xff;
#endif
}
void EncodeFixed64(char* buf, uint64_t value) {
#if __BYTE_ORDER == __LITTLE_ENDIAN
  memcpy(buf, &value, sizeof(value));
#else
  buf[0] = value & 0xff;
  buf[1] = (value >> 8) & 0xff;
  buf[2] = (value >> 16) & 0xff;
  buf[3] = (value >> 24) & 0xff;
  buf[4] = (value >> 32) & 0xff;
  buf[5] = (value >> 40) & 0xff;
  buf[6] = (value >> 48) & 0xff;
  buf[7] = (value >> 56) & 0xff;
#endif
}
Both functions are simple: they check whether the host is little-endian. If it is, the value is copied directly into buf; if not, the bytes are written one by one so that buf still holds the value in little-endian order.
The decoding functions DecodeFixed32 and DecodeFixed64 simply reverse this process.
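The article does not list the decoders. The following is a minimal sketch of what a 32-bit encode/decode pair might look like, written with shifts only so that it does not depend on the host byte order. The names PutFixed32LE, GetFixed32LE and Fixed32Demo are hypothetical helpers for illustration, not LevelDB's own functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: writes value into buf[0..3], least significant byte first.
inline void PutFixed32LE(char* buf, uint32_t value) {
  buf[0] = static_cast<char>(value & 0xff);
  buf[1] = static_cast<char>((value >> 8) & 0xff);
  buf[2] = static_cast<char>((value >> 16) & 0xff);
  buf[3] = static_cast<char>((value >> 24) & 0xff);
}

// Hypothetical helper: inverse of PutFixed32LE.
inline uint32_t GetFixed32LE(const char* buf) {
  // Read through unsigned char so that bytes >= 0x80 are not sign-extended.
  const unsigned char* p = reinterpret_cast<const unsigned char*>(buf);
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Usage: 0x12345678 is laid out in memory as 78 56 34 12 and decodes back.
inline void Fixed32Demo() {
  char buf[4];
  PutFixed32LE(buf, 0x12345678u);
  assert(static_cast<unsigned char>(buf[0]) == 0x78);
  assert(static_cast<unsigned char>(buf[3]) == 0x12);
  assert(GetFixed32LE(buf) == 0x12345678u);
}
```

Reading through unsigned char in the decoder matters: with a plain char, a byte such as 0x80 could be sign-extended and corrupt the higher bits of the result.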
II. EncodeVarint
Why encode an integer (int) as a variable-length integer (varint)? To save as much storage space as possible.
A varint is a compact representation that stores a number in one or more bytes; the smaller the number, the fewer bytes it uses. An int32, for example, normally takes 4 bytes, but as a varint a small int32 value fits in 1 byte. There is a trade-off: a large number may need 5 bytes. Statistically, though, the numbers in a message are rarely all large, so in most cases varints represent the same numeric information in fewer bytes.
The highest bit of each byte in a varint has a special meaning: if it is 1, the next byte is also part of the number; if it is 0, this byte is the last one. The other 7 bits carry the value. The largest value 7 bits can hold is 127, so any number less than 128 fits in a single byte. A number greater than or equal to 128, say 300, takes two bytes in memory:
Low address    High address
1010 1100      0000 0010
The encoding proceeds as follows:
300 in binary is 1 0010 1100. Its low 7 bits, 010 1100, go into the low byte in memory. Because a second byte follows, the highest bit of the low byte is set to 1, which makes the low byte 1010 1100. The remaining high 2 bits of 300, 10, go into the high byte. The number ends in this byte, so its highest bit is 0 and the other unused bits are also filled with 0, which makes the high byte 0000 0010.
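The walk-through for 300 can be checked with a few lines of C++. This is only an illustrative sketch: EncodeTwoByteVarint is a hypothetical helper that handles just the two-byte case (128 <= v < 16384) and is not part of LevelDB.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: encodes v (128 <= v < 16384) as a two-byte varint.
inline void EncodeTwoByteVarint(uint32_t v, unsigned char* out) {
  assert(v >= 128 && v < (1u << 14));
  // Low 7 bits of v, with the continuation bit (0x80) set.
  out[0] = static_cast<unsigned char>((v & 0x7f) | 0x80);
  // Remaining high bits; the top bit stays 0 because the number ends here.
  out[1] = static_cast<unsigned char>(v >> 7);
}
```

For v = 300 this yields out[0] = 0xAC (1010 1100) and out[1] = 0x02 (0000 0010), which matches the memory layout shown above.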
Normally an int needs 32 bits. Because a varint uses the highest bit of every byte as a flag, each byte carries only 7 bits of payload, so a particularly large 32-bit integer may take 5 bytes: 4 bytes carry only 28 bits, while 5 bytes carry 5*8 - 5 (flag bits) = 35 bits, which is enough for 32. The fifth branch of the if statement below handles this case.
char* EncodeVarint32(char* dst, uint32_t v) {
  // Operate on characters as unsigneds
  unsigned char* ptr = reinterpret_cast<unsigned char*>(dst);
  static const int B = 128;
  if (v < (1<<7)) {
    // v is less than 128: a single byte is enough.
    *(ptr++) = v;
  } else if (v < (1<<14)) {
    // v is less than 16384. For v = 300 (0000 0001 0010 1100), OR-ing with
    // 128 (0000 0000 1000 0000) gives 0000 0001 1010 1100; the low 8 bits
    // (1010 1100) go into the low byte in memory.
    *(ptr++) = v | B;
    // Shifting 300 right by 7 bits gives 0000 0000 0000 0010, whose low
    // 8 bits go into the high byte in memory.
    *(ptr++) = v>>7;
  } else if (v < (1<<21)) {
    *(ptr++) = v | B;
    *(ptr++) = (v>>7) | B;
    *(ptr++) = v>>14;
  } else if (v < (1<<28)) {
    *(ptr++) = v | B;
    *(ptr++) = (v>>7) | B;
    *(ptr++) = (v>>14) | B;
    *(ptr++) = v>>21;
  } else {
    *(ptr++) = v | B;
    *(ptr++) = (v>>7) | B;
    *(ptr++) = (v>>14) | B;
    *(ptr++) = (v>>21) | B;
    *(ptr++) = v>>28;
  }
  return reinterpret_cast<char*>(ptr);
}
A 64-bit integer needs up to 10 bytes: 10*8 - 10 (flag bits) = 70 bits, which is enough for 64. Written in the style of EncodeVarint32, that would take 10 if branches, and the authors were certainly not going to write it out that way, so EncodeVarint64 uses a loop instead. In fact, EncodeVarint32 could also be written like EncodeVarint64.
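char* EncodeVarint64(char* dst, uint64_t v) {
  static const int B = 128;
  unsigned char* ptr = reinterpret_cast<unsigned char*>(dst);
  // Emit 7 bits at a time, low bits first, with the continuation bit set.
  while (v >= B) {
    *(ptr++) = (v & (B-1)) | B;
    v >>= 7;
  }
  // Last byte: fewer than 8 significant bits remain, so the top bit is 0.
  *(ptr++) = static_cast<unsigned char>(v);
  return reinterpret_cast<char*>(ptr);
}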
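The remark that EncodeVarint32 could be written in the same loop style can be sketched as follows. This is an illustration under that assumption, not LevelDB's code; EncodeVarint32Loop and EncodeVarint32LoopDemo are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Sketch only: loop-based 32-bit varint encoder, mirroring EncodeVarint64.
// Returns a pointer just past the last byte written; dst needs room for 5 bytes.
inline char* EncodeVarint32Loop(char* dst, uint32_t v) {
  static const uint32_t B = 128;
  unsigned char* ptr = reinterpret_cast<unsigned char*>(dst);
  while (v >= B) {
    *(ptr++) = static_cast<unsigned char>((v & (B - 1)) | B);
    v >>= 7;
  }
  *(ptr++) = static_cast<unsigned char>(v);
  return reinterpret_cast<char*>(ptr);
}

// Usage: 300 takes two bytes (AC 02); the largest uint32_t takes five.
inline void EncodeVarint32LoopDemo() {
  char buf[5];
  char* end = EncodeVarint32Loop(buf, 300);
  assert(end - buf == 2);
  assert(static_cast<unsigned char>(buf[0]) == 0xAC);
  assert(static_cast<unsigned char>(buf[1]) == 0x02);
  end = EncodeVarint32Loop(buf, 0xFFFFFFFFu);
  assert(end - buf == 5);
  assert(static_cast<unsigned char>(buf[4]) == 0x0F);
}
```

The unrolled if chain and the loop produce the same bytes; the unrolled form simply avoids the loop for the 32-bit case at the cost of longer code.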

The following function calculates the length of the integer encoding, that is, the length of the varint.

int VarintLength(uint64_t v) {
  int len = 1;
  // Each additional byte holds 7 more bits of the value.
  while (v >= 128) {
    v >>= 7;
    len++;
  }
  return len;
}
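A few boundary values make the 7-bits-per-byte rule concrete. The sketch below repeats the length function under a different, hypothetical name (VarintLengthSketch) so that it compiles on its own; VarintLengthDemo is likewise a helper added only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Same logic as the function above, repeated so the example is self-contained.
inline int VarintLengthSketch(uint64_t v) {
  int len = 1;
  while (v >= 128) {
    v >>= 7;
    len++;
  }
  return len;
}

// Usage: the length grows by one byte at every power of 2^7.
inline void VarintLengthDemo() {
  assert(VarintLengthSketch(127) == 1);           // largest 1-byte value
  assert(VarintLengthSketch(128) == 2);           // smallest 2-byte value
  assert(VarintLengthSketch(16383) == 2);         // 2^14 - 1
  assert(VarintLengthSketch(16384) == 3);         // 2^14
  assert(VarintLengthSketch(UINT32_MAX) == 5);    // 32 bits need 5 bytes
  assert(VarintLengthSketch(UINT64_MAX) == 10);   // 64 bits need 10 bytes
}
```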

III. Varint decoding

Once the encoding is understood, decoding is easy to follow. Callers use GetVarint32Ptr directly. The function itself handles the case value < 128, where the varint is a single byte; when the varint is longer than one byte, GetVarint32Ptr calls GetVarint32PtrFallback.

inline const char* GetVarint32Ptr(const char* p,
                                  const char* limit,
                                  uint32_t* value) {
  if (p < limit) {
    uint32_t result = *(reinterpret_cast<const unsigned char*>(p));
    // Fast path: the top bit is clear, so the varint is a single byte.
    if ((result & 128) == 0) {
      *value = result;
      return p + 1;
    }
  }
  return GetVarint32PtrFallback(p, limit, value);
}
  
In both GetVarint32Ptr and GetVarint32PtrFallback, the parameter p points to the bytes that contain the varint, and limit marks the end of the range that may be read. A caller can pass limit = p + 5, because a 32-bit varint occupies at most 5 bytes. value receives the decoded integer.
const char* GetVarint32PtrFallback(const char* p,
                                   const char* limit,
                                   uint32_t* value) {
  uint32_t result = 0;
  for (uint32_t shift = 0; shift <= 28 && p < limit; shift += 7) {
    uint32_t byte = *(reinterpret_cast<const unsigned char*>(p));
    p++;
    if (byte & 128) {
      // More bytes are present
      result |= ((byte & 127) << shift);
    } else {
      result |= (byte << shift);
      *value = result;
      return reinterpret_cast<const char*>(p);
    }
  }
  // Ran past limit, or the varint is longer than 5 bytes.
  return NULL;
}
The 64-bit decoding is similar to the 32-bit.
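As a rough sketch of that similarity, a 64-bit decoder can be obtained from the 32-bit fallback by widening the types and letting shift run up to 63 (ten 7-bit groups). The names DecodeVarint64Sketch and DecodeVarint64Demo are hypothetical and chosen for illustration; this is written by analogy with the 32-bit code above, not copied from LevelDB.

```cpp
#include <cassert>
#include <cstdint>

// Sketch only: 64-bit analogue of GetVarint32PtrFallback.
// Returns a pointer just past the varint, or nullptr if the input ends
// before a terminating byte is seen or the varint exceeds 10 bytes.
inline const char* DecodeVarint64Sketch(const char* p, const char* limit,
                                        uint64_t* value) {
  uint64_t result = 0;
  for (uint32_t shift = 0; shift <= 63 && p < limit; shift += 7) {
    uint64_t byte = *(reinterpret_cast<const unsigned char*>(p));
    p++;
    if (byte & 128) {
      // More bytes are present: keep the low 7 bits and continue.
      result |= ((byte & 127) << shift);
    } else {
      // Last byte: its top bit is 0, so it can be used as is.
      result |= (byte << shift);
      *value = result;
      return p;
    }
  }
  return nullptr;
}

// Usage: AC 02 decodes to 300; a truncated input is rejected.
inline void DecodeVarint64Demo() {
  const char buf[] = {static_cast<char>(0xAC), static_cast<char>(0x02)};
  uint64_t v = 0;
  assert(DecodeVarint64Sketch(buf, buf + 2, &v) == buf + 2);
  assert(v == 300);
  assert(DecodeVarint64Sketch(buf, buf + 1, &v) == nullptr);
}
```

The limit check is what keeps the decoder from reading past the buffer when the last available byte still has its continuation bit set.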


Reference Link: http://www.360doc.com/content/17/1218/10/51031951_714135151.shtml
