An efficient database implemented in PHP (file storage, NoSQL)

Source: Internet
Author: User

The database reads and writes two files: an index file and a data file that holds the actual values.
The index file has two parts. The first part is the pointer area, a fixed table with one 4-byte pointer per hash bucket; each pointer holds the index-file offset of the first index record in that bucket's chain, or 0 if the bucket is empty. The second part holds the index records. Records whose keys hash to the same bucket form a linked list, and each record stores both the location of its value in the data file and the offset of the next record in the same chain.
Index records: each record has four fields. The first is the offset of the next index record in the chain (4 bytes); the second is the key, padded to 128 bytes; the third is the offset of the value in the data file (4 bytes); the last is the length of the value (4 bytes).
We set the number of hash buckets, and therefore of pointers in the pointer area, to 262,144.
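
The record layout described above can be sketched with pack() and unpack(). This is only an illustration under stated assumptions: the constant and function names (KEY_SIZE, RECORD_SIZE, packIndexRecord, unpackIndexRecord) are hypothetical, and the 'L' format code (unsigned 32-bit, machine byte order) is assumed for the three integer fields.

```php
<?php
// Hypothetical names, chosen for this illustration only.
const KEY_SIZE = 128;                      // width of the key field in bytes
const RECORD_SIZE = 4 + KEY_SIZE + 4 + 4;  // next + key + data offset + data length

// Build one index record: next pointer, NUL-padded key, data offset, data length.
function packIndexRecord(int $next, string $key, int $dataOffset, int $dataLength): string
{
    // 'a128' pads the key with NUL bytes up to 128 bytes (and truncates longer keys).
    return pack('La' . KEY_SIZE . 'LL', $next, $key, $dataOffset, $dataLength);
}

// Split a record back into its four named fields.
function unpackIndexRecord(string $record): array
{
    $fields = unpack('Lnext/a' . KEY_SIZE . 'key/Loffset/Llength', $record);
    // Strip the NUL padding from the key field.
    $fields['key'] = rtrim($fields['key'], "\0");
    return $fields;
}

$record = packIndexRecord(0, 'key1', 10, 6);
$fields = unpackIndexRecord($record);
echo strlen($record), ' ', $fields['key'], ' ', $fields['offset'], ' ', $fields['length'], "\n";
```

A next pointer of 0 marks the end of a chain, which works because real records always start after the pointer area and so never sit at offset 0.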

The lookup process is as follows:

1. Compute the hash of the key and, from it, the position of the key's bucket pointer in the first part of the index file (the pointer area).
2. Read the value stored at that position. Time complexity: O(1).
3. The value from step 2 is the offset, in the second part of the index file (the index records), of the head of the linked list of records whose keys share that hash. Walk the list comparing keys until the key is found. Time complexity: O(n) in the length of the chain.
4. The matching index record stores where the value lives in the data file: its offset and its length.
5. Using the location from step 4, read the value from the data file and return it to the application.
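
The first two steps come down to a little arithmetic and one 4-byte read. The sketch below is an illustration, not the class's own code: the names BUCKET_COUNT, times33, slotOffset and readHeadPointer are hypothetical, the hash is a conventional times-33 loop masked to 31 bits (the class may differ in detail), and an in-memory stream stands in for the index file.

```php
<?php
const BUCKET_COUNT = 262144; // number of 4-byte bucket pointers in the pointer area

// Times-33 hash over the first 8 hex digits of md5($key), kept non-negative.
function times33(string $key): int
{
    $hex = substr(md5($key), 0, 8);
    $hash = 0;
    for ($i = 0; $i < 8; $i++) {
        $hash = ($hash * 33 + ord($hex[$i])) & 0x7FFFFFFF;
    }
    return $hash;
}

// Byte offset of the key's bucket pointer inside the pointer area.
function slotOffset(string $key): int
{
    return (times33($key) % BUCKET_COUNT) * 4;
}

// Read the 4-byte head pointer stored at that offset (0 means an empty bucket).
function readHeadPointer($fp, string $key): int
{
    fseek($fp, slotOffset($key), SEEK_SET);
    $head = unpack('L', fread($fp, 4));
    return $head[1];
}

// An all-zero pointer area held in memory, standing in for a freshly created index file.
$fp = fopen('php://memory', 'w+b');
fwrite($fp, str_repeat(pack('L', 0), BUCKET_COUNT));
var_dump(readHeadPointer($fp, 'key1')); // int(0): the bucket is empty
```

Because every pointer is exactly 4 bytes, the slot for a bucket is found by multiplication alone, which is what makes step 2 an O(1) operation.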

Test results: inserting 10,000 records took 793 ms; looking up 10,000 records took 149 ms. That is only about one-tenth the speed of Redis, but please don't mind the details.

The code is commented, since the description above is a bit messy. It implements only three methods: insert (skipped if the key already exists), find, and delete.

Source of the ideas: the book "PHP Core Technology and Best Practices". Out of respect for the author, please keep the title when reprinting.

The code is as follows:

<?php
// Number of bucket pointers in the hash table. Each pointer is a 4-byte int
// holding the index-file offset of the head of that bucket's chain.
define('DB_BUCKET_SIZE', 262144);
// Length of the key field in each record
define('DB_KEY_SIZE', 128);
// Length of one index record
define('DB_INDEX_SIZE', DB_KEY_SIZE + 12);

// Return code: success
define('DB_SUCCESS', 1);
// Return code: failure
define('DB_FAILURE', -1);
// Return code: key already exists
define('DB_KEY_EXISTS', -2);

class DB {
    private $idx_fp;
    private $dat_fp;
    private $closed;

    /**
     * Description: open the database
     * @param $pathName data file storage path
     * @return mixed
     */
    public function open($pathName) {
        $idx_path = $pathName . '.idx';
        $dat_path = $pathName . '.dat';
        if (!file_exists($idx_path)) {
            $init = true;
            $mode = 'w+b';
        } else {
            $init = false;
            $mode = 'r+b';
        }
        $this->idx_fp = fopen($idx_path, $mode);
        if (!$this->idx_fp) {
            return DB_FAILURE;
        }
        if ($init) {
            // Pack 0x00000000 as an unsigned 32-bit integer and fill the pointer area with it
            $elem = pack('L', 0x00000000);
            for ($i = 0; $i < DB_BUCKET_SIZE; $i++) {
                fwrite($this->idx_fp, $elem, 4);
            }
        }
        $this->dat_fp = fopen($dat_path, $mode);
        if (!$this->dat_fp) {
            return DB_FAILURE;
        }

        return DB_SUCCESS;
    }

    /**
     * Description: Times33 hash algorithm
     * @param $key
     * @return int
     */
    private function times33Hash($key) {
        $len = 8;
        $key = substr(md5($key), 0, $len);
        $hash = 0;
        for ($i = 0; $i < $len; $i++) {
            $hash += 33 * $hash + ord($key[$i]);
        }
        // 0x7FFFFFFF: each hex digit is 4 bits, so 8 digits make 32 bits (4 bytes), the size of an int.
        // F is 1111 and 7 is 0111, so the top bit is 0 and all the others are 1. The top bit is the
        // sign bit, which makes 0x7FFFFFFF the largest signed 32-bit integer.
        // & 0x7FFFFFFF guarantees that the returned number is non-negative
        return $hash & 0x7fffffff;
    }

    /**
     * Description: insert a record
     * @param $key
     * @param $value
     */
    public function add($key, $value) {
        $offset = ($this->times33Hash($key) % DB_BUCKET_SIZE) * 4;

        $idxoff = fstat($this->idx_fp);
        $idxoff = intval($idxoff['size']);

        $datoff = fstat($this->dat_fp);
        $datoff = intval($datoff['size']);

        $keylen = strlen($key);
        $vallen = strlen($value);
        if ($keylen > DB_KEY_SIZE) {
            return DB_FAILURE;
        }
        // 0 means this is the last record; the chain has no further records.
        $block = pack('L', 0x00000000);
        // Key
        $block .= $key;
        // If the key is shorter than the maximum length, pad it with 0 bytes
        $space = DB_KEY_SIZE - $keylen;
        for ($i = 0; $i < $space; $i++) {
            $block .= pack('C', 0x00);
        }
        // Offset of the value within the data file
        $block .= pack('L', $datoff);
        // Length of the data record
        $block .= pack('L', $vallen);
        // SEEK_SET is the default, but pass it explicitly in case that ever changes
        fseek($this->idx_fp, $offset, SEEK_SET);
        // Check whether the bucket for this key's hash already has a chain
        $pos = @unpack('L', fread($this->idx_fp, 4));
        $pos = $pos[1];
        // The bucket is empty: the new record becomes the head of the chain
        if ($pos == 0) {
            fseek($this->idx_fp, $offset, SEEK_SET);
            fwrite($this->idx_fp, pack('L', $idxoff), 4);

            fseek($this->idx_fp, 0, SEEK_END);
            fwrite($this->idx_fp, $block, DB_INDEX_SIZE);

            fseek($this->dat_fp, 0, SEEK_END);
            fwrite($this->dat_fp, $value, $vallen);

            return DB_SUCCESS;
        }
        // The bucket is not empty: walk the chain looking for the key
        $found = false;
        while ($pos) {
            fseek($this->idx_fp, $pos, SEEK_SET);
            $tmp_block = fread($this->idx_fp, DB_INDEX_SIZE);
            $cpkey = substr($tmp_block, 4, DB_KEY_SIZE);
            // Compare the whole padded field; comparing only strlen($key) bytes
            // would also match stored keys that merely start with $key
            if ($cpkey === str_pad($key, DB_KEY_SIZE, "\0")) {
                $found = true;
                break;
            }
            $prev = $pos;
            $pos = @unpack('L', substr($tmp_block, 0, 4));
            $pos = $pos[1];
        }

        if ($found) {
            return DB_KEY_EXISTS;
        }
        // Link the new record from the last record in the chain, then append it
        fseek($this->idx_fp, $prev, SEEK_SET);
        fwrite($this->idx_fp, pack('L', $idxoff), 4);
        fseek($this->idx_fp, 0, SEEK_END);
        fwrite($this->idx_fp, $block, DB_INDEX_SIZE);
        fseek($this->dat_fp, 0, SEEK_END);
        fwrite($this->dat_fp, $value, $vallen);
        return DB_SUCCESS;
    }

    /**
     * Description: query a record
     * @param $key
     */
    public function get($key) {
        // Compute the bucket offset: the key's hash modulo the number of buckets,
        // times 4, because each bucket pointer is 4 bytes
        $offset = ($this->times33Hash($key) % DB_BUCKET_SIZE) * 4;
        // SEEK_SET is the default
        fseek($this->idx_fp, $offset, SEEK_SET);
        $pos = unpack('L', fread($this->idx_fp, 4));
        $pos = $pos[1];

        $found = false;
        while ($pos) {
            fseek($this->idx_fp, $pos, SEEK_SET);
            $block = fread($this->idx_fp, DB_INDEX_SIZE);
            $cpkey = substr($block, 4, DB_KEY_SIZE);

            // Compare the whole padded key field so that prefixes do not match
            if ($cpkey === str_pad($key, DB_KEY_SIZE, "\0")) {
                $dataoff = unpack('L', substr($block, DB_KEY_SIZE + 4, 4));
                $dataoff = $dataoff[1];

                $datalen = unpack('L', substr($block, DB_KEY_SIZE + 8, 4));
                $datalen = $datalen[1];

                $found = true;
                break;
            }
            $pos = unpack('L', substr($block, 0, 4));
            $pos = $pos[1];
        }
        if (!$found) {
            return null;
        }
        fseek($this->dat_fp, $dataoff, SEEK_SET);
        $data = fread($this->dat_fp, $datalen);
        return $data;
    }

    /**
     * Description: delete a record
     * @param $key
     */
    public function delete($key) {
        $offset = ($this->times33Hash($key) % DB_BUCKET_SIZE) * 4;
        fseek($this->idx_fp, $offset, SEEK_SET);
        $head = unpack('L', fread($this->idx_fp, 4));
        $head = $head[1];
        $curr = $head;
        $prev = 0;
        $found = false;
        while ($curr) {
            fseek($this->idx_fp, $curr, SEEK_SET);
            $block = fread($this->idx_fp, DB_INDEX_SIZE);

            $next = unpack('L', substr($block, 0, 4));
            $next = $next[1];

            $cpkey = substr($block, 4, DB_KEY_SIZE);
            // Compare the whole padded key field so that prefixes do not match
            if ($cpkey === str_pad($key, DB_KEY_SIZE, "\0")) {
                $found = true;
                break;
            }
            $prev = $curr;
            $curr = $next;
        }
        if (!$found) {
            return DB_FAILURE;
        }
        // Unlink the record from the index chain (the value stays in the data file).
        if ($prev == 0) {
            // The record is the head of the chain: point the bucket at its successor
            fseek($this->idx_fp, $offset, SEEK_SET);
            fwrite($this->idx_fp, pack('L', $next), 4);
        } else {
            // Otherwise point the previous record's next pointer at the successor
            fseek($this->idx_fp, $prev, SEEK_SET);
            fwrite($this->idx_fp, pack('L', $next), 4);
        }
        return DB_SUCCESS;
    }

    public function close() {
        if (!$this->closed) {
            fclose($this->idx_fp);
            fclose($this->dat_fp);
            $this->closed = true;
        }
    }
}
?>
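
One detail of fixed-width key fields deserves care: if a stored key is compared using only the first strlen($key) bytes, a lookup for 'key1' also matches a stored 'key10'. The sketch below contrasts a prefix comparison with a comparison against the whole NUL-padded field. The names KEY_FIELD_SIZE, prefixMatch and exactMatch are hypothetical and exist only for this illustration.

```php
<?php
const KEY_FIELD_SIZE = 128; // assumed width of the key field, as in the record layout

// Compares only the first strlen($key) bytes of the stored field.
function prefixMatch(string $stored, string $key): bool
{
    return strncmp($stored, $key, strlen($key)) === 0;
}

// Compares the stored field against the key padded to the full field width.
function exactMatch(string $stored, string $key): bool
{
    return $stored === str_pad($key, KEY_FIELD_SIZE, "\0");
}

// The key field as it would be stored for the key 'key10'.
$stored = str_pad('key10', KEY_FIELD_SIZE, "\0");

var_dump(prefixMatch($stored, 'key1'));  // bool(true): a false positive
var_dump(exactMatch($stored, 'key1'));   // bool(false)
var_dump(exactMatch($stored, 'key10'));  // bool(true)
```

The false positive only shows up when both keys land in the same bucket, so it can go unnoticed in small tests.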


Test: add 10,000 records and look up 10,000 records:

The code is as follows:

<?php
// Include the class above first if it is not in the same file.
// Test
$db = new DB();
$db->open('/var/www/data/');

$startTime = microtime(true);

// Insert test. Inserting 10,000 records: success, time: 793.48206520081ms
//for ($i = 0; $i < 10000; $i++) {
//    $db->add('key' . $i, 'value' . $i);
//}

// Lookup test. Looking up 10,000 records: success, time: 149.08313751221ms
for ($i = 0; $i < 10000; $i++) {
    $db->get('key' . $i);
}

$endTime = microtime(true);
echo 'success, time: ' . (($endTime - $startTime) * 1000) . 'ms';
$db->close();
?>
