Key-value small database tmdb released: principles and implementation


The key-value database is a classic, old-style kind of database: from DBM in the early UNIX era, to gdbm in the later GNU version, plus ndbm, sdbm, CDB, the powerful Berkeley DB (BDB), and qdbm, which has been booming over the past two years, all are typical examples. Strictly speaking, a key-value database is not a real database at all, but a simple, fast data storage facility.

tmdb is a similar small key-value store (a DBM). Its target data volume is set at the ten-million-record level, its performance is not great, and it is a small experimental product. (tmdb download: http://heiyeluren.googlecode.com/files/tmdb-0.0.1.zip)

Basic features:

  • When the stored data volume grows large, performance drops severely.
  • Because of the append-only storage design, it is best suited to read-mostly data. Records can still be deleted and modified, but doing so wastes space.
  • The key length cannot exceed 64 bytes, and the data length cannot exceed 65536 bytes.
  • There is no row-level locking, only a single global lock, so concurrent read/write performance is poor.
  • The index file and data file are separate, so both files must be included in a backup.
  • The API basically follows the traditional DBM API. The whole library is small and can be statically linked directly into a program.

 

Below is a brief introduction to the general design. The design favors whatever was simplest to implement (mostly out of my own laziness), and the implementation is rather rough, so please do not hesitate to point out problems.

 

[Storage structure]

The index is static: the length of the hash table cannot be dynamically expanded, and defaults to 65535 hash buckets. Conflicts are resolved by chaining (the zipper method). If conflicts are severe or the data volume is large, the time to find a record naturally grows a great deal, so performance is best when the data volume is small and the keys are evenly distributed (as with any hash scheme).

As mentioned above, the index and data files are separated mainly so that the files can grow dynamically without much data migration or offset recomputation.

Data storage is a single file with 256 bytes reserved in the header, followed by the data records. All data is written in append mode; when a record is deleted, only a flag in the index is modified. There is no actual data migration and no free-space linked list (laziness again), so the structure stays simple.

Let's take a look at the storage structure of the index file:

Index file structure:
+-----------+-----------------------+--------------+------------------+
|  Header   |    key PTR buckets    | key record 1 | key record 2 ... |
+-----------+-----------------------+--------------+------------------+
  256 bytes   262144 bytes (256 KB)    76 bytes         76 bytes

 

A 256-byte header space is reserved for future expansion, followed by 256 KB that stores, for each hash bucket, the position of its first key record: 65536 buckets * 4 bytes = 256 KB. Because each pointer is only 4 bytes, the entire index file cannot exceed 2 GB; otherwise a position could not fit in a single 4-byte pointer (^_^).
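As a quick illustration of the layout arithmetic, the file offset of a bucket's head-of-chain pointer can be computed like this (a sketch based on the numbers above, not tmdb's actual code; the names are my own):

#include <stdint.h>

#define IDX_HEADER_SIZE 256     /* reserved index header */
#define NBUCKETS        65536   /* bucket slots: 65536 * 4 bytes = 256 KB */

/* Offset of the 4-byte head-of-chain pointer for a given bucket. */
uint32_t bucket_slot_offset(uint32_t bucket)
{
    return IDX_HEADER_SIZE + bucket * 4;
}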
A key record stores a single key. Its structure is as follows:

 

Index key record:
+---------+----------+----------+----------+
|  Flag   |   key    | data PTR | next PTR |
+---------+----------+----------+----------+
  4 bytes   64 bytes    4 bytes    4 bytes

The 4-byte Flag indicates whether the record has been deleted. The key occupies a fixed 64 bytes. data PTR is a 4-byte pointer to the location of the value record in the data file, so the data file likewise cannot exceed 2 GB (^_^). next PTR stores the pointer to the next key record with the same hash value (a very simple design, just like an in-memory hash table).

No matter how long the key actually is, it is stored in this fixed-length form to keep performance predictable, so many short keys waste serious space. In practice, if the values are short, the index file usually ends up larger than the data file. (-_-!)
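Expressed as a C struct, the 76-byte key record might look like the following. This is a sketch assuming 4-byte unsigned fields in the file's native byte order; tmdb's real definitions may differ:

#include <stdint.h>

#define TDB_KEY_MAX 64

/* One fixed-length index key record: 4 + 64 + 4 + 4 = 76 bytes. */
struct key_record {
    uint32_t flag;              /* deletion marker */
    char     key[TDB_KEY_MAX];  /* fixed 64 bytes, padded for short keys */
    uint32_t data_ptr;          /* offset of the value record in the data file */
    uint32_t next_ptr;          /* offset of the next key record in this bucket chain */
};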

Let's take a look at the storage structure of the data file:

Data file structure:
+-----------+----------------+-------------------+
|  Header   | data record 1  | data record 2 ... |
+-----------+----------------+-------------------+
  256 bytes   dynamic length    dynamic length

 

The 256-byte reserved header is followed by the variable-length data records, laid out one after another. Here is the structure of a single data record:

 

Data record:
+---------+---------+----------------+----------+
|  Flag   |   Len   |      data      | next PTR |
+---------+---------+----------------+----------+
  4 bytes   4 bytes   dynamic length    4 bytes

 

The 4-byte flag is reserved; 4 bytes store the actual data length; the data itself follows; and a final 4 bytes hold the pointer to the next record.
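A matching sketch for the data record; the field order follows the diagram above, and since the value is variable-length, the trailing next pointer cannot appear in the struct itself:

#include <stdint.h>

/* Fixed 8-byte prefix of a data record. After it come 'len' bytes of
 * value data, then a 4-byte pointer to the next data record. */
struct data_record_head {
    uint32_t flag;  /* reserved */
    uint32_t len;   /* actual length of the value that follows */
    /* char     data[len];   -- variable-length value          */
    /* uint32_t next_ptr;    -- offset of the next data record */
};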

The entire storage structure is relatively simple and clear. Because the index file and the data file are separated, many operations are simple to implement, at the cost of one extra open file descriptor per database. (^_^)

 

[Hash algorithm]

The BKDR hash algorithm is used, mainly for its good performance. In fact, sdbm hash and times33 are also fine, but I think this one is better, so I chose it.

/**
 * Hash core function
 * @desc BKDR hash
 */
tdbhash _db_hash(TDB *db, const char *str)
{
    tdbhash seed = 131;  /* 31 131 1313 13131 131313 etc. */
    tdbhash hash = 0;
    while (*str) {
        hash = hash * seed + (*str++);
    }
    return (hash & 0x7FFFFFFF) % db->nhash;
}

Comparison of related hash algorithms: http://blog.csai.cn/user3/50125/archives/2009/35638.html
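Putting the pieces together, a fetch hashes the key, reads the bucket's head pointer from the index file, and walks the next PTR chain until the key matches. A minimal lookup sketch, reusing the key_record struct from above (the helper, the meaning of flag == 0, and the endianness handling are all my assumptions, not tmdb's actual code):

#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Inlined BKDR hash so the sketch is self-contained; tmdb's own
 * version is the _db_hash() shown earlier. */
static uint32_t bkdr_hash(const char *s)
{
    uint32_t h = 0;
    while (*s)
        h = h * 131 + (unsigned char)*s++;
    return h & 0x7FFFFFFF;
}

/* Walk one bucket chain in the index file. Returns the data-file offset
 * of the value record, or 0 if the key is absent. */
uint32_t lookup(int idx_fd, uint32_t nhash, const char *key)
{
    uint32_t h = bkdr_hash(key) % nhash;
    uint32_t rec_off;

    /* Head-of-chain pointer lives at header + bucket * 4. */
    if (pread(idx_fd, &rec_off, 4, 256 + (off_t)h * 4) != 4)
        return 0;

    while (rec_off != 0) {
        struct key_record rec;  /* the 76-byte record sketched earlier */
        if (pread(idx_fd, &rec, sizeof(rec), rec_off) != sizeof(rec))
            return 0;
        if (rec.flag == 0 && strncmp(rec.key, key, TDB_KEY_MAX) == 0)
            return rec.data_ptr;  /* found a live record */
        rec_off = rec.next_ptr;   /* follow the chain */
    }
    return 0;  /* not found */
}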

 

 

[Performance test for tmdb]

The tests below were all run on Linux, generally on dual-core or multi-core CPUs, with a 2.6 kernel and mostly the ext3 file system. Actual testing showed that a good file system and good system configuration make an obvious performance difference.

  • CentOS 5.4 test
  • SuSE 11 test
  • Fedora 7 test
  • Cygwin test

(The per-system result figures are not reproduced in this copy.)

Inserting data, average time (ignoring the Cygwin test):
100,000 records: 3.69 seconds
500,000 records: 22.44 seconds
1,000,000 records: 49.14 seconds

Reading data, average time (ignoring the Cygwin test):
100,000 records: 2 seconds
500,000 records: 13.26 seconds
1,000,000 records: 32.82 seconds

 

[tmdb usage]

Download tmdb-0.0.1: http://heiyeluren.googlecode.com/files/tmdb-0.0.1.zip

It is easy to use. It can be compiled as a shared library (.so) or a static library, and then used directly after including the header file. The bundled Makefile compiles everything into the output directory, producing libtmdb.so, libtmdb.a, and the tmdb.h header. I personally recommend the static library: no runtime dependency, and the version is guaranteed to match. The package also contains tmdb_test.c, the performance test code, which you can refer to; by default, make compiles it into the output directory so it can be run directly.

Built-in API:
TDB *tdb_open(const char *path, char *mode);
void tdb_close(TDB *db);
char *tdb_fetch(TDB *db, const char *key);
Status tdb_store(TDB *db, const char *key, const char *value, int mode);
Status tdb_delete(TDB *db, const char *key);
void tdb_rewind(TDB *db);
char *tdb_nextrec(TDB *db, char *ret_key);

When opening a database, the tdb_open mode parameter accepts only three values, r/c/w:
r: read only
c: create/truncate db
w: read/write
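The tdb_rewind/tdb_nextrec pair scans all records sequentially. A small hedged sketch; following the APUE-style API this library mirrors, I am assuming tdb_nextrec copies the current key into ret_key, returns the value, and returns NULL at the end of the database:

#include <stdio.h>
#include "tmdb.h"

/* Print every key/value pair in the database. */
void dump_all(TDB *db)
{
    char key[65];    /* keys are at most 64 bytes */
    char *val;

    tdb_rewind(db);  /* restart the sequential scan */
    while ((val = tdb_nextrec(db, key)) != NULL)
        printf("%s => %s\n", key, val);
}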

Sample Code: (db_test.c)

#include <stdio.h>
#include <string.h>
#include <time.h>
#include "tmdb.h"

int main() {
    char *df = "db";
    TDB *db = tdb_open(df, "c");
    if (!db) {
        printf("tdb_open() %s fail.\n", df);
        return -1;
    }
    printf("tdb_open() %s success.\n", df);

    int s;
    char *ret;
    char *key = "test_key";
    char *val = "test_value";
    s = tdb_store(db, key, val, TDB_INSERT);
    if (TDB_SUCCESS == s) {
        printf("tdb_store() %s success.\n", key);
        ret = tdb_fetch(db, key);
        if (NULL != ret) {
            printf("tdb_fetch() %s success, value: %s.\n", key, ret);
        }
    }
    tdb_close(db);
    printf("Close db done\n");

    return 0;
}

Remember to add the library path when compiling (-L is the library search path, -l is the library name, and -I is the header file path):
$ gcc -o db_test db_test.c -L. -ltmdb -I.
$ ./db_test
tdb_open() db success.
tdb_store() test_key success.
tdb_fetch() test_key success, value: test_value.
Close db done

 

[End]

Basically, tmdb is just a small DBM that is relatively simple and easy to understand; its performance makes it no more than a lab product. More ideas and corrections are very welcome.

More open source code: http://heiyeluren.googlecode.com
tmdb download: http://heiyeluren.googlecode.com/files/tmdb-0.0.1.zip

References:
APUE: http://www.apue.com/apue/
Hash algorithm comparison: http://blog.csai.cn/user3/50125/archives/2009/35638.html
bdb/gdbm/TC/sqlite3 performance tests: http://dieken-qfz.spaces.live.com/Blog/cns!586d665c0deb512d!548.entry

 

 

 
