Embedded Database System Berkeley DB


As an embedded database system with a long history, Berkeley DB (DB for short) is mainly used on Unix/Linux operating systems. Its design philosophy is to be simple, compact, reliable, and fast. This article is an entry-level guide to DB development. It focuses on DB's core data structures and data access algorithms, and demonstrates how to use DB through actual code. It closes with a brief summary of DB and some thoughts on tool selection.
Preface
There are many kinds of databases on Unix/Linux platforms; most of them are listed in reference 1. Generally, when an application on Unix/Linux has many data types with complex relationships between them, a large enterprise-level database system such as DB2, Oracle, or Sybase is used. If the software is not large, small and medium databases such as MySQL and PostgreSQL are preferred. For example, building websites with PHP/Perl + MySQL/PostgreSQL is common practice. However, when an application manages only a few data types (note: this does not mean that the amount of data is small), the data management is not complex, and data operations must be highly efficient, Berkeley DB, which originated at the famous UC Berkeley, may be a wise choice.


DB Overview
DB was initially developed to replace the old hsearch function and the many dbm implementations (such as AT&T's dbm, Berkeley's ndbm, and the GNU project's gdbm) with a new hash access algorithm. The first release of DB appeared in 1991 and also included the B+ tree data access algorithm. In 1992, the 4.4BSD Unix release included DB 1.85, which is basically the first official version of DB. In mid-1996, Sleepycat Software was founded to provide commercial support for DB. Since then, DB has been widely used. At the time of writing, the latest version is 4.3.27.

DB supports almost all modern operating systems, such as Linux, UNIX, and Windows. It also provides a wide range of language interfaces, including C, C++, Java, Perl, Tcl, Python, and PHP. DB is widely used and can be found in many well-known pieces of software. For example, in reference 2 the author describes using DB to implement a kernel-level file system on Linux. In reference 3, actual test data shows that DB improves the efficiency of OpenLDAP. On Linux, the package manager RPM also uses DB to manage package-related data. You can run the file command on the files under the RPM data directory /var/lib/rpm. The output looks like this:

Dirnames: Berkeley DB (Btree, version 9, native byte-order)
Filemd5s: Berkeley DB (Hash, version 8, native byte-order)

It is worth noting that DB is an embedded database system rather than a conventional relational or object-oriented database. It does not support SQL and does not provide common advanced database features such as stored procedures and triggers.

DB Design Philosophy
The design philosophy of DB is to be simple, small, reliable, and fast. If the mainstream database systems can be called large and complete, DB can be called small and refined. DB provides a set of application programming interfaces (APIs) that are simple to call. The application and the library provided by DB are compiled together into a single executable. This greatly improves efficiency. First, the database and the application run in the same address space, so there is no expensive network communication between a client program and a database server, and no inter-process communication on the local host either. Second, no SQL parsing is required, so data access is direct.

DB takes a simple view of the data it manages. A DB database contains a number of records, each consisting of a key and its data (a key/value pair). The data can be a simple type or a complex type, such as a C structure. DB does not interpret the data in any way; that is left entirely to the programmer, in the typical "free" style of C pointers. If a record is regarded as a table row with N fields, the first field is the primary key and fields 2 through N correspond to the data. DB applications usually use several databases, much as a relational database uses several tables. The DB library is very compact, no more than 500 KB, yet it can manage terabytes of data.

The design of DB fully embodies the Unix tool philosophy, that is, a combination of several simple tools can achieve powerful results. Each basic module of DB is designed to be independent, which means that its usefulness is not limited to DB itself. For example, the locking subsystem can be used for general locking in non-DB applications, and the shared memory buffer pool subsystem can be used for page-based file buffering in memory.

DB Core Data Structure
Database handle structure DB: contains a number of parameters describing the database, such as the access method type, the logical page size, and the database name. The DB structure also contains a large number of function pointers for database operations, most of which have the form (*dosomething)(DB *, arg1, arg2, ...). The most important ones are open, close, put, and get.

Database record structure DBT: a record in DB consists of a key and its data, and both are represented by the DBT structure. In fact, a key can be regarded as a special kind of data. The two most important fields in the structure are void *data and u_int32_t size, which hold a pointer to the data and the length of the data respectively.

Database cursor structure DBC: the cursor is a common concept in database applications. It is essentially a tool for traversing records. Note that DB supports duplicate records, that is, multiple records that share the same key. Using a cursor is the easiest way to process duplicate records.

Database environment handle structure DB_ENV: the environment is an advanced feature of DB. In essence, an environment is a wrapper around multiple databases. When one or more databases are opened within an environment, the environment can provide various subsystem services to them, for example multi-thread/multi-process support, transaction support, high-performance support, and log-based recovery.

The core data structures in DB must be initialized before use. You then call the functions (through the pointers) in the structure to perform operations, and finally you must close the structure. As a design approach, this is a model example of doing object-oriented programming in a procedural language.
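
The create / call-through-pointers / close life cycle described above can be sketched with a toy handle. This is only an illustration of the pattern: MINI_STORE, mini_store_create, and the slot table are hypothetical names invented for this sketch and are not part of Berkeley DB.

```c
#include <assert.h>
#include <stdlib.h>

#define SLOT_COUNT 8

/* Hypothetical stand-in for a DB-style handle: private state plus
 * function pointers. None of these names come from Berkeley DB. */
typedef struct mini_store MINI_STORE;
struct mini_store {
    int slots[SLOT_COUNT];                       /* stored values */
    int used[SLOT_COUNT];                        /* 1 if the slot holds a value */
    int (*put)(MINI_STORE *, int key, int value);
    int (*get)(MINI_STORE *, int key, int *value);
    int (*close)(MINI_STORE *);
};

/* Like the DB calls in this article, every method returns 0 on success. */
static int mini_put(MINI_STORE *sp, int key, int value)
{
    if (key < 0 || key >= SLOT_COUNT)
        return -1;
    sp->slots[key] = value;
    sp->used[key] = 1;
    return 0;
}

static int mini_get(MINI_STORE *sp, int key, int *value)
{
    if (key < 0 || key >= SLOT_COUNT || !sp->used[key])
        return -1;
    *value = sp->slots[key];
    return 0;
}

static int mini_close(MINI_STORE *sp)
{
    free(sp);
    return 0;
}

/* Plays the role that db_create() plays for a DB handle:
 * allocate the structure, zero it, and wire up the methods. */
int mini_store_create(MINI_STORE **spp)
{
    MINI_STORE *sp;

    assert(spp != NULL);
    sp = calloc(1, sizeof(*sp));
    if (sp == NULL)
        return -1;
    sp->put = mini_put;
    sp->get = mini_get;
    sp->close = mini_close;
    *spp = sp;
    return 0;
}
```

A caller would write mini_store_create(&sp), then sp->put(sp, 3, 15) and sp->get(sp, 3, &v), and finally sp->close(sp). Passing the handle as the first argument of each call is what stands in for the implicit "this" of an object-oriented language.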



DB Data Access Algorithm
In the database field, a data access algorithm determines how data is laid out on disk and how it is operated on. When writing an application, choosing an appropriate algorithm can improve speed by one or more orders of magnitude. Most databases use the B+ tree algorithm, and DB is no exception. It also supports the Hash, Recno, and Queue algorithms. The following paragraphs discuss the features of these algorithms and how to choose among them based on the characteristics of the data to be stored.

B+ tree algorithm: a B+ tree is a balanced tree that stores keys in sorted order, and its structure is adjusted dynamically as data is inserted and deleted. To simplify the code, DB does not compress key prefixes. Lookup, insertion, and deletion in a B+ tree take time logarithmic in the number of records. The key can be any data structure.
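
"Sorted order" is defined by a key comparison function. The sketch below assumes the byte-wise (lexical) comparison, with shorter keys sorting before longer ones, that I understand DB's B+ tree to use when the application supplies no comparison function of its own. The function names lexical_compare and encode_be32 are hypothetical helpers written for this illustration, not DB calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare two keys byte by byte. If one key is a prefix of the other,
 * the shorter key sorts first. Returns <0, 0, or >0 like memcmp(). */
int lexical_compare(const void *a, size_t alen, const void *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    int c;

    assert(a != NULL && b != NULL);
    c = memcmp(a, b, n);
    if (c != 0)
        return c;
    if (alen == blen)
        return 0;
    return alen < blen ? -1 : 1;
}

/* Encode an unsigned 32-bit integer most significant byte first, so that
 * byte-wise order of the encoded keys matches numeric order. */
void encode_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}
```

Under this ordering the key "apple" sorts before "apples". It also shows a common pitfall: an int stored in little-endian byte order does not sort numerically (1 is the bytes 01 00 00 00 and 256 is 00 01 00 00, so 256 sorts first), which is why integer keys are often encoded big-endian or paired with a custom comparison function.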

Hash algorithm: DB actually uses extended linear hashing, which adjusts the hash table as it grows. The key can be any data structure.
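
To show what "adjusting as the table grows" means, here is a minimal sketch of the textbook linear hashing address calculation. It is not DB's actual implementation (DB's extended variant differs in its details), and the LH_STATE structure and lh_* functions are names assumed for this example. The table grows one bucket at a time: a split pointer marks the next bucket to split, and only keys that fall in already-split buckets are rehashed with the doubled modulus.

```c
#include <assert.h>
#include <stdint.h>

/* State of a textbook linear hash table (hypothetical, for illustration). */
typedef struct {
    uint32_t initial_buckets; /* N: bucket count when the table was created */
    uint32_t level;           /* L: number of completed doubling rounds */
    uint32_t split;           /* p: next bucket to be split in this round */
} LH_STATE;

/* Number of buckets currently in the table: N * 2^L + p. */
uint32_t lh_bucket_count(const LH_STATE *s)
{
    return (s->initial_buckets << s->level) + s->split;
}

/* Map a hash value to a bucket number. */
uint32_t lh_bucket(const LH_STATE *s, uint32_t hash)
{
    uint32_t round_size;
    uint32_t b;

    assert(s->initial_buckets > 0);
    round_size = s->initial_buckets << s->level;
    b = hash % round_size;
    if (b < s->split)                 /* bucket already split this round */
        b = hash % (round_size * 2);  /* use the doubled modulus */
    return b;
}

/* Record that bucket `split` has been split, adding one bucket.
 * (Moving the affected records is omitted from this sketch.) */
void lh_grow(LH_STATE *s)
{
    uint32_t round_size = s->initial_buckets << s->level;

    s->split++;
    if (s->split == round_size) {     /* every bucket split: round complete */
        s->level++;
        s->split = 0;
    }
}
```

Starting from 4 buckets, one call to lh_grow() splits bucket 0: hash value 12, which used to map to bucket 0, now maps to the new bucket 4, while hash value 13 still maps to bucket 1. No single insert ever has to rehash the whole table, which is the point of the technique.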

Recno algorithm: each record has a logical record number, which is generated by the algorithm itself. This is the same idea as an auto-increment integer used as the logical primary key in a relational database. Recno is built on top of the B+ tree algorithm and provides an interface for storing ordered data. Records can be fixed-length or variable-length.

Queue algorithm: similar to Recno, except that records must be fixed-length. Data is stored in the queue as fixed-length records. An insert appends the record to the tail of the queue, which makes it the fastest of the four algorithms for inserts.

When choosing an algorithm, first look at the type of the key. If it is a complex type, only the B+ tree or Hash algorithm can be used. If the key is a logical record number, choose Recno or Queue. The B+ tree algorithm is suitable when the keys of the working set are accessed in order. If the working set is large and the keys are essentially randomly distributed, choose the Hash algorithm. The Queue algorithm can only store fixed-length records, but it is more efficient under highly concurrent access. In other cases choose Recno, which stores its data in a flat file format.
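
The selection rules in this paragraph can be written down as a small decision function. This is only a sketch of the rules of thumb above: the WORKLOAD fields and the ACCESS_METHOD enum are hypothetical names for this example (in a real program the result would correspond to the DB_BTREE, DB_HASH, DB_RECNO, or DB_QUEUE type passed to DB->open()), and falling back to the B+ tree when neither rule for complex keys applies is my own assumption.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    METHOD_BTREE,
    METHOD_HASH,
    METHOD_RECNO,
    METHOD_QUEUE
} ACCESS_METHOD;

/* Characteristics of the data and its access pattern (all flags are 0 or 1). */
typedef struct {
    int key_is_record_number;  /* key is a logical record number */
    int fixed_length_records;  /* every record has the same length */
    int high_concurrency;      /* many concurrent readers and writers */
    int large_random_keys;     /* large working set, keys randomly distributed */
} WORKLOAD;

ACCESS_METHOD choose_method(const WORKLOAD *w)
{
    assert(w != NULL);
    if (w->key_is_record_number) {
        /* Queue requires fixed-length records and pays off under concurrency. */
        if (w->fixed_length_records && w->high_concurrency)
            return METHOD_QUEUE;
        return METHOD_RECNO;
    }
    /* Complex keys: only B+ tree or Hash are possible. */
    if (w->large_random_keys)
        return METHOD_HASH;
    return METHOD_BTREE;  /* assumed default, good for ordered access */
}
```

For example, a log of fixed-size events keyed by sequence number and written by many threads would select Queue, while a table of customers keyed by name and read in alphabetical order would select the B+ tree.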

Examples of Common DB Functions



#include <db.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

/* DB functions return 0 on success; any other value is an error code */
void print_error(int ret)
{
    if (ret != 0)
        printf("ERROR: %s\n", db_strerror(ret));
}

/* A DBT must be zeroed before use; otherwise the program may compile but report a parameter error at run time */
void init_dbt(DBT *key, DBT *data)
{
    memset(key, 0, sizeof(DBT));
    memset(data, 0, sizeof(DBT));
}

int main(void)
{
    DB *dbp;
    DBT key, data;
    u_int32_t flags;
    int ret;

    char *fruit = "apple";
    int number = 15;

    typedef struct customer
    {
        int c_id;
        char name[10];
        char address[20];
        int age;
    } CUSTOMER;
    CUSTOMER cust;
    int key_cust_c_id = 1;

    /* Zero the structure first so the strings are always terminated */
    memset(&cust, 0, sizeof(CUSTOMER));
    cust.c_id = 1;
    strncpy(cust.name, "javer", 9);
    strncpy(cust.address, "Chengdu", 19);
    cust.age = 32;

    /* First create a database handle */
    ret = db_create(&dbp, NULL, 0);
    print_error(ret);

    /* Create the database file if it does not exist yet */
    flags = DB_CREATE;

    /* Create a database named single.db that uses the B+ tree access algorithm. This part of the code demonstrates simple data types */
    ret = dbp->open(dbp, NULL, "single.db", NULL, DB_BTREE, flags, 0);
    print_error(ret);

    init_dbt(&key, &data);

    /* Point the key and the data at their values and set the lengths */
    key.data = fruit;
    key.size = strlen(fruit) + 1;
    data.data = &number;
    data.size = sizeof(int);

    /* Write the record to the database. DB_NOOVERWRITE means an existing record with the same key is not overwritten */
    ret = dbp->put(dbp, NULL, &key, &data, DB_NOOVERWRITE);
    print_error(ret);

    /* Manually flush cached data to the file on disk. In fact, the data is flushed automatically when the database is closed */
    dbp->sync(dbp, 0);

    init_dbt(&key, &data);

    key.data = fruit;
    key.size = strlen(fruit) + 1;

    /* Fetch the record whose key is "apple" from the database */
    ret = dbp->get(dbp, NULL, &key, &data, 0);
    print_error(ret);

    /* Note that the data field of a DBT has type void *, so a cast is needed when storing into it and when reading from it */
    if (ret == 0)
        printf("The number = %d\n", *(int *)(data.data));

    if (dbp != NULL)
        dbp->close(dbp, 0);

    ret = db_create(&dbp, NULL, 0);
    print_error(ret);

    flags = DB_CREATE;

    /* Create a database named complex.db that uses the Hash access algorithm. This part of the code demonstrates complex data structures */
    ret = dbp->open(dbp, NULL, "complex.db", NULL, DB_HASH, flags, 0);
    print_error(ret);

    init_dbt(&key, &data);

    key.size = sizeof(int);
    key.data = &(cust.c_id);

    data.size = sizeof(CUSTOMER);
    data.data = &cust;

    ret = dbp->put(dbp, NULL, &key, &data, DB_NOOVERWRITE);
    print_error(ret);

    /* Clear the structure so that the values printed below must come from the database */
    memset(&cust, 0, sizeof(CUSTOMER));

    key.size = sizeof(int);
    key.data = &key_cust_c_id;

    /* DB_DBT_USERMEM tells DB to copy the record into our own buffer of ulen bytes */
    data.data = &cust;
    data.ulen = sizeof(CUSTOMER);
    data.flags = DB_DBT_USERMEM;

    ret = dbp->get(dbp, NULL, &key, &data, 0);
    print_error(ret);

    printf("c_id = %d name = %s address = %s age = %d\n",
        cust.c_id, cust.name, cust.address, cust.age);

    if (dbp != NULL)
        dbp->close(dbp, 0);

    return 0;
}

DB cursor usage example
The cursor depends on the database handle. The application code framework is as follows:



/* Define a cursor variable */
DBC *cur;
/* First open the database and then open the cursor */
dbp->open(dbp, ......);
dbp->cursor(dbp, NULL, &cur, 0);

/* Do something with the cursor */

/* Close the cursor first, then the database */
cur->c_close(cur);
dbp->close(dbp, 0);

After a cursor is opened, you can traverse records in several ways.



memset(&key, 0, sizeof(DBT));
memset(&data, 0, sizeof(DBT));

/* Because the key and data are empty, the cursor traverses every record in the database */
while ((ret = cur->c_get(cur, &key, &data, DB_NEXT)) == 0)
{
    /* Do something with key and data */
}

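If you want to query the records for a specific key, assign a value to the key and pass the DB_SET flag to cur->c_get(). DB_SET positions the cursor on the first record with that key; repeating the call with DB_SET would return the same record every time, so any further duplicates are fetched with DB_NEXT_DUP. For example:

key.data = "xxxxx";
key.size = XXX;
for (ret = cur->c_get(cur, &key, &data, DB_SET);
     ret == 0;
     ret = cur->c_get(cur, &key, &data, DB_NEXT_DUP))
{
    /* do something with key and data */
}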

There are many other functions of the cursor, such as querying multiple records, inserting, modifying, and deleting records.

DB environment example
The environment wraps a group of databases and provides a variety of advanced features. The application code framework is as follows:



/* Define an environment handle and create it */
DB_ENV *dbenv;
db_env_create(&dbenv, 0);

/* Before the environment is opened, it can be configured with functions of the form dbenv->set_XXX() */
/* Tell DB to encrypt the data with the Rijndael (AES) algorithm (reference 4) */
dbenv->set_encrypt(dbenv, "encrypt_string", DB_ENCRYPT_AES);
/* Set the database cache to 5 MB */
dbenv->set_cachesize(dbenv, 0, 5 * 1024 * 1024, 0);
/* Set the directory in which DB looks for database files */
dbenv->set_data_dir(dbenv, "/usr/javer/work_db");

/* Open the database environment. The last four flags start the logging, locking, cache, and transaction subsystems */
dbenv->open(dbenv, home, DB_CREATE | DB_INIT_LOG | DB_INIT_LOCK | DB_INIT_MPOOL | DB_INIT_TXN, 0);

/* Once the environment is open, several databases can be opened in it, and all operations on them are controlled and protected by the environment. Note that the second parameter of db_create() is the environment handle */

db_create(&dbp1, dbenv, 0);
dbp1->open(dbp1, ......);

db_create(&dbp2, dbenv, 0);
dbp2->open(dbp2, ......);

/* Do something with the databases */

/* Close the open databases first, then close the environment */
dbp2->close(dbp2, 0);
dbp1->close(dbp1, 0);
dbenv->close(dbenv, 0);

Install and Compile DB Software
Download the source package from DB's official site, unpack it, and run the following commands from the build_unix directory:



../dist/configure
make
make install

Run make uninstall to remove the installed DB software.

By default, the DB libraries and header files are installed under the /usr/local/BerkeleyDB.4.3/ directory, and the command gcc test.c -ggdb -I/usr/local/BerkeleyDB.4.3/include/ -L/usr/local/BerkeleyDB.4.3/lib/ -ldb -lpthread compiles the program correctly. If the operating system of the test host is Red Hat 9, the DB version that is already installed may be 4.0. Note that the libraries of these two versions are incompatible. For example, the database open function DB->open() takes 6 input parameters in DB 4.0 and 7 in DB 4.3 (you can compare the header file db.h of the two versions yourself). Almost every DB application calls the open function, so if the call and the installed version do not match, compilation will definitely fail. After compiling, run the ldd command to check which DB library the executable depends on.

Summary
DB is an industrial-strength embedded database system with high data processing efficiency. Its stability has been tested over a long time and proven in a large number of applications. Consider that, for code of the same quality, the number of bugs in a piece of software is roughly proportional to the length of its code: compared with large database packages of tens or hundreds of megabytes, the DB library is less than 500 KB!

In terms of functionality, DB is a lightweight database system, or even a "very" lightweight one. However, in my opinion it is meaningless to compare the quality of tools in absolute terms; what matters is how tools are selected and applied (the ideas of extreme programming seem relevant here). Perhaps the correct way to put it is: in the context of the current application, this tool is the most appropriate choice.
