A Brief Introduction to the Berkeley DB (BDB) Database (Reprint)

Source: Internet
Author: User

I will be using BDB in the near future, so I searched for relevant material. Here is an introductory article to start with:

Reprinted from http://www.javaeye.com/topic/202990

DB Overview
DB was originally developed to replace the old hsearch function and the many dbm implementations (such as dbm, Berkeley's ndbm, and the GNU project's gdbm) with a new hash access algorithm. The first release of DB appeared in 1991 and also included a B+ tree data access algorithm. In 1992, the 4.4BSD UNIX release included DB 1.85, which is generally regarded as the first official version of DB. In mid-1996, Sleepycat Software was founded to provide commercial support for DB. DB has been widely used ever since; at the time of writing, the current version was 4.3.27.

DB supports almost all modern operating systems, such as Linux, UNIX, and Windows, and provides a rich set of application interfaces for C, C++, Java, Perl, Tcl, Python, PHP, and other languages. DB is widely used and can be found inside many well-known software packages. For example, the author of reference 2 describes using DB to implement a kernel-level file system under Linux, and reference 3 presents test data showing that DB improves the efficiency of OpenLDAP. The Linux package manager RPM also uses DB to manage package-related data. Running the file command on the files in the RPM data directory /var/lib/rpm produces output such as the following:

Dirnames: Berkeley DB (Btree, version 9, native byte-order)
Filemd5s: Berkeley DB (Hash, version 8, native byte-order)

It is important to note that DB is an embedded database system, not a conventional relational or object database. It does not support SQL, and it does not provide advanced features common in such databases, such as stored procedures and triggers.

DB Design Philosophy
The design philosophy of DB is to be simple, small, reliable, and fast. If some mainstream database systems can be described as large and all-encompassing, DB can be described as small and refined. DB provides a set of application programming interfaces (APIs) that are very simple to call, and the application and the DB library are compiled and linked together into a single executable. This approach improves DB's efficiency in two ways. First, the DB library and the application run in the same address space, so there is no expensive network communication between a client program and a database server, and no inter-process communication on the local host either. Second, there is no SQL to parse, so access to the data is direct.

DB takes a very simple view of the data it manages. A DB database consists of a number of records, and each record consists of a key and a data item (key/value). The data can be a simple data type or a complex one, such as a C structure. DB does not interpret the data type in any way; that is left entirely to the programmer, in the typical "free" style of C pointers. If a record is viewed as a row of a table with n fields, the first field is the table's primary key and fields 2 through n hold the other data. DB applications typically use several DB databases, which in a sense correspond to several tables in a relational database. The DB library is very compact, under 500 KB, yet it can manage up to 256 TB of data.

The design of DB fully embodies the UNIX tool-based philosophy, in which powerful functionality is built by combining several simple tools. Each of DB's basic functional modules is designed to be independent, meaning that its use is not confined to DB itself. For example, the locking subsystem can be used for general-purpose locking in non-DB applications, and the shared-memory buffer pool subsystem can be used to cache any page-based file in memory.

DB Core Data Structures
Database handle structure DB: Holds a number of parameters describing the database's properties, such as the access method type, logical page size, and database name. The DB structure also contains a large number of function pointers for database operations, mostly of the form (*dosomething)(DB *, arg1, arg2, ...). The most important of these are the open, close, put, and get functions.

Database record structure DBT: Records in DB consist of a key and data, and both the key and the data are represented by the DBT structure. In fact, the key can be regarded as just a special kind of data. The two most important fields of the structure are void *data and u_int32_t size, which hold the data itself and the length of the data, respectively.
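To make the pointer-plus-length idea concrete, here is a minimal sketch in C. The toy_dbt type, the employee record, and both helper functions are hypothetical names invented for this illustration. They are not the real DBT definition, which carries additional fields; the sketch only mirrors the two fields named above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for DBT: a pointer to bytes plus their length. */
typedef struct {
    void    *data;  /* the bytes themselves */
    uint32_t size;  /* number of bytes at data */
} toy_dbt;

/* An application-defined record; the store never looks inside it. */
typedef struct {
    int32_t id;
    char    name[16];
} employee;

/* Wrap any object as an opaque byte string. */
static toy_dbt toy_dbt_wrap(void *p, uint32_t n)
{
    toy_dbt t;
    t.data = p;
    t.size = n;
    return t;
}

/* Copy the bytes back into a typed object; the caller supplies the type.
 * Returns 0 on success, -1 if the stored size does not match. */
static int toy_dbt_unwrap_employee(const toy_dbt *t, employee *out)
{
    if (t->size != sizeof *out)
        return -1;
    memcpy(out, t->data, sizeof *out);
    return 0;
}
```

A caller would wrap an int32_t as the key and an employee as the data. Only the application knows which is which, and that is the "free" pointer style the article describes.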

Database cursor structure DBC: Cursors are a common concept in database applications; a cursor is essentially an iterator positioned on a particular record. Note that DB supports duplicate records, that is, multiple records with the same key, and cursors are the easiest way to work with duplicate records.
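The sketch below illustrates how a cursor can walk the duplicates of one key. It is a toy model built on a sorted in-memory array, not the DBC interface. The names toy_rec, toy_cursor, cursor_set, cursor_next_dup, and sum_duplicates are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *key;
    int         val;
} toy_rec;

/* Hypothetical cursor: a position in an array of records sorted by key. */
typedef struct {
    const toy_rec *recs;
    size_t         n;
    size_t         pos;   /* index of the current record; n means "none" */
} toy_cursor;

/* Position the cursor on the first record with the given key.
 * Returns 0 and stores its value, or -1 if no such key exists. */
static int cursor_set(toy_cursor *c, const char *key, int *val)
{
    size_t i;
    for (i = 0; i < c->n; i++) {
        if (strcmp(c->recs[i].key, key) == 0) {
            c->pos = i;
            *val = c->recs[i].val;
            return 0;
        }
    }
    c->pos = c->n;
    return -1;
}

/* Move to the next record only if it has the same key as the current one. */
static int cursor_next_dup(toy_cursor *c, int *val)
{
    if (c->pos >= c->n || c->pos + 1 >= c->n)
        return -1;
    if (strcmp(c->recs[c->pos].key, c->recs[c->pos + 1].key) != 0)
        return -1;
    c->pos++;
    *val = c->recs[c->pos].val;
    return 0;
}

/* Sum every value stored under one key by walking its duplicates. */
static int sum_duplicates(const toy_rec *recs, size_t n, const char *key)
{
    toy_cursor c = { recs, n, n };
    int v, total = 0;
    if (cursor_set(&c, key, &v) != 0)
        return 0;
    do {
        total += v;
    } while (cursor_next_dup(&c, &v) == 0);
    return total;
}
```

The pattern is to position once and then step until the key changes. A plain get by key cannot express this, because it returns only one record.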

Database environment handle structure DB_ENV: The environment is an advanced feature of DB. Essentially, an environment is a wrapper around multiple databases. When one or more databases are opened within an environment, the environment can provide them with a variety of subsystem services, such as multi-threading and multi-process support, transaction support, high-performance support, and logging and recovery support.

The core data structures in DB must be initialized before they are used. The functions (pointers) in the structure can then be called to carry out the various operations, and finally the data structure must be closed. From a design standpoint, this is a model example of object-oriented programming in a procedural language.
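The create, call-through-pointers, close life cycle described above can be sketched with a toy handle. Everything here (toy_db, toy_db_create, the fixed-size in-memory arrays) is a hypothetical illustration of the pattern and does not reproduce the real DB structure or its function signatures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX 8

/* Hypothetical handle: state plus function pointers, in the style of
 * (*dosomething)(DB *, arg1, arg2, ...). Not the real DB structure. */
typedef struct toy_db toy_db;
struct toy_db {
    char keys[TOY_MAX][16];
    int  vals[TOY_MAX];
    int  count;
    int (*put)(toy_db *, const char *key, int val);
    int (*get)(toy_db *, const char *key, int *val);
    int (*close)(toy_db *);
};

static int toy_put(toy_db *db, const char *key, int val)
{
    int i;
    if (strlen(key) >= sizeof db->keys[0])
        return -1;                       /* key too long */
    for (i = 0; i < db->count; i++) {
        if (strcmp(db->keys[i], key) == 0) {
            db->vals[i] = val;           /* overwrite existing key */
            return 0;
        }
    }
    if (db->count == TOY_MAX)
        return -1;                       /* store is full */
    strcpy(db->keys[db->count], key);
    db->vals[db->count++] = val;
    return 0;
}

static int toy_get(toy_db *db, const char *key, int *val)
{
    int i;
    for (i = 0; i < db->count; i++) {
        if (strcmp(db->keys[i], key) == 0) {
            *val = db->vals[i];
            return 0;
        }
    }
    return -1;                           /* not found */
}

static int toy_close(toy_db *db)
{
    free(db);                            /* handle is invalid afterwards */
    return 0;
}

/* "Constructor": allocate the handle and wire up its methods. */
static int toy_db_create(toy_db **out)
{
    toy_db *db = calloc(1, sizeof *db);
    if (db == NULL)
        return -1;
    db->put = toy_put;
    db->get = toy_get;
    db->close = toy_close;
    *out = db;
    return 0;
}
```

A caller would run toy_db_create(&db), then db->put(db, "apple", 1) and db->get(db, "apple", &v), and finally db->close(db). Passing the handle as the first argument of every call plays the role that the implicit "this" pointer plays in an object-oriented language.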

DB Data Access Algorithms
In the database field, a data access algorithm corresponds to the storage format of the data on disk and the way it is operated on. When writing an application, choosing the right algorithm can improve speed by one or more orders of magnitude. Most databases use the B+ tree algorithm, and DB is no exception; it also supports the Hash, Recno, and Queue algorithms. The following paragraphs discuss the characteristics of these algorithms and how to choose among them based on the characteristics of the data to be stored.

B+ tree algorithm: A B+ tree is a balanced tree that stores keys in sorted order, and its structure adjusts dynamically as data is inserted and deleted. To keep the code simple, DB does not implement prefix compression of keys. Because the tree stays balanced, lookups, insertions, and deletions take time logarithmic in the number of records. Keys can be arbitrary data structures.

Hash algorithm: DB actually uses extended linear hashing, which adjusts itself as the hash table grows. Keys can be arbitrary data structures.
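As a rough illustration of how a hash table can grow one bucket at a time, the following sketch shows the address calculation of plain textbook linear hashing. It is not DB's extended variant and does not reflect DB's on-disk layout. The names lh_state, lh_buckets, lh_bucket, and lh_grow are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* State of a textbook linear hash table (not DB's actual layout). */
typedef struct {
    uint32_t n0;     /* initial number of buckets */
    uint32_t level;  /* number of completed doubling rounds */
    uint32_t split;  /* next bucket to be split in this round */
} lh_state;

/* Number of buckets currently in use. */
static uint32_t lh_buckets(const lh_state *s)
{
    return (s->n0 << s->level) + s->split;
}

/* Map a hash value to a bucket. Buckets below the split pointer have
 * already been split, so they use the next round's larger modulus. */
static uint32_t lh_bucket(const lh_state *s, uint32_t hash)
{
    uint32_t round = s->n0 << s->level;
    uint32_t b = hash % round;
    if (b < s->split)
        b = hash % (round * 2);
    return b;
}

/* Grow the table by one bucket: advance the split pointer, and start a
 * new round once every bucket of the current round has been split. */
static void lh_grow(lh_state *s)
{
    s->split++;
    if (s->split == (s->n0 << s->level)) {
        s->level++;
        s->split = 0;
    }
}
```

Each call to lh_grow adds exactly one bucket and requires rehashing only the records of the bucket that was split, so the table never has to be rebuilt in one step. Moving the records of the split bucket is omitted here for brevity.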

Recno algorithm: Requires every record to have a logical record number, which is generated by the algorithm itself. This is the same concept as a surrogate primary key in a relational database, which is usually defined as an auto-increment integer. Recno is built on top of the B+ tree algorithm and provides an interface for storing ordered data. Records can be fixed-length or variable-length.

Queue algorithm: Similar to Recno, except that records must be fixed-length. Data is stored in the queue as fixed-length records, and an insert appends the record to the tail of the queue, which gives the fastest insertion speed of the four algorithms.
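The following sketch shows why fixed-length records with generated record numbers are cheap to append and to fetch. It is an in-memory toy, not the Queue access method. The names toy_queue, toy_queue_append, and toy_queue_get, the constants REC_LEN and MAX_RECS, and the choice of zero padding are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REC_LEN  8   /* every record occupies exactly this many bytes */
#define MAX_RECS 16

/* Hypothetical fixed-length record queue backed by one flat buffer. */
typedef struct {
    unsigned char buf[REC_LEN * MAX_RECS];
    uint32_t      count;  /* records stored so far */
} toy_queue;

/* Append at the tail. The record number is assigned by the queue
 * (1-based), never chosen by the caller. Returns 0 if the queue is
 * full or the data does not fit in one record. */
static uint32_t toy_queue_append(toy_queue *q, const void *data, uint32_t size)
{
    unsigned char *slot;
    if (q->count == MAX_RECS || size > REC_LEN)
        return 0;
    slot = q->buf + (size_t)q->count * REC_LEN;
    memset(slot, 0, REC_LEN);          /* pad short records with zeros */
    memcpy(slot, data, size);
    return ++q->count;
}

/* Fetch by record number: the offset is simple arithmetic, no search. */
static const unsigned char *toy_queue_get(const toy_queue *q, uint32_t recno)
{
    if (recno == 0 || recno > q->count)
        return NULL;
    return q->buf + (size_t)(recno - 1) * REC_LEN;
}
```

Because record n always lives at offset (n - 1) * REC_LEN, an append touches only the tail and a fetch needs no tree traversal. This is the intuition behind the fast inserts the article attributes to the Queue algorithm.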

The choice of algorithm depends first on the type of the key. If the key is a complex type, only the B+ tree or Hash algorithm can be used; if the key is a logical record number, Recno or Queue should be chosen. The B+ tree algorithm is more appropriate when the keys in the working set are ordered, and the Hash algorithm should be chosen when the working set is large and the keys are essentially randomly distributed. The Queue algorithm can only store fixed-length records, and it is more efficient under highly concurrent workloads. In other cases choose Recno, which stores the data in flat-file format.
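The selection rules in this paragraph can be written down as a small decision function. This is only a sketch of the article's rules of thumb. The workload structure, its field names, and the access_method enum are hypothetical and are not DB types. The code also simplifies by choosing Hash whenever complex keys are not accessed in order.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { AM_BTREE, AM_HASH, AM_RECNO, AM_QUEUE } access_method;

/* Hypothetical description of a workload; field names are illustrative. */
typedef struct {
    bool key_is_record_number;  /* keys are logical record numbers */
    bool keys_ordered;          /* keys in the working set are ordered */
    bool fixed_length_records;  /* every record has the same length */
    bool high_concurrency;      /* many concurrent operations */
} workload;

static access_method choose_access_method(const workload *w)
{
    if (w->key_is_record_number) {
        /* Queue needs fixed-length records and pays off under
         * high concurrency; otherwise fall back to Recno. */
        if (w->fixed_length_records && w->high_concurrency)
            return AM_QUEUE;
        return AM_RECNO;
    }
    /* Complex keys: B+ tree for ordered keys, Hash otherwise
     * (a simplification of "large and randomly distributed"). */
    return w->keys_ordered ? AM_BTREE : AM_HASH;
}
```

In practice the choice is best confirmed by measuring with representative data, since the rules above are only heuristics.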
