I. Introduction to Berkeley DB
(1) Berkeley DB is an embedded database suited to managing large volumes of simple data. For example, Google uses it to store account information, and Heritrix uses it to store its Frontier.
(2) Berkeley DB manages data as key/value pairs; each key/value pair is one record.
(3) Under the hood, Berkeley DB is built on a B-tree, so it can be thought of as a HashMap that can hold very large amounts of data.
(4) It is an Oracle product. The C/C++ version came first, and Java and other versions followed. It does not support SQL; applications operate on the database through its API.
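Point (3) compares the B-tree to a HashMap, but one difference matters in practice: a B-tree keeps its keys sorted, so a cursor visits them in key order. The sketch below is an in-memory analogy built on java.util.TreeMap, not Berkeley DB code. It assumes the default key ordering of Berkeley DB Java Edition (JE, used later in this article), an unsigned byte-by-byte comparison of the key bytes; the class name SortedKeyValueStore is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory analogy: a sorted key/value store whose keys are
// UTF-8 byte arrays compared as unsigned bytes (assumed JE default ordering).
class SortedKeyValueStore {
    // Unsigned lexicographic comparison; on a common prefix, the shorter key sorts first
    static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    private final TreeMap<byte[], byte[]> tree = new TreeMap<>(UNSIGNED_BYTES);

    // Each key/value pair is one record
    void put(String key, String value) {
        tree.put(key.getBytes(StandardCharsets.UTF_8), value.getBytes(StandardCharsets.UTF_8));
    }

    // Returns null when the key is absent
    String get(String key) {
        byte[] value = tree.get(key.getBytes(StandardCharsets.UTF_8));
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }

    // Keys in the order a forward cursor scan would visit them
    List<String> keysInOrder() {
        List<String> keys = new ArrayList<>();
        for (byte[] k : tree.keySet()) {
            keys.add(new String(k, StandardCharsets.UTF_8));
        }
        return keys;
    }
}
```

With numbers stored as text keys, this byte ordering puts "10" before "2", which is worth remembering when iterating over such a database.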
The following content is reproduced from Baidu Library (Baidu Wenku):
Berkeley DB is an open-source embedded database library developed by Sleepycat Software in the United States. It provides applications with scalable, high-performance, transaction-protected data management services, exposed through a concise API of function calls for data access and management.
It follows the classic C library toolkit model: it gives programmers a broad set of functions designed to provide application developers with industrial-strength database services. Its main features are as follows:
Embedded: It is linked directly into the application and runs in the same address space, so database operations need no interprocess communication, either between machines on a network or between processes on the same machine.
Berkeley DB provides APIs for many programming languages, including C, C++, Java, Perl, Tcl, Python, and PHP, and all database operations happen inside the library. Multiple processes, or multiple threads within one process, can use the database at the same time as though each had it to itself; underlying services such as locking, transaction logging, shared buffer management, and memory management are handled transparently by the library.
Portable (lightweight and flexible): It runs on almost all UNIX and Linux systems and their variants, on Windows, and on a variety of embedded real-time operating systems, in both 32-bit and 64-bit environments. It has been used in high-end Internet servers, desktops, PDAs, set-top boxes, network switches, and many other areas. Once Berkeley DB is linked into an application, end users generally cannot tell that a database system is present at all.
Scalable: This shows in several ways. The library itself is very compact (under 300KB of code), yet it can manage databases up to 256TB in size. It supports high concurrency, with thousands of users operating on the same database at once. Berkeley DB is small enough to run on tightly constrained embedded systems, and on high-end servers it can use gigabytes of memory and terabytes of disk space.
Berkeley DB outperforms relational and object-oriented databases in embedded applications for two reasons:
(1) Because the database library runs in the same address space as the application, database operations need no interprocess communication. Communication between processes on one machine, or between machines on a network, costs far more than a function call;
(2) Because all operations go through a single set of API calls, there is no query language to parse and no execution plan to generate, which greatly improves efficiency.
Berkeley DB System Structure
Berkeley DB consists of five major subsystems: the access methods subsystem, the memory pool subsystem, the transaction subsystem, the lock subsystem, and the log subsystem. The access methods subsystem is the core component inside the Berkeley DB library, while the other subsystems sit outside it.
Each subsystem supports a different level of application functionality.
1. Data Access Subsystem
The data access methods subsystem supports creating and accessing database files. Berkeley DB offers four file storage methods:
hash files, B-trees, fixed-length records (Queue), and variable-length records (Recno, simple storage keyed by record number). The application can choose whichever file organization suits it best.
When creating a table, a programmer can use any of these structures, and files of different storage types can be mixed in the same application.
Without transaction management, the modules in this subsystem can be used on their own to give applications fast, efficient data access.
The data access subsystem is intended for applications that need fast access to formatted files without transactions.
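Record-number access is what makes fixed-length (Queue) storage fast: with every record the same length, record number n sits at a computable offset, so no search is needed. The class below is a hypothetical, in-memory simplification, not Berkeley DB's on-disk format. Like Berkeley DB's Queue and Recno methods it numbers records from 1, and it pads short records with a configurable pad byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical in-memory model of fixed-length, record-number-addressed storage
class FixedLengthRecordFile {
    private final int recordLength;
    private final byte padByte;
    private byte[] storage = new byte[0];
    private int recordCount = 0;

    FixedLengthRecordFile(int recordLength, byte padByte) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        this.recordLength = recordLength;
        this.padByte = padByte;
    }

    // Appends a record padded to recordLength and returns its record number (1-based)
    int append(String value) {
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        if (data.length > recordLength) {
            throw new IllegalArgumentException("record longer than " + recordLength + " bytes");
        }
        byte[] record = new byte[recordLength];
        Arrays.fill(record, padByte);
        System.arraycopy(data, 0, record, 0, data.length);
        // Grow the backing array by one record (simple, not efficient)
        storage = Arrays.copyOf(storage, storage.length + recordLength);
        System.arraycopy(record, 0, storage, recordCount * recordLength, recordLength);
        recordCount++;
        return recordCount;
    }

    // Record n starts at byte offset (n - 1) * recordLength, so lookup is pure arithmetic
    String get(int recno) {
        if (recno < 1 || recno > recordCount) {
            throw new IndexOutOfBoundsException("no record " + recno);
        }
        int offset = (recno - 1) * recordLength;
        return new String(storage, offset, recordLength, StandardCharsets.UTF_8);
    }
}
```

The returned strings include the padding, which mirrors how fixed-length records come back padded to their full length.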
2. Memory Pool Management Subsystem
The memory pool subsystem manages the shared buffers used by Berkeley DB. It lets multiple threads that access the database concurrently share one cache, writes modified pages back to the file, and allocates memory for pages newly read in. It can also be used independently of the rest of Berkeley DB, so an application can manage buffering for its own files and pages. The memory pool subsystem is intended for applications that need flexible, page-oriented, buffered shared file access.
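The write-back behaviour described above can be illustrated with a toy buffer pool. This is a simplified sketch, not Berkeley DB's implementation: it assumes an LRU eviction policy (built on LinkedHashMap's access order) and uses a Map as a stand-in for the database file. BufferPool and Page are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy buffer pool; a Map stands in for the database file
class BufferPool {
    static final class Page {
        final int pageNo;
        String contents;
        boolean dirty;

        Page(int pageNo, String contents) {
            this.pageNo = pageNo;
            this.contents = contents;
        }
    }

    private final int capacity;
    private final Map<Integer, String> disk;
    private final LinkedHashMap<Integer, Page> cache;
    int diskReads = 0;
    int diskWrites = 0;

    BufferPool(int capacity, Map<Integer, String> disk) {
        this.capacity = capacity;
        this.disk = disk;
        // accessOrder = true: iteration runs from least to most recently used
        this.cache = new LinkedHashMap<Integer, Page>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Page> eldest) {
                if (size() > BufferPool.this.capacity) {
                    writeBack(eldest.getValue()); // a modified page must reach the file before eviction
                    return true;
                }
                return false;
            }
        };
    }

    // Returns the cached page, reading it from the file on a miss
    Page fetch(int pageNo) {
        Page page = cache.get(pageNo);
        if (page == null) {
            diskReads++;
            page = new Page(pageNo, disk.getOrDefault(pageNo, ""));
            cache.put(pageNo, page);
        }
        return page;
    }

    // Changes the page in memory only and marks it dirty
    void update(int pageNo, String contents) {
        Page page = fetch(pageNo);
        page.contents = contents;
        page.dirty = true;
    }

    // Writes every dirty page back, e.g. on close
    void flush() {
        for (Page page : cache.values()) {
            writeBack(page);
        }
    }

    private void writeBack(Page page) {
        if (page.dirty) {
            disk.put(page.pageNo, page.contents);
            diskWrites++;
            page.dirty = false;
        }
    }
}
```

Only dirty pages cost a write, and only when they are evicted or flushed; clean pages are simply dropped.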
3. Transaction Subsystem
The transaction subsystem provides transaction management for Berkeley DB. It lets a set of modifications to the database be treated as one atomic unit: either all of them take effect or none do. By default the system provides strict ACID transaction properties, but an application can choose to relax the isolation guarantees. The subsystem uses two-phase locking and a write-ahead logging policy to ensure the correctness and consistency of the data. An application can also use it on its own to protect updates to its own data. The transaction subsystem suits applications that need transactional guarantees for data modification.
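To make "atomic unit" and "two-phase locking" concrete, here is a toy transaction. It is an illustrative sketch, not how Berkeley DB works internally: it takes an exclusive lock per key during its growing phase and releases every lock only at commit or abort (strict two-phase locking). As a simplification it buffers writes until commit instead of logging them. LockTable and TwoPhaseTxn are hypothetical names.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock table: one exclusive lock per key
class LockTable {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }
}

// Hypothetical transaction using strict two-phase locking
class TwoPhaseTxn {
    private final LockTable lockTable;
    private final Map<String, String> store;
    private final Map<String, String> pendingWrites = new LinkedHashMap<>();
    private final Set<String> lockedKeys = new HashSet<>();
    private boolean finished = false;

    TwoPhaseTxn(LockTable lockTable, Map<String, String> store) {
        this.lockTable = lockTable;
        this.store = store;
    }

    // Growing phase: lock the key (blocking if another thread holds it), then buffer the write
    void write(String key, String value) {
        checkOpen();
        if (lockedKeys.add(key)) {
            lockTable.lockFor(key).lock();
        }
        pendingWrites.put(key, value);
    }

    // All buffered writes become visible together
    void commit() {
        checkOpen();
        store.putAll(pendingWrites);
        releaseAll();
    }

    // None of the buffered writes become visible
    void abort() {
        checkOpen();
        pendingWrites.clear();
        releaseAll();
    }

    // Shrinking phase: every lock is released at once, at the end of the transaction
    private void releaseAll() {
        for (String key : lockedKeys) {
            lockTable.lockFor(key).unlock();
        }
        lockedKeys.clear();
        finished = true;
    }

    private void checkOpen() {
        if (finished) {
            throw new IllegalStateException("transaction already ended; 2PL forbids new locks");
        }
    }
}
```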
4. Lock Subsystem
The lock subsystem provides Berkeley DB's locking mechanism, allowing either multiple readers or a single writer on the same shared object. The data access subsystem uses it to obtain read and write access to pages or records, and the transaction subsystem uses it to implement concurrency control across transactions. An application can also use it on its own. The lock subsystem suits applications that need a flexible, fast, configurable lock manager.
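The "many readers or one writer" rule is the same shared/exclusive semantics that java.util.concurrent.locks.ReentrantReadWriteLock provides, so it makes a convenient stand-in. The sketch below is an analogy, not Berkeley DB's lock manager: while one thread holds a read lock, a second thread probes both lock modes with tryLock. RecordLockDemo is a hypothetical name.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo: shared (read) vs exclusive (write) access to one object
class RecordLockDemo {
    // While this thread holds the read lock, a second thread tries both lock modes.
    // Returns {secondReaderAdmitted, writerAdmitted}.
    static boolean[] probeWhileReading() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        boolean[] result = new boolean[2];
        lock.readLock().lock();
        try {
            Thread other = new Thread(() -> {
                if (lock.readLock().tryLock()) {    // shared mode: readers coexist
                    result[0] = true;
                    lock.readLock().unlock();
                }
                if (lock.writeLock().tryLock()) {   // exclusive mode: refused while a reader is active
                    result[1] = true;
                    lock.writeLock().unlock();
                }
            });
            other.start();
            other.join(); // join() also makes the other thread's writes to result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.readLock().unlock();
        }
        return result;
    }
}
```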
5. Log Subsystem
The log subsystem uses a write-ahead logging policy to support recovery and data consistency in the transaction subsystem. Applications are unlikely to use it on its own; it serves mainly as a module called by the transaction subsystem. Together, these parts make up the entire Berkeley DB system, and they relate to one another as follows.
In this model, the application calls the data access subsystem and the transaction subsystem directly, and these in turn call the lower-level memory pool, lock, and log subsystems.
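Write-ahead logging means a change is recorded in the log before the data page itself is modified, so after a crash the log alone is enough to rebuild committed work. The toy below illustrates this under simplifying assumptions: a crash loses all data pages, and recovery replays the whole log (redo only, no checkpoints). Berkeley DB's real recovery is more involved. WalStore and LogRecord are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy store illustrating write-ahead logging
class WalStore {
    static final class LogRecord {
        final int txnId;
        final String key;      // null for a commit record
        final String value;
        final boolean commit;

        LogRecord(int txnId, String key, String value, boolean commit) {
            this.txnId = txnId;
            this.key = key;
            this.value = value;
            this.commit = commit;
        }
    }

    final List<LogRecord> log = new ArrayList<>();   // stands in for the durable log file
    Map<String, String> pages = new HashMap<>();      // stands in for data pages a crash can lose

    void write(int txnId, String key, String value) {
        log.add(new LogRecord(txnId, key, value, false)); // 1. the log record goes first...
        pages.put(key, value);                             // 2. ...then the data page changes
    }

    void commit(int txnId) {
        log.add(new LogRecord(txnId, null, null, true));
    }

    // Simulated crash: data pages are lost, the log survives
    void crash() {
        pages = new HashMap<>();
    }

    // Redo-only recovery: replay the updates of committed transactions, in log order
    void recover() {
        Set<Integer> committed = new HashSet<>();
        for (LogRecord r : log) {
            if (r.commit) {
                committed.add(r.txnId);
            }
        }
        Map<String, String> rebuilt = new HashMap<>();
        for (LogRecord r : log) {
            if (!r.commit && committed.contains(r.txnId)) {
                rebuilt.put(r.key, r.value);
            }
        }
        pages = rebuilt;
    }
}
```

Because uncommitted updates are never replayed, a transaction that had not committed before the crash simply disappears, which is the "all or nothing" property seen from the recovery side.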
Because the subsystems are relatively independent, an application can specify at startup which data management services it will use: all of them, or only some. For example, an application that needs multi-user concurrency but not transactions can use the lock subsystem without the transaction subsystem. An application that needs only fast, single-user, non-transactional B-tree storage can disable both the lock and transaction subsystems, reducing overhead.
Berkeley DB Storage Features Overview
The logical unit of data that Berkeley DB manages is a set of independent or related databases, each made up of records, all in (key, value) form. If a set of related (key, value) pairs is regarded as a table, each database holds exactly one table, which differs from a general relational database. In effect, a Berkeley DB "database" corresponds to a table in a relational database system, and a key/data pair corresponds to a row. Berkeley DB gives no direct access to columns; instead, the application encapsulates fields (columns) inside the data item of each key/data pair.
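Since Berkeley DB stores only bytes, "encapsulating fields in the data item" means the application serializes its columns itself. A minimal sketch, assuming a made-up student row whose id would be the key and whose remaining columns are packed, in a fixed order, with java.io.DataOutputStream. StudentRecord is a hypothetical class; JE also ships its own tuple bindings in com.sleepycat.bind.tuple for the same job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical row: the student id would be the key; these columns go into the data item
class StudentRecord {
    final String name;
    final int age;
    final double score;

    StudentRecord(String name, int age, double score) {
        this.name = name;
        this.age = age;
        this.score = score;
    }

    // Serialize the columns in a fixed order
    byte[] toData() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name);      // 2-byte length prefix + modified UTF-8
            out.writeInt(age);       // 4 bytes
            out.writeDouble(score);  // 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
        return bytes.toByteArray();
    }

    // Read the columns back in the same order they were written
    static StudentRecord fromData(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new StudentRecord(in.readUTF(), in.readInt(), in.readDouble());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The byte array from toData() is what would be wrapped in a DatabaseEntry as the data item; adding a column later means every reader must agree on the new layout.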
Physically, when each database is created, the application chooses a storage structure that fits its data. The four structures available are hash files, B-trees, fixed-length records (Queue), and variable-length records (Recno, simple storage based on record numbers).
A physical file can hold a single database or several related or unrelated databases, and those databases may use any organization except Queue: a Queue database must be stored alone in its own file and cannot be mixed with other storage types.
Apart from limits on maximum file size and storage space, a file can in theory hold any number of databases. Locating a database therefore usually takes two parameters, the file name and the database name, which is another way Berkeley DB differs from a general relational database.
The Berkeley DB storage system provides a range of interface functions for applications to manage and operate on databases, including the following:
(1) Creating, opening, closing, deleting, and renaming databases, as well as retrieving, adding, and deleting data;
(2) Auxiliary functions such as reading database statistics, file information, and database environment information, emptying a database, synchronizing and backing up a database, upgrading versions, and reporting errors;
(3) A cursor mechanism for accessing groups of data, plus association and equality-join operations across two or more related databases;
(4) Interface functions for tuning the access configuration: the application can set its own B-tree comparison function, the minimum number of keys per page, the hash bucket fill factor, the hash function, the maximum size of the hash table, the maximum length of the queue, the byte order of stored data, the size of the underlying storage pages, the memory allocation functions, the cache size, the fixed-length record size and pad byte, the delimiter for variable-length records, and so on.
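Item (4) mentions that the application can supply its own B-tree comparison function. As an example, the hypothetical comparator below orders keys that hold decimal numbers as text numerically, so "2" sorts before "10", unlike the default byte order. In JE such a comparator is registered through DatabaseConfig.setBtreeComparator; check your JE version's Javadoc for the exact overloads and requirements (the class is made Serializable here on the assumption that JE may need to persist it). The sketch itself only uses a plain TreeMap, so it runs without JE.

```java
import java.io.Serializable;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

// Hypothetical comparator: keys are decimal integers stored as ASCII text
class NumericKeyComparator implements Comparator<byte[]>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(byte[] a, byte[] b) {
        // BigInteger avoids overflow on long digit strings; non-numeric keys throw NumberFormatException
        BigInteger x = new BigInteger(new String(a, StandardCharsets.US_ASCII));
        BigInteger y = new BigInteger(new String(b, StandardCharsets.US_ASCII));
        return x.compareTo(y);
    }
}
```

A comparator like this fixes the on-disk key order, so it has to be used consistently for the lifetime of the database.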
II. Application of Berkeley DB
1. Download the Berkeley DB installer and the Java development package (Berkeley DB Java Edition) from the official website: http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index.html
2. Install Berkeley DB on Windows by clicking through the installer. The Windows version is used here for development convenience; the Linux version should be used in production. (If the installer reports an error setting the path, run Setup as Administrator.)
3. Add the jar files from the Java development package to the build path. They are mainly je-6.0.11.jar, JEJConsole.jar, and EPJEJConsole.jar.
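Test program (JUnit 4):

package com.ljh.test;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class BerkeleyDBUtilTest {
    private BerkeleyDBUtil dbUtil = null;

    @Before
    public void setUp() {
        dbUtil = new BerkeleyDBUtil("D:/tmp");
        // JUnit does not guarantee test order, so (re)write the 10 fixture records before every test
        for (int i = 0; i < 10; i++) {
            dbUtil.writeToDatabase(i + "", "Student" + i, true);
        }
    }

    @Test
    public void testWriteToDatabase() {
        // put() overwrites, so rewriting existing keys succeeds
        for (int i = 0; i < 10; i++) {
            assertTrue(dbUtil.writeToDatabase(i + "", "Student" + i, true));
        }
    }

    @Test
    public void testReadFromDatabase() {
        // assertEquals takes (expected, actual)
        assertEquals("Student2", dbUtil.readFromDatabase("2"));
    }

    @Test
    public void testGetEveryItem() {
        assertEquals(10, dbUtil.getEveryItem().size());
    }

    @Test
    public void testDeleteFromDatabase() {
        assertTrue(dbUtil.deleteFromDatabase("4"));
        assertEquals(9, dbUtil.getEveryItem().size());
    }

    @After
    public void cleanUp() {
        // Close the handles so the next test can reopen the environment
        dbUtil.closeDB();
    }
}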
Basic Berkeley DB operations:
The utility class covers the following operations:
(1) Opening the database
(2) Writing data to the database
(3) Reading a record by key
(4) Reading the full list of records
(5) Deleting a record by key
(6) Closing the database
Note: since all of these operations work on the same database, should the utility class be a singleton?
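One possible answer to the singleton question, sketched under assumptions: rather than a classic singleton, keep exactly one shared handle per environment home directory and hand it to every caller. SharedHandles is a hypothetical helper, not part of JE; it relies only on ConcurrentHashMap.computeIfAbsent, which applies the opening function at most once per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical registry: one shared handle per environment home directory
final class SharedHandles<T> {
    private final ConcurrentHashMap<String, T> handles = new ConcurrentHashMap<>();
    private final Function<String, T> opener;

    SharedHandles(Function<String, T> opener) {
        this.opener = opener;
    }

    // computeIfAbsent applies the opener at most once per directory, even with concurrent callers
    T get(String homeDirectory) {
        return handles.computeIfAbsent(homeDirectory, opener);
    }
}
```

With the utility class, usage could look like `new SharedHandles<BerkeleyDBUtil>(BerkeleyDBUtil::new).get("D:/tmp")`, keeping the registry itself in one static field. If the application only ever opens one directory, a plain singleton (an enum or a static holder class) does the same job; either way, closeDB() should then be called once, at application shutdown.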
package com.ljh.test;

import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;

import com.sleepycat.je.Cursor;
import com.sleepycat.je.CursorConfig;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.LockConflictException;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;
import com.sleepycat.je.Transaction;
import com.sleepycat.je.TransactionConfig;

public class BerkeleyDBUtil {
    // Database environment
    private Environment env = null;
    // Database handle (instance field: a static handle would be overwritten by every new instance)
    private Database frontierDatabase = null;
    // Database name
    private static final String DB_NAME = "Frontier_database";

    public BerkeleyDBUtil(String homeDirectory) {
        // JE requires the environment home directory to exist
        File home = new File(homeDirectory);
        home.mkdirs();
        // 1. Create the EnvironmentConfig
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setTransactional(true);
        envConfig.setAllowCreate(true);
        // 2. Open the Environment with that configuration
        env = new Environment(home, envConfig);
        // 3. Create the DatabaseConfig
        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setTransactional(true);
        dbConfig.setAllowCreate(true);
        // 4. Open the database through the Environment
        frontierDatabase = env.openDatabase(null, DB_NAME, dbConfig);
    }

    // Begins a transaction with serializable isolation
    private Transaction beginSerializable() {
        TransactionConfig txConfig = new TransactionConfig();
        txConfig.setSerializableIsolation(true);
        return env.beginTransaction(null, txConfig);
    }

    /*
     * Write a record. If isOverwrite is true, put() replaces any existing value;
     * otherwise putNoOverwrite() leaves an existing record alone and returns KEYEXIST.
     */
    public boolean writeToDatabase(String key, String value, boolean isOverwrite) {
        // DatabaseEntry wraps a byte array, so encode key and value as UTF-8
        DatabaseEntry theKey = new DatabaseEntry(key.getBytes(StandardCharsets.UTF_8));
        DatabaseEntry theData = new DatabaseEntry(value.getBytes(StandardCharsets.UTF_8));
        Transaction txn = null;
        try {
            // 1. Begin the transaction
            txn = beginSerializable();
            // 2. Write the data
            OperationStatus status = isOverwrite
                    ? frontierDatabase.put(txn, theKey, theData)
                    : frontierDatabase.putNoOverwrite(txn, theKey, theData);
            txn.commit();
            txn = null; // committed, so there is nothing left to abort
            if (status == OperationStatus.SUCCESS) {
                System.out.println("Wrote to database " + DB_NAME + ": " + key + ", " + value);
                return true;
            } else if (status == OperationStatus.KEYEXIST) {
                System.out.println("Write to database " + DB_NAME + " failed, key already exists: " + key + ", " + value);
            } else {
                System.out.println("Write to database " + DB_NAME + " failed: " + key + ", " + value);
            }
            return false;
        } catch (LockConflictException lockConflict) {
            System.out.println("Lock conflict while writing to database " + DB_NAME + ": " + key + ", " + value);
            return false;
        } catch (Exception e) {
            System.out.println("Error while writing to database " + DB_NAME + ": " + key + ", " + value);
            return false;
        } finally {
            // Roll back if the transaction was started but never committed
            if (txn != null) {
                txn.abort();
            }
        }
    }

    /*
     * Read the value stored under key; returns "" if the key is missing or a lock conflict occurs.
     */
    public String readFromDatabase(String key) {
        DatabaseEntry theKey = new DatabaseEntry(key.getBytes(StandardCharsets.UTF_8));
        DatabaseEntry theData = new DatabaseEntry();
        Transaction txn = null;
        try {
            // 1. Begin the transaction
            txn = beginSerializable();
            // 2. Read the data
            OperationStatus status = frontierDatabase.get(txn, theKey, theData, LockMode.DEFAULT);
            txn.commit();
            txn = null;
            if (status == OperationStatus.SUCCESS) {
                // 3. Convert the bytes back to a String
                String value = new String(theData.getData(), StandardCharsets.UTF_8);
                System.out.println("Read from database " + DB_NAME + ": " + key + ", " + value);
                return value;
            }
            System.out.println("No record found for key '" + key + "'.");
            return "";
        } catch (LockConflictException lockConflict) {
            System.out.println("Lock conflict while reading key " + key + " from database " + DB_NAME);
            return "";
        } finally {
            if (txn != null) {
                txn.abort();
            }
        }
    }

    /*
     * Traverse all records with a cursor and return the list of keys (empty on error).
     */
    public ArrayList<String> getEveryItem() {
        System.out.println("=========== All data in database " + DB_NAME + " ==========");
        ArrayList<String> resultList = new ArrayList<String>();
        Transaction txn = null;
        Cursor myCursor = null;
        try {
            txn = env.beginTransaction(null, null);
            CursorConfig cc = new CursorConfig();
            cc.setReadCommitted(true);
            myCursor = frontierDatabase.openCursor(txn, cc);
            DatabaseEntry foundKey = new DatabaseEntry();
            DatabaseEntry foundData = new DatabaseEntry();
            // Walk forward with Cursor.getNext(); on a fresh cursor the first call returns the first record
            while (myCursor.getNext(foundKey, foundData, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
                String theKey = new String(foundKey.getData(), StandardCharsets.UTF_8);
                String theData = new String(foundData.getData(), StandardCharsets.UTF_8);
                resultList.add(theKey);
                System.out.println("Key | Data: " + theKey + " | " + theData);
            }
            // The cursor must be closed before its transaction ends
            myCursor.close();
            myCursor = null;
            txn.commit();
            txn = null;
        } catch (Exception e) {
            System.out.println("getEveryItem failed: " + e);
            resultList.clear();
        } finally {
            if (myCursor != null) {
                myCursor.close();
            }
            if (txn != null) {
                txn.abort();
            }
        }
        return resultList;
    }

    /*
     * Delete the record for key, retrying up to 3 times (1 second apart) on lock conflicts.
     */
    public boolean deleteFromDatabase(String key) {
        DatabaseEntry theKey = new DatabaseEntry(key.getBytes(StandardCharsets.UTF_8));
        for (int i = 0; i < 3; i++) {
            Transaction txn = null;
            try {
                // 1. Begin the transaction
                txn = beginSerializable();
                // 2. Delete the record and commit
                OperationStatus res = frontierDatabase.delete(txn, theKey);
                txn.commit();
                txn = null;
                if (res == OperationStatus.SUCCESS) {
                    System.out.println("Deleted from database " + DB_NAME + ": " + key);
                    return true;
                } else if (res == OperationStatus.NOTFOUND) {
                    System.out.println("Key " + key + " not found in database " + DB_NAME + "; nothing to delete");
                } else {
                    System.out.println("Delete failed with status " + res);
                }
                return false;
            } catch (LockConflictException lockConflict) {
                System.out.println("Delete hit a LockConflictException");
            } finally {
                // Abort releases the transaction's locks before any retry
                if (txn != null) {
                    txn.abort();
                }
            }
            // Back off before the next attempt (not after the last one)
            if (i < 2) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public void closeDB() {
        if (frontierDatabase != null) {
            frontierDatabase.close();
            frontierDatabase = null;
        }
        if (env != null) {
            env.close();
            env = null;
        }
    }
}
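The retry loop in deleteFromDatabase could be factored out so reads and writes get the same treatment. The sketch below is a hypothetical helper, not part of JE: it reruns an operation while it throws a given exception type, sleeping between attempts, and rethrows anything else immediately.

```java
import java.util.function.Supplier;

// Hypothetical helper generalizing a retry-on-exception loop
final class Retry {
    private Retry() {}

    static <T> T withRetries(int attempts, long sleepMillis,
                             Class<? extends RuntimeException> retryOn, Supplier<T> operation) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be at least 1");
        }
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e)) {
                    throw e;            // not a retryable failure: give up immediately
                }
                last = e;
            }
            if (i < attempts - 1) {     // back off only if another attempt follows
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw last;                     // every attempt hit a retryable failure
    }
}
```

With JE, retryOn would be LockConflictException.class, and the operation would have to let that exception propagate (readFromDatabase and writeToDatabase above catch it themselves). As in deleteFromDatabase, each attempt should begin its own transaction and abort it on failure.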