Research on a Lightweight Memory Database

Source: Internet
Author: User

1 Background

Since the reform and opening up, informatization has developed rapidly in all walks of life in China, and military informatization has also made great progress. All military services are increasingly informatized, from daily operations to wartime, from the rear to the front, and from higher command to the grass-roots level, and command functions are increasingly electronic and process-driven. Large disk-based databases, currently the main form of storage, cannot fully meet the demands of electronic warfare, unmanned warfare, and lightning warfare for large data volumes and fast access. A mechanism for fast data access is urgently needed.

At the micro level, some applications currently have no need for a database and store their data in text files. However, as requirements become more complex and the data grows, the data storage logic in these applications becomes more and more complicated and their running efficiency drops. A data storage mechanism that is as fast, convenient, and transparent as accessing local files would solve this application development problem.

Based on this, this paper designs a lightweight memory database system that responds quickly, and describes the design scheme and key technologies of the system.

2 Overview

2.1 Lightweight Memory Database Definition

Lightweight memory databases are a new area of research, and there is currently no authoritative, accepted definition of them. Their main feature is that the primary copy, or "working version", of the database resides in memory. Lightweight memory databases are developed from traditional disk database technology. Because they keep the database in memory, they avoid indirect access to data through a buffer manager, which gives them better response time and throughput than traditional disk database systems. Lightweight memory database products already exist abroad, but none is as recognized and widely used in the industry as Oracle, Sybase, SQL Server, or DB2. Most lightweight memory databases are still in the laboratory research phase, and the technology is applied even less widely, because many technical issues still need further research.

2.2 Advantages of Lightweight Memory Databases

The features of traditional disk-based relational databases have been widely accepted by the industry. Lightweight memory databases are a newcomer and cannot replace traditional relational databases. In some specific scenarios, however, they are smaller, cheaper to manage, and faster than traditional relational databases. Their main advantages over traditional relational databases are as follows:

1) Memory and disk differ greatly in access efficiency: main memory, where the memory database keeps its data, is accessed in nanoseconds, while disk is accessed in milliseconds;

2) Because the I/O calls for disk access are eliminated, the memory database works mainly through pointer operations in memory, which greatly reduces the number of CPU instructions executed;

3) Data access during transaction processing requires no disk I/O, which greatly improves system performance;

4) The buffer manager is no longer needed, which eliminates the overhead of copying data between disk and memory;

5) Pointers are widely used in data organization and management, which simplifies memory management and reduces space overhead.

3 Lightweight Memory Database Design

A lightweight memory database is a type of database. Its main functions are the same as those of a traditional relational database: it provides the ability to organize, access, and maintain data. The system is designed to use CPU and memory space efficiently. It provides the main management functions of a conventional database, including DML, DDL, transactions, and data backup and recovery. In terms of data standards, it follows SQL92. It provides a degree of security and supports the vast majority of data types. In terms of interfaces and use, it can be programmed through ADO like a common database. In terms of performance, user processes map the entire database directly into their local address space and access the data directly, which avoids the expensive remote procedure calls and buffer manager interaction of typical disk-based commercial databases.

3.1 Architecture

The system architecture of the lightweight memory database is shown in the following figure:

[Figure: system architecture of the lightweight memory database]

3.2 Function Module Description

3.2.1 Application Layer

The application layer consists of all applications that use the lightweight memory database as their data processing mechanism. Some of these applications previously stored their data in a large relational database whose operational efficiency could not meet their needs. Others stored their data in files, which is cumbersome because the application itself must implement the data storage logic. The lightweight memory database described in this article addresses both problems: it improves efficiency and reduces the data storage logic in the application.

3.2.2 External Interface Layer

To make the lightweight memory database widely usable, it must provide both C and C++ access interfaces. Currently, the simplest and most convenient interface for database access is ADO, so the memory database must also provide an OLE DB interface. Through the OLE DB interface, Delphi, Visual C++, and Visual Basic programs can also use the memory database conveniently.

3.2.3 Transaction Management

Transaction management in a lightweight memory database includes transaction pre-analysis, concurrency control, scheduling management, and recovery mechanisms. An important feature of a memory database is high efficiency: in addition to ensuring the logical consistency and temporal consistency of transactions, transaction processing must itself be highly efficient.

(1) Transaction scheduling. The memory database system is a transaction processing system for workloads with high efficiency requirements, which mainly means that the time limits of transactions must be met. Scheduling covers the resources each transaction uses, the CPU time each transaction consumes, and the allocation of resources such as data, I/O, and memory among concurrently running transactions.

(2) Concurrency control. As in traditional databases, transactions in a memory database run concurrently and access shared data. They may therefore interfere with each other, causing problems such as lost updates, data inconsistency, and cascading rollbacks. Concurrency control governs the interaction between concurrent transactions so that the consistency of the database is not damaged.

The concurrency control of the system is based on locking. The size of the locked object is called the lock granularity; in general, the entire database, a relation (table), a page, or a record can be locked. With a small lock granularity, concurrency is high but lock maintenance overhead is also large; with a large lock granularity, concurrency is low but lock maintenance overhead is small. The lock types include the shared lock (S) and the exclusive lock (X). Some systems introduce intention locks to solve the multi-granularity locking problem: before locking any node, a transaction must first place an intention lock on the nodes above it. This adds the intention shared lock (IS) and the intention exclusive lock (IX). For convenience, the shared lock and the intention exclusive lock are combined into the SIX lock. Locks are acquired top-down and released bottom-up.

(3) Recovery management. When a transaction fails, recovery management attempts to restore the database to a correct state.
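The multi-granularity locking described under concurrency control can be made concrete with the usual compatibility table for the five lock modes. The following is a minimal sketch, not the system's actual lock manager: the names LockMode, compatible, intentionFor, and requestsForPath are assumptions introduced for illustration, and the table is the standard textbook one for IS, IX, S, SIX, and X.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The five lock modes of multi-granularity locking.
enum class LockMode : std::size_t { IS = 0, IX = 1, S = 2, SIX = 3, X = 4 };

// kCompatible[held][requested] is true when two different transactions
// may hold these modes on the same node at the same time.
constexpr bool kCompatible[5][5] = {
    /* held IS  */ {true,  true,  true,  true,  false},
    /* held IX  */ {true,  true,  false, false, false},
    /* held S   */ {true,  false, true,  false, false},
    /* held SIX */ {true,  false, false, false, false},
    /* held X   */ {false, false, false, false, false},
};

constexpr bool compatible(LockMode held, LockMode requested) {
    return kCompatible[static_cast<std::size_t>(held)]
                      [static_cast<std::size_t>(requested)];
}

// Weakest intention lock that must be held on every ancestor
// before a node can be locked in the given mode.
constexpr LockMode intentionFor(LockMode mode) {
    return (mode == LockMode::S || mode == LockMode::IS) ? LockMode::IS
                                                         : LockMode::IX;
}

// Lock requests, listed top-down, for locking the last node of a path of
// `depth` nodes (for example database, table, page, record) in `mode`.
// Releasing happens in the reverse (bottom-up) order.
std::vector<LockMode> requestsForPath(std::size_t depth, LockMode mode) {
    assert(depth >= 1);
    std::vector<LockMode> requests(depth - 1, intentionFor(mode));
    requests.push_back(mode);
    return requests;
}
```

For example, requestsForPath(3, LockMode::X) yields IX, IX, X: the transaction announces its intent on the two upper levels before taking the exclusive lock on the target, so a conflicting S or X lock on an upper node is detected without scanning the nodes below it.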

3.2.4 Data Storage Layer

The main function of the data storage layer is to handle access to data, including both the state of the memory database itself and application data. It includes data definition, data access, data processing, log management, configuration file management, and T-tree indexes.

(1) Data definition: this module manages the data dictionary of the memory database, including insert, delete, and modify operations on the data dictionary;

(2) Data access: this module is responsible for fast access to user data, including adding, deleting, modifying, and querying user data;

(3) Data processing: this module handles user data type conversion, data validity checks, primary key uniqueness checks, and constraint conflict checks;

(4) Log management: log management is introduced mainly to improve the reliability of the lightweight memory database so that it can be restored after an application crash. To this end, all operations performed by the memory database, and the data involved, are recorded in the log;

(5) Configuration file management: the configuration file of the system stores the policy parameters for database operation;

(6) T-tree index management: the T-tree is an index structure suited to memory database systems, proposed to fit the characteristics of main-memory storage. It is a balanced binary tree that combines features of the B-tree and the AVL tree. A T-tree has three types of nodes: a node with two subtrees is called an internal node, a node with one child is called a semi-leaf node, and a node without children is called a leaf node. Each internal node N holds some elements and some control information, together with a left child pointer and a right child pointer that point to its left and right subtrees. The elements within a node are kept in ascending order, so the leftmost element of the node is its smallest and the rightmost is its largest; every element in the left subtree is smaller than the smallest element of the node, and every element in the right subtree is greater than or equal to the largest element of the node. The system uses T-trees for index management and storage, which greatly improves the access efficiency of the memory database.
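The ordering rule in item (6) determines how a lookup proceeds: compare the key with the smallest and largest element of the current node, descend left or right if it falls outside that range, and otherwise search inside the node. The sketch below illustrates this under simplifying assumptions: integer keys without duplicates, ordinary pointers, and a node capacity of four. TNode, kMaxItems, and contains are hypothetical names, and balancing is left out.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxItems = 4;  // assumed node capacity

// Simplified T-tree node: a small sorted array of keys plus two children.
struct TNode {
    int keys[kMaxItems];  // ascending; keys[0] is the minimum, keys[count - 1] the maximum
    std::size_t count;    // number of valid keys, at least 1
    TNode* left;          // subtree with keys smaller than keys[0]
    TNode* right;         // subtree with keys larger than keys[count - 1]
};

// Returns true if `key` is stored in the tree rooted at `node`.
bool contains(const TNode* node, int key) {
    while (node != nullptr) {
        assert(node->count > 0 && node->count <= kMaxItems);
        if (key < node->keys[0]) {
            node = node->left;                       // below this node's range
        } else if (key > node->keys[node->count - 1]) {
            node = node->right;                      // above this node's range
        } else {
            // This is the bounding node: binary search among its keys.
            std::size_t lo = 0, hi = node->count;
            while (lo < hi) {
                std::size_t mid = lo + (hi - lo) / 2;
                if (node->keys[mid] < key) lo = mid + 1; else hi = mid;
            }
            return lo < node->count && node->keys[lo] == key;
        }
    }
    return false;
}
```

Each step down the tree costs only two comparisons, and the final search touches a single node, which is why the structure suits data that is already in main memory.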

3.2.5 Resource Management Layer

Resource management includes CPU management, time management, memory management, and I/O scheduling management. Because transactions are at the core of how applications access the database, the specific resource management functions are interwoven with the various transaction processing functions. Resource management includes I/O scheduling because the memory database also keeps an operation log file and must not lose data while it is not running; these files are stored on the physical disk. The local memory database therefore also has an I/O scheduling problem.

3.2.6 Data Entity Layer

The data entity layer is where all data physically resides. It includes the in-memory data dictionary and in-memory data, as well as the log files, data files, and memory database configuration files stored on disk.

3.2.7 Operation Control

Operation control refers to the functions required for the normal operation of the memory database, including loading and unloading the memory database and thread management.

(1) Database loading: the memory database does not require all data to be in memory, only the "working version" of the data. Data is loaded so that transactions can run normally. Loading is controlled by a number of parameters.

(2) Database unloading: this module persists the in-memory data to disk storage after the memory database finishes its tasks.

(3) Thread management: the threads in the memory database include the global state monitoring thread, user transaction threads, and the log thread. Memory databases generally take the form of components or dynamic link libraries: instead of running as separate processes, they are embedded in other applications. The threads of the memory database must therefore be managed on an event-driven basis.

4 Key Technologies of Lightweight Memory Databases

Researching and implementing a lightweight memory database involves many technologies, including data organization, data protection, indexing within the memory database, logging policies, the log I/O bottleneck, checkpoint policies, loading on database restart, concurrency control, and SQL parsing and assembly. For reasons of space, they are not all detailed here; only memory database loading and the T-tree index are described below.

4.1 Lightweight Memory Database Loading Technology

The system uses synchronous loading, in which transaction processing proceeds in parallel with most of the data loading. This greatly reduces the startup wait time of the memory database.

The synchronous loading policy combines traditional database technology with memory database technology so that the memory database can respond to transaction requests as early as possible, shortening the period in which the database is loading and cannot process transactions. The system can respond to user requests after loading only part of the data, just like a disk-based database system. Once the whole database has been loaded, it offers the efficient in-memory access of a memory database. In other words, the technique inherits the fast startup of a disk-based database and retains the fast response to transactions of a memory database.

The following is the algorithm of the loading thread.

1) Acquire the lock on the database load bitmap.

2) Check the database load bitmap in sequence. If some consecutive pages are not in memory, repeat the following operations until these consecutive pages are in memory.

   A. Release the database load bitmap lock.

   B. Acquire the load lock.

   C. While there are still unloaded pages among these consecutive pages:

      a) Acquire the database load bitmap lock.

      b) Re-check the loading status of the consecutive pages to obtain the range of pages that are actually not loaded.

      c) Release the database load bitmap lock.

      d) If there are still pages not loaded, load them sequentially.

   D. Acquire the database load bitmap lock.

   E. Update the database load bitmap to reflect the latest loading status.

   F. Release the database load bitmap lock.

   G. Release the load lock.

   H. Acquire the database load bitmap lock.

3) Release the database load bitmap lock.
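The locking order in steps 1) to 3) can be sketched with two mutexes, one protecting the load bitmap and one serializing the actual page loads. This is an illustration under assumptions, not the system's code: PageLoader, ensureLoaded, and readPage are hypothetical names, reading a page is replaced by a counter, and step C is collapsed into a single re-check followed by one sequential load pass.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class PageLoader {
public:
    explicit PageLoader(std::size_t pageCount) : loaded_(pageCount, false) {}

    // Makes pages [first, last) resident, following steps 1) to 3).
    void ensureLoaded(std::size_t first, std::size_t last) {
        assert(first <= last && last <= loaded_.size());
        std::unique_lock<std::mutex> bitmap(bitmapMutex_);     // 1) lock the load bitmap
        while (!allLoaded(first, last)) {                       // 2) some pages are missing
            bitmap.unlock();                                    // A. release the bitmap lock
            {
                std::lock_guard<std::mutex> load(loadMutex_);   // B. take the load lock
                std::vector<std::size_t> missing;
                {                                               // C. a) to c): re-check under the bitmap lock
                    std::lock_guard<std::mutex> recheck(bitmapMutex_);
                    for (std::size_t p = first; p < last; ++p)
                        if (!loaded_[p]) missing.push_back(p);
                }
                for (std::size_t p : missing) readPage(p);      // C. d): load sequentially
                {                                               // D. to F.: publish the new state
                    std::lock_guard<std::mutex> update(bitmapMutex_);
                    for (std::size_t p : missing) loaded_[p] = true;
                }
            }                                                   // G. load lock released here
            bitmap.lock();                                      // H. re-take the bitmap lock
        }
    }                                                           // 3) bitmap lock released on return

    bool isLoaded(std::size_t page) {
        std::lock_guard<std::mutex> guard(bitmapMutex_);
        return loaded_.at(page);
    }

    std::size_t diskReads() {
        std::lock_guard<std::mutex> guard(loadMutex_);
        return diskReads_;
    }

private:
    // Caller must hold bitmapMutex_.
    bool allLoaded(std::size_t first, std::size_t last) const {
        for (std::size_t p = first; p < last; ++p)
            if (!loaded_[p]) return false;
        return true;
    }

    // Stand-in for reading one page from the database file.
    // Caller must hold loadMutex_.
    void readPage(std::size_t) { ++diskReads_; }

    std::mutex bitmapMutex_;
    std::mutex loadMutex_;
    std::vector<bool> loaded_;
    std::size_t diskReads_ = 0;
};
```

The bitmap lock is never held while a page is being read, so transactions that only need to check whether a page is resident are not blocked behind disk I/O. The re-check in step C matters when two threads want overlapping ranges: the second one waits on the load lock, then finds that the first has already loaded the pages and reads nothing.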

4.2 Implementation Technology of the T-Tree Index Structure

As mentioned above, the system uses a T-tree to manage the storage space of the lightweight memory database.

We define a TTree class to manage the space of the lightweight memory database. It manages all database space after the 32nd database page. Every free region in this area is an element in a node of the T-tree (class fTtreeNode). The elements of the whole TTree are arranged in ascending order of the starting address of the free region. In addition, the control information in each fTtreeNode records the length of the longest free region in its left subtree, in its right subtree, and in the node itself, which makes space allocation convenient. The member variables of fTtreeNode are as follows:

Offs_t left; // database offset of the root of the left subtree

Offs_t leftMaxLength; // maximum free space size in the left subtree

Offs_t right; // database offset of the root of the right subtree

Offs_t rightMaxLength; // maximum free space size in the right subtree

Offs_t currMaxLength; // maximum free space size among the elements of this node

Nat2 nItems; // number of valid elements in the node

Int1 balance; // balance factor of the node: number of levels in the right subtree minus that in the left subtree

FTTreeItem item[FREE_TREE_PAGESIZE]; // array of elements

The left and right members are database offsets: the root node of the left (right) subtree is located by adding the offset to the starting address of the memory region in which the database resides. FREE_TREE_PAGESIZE is the maximum number of FTTreeItem elements that each fTtreeNode can hold.
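Locating a child node from an offset amounts to one addition on the base address of the database region. The sketch below shows the idea with a deliberately small stand-in node. The width of Offs_t, the use of offset 0 to mean "no child", and the names DemoNode, resolve, and makeNode are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

using Offs_t = std::uint32_t;  // assumed width of a database offset

// Hypothetical stand-in for a tree node that links to its children by offset.
struct DemoNode {
    Offs_t left;   // offset of the left child, 0 if none (assumed convention)
    Offs_t right;  // offset of the right child, 0 if none
    int value;
};

// Turns a database offset into a pointer, given the address at which
// the database region starts in this process.
template <typename T>
T* resolve(unsigned char* base, Offs_t offset) {
    return offset == 0 ? nullptr : reinterpret_cast<T*>(base + offset);
}

// Constructs a childless DemoNode at the given offset inside the region.
DemoNode* makeNode(unsigned char* base, Offs_t at, int value) {
    assert(at != 0 && at % alignof(DemoNode) == 0);
    return new (base + at) DemoNode{0, 0, value};
}
```

Because the nodes store offsets rather than raw pointers, the tree stays valid when the database is mapped at a different address, for example after a restart or in a second process.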

TTree is built on the idea of the T-tree, so it also has balancing operations, but it has some special features. First, the elements it stores are the starting position and length of a free region, rather than the RID of a record (or the OID of an object). Second, because this T-tree is used only to manage free space in the database, it serves only space allocation and space release requests; it therefore has no search operation, only insert and delete operations. Its insert and delete operations also differ from those of a general T-tree. An insertion happens when the database releases a region of space (with a start position and a length) and the corresponding element is inserted into the TTree. During insertion, if the regions immediately before and after the released region are already in the free-space tree, the free regions must be merged: three elements may be merged into one, or two elements may be merged into one.
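The merge rule on release can be shown independently of the tree structure. The sketch below deliberately swaps the T-tree for a std::map keyed by starting offset, which keeps the same ascending order, so that only the coalescing logic remains. FreeSpaceMap and its members are hypothetical names, and the width of Offs_t is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

using Offs_t = std::uint32_t;  // assumed width of a database offset

class FreeSpaceMap {
public:
    // Records [start, start + length) as free and merges it with the free
    // regions that end exactly at `start` or begin exactly at its end.
    void release(Offs_t start, Offs_t length) {
        assert(length > 0);
        auto next = regions_.lower_bound(start);
        if (next != regions_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == start) {  // touches the region before
                start = prev->first;
                length += prev->second;
                regions_.erase(prev);                   // `next` stays valid
            }
        }
        if (next != regions_.end() && start + length == next->first) {
            length += next->second;                     // touches the region after
            regions_.erase(next);
        }
        regions_[start] = length;
    }

    std::size_t regionCount() const { return regions_.size(); }

    // Length of the free region starting at `start`, or 0 if there is none.
    Offs_t lengthAt(Offs_t start) const {
        auto it = regions_.find(start);
        return it == regions_.end() ? 0 : it->second;
    }

private:
    std::map<Offs_t, Offs_t> regions_;  // start offset -> length, ascending
};
```

Releasing 50 bytes at offset 150 while [100, 150) and [200, 250) are already free is the three-into-one case: the result is a single region of 150 bytes starting at offset 100. When only one neighbour is adjacent, two elements become one.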

5 Implementation of Lightweight Memory Databases

The users in this project have special requirements: the vast majority of applications connect directly to the Oracle database, and most are fat-client applications built on a C/S architecture that access Oracle through ADO. To minimize the cost and time of introducing a memory database, the following issues must be considered in the design of the lightweight memory database: easy migration between the memory database and Oracle, consistency of data types between the memory database and Oracle, consistency of the supported SQL syntax with Oracle, and consistency of the access interface with Oracle. The following sections describe how these issues are addressed:

5.1 Data Type Implementation

The lightweight memory database implements most Oracle data types, including Number, Int, Integer, Varchar, Varchar2, char, Date, Time, DateTime, Boolean, and string.

As a memory database with high efficiency requirements, it also leaves some types unsupported, mainly four binary large object types: Blob, Long Raw, Raw, and BFile. Processing binary large objects is time-consuming, and the physical memory available to the memory database is fixed, so the memory database is not suited to supporting them. In practice, data of these types is stored in the Oracle database instead.

5.2 SQL92 Support

The lightweight memory database supports the vast majority of standard SQL92 functionality. Supported object types include tables, indexes, triggers, and views. Supported DML (data manipulation language) statements include INSERT, DELETE, UPDATE, and SELECT. For DDL (data definition language), Alter Table, Create Table, Drop Table, Alter View, Create View, Drop View, Alter Index, Create Index, and Drop Index are supported. For transactions, commands such as BEGIN Transaction, COMMIT Transaction, and ROLLBACK Transaction are supported. For constraints, keywords such as UNIQUE, NOT NULL, CHECK, and Primary Key are supported.

5.3 Implementation of the OLE DB Interface

The lightweight memory database supports general ADO functionality by implementing a standard OLE DB provider for the memory database. The implemented OLE DB interfaces are shown in the following diagram:

[Figure: implemented OLE DB interfaces]



5.4 Performance Testing

The lightweight memory database was tested on a large data volume: the test tool LoadRunner 7.8 was used to insert, delete, query, and update 100,000,000 records. The results are as follows:

Insert operation, 1 record: 23 microseconds;

Select operation, 1 record: 16 microseconds;

Update operation, 1 record: 10 microseconds;

Delete operation, 1 record: 93 microseconds.

These results show that the local memory database is more than an order of magnitude more efficient than the Oracle database.
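The per-record times reported above can be converted into approximate throughput figures. The conversion below assumes that operations run one after another on a single thread, which the test description does not state, so the values are only indicative.

```latex
\[
\begin{aligned}
\text{throughput} &= \frac{1}{\text{time per operation}} \\
\text{Insert:}\quad \frac{1}{23\times10^{-6}\,\text{s}} &\approx 43{,}478\ \text{operations/s} \\
\text{Select:}\quad \frac{1}{16\times10^{-6}\,\text{s}} &= 62{,}500\ \text{operations/s} \\
\text{Update:}\quad \frac{1}{10\times10^{-6}\,\text{s}} &= 100{,}000\ \text{operations/s} \\
\text{Delete:}\quad \frac{1}{93\times10^{-6}\,\text{s}} &\approx 10{,}753\ \text{operations/s}
\end{aligned}
\]
```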

6 Conclusion

Through the study of the lightweight memory database project, we gained a deep understanding of the principles of memory databases, and through the implementation of this database we gained a comprehensive grasp of their key technologies. Tests and use show that, compared with other memory databases, this memory database has significant advantages in functionality, performance, and ease of use. The memory database is now used in the command application software as the core of its data buffer. The virtual database project of this unit also uses this memory database as the database for its virtual data views.
