Design and implementation of a multi-model database engine

Source: Internet
Author: User
Tags postgresql relational database table

As business moves online and becomes more intelligent, and as architectures move toward microservices and the cloud, application systems place new demands on data storage and management. The diversity of data has become a challenge for database platforms, and a new mainstream direction has emerged in the database field.

A multi-model database is a single database that supports multiple storage engines, so it can meet an application's need for unified management of structured, semi-structured, and unstructured data.

1. Cloud database requirements gave rise to multi-model databases
As enterprise cloud databases serve more and more applications, the traditional approach has been to offer more than ten different database products in a DBPaaS to cover the various needs. As the system grows, this approach makes overall maintainability and data consistency expensive to manage, which affects the use of the whole system.


Multi-model for cloud databases

To unify the management of business data and fuse it across services, a new database needs multi-model data management and storage capabilities. Structured data usually refers to data stored in tabular form; typical applications include traditional services such as core banking transactions. Semi-structured data is used heavily in scenarios such as user profiling, IoT device log collection, and application clickstream analysis. Unstructured data covers the massive volumes of images, videos, and documents whose processing has grown rapidly with the development of fintech.

Multi-model data management lets the database unify data storage and management across departments and services, fuse data from multiple services, and support diverse application services. Architecturally, multi-model also answers cloud database requirements: a single data management system supports many data types and therefore many business models, which greatly reduces the cost of use and operations.

2. Multi-model storage engine architecture
Databases are at the heart of many existing business systems. With the rapid development of data generation and acquisition technology, data volumes are exploding and data structures are becoming more flexible. Faced with the arrival of big data and artificial intelligence, traditional database management systems based on relational theory are challenged in cost, performance, scalability, fault tolerance, and other areas.
Modern applications have different storage requirements for structured, semi-structured, and unstructured data, and the database needs to adapt to managing these multiple data types.
The two popular solutions are polyglot persistence and multi-model databases.
1) Polyglot persistence
The idea of polyglot persistence is that users choose the appropriate database for each kind of work, so a complete system may run many different databases at the same time.

Figure 1: Polyglot persistence
One notable advantage of polyglot persistence is better performance for each individual workload, but the drawbacks are equally obvious: deployment, use, and maintenance become harder, at the cost of increased complexity and learning effort.

2) Multi-model
A multi-model database takes a different approach: a single database contains multiple data engines, so the various types of data are stored and used centrally. Many different types of applications access the same database at once, and their data is managed within the same distributed database, which greatly simplifies application development and reduces later maintenance costs.


Figure 2: Multi-model database engine architecture

The figure shows a multi-model database. The same storage engine hosts several data engines, including relational data, JSON semi-structured data, object data, and a full-text search engine, and presents them to applications in a unified way. This architecture greatly reduces the difficulty of development and operations. Applications connect to the database through one unified connection, and the data is partitioned, isolated, and managed inside the database, so an application only needs to connect to the database and there is no need to set up a separate data backend for each application.

3. Data storage structures
The requirements of multi-model databases have also driven innovation in the storage data structures of distributed databases. The following describes how SequoiaDB designs and implements its data storage structures and data access for multi-model support, which can serve as a useful reference for multi-model databases.

3.1 Structured and semi-structured data storage
Structured data has a fixed structure in which every row has the same attributes, such as the data in a traditional relational database table. Semi-structured data is self-describing: it contains tags that separate semantic elements and organize records and fields into layers, as in XML and JSON.
Storage structure
How does the data engine manage both structured and semi-structured data? SequoiaDB uses the JSON data model to store structured and semi-structured data in collections as documents, using the BSON format inside the database.
BSON (Binary JSON) is a binary-encoded format for JSON that, like JSON, supports embedded documents and arrays. A BSON document is a single entity made up of key-value pairs. BSON contains the data types in JSON and adds some types that JSON lacks, such as Date and BinData. A simple example of the BSON structure is shown in Figure 3.

Figure 3: BSON structure example
BSON has several features: it is lightweight, traversable, and efficient. Because the BSON structure contains enough self-describing information, it is a schema-less form of storage.
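To make the self-describing layout concrete, here is a minimal sketch of a BSON encoder for a handful of types, following the public BSON specification (little-endian sizes, a type byte and a NUL-terminated key in front of every value). The function names are illustrative and are not part of SequoiaDB or any driver; types such as Date and BinData are left out for brevity.

```python
import struct


def encode_cstring(name: str) -> bytes:
    """Encode a key as a BSON cstring: UTF-8 bytes followed by a NUL byte."""
    return name.encode("utf-8") + b"\x00"


def encode_element(key: str, value) -> bytes:
    """Encode one key-value pair as: type byte + key + payload."""
    if isinstance(value, bool):  # test bool first, because bool is a subclass of int
        return b"\x08" + encode_cstring(key) + (b"\x01" if value else b"\x00")
    if isinstance(value, int):  # only 32-bit signed integers in this sketch
        return b"\x10" + encode_cstring(key) + struct.pack("<i", value)
    if isinstance(value, float):
        return b"\x01" + encode_cstring(key) + struct.pack("<d", value)
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + encode_cstring(key) + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):  # embedded document
        return b"\x03" + encode_cstring(key) + encode_document(value)
    if isinstance(value, list):  # an array is a document keyed by "0", "1", ...
        as_document = {str(index): item for index, item in enumerate(value)}
        return b"\x04" + encode_cstring(key) + encode_document(as_document)
    raise TypeError(f"unsupported type for key {key!r}: {type(value).__name__}")


def encode_document(document: dict) -> bytes:
    """Encode a document as: int32 total size + elements + NUL terminator."""
    body = b"".join(encode_element(key, value) for key, value in document.items())
    total_size = 4 + len(body) + 1
    return struct.pack("<i", total_size) + body + b"\x00"
```

Because each element carries its own type and key, and each document carries its own length, a reader can traverse or skip fields without any external schema, which is the property the text calls schema-less.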

SequoiaDB uses BSON as its record storage structure. Because BSON is flexible, the structure of a collection does not need to be defined in advance; the fields in each record can be the same or different and can be modified at any time. In this way, both structured and semi-structured data can be stored uniformly and accessed in a consistent manner.
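The following toy example illustrates what storing structured and semi-structured records side by side means in practice. It is a sketch only: the `Collection` class and its `insert` and `find` methods are hypothetical stand-ins written for this illustration, not the SequoiaDB API.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


class Collection:
    """A toy schema-less collection: no field definitions are declared up front."""

    def __init__(self) -> None:
        self.documents: List[Document] = []

    def insert(self, document: Document) -> None:
        # Store a shallow copy so later changes by the caller do not leak in.
        self.documents.append(dict(document))

    def find(self, **criteria: Any) -> List[Document]:
        """Return documents whose top-level fields equal every criterion."""
        return [
            doc for doc in self.documents
            if all(key in doc and doc[key] == value for key, value in criteria.items())
        ]


accounts = Collection()
# Structured: both records carry exactly the same fields, like rows in a table.
accounts.insert({"id": 1, "owner": "alice", "balance": 120.5})
accounts.insert({"id": 2, "owner": "bob", "balance": 80.0})
# Semi-structured: extra and nested fields appear without any schema change.
accounts.insert({"id": 3, "owner": "carol", "tags": ["vip"], "device": {"os": "linux"}})
```

The same `find` call works for all three records, and a query on a field that a record lacks simply does not match it.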

The data management model in SequoiaDB is shown in Figure 4.

Figure 4: SequoiaDB data management model architecture

The data is ultimately stored in disk files. The three physical concepts involved are as follows:
• File: A physical file on disk that persists collection data, indexes, and LOB data.
• Page: The basic unit for organizing data in a database file. SequoiaDB uses pages to manage and allocate space in a file.
• Data block (extent): Consists of several pages and stores records.

In this model, the three core logical concepts related to structured and semi-structured data storage are:
• Collection space: An object that holds collections, physically corresponding to a set of files on disk.
• Collection: The logical object that holds documents.
• Document: A record stored in a collection, in BSON format.

A collection contains several extents, which are chained together in a linked list. Inserting a document into the collection requires allocating space from an extent. If the current extent does not have enough space, a new extent is allocated (extending the file if necessary), appended to the collection's extent list, and the document is inserted into it. The records within each extent are also organized as a linked list, so that all records in the extent can be read sequentially during a table scan.
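The allocation rule above can be sketched in a few lines. This is a simplified in-memory model under stated assumptions: the class names, the tiny 64-byte extent capacity, and the use of a Python list for the per-extent record chain are all choices made for this illustration and do not reflect SequoiaDB's on-disk layout.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Extent:
    capacity: int                                       # bytes available in this extent
    used: int = 0                                       # bytes consumed so far
    records: List[bytes] = field(default_factory=list)  # stands in for the record chain
    next_extent: Optional["Extent"] = None              # link to the next extent

    def has_room(self, size: int) -> bool:
        return self.used + size <= self.capacity


class Collection:
    def __init__(self, extent_capacity: int = 64) -> None:
        self.extent_capacity = extent_capacity
        self.first: Optional[Extent] = None
        self.last: Optional[Extent] = None
        self.extent_count = 0

    def _append_extent(self, capacity: int) -> None:
        """Allocate a new extent and hang it on the end of the extent list."""
        extent = Extent(capacity=capacity)
        if self.last is None:
            self.first = extent
        else:
            self.last.next_extent = extent
        self.last = extent
        self.extent_count += 1

    def insert(self, record: bytes) -> None:
        """Insert into the current extent, allocating a new one when it is full."""
        if self.last is None or not self.last.has_room(len(record)):
            # An oversized record gets an extent large enough to hold it.
            self._append_extent(max(self.extent_capacity, len(record)))
        self.last.records.append(record)
        self.last.used += len(record)

    def scan(self) -> Iterator[bytes]:
        """Table scan: walk the extent list and read each extent's records in order."""
        extent = self.first
        while extent is not None:
            yield from extent.records
            extent = extent.next_extent
```

With 30-byte records and 64-byte extents, every third insert triggers a new extent, and `scan()` returns the records in insertion order.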

Data access
1) SQL
A large number of database applications currently use SQL for database access, so SQL support is an essential capability for a database. SequoiaDB supports standard SQL interfaces and is fully compatible with PostgreSQL and MySQL syntax and protocols, so existing applications can smoothly switch their storage system to SequoiaDB and gain the scalability, performance, and reliability of a distributed storage system.

2) API
SequoiaDB provides a rich set of APIs for managing the whole cluster and operating on data, with drivers for a variety of mainstream programming languages.

Data compression
Because JSON/BSON is a nested, flexible structure, it can also inflate the stored data. This storage expansion was an important cause of the performance bottlenecks in early JSON databases such as MongoDB.

To avoid excessive expansion when JSON/BSON is the storage structure, SequoiaDB builds a data compression mechanism into the data engine. The engine currently provides two compression methods: row compression and table compression. Row compression uses the Snappy algorithm, a fast compression method that does not require a dictionary. Table compression uses the LZW algorithm, a dictionary-based compression method.

Data compression saves storage space and cost, and it also improves the efficiency of each I/O operation. In query scenarios with very high I/O throughput, dictionary-based deep compression can greatly reduce I/O cost and effectively improve query efficiency.
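To show why a dictionary-based method suits documents with repeated field names, here is a sketch of the textbook LZW algorithm. It is the classic variable-dictionary form that emits integer codes, written for illustration; it makes no claim about how SequoiaDB builds, stores, or shares its compression dictionary.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Replace repeated byte sequences with codes from a growing dictionary."""
    dictionary = {bytes([i]): i for i in range(256)}  # start with all single bytes
    next_code = 256
    current = b""
    codes: List[int] = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate  # keep extending the longest known sequence
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the same dictionary while decoding, so it never has to be sent."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the sequence that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)
```

On input such as many JSON records that repeat the same keys, the number of emitted codes is much smaller than the number of input bytes, because each repeated key collapses to one or two codes after its first few occurrences.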

3.2 Unstructured data storage
Storage structure
Unstructured data is data with no fixed structure, such as documents, pictures, audio, and video. This type of data is becoming more and more important in many of today's businesses. In SequoiaDB, it is managed as large objects (LOBs).
A large object is attached to an ordinary collection. When a user uploads a large object, the system assigns it a unique OID, and subsequent operations on that large object refer to it by that value.
Large objects are split into shards when they are stored, and a hash algorithm places the shards in the appropriate partition groups, consistent with the hash space of the owning collection. The shard size is the LOB page size, which is specified when the collection space is created and defaults to 512 KB.
To store and manage LOB data efficiently, SequoiaDB internally separates a LOB into metadata and the data itself and stores them in two files: the LOBM file stores the metadata for LOB shards, and the LOBD file stores the actual LOB data shards. Their logical structure is shown in Figure 5.

Figure 5: LOB file logical structure
The LOBM file mainly includes:
• File header: Contains metadata for the file.
• Space management segment (SME): Marks which pages are in use.
• Bucket management segment (BME): Pages occupied by shards with the same hash value hang on one bucket as a doubly linked list.
• Page: Corresponds one-to-one with a page in the LOBD file and records the collection the page belongs to, the OID, the sequence value, and so on.

The LOBD file mainly includes:
• File header: Contains metadata for the file.
• Data pages: Store the LOB shards. A LOB also has its own metadata, which is stored in the shard with sequence 0 and includes the size of the LOB data, the creation time, the version number, and so on.

Data access
1) Writing a LOB
When LOB data is written, it is split into shards on the coordination node, and each shard is assigned a sequence value that represents its position in the original LOB data. The OID of the LOB together with the sequence value of the shard therefore uniquely identifies the shard.
When a LOB shard is stored, a hash value is computed from its OID and sequence value. First, the collection's partition hash function determines which partition group the shard is stored on. Next, the LOB shard hash function determines which bucket it hangs on. Pages are then allocated in the LOBD and LOBM files, the data is written, and the LOBM page is attached to the corresponding bucket.
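The write path can be sketched as follows. Everything here is an assumption made for illustration: SHA-256 stands in for the engine's partition and bucket hash functions, the `PartitionGroup` class models the LOBM bucket chains and LOBD data pages as plain dictionaries, and the sequence-0 shard is simplified to hold only a small JSON metadata record.

```python
import hashlib
import json
from typing import Dict, List, Tuple

ShardKey = Tuple[str, int]  # (LOB OID, sequence value) uniquely identifies a shard


def shard_hash(oid: str, sequence: int) -> int:
    """Stand-in hash over OID + sequence; the real hash functions are not shown here."""
    digest = hashlib.sha256(f"{oid}:{sequence}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place_shard(oid: str, sequence: int, num_groups: int, num_buckets: int) -> Tuple[int, int]:
    """Pick the partition group first, then the bucket inside that group."""
    value = shard_hash(oid, sequence)
    return value % num_groups, (value // num_groups) % num_buckets


class PartitionGroup:
    def __init__(self, num_buckets: int) -> None:
        self.num_buckets = num_buckets
        self.lobm_buckets: Dict[int, List[ShardKey]] = {}  # bucket -> chained page metadata
        self.lobd_pages: Dict[ShardKey, bytes] = {}        # data pages

    def store(self, key: ShardKey, bucket: int, payload: bytes) -> None:
        self.lobd_pages[key] = payload                       # write the data page
        self.lobm_buckets.setdefault(bucket, []).append(key)  # attach metadata to the bucket


def write_lob(oid: str, data: bytes, groups: List[PartitionGroup],
              page_size: int = 512 * 1024) -> int:
    """Split the LOB into page-sized shards and place each one; return the shard count."""
    shards = [json.dumps({"size": len(data)}).encode("utf-8")]  # sequence 0: LOB metadata
    shards += [data[i:i + page_size] for i in range(0, len(data), page_size)]
    for sequence, payload in enumerate(shards):
        group_index, bucket = place_shard(oid, sequence, len(groups), groups[0].num_buckets)
        groups[group_index].store((oid, sequence), bucket, payload)
    return len(shards)
```

Because placement depends only on the OID and the sequence value, any node can later recompute where a given shard lives without consulting a central index.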

2) Reading a LOB
Reading a LOB requires its OID. The engine uses the OID to fetch the shard with sequence value 0, reads the LOB metadata from it, then computes the locations of all the shards and sends requests to every partition group that contains them.
When the coordination node has received the shard data returned by the partition groups, it merges the shards in sequence order to restore the complete LOB data.
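A matching sketch of the read path is given here. It is self-contained and makes its own simplifying assumptions: each partition group is a plain dictionary keyed by (OID, sequence), SHA-256 stands in for the partition hash, the sequence-0 shard holds only a JSON metadata record with the LOB size, and `store_lob` exists only to populate the demo.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Group = Dict[Tuple[str, int], bytes]  # one partition group: (OID, sequence) -> shard


def group_for(oid: str, sequence: int, num_groups: int) -> int:
    """Stand-in partition hash: maps a shard to the group that stores it."""
    digest = hashlib.sha256(f"{oid}:{sequence}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_groups


def store_lob(oid: str, data: bytes, groups: List[Group], page_size: int) -> None:
    """Demo helper: write a metadata shard (sequence 0) plus the data shards."""
    shards = [json.dumps({"size": len(data)}).encode("utf-8")]
    shards += [data[i:i + page_size] for i in range(0, len(data), page_size)]
    for sequence, payload in enumerate(shards):
        groups[group_for(oid, sequence, len(groups))][(oid, sequence)] = payload


def read_lob(oid: str, groups: List[Group], page_size: int) -> bytes:
    # Step 1: fetch the sequence-0 shard and read the LOB metadata.
    metadata_group = groups[group_for(oid, 0, len(groups))]
    metadata = json.loads(metadata_group[(oid, 0)])
    shard_count = -(-metadata["size"] // page_size)  # ceiling division

    # Step 2: work out which partition group holds each data shard.
    requests: Dict[int, List[int]] = {}
    for sequence in range(1, shard_count + 1):
        requests.setdefault(group_for(oid, sequence, len(groups)), []).append(sequence)

    # Step 3: "send" one request per group and collect the returned shards.
    received: Dict[int, bytes] = {}
    for group_index, sequences in requests.items():
        for sequence in sequences:
            received[sequence] = groups[group_index][(oid, sequence)]

    # Step 4: merge in sequence order to restore the original data.
    return b"".join(received[sequence] for sequence in sorted(received))
```

The shards arrive grouped by partition group rather than in order, which is why the final merge sorts by sequence value before concatenating.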

3) Standard POSIX file system interface
In addition to the LOB API, the SequoiaFS file system is available. It is implemented on FUSE in Linux and supports the common file manipulation API. SequoiaFS uses SequoiaDB collections to store file and directory attributes and LOB objects to store file contents, which yields a distributed network file system similar to NFS. A user can mount a remote SequoiaDB collection on a local node, so that files and directories under the mount point can be manipulated through the common file system API.
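From the application's point of view, a mounted directory needs no database-specific calls at all. The sketch below uses only the Python standard library; the `reports` subdirectory and the function names are hypothetical, and the mount point is passed in as a parameter because the actual path depends on how the file system was mounted.

```python
from pathlib import Path
from typing import List


def save_report(mount_point: Path, name: str, text: str) -> Path:
    """Write a file under the mount point with ordinary file system calls."""
    target_dir = mount_point / "reports"          # hypothetical directory name
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_text(text, encoding="utf-8")
    return target


def list_reports(mount_point: Path) -> List[str]:
    """List file names under the reports directory, sorted for stable output."""
    return sorted(path.name for path in (mount_point / "reports").iterdir())


def read_report(mount_point: Path, name: str) -> str:
    return (mount_point / "reports" / name).read_text(encoding="utf-8")
```

The same functions work against any directory, so code written this way can be tested on a local disk and later pointed at a mounted collection without changes.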

4. Summary
According to Gartner, multi-model has been one of the main technical directions in the database field in recent years. It represents a new approach to managing multiple data types under a cloud architecture, and it is also a new option for simplifying operations and reducing development cost.
SequoiaDB's multi-model database products are now in use in many industries, which suggests that the market is gradually accepting this new database architecture. Databases such as MySQL and PostgreSQL have also begun to support additional formats such as JSON and are moving in the multi-model direction. We believe that products will keep innovating and that more multi-model database products will appear.
