Data Management in Real-time Main-Memory Databases

Source: Internet
Author: User
Document directory
  • Data Placement in Real-time Databases
  • Real-time Data Storage Technology
  • Data Organization in Real-time Databases
  • Data Loading and Exchange in Memory-resident Real-time Databases
  • Recovery of Real-time Databases
Data Placement in Real-time Databases

---- 1. Requirements of real-time applications for data placement

---- Real-time applications require the system to predict transaction execution times accurately. A transaction's operations (operation types, their order, and so on), the data sets it accesses together with their structure and behavior, and the temporal correlations among them can all be analyzed in advance. For a disk-resident database, however, data I/O is the key factor that makes transaction execution time uncertain and its prediction inaccurate. A real-time database therefore needs to use a large main memory as its primary storage medium, so that a transaction performs no I/O during its active period. This makes more accurate prediction possible and helps satisfy the timing constraints of real-time transactions. Two problems must then be solved: appropriate data placement, and appropriate exchange of data between main memory and external storage.

---- 2. Factors affecting data placement, and the placement strategy

---- Data on different storage levels has different read, modify, and write requirements. The main factors affecting data placement are the characteristics of the data and of the transactions.

---- (1) Data characteristics and their influence

---- Validity period. In a real-time application environment, every data item has an associated "external validity period", and data placement must take this temporal characteristic into account. Real-time data can be classified as having a "long" or a "short" validity period; data with a short validity period must be kept in main memory.

---- Access frequency. Frequently accessed data should be kept in main memory.

---- Permanence. Permanent data is used repeatedly over a long period and remains valid for a long time. Temporary data is kept only in main memory, until it expires.

---- Criticality. This refers to how important the data is to the transactions that process it. To guarantee high transaction performance (especially when a real-time transaction is close to its deadline), critical data is best kept in main memory.
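---- The validity period and permanence rules above can be sketched in a few lines. This is only an illustrative sketch: the names DataItem, is_valid, and purge_expired, and the choice of seconds as the time unit, are assumptions and do not come from the original text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataItem:
    name: str
    value: float
    sampled_at: float                 # time the value was observed, in seconds
    validity: Optional[float] = None  # validity period in seconds; None means permanent data

    def is_valid(self, now: float) -> bool:
        """Permanent data is always valid; temporary data expires after its validity period."""
        return self.validity is None or now <= self.sampled_at + self.validity


def purge_expired(memory: dict, now: float) -> list:
    """Remove expired temporary items from main memory and return their names."""
    expired = [key for key, item in memory.items() if not item.is_valid(now)]
    for key in expired:
        del memory[key]
    return expired


# Hypothetical usage: one short-lived sensor reading and one permanent limit value.
memory = {
    "temperature": DataItem("temperature", 71.5, sampled_at=100.0, validity=5.0),
    "alarm_limit": DataItem("alarm_limit", 90.0, sampled_at=0.0),
}
removed = purge_expired(memory, now=110.0)
```

---- In this sketch expired temporary data is simply dropped and never written to external storage, which matches the statement that temporary data lives only in main memory until it expires.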

---- (2) Transaction characteristics and their influence

---- Influence of transaction type. "Write-only" transactions are the "data acquisition" tasks of real-time process control systems or workstations. They are short, periodic, and urgent (they cannot be blocked or made to wait), so their data should be kept in main memory. "Read-only" transactions are generally the "control" tasks of real-time systems. Such a transaction may change the state of the external environment before it commits, so it cannot be undone in the conventional sense; its effects can only be cancelled by running a "compensating" transaction. Its data therefore must not be moved to external storage. Update transactions are treated like ordinary transactions.

---- Influence of priority. A transaction's priority expresses its urgency. The data of high-priority transactions must stay resident in main memory and must not be exchanged out.

---- Recovery considerations. As with data, the characteristics of the log and its placement strategy are the main factors that determine how smoothly transaction recovery (restart) and log movement proceed. A real-time database needs a "memory-resident" log.

Real-time Data Storage Technology

---- What does it mean for a database to be "memory-resident"? Opinions differ, but in our view the definition of a main-memory database should not depend on how large main memory is, how much I/O is needed to access the data, or how and when data is brought into main memory. It depends on one thing only: the "working version" of the database normally resides in main memory (not on disk), and data access by transactions (as opposed to the system) involves main memory only.

---- Main-memory database technology is the best way to support real-time transactions. Its defining feature is that the "primary copy", or "working version", is memory-resident, so active transactions interact only with the in-memory copy of the real-time database. This obviously requires a large main memory, but it does not require the entire database to fit in main memory at all times, so a main-memory database system still has to handle I/O. Even so, it is no longer a traditional disk database: the data structures, the transaction processing algorithms and optimizations, and the concurrency control and recovery techniques designed for disk databases are not necessarily suitable for a main-memory database.

---- The design of a main-memory database should therefore break with the design ideas of traditional disk databases. It should take into account that main memory can be accessed directly and quickly, and redesign the strategies, algorithms, techniques, methods, and mechanisms with efficient use of CPU time and memory space as the goal.

---- A real-time database must let the system predict transaction run times accurately. In a disk database, however, delays from disk access, transfers between main memory and external storage, buffer management, queuing, and lock waits make the actual execution time differ greatly from the estimated worst case. If the entire database, or its main "working" part, is placed in main memory so that no I/O occurs while a transaction executes, the system can estimate and schedule run times much more accurately. This provides better dynamic predictability and lays the foundation for meeting the timing constraints of real-time transactions.

Data Organization in Real-time Databases

---- 1. Structure of the database storage space

---- With main-memory database technology, the database storage space has a four-level structure: volatile memory M1, non-volatile memory M2 (non-volatile RAM), disk storage M3, and archival tape storage M4.

---- M1 stores the working data that supports transactions, so it is called the "working version" (O-DB) of the real-time database. Transactions access it directly, and an ordinary transaction interacts only with it.

---- M2 is an extension of M1. It stores active temporary data and is called the "temporary version" (T-DB). O-DB and T-DB together are called the "memory version" (M-DB) of the real-time database.

---- M3 stores the parts of the database that do not fit in main memory, together with the database backups needed for recovery. This part is called the "external storage version" (S-DB) of the real-time database.

---- M4 is a medium such as offline magnetic tape. It stores images of the whole database state at some earlier time and is called the "archive version" (A-DB) of the real-time database. It serves only for safety protection and long-term archiving.

---- This storage structure is built on main-memory database technology and takes into account the data characteristics, semantic features, and system functions of different applications, which makes it a reasonable design.

---- 2. Physical data organization

---- The physical organization of a memory-resident real-time database is the basis of its overall physical design. Its storage structure, index structure, and intermediate data storage structure must all take the direct-access nature of main memory into account. Two physical organization methods are described here.

---- (1) Partition-segment organization

---- Partition-segment organization is based on the relational data model. It divides the data storage space into "partitions". Each partition stores one relation and is made up of a number of "segments". A segment is a contiguous region of main memory. It is equivalent to a "page": it is the unit of I/O between main memory and external storage, and also the unit of memory allocation and of main-memory database recovery.

---- (2) Shadow memory

---- Shadow memory divides the database storage space into two parts: the primary copy (PDB) and the "shadow" copy (SM). While a transaction runs, every lookup is first made in SM; if it does not succeed there, it is made in PDB. All update operations are performed in SM and recorded in the active transaction log. Whenever a transaction commits, the "after images" it produced in SM are copied to PDB.
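---- The shadow memory protocol can be illustrated with a small sketch. It is not the author's implementation: the class name ShadowStore, the use of dictionaries, and keeping a separate shadow area per transaction are all assumptions made for the example.

```python
class ShadowStore:
    """Toy model of a primary copy (PDB) plus a shadow copy (SM)."""

    def __init__(self, primary):
        self.pdb = dict(primary)  # primary copy
        self.sm = {}              # shadow copy: transaction id -> {key: after image}
        self.log = []             # active transaction log: (txn, key, after image)

    def read(self, txn, key):
        # Look in the shadow copy first, then fall back to the primary copy.
        shadow = self.sm.get(txn, {})
        if key in shadow:
            return shadow[key]
        return self.pdb.get(key)

    def write(self, txn, key, value):
        # Updates touch only the shadow copy and are recorded in the log.
        self.sm.setdefault(txn, {})[key] = value
        self.log.append((txn, key, value))

    def commit(self, txn):
        # On commit, copy the transaction's after images into the primary copy.
        self.pdb.update(self.sm.pop(txn, {}))

    def abort(self, txn):
        # Aborting only discards shadow data; the primary copy was never touched.
        self.sm.pop(txn, None)


# Hypothetical usage.
store = ShadowStore({"valve": "closed"})
store.write("t1", "valve", "open")
before_commit = store.pdb["valve"]  # the primary copy is unchanged until commit
store.commit("t1")
```

---- Because uncommitted changes never reach the primary copy in this sketch, an abort needs no undo work.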

---- 3. Index structure

---- Databases have several efficient index structures. The AVL tree, the B-tree, and the B+-tree are the most representative.

---- For a memory-resident real-time database, they share one key weakness: they use storage inefficiently, and their search performance is not as good as it is for disk-resident data. For this reason we developed a main-memory index structure, the SB-tree, that combines the strengths of the AVL tree and the B-tree without their shortcomings. Searching is similar to searching a binary tree. The difference is that at each node the search key is compared not with a single element value but with the node's largest (rightmost) and smallest (leftmost) elements. SB-tree maintenance operations are similar to those of the AVL tree, but because of its distinctive node structure the insertion and deletion procedures differ in detail.
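---- The search rule described above, comparing the key with a node's smallest and largest elements, can be sketched as follows. The node layout (a sorted list of keys plus left and right children) is an assumption, the tree is built by hand, and no balancing, insertion, or deletion is shown, so this is not a full SB-tree.

```python
import bisect


class Node:
    def __init__(self, keys):
        self.keys = sorted(keys)  # several elements per node, kept in order
        self.left = None          # subtree with keys smaller than self.keys[0]
        self.right = None         # subtree with keys larger than self.keys[-1]


def search(node, key):
    """Return True if key is stored in the tree rooted at node."""
    while node is not None:
        if key < node.keys[0]:        # below the node's smallest element
            node = node.left
        elif key > node.keys[-1]:     # above the node's largest element
            node = node.right
        else:
            # The key falls inside this node's range: binary search within the node.
            i = bisect.bisect_left(node.keys, key)
            return node.keys[i] == key
    return False


# Hypothetical three-node tree.
root = Node([40, 45, 50])
root.left = Node([10, 20, 30])
root.right = Node([60, 70, 80])
found = search(root, 20)
```

---- Each step discards a whole subtree with two comparisons, and only the final node is searched element by element, which is why such a structure needs fewer pointers per stored key than a binary tree.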

Data Loading and Exchange in Memory-resident Real-time Databases

---- 1. Main factors affecting data loading and exchange

---- Two kinds of factors affect the M-DB data loading and exchange strategy: the characteristics of the data itself and the characteristics of the transactions.

---- Data volatility. This is the rate at which data changes. Different data changes at different rates, and rapidly changing data requires frequent propagation and updating.

---- Data activity. This is the access frequency. Highly active data should be kept where it is easiest to access.

---- Data currency. This refers to how recent and timely the data is. Current data must be consistent with the present state of the real world.

---- Data correlation. This refers to how often several data items are used together. When data is loaded or exchanged, strongly correlated data should be loaded or exchanged together.

---- Transaction characteristics. Only the characteristics relevant to data loading and exchange are considered here. First, in nested transactions the parent and child transactions share data, which must be taken into account when data is loaded or exchanged with external storage. Second, for real-time transactions the order and timing of data loading must guarantee that their deadlines are met. In addition, the data of high-priority transactions should stay resident in main memory and must not be exchanged out.

---- 2. Initial loading

---- When a main-memory database is loaded for the first time, the first consideration is transaction priority: the data of high-priority transactions is loaded into main memory first, or, depending on the scheduling policy, low-priority transactions are not admitted to main memory at all. The next consideration is data currency, and then activity: data with a high access frequency is usually loaded first. Closely related data is considered for loading together.

---- The basic idea of initial loading is to divide the full set of database data into subsets according to access frequency and affinity, determine the highest access priority of each subset, and then, based on the capacity of main memory, load the subsets with the higher weighted access priority into main memory.
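---- A minimal sketch of this loading plan follows. The Subset fields, the function name plan_initial_load, and the ordering rule (priority first, access rate as a tie-breaker, standing in for the weighted access priority whose formula the text does not give) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Subset:
    name: str
    size: int           # main memory needed, in arbitrary units
    priority: int       # highest priority among the transactions that use the subset
    access_rate: float  # how often the subset is accessed


def plan_initial_load(subsets, capacity):
    """Greedily pick subsets in priority order until main memory is full."""
    ordered = sorted(subsets, key=lambda s: (s.priority, s.access_rate), reverse=True)
    loaded, free = [], capacity
    for s in ordered:
        if s.size <= free:  # skip subsets that no longer fit
            loaded.append(s.name)
            free -= s.size
    return loaded


# Hypothetical subsets and a main memory of 100 units.
subsets = [
    Subset("sensor_readings", size=40, priority=9, access_rate=5.0),
    Subset("history", size=50, priority=3, access_rate=20.0),
    Subset("setpoints", size=30, priority=5, access_rate=1.0),
]
plan = plan_initial_load(subsets, capacity=100)
```

---- With these numbers the two higher-priority subsets fit and the large low-priority one stays on external storage until it is exchanged in.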

---- 3. Exchange of data between main memory and external storage

---- A real-time database is not required to hold the entire database in main memory, so the system must still provide a strategy for exchanging data between main memory and external storage in support of the main-memory database. The exchange strategy must consider the following factors:

  • Highly volatile real-time data must stay resident in the in-memory O-DB and must not be exchanged out.
  • Active, frequently accessed data should be kept in the O-DB and generally should not be exchanged out.
  • Highly current data must not be exchanged out before it has been processed for the first time.
  • The data of a high-priority transaction must not be exchanged out while the transaction is active. In particular, when the transaction is periodic, its data should stay resident in the O-DB.
  • Non-permanent data and critical data are best not exchanged out. Non-permanent data does not need to be exchanged at all, and critical data must be guaranteed to be available when it is urgently needed.

---- The unit of data exchanged at one time is usually a set of tuples (a page or block).
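---- The exchange rules above amount to a filter over candidate pages. The sketch below encodes them as boolean flags; the flag names, the Page class, and the least-recently-used choice among the remaining candidates are assumptions, since the text does not say how a victim is picked.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: int
    last_access: float
    highly_volatile: bool = False
    unprocessed: bool = False                   # current data not yet processed for the first time
    used_by_active_high_priority: bool = False  # needed by an active high-priority transaction
    critical: bool = False
    permanent: bool = True


def is_exchangeable(page):
    """A page may leave main memory only if none of the pinning rules applies."""
    return not (page.highly_volatile or page.unprocessed
                or page.used_by_active_high_priority or page.critical)


def needs_write_back(page):
    """Non-permanent data is simply dropped; only permanent data goes to external storage."""
    return page.permanent


def choose_victim(pages):
    """Pick the least recently used exchangeable page, or None if every page is pinned."""
    candidates = [p for p in pages if is_exchangeable(p)]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.last_access)


# Hypothetical usage.
pages = [
    Page(1, last_access=10.0, highly_volatile=True),
    Page(2, last_access=20.0),
    Page(3, last_access=15.0),
    Page(4, last_access=5.0, critical=True),
]
victim = choose_victim(pages)
```

---- Pages 1 and 4 are older than page 3 but are pinned by the rules, so the oldest unpinned page is chosen instead.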

Recovery of Real-time Databases

---- 1. The recovery system

---- For a memory-resident real-time database, recovery is even more critical, and it differs considerably from the recovery techniques of traditional systems. The main differences lie in logging, checkpointing, and the data reloading strategy. In general, the recovery model should follow these principles:

  • The focus of recovery is main memory rather than the disk database.
  • The demands of recovery should not be met at the expense of transaction access to the M-DB. This means that recovery and transactions should be able to perform I/O asynchronously.
  • Recovery must preserve the integrity of the system as a whole, so recovery techniques and tools suited to real-time databases need to be developed.

---- Logging in a memory-resident real-time database can be arranged as follows:

  • The log buffer and the transaction work area are combined to save main memory.
  • Only "after images" are recorded in the log buffer. That is, only redo logs are needed and undo logs are not.
  • Each active transaction has its own SLB (stable log buffer). Its records are then grouped into pages by database page and written to the disk log one page at a time.
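---- A redo-only log with one buffer per transaction might look like the sketch below. It assumes that updates reach the database only at commit, which is what makes undo records unnecessary; the class name RedoLog, the page size, and the record layout are illustrative, and real stable storage is replaced by a Python list.

```python
class RedoLog:
    def __init__(self, page_size=4):
        self.page_size = page_size
        self.buffers = {}   # transaction id -> list of (key, after image)
        self.disk_log = []  # log pages; each page is a list of (txn, key, after image)

    def record(self, txn, key, after_image):
        # Only the after image is kept; no before image is ever logged.
        self.buffers.setdefault(txn, []).append((key, after_image))

    def commit(self, txn):
        # Group the transaction's records into pages and write them one page at a time.
        records = [(txn, key, value) for key, value in self.buffers.pop(txn, [])]
        for start in range(0, len(records), self.page_size):
            self.disk_log.append(records[start:start + self.page_size])

    def abort(self, txn):
        # Nothing was written to the disk log, so dropping the buffer is enough.
        self.buffers.pop(txn, None)


def redo(disk_log, database):
    """Replay after images in log order to rebuild the committed state."""
    for page in disk_log:
        for _txn, key, after_image in page:
            database[key] = after_image
    return database


# Hypothetical usage: one committed and one aborted transaction.
log = RedoLog(page_size=2)
log.record("t1", "x", 1)
log.record("t1", "y", 2)
log.record("t1", "z", 3)
log.record("t2", "x", 99)
log.commit("t1")
log.abort("t2")
recovered = redo(log.disk_log, {"x": 0})
```

---- The aborted transaction leaves no trace in the disk log, so recovery only ever redoes work.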

---- The purpose of a checkpoint is to reduce the work needed for recovery after a failure. Recovery then only has to process the transactions that started after the last checkpoint, redoing the changes those transactions made to the database. There are three strategies for taking backups (checkpoints):

---- Transaction-consistent backup (checkpoint). It reflects transaction activity atomically, so the backup contains either all after images (for committed transactions) or all before images (for uncommitted ones).

---- Action-consistent backup. It reflects each action (rather than each whole transaction) atomically, so the backup can contain both before images and after images, but no record is ever captured in a partially updated state.

---- Fuzzy backup (checkpoint). It does not guarantee the atomicity of either transactions or actions.

---- The two consistent kinds of checkpoint have to cope with modifications made by active transactions while the checkpoint is being taken, and several algorithms exist for this purpose. Representative ones are the static checkpoint, the black/white checkpoint, and the copy-on-update checkpoint.

---- 2. Database reloading and recovery

---- Reloading restores the memory version of the database (M-DB) from the external storage version (S-DB) and the log. There are two kinds of reloading: full and partial. Full reloading follows a failure such as a power outage, and the initial loading strategy can be used for it. For partial reloading the exchange strategy applies, and the task reduces to selecting the data to be exchanged.

---- A priority-ordered reloading method is more effective for real-time databases. It takes priority into account and loads the most urgently needed data first, so that the system can restart and resume running as quickly as possible. The remaining data is then loaded step by step as needed.
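---- Priority-ordered reloading can be sketched as a split between data loaded before restart and data loaded afterwards. The function name reload_plan, the (segment, priority) tuples, and the restart threshold are assumptions introduced for the example.

```python
def reload_plan(segments, restart_threshold):
    """Split segments into those loaded before restart and those loaded later.

    segments: list of (segment_id, priority); a higher priority loads earlier.
    Segments at or above restart_threshold must be in main memory before the
    system resumes; the rest are loaded step by step afterwards.
    """
    ordered = sorted(segments, key=lambda s: s[1], reverse=True)
    eager = [seg for seg, prio in ordered if prio >= restart_threshold]
    deferred = [seg for seg, prio in ordered if prio < restart_threshold]
    return eager, deferred


# Hypothetical segments of a process control database.
segments = [("history", 1), ("sensors", 9), ("alarms", 7), ("reports", 2)]
eager, deferred = reload_plan(segments, restart_threshold=5)
```

---- Here the system could resume once "sensors" and "alarms" are back in main memory, while "reports" and "history" follow in the background.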
