Database principle and application-DBMS of database management system

Source: Internet
Author: User

2018-02-20 14:35:34

Database management System ( English: DatabasesManagementsystem, abbreviation :DBMS) is an object-oriented database, A large-scale computer software management system designed to manage the database. Representative data management systems are:Oracle, Microsoft sql Server,Access, MySQL, and PostgreSQL.

First, the DBMS kernel

Parser: Compiler, or parser

Grant checking: Authorization checker to check permissions issues for a specific user

Semantic Analysis and query treatment (DDL QL DML DCL): Semantic parsing and query processing, core in core, function implementation of SQL statements

Access management: Accessing management, mapping tables and files

Concurrency Control: Concurrency controls

Recovery mechanism: Recovery mechanism

Ii. the process structure of the DBMS

1. Single Process structure

2, multi-process structure, MULTI processes structure

An application process corresponds to a DBMS core process, and whenever an application process needs to access the database, a connection request is initiated, and the system generates a DBMS core process and creates a pipeline for interaction.

3, multi-threaded structure, MULTI threads structure

Creating a process in the operating system consumes a lot of system resources, and if you use a multi-process structure, each time a new DBMS core process is created for the new application, the resources of the operating system are quickly exhausted. Here, you can use the multithreaded structure for optimization. Threads can be seen as a lightweight process in which a process can contain multiple threads, and each thread consumes much less system resources than the process.

Multiple threads in the same process share the process's data, with the following shared content in the DBMS core process:

DAEMON: used to listen for requests from the application, set up a thread for each new application's connection request;

Catalog : A directory That records information about tables in a database;

Lock table : lock tables, which play a role in concurrency control;

Buffer : buffers;

Third, Database Access Management

Database access management is done by translating access to the database into file objects in the operating system. Database access management, that is, the file structure definition on the physical layer and the access path of the file will directly affect the database query speed. No one file structure can be once and for all, so we need to consider a few questions:

    1. type of access;
    2. Document structure;
    3. Indexing technology;
    4. Access to the original language;

1) Access type, Access Types

Query all or most records of a file (>15%): The number of tuples in the query accounted for more than 15% of the files, which we think is this type, generally this large number of queries can be stored using heap files;

Query some special record

Query Some records (<15%)

Scopequery: Range queries

Update

2) file storage structure, files Organization

Heap files : Sequential scan for querying, suitable for querying more than 15% of the query operations

Hash file : High query efficiency, suitable for querying special query operations

index file : Heap file and B + Tree index, most widely used, also most commonly

Dynamic Hash: The mapping space changes with the size of the data volume

Raw Disk: Note that the logical order and physical order of a file are two concepts, in general, the physical order is controlled by the operating system. Using raw disk, you can determine where the file is stored on disk, apply for continuous storage space at a time, store continuously by attribute value, also known as Clustering index, cluster index.

3) Indexing Technology

Four, query optimization

The relational database was objected to when it was first raised, because the use of a relational database is inefficient when queried, such as a 10,000-tuple table and a 50,000-tuple table to connect, which would result in a 500 million-tuple, which is unacceptable. But why is the relational database becoming mainstream now? Because of its query optimization technology has made great progress.

Query optimization will rewrite the user's SQL language, generate more efficient Query language query, the fundamental goal is to use as few resources as possible to return the results of the query to the user.

Can be divided into two-step optimization:Algebra optimization (algebraic optimization, is rewriting); Operation Optimization (Operation optimization)

For example: Calculate x^2 + 2xy + y^2, which can be converted into equivalent (x+y) ^2 by algebraic optimization, so-called Operation optimization is to choose the appropriate method of addition and multiplication.

    • Algebra optimization

The basic goal of algebraic optimization is to rewrite the user's query operations and generate better forms. The basic principle is to push the unary operation down as far as possible; look for merging common sub-expressions .

    • Operation optimization

The goal of operational optimization is to implement a better algorithm for each operation, such as projection, connection operation, etc. The following is an example of a join operation.

The method of connection operation one: nested loops, the algorithm complexity is O (nm).

Method of connection Operation two: Merge scan, the premise according to the value of the connection attribute is sorted out.

The method of connection operation three: based on the B + index cycle, through the B + tree to find, rather than simply traverse.

Iv. Recovery mechanisms

The database recovery mechanism refers to: prevent the database error , after the failure, can be restored, the database into a stable state .

There are two basic principles of the recovery mechanism: redundancy is necessary, that is, to perform the required backup work, and to be able to predict all possible failure conditions.

How to implement the recovery mechanism:

1) Periodical Dumping

periodical Dumping: Backs up the entire database once every once in a while.

One drawback to this approach is that a full backup database is not always possible, so errors in two backups can result in a lot of missing updates.

The workaround is simply to add incremental backup plus Incremental dumping on a discontinuous backup basis. Backup operations can be performed more than once due to the small amount of incremental data. Although there are still missing updates, the missing content is less because of the short interval. Early use, suitable for small databases.

2) Backup + Log

Backup plus log, the so-called log, is actually a running account, record backup one, the user to make all changes to the database. the issue of missing updates is not generated .

V. Transaction processing mechanisms

Operations on the database are run as a transaction, and if no indicated transaction are displayed, each SQL is a transaction by default .

A transaction (Transaction) is a set of SQL statements that has the following characteristics of acid:

    1. atomicity (Atomic): either all is executed, or none of them are executed;
    2. consistency (consistent): When a transaction is complete, it must keep all data in a consistent state. In a related database, all rules must be applied to transaction modifications to maintain the integrity of all data. At the end of the transaction, all internal data structures (such as B-tree indexes or doubly linked lists) must be correct;

    3. isolation (insulation): Concurrently running transactions cannot interfere with each other;

    4. Persistence (Duration): Once a transaction is complete, the operation is permanently reflected on the database.

VI, concurrency control

Concurrency refers to the multi-user database management system, allowing multiple transactions to access the database at the same time.

Benefits of Concurrency: increased system utilization and reduced response time, and because transactions are likely to access different contents of the database, concurrency can greatly improve efficiency .

1) Three types of conflicts that may occur in concurrency

    1. write–write: Write conflicts can cause problems with missing updates. Write conflicts must be avoided .
    2. write–read: Read-write conflicts can cause dirty reading of data.
    3. read–write: Read and write conflicts, which can cause duplication of data read conflict problems.

2) The theory of serializable

The serializable theory is the basis for judging whether the result of parallel operation is correct. The goal of concurrency control is that the concurrency results are serializable.

Serializable means {T 1, T 2,... T n} The concurrency result of these n transactions is equal to one of the results of its serial run (n! Any of them), then we assume that the result of this concurrency execution is correct.

3) Blockade law

Locking mechanism is the most common concurrency control strategy, the core idea is: in the concurrent transaction to the database data access needs to be locked in advance, if there is no conflict, then each lock their own resources, if there is a conflict, then the first to grab the lock first run method to execute, In fact, it is a method of forcing serialization by means of locking mechanism, so that the concurrency result can be serialized.

(a) X locks

There is only one lock in the system, which is an exclusive lock, and the transaction must apply for an exclusive lock to read or write the data.

theorem : Parallel scheduling is well-formed + 2PL, it can be proved that the parallel results are serializable.

(well-formed: Every transaction is very well-behaved, that is, each time you read and write data before the lock will be requested)

(2PL: The transaction before reading and writing data will be unified to apply for locks, and finally unified release of the lock, that is, after releasing the lock resources no longer continue to apply for locks)

(b) (s,x) locks

We know that there is no conflict between multiple transactions reading one data at a time, so the system will be inefficient if it is simply an exclusive lock. In order to further improve the efficiency, the (s,x) locks was proposed.

S Locks: Read operation request

X Locks: Write operation request

(c) (s,u,x) locks

We know that in the update operation, in many cases it is first to read the data and then modify it, and finally write back. We can define a U lock before writing back, and then upgrade to x lock when we write back. That is, Jianfengchazhen, as much as possible to postpone the lock time of the X lock, allowing read operations during the update, can further improve efficiency.

4) Deadlock and live lock

Deadlock (Dead Lock): There is a cyclic wait between multiple concurrently running transactions, that is, each transaction has a partial lock resource, and is eager to obtain the other's lock resources, so that no one transaction can get all the resources to complete the entire transaction, called a deadlock.

For example, the following example, TA applied for R1 Lock, TB applied for R2 Lock, the two are no conflict, you apply for your, I apply for my. TA also applied for R2 lock, TB and applied for R1 lock, this time is very embarrassing, because both will enter the waiting, waiting for their lock was released, so entered the loop wait, two transactions can not get all the resources.

Live Locks (live Lock): Although other transactions have acquired lock resources for a limited period of time, due to the problem of system scheduling, a transaction has not yet obtained the resource, known as a live lock, for long.

For example, the following example, for the data r,t1,t2 application got S lock, read operation, then t to apply for X lock, found that there is a s lock, so wait, then there are t3,t4 ... To read operations, according to the definition can continue to apply to the S lock, so t in the next re-application will still find that R still have S lock, and continue to wait, this is called a live lock, also known as starvation phenomenon.

Solution:

For the live lock, the solution is relatively simple, can adopt FIFO, first-out scheduling scheme to solve.

For deadlocks, there are two types of solutions, as follows.

(1) Solving (permit it occurs, but can solve it)

    1. TimeOut, that is, the wait time for each transaction set a constant, if the wait time exceeds the threshold, it is considered that a deadlock occurs, the transaction is killed, and later re-executed, in a small system to use;
    2. Wait Graph method , according to the transaction and wait for the construction of the waiting diagram, the waiting graph can be found to determine whether there is a deadlock, check whether there are two main options for the timing of the ring, one is the new addition of the edge of the time to carry out the sentence; , then kill it and finally re-execute the victim;

(2) Prevention (don ' t let it occur)

    1. all locks are required to be fully applied during the initialization phase : This either obtains all the locks, executes the statements, or none of them, and does not have a partial lock resource. (Not very realistic in the database system)
    2. to sort resources, you must apply from high to low as required : So for example two transactions to apply for R1,R2 lock, the first transaction to R1, the second transaction is no way to apply to the R2 lock, because the order must wait for the R1 lock is released, apply to the R1 lock To apply for a R2 lock. (Not very realistic in the database system)
    3. Transaction Reset Method : A more practical method in the database. Specifically, you define a timestamp for each transaction (time Stamp), which can be used either as a tid or as a comparison of the ages of two transactions. In the event of a conflict, we can compare the age of two transactions, this time there are two strategies: i) Wait-die: old age waiting for the young, that is, if the current transaction is younger, kill yourself, and then automatically re-execute the original timestamp for a period of time, if you are older, go to wait. II) Wound-wait: Young waits for old age, namely encounters the collision, if the current transaction compares the old age, then kill the other side, if oneself younger, then enters Waits. Note that both scenarios are kill-young transactions, and because of the single-direction nature of the execution of the transaction, no deadlock occurs. In addition, because the transactions before the current transaction are more than a few, there is no permanent wait for the live lock phenomenon.

Database principle and application-DBMS of database management system

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.