Why do we need transactions?
In Databases (II), on the origin of databases, we already touched on transactions.
Besides abstracting queries and other operations, another important feature a database provides is the transaction. Why do we need transactions? Because when we manipulate data, multiple threads may operate on the same data at the same time, or the database may fail suddenly; either can leave the data inconsistent. Transactions exist to guarantee consistency.
The first means of ensuring consistency is the lock, which handles multiple simultaneous connections to the database. Each connection may be served by its own thread, and those threads may try to manipulate the same piece of data at the same time, producing inconsistencies. So we must lock on writes, meaning only one thread can access the data at a time.
We may also hit a database crash, so we require that a transaction be atomic: it either happens entirely or not at all. For example, when Bob transfers 100 dollars to Smith, either Bob has the 100 or Smith has the 100; there is no in-between state.
A single-node transaction must guarantee:
- Atomicity
- Consistency
- Isolation
- Durability
Together these are known as ACID. Let's look at how each is implemented in turn.
Atomicity: the undo log
Atomicity means the operations either all succeed or all fail. For example, Bob has 100 dollars in his account and Smith has 0, and we want Bob to transfer 100 to Smith.
Atomicity means that either Bob successfully transfers 100 to Smith, leaving Bob with 0 and Smith with 100, or the transfer fails and Bob still has 100 while Smith has 0. There must be no state where Bob's money has left his account but Smith never received it.
Now let's think through what this transaction has to do (a sketch in code follows the list):
1. Lock Bob's account.
2. Lock Smith's account.
3. Check whether Bob has 100 dollars; if so, deduct 100 from his account.
4. Add 100 dollars to Smith's account.
5. Unlock Bob's and Smith's accounts in turn.
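Here is a minimal sketch of these five steps in Python. The `accounts` dict and per-account locks are hypothetical stand-ins for real storage and a real lock manager, and locking in sorted order (rather than strictly Bob first) is a standard deadlock-avoidance tweak, not something the steps above prescribe:

```python
import threading

# A hypothetical in-memory "database": balances plus one lock per account.
accounts = {"bob": 100, "smith": 0}
locks = {name: threading.Lock() for name in accounts}

def transfer(src, dst, amount):
    # Steps 1-2: lock both accounts (in a fixed order, to avoid deadlock).
    first, second = sorted([src, dst])
    with locks[first], locks[second]:
        # Step 3: check that the source has enough money before deducting.
        if accounts[src] < amount:
            return False          # nothing changed, nothing to undo
        accounts[src] -= amount
        # Step 4: credit the destination account.
        accounts[dst] += amount
        return True
    # Step 5: leaving the `with` block unlocks both accounts in turn.

transfer("bob", "smith", 100)
print(accounts)  # {'bob': 0, 'smith': 100}
```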
But the execution of a transaction will not always be smooth sailing; something unexpected may happen. What if Bob's or Smith's account doesn't exist? No problem: we can roll back to the previous state.
But the database cannot record every state it has ever passed through, so we must record the previous state ourselves before the transfer.
For example, look at the intermediate states of the transfer:
- bob:100,smith:0
- bob:0,smith:0 (transfer in progress)
- bob:0, smith:100 (transfer successful)
We can insert two undo records, each recording the previous state, into the log:
- bob:100,smith:0
- bob:0,smith:0 (transfer in progress)
- The previous state was: bob:100,smith:0
- bob:0, smith:100 (transfer successful)
- The previous state was: bob:0,smith:0
To roll back, we only need to walk this log backwards.
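A sketch of this idea, with a made-up undo record format: each write first appends the previous values to an undo log, and rollback replays that log backwards:

```python
# Each undo record stores the previous values of the accounts being touched.
undo_log = []
accounts = {"bob": 100, "smith": 0}

def write(changes):
    # Record the previous state of every account we are about to modify.
    undo_log.append({name: accounts[name] for name in changes})
    accounts.update(changes)

def rollback():
    # Walk the log backwards, restoring each previous state.
    while undo_log:
        accounts.update(undo_log.pop())

write({"bob": 0})        # bob:0, smith:0   (transfer in progress)
write({"smith": 100})    # bob:0, smith:100 (transfer successful)
rollback()
print(accounts)          # back to {'bob': 100, 'smith': 0}
```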
Another possibility is that the system crashes before the transaction finishes. After the restart, a recovery operation has to run. How does it recover? Again, through the log: we must write down what we are going to do before we actually do it.
So before the transaction starts, we write:
Bob originally has 100, Smith originally has 0
If the power fails when the transaction is half done, then after the restart we can recover by following the log, and the result is still **Bob has 100, Smith has 0**. Even if we recover 100 times, the result is the same; this property is called **idempotence**. So even if the power fails again during recovery itself, we can still recover by following the log.
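What makes this recovery idempotent is that the log stores absolute previous values ("Bob originally has 100") rather than deltas ("add 100 back"). A sketch under that assumption:

```python
# A half-finished state after a crash: Bob was debited, Smith not yet credited.
accounts = {"bob": 0, "smith": 0}

# The log written before the transaction started: absolute values, not deltas.
recovery_log = {"bob": 100, "smith": 0}

def recover():
    # Setting an absolute value is idempotent: doing it 100 times
    # gives the same result as doing it once.
    accounts.update(recovery_log)

for _ in range(100):
    recover()            # a crash *during* recovery is also harmless
print(accounts)          # {'bob': 100, 'smith': 0} every time
```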
One question remains: how do we know that a transaction was not completed?
We can solve this through the log as well. For example, we record not only the balances but also two markers: one for the start of the transaction and one for its end.
For example:
[Start transaction T1] [Transaction T1: Bob originally 100]
[Transaction T1: Smith originally 0] [Commit transaction T1]
This way, if we see a "Commit transaction T1" or "Rollback transaction T1" in the log, we know the transaction finished. If we see only the start of transaction T1, we have to recover. For example, the following log should trigger recovery:
[Start transaction T1] [Transaction T1: Bob originally 100]
[Transaction T1: Smith originally 0]
Also, after recovering, we need to append a "Rollback transaction T1" record to the log file, so that the next recovery does not have to consider transaction T1 at all: it has already been returned to its previous state.
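A sketch of such a recovery scan, using a made-up tuple format for log records: any transaction with a start marker but no commit or rollback marker gets undone, and a rollback marker is appended for next time:

```python
# A made-up log format: (kind, transaction, payload).
log = [
    ("start", "T1", None),
    ("undo",  "T1", {"bob": 100}),
    ("undo",  "T1", {"smith": 0}),
    # no ("commit", "T1", None): the crash happened before the commit
]
accounts = {"bob": 0, "smith": 0}

def recover(log, accounts):
    finished = {t for kind, t, _ in log if kind in ("commit", "rollback")}
    # Undo the records of unfinished transactions, newest first.
    for kind, txn, prev in reversed(log):
        if kind == "undo" and txn not in finished:
            accounts.update(prev)
    # Mark each unfinished transaction rolled back, for the next recovery.
    started = {t for kind, t, _ in log if kind == "start"}
    for txn in started - finished:
        log.append(("rollback", txn, None))

recover(log, accounts)
print(accounts)   # {'bob': 100, 'smith': 0}
```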
When is the undo log written to the file?
The discussion above deliberately glossed over one problem: the undo log lives in memory while it is being read and written, so what happens if the power fails before the log has been written out?
In fact, we just have to pick the right moments to write the log to the file.
The easiest scheme to imagine is to write the whole log to the file up front, like writing a draft before the final copy, and then simply follow the draft.
In reality, though, we do not know in advance which fields the program will operate on, so we cannot possibly write the log to the file first. We have to build the undo log in memory as we go, while looking for the right moment to write it to disk.
Taking the transfer above as an example, we can interleave the modifications and the log writes like this:
| Operation | Data buffer | Log buffer |
| --- | --- | --- |
| Start transaction T1 | | [Start transaction T1] |
| Bob = Bob - 100 | Bob's new balance is 0 | [Transaction T1: Bob's original balance is 100] |
| Write the log to the file | | (the log buffer is emptied after the log is written to the file) |
| Write Bob's balance to the file | | |
| Smith = Smith + 100 | Smith's new balance is 100 | [Transaction T1: Smith's original balance is 0] |
| Write the log to the file | | (the log buffer is emptied after the log is written to the file) |
| Write Smith's balance to the file | | |
| Commit transaction T1 | | [Commit transaction T1] |
| Write the log to the file | | (the log buffer is emptied after the log is written to the file) |
To summarize:
- Whenever a balance changes, record the previous balance first.
- Before a balance is written to the hard disk, its log record must already have been written to the file; only then is the log buffer emptied.
- The log record that commits the transaction may be written only after all balances have been written to the hard disk.
In other words, balances change freely during the transaction, but once a balance is formally written to the hard disk it is a done deal, so the corresponding log record must reach the disk first.
And once all the balances are safely on the disk, it is naturally time to put the commit record on the disk too.
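A sketch that encodes these three rules; the buffers and `flush` helpers are hypothetical stand-ins for a real buffer pool and real disk writes:

```python
log_buffer, data_buffer = [], {}
log_on_disk, data_on_disk = [], {"bob": 100, "smith": 0}

def flush_log():
    # Rule 2: the log reaches the file first, then the buffer is emptied.
    log_on_disk.extend(log_buffer)
    log_buffer.clear()

def flush_balance(name):
    # A balance may hit the disk only after its log record did.
    flush_log()
    data_on_disk[name] = data_buffer[name]

def commit(txn):
    # Rule 3: the commit record is written only after all balances are on disk.
    for name in data_buffer:
        flush_balance(name)
    log_buffer.append(f"commit {txn}")
    flush_log()

# Rule 1: record the previous balance whenever a balance changes.
log_buffer.append("T1: bob originally 100"); data_buffer["bob"] = 0
log_buffer.append("T1: smith originally 0"); data_buffer["smith"] = 100
commit("T1")
print(log_on_disk[-1], data_on_disk)  # commit T1 {'bob': 0, 'smith': 100}
```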
Now let's run a few failure drills.
Suppose Bob's balance has been written to the hard disk but Smith's has not changed yet. Then the log contains only Bob's original balance:
[Start transaction T1] [Transaction T1: Bob originally 100]
During recovery we see that the transaction never finished, so Bob's balance is restored.
Similarly, if both Bob's and Smith's balances have reached the disk but the transaction was never committed, the log is:
[Start transaction T1] [Transaction T1: Bob originally 100]
[Transaction T1: Smith originally 0]
We can still restore both account balances.
And even if the latest balances of both accounts have reached the disk and the transaction was committed, if the crash happens before the commit record is written to disk, the undo log is:
[Start transaction T1] [Transaction T1: Bob originally 100]
[Transaction T1: Smith originally 0]
and the balances will likewise be restored to their original values.
Where atomicity falls short
That covers atomicity. But atomicity alone is not enough. Why? Because it cannot guarantee consistency when multiple threads access the data concurrently.
For example, suppose that at step 2 another transaction adds 300 dollars to Smith's account:
- bob:100,smith:0
- bob:0,smith:0 -------------> bob:0,smith:300 (another transaction)
- The previous state was: bob:100,smith:0
- bob:0, smith:100 (transfer successful)
- The previous state was: bob:0,smith:0
If another transaction adds 300 dollars to Smith's account at step 2, then a rollback will set Smith back to 0 and the 300 that was added will be lost, as the sketch below shows. This is why we need consistency.
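A sketch of this lost update, with the bad interleaving forced by hand:

```python
accounts = {"bob": 100, "smith": 0}

# T1 starts a transfer and records its undo information first.
undo = {"bob": accounts["bob"], "smith": accounts["smith"]}  # bob:100, smith:0
accounts["bob"] -= 100            # step 2: bob:0, smith:0

# Another transaction slips in and deposits 300 for Smith.
accounts["smith"] += 300          # bob:0, smith:300

# T1 now fails and rolls back to its recorded state...
accounts.update(undo)
print(accounts)   # {'bob': 100, 'smith': 0} -- the 300 deposit is lost
```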
Consistency
In the previous section we saw that if another transaction suddenly steps in and modifies the data in the middle of a transaction, a rollback will leave the data inconsistent.
So how do we solve this problem? Let one transaction finish with the data before another transaction touches it; then there is no scramble over the data and no inconsistency. The core mechanism is locking.
For example:
- bob:100,smith:0
- bob:0,smith:0 -------------> bob:0,smith:300 (another transaction)
- The previous state was: bob:100,smith:0
- bob:0, smith:100 (transfer successful)
- The previous state was: bob:0,smith:0
Locking happens at the beginning of the transaction and unlocking at the end. That way other transactions cannot see what goes on inside the transaction; its effects become visible only after the whole transaction unit has succeeded.
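A sketch of this transaction-scoped locking with a toy lock manager (real databases use far more elaborate ones): every lock the transaction takes is held until the whole unit finishes:

```python
import threading

locks = {"bob": threading.Lock(), "smith": threading.Lock()}
accounts = {"bob": 100, "smith": 0}

def run_transaction(touched, body):
    held = []
    try:
        # Lock everything the transaction touches, in a fixed order.
        for name in sorted(touched):
            locks[name].acquire()
            held.append(name)
        body()   # other transactions see nothing until this returns
    finally:
        # Unlock only at the end of the transaction unit.
        for name in reversed(held):
            locks[name].release()

def transfer():
    accounts["bob"] -= 100
    accounts["smith"] += 100

run_transaction({"bob", "smith"}, transfer)
print(accounts)   # {'bob': 0, 'smith': 100}
```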
By now we have "apparently" solved the two big problems of concurrency and consistency, but a new problem arrives with them: once the data is locked, other transactions cannot access it, so the system's concurrency collapses. That is the problem isolation, discussed next, has to solve.
Isolation
So-called isolation is, in essence, a compromise of consistency in the name of performance. Why? Because if you want strong consistency, the surest way is to queue up all reads and writes, so the data can never be inconsistent.
But then high concurrency is impossible and performance cannot keep up. So we make compromises, such as taking locks only on writes and not on reads.
First let's look at what kinds of concurrency are possible between two transaction units working on the same data, and then define isolation levels by which of those concurrency modes each level permits.
Four concurrency possibilities
Again, let's use an example to illustrate.
Now T1: Bob transfers 100 dollars to Smith, while T2: Smith transfers 100 dollars to Joe.
These are two transactions that share data: to ensure consistency, the Smith account is locked by both transaction units. While Bob is transferring money to Smith, the other transaction cannot operate on Smith's account and has to wait.
Between the two transaction units T1 and T2 there are only four concurrency possibilities: read-write, write-read, read-read, and write-write.
Write-write parallel
Write-write parallelism is possible only when there is no shared data; in that case the writes do not conflict and need not lock each other out.
Read-read parallel
Read operations take no locks, so reads can run in parallel. A read never modifies the data, so reads can safely proceed concurrently without any consistency worries.
Read-write parallel
While one transaction reads, another can write concurrently. The write modifies the data, but the write is locked, so the reader cannot see the uncommitted result of the write. Two successive reads may therefore return different values (a non-repeatable read), but every value read is correct, and there is no inconsistency.
Write-read parallel
While one transaction writes, another reads concurrently. Since the data is in the middle of changing, the reader may see an intermediate state; and if the system crashes at that moment, the restart will revert the value to what it was before the change, which naturally causes confusion.
So is write-read parallelism simply impossible? No: we can use copy-on-write. Concretely, before each write operation, copy the data into the log and make the modification there.
In effect, the original data is copied and the copy is modified. Reads work on the original data while writes work on the copy, so the two do not interfere.
This technique is also called MVCC (multi-version concurrency control). What does "multi-version" mean?
The copied-out data may be modified many times, so which version should the next read see? We can attach a version number to each log entry. For example, if the version being written now is 10 and a reader wants version 5, it walks back through the versions until it finds the corresponding entry.
So if a read happens after a write, the version it reads must be no older than the version just written. This ensures you can read the data you want.
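A sketch of version-numbered reads; the version chain here is a plain list, far simpler than any real MVCC implementation:

```python
# Each entry is (version, value); writers append, readers pick by version.
versions = [(1, {"smith": 0}), (5, {"smith": 100}), (10, {"smith": 400})]

def read(as_of):
    # Walk back until we find the newest version <= the requested one.
    for version, value in reversed(versions):
        if version <= as_of:
            return value
    raise KeyError("no version that old")

print(read(5))    # {'smith': 100} -- version 10 is invisible to this reader
print(read(10))   # {'smith': 400} -- a read after the write sees the write
```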
Four isolation levels
We said there are four concurrency possibilities between two transaction units on one piece of data. Now we can discuss isolation levels: each level permits some subset of read-write, write-read, read-read, and write-write parallelism.
Serializable:
Reading blocks writing and writing blocks reading, which guarantees strong consistency of the data but has the lowest performance. This is what SQLite uses by default.
Repeatable read: only read-read parallelism is achievable; read-write, write-read, and write-write are not.
So when both operations are reads, no lock is taken; every other combination must lock.
MySQL uses this level by default.
Read committed:
Here a read takes a read lock on the data, but when a write comes in, a write lock replaces the read lock; that is, the read lock can be upgraded to a write lock.
So if transaction T1 reads the data and then transaction T2 writes it (T2 takes the lock and commits), when T1 reads the data again, the original value has changed. This is a non-repeatable read.
At this level, read-write and read-read parallelism are possible, but write-read is not.
Oracle, PostgreSQL, and SQL Server use this mode by default.
Read uncommitted: as the name implies, uncommitted content can be read.
This is the lowest isolation level; only write-write still locks, and even write-read runs unlocked. Since the data is in the middle of changing, a reader may see an intermediate state, and if the system crashes at that moment the restart will revert the value, which naturally causes confusion.
To make write-read parallelism safe you can use the copy-on-write approach described above; its great benefit is that write-read parallelism is guaranteed while the isolation level stays high.
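The four levels can be summarized as data. This mapping simply restates the prose above, with short codes for the four concurrency modes (write-write is omitted everywhere, since for shared data writes always lock each other out):

```python
# Which concurrency modes each isolation level permits, per the text above:
# "rr" = read-read, "rw" = read-write, "wr" = write-read.
ISOLATION_LEVELS = {
    "serializable":     set(),               # everything queues
    "repeatable read":  {"rr"},              # MySQL's default
    "read committed":   {"rr", "rw"},        # Oracle/PostgreSQL/SQL Server
    "read uncommitted": {"rr", "rw", "wr"},  # dirty reads possible
}

def allowed(level, mode):
    return mode in ISOLATION_LEVELS[level]

print(allowed("read committed", "rw"))   # True: non-repeatable reads allowed
print(allowed("repeatable read", "rw"))  # False: read locks held to the end
```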
Durability
Now for the last letter of ACID: durability. It means that once a transaction commits, the data must be written to disk and survive, no matter what crash or error follows.
So in what ways can data be lost?
The first is disk corruption. For that we can use RAID (redundant arrays of disks) to ensure reliability; see the study notes for "Big Talk Storage" (chapters 4 and 5) on RAID.
The second is memory: if the power fails, whatever is in memory is inevitably lost, and durability is broken. But if every commit synchronously pushes the in-memory data to the hard disk, the frequent disk writes degrade performance. So durability and low latency cannot both be had in full.
Again we compromise: on commit, simply put the data in memory and return success immediately, then batch the accumulated requests and write them to disk after a while. This avoids hitting the disk on every commit.
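A sketch of this compromise, often called group commit; the file name, buffer, and flush interval are all made-up parameters:

```python
import threading, time

buffer, buffer_lock = [], threading.Lock()

def commit(record):
    # Acknowledge as soon as the record is in memory.
    with buffer_lock:
        buffer.append(record)
    return "ok"

def flusher(interval=0.05):
    # Periodically write the whole accumulated batch to disk in one go.
    # The trade-off: a crash can lose the latest, not-yet-flushed batch.
    while True:
        time.sleep(interval)
        with buffer_lock:
            batch, buffer[:] = list(buffer), []
        if batch:
            with open("wal.log", "a") as f:
                f.writelines(line + "\n" for line in batch)

threading.Thread(target=flusher, daemon=True).start()
for i in range(10):
    commit(f"commit T{i}")   # returns at memory speed, not disk speed
time.sleep(0.2)              # give the flusher a chance to run
```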