SQL Server 2014 offers many exciting new features, but one of the most anticipated is the in-memory database. Here is a quick introduction to some of the features of the SQL Server 2014 in-memory database.
I believe the concept of an in-memory database is not unfamiliar to most of us; a number of experts have already written about building an in-memory database in SQL Server. I have read their posts carefully and gained a general understanding, but many details remained unclear to me, for example:
(1) Does the in-memory database put the entire database into memory?
(2) The data lives in memory; after a server crash or power outage, is the data gone?
(3) How is the data laid out in memory? Is it still stored in pages, and is the size of a row limited?
(4) The in-memory database is said to be a lock-free design; how does SQL Server handle concurrency conflicts?
I believe these are also the questions we often run into when thinking about in-memory databases. Below I will lift the veil on each of them; if anything is wrong, I ask readers to kindly point it out.
First, how is the in-memory database stored? Only in memory? Does it put the entire database into memory?
Answer: No.
SQL Server 2014 offers a number of exciting new features, but the one I most anticipated is the in-memory database. Last year, at the opening of the SQL PASS Summit 2012 in Seattle, Microsoft announced that the next version of SQL Server would ship with an in-memory database engine code-named Hekaton. Now, with the arrival of SQL Server 2014 CTP1, we can finally get a first look at it.
In-memory database
In a traditional database, because of the physical structure of the disk, tables and indexes are organized as B-trees, which makes indexes weak in high-concurrency OLTP environments. There are many ways to mitigate this, such as optimistic concurrency control, application-level caching, and distributed architectures, but the cost remains fairly high. Meanwhile, hardware has kept advancing: servers with hundreds of gigabytes of memory are no longer rare, and with the maturity of the NUMA architecture, the bottleneck of multiple CPUs accessing memory has been removed, clearing the way for in-memory databases.
Memory's formal name is Random Access Memory (RAM), and its defining characteristic is random access. The data structure that suits it is therefore the hash index, and the matching concurrency-isolation mechanism is MVCC. As a result, an in-memory database can handle more concurrency and more requests on the same hardware without transactions being blocked by locks. Better still, SQL Server 2014 ships this powerful feature at no extra charge, unlike Oracle's TimesTen, and combined with the SSD Buffer Pool Extension feature, the results should be well worth looking forward to.
How SQL Server presents the in-memory database
The Hekaton engine in SQL Server consists of two parts: memory-optimized tables and natively compiled stored procedures. Although Hekaton is integrated into the relational database engine, access to it is transparent to the client; from the client application's perspective, the presence of the Hekaton engine cannot even be detected, as shown in Figure 1.
Figure 1: The client application does not perceive the presence of the Hekaton engine
First, memory-optimized tables no longer have the concept of locks (earlier versions offered optimistic concurrency control through snapshot isolation, but snapshot isolation still takes locks while modifying data), and the hash-index structure of memory-optimized tables greatly improves random reads and writes. In addition, a memory-optimized table can be created as non-durable, in which case no log is written (suitable for intermediate ETL results, at the risk of losing data).
In this article, I want to highlight the following two points:
(1) The in-memory database actually puts specified tables into memory, not the whole database;
(2) The in-memory database organizes its data files on disk as file streams;
And I will add a third:
(3) The data files of the in-memory database are divided into data files and delta files, and they come in pairs;
1. The in-memory database actually puts specified tables into memory, not the entire database
The process of creating an in-memory database is essentially storing selected tables in memory, not the entire database. The syntax for creating a memory-optimized table differs from traditional table-creation syntax in a few options, as sketched below.
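A minimal sketch, with hypothetical table and column names; it assumes the database already has a filegroup for memory-optimized data (see point 2 below):

    -- A minimal sketch with hypothetical names; assumes the database already
    -- has a MEMORY_OPTIMIZED_DATA filegroup (see point 2 below).
    CREATE TABLE dbo.ShoppingCart
    (
        CartId      INT NOT NULL
            PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
        UserId      INT NOT NULL,
        CreatedDate DATETIME2 NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON,           -- this is what differs from a normal table
          DURABILITY = SCHEMA_AND_DATA);   -- durable: data survives a restart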
Memory-optimized tables do not simply store data in memory; otherwise there would be no difference from a traditional data cache. In the in-memory database they are also called "natively compiled memory-optimized tables": when a memory-optimized table is created, it is compiled into native machine code and loaded into memory at the same time. The native machine code contains instructions the CPU can execute directly, so access to and operations on memory-optimized tables are very fast.
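The other half of the Hekaton engine, the natively compiled stored procedure, is created in the same spirit; a minimal sketch against the hypothetical table above:

    -- A natively compiled stored procedure: compiled to machine code at creation.
    CREATE PROCEDURE dbo.usp_GetCart
        @CartId INT
    WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
    AS
    BEGIN ATOMIC WITH
        (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT CartId, UserId, CreatedDate
        FROM dbo.ShoppingCart
        WHERE CartId = @CartId;
    END;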
Memory-optimized tables fall into two categories: durable tables and non-durable tables. Changes to a durable table are logged, so its data survives a database restart; operations on a non-durable table are not logged, their results stay only in memory, and the data is lost after a restart.
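The only difference in syntax is the DURABILITY option; a sketch of a non-durable table (hypothetical names again):

    -- Non-durable: the schema survives a restart, the data does not.
    CREATE TABLE dbo.EtlStaging
    (
        RowId   INT NOT NULL
            PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
        Payload NVARCHAR(4000)
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);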
The above only covers newly created tables; in a real business environment we cannot recreate every table of a business system. So is there a way to configure an existing table? The answer is probably not very satisfying: SQL Server currently does not support migrating existing tables into memory, so to adopt the in-memory database, existing business tables must be recreated.
2. The in-memory database organizes its data files on disk as file streams
In the in-memory database, the data files stored on disk are not organized into extents or pages; they use filestream storage instead. One feature of filestream storage is its support for fast read operations, which improves efficiency when the data in the file streams is loaded into memory on database restart.
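This is why a database needs a special filegroup before any memory-optimized table can be created in it; a sketch, with a hypothetical database name and path:

    -- Add a filegroup for memory-optimized data, then a container (a directory,
    -- not a regular file) inside it.
    ALTER DATABASE InMemDemo
        ADD FILEGROUP InMemDemo_mod CONTAINS MEMORY_OPTIMIZED_DATA;
    ALTER DATABASE InMemDemo
        ADD FILE (NAME = N'InMemDemo_mod1', FILENAME = N'C:\Data\InMemDemo_mod1')
        TO FILEGROUP InMemDemo_mod;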
3. The data files of the in-memory database are divided into data files and delta files, and they come in pairs
Inserted, updated, and deleted data in the in-memory database is stored physically separated, in data files and delta files.
(1) Data file
The data file holds "inserted" and "updated" rows, stored strictly in the order in which the transactions executed. For example, the first row in a data file might come from transaction 1 and the second row from transaction 2; the rows can belong to the same table or to different tables, depending on which memory-optimized tables two consecutive transactions touched. The benefit of this approach is continuous, sequential disk IO; random IO is avoided.
The size of a data file is fixed at 128 MB; when one data file fills up, SQL Server automatically creates a new one. Because rows are saved in transaction order, the rows of a single table (written by multiple transactions) may span several data files. When an update touches many rows, the writes can be spread over more than one file and performed simultaneously, which speeds up the update. (The delta file, described below, works the same way.)
For example, suppose there are 4 data files (the light-blue boxes in the original figure): the first data file covers the transaction range 100-200, the second covers 200-300, and so on (100 and 200 are timestamps).
If a row in a data file is deleted or updated, the row is not removed from the data file; instead, the deletion is marked in the matching delta file (the yellow boxes in the figure). (An update is essentially a delete plus an insert, so an update also produces a delete record.) This avoids unnecessary disk IO.
If rows are never removed from data files, won't the files grow without bound and make backups huge? Of course not. SQL Server's way of handling this is actually very simple: merging. Following a merge policy, multiple data files and delta files are combined: the records that the delta files mark as deleted are removed from the data files, and the multiple data files are then merged into one, reducing the disk space the data files occupy.
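You can watch these file pairs and their states (including merge activity) through a DMV; a sketch assuming SQL Server 2014, where the exact column set may vary by build:

    -- Inspect the data/delta checkpoint file pairs of the current database.
    SELECT file_type_desc,       -- DATA or DELTA
           state_desc,           -- e.g. UNDER CONSTRUCTION, ACTIVE, MERGE TARGET
           file_size_in_bytes,
           relative_file_path
    FROM sys.dm_db_xtp_checkpoint_files;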
(2) Delta file
Each data file has a matching delta file; "matching" means they cover the same transaction range (one or more transactions). The delta file records which rows in its data file have been deleted; each mark is essentially the association {inserting_tx_id, row_id, deleting_tx_id}. Like the data file, it saves the deleted-row information strictly in the order of the transaction operations.
For example, suppose the in-memory database has 5 data files covering the transaction ranges 100-200, 200-300, 300-400, 400-500, and 500 onward. If a transaction with timestamp 501 needs to delete data produced by the transactions with timestamps 150, 250, and 450 and also insert some new data, the corresponding IO requests go to the delta files of the 1st, 2nd, and 4th files and to the 5th data file. The deletes can be spread across multiple files and performed simultaneously, which speeds up deletion.
Second, the data is in memory, so if the server crashes or loses power, is the data gone?
Answer: No.
The in-memory database guarantees the durability of data in two ways: transaction logs and checkpoints.
(1) Transaction log
In the in-memory database, "writing the log" and "writing the data" happen within one transaction, and during transaction execution SQL Server "writes the data" first and "writes the log" afterwards. This differs from a traditional database, where, whether in memory or on disk, "writing the data" always comes after "writing the log", the rule commonly known as WAL (Write-Ahead Logging). At commit time, however, an in-memory database behaves exactly like a traditional one: the log must reach disk before the data does.
Therefore, even if the server crashes or loses power, the next time the database starts it will redo the transactions from the log already saved on disk, so there is no need to worry about losing data.
One more point is worth adding: for durable tables, the in-memory database writes only the log records of committed transactions to disk. The benefit is fewer disk writes. An in-memory database is built for frequent, rapid inserts, updates, and deletes at an intensity far beyond a traditional database; if every operation wrote a log record, the disk IO for logging could easily become a performance bottleneck. Skipping the log for uncommitted transactions reduces the number of log writes and thus improves performance.
Some readers may wonder: won't skipping the log for uncommitted transactions lead to inconsistent data?
Certainly not, because "dirty data" can never be written to disk before its log is (the reason is explained under Checkpoint below).
(2) Checkpoint
In the in-memory database, the primary purpose of a checkpoint is to write the "data" in memory to disk, reducing recovery time when the database crashes or restarts: the database does not need to read the entire log to recover its data. By default, checkpoints are periodic; a new round of checkpoint is triggered once the log has grown by 512 MB since the last checkpoint.
In a traditional database, a checkpoint can flush uncommitted data into the MDF file on disk. That cannot happen in the in-memory database, because the in-memory database logs only committed transactions and never writes data to disk before the corresponding log records reach disk; the data a checkpoint writes to disk is therefore guaranteed to belong to committed transactions.
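To see this without waiting for 512 MB of log growth, you can force a checkpoint manually and then look at the checkpoint DMV; a sketch with the hypothetical database name from earlier:

    USE InMemDemo;
    GO
    CHECKPOINT;   -- force a checkpoint instead of waiting for log growth
    SELECT * FROM sys.dm_db_xtp_checkpoint_stats;   -- checkpoint activity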
Third, how is the data stored in memory? Is it stored in pages, and is the size of a row limited?
Answer: It is not stored in pages, and a row is limited to 8060 bytes.
Memory-optimized tables are stored as row versions: the same row can have multiple versions in memory, and the storage of a memory-optimized table can be thought of as the collection of all row versions of all rows in the table.
Rows in a memory-optimized table do not share the row structure of a traditional database. The structure of a row in a memory-optimized table is as follows:
In a memory-optimized table, a row consists of two parts: the row header and the row body.
The row header records the row's validity (a start timestamp and an end timestamp) and the index pointers;
the row body records the actual data of the row.
In a memory-optimized table, the number of versions of a row is determined by how many operations are performed on it. For example, each update creates a new row version: the new version gets a new start timestamp, and as soon as it is created, the old version's end timestamp is filled in automatically, marking that version as expired.
Note: in the figure there are actually only 3 rows; the 1st row has 3 row versions, the 2nd row has 2 row versions, and the 3rd row has 4 row versions.
Since the same row can have so many versions in memory, how does the database decide which one to use during access?
In a traditional database, each row in a table is unique, and a transaction that wants to find a row can locate it by file number, page number, and slot.
In the in-memory database, where each row has multiple versions, a transaction cannot operate on every version; in fact, a transaction can operate on only one version of a given row. Which version that is depends on whether the transaction's execution time falls between the two timestamps of the row version; all other versions are invisible to the transaction.
Because a row may have several versions, one might ask: with so many versions per row, won't a table with millions of rows run out of memory? No need to worry: as described above, every row version carries timestamps, and for any version that has an end timestamp and is no longer visible to an active transaction, SQL Server reclaims its memory through a garbage-collection mechanism, saving memory.
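For the curious, SQL Server 2014 exposes the garbage collector's activity through a DMV; a sketch (SELECT * is used here because the exact columns vary by build):

    -- Instance-level statistics of the row-version garbage collector.
    SELECT * FROM sys.dm_xtp_gc_stats;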
Fourth, the in-memory database is said to have a lock-free design; if a concurrency conflict occurs, how does SQL Server handle it?
Answer: The in-memory database uses row versions to handle conflicts.
An important function of locks is to prevent multiple processes from modifying the same data at the same time, which would leave the data inconsistent. The two common conflict patterns are read-write conflicts and write-write conflicts. How does the in-memory database resolve both through row versions?
(1) Read-write conflicts
In the in-memory database, all transaction isolation for memory-optimized tables is snapshot-based, and more precisely row-based snapshots. As the row structure above shows, each row header includes a start timestamp and an end timestamp, and the key to whether a transaction can access a row version is whether the transaction's start time falls between those two timestamps.
If a transaction is modifying a row but has not yet committed, the new row version has no end timestamp yet, so a reading transaction still sees the original version; there are therefore no dirty reads.
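From interpreted T-SQL, this row-version snapshot behavior can be requested explicitly with a table hint, or the database can be told to elevate to it automatically; a sketch with the hypothetical names used earlier:

    -- Read under row-version snapshot isolation: never blocks, never dirty-reads.
    SELECT CartId, UserId, CreatedDate
    FROM dbo.ShoppingCart WITH (SNAPSHOT);

    -- Or let the database elevate such accesses to snapshot automatically.
    ALTER DATABASE InMemDemo SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;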
(2) Write-write conflicts
A write-write conflict occurs when two transactions try to update the same row at the same time. Rather than making the second transaction wait on a lock, the in-memory database fails it immediately with an update-conflict error.
The probability of a conflict in an in-memory database is much smaller than in a traditional database, but if you do run into one, the only remedy is in the application: add "retry logic" (wait a moment, then start the transaction again), as sketched below.
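A minimal retry-loop sketch in T-SQL, against the hypothetical table above; 41302 is the update-conflict error raised when two transactions write the same row (41301, 41305, and 41325 are related retryable validation errors):

    DECLARE @retry INT = 3;
    WHILE @retry > 0
    BEGIN
        BEGIN TRY
            BEGIN TRANSACTION;
            UPDATE dbo.ShoppingCart WITH (SNAPSHOT)
            SET UserId = 42
            WHERE CartId = 1;
            COMMIT TRANSACTION;
            SET @retry = 0;                      -- success: leave the loop
        END TRY
        BEGIN CATCH
            IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
            IF ERROR_NUMBER() IN (41301, 41302, 41305, 41325) AND @retry > 1
            BEGIN
                SET @retry -= 1;
                WAITFOR DELAY '00:00:00.100';    -- brief back-off, then retry
            END
            ELSE
            BEGIN
                THROW;                           -- not a retryable conflict
            END
        END CATCH
    END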
Perhaps some readers feel this does not amount to much of a performance change. Consider the traditional database, though: a lock there can escalate to cover the whole table, and while the table is locked, every other transaction must wait for the locking transaction to finish, even if that transaction only modified a handful of rows; the table lock blocks rows the transaction never touched. In the in-memory database, write conflicts always occur at the level of a single row, so their scope and impact are far smaller.