Myths and misunderstandings about Hekaton


Recently I have spent a lot of time trying to better understand Hekaton, the new in-memory table technology in SQL Server 2014. I have read a lot of articles, learned about Hekaton's various internal data storage structures (mostly hash indexes and the Bw-tree), and watched a lot of talks on the topic.

But more than once I have run into false claims, myths, and misunderstandings; many people have simply gotten Hekaton wrong. You can see it from the questions people ask: we need to collect the facts about Hekaton and present them again, so that everyone understands it better and can use it well in the scenarios where it actually fits.

Here are just a few of the questions and claims I have heard about Hekaton:

    • "Hekaton is an in-memory storage technology; does that mean the data is no longer durable?"
    • "Hekaton can only run on CPUs with a specific architecture?"
    • "When you move to Hekaton, you get a 100x performance boost for your workload?"
    • "In Hekaton there are no locks, latches, or spinlocks."

These are just some of the headline misunderstandings I have heard in the past. The purpose of this article is to clear up the biggest of these myths and questions, and I will tell you why they are wrong. So let's start with my top headlines (in no particular order)!

"Does Hekaton still provide the ACID properties for transactions?"

When I talk about Hekaton, this is always the first thing I want to clarify: when you use Hekaton, your transactions still have the ACID properties! Transactions in Hekaton are always atomic (Atomicity), consistent (Consistency), isolated (Isolation), and durable (Durability). SQL Server just uses different concepts and mechanisms internally to guarantee these four important properties.

  • Atomicity: transactions you execute are rolled forward or rolled back as a whole. Even if your SQL Server crashes, a database with Hekaton enabled still goes through crash recovery: as soon as SQL Server restarts, your data is there again, because it was persisted (described in more detail below).
  • Consistency: Hekaton also gives you consistent data, always. That is quite easy at the moment, because one of Hekaton's biggest limitations is that you cannot create foreign keys or check constraints. So basically a Hekaton-enabled table is only consistent within itself.
  • Isolation: a few days ago someone tried to convince me that Hekaton gives you dirty reads because there is no locking and blocking. Wrong!!! Hekaton uses so-called multi-version concurrency control (MVCC) to make the magic of lock-free access possible. When you read data, you retrieve the version that was valid at the time your statement/transaction started (depending on the transaction isolation level used). Therefore there are no dirty reads in Hekaton.
  • Durability: this one deserves a closer look. When you create your Hekaton table with schema-and-data durability (DURABILITY = SCHEMA_AND_DATA), your data survives a SQL Server crash. It is recovered from the transaction log and from the checkpoint files that Hekaton writes. This is important: when you want your data to be durable, Hekaton still uses the transaction log, which means the transaction log is one of the last remaining performance bottlenecks of Hekaton. Compared with traditional disk-based tables, though, Hekaton uses a very efficient logging model. One of the changes is that only data modifications are logged, nothing at the index level. When you perform an INSERT on a traditional disk-based table, SQL Server needs a log record for *each* index (clustered and nonclustered). In Hekaton, SQL Server logs only one insert record, because all Hekaton indexes (hash indexes, range indexes) are rebuilt during SQL Server startup. The impact on the transaction log is therefore as small as possible.
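To illustrate the isolation point: when you access a memory-optimized table from interpreted T-SQL, the reads are versioned and never dirty. A minimal sketch (the table name dbo.Orders is hypothetical) of the two ways to request snapshot-based access in SQL Server 2014:

```sql
-- Option 1: explicitly request snapshot-based access per statement.
SELECT OrderId, CustomerId
FROM dbo.Orders WITH (SNAPSHOT);

-- Option 2: let the database automatically elevate READ COMMITTED
-- access against memory-optimized tables to SNAPSHOT.
ALTER DATABASE CURRENT
SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;
```

Either way, a reader sees the row versions that were valid when its statement/transaction started.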

When you create a Hekaton table with schema-only durability (DURABILITY = SCHEMA_ONLY), the durability of your Hekaton transactions is gone: when SQL Server crashes, or when you restart SQL Server, all data in that Hekaton table is lost. There is no durability, and therefore also no logging in the transaction log. Of course, this option only makes sense in special scenarios, such as staging tables for loading data into your data warehouse: if SQL Server crashes, you just restart your ETL process and rebuild the lost data.
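The two durability options map directly to the DURABILITY clause of CREATE TABLE. A minimal sketch with hypothetical table names (note that in SQL Server 2014 a memory-optimized table also needs a nonclustered hash primary key, declared inline):

```sql
-- Durable table: data is recovered from the log and checkpoint files after a restart.
CREATE TABLE dbo.Orders
(
    OrderId    INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
    CustomerId INT NOT NULL,
    OrderDate  DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

-- Non-durable table: only the schema survives a restart, the rows are gone.
CREATE TABLE dbo.OrdersStaging
(
    OrderId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
    Payload NVARCHAR(2000) NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
```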

"Is Hekaton Microsoft's NoSQL approach?"

This misunderstanding is also very interesting. Someone tried to convince me that Hekaton is Microsoft's new way of doing NoSQL. Come on: with Hekaton we are still talking about a relational database with full ACID properties (see above). Hekaton and NoSQL are two completely different things and have nothing in common. Hekaton just uses an elegant and fast way to implement the characteristics of a relational database internally, that's it!

"Hekaton only runs on CPUs with a specific architecture?"

Wow, when I heard this one I thought I was in the wrong movie! Hekaton supposedly runs only on specific CPU models/architectures, because Hekaton internally uses so-called atomic CAS operations (Atomic Compare & Swap, also called Atomic Compare & Exchange). That sentence is completely wrong! Of course, the Bw-tree internally uses CAS operations to perform tree SMOs (Structure Modification Operations) in multiple atomic steps. Hekaton uses the Win32 API function "InterlockedCompareExchange" for this. The function compares the value at a specific memory location with an expected original value, and if the two are equal, the new value is written to that memory location. The function itself executes at the CPU level as a single atomic assembly instruction, meaning no other thread can interfere with it: it executes as one atomic block from start to finish.

The myth here is that the needed assembly instruction is supported only on specific CPU architectures. That is true, but the instruction has been supported since the Pentium processor! Only on the 386 and 486 architectures is it not available... And as you can see in the requirements section of the MSDN article just mentioned, the minimum supported operating system is Windows XP! So if you manage to install SQL Server 2014 on a system older than Windows XP, this myth is true!

"In Hekaton there are no locks, latches, or spinlocks."

In theory that sentence is true, but it can be debated from different angles. Let's start with the first one. Hekaton "itself" is free of locks, latches, and spinlocks, but you are still dealing with the traditional relational engine of SQL Server. This means that as soon as you leave the Hekaton universe, you are dealing with SQL Server code as old as it gets (sorry for that...), for example the transaction log manager. That code still uses latches and spinlocks to synchronize access by different threads. From this point of view, the statement above is only partially true.

The second aspect: even in Hekaton you still have blocking, because while one thread performs an atomic CAS operation, it cannot be interrupted by a different thread. This is how SMOs (Structure Modification Operations) in the Bw-tree can be implemented in a smart, elegant way. It also means that when several threads want to execute an SMO on the same page of the same Bw-tree, one thread wins and the other threads have to retry their atomic CAS operation. And what happens in the meantime? The thread spins and tries the CAS operation again. My basic understanding is that the atomic CAS operation is conceptually like a critical section wrapped around the synchronized assembly instruction. That means your threads may need to spin, you lose CPU cycles, and you increase the latency of your transactions. Of course, SMOs should be very, very rare, so it is not a big deal, but threads can still spin when there is contention on the underlying synchronization object (the assembly instruction).

That is my basic understanding of atomic CAS operations. If my understanding here is wrong, please feel free to correct me!

"Because INT IDENTITY values are not supported in Hekaton, just use sequences?"

This is a very interesting one. Please don't do that! Why? Because in your SQL Server database a sequence is a shared object, meaning that access to its current value is synchronized by SQL Server. That synchronization ends in contention, which means your workload cannot scale, and in Hekaton everything is about scaling.

I did some tests on CTP1 with sequence values, and as soon as you run a lot of parallel threads against your Hekaton table/stored procedure, you trigger contention in the sequence generator. The sequence generator stores its current value on some internal pages, and latches are taken when those specific pages are read and written. You end up with latch contention in the sequence generator, and your Hekaton workload does not scale. In my tests it made hardly any difference whether I requested an entire range of sequence values at once or used caching; the sequence generator was always the bottleneck.

So how do you overcome this particular problem? Use our good old friend UNIQUEIDENTIFIER. These values are generated completely independently of each other, meaning that no shared resource is involved when you generate a new value. You eliminate that bottleneck, and your Hekaton workload can keep scaling until you hit 100% CPU utilization (or some other bottleneck such as the transaction log or network bandwidth).
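As a sketch (with a hypothetical table name), replacing the shared sequence generator with independently generated UNIQUEIDENTIFIER values could look like this:

```sql
-- Each NEWID() call produces a value without touching any shared
-- generator state, so parallel inserts do not contend on a sequence page.
CREATE TABLE dbo.Events
(
    EventId UNIQUEIDENTIFIER NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 2097152),
    Payload NVARCHAR(1000) NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

INSERT INTO dbo.Events (EventId, Payload)
VALUES (NEWID(), N'some payload');
```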

"Hekaton is completely transparent to your application."

That is true, as long as you have no idea about database design. As I mentioned earlier, in the first release you cannot create foreign keys or check constraints. I have hardly ever seen disk-based tables that could be migrated one-to-one to memory-optimized tables. And remember: INT IDENTITY values are not yet supported. When you move to Hekaton, you have to do more than just flip a switch to get a 100x performance boost. Sorry!


"For range indexes, Hekaton uses the traditional B+ tree structure."

Wrong! Range indexes use the so-called Bw-tree, which is almost the same as the B+ tree structure that SQL Server currently uses for clustered and nonclustered indexes. The Bw-tree is based on the B-link tree. That may sound a little confusing, so here are the three major differences between the Bw-tree and a traditional B+ tree:

    • A page at an intermediate level stores a large key range, and the pages at the next level down store smaller key ranges. Because the key ranges are stored on the pages themselves, SMOs (for example, page splits) can be implemented as two atomic operations (performed by two atomic CAS operations). This concept comes from the design principles of the B-link tree.
    • Pages are "never" changed in place. Changing a page in place would invalidate CPU cache lines, which ripples through the entire memory hierarchy and is very expensive (in terms of wasted CPU cycles). When Hekaton needs to change a page in memory, it does not touch the original memory location; it just creates a new delta record that describes the modification. The so-called "page mapping table" then points to the new delta record, while the original page the delta record refers to stays unmodified. This way the CPU cache line invalidation is avoided.
    • The page size is elastic; pages are not always 8 KB in size.

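In SQL Server 2014 both index types have to be declared inline when the table is created; a minimal sketch with hypothetical names:

```sql
CREATE TABLE dbo.Prices
(
    ProductId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),  -- hash index: point lookups
    ValidFrom DATETIME2 NOT NULL,
    Price     MONEY NOT NULL,
    INDEX ix_Prices_ValidFrom NONCLUSTERED (ValidFrom)                -- range index: backed by a Bw-tree
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```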
"Hekaton gives you ultra-fast business logic inside the database."

From Microsoft's point of view this is true, because since SQL Server 2012 SQL Server is licensed per CPU core: the more CPU cycles you burn, the more CPUs you need, and the more licenses you pay to Microsoft. But from an architectural point of view it is wrong! A database stores and retrieves data; a database is not an application server, and the application server is where business logic belongs... Think about it: when you have CPU contention because you run a lot of business logic on the database server, you should refactor your design and move that business logic to a dedicated application server, where the licensing is completely different (you mainly pay for the operating system). Using Hekaton to paper over that architecture problem is using it for the wrong reason!

"How do I migrate my entire SAP database to Hekaton? Does Microsoft have a tool for that?"

Before you migrate your entire database to Hekaton, think again. Hekaton is there to solve specific problems, like latch contention. It only makes sense to migrate specific tables and stored procedures to Hekaton, not "all" of your database! For every Hekaton database object (table, stored procedure), SQL Server needs to compile and link a corresponding DLL file (which is then loaded into the address space of sqlservr.exe), and that takes time. The compilation and linking is also performed when you restart SQL Server, or when a cluster failover occurs. This directly affects the Recovery Time Objective (RTO) of your HA strategy.
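The compile-to-DLL step applies to natively compiled stored procedures. A sketch of the SQL Server 2014 syntax, assuming a memory-optimized table dbo.Orders with OrderId and CustomerId columns already exists (both names are hypothetical):

```sql
-- Compiled and linked into a DLL at creation time, and again at every
-- restart or failover -- which is why migrating everything hurts your RTO.
CREATE PROCEDURE dbo.usp_GetOrder
    @OrderId INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
    (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    SELECT OrderId, CustomerId
    FROM dbo.Orders
    WHERE OrderId = @OrderId;
END;
```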

Summary

I hope I have cleared up some of the myths, false claims, and misconceptions around Hekaton. And as I always stress: if you have the "right" problem, Hekaton can help; if you have a traditional problem (wrong index design, bad storage performance), forget Hekaton and do your homework first.

Hekaton is like F1 Racing:

If your driving technique is not good, an F1 car cannot help you!

Thank you for your attention; I look forward to your comments!

