Among MySQL storage engines, InnoDB is currently the most widely used. It really is a very good storage engine; even the book High Performance MySQL says that, absent special requirements, InnoDB is the best choice. This article, however, is about TokuDB rather than InnoDB, and compared with InnoDB, TokuDB has its own distinctive characteristics.
Reposted from: http://www.kryptosx.info/archives/931.html

Comparison of B-tree and fractal tree:
Currently, both SQL Server and MySQL InnoDB use a B+tree index structure (SQL Server uses a standard B-tree). In theory this structure should not be slow for queries: a comparison-based search structure has an average query complexity of O(log N). The B-tree family optimizes this further for disks by giving each node many children, which reduces the depth of the tree.
Random IO is a problem almost every DBA runs into. When the data set is small, all of it can be cached in memory and random IO is not an issue (at that size a block-oriented structure like the B-tree is not even necessary). Once the data outgrows memory, however, the problem appears. In essence, the challenge for key-value storage is to make writes as fast as possible and reads as fast as possible.
This trade-off is the main concern when designing such data structures. Before analyzing the solution, consider the extremes. At one extreme, if data is simply written sequentially (appended), inserts are as fast as they can be, but every query has to scan the entire table. At the other extreme, the best read performance comes from a structure like the B-tree; but because B-tree inserts cause random IO, a B-tree cannot reach sequential write performance.
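The two extremes can be made concrete with a small in-memory sketch. It rests on loudly stated assumptions: AppendOnlyLog and SortedIndex are hypothetical names, a sorted Python list stands in for a B-tree, and counting scanned items or binary-search probes is only a rough proxy for disk IO.

```python
from bisect import insort


class AppendOnlyLog:
    """Extreme 1: every write is appended at the tail (purely sequential)."""

    def __init__(self):
        self.items = []

    def insert(self, key):
        self.items.append(key)

    def lookup(self, key):
        """Return (found, items scanned); with no order kept, this is a full scan."""
        for scanned, item in enumerate(self.items, start=1):
            if item == key:
                return True, scanned
        return False, len(self.items)


class SortedIndex:
    """Extreme 2, a stand-in for a B-tree: keys are kept in sorted order."""

    def __init__(self):
        self.items = []

    def insert(self, key):
        # The key must land at its sorted position; in an on-disk B-tree that
        # position is an essentially random page, hence random IO.
        insort(self.items, key)

    def lookup(self, key):
        """Return (found, probes) using binary search."""
        lo, hi, probes = 0, len(self.items), 0
        while lo < hi:
            mid = (lo + hi) // 2
            probes += 1
            if self.items[mid] < key:
                lo = mid + 1
            else:
                hi = mid
        return lo < len(self.items) and self.items[lo] == key, probes


log, index = AppendOnlyLog(), SortedIndex()
for i in range(1000):
    key = (i * 7919) % 1000  # scrambled order; 7919 is coprime to 1000, so this permutes 0..999
    log.insert(key)
    index.insert(key)
print(log.lookup(1000))    # (False, 1000): the log has to scan everything
print(index.lookup(1000))  # not found after roughly log2(1000) probes
```

The log wins on writes and loses badly on reads; the sorted index is the reverse, which is exactly the tension the fractal tree tries to resolve.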
TokuDB therefore uses an index structure called the Fractal Tree to address the random IO problem. Its core idea is to turn random IO into sequential IO.
Structure | Inserts | Point Queries | Range Queries
B-tree | Horrible | Good | Good (young)
Append | Wonderful | Horrible | Horrible
Fractal Tree | Good | Good | Good
("Young" refers to a B-tree that has not yet been aged, i.e. fragmented, by many random inserts.)
Introduction to the fractal tree
Assume a structure made up of a set of rows (arrays), where each row has twice the capacity of the row before it. Each row is either completely empty or completely full, and the data in a full row is kept sorted.
Data insertion:
For example, when a new value such as 3 is written, it goes into the first row; since the first row is empty, 3 is placed there.
Next, write the value 11. The first row is already full, so 3 is taken out and sorted together with 11, and the pair tries to go into the second row.
The second row is also full, so its values 5 and 10 are taken out as well; 3, 11, 5 and 10 are sorted together and written to the third row.
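The final result: the first two rows are empty again, and the third row holds 3, 5, 10, 11 in sorted order.
(Figure: overall view of the structure.)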
As you can see, this structure keeps every non-empty row completely full. When a row is full, its contents are merged into the next row, and the merging continues row by row until a row that can be written to is found.
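The insertion walkthrough can be simulated in a few lines. This is a minimal sketch, assuming in-memory Python lists stand in for the on-disk rows; FractalRows and merge_sorted are hypothetical names, not TokuDB code. Inserting 5 and 10 first reproduces the starting state of the example (second row holding 5 and 10), after which 3 and 11 trigger the cascade.

```python
def merge_sorted(a, b):
    """Merge two sorted lists in a single left-to-right (sequential) pass."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


class FractalRows:
    """Row k is either empty or holds exactly 2**k values in sorted order."""

    def __init__(self):
        self.rows = []

    def insert(self, value):
        carry = [value]                        # the values "taken out", kept sorted
        for k, row in enumerate(self.rows):
            if not row:                        # empty row: write the carry here
                self.rows[k] = carry
                return
            carry = merge_sorted(row, carry)   # full row: take its values out and merge
            self.rows[k] = []
        self.rows.append(carry)                # every row was full: open a new, larger row


t = FractalRows()
for v in (5, 10):   # builds the starting state: second row holds 5 and 10
    t.insert(v)
t.insert(3)         # first row is empty, so 3 goes there
t.insert(11)        # cascades: [3, 11], then [3, 5, 10, 11]
print(t.rows)       # [[], [], [3, 5, 10, 11]]
```

Every move of data is a merge of sorted runs read and written front to back, which is why, on disk, these writes can be done as sequential IO.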
What I am not clear about:
The insertion complexity is said to be O((log N)/B), where B is the number of rows stored in a block and N is the amount of data. But I can only derive O(N/B). It is said to have been optimized, but I have not read how.
For comparison: the B-tree insertion complexity is O(log N / log B), which is the depth of the tree. Here B is effectively the degree (fan-out) of the tree: the larger the degree, the shallower the tree, with the depth falling logarithmically in B.
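One standard way to reconcile O((log N)/B) with O(N/B) is an amortized argument. This is a sketch under the usual assumption of the textbook analysis, not a statement about TokuDB's internals: rows smaller than one block stay in memory, so only rows of size at least B cost disk IO. Over N inserts, a carry of size 2^k reaches row k at most N/2^k times (like the carries of a binary counter), and each visit reads and writes at most 2^{k+1} sorted items sequentially.

```latex
\begin{aligned}
\text{IOs for } N \text{ inserts}
  &\le \sum_{k=\log_2 B}^{\log_2 N} \frac{N}{2^{k}} \cdot O\!\left(\frac{2^{k+1}}{B}\right)
   = O\!\left(\frac{N}{B}\,\log_2 N\right), \\
\text{amortized IOs per insert}
  &= O\!\left(\frac{\log_2 N}{B}\right), \\
\text{worst case for a single insert}
  &= O\!\left(\frac{N}{B}\right) \quad \text{(one cascade merges every row).}
\end{aligned}
```

Under this reading, O(N/B) is the cost of the occasional insert that cascades through every row, while O((log N)/B) is the average cost per insert over a long sequence.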
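A back-of-the-envelope comparison helps show how far apart the two formulas are. It rests on loose, hypothetical assumptions: N = 10^9 rows and B = 1000, the same B used both as the B-tree fan-out and as the rows per block, constant factors ignored, and caching of the upper B-tree levels ignored.

```python
import math


def btree_insert_ios(n, b):
    """Approximate random IOs per B-tree insert: the tree depth, log_B N."""
    return math.log(n) / math.log(b)


def fractal_insert_ios(n, b):
    """Approximate amortized IOs per fractal-tree insert: log2(N) / B."""
    return math.log2(n) / b


n, b = 10**9, 1000  # hypothetical: a billion rows, 1000 rows per block
print(f"B-tree:       {btree_insert_ios(n, b):.2f} IOs per insert")    # 3.00
print(f"Fractal tree: {fractal_insert_ios(n, b):.3f} IOs per insert")  # 0.030
```

Note that the B-tree figure counts random IOs, whereas the fractal-tree figure is an amortized share of large sequential merges.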
Summary of the fractal tree structure's characteristics:
- Consists of multiple sorted arrays whose sizes grow exponentially
- Each array is either completely empty or completely full
- New data is inserted into the smallest array; when there is not enough space, arrays are merged into the next larger one
Query performance:
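Without optimization, query performance is poor: a lookup has to search every row, so in the worst case the number of IOs is on the order of O(log² N) (about log N rows, each needing a binary search of up to log N steps).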
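To see why an unoptimized lookup is expensive, the sketch binary-searches every row in turn. It assumes each row is a sorted Python list; naive_search is a hypothetical helper, and a probe (one comparison) stands in for a possible block read. A row of size 2^k costs about k probes, so summing over roughly log2 N rows gives on the order of log² N probes.

```python
def naive_search(rows, key):
    """Binary-search every row, smallest first; return (found, total probes)."""
    probes = 0
    for row in rows:
        lo, hi = 0, len(row)
        while lo < hi:
            mid = (lo + hi) // 2
            probes += 1          # each probe may touch a different block on disk
            if row[mid] < key:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(row) and row[lo] == key:
            return True, probes
    return False, probes


rows = [[7], [2, 9], [1, 4, 6, 12]]
print(naive_search(rows, 5))  # (False, 5)
```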
To improve lookup performance, TokuDB adds a forward pointer to each element, pointing to the position of the first element larger than it in the next row (this technique is called fractional cascading). On average, each element in one level restricts the search in the next level to a constant number of entries, so the number of disk IOs per lookup should drop to O(log N).
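The forward-pointer idea can be sketched as follows. This is a simplified illustration, not TokuDB's implementation: levels is assumed to be the list of non-empty rows, smallest first, and build_forward_pointers and cascading_search are hypothetical names. Only the first level is binary-searched in full; after that, the pointers of the two neighbours of the previous position bound a window in the next level. Classic fractional cascading also copies a sample of each level's elements into the level above so that every window has guaranteed constant size; this sketch omits that step, so the windows are small only on average, as the text says.

```python
from bisect import bisect_left, bisect_right


def build_forward_pointers(levels):
    """For every element of level k, record the index of the first element
    in level k+1 that is strictly larger than it."""
    return [
        [bisect_right(lower, value) for value in upper]
        for upper, lower in zip(levels, levels[1:])
    ]


def cascading_search(levels, pointers, key):
    """Look up key; return (found, sizes of the windows searched per level)."""
    pos = bisect_left(levels[0], key)  # full binary search, first level only
    windows = [len(levels[0])]
    if pos < len(levels[0]) and levels[0][pos] == key:
        return True, windows
    for k in range(1, len(levels)):
        upper, lower, ptr = levels[k - 1], levels[k], pointers[k - 1]
        # upper[pos-1] < key, so everything before ptr[pos-1] is too small.
        lo = ptr[pos - 1] if pos > 0 else 0
        # upper[pos] >= key, so key's position is at or before ptr[pos].
        hi = ptr[pos] if pos < len(upper) else len(lower)
        windows.append(hi - lo)
        pos = bisect_left(lower, key, lo, hi)  # search only inside the window
        if pos < len(lower) and lower[pos] == key:
            return True, windows
    return False, windows


levels = [[7], [2, 9], [1, 4, 6, 12]]        # non-empty rows, smallest first
pointers = build_forward_pointers(levels)
print(pointers)                               # [[1], [1, 3]]
print(cascading_search(levels, pointers, 4))  # (True, [1, 1, 2])
```

Compared with searching each row in full, each level after the first only examines the small window between two pointers.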
(Figure: another view of the optimized structure.)
Summary:
TokuDB's main advantage is that it converts random IO into sequential writes. This gives it very good write speed, and the same design also yields good data compression. For workloads whose writes are already sequential, however, its performance is inferior to a B-tree's.
It is therefore well suited to archiving and to workloads with large volumes of random inserts.
I do not fully understand how this is concretely combined into TokuDB; for now, remember that, like LSM trees, it is suited to write-intensive workloads.