[Introduction to Algorithms] Lecture 12: Skip Lists

Source: Internet
Author: User

This lecture introduces a new data structure: the skip list.


A skip list is a simple and interesting data structure for dynamic search. It is easy to implement and offers efficient search: about 2 * lg n steps, that is, O(lg n) time.


Before that, let's talk about the linked list. Its advantage is that inserting or deleting an element takes only constant time (plus an additional O(n) time to find the element first); its drawback is that search takes O(n). As a side note on linked lists, here are two classic linked list questions that BAT companies like to ask in interviews:


1. How to quickly find the m-th element from the end of a singly linked list

2. How to quickly determine whether a singly linked list contains a cycle


Linked list tricks boil down to two ideas. One is to use two pointers (or even more), and the other is to use two linked lists (or even more). Both problems above can be solved neatly with two pointers. For the first problem, start one pointer and advance it m steps, then start a second pointer from the head and advance both in lockstep. When the first pointer runs off the end of the list, the second pointer points to the m-th element from the end. For the second problem, use a fast pointer and a slow pointer: the fast pointer moves two steps at a time and the slow pointer moves one step at a time. If a cycle exists, the fast pointer is guaranteed to catch up with the slow pointer. Picture two people running laps on a track: after long enough, the faster runner laps the slower one.
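Both two-pointer tricks can be sketched in a few lines of Python. This is only an illustration: the names `Node`, `from_list`, `mth_from_end`, and `has_cycle` are made up for this example, and m is counted so that m = 1 means the last node.

```python
class Node:
    """A singly linked list node (hypothetical helper for this sketch)."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def from_list(values):
    """Build a singly linked list from a Python list and return its head."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def mth_from_end(head, m):
    """Return the m-th node from the end (m = 1 is the last node), or None."""
    lead = head
    for _ in range(m):          # give the first pointer an m-step head start
        if lead is None:
            return None         # the list has fewer than m nodes
        lead = lead.next
    trail = head
    while lead is not None:     # advance both pointers in lockstep
        lead = lead.next
        trail = trail.next
    return trail


def has_cycle(head):
    """Fast/slow pointer check: True if the list loops back on itself."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next        # one step
        fast = fast.next.next   # two steps
        if slow is fast:        # the fast pointer caught up with the slow one
            return True
    return False
```

Both functions make a single pass and use O(1) extra memory, which is why they are popular interview answers.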


Now back to the skip list. The skip list uses the second idea: using two linked lists, or even more.


First, we build two linked lists, L1 and L2. L1 is the fast list, which stores only some of the elements; L2 is the slow list, which stores all of them. Note that all linked lists discussed below are sorted.


To search for an element, we start in the fast list. Because the fast list keeps only some of the elements, it lets us jump ahead quickly. Once the next fast-list node would pass the target, we drop from the current fast-list node down to the slow list and continue from there, which is clearly better than a linear scan of the slow list. The fast list is like an express subway line in the US: the express line stops at only a few stations, while the local line stops at every station. Passengers first take the express line to the station closest to (but not past) their destination, and then take the local line the rest of the way.
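The two-list search can be sketched as follows. This is a minimal illustration, not a full skip list: the names `Node`, `build_two_level`, and `search`, the `down` pointer, and the rule "every step-th key goes into the fast list" are all assumptions made for the example.

```python
class Node:
    """List node; `down` links a fast-list node to the same key in the slow list."""
    def __init__(self, key):
        self.key = key
        self.next = None
        self.down = None


def build_two_level(sorted_keys, step):
    """Build a slow list with every key and a fast list with every step-th key."""
    fast_head = slow_head = None
    fast_tail = slow_tail = None
    for i, key in enumerate(sorted_keys):
        node = Node(key)
        if slow_tail is None:
            slow_head = node
        else:
            slow_tail.next = node
        slow_tail = node
        if i % step == 0:                 # this key also gets an express stop
            express = Node(key)
            express.down = node
            if fast_tail is None:
                fast_head = express
            else:
                fast_tail.next = express
            fast_tail = express
    return fast_head, slow_head


def search(fast_head, target):
    """Return (found, steps): ride the fast list, then finish on the slow list."""
    if fast_head is None or target < fast_head.key:
        return False, 0                   # smaller than the minimum key
    steps = 0
    node = fast_head
    # Stay on the fast list while the next stop does not overshoot the target.
    while node.next is not None and node.next.key <= target:
        node = node.next
        steps += 1
    node = node.down                      # transfer to the slow list
    steps += 1
    while node is not None and node.key < target:
        node = node.next
        steps += 1
    return (node is not None and node.key == target), steps
```

With 100 keys and `step = 10`, a search takes at most about 20 steps instead of up to 100 on the slow list alone.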

Why?

How should L1 and L2 be built? L2 is simply a singly linked list containing all the nodes, but how many nodes should L1 contain? Intuitively, the nodes of L1 should be evenly spaced, so what is the right density?


It is easy to see that the worst-case search time is bounded by |L1| + |L2|/|L1| + (transfer cost), where |L1| denotes the length of L1. In the worst case, the search walks to the end of L1, steps back one node, and then drops into L2. Since L1 divides L2 into |L1| segments, each segment has length |L2|/|L1|, which gives the bound above. L2 contains every element, so |L2| = n, and the transfer cost, the time to drop from L1 down to L2, is a constant. Our goal is therefore to minimize |L1| + n/|L1|, which is achieved when |L1| = sqrt(n) (this follows by taking the derivative or by other methods; proof omitted). The search time is then 2 * sqrt(n): one fast-list node is placed every sqrt(n) elements, for a total of sqrt(n) fast-list nodes.
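For readers who want the omitted step, here is one way to carry out the minimization with a derivative. The symbols f and x are introduced only for this sketch, with x standing for |L1|.

```latex
\begin{align*}
f(x) &= x + \frac{n}{x}, \qquad x = |L_1| > 0 \\
f'(x) &= 1 - \frac{n}{x^{2}} = 0 \;\Longrightarrow\; x = \sqrt{n} \\
f''(x) &= \frac{2n}{x^{3}} > 0 \quad \text{(so the critical point is a minimum)} \\
f(\sqrt{n}) &= \sqrt{n} + \frac{n}{\sqrt{n}} = 2\sqrt{n}
\end{align*}
```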

What? sqrt(n) is still not fast enough?

So how can we do better? The answer is to add more linked lists. Let's look at the search cost with three linked lists. Intuition says it is 3 * n^(1/3).
In fact, it can be proved that with k linked lists the search cost is k * n^(1/k).
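To get a feel for how quickly the bound k * n^(1/k) shrinks as lists are added, the small sketch below evaluates it for a few values of k. The function name `search_cost` and the sample size n = 2^20 are chosen only for illustration.

```python
def search_cost(n, k):
    """Upper bound k * n^(1/k) on search steps with k evenly spaced sorted lists."""
    return k * n ** (1.0 / k)


if __name__ == "__main__":
    n = 2 ** 20  # about one million elements
    for k in (1, 2, 3, 4, 20):
        print(f"k = {k:2d}: about {search_cost(n, k):.1f} steps")
```

For n = 2^20, one list costs about a million steps, two lists cost 2 * 1024 = 2048, and 20 lists cost about 40.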

For a given n, how large should k be? lg n! Let's see what happens when k = lg n: the cost is lg n * n^(1/lg n) = lg n * n^(log_n 2). Remember the change-of-base formula used when computing the time complexity of recurrences? It gives 1/lg n = log_n 2, and n^(log_n 2) = 2^(log_n n) = 2^1 = 2, so the overall search time is 2 * lg n, which is very good performance.
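Written out step by step, with lg denoting the base-2 logarithm, the calculation is:

```latex
\begin{align*}
\frac{1}{\lg n} &= \frac{1}{\log_2 n} = \log_n 2 \\
n^{1/\lg n} &= n^{\log_n 2} = 2 \\
k \cdot n^{1/k}\,\Big|_{k = \lg n} &= \lg n \cdot 2 = 2 \lg n
\end{align*}
```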

Such a skip list is called an ideal skip list. Each level holds half as many nodes as the level below it, giving lg n levels in total. A search starts in the top-level list and, whenever it cannot go further on the current level, drops down one level. It visits at most lg n levels and examines at most 2 elements on each level, so the search cost is at most 2 lg n, that is, O(lg n).
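The ideal structure can be sketched for a static, already sorted set of keys. To keep the example short, each level is a plain Python list rather than a linked list, and the element at index j of one level corresponds to index 2j of the level below; `build_ideal_levels` and `search` are names invented for this sketch.

```python
def build_ideal_levels(sorted_keys):
    """levels[0] holds every key; each higher level keeps every second key."""
    levels = [list(sorted_keys)]
    while len(levels[-1]) > 1:
        levels.append(levels[-1][::2])
    return levels


def search(levels, target):
    """Return (found, moves), starting at the top level and dropping down."""
    if not levels[0] or target < levels[0][0]:
        return False, 0
    pos = 0
    moves = 0
    for depth in range(len(levels) - 1, -1, -1):
        level = levels[depth]
        # Move right while the next key does not overshoot the target.
        while pos + 1 < len(level) and level[pos + 1] <= target:
            pos += 1
            moves += 1
        if depth > 0:
            pos *= 2          # same key, one level down
            moves += 1
    return levels[0][pos] == target, moves
```

Because every level halves the one below it, the search moves right at most once per level, so 1024 keys need at most about 2 * 10 = 20 moves.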


So the question is: how do we dynamically maintain such a skip list?

First, let's look at deletion. Search for the element starting from the top-level list, then remove its node from every list that contains it. This is relatively simple. What about insertion?


To insert x, first search for x's position in the bottom list and insert the element there. Are we done? No: after several nodes are inserted consecutively into the same segment, that segment becomes very long, and the balanced structure of the whole skip list is broken. How can we maintain the structure of the ideal skip list?

1. Maintain the ideal distance between fast-list nodes. If a segment grows too long, split it in the middle and promote the midpoint node to the level above.

This method is intuitive but hard to implement, because the length of every segment must be tracked at all times.


2. Use our favorite tool, randomization. After inserting a node, flip a coin: if it lands heads, promote the node one level (that is, also add it to the list one level up), then flip again to see whether it is promoted further, and stop at the first tails. This works because adjacent lists in the ideal structure differ in length by a factor of 2 and a fair coin lands heads with probability 50%. It turns out that this approach is feasible. It is worth mentioning that in this lecture the teacher handed out two coins to the students, one used to generate the random numbers to insert and the other to decide whether the newly inserted node should be promoted, and ran the experiment right there in the classroom. The atmosphere was quite good, and it gave the students an intuitive feel for the algorithm. This teaching method is worth learning from.
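The coin-flipping rule can be sketched as a small function. The name `random_level` and the `max_level` cap are assumptions added for this example; the cap only keeps the loop bounded.

```python
import random


def random_level(rng, max_level=32):
    """Flip a fair coin until tails; the number of flips is the node's level count."""
    level = 1                       # every node is in the bottom list
    while level < max_level and rng.random() < 0.5:
        level += 1                  # heads: promote one more level
    return level


if __name__ == "__main__":
    rng = random.Random()
    sample = [random_level(rng) for _ in range(10000)]
    for lvl in range(1, 6):
        share = sum(1 for s in sample if s >= lvl) / len(sample)
        print(f"share of nodes reaching level {lvl}: {share:.3f}")
```

Roughly half of the nodes reach level 2, a quarter reach level 3, and so on, which mirrors the halving of the ideal skip list in expectation.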


Note: there is a special case to consider. If the inserted element is the new minimum and it is not promoted, the upper-level lists no longer start at the first element, which breaks the structure the search relies on. The fix is to put a negative-infinity sentinel at the head of every list. Then even when a new minimum is inserted, every list still starts with negative infinity, so a search can always start from the leftmost position on any level.
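Putting the pieces together, here is a minimal sketch of a randomized skip list with a negative-infinity head sentinel. It is one common way to code the idea, not necessarily the lecture's own presentation: the class names, the `forward` array per node, the `MAX_LEVEL` cap, and the choice to reject duplicate keys are all assumptions of this example.

```python
import random


class SkipNode:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height   # forward[i] = next node on level i


class SkipList:
    MAX_LEVEL = 32                       # assumed cap on the number of levels

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        # The head acts as the -infinity sentinel and is present on every level.
        self.head = SkipNode(float("-inf"), self.MAX_LEVEL)
        self.level = 1                   # number of levels currently in use

    def _random_height(self):
        h = 1
        while h < self.MAX_LEVEL and self._rng.random() < 0.5:
            h += 1                       # heads: promote one more level
        return h

    def _predecessors(self, key):
        """For each level, find the last node whose key is smaller than `key`."""
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def search(self, key):
        node = self._predecessors(key)[0].forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return False                 # duplicates are rejected in this sketch
        height = self._random_height()
        self.level = max(self.level, height)
        node = SkipNode(key, height)
        for i in range(height):          # splice the node into each of its levels
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        return True

    def delete(self, key):
        update = self._predecessors(key)
        target = update[0].forward[0]
        if target is None or target.key != key:
            return False
        for i in range(len(target.forward)):   # unlink from every level it is on
            update[i].forward[i] = target.forward[i]
        while self.level > 1 and self.head.forward[self.level - 1] is None:
            self.level -= 1              # drop levels that became empty
        return True

    def keys(self):
        out, node = [], self.head.forward[0]
        while node is not None:
            out.append(node.key)
            node = node.forward[0]
        return out
```

Because the head sentinel sits on every level, inserting a new minimum needs no special handling: its predecessor on every level is simply the head.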

The classroom experiment showed that method 2 produces a good skip list, and not merely on average: it does so in the vast majority of cases.

It can be proved that a good skip list is obtained with probability P >= 1 - O(1/n^a), where a is a constant parameter that can be chosen to be any value >= 1 (the constant hidden in the O(lg n) search bound depends on a). The teacher spent the last 20 minutes of the class on the proof; the details are skipped here (to be honest, I did not follow the proof ~~).
