Data Structures and Algorithm Analysis (abridged to 5000 words)


OK, this is the second of Matrix67's posts that I'm reposting. Why? Same reason as the previous one: going over old knowledge teaches you new things, and it's worth seeing what a master has written.

As ever: the pioneers were mighty; may those who follow press on.
==========================================

I bought the book on April 7 and only finished it a few days ago. That says a lot: my coursework is very tight and I had little time; the book itself is very good and very absorbing; and I read it slowly, carefully, patiently.
I plan to write up, roughly, what's in the book.
Data Structures and Algorithm Analysis in C, Second Edition, China Machine Press. The cover is ugly: a black background with some marble patterns, with a shrunken copy of the original cover stuck in the middle, which is also ugly. Twelve chapters in all, nearly 400 pages.
Four hundred pages is clearly too many. One must "read a thick book thin": turn a chapter into a page, a page into a line, a line into a word. So here I have to squeeze the whole book into a limited number of words.

Algorithm analysis is all about complexity, and complexity cares only about the "worst offender". For example, next to a step that takes n^2, tacking on a quicksort drags nothing down at all; once an n^2 term is there, an extra nlogn gets thrown out without a second thought. The book defines complexity rigorously, with four symbols: O(), o(), Θ(), and Ω(). Roughly speaking, O(n^2) says the running time is at most on the order of n^2; o(n^2) says the ceiling is strictly below n^2, shorter than n^2 (Shellsort, for instance, is o(n^2), because it can never actually reach n^2); Ω(n^2) says the algorithm requires at least n^2 — for example, every comparison-based sort is Ω(nlogn); and Θ(n^2) means both O(n^2) and Ω(n^2): wedged between ceiling and floor, it is exactly that. The classic example here is the maximum subsequence sum problem (find the largest sum over a consecutive run of numbers in a sequence), which the book solves with four algorithms of complexity O(n^3), O(n^2), O(nlogn), and O(n). A feature of this book is its rigorous proof of the complexity of every kind of data structure — often the biggest headache.
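The fourth, linear-time algorithm for the maximum subsequence sum is short enough to sketch here (in Python rather than the book's C; the function name is my own):

```python
def max_subsequence_sum(seq):
    """O(n) maximum subsequence sum: scan once, keeping a running sum."""
    best = this = 0
    for x in seq:
        this += x
        if this > best:
            best = this       # a new best consecutive run ends here
        elif this < 0:
            this = 0          # a negative prefix can never help; drop it
    return best
```

As in the book, the empty subsequence counts, so an all-negative input gives 0.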

Lists, stacks, and queues are the three basic data structures. Bluntly: a list lines the data up so the whole team can be searched. A stack is a bucket: what goes in first comes out last, and whatever is underneath cannot come out until everything above it has — just as someone who has seen something revolting cannot throw up breakfast before today's lunch. Stacks are used to simulate nested procedure calls (such as recursion); a concrete application is expression evaluation. A queue is like a traffic jam: first in, first out — whoever joined the line first buys the ticket first. Queues are often used to implement breadth-first search.
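The expression-evaluation use of a stack can be sketched in a few lines — here for postfix expressions, in Python (the token format and function name are my own illustration):

```python
def eval_postfix(tokens):
    """Evaluate a postfix expression such as ['3', '4', '+', '2', '*']."""
    stack = []
    ops = {'+': lambda a, b: a + b,
           '-': lambda a, b: a - b,
           '*': lambda a, b: a * b,
           '/': lambda a, b: a / b}
    for t in tokens:
        if t in ops:
            b = stack.pop()          # last in, first out:
            a = stack.pop()          # the right operand comes off first
            stack.append(ops[t](a, b))
        else:
            stack.append(float(t))   # operands wait on the stack
    return stack.pop()
```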

A tree is a plant with many branches. The difference is that the tree here hangs upside down: the branches grow downward. At the top is the root, at the bottom are the leaves; the root begets sons, and the sons beget sons of their own. Root, leaf, or any father or son in between — each is called a node. We usually store data on the nodes and then keep inserting, changing, and deleting it.
A binary tree is one in which every node forks into at most two branches, with left and right clearly distinguished. A binary search tree stores data on the nodes so that everything on the left is smaller than its father and everything on the right is larger. From then on a search only needs to walk down one side of the tree, halving the space each time. In a binary search tree you can insert a number; delete one (if the node has two sons, replace it with the smallest node in its right subtree); find the minimum (keep going left); and find the maximum (keep going right). But the tree easily grows deformed: if the left side balloons while the right side shrivels, walking down the left takes forever. We need a way to keep the two sides of the tree roughly the same size while the left values stay smaller than the right. Such methods have indeed been found — not one but a truckload: AVL trees, Splay trees, red-black trees, Treaps, and so on. Every one of them relies on a trick called "rotation": turning a few nodes so that part of the left side moves over to the right. See the figure below.

      ①                      ②
     / \      rotate        / \
    ②   ZZ    ---->       XX   ①
   / \                        / \
  XX  YY                    YY   ZZ

After this there is one level fewer on the left: if XX held a lot of nodes, the whole lot rises a level, and the situation rebalances. If the right side is heavier, do the mirror image. This is only the simplest "single rotation"; there are other, fancier rotations. A Splay tree rotates whichever node was just accessed all the way to the top; a Treap attaches a random number to every node and rotates so that, at all times, each son's random number is larger than its father's; the rest are somewhat more complicated. All of these methods keep a binary search tree relatively balanced and prevent the time wasted by deformity.
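The single rotation in the figure can be written directly from the picture — a minimal sketch in Python (class and function names are mine):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(n1):
    """Single right rotation: lift n1's left child n2 to the top.
    BST order survives because XX < n2 < YY < n1 < ZZ."""
    n2 = n1.left
    n1.left = n2.right   # YY crosses over to become n1's left subtree
    n2.right = n1        # the old root drops down to the right
    return n2            # n2 is the new root of this subtree
```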
A B-tree differs from a binary search tree in two ways. First, interior nodes store no data: the data all lives on the leaves. Second, it is not necessarily binary, although smaller data still goes to the left for easy searching; instead, the number of sons per node is capped — cap it at two or three and you get the 2-3 tree. Because only the leaves hold data, we can use splitting, recursively if need be, to handle a node that ends up with more sons than allowed after an insertion: in a 2-3 tree, a node that reaches four sons splits into two nodes of two. We also stipulate that, apart from the root, every node has at least half the maximum number of sons. It is easy to see that deletion can undo the splitting done during insertion: when a node is down to one son, merge it with a neighbor.

A hash table (also called a scatter table) is generally used to detect duplicates. For example, to check whether two people in my class share a birthday, we needn't compare everyone pairwise. Instead we put up a notebook and have everyone, one by one, draw a circle on his own birthday; if someone about to draw a circle finds a circle already there, we have found a pair. It's that simple.
I actually ran this experiment in my class one day, and found a girl who was born on the same day as I was.
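The notebook trick, sketched in Python (a Python `set` is a hash table under the hood; the function name is mine):

```python
def first_duplicate(birthdays):
    """Return the first birthday seen twice, or None if all are distinct."""
    seen = set()              # the notebook of circled dates
    for day in birthdays:
        if day in seen:
            return day        # someone already drew a circle here
        seen.add(day)
    return None
```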

A heap is all about the top. A pile of stuff is not a heap; it is only a heap if the smallest sits on top. A heap is a binary tree that always keeps what is below bigger than what is above. Compared with a binary search tree it is both better and worse. Better: if you want the minimum of the data, you don't need to look for it — it is the one on top. Worse: apart from that, a heap can do essentially nothing else; except for the top element, you have almost no control over the rest. The basic operations still work, of course. To insert, park the new datum in the bottommost free position, then compare and swap it upward until it can rise no further. Deletion is the reverse: an element sinks to the bottom through repeated swaps. Since it is going down, we must decide whether it goes left or right — and to preserve the heap order, we must swap it with the smaller son. As said above, because you can only "see" the top and know nothing about the middle, we normally delete only the smallest (top) node. In fact the heap has one more great advantage: the code is easy to write. We can deliberately require the data to "fill up" the tree row by row — this is called a complete binary tree — and then number the nodes top to bottom, left to right, with the root as 1. To find a node's left son, take 2i; for the right son, 2i + 1; for its father, i div 2. Calling on anyone is then very convenient, and the whole tree can be implemented in a single array.
Because a heap is basically good only for finding the minimum, a problem with more complicated demands is better served by a binary search tree. But if a problem needs exactly three operations — insert, delete-min, and find-min — choose the heap without hesitation: finding the minimum is far easier, and the code is easy to write. When does such a problem arise? Suppose my girlfriends form a queue, and each time I pick the most innocent one, the one with the fewest bad influences. Each time I meet a new beauty, I slot her into her proper place in the queue for my future entertainment. I only ever care about the minimum, with an insert or delete now and then — exactly the queue a heap optimizes. Hence the heap's more vivid name: the priority queue. And what if a problem asks for the maximum instead of the minimum? Only a fool would be stuck: flip the heap upside down — keep the largest on top — and you are done.
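The array implementation with the 2i / 2i+1 / i div 2 arithmetic, sketched in Python (class and method names are mine; slot 0 is left unused so the indexing works out):

```python
class MinHeap:
    """Binary min-heap in an array: index 1 is the root,
    the sons of i sit at 2*i and 2*i+1, the father at i // 2."""
    def __init__(self):
        self.a = [None]                       # slot 0 unused

    def insert(self, x):
        self.a.append(x)                      # park x in the bottom position
        i = len(self.a) - 1
        while i > 1 and x < self.a[i // 2]:   # percolate up past larger fathers
            self.a[i] = self.a[i // 2]
            i //= 2
        self.a[i] = x

    def delete_min(self):
        top = self.a[1]                       # the minimum is always on top
        last = self.a.pop()
        n = len(self.a) - 1
        if n:
            i = 1
            while 2 * i <= n:                 # percolate the last element down
                c = 2 * i
                if c < n and self.a[c + 1] < self.a[c]:
                    c += 1                    # always take the smaller son
                if self.a[c] < last:
                    self.a[i] = self.a[c]
                    i = c
                else:
                    break
            self.a[i] = last
        return top
```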

The most troublesome part of studying heaps is the merge: how do we combine two heaps into one? Solving this buys a lot, since all the operations above reduce to it: insertion is merging with a single-node heap, and delete-min is throwing away the root and merging the root's left and right sides (obviously both heaps themselves). A simple method: recursively merge the heap with the larger root into the right subtree of the heap with the smaller root, the new heap replacing the original right son. Note that which root is smaller — and hence who merges into whom — keeps changing as the recursion proceeds. The result is a classic "right-leaning" problem that wrecks the neat shape of a complete binary tree. To fight it, we want the rightmost path of the heap to be as short as possible at all times. That means we can no longer simply use the array representation and must write a few more lines of code; but there is no degradation worry of the binary search tree's "one side grows without bound" kind, because with a heap I care only about the top — imbalance below doesn't matter, as long as it doesn't slow my merges. So we think up a rule that lets the heap skew as far left as it likes: for every node, the left son must be no closer than the right son to the nearest missing child. The rule looks like a nuisance, but it is genuinely effective: the rightmost path comes out much shorter than you would expect. This is called the leftist heap (leftist tree). Merging must now preserve leftist-ness at all times, but the fix is simple: merging is recursive, isn't it? After each level of the recursion returns, compare the two sons' distances to their nearest missing child, and if the right son's is larger, swap the sons.
Since we are no longer implementing this with an array, a linked structure is easy to build. This swap-the-sons adjustment also suggests an idea: why bother tracking each node's distance to a missing child at all? Since every merge goes down the right side, why not simply swap left and right after every merge, unconditionally? The idea works, and it has its own name: the skew heap.
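The whole skew heap fits in a dozen lines — a sketch in Python (names are mine):

```python
class SNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def merge(a, b):
    """Skew-heap merge: recursively merge the heap with the larger root
    into the right son of the smaller, then unconditionally swap sons."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a                        # a keeps the smaller root
    a.right = merge(a.right, b)            # always recurse down the right
    a.left, a.right = a.right, a.left      # the unconditional swap
    return a

def insert(heap, key):
    return merge(heap, SNode(key))         # insert = merge with a 1-node heap

def delete_min(heap):
    """Drop the root; merge its two sons. Returns (minimum, new heap)."""
    return heap.key, merge(heap.left, heap.right)
```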

The binomial queue is stronger still. It too is a heap and can also be merged, but it steps outside the realm of a single heap: it is not one heap but a whole household of heaps. That means you can no longer read off the minimum at a glance; you must check the top of every heap in the queue. Merging two binomial queues is likewise brutally direct: stack the tree with the larger root straight under the root of the tree with the smaller root. This means the trees in a binomial queue need not be binary, which raises the programming difficulty, but a trick called "left son, right brother" solves it: a multiway tree can still be represented by a binary tree. Draw the tree, then decree that a node's left child is its leftmost son on the next level, and its right child is its nearest brother to the right. In other words, the left child is a true son, but the right child was born of the same mother. To keep the trees looking good — their number and sizes within bounds that can be operated on quickly — the binomial queue makes a wise rule: each tree's size (total node count) may only be 1, 2, 4, 8, 16, ..., and there may be at most one tree of each size. Since distinct powers of two can represent any positive integer, the rule can be satisfied whatever the queue's total size. It is also easy to maintain: whenever two trees of the same size appear, merge them into one tree of double the size. And because only equal-sized trees ever merge, every tree in a binomial queue has a marvelously regular shape. Look at the tree of size 16 in the figure near the end of this article; stare at it a while and you will see the pattern. The figure below it shows the same tree drawn with the "left son, right brother" method: an edge going straight down is a left child, and an edge going down to the right is a right child.

Finally, a brief word on the Fibonacci heap. We can still operate on data buried inside a heap by keeping an auxiliary array that records each node's current position as it moves about; deleting a node or changing its value then becomes acceptable, though it costs some time. The Fibonacci heap is rather more liberated — even more so than the binomial queue. It can decrease the value of a node in essentially no time at all. Its attitude is: you're allowed to be a whole household of heaps over there, so why not me? So rather than laboriously floating a decreased node upward bit by bit, it is too lazy for that: it simply cuts the node off and makes it the root of a brand-new heap. Whenever the minimum is asked for, the heaps are merged back together one by one, much as in a binomial queue (though without the size rule). Of course, the method has a condition: values may only decrease. And when do we need a heap structure whose keys only ever decrease and never increase? None other than graph algorithms like Dijkstra's. That is why such graph algorithms can be accelerated a step further by swapping in a Fibonacci heap.

A man with a woman is happy. Actually, that's one-sided: it should be that a man with more than one woman is happier. But that would ruin my character the moment a girl found out, and women love to talk — secrets travel fast among them. So I planned to befriend two lovely women at different times. Later I realized even this wouldn't work. Women are invincible: if A plays well with B, and B plays well with C, then news flows between A and C too. A single friendship is enough to link two whole circles together. I had to change strategy again, making sure there was no channel at all between my girlfriends through which information could pass. That is, even though A and C above have no direct contact, I cannot see both A and C at once. Before long, what I wanted to know was precisely this: can two given women pass information to each other through a "chain of friends"? This is the so-called equivalence relation — essentially, determining the connectivity of an undirected graph. Picture a pile of sets: at each step we pick two and merge them into one, and at any moment we want to know whether two given elements now lie in the same set. What to do? One day I noticed that girls like playing the who's-related-to-whom game (so-and-so is so-and-so's what, who is whose what...). It dawned on me that my problem could be solved with a tree structure. Relatives come and go, but one thing never changes: all relatives in a clan share the same ultimate ancestor. So, to merge two sets, you only need to make the root of one set the son of some element of the other; this shuffle of the family tree gives every element of the first set the same ancestral root as the second, and that common root becomes the "badge" of all those elements. The idea was inspired by the world of women — so women are useful after all.
This is called the union-find set, also known as "disjoint sets". It can merge two sets and query whether two elements lie in the same set. There is a very effective optimization: during the recursive search, turn every generation passed along the way directly into a son of the root. Such a program takes only a couple of lines:
function find_set(x:integer):integer;
begin
  if x<>p[x] then p[x]:=find_set(p[x]);  { path compression }
  exit(p[x]);
end;
Here p[x] holds the father of element x. Initially p[x] equals x itself, meaning each element is a set of its own. find_set(x) returns the root of the set (tree) containing x.
The union-find set also admits other optimizations, and some intricate efficiency analysis.
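For completeness, the same two-line find plus the merge it supports, sketched in Python (the `union` helper and the rank-free merging are my own simplification; a fixed universe of 10 elements is assumed):

```python
# p[x] is the father of element x; initially every element is its own root.
p = list(range(10))

def find_set(x):
    """Find the root of x's set, flattening the path as we go."""
    if p[x] != x:
        p[x] = find_set(p[x])     # path compression
    return p[x]

def union(x, y):
    """Merge two sets: hang the root of one under the root of the other."""
    p[find_set(x)] = find_set(y)
```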

With that, the main data-structure chapters of Data Structures and Algorithm Analysis have been laid out. The account above reorders the chapters of the original book and skips the smaller points, so let me pick up a few of the things left out.
Some problems demand several properties from a tree at once. A simple question, for example: what if we want a heap from which we can find both the smallest element and the largest? Here a special trick applies: let the odd levels of the tree satisfy one property and the even levels another. The structure that solves the question above this way is called the min-max heap: data on the even levels is smaller than its father's father, and data on the odd levels is larger than its father's father. By a similar device we can design a binary search tree for data carrying two different kinds of key: the odd levels discriminate on one key, the even levels on the other. This makes it convenient to find all data lying simultaneously within specified ranges of the two keys. Such a binary search tree is called a 2-d tree; extending it gives the k-d tree. The concrete implementations of these structures are not given here — the book, too, introduces them only in the exercises.
Chapter 7 of the book spends nearly fifty pages presenting and analyzing the various sorting algorithms, and Section 7.11 spends ten more on external sorting. External sorting asks how to quickly sort a large file that cannot be read into memory all at once. Many sorting algorithms are feasible only because they can read or write any position at will; with a huge file we cannot do things like "swap the 1,234,567,890th element with the 123rd", nor run recursions over it. All we can do is scan from beginning to end, "rewinding" like a tape — and because the file dwarfs memory, we must read and discard as we go. Hence external sorting. Do not assume the restriction makes sorting slow: external sorting in fact breaks the O(n^2) barrier. It borrows the "merge two sorted arrays" idea from mergesort, because that operation can be performed while reading. First split the file into two files, then process each into sorted runs of equal length (the run size depends on how much memory can be sorted at a time); then read off the runs of the two files pairwise and merge them. Each pass doubles the length of the sorted runs, so after a few passes a run is as long as the whole file. Note that each pass must prepare for the next (that is, the merged output must again be two files of runs); a good way is to write the merged runs alternately into two fresh files.
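The core step — merging two sorted runs while holding only one element of each in memory — can be sketched in Python with iterators standing in for file streams (names and the callback interface are mine):

```python
def merge_runs(run_a, run_b, write):
    """Merge two sorted streams, reading one element at a time from each
    and writing out the smaller, so only O(1) of each run is in memory."""
    ita, itb = iter(run_a), iter(run_b)
    a, b = next(ita, None), next(itb, None)   # None marks an exhausted run
    while a is not None and b is not None:
        if a <= b:
            write(a)
            a = next(ita, None)
        else:
            write(b)
            b = next(itb, None)
    while a is not None:                      # drain whichever run remains
        write(a)
        a = next(ita, None)
    while b is not None:
        write(b)
        b = next(itb, None)
```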
Chapter 9 presents graph algorithms: graph traversal (breadth-first and depth-first search), AOV and AOE networks, Dijkstra, network flow, Prim, Kruskal, and NP-completeness. I learned two new things: finding cut vertices in linear time (vertices whose removal disconnects the graph) and strongly connected components (subgraphs of a directed graph in which any two vertices can reach each other). Later I found these in the black book as well, and since they are not easy to explain briefly, I won't go into them here. Speaking of the black book, let me add: the black book really is unreadable — too many errors. This is no knock on LRJ. LRJ has his own thoughts and real experience with the genuinely big problems, but he is muddled on many details, which does not help beginners absorb the material. If you don't believe me, one day I will write a post cataloguing the black book's mistakes. To borrow the classic line our politics textbooks aim at the "theory of innate human selfishness": it is "wrong from theory through to practice".
Chapter 10 presents "algorithm design techniques": greedy methods, divide and conquer, dynamic programming, backtracking, randomization, and so on. Scheduling problems, Huffman trees, the bin-packing approximation algorithms, the divide-and-conquer closest-pair algorithm, optimal binary search trees, Floyd-Warshall, skip lists, the Miller-Rabin primality test, and game-playing algorithms are all covered in this chapter — and covered quite well. Since this is not the book's focus, I won't discuss it here.
Chapter 11 is devoted entirely to amortized analysis. It is a deep subject and a powerful tool for analyzing time complexity. What it delivers is not the complexity of one specific operation but the average complexity of an operation repeated many times. Studying it is necessary because we meet data structures — ones that degrade and "slow down as they go", and self-adjusting ones — whose single operations do not reflect their true efficiency.

By now, everything in the book has been touched on. On the whole the book is well worth reading (though the translation is poor in places). Its theory is strong and its proofs complete (the complexity analyses are argued very clearly, satisfying those who insist on getting to the roots of things); the whole book is a self-contained system, with parts echoing one another; and the exercises are substantial, complementing the text. These, in fact, are virtues common to foreign textbooks. This is the first foreign textbook I have read through, and I will read more. These days I have been reading Concrete Mathematics (also from the same publisher); when I finish, I plan to write "an intuitive understanding of some of the content of Concrete Mathematics". Read a foreign textbook and you will find it different from domestic books, and profit more from it.

This article ends here. The plan was to abridge the book into 5,000 words; I did not expect to write more than 8,000. And it is not really an abridgement so much as a compact, systematic, plain, and pictorial record of my own thoughts and understanding. It may be useful to people who already know some of the material, but it is not suitable for those who have never touched data structures and algorithm analysis. If even one person takes one thing away from it, my purpose in writing is served.

(End)

Matrix67
