A review of union-find (disjoint sets) (reprint) && template

A little smugly, I solved my first such problem with a method entirely of my own invention, and it felt pleasantly efficient. Later, when I prepared to study the "standard" structure seriously, I found that what I had invented was exactly union-find.

Union-find (the union-find set) is also known as a disjoint-set data structure. As data structures go it is a very simple thing; simple to implement, that is, not necessarily simple to analyze. I still want to go over it today, because I suddenly discovered that the union-find code I wrote a few days ago was wrong, and the bug only surfaced today while solving a problem. So let me summarize this very simple data structure.

As its name reveals, union-find has two basic operations: union and find. find(x) returns the identifier of the set containing element x, and union(i, j) merges the set containing i with the set containing j; afterwards, find returns the same value for both.

In the implementation, a tree in parent-pointer (father) representation holds all elements of one set, with the set's representative at the root. Initially every node is its own parent. To find an element's set, follow parent pointers up to the root; to merge two trees, simply set the parent of one root to the other root. The main optimization is so-called path compression: instead of having find return U[x], have it return U[x] = find(U[x]). This reparents every node on the path from the current node to the root directly onto the root, "compressing" what may have been a long path. That is the recursive implementation; a non-recursive version walks the path twice, once to locate the root and a second time to rewrite the parent pointers.

The union operation is even simpler: just U[find(x)] = find(y). A couple of days ago, while writing Tarjan's algorithm, I wrote this line incorrectly as U[x] = find(y), and it still got AC... Sweat... Today, on another problem requiring union-find, I was not so lucky.
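As a minimal sketch of the two operations just described (the array name `U` and the fixed size are illustrative assumptions, not part of the original code):

```cpp
#include <cassert>

const int N = 8;
int U[N];

void init() {
    for (int i = 0; i < N; i++) U[i] = i;   // every element starts as its own root
}

int find(int x) {
    if (U[x] == x) return x;
    return U[x] = find(U[x]);               // path compression: reparent onto the root
}

// Correct union links root to root. Writing U[x] = find(y) instead, as the
// text warns, reparents a possibly non-root node and corrupts the forest.
void unite(int x, int y) {
    U[find(x)] = find(y);
}
```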

For a sequence of n operations with path compression, the amortized complexity is O(n α(n)), where α is the inverse of the Ackermann function. It grows extremely slowly and does not exceed 5 for any conceivable n, so it can be regarded as a constant.

The general purpose of union-find is to maintain the equivalence classes of a relation that is reflexive, symmetric, and transitive.

For example: given relationships between elements, we must divide the elements into several sets, with the elements in each set directly or indirectly connected. The main concerns in this type of problem are the merging and finding of sets, hence the name union-find.

Linked lists are sometimes used to implement union-find. Each element in the list carries two pointers: one to the next element of the same set, and one to the first element of the list.

Union-find with a linked storage structure

With linked storage, finding the set containing an element costs only O(1), but merging two sets costs O(n). If we want both basic operations to be efficient, the linked storage method is powerless.
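To make the trade-off concrete, here is a minimal sketch of the linked-list scheme described above; the array names `nxt` and `head` are assumptions for illustration:

```cpp
#include <cassert>

const int N = 6;
int nxt[N];   // next element in the same set, -1 at the tail
int head[N];  // first element of the set, used as the set's name

void init() {
    for (int i = 0; i < N; i++) { nxt[i] = -1; head[i] = i; }
}

int find(int x) { return head[x]; }   // O(1): just read the head pointer

// O(n) union: append list b to list a, then relabel every element of b.
void unite(int a, int b) {
    a = find(a); b = find(b);
    if (a == b) return;
    int tail = a;
    while (nxt[tail] != -1) tail = nxt[tail];        // walk to the end of list a
    nxt[tail] = b;
    for (int v = b; v != -1; v = nxt[v]) head[v] = a; // rewrite head pointers of b
}
```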

Union-find with a tree structure

A tree structure supports union-find in a way that meets our requirements. Unlike the usual tree representation, each vertex records not its children but its parent. There are two ways to implement union-find on a tree structure:

⑴ Query directly in the tree

⑵ Compress the path while querying ("path compression")

Compared with the linked storage structure above, the advantages of the tree structure are obvious: low programming complexity and high time efficiency.

Query directly in the tree

The merging algorithm is simple: just link the roots of the two trees, a step that takes only O(1) time. The overall efficiency therefore depends on how quickly the set containing an element can be found, and the cost of that search is linear in the depth of the tree. Direct queries thus take O(log n) time on average; but in the worst case the tree degenerates into a chain, making each query O(n).

Compress the path while querying ("path compression")

In fact we can further reduce the cost of finding a set with the "path compression" technique. The idea is simple: reduce the depth of the tree as a side effect of each lookup. With path compression, the amortized cost per operation involves α(n), the inverse of the Ackermann function, which grows so slowly that α(n) is at most 5 for any conceivable n.

Union-find (union-find sets) is a simple and versatile structure that maintains a number of disjoint sets. It supports fast merging of sets and fast queries for the set containing a given element, and it has many applications. Usually the sets are stored as trees, a rank array stores a lower bound on each tree's height, and path compression during find accelerates later lookups. With these optimizations the space complexity is O(n), creating a single-element set takes O(1), and n unions mixed with m finds take O(m α(n)), where α is the inverse of the Ackermann function. Over any conceivable range (the observable universe is estimated to contain about 10^80 atoms, less than the range in question) the value of α can be regarded as at most 4, so union-find operations can be treated as effectively linear. It supports the following three operations:

- Union(Root1, Root2) // union operation: merges the set rooted at Root2 into the set rooted at Root1. Requirement: Root1 and Root2 are disjoint, otherwise the operation is not performed.

- Find(x) // find operation: searches for the set containing element x and returns that set's name.

- UFSets(s) // constructor: initializes s elements as s singleton subsets.

- In union-find, each set is represented by a tree.

- The name of each element in a set is stored in a node of the tree; in addition, each tree node holds a pointer to its parent node.

- Let S1 = {0, 6, 7, 8}, S2 = {1, 4, 9}, S3 = {2, 3, 5}.

- To simplify the discussion, we ignore actual set names and identify each set by the root of the tree that represents it.

- For this purpose the sets are stored using the parent representation of trees. The elements are numbered 0 through n-1, where n is the maximum number of elements. In the parent representation, entry i of the array represents the tree node for collection element i. The parent entry of a root holds the number of elements in its set, stored as a negative number; the negative sign distinguishes the element count from parent pointers (which are ≥ 0).

The parent representation of S1, S2, and S3:

A possible representation of S1 ∪ S2:
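As a concrete sketch of this parent representation for S1 = {0, 6, 7, 8}, S2 = {1, 4, 9}, S3 = {2, 3, 5}; the choice of roots (0, 4, and 2, matching the later example parent[0] == -4, parent[4] == -3) is an assumption, since the text fixes the sets but not the tree shapes:

```cpp
#include <cassert>

int parent[10];

void build() {
    parent[0] = -4;                            // root of S1, which has 4 elements
    parent[6] = 0; parent[7] = 0; parent[8] = 0;
    parent[4] = -3;                            // root of S2, 3 elements
    parent[1] = 4; parent[9] = 4;
    parent[2] = -3;                            // root of S3, 3 elements
    parent[3] = 2; parent[5] = 2;
}

int find(int x) {              // climb until the entry is negative (a root)
    while (parent[x] >= 0) x = parent[x];
    return x;
}
```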

```cpp
const int defaultSize = 10;

class UFSets {                                // union-find sets
private:
    int *parent;
    int size;
public:
    UFSets(int s = defaultSize);
    ~UFSets() { delete [] parent; }
    UFSets& operator=(const UFSets& value);   // set assignment
    void Union(int Root1, int Root2);
    int Find(int x);
    void WeightedUnion(int Root1, int Root2);
};

UFSets::UFSets(int s) {                       // constructor
    size = s;
    parent = new int[size + 1];
    for (int i = 0; i <= size; i++)
        parent[i] = -1;                       // each element is a singleton set of size 1
}

int UFSets::Find(int x) {                     // find operation
    if (parent[x] < 0) return x;              // negative entry: x is a root
    else return Find(parent[x]);
}

void UFSets::Union(int Root1, int Root2) {    // union operation
    parent[Root2] = Root1;                    // Root2 now points to Root1
}
```

These Find and Union operations do not perform well. Suppose the first n elements initially form a forest of n single-node trees, each with parent[i] = -1. Performing Union(0, 1), Union(1, 2), ..., Union(n-2, n-1) produces a degenerate tree, as shown in the figure.
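The degenerate behaviour can be reproduced in a few standalone lines (the names here are illustrative, independent of the class above):

```cpp
#include <cassert>

const int N = 8;
int parent[N];
int steps;          // counts parent links followed by the most recent find

void make_sets() { for (int i = 0; i < N; i++) parent[i] = i; }

void naive_union(int r1, int r2) { parent[r2] = r1; }

int find(int x) {
    steps = 0;
    while (parent[x] != x) { x = parent[x]; steps++; }
    return x;
}

// Union(0,1), Union(1,2), ..., Union(n-2,n-1) builds a chain rooted at 0,
// so find(i) must walk i links.
void demo() {
    make_sets();
    for (int i = 0; i + 1 < N; i++) naive_union(i, i + 1);
}
```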

Each Union takes O(1) time, so the n - 1 unions take O(n). But if we then execute Find(0), Find(1), ..., Find(n-1) on the degenerate tree, finding element i takes O(i) time, so completing the n finds takes a total of O(0 + 1 + ... + (n-1)) = O(n²).

The weighted rule for the Union operation

To avoid degenerate trees, the improved method first compares the number of elements in the two sets. If the tree rooted at i has fewer nodes than the tree rooted at j, i.e. parent[i] > parent[j] (the counts are stored as negative numbers), then j becomes the parent of i; otherwise i becomes the parent of j. This is the weighted rule for Union.

For example: parent[0] (== -4) < parent[4] (== -3), so the tree rooted at 0 is the larger one and 0 becomes the new root.

```cpp
void UFSets::WeightedUnion(int Root1, int Root2) {
    // improved union based on the weighted rule
    int temp = parent[Root1] + parent[Root2]; // combined element count (negative)
    if (parent[Root2] < parent[Root1]) {      // Root2's tree has more nodes
        parent[Root1] = Root2;                // Root1 points to Root2
        parent[Root2] = temp;
    } else {                                  // Root1's tree has at least as many nodes
        parent[Root2] = Root1;                // Root2 points to Root1
        parent[Root1] = temp;
    }
}
```

Tree obtained by using weighted rules

Example problem: Relatives (relation)

[Problem description] When a family is very large, judging whether two people are relatives is not easy. Given a diagram of kinship relations, answer whether any two given people are relatives.

Rule: if x and y are relatives and y and z are relatives, then x and z are also relatives; and if x and y are relatives, then every relative of x is a relative of y, and every relative of y is a relative of x. (Number of people ≤ 5000, kinship pairs ≤ 5000, queries ≤ 5000.)

[Algorithm analysis]

1. Algorithm 1: build a graph-theoretic model.

Use an n×n two-dimensional array to describe the graph above, recording the relations between points. Judging whether two elements are "relatives" then reduces to the general problem of determining whether two given vertices are connected.

But implementing this algorithm runs into two difficulties:

(1) Space: it needs n² cells, and n can be up to 5000.

(2) Time: each connectivity test requires O(n) processing.

The algorithm is clearly not ideal.

Before optimizing the graph-theoretic approach further, let us see how union-find performs here.

2. Algorithm 2: straightforward union-find.

If we view each connected block as a set, the question becomes: do two elements belong to the same set?

Suppose that at first every element belongs to its own set. Each time an edge a-b is added to the graph, it is equivalent to merging the set A containing a and the set B containing b, since through the edge a-b every element of A can now reach every element of B, and vice versa.

Of course, if a and b already belong to the same set, the edge a-b need not be added at all.

(1) Concrete operations:

① the root of the tree containing an element represents the set containing that element;

② to judge whether two elements belong to the same set, just check whether the roots of their trees are the same;

③ to merge two sets, we only need to add an edge between the two root nodes.

(2) Diagram of merging elements:

(3) Judging whether elements belong to the same set:

Use father[i] to denote the father node of element i; for the figure above:

father[1] := 1; father[2] := 1; father[3] := 1; father[4] := 5; father[5] := 3

At this point the space problem is solved: we no longer need n² cells to record the structure of the whole graph, only a single array recording the set each node belongs to.

On reflection, however, it is not hard to see that each query of whether two elements belong to the same set may still need O(n) time.

3. Algorithm 3: union-find with path compression.

Algorithm 2 reaches an element's set by following father pointers upward; when the tree degenerates into a chain, judging whether two elements belong to the same set clearly takes O(n) time. This is where path compression comes in.

Path compression means that, after the root has been found, the recursion on its way back redirects the father pointer of every element on the path to the root.

That is, after we "merge 5 and 3", instead of leaving 5's father pointing at 3, a later lookup points it directly at the root node 1. With this, each operation runs in nearly O(1) amortized time.

Program listing

(1) Initialization:

for i := 1 to n do father[i] := i;

Since each element initially belongs to its own set, every element is taken as its own root node.

(2) Find the root's number, compressing the path along the way:

```pascal
function getfather(v: integer): integer;
begin
  if father[v] = v then exit(v);
  father[v] := getfather(father[v]);
  getfather := father[v];
end;
```

(3) Merging two sets:

```pascal
procedure merge(x, y: integer);
begin
  x := getfather(x);
  y := getfather(y);
  father[x] := y;
end;
```

(4) Judging whether two elements belong to the same set:

```pascal
function judge(x, y: integer): boolean;
begin
  x := getfather(x);
  y := getfather(y);
  if x = y then exit(true)
  else exit(false);
end;
```

This introductory problem has fully demonstrated the basic operations and uses of union-find.

**Third, the union-find algorithm**

From the analysis of the problem above, it is now very clear: the so-called union-find algorithm performs the following two operations on disjoint sets:

(1) finding which set an element belongs to;

(2) merging two sets.

The most commonly used data structure for union-find is the forest implementation: each tree in the forest represents one set, and the root identifies the set. The shape of each tree is unimportant in itself; what matters is which elements each tree contains.

1. The merge operation

To merge two sets S1 and S2, simply set the father of S1's root to S2's root (or the father of S2's root to S1's root).

Here is an optimization: make the tree of smaller depth a subtree of the deeper tree, so that lookups follow fewer links. This optimization is called heuristic merging. It can be proved that the depth of the resulting trees is O(log n); that is, among n elements, we are guaranteed to reach the target by following at most log n links.

"Proof" we merge a set with a node of I and a set with J nodes, we set i≤j, we add a followed pointer to a small set, but they are now in a set of i+j. Because:

1+log I=log (i+i) <=log (I+J);

So we can guarantee the nature.

Since the depth of the tree is O (logn) after using the heuristic merging algorithm, we can derive the following properties: Heuristic merging up to 2logn pointer can determine whether two things want to contact.

At the same time, we can also draw another nature: heuristic rapid merging of the resulting collection tree, its depth does not exceed, where n is the sum of the number of members contained in the set S of all the subsets.

"Proof" we can prove by inductive method:

When I=1, there is only one root node in the tree, that is, a depth of 1

and |log2 1|+1=1 so right.

Suppose I≤n-1 was established when trying to prove i=n.

Without loss of generality, it can be assumed that this tree is composed of a tree containing M (1≤M≤N/2) elements, the root of J, and a tree containing n-m elements, the root of the K-SK combination, and, the tree J merged into the tree K, the root is K.

(1) If, before the merge, the depth of subtree SJ is less than the depth of subtree SK:

the merged tree has the same depth as SK, which does not exceed

⌊log₂(n - m)⌋ + 1,

and evidently does not exceed ⌊log₂ n⌋ + 1;

(2) If, before the merge, the depth of subtree SJ is ≥ the depth of subtree SK:

the depth of the merged tree is the depth of SJ plus 1, i.e.:

(⌊log₂ m⌋ + 1) + 1 = ⌊log₂(2m)⌋ + 1 ≤ ⌊log₂ n⌋ + 1.

Summary: by the property above, a problem with m edges over n elements executes at most m log n steps. We added only a little extra code, yet improved the program's efficiency considerably. Much experience shows that heuristic merging solves such problems in close to linear time; more precisely, it is hard to find an algorithm with a clearly better running time.
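The depth bound proved above can be checked mechanically. This sketch uses the size-based variant of heuristic merging (the same quantity the proof counts); the array names are illustrative assumptions:

```cpp
#include <cassert>

const int N = 16;
int parent[N], sz[N];

void make_sets() {
    for (int i = 0; i < N; i++) { parent[i] = i; sz[i] = 1; }
}

int find(int x) { while (parent[x] != x) x = parent[x]; return x; }

void union_by_size(int a, int b) {
    a = find(a); b = find(b);
    if (a == b) return;
    if (sz[a] < sz[b]) { int t = a; a = b; b = t; } // a is the larger tree
    parent[b] = a;                                  // smaller tree hangs under larger
    sz[a] += sz[b];
}

int depth(int x) {                 // number of nodes on the path x -> root
    int d = 1;
    while (parent[x] != x) { x = parent[x]; d++; }
    return d;
}
```

After any sequence of merges over N = 16 elements, every depth should stay within ⌊log₂ 16⌋ + 1 = 5.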

2. The find operation

Finding an element u is also simple: follow the path from u up to the root; the root identifies the set containing u.

Here is another optimization: once the root v of u's tree has been found, set the father of every node on the path from u to v directly to v, which also reduces the cost of later lookups. This optimization is called path compression (compressing paths).

There are several ways to compress a path; the two most common are:

(1) Full path compression: a very simple but very effective method. During a find, point every node encountered on the path directly at the root.

(2) Path halving (compression by halving): make each node on the path point to its grandfather, skipping one node, so that a single pass halves the depth of the path. This is a little faster than full path compression, and the larger the data, the more obvious the difference.
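The two flavours just described can be sketched side by side; the chain-building helper and the names are illustrative assumptions:

```cpp
#include <cassert>

const int N = 8;
int parent[N];

void make_chain() {               // 7 -> 6 -> ... -> 1 -> 0
    parent[0] = 0;
    for (int i = 1; i < N; i++) parent[i] = i - 1;
}

int find_full(int x) {            // full path compression (recursive)
    if (parent[x] == x) return x;
    return parent[x] = find_full(parent[x]);
}

int find_halve(int x) {           // path halving (one iterative pass)
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];  // skip to the grandparent
        x = parent[x];
    }
    return x;
}
```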

Path compression reduces path depth, making accesses faster and faster, which is a good optimization. Once path compression is in use, depths change frequently, so we no longer use depth as the heuristic value for merging; instead we use a new number called the rank. A newly created set has rank 0; when two trees of equal rank are merged, one of them is chosen arbitrarily as the new root and its rank increases by 1; otherwise the tree with the larger rank becomes the new root and both ranks stay unchanged.
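A standalone sketch of this rank rule (array names are assumptions; `rnk` avoids clashing with reserved identifiers):

```cpp
#include <cassert>

const int N = 8;
int parent[N], rnk[N];

void make_set(int x) { parent[x] = x; rnk[x] = 0; }

int find_set(int x) {
    if (x != parent[x]) parent[x] = find_set(parent[x]);  // path compression
    return parent[x];
}

void union_by_rank(int x, int y) {
    x = find_set(x); y = find_set(y);
    if (x == y) return;
    if (rnk[x] > rnk[y]) parent[y] = x;   // larger rank becomes the root
    else {
        parent[x] = y;
        if (rnk[x] == rnk[y]) rnk[y]++;   // equal ranks: the new root's rank grows
    }
}
```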

3. Time complexity

The time complexity of m finds mixed with n - 1 unions is O(m α(n)) (with m ≥ n). α is an extremely slowly growing function, the inverse of the Ackermann function, and can be regarded as less than 5. So the running time of union-find can be considered almost linear.

From the above analysis we conclude: union-find applies wherever sets must be merged and queried, and it further extends to graph problems such as judging whether two elements belong to the same connected block. Thanks to heuristic merging and path compression, each operation takes close to O(1) amortized time with O(n) space, turning a large-scale problem into simple operations that use minimal space and run very fast.

Union-find template

1. Make_set(x): initialize each element as its own set, i.e. create a new set whose only element is x.
2. Union_set(x, y): merge the sets containing x and y, by rank.
3. Find_set(x): return the representative of the set containing x. During a find, follow parent pointers until the root of the tree is reached (note the direction of the arrows in the figure).
4. Standard code implementing union-find:
```c
#include <stdio.h>

#define MAXN 100        /* upper bound on the number of nodes */

int pa[MAXN];           /* pa[x] is the parent node of x */
int rank[MAXN];         /* rank[x] is an upper bound on the height of x */

void make_set(int x)    /* create a singleton set */
{
    pa[x] = x;
    rank[x] = 0;
}

int find_set(int x)     /* find with path compression */
{
    if (x != pa[x])
        pa[x] = find_set(pa[x]);
    return pa[x];
}

void union_set(int x, int y)   /* merge the sets of x and y by rank */
{
    x = find_set(x);
    y = find_set(y);
    if (rank[x] > rank[y])     /* the root with higher rank becomes the parent */
        pa[y] = x;
    else {
        pa[x] = y;
        if (rank[x] == rank[y])
            rank[y]++;
    }
}
```