Union-find Algorithm Introduction

Source: Internet
Author: User
Tags: data structures

Reprint: http://blog.csdn.net/dm_vincent/article/details/7655764

This article introduces an algorithm for a class of problems known as dynamic connectivity, built on a data structure called union-find.

For more detail, refer to Section 1.5 of the book Algorithms; this article is essentially a review of that section.

That text mostly states conclusions. Here I try to explain the reasoning that leads to them, that is, why one approach is chosen over the others. I think that is more useful than simply writing the conclusions down.


About Dynamic Connectivity

Let's look at a picture to see what dynamic connectivity is:


Suppose we are given a sequence of integer pairs, for example (4, 3), (3, 8), and so on, where each pair means that the two points/sites are connected. As pairs keep arriving, the connectivity of the whole graph changes, as the figure above shows. A pair whose points/sites are already connected, such as (8, 9) in the figure above, is simply ignored.


Application scenarios for dynamic connectivity:

Network connectivity:

If the two integers in each pair represent network nodes, the pair indicates that the two nodes need to be connected. Building the dynamic connectivity graph from all pairs keeps the required cabling to a minimum, because a pair whose nodes are already connected is ignored.

Variable name equivalence (similar to the concept of pointers):

In a program, several references can be declared that point to the same object. By building a dynamic connectivity graph over the declared references and the actual objects, we can determine which references point to the same object.

Modeling the problem:

When modeling a problem, we should be clear about exactly what needs to be solved, because the choice of data structure and algorithm depends on it. For dynamic connectivity, there are two possible problems:

Given two nodes, determine whether they are connected; if they are, no specific path needs to be given.

Given two nodes, determine whether they are connected; if they are, a specific path must be given.

The two problems differ only in whether a specific path is required, but that difference leads to a different choice of algorithm. This article covers the first case, where no path is needed and the union-find algorithm applies; the second case can be handled with a DFS-based algorithm.

Modeling ideas:

The simplest and most intuitive assumption is that all nodes that are connected to one another belong to one group, so nodes that are not connected must belong to different groups. As each pair arrives, we first need to determine whether its two nodes are already connected. How? Under the assumption above, we look up the group each node belongs to and compare the two groups: if they are the same, the nodes are connected; otherwise they are not. For simplicity, all nodes are represented as integers, that is, N nodes are represented by the integers 0 to N-1. Before any pair is processed, every node is isolated, so each belongs to its own group. An array can represent this relationship: the index is the integer that represents the node, and the value stored there is the node's group number. The array can be initialized as follows:

for (int i = 0; i < size; i++)
	id[i] = i;  


That is, the group number of node i is initially i.

After initialization, the following operations are possible on the dynamic connectivity graph:

Query the group a node belongs to

The value at the node's position in the array is its group number.

Determine whether two nodes belong to the same group

Get the group numbers of the two nodes and check whether they are equal.

Connect two nodes so that they belong to the same group

Get the group numbers of the two nodes. If they are the same, the operation ends; if they differ, the group number of one node's group is replaced by the group number of the other's.

Get the number of groups

The count is initialized to the number of nodes and is decremented by 1 after each successful connection of two nodes.


API

We can design the appropriate API:
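As a sketch, the operations described above could be collected into a Java interface like the one below. The interface name DynamicConnectivity and the default implementation of connected are assumptions made for illustration; the method names union, find, connected and count are the ones used by the implementations in this article.

```java
// Hypothetical interface name, chosen only for this sketch.
interface DynamicConnectivity {
    // Merge the groups containing p and q (does nothing if already connected).
    void union(int p, int q);

    // Return the group number of node p.
    int find(int p);

    // Return the number of groups that currently exist.
    int count();

    // Two nodes are connected exactly when their group numbers are equal.
    default boolean connected(int p, int q) {
        return find(p) == find(q);
    }
}
```

Writing connected in terms of find makes the dependency explicit: whatever cost find has, connected pays it twice.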




Note that nodes are represented by integers. If another data type is needed to represent nodes, such as strings, a hash table can map each string to the integer type required here.
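One possible way to do that mapping is sketched below with java.util.HashMap. The class name NodeIndex and its methods are hypothetical helpers, not part of the API discussed in this article; each distinct string simply receives the next unused integer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: gives every distinct name its own integer, starting at 0.
class NodeIndex {
    private final Map<String, Integer> ids = new HashMap<>();

    // Return the integer for this name, allocating a new one the first time it is seen.
    int idOf(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            id = ids.size();
            ids.put(name, id);
        }
        return id;
    }

    // Number of distinct names registered so far.
    int size() {
        return ids.size();
    }
}
```

The integers returned by idOf run from 0 to size() - 1, so they can be used directly as node numbers. Since the array-based implementations need the number of nodes up front, this sketch assumes all names are registered before the structure is created.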

Analyzing the API above, the connected and union methods both rely on find: connected calls find once on each of its two parameters, and union must check whether the two nodes are already connected before it actually merges anything, which again means two calls to find. So the find method should be implemented as efficiently as possible. This leads to the following quick-find implementation.


Quick-find algorithm:

public class UF
{
	private int[] id;     // id[i] = group (component) number of node i
	private int count;    // number of groups (components)

	public UF(int N)
	{
		// Initialize the component id array: every node starts in its own group.
		count = N;
		id = new int[N];
		for (int i = 0; i < N; i++)
			id[i] = i;
	}
	public int count()
	{ return count; }
	public boolean connected(int p, int q)
	{ return find(p) == find(q); }
	public int find(int p)
	{ return id[p]; }
	public void union(int p, int q)
	{
		// Get the group numbers of p and q.
		int pID = find(p);
		int qID = find(q);
		// If the two group numbers are equal, return directly.
		if (pID == qID) return;
		// Traverse once, changing group numbers so that both groups become one.
		for (int i = 0; i < id.length; i++)
			if (id[i] == pID) id[i] = qID;
		count--;
	}
}

For example, if the input pair is (5, 9), the find method first shows that the two nodes have different group numbers; union then makes one pass over the array and changes every group number 1 to 8. Changing 8 to 1 would work just as well, as long as the same rule is applied consistently.
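The step for the pair (5, 9) can be replayed on a bare array. The starting contents below are an assumption chosen to match the group numbers 1 and 8 mentioned in the text, and the class and method names are hypothetical; the relabeling loop is the quick-find rule itself.

```java
import java.util.Arrays;

class QuickFindStep {
    // One quick-find union step on a raw group-number array.
    // Returns true if two groups were actually merged.
    static boolean union(int[] id, int p, int q) {
        int pID = id[p];
        int qID = id[q];
        if (pID == qID) return false;        // already in the same group
        for (int i = 0; i < id.length; i++)
            if (id[i] == pID) id[i] = qID;   // relabel every member of p's group
        return true;
    }

    static String demo() {
        // Assumed starting state: two groups, numbered 1 and 8.
        int[] id = {1, 1, 1, 8, 8, 1, 1, 1, 8, 8};
        union(id, 5, 9);
        return Arrays.toString(id);          // "[8, 8, 8, 8, 8, 8, 8, 8, 8, 8]"
    }
}
```

Node 5 has group number 1 and node 9 has group number 8, so all six entries equal to 1 are rewritten to 8 in a single pass.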



The find method above is very efficient, because a node's group number is obtained with a single array read. The problem comes when a new path is added and group numbers have to be modified: there is no way to know in advance which nodes need the change, so the entire array must be traversed to find and update them one by one. Each added path therefore costs time linear in the number of nodes. If M paths are added and there are N nodes, the total time is proportional to MN, which is clearly quadratic. For large-scale data a quadratic algorithm is a problem. Here, every added path is a small change that ripples through the whole array. The key to solving this is to make the union method more efficient so that it no longer has to traverse the entire array.
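To make the MN estimate concrete, the sketch below counts how many array entries the relabeling loop inspects. The class name and the particular input sequence (0,1), (1,2), ..., (N-2,N-1) are assumptions picked so that every union really merges two groups, giving M = N-1 merges of N inspections each.

```java
class QuickFindCost {
    // Runs the unions (0,1), (1,2), ..., (n-2,n-1) with the quick-find rule and
    // returns how many array entries the relabeling loops inspected in total.
    static long entriesInspected(int n) {
        int[] id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
        long inspected = 0;
        for (int p = 0; p + 1 < n; p++) {
            int pID = id[p];
            int qID = id[p + 1];
            if (pID == qID) continue;          // never taken for this input
            for (int i = 0; i < n; i++) {
                inspected++;
                if (id[i] == pID) id[i] = qID;
            }
        }
        return inspected;
    }
}
```

For n = 1000 the method returns 999000, that is (n-1) * n, so making the input ten times larger makes the work about a hundred times larger.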

Quick-union algorithm:

Consider why the solution above causes this ripple effect. Each node records its own group number separately; the records are scattered, with nothing organizing them, so when a group number changes there is no option but to visit the nodes and modify them one by one. The question therefore becomes how to organize the nodes better. There are many ways to do so, but the most intuitive is to keep nodes with the same group number together. Which data structure can tie a set of nodes together? The common candidates are linked lists, graphs, trees, and so on. Which of them is the most efficient for both lookup and modification? The tree is the natural choice, so consider how to represent the relationship between nodes and groups as a tree.

The underlying data structure does not have to change; the array representation stays. Nodes are organized with parent links: the value of id[p] is the number of the parent of node p, and if p is a root, id[p] is p itself. Following parent links from any node, a root is always reached after a finite number of steps; a node that satisfies id[root] == root is the root of its group, and the number of the root can serve as the group number. When a pair is processed, we first find the group number of each node in the pair (that is, the number of the root of its tree). If the two nodes belong to different groups, the parent of one root is set to the other root, which makes one independent tree a subtree of another. The process is shown in the following figure. This approach, however, introduces a problem of its own.



In terms of implementation, quick-union differs from quick-find only in the find and union methods:

private int find(int p)
{
	// Find the root of the group that contains node p; a root satisfies id[root] == root.
	while (p != id[p]) p = id[p];
	return p;
}
public void union(int p, int q)
{
	// Give p and q the same root.
	int pRoot = find(p);
	int qRoot = find(q);
	if (pRoot == qRoot)
		return;
	id[pRoot] = qRoot;    // Turn one tree (that is, a group) into a subtree of another tree (that is, a group).
	count--;
}
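For reference, the two methods can be placed in a complete class and exercised. This is a minimal sketch: the class name QuickUnionUF is an assumption, and the constructor, count and connected are carried over unchanged from the quick-find version.

```java
class QuickUnionUF {
    private final int[] id;  // id[i] = parent of node i; a root satisfies id[i] == i
    private int count;       // number of groups

    QuickUnionUF(int n) {
        count = n;
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;  // every node starts as its own root
    }

    int count() { return count; }

    boolean connected(int p, int q) { return find(p) == find(q); }

    // Follow parent links until a root is reached.
    int find(int p) {
        while (p != id[p]) p = id[p];
        return p;
    }

    // Link the root of p's tree under the root of q's tree.
    void union(int p, int q) {
        int pRoot = find(p);
        int qRoot = find(q);
        if (pRoot == qRoot) return;
        id[pRoot] = qRoot;
        count--;
    }
}
```

After the unions (4, 3), (3, 8), (6, 5), (9, 4) and (2, 1) on ten nodes, count() is 5 and nodes 8 and 9 already share root 8, so a later union(8, 9) changes nothing. Note that union now touches a single array entry instead of scanning the whole array; the cost has moved into find.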


A tree is a data structure prone to extreme cases, because the final shape of the tree depends heavily on the input data, for example on whether the data is sorted or randomly distributed. For instance, a BST built from sorted input degenerates into a linked list. The same extreme case occurs in our problem, as shown in the figure below.
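One input that triggers the extreme case can be checked directly. The sketch below applies the quick-union linking rule (the parent of p's root becomes q's root) to the pairs (0,1), (0,2), ..., (0,N-1); this particular sequence and the class name are assumptions chosen for illustration.

```java
class QuickUnionWorstCase {
    // Applies the pairs (0,1), (0,2), ..., (0,n-1) with the quick-union rule and
    // returns how many parent links must be followed from node 0 to its root.
    static int depthOfNodeZero(int n) {
        int[] id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
        for (int q = 1; q < n; q++) {
            int pRoot = root(id, 0);
            int qRoot = root(id, q);
            if (pRoot != qRoot) id[pRoot] = qRoot;  // quick-union linking rule
        }
        int depth = 0;
        for (int p = 0; p != id[p]; p = id[p]) depth++;
        return depth;
    }

    // Follow parent links until a root is reached.
    static int root(int[] id, int p) {
        while (p != id[p]) p = id[p];
        return p;
    }
}
```

Each union hangs the whole existing tree under a fresh single node, so the tree becomes the chain 0 -> 1 -> 2 -> ... -> N-1 and depthOfNodeZero(n) returns n - 1. A find on node 0 then costs time linear in N, just like a lookup in a degenerate BST.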



To overcome this problem, a BST can be turned into a balanced tree such as a red-black tree or an AVL tree.

In our scenario, however, the two nodes of a pair cannot be compared with each other, so that remedy does not apply and we need another approach. When no idea comes to mind, looking at the code may provide some inspiration. Consider the implementation of the union method in the quick-union algorithm shown above.
