Detailed analysis of Kruskal's algorithm for the minimum spanning tree: a greedy algorithm for MST


Motivation:

While reading an algorithms book recently, I came across the greedy method for solving the MST problem. Unfortunately, the explanation in the book was not very clear, and most of the material I found on the Internet was not thorough either.

So I took some time to study it, and I finally understood the basic idea.

I'd like to thank Google for helping me find a good English document. (There is a link below. If you are interested, you can check it out.)

I have organized my notes to share them here. I hope they help anyone who wants to learn the greedy method.

 

The main content of this blog is the greedy method to solve the Minimum Spanning Tree (MST) problem.

Two greedy algorithms are commonly used to find a minimum spanning tree: Prim's algorithm and Kruskal's algorithm.

Here we mainly talk about Kruskal's algorithm.

Simple definition of the minimum spanning tree:

Let G(V, E) be an undirected, connected, weighted graph in which each edge (v, w) has a weight c(v, w). If a subgraph G' of G is a tree that contains all the vertices of G, then G' is called a spanning tree of G.

If the total weight of the edges of G' is the smallest among all spanning trees of G, then G' is called a minimum spanning tree of G.

The basic idea of the Kruskal algorithm:

1. First, treat the n vertices of G as n isolated connected branches (n isolated points) and sort all edges by weight from small to large.

2. Consider the edges in increasing order of weight. If adding an edge would create a cycle, skip it; otherwise add it. Continue until the graph is connected.

Explanation of step 2: if the two endpoints of an edge are in different connected branches, the edge can be added safely without forming a cycle.

The graph used in this example:

Edges sorted by weight in ascending order:

After Kruskal's edge additions:

Therefore, for any edge (u, v), we need to determine whether its two endpoints are in the same connected branch.

If yes, discard this edge and move on to the next one.

If not, add this edge to the graph and merge the connected branches of u and v.

Then process the next edge.

The algorithm therefore keeps merging connected branches according to these rules until only one connected branch is left.
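The edge-by-edge procedure above can be sketched naively, before choosing any particular data structure. In this sketch each vertex simply carries the label of its connected branch, and merging two branches relabels one of them. The function name `kruskal_naive`, the `(weight, u, v)` edge format and the sample graph are assumptions made for illustration; they are not the graph from the original figures.

```python
def kruskal_naive(vertices, edges):
    """Kruskal's algorithm with one component label per vertex.

    edges is a list of (weight, u, v) tuples. Merging relabels a whole
    branch, which costs O(|V|) per merge; that cost is what motivates
    looking for a better data structure.
    """
    label = {v: v for v in vertices}      # every vertex starts isolated
    tree = []
    for weight, u, v in sorted(edges):    # increasing order of weight
        if label[u] == label[v]:
            continue                      # same branch: the edge would form a cycle
        tree.append((u, v, weight))
        old, new = label[v], label[u]
        for x in vertices:                # merge: relabel v's branch
            if label[x] == old:
                label[x] = new
    return tree


# Hypothetical 4-vertex graph, made up for this example.
sample_edges = [(1, "a", "b"), (4, "a", "c"), (2, "b", "c"),
                (5, "c", "d"), (3, "b", "d")]
mst = kruskal_naive(["a", "b", "c", "d"], sample_edges)
total = sum(w for _, _, w in mst)
```

On this small graph the edges a-b, b-c and b-d are kept, for a total weight of 6; the edges a-c and c-d are discarded because their endpoints already share a branch.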

What kind of data structure supports such operations?

This is a question worth thinking about. The feasibility of other candidate storage structures is not discussed here.

Here we discuss storing each connected branch as a directed tree.

Some implementation details (basic operations)

makeset(x): create a singleton set containing just x // during initialization, the graph is split into n independent connected blocks

find(x): to which set does x belong? // for any given vertex x, determine which connected block x belongs to

union(x, y): merge the sets containing x and y // merge two connected blocks; x and y are the two endpoints of an edge, and if find shows that they belong to different connected blocks, the blocks are merged

 

Algorithm (implementation):

Kruskal(G)

1. for all u ∈ V do

       makeset(u); // initialization: make each vertex its own connected block

2. X = {}; // X starts as the empty set

3. sort the edges E by weight;

4. for all edges (u, v) ∈ E in increasing order of weight do // decide whether edge (u, v) can be added to the result

       if find(u) ≠ find(v) then // the two endpoints are in different connected blocks, so keep the edge and merge the blocks

           add edge (u, v) to X;

           union(u, v);
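The steps of this pseudocode can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the `(weight, u, v)` edge format and the sample graph are assumptions, and since the body of union is not listed in this post, the usual union-by-rank rule is assumed here.

```python
def kruskal(vertices, edges):
    """Return the edge list X of a minimum spanning tree.

    edges is a list of (weight, u, v) tuples; the graph is assumed connected.
    """
    parent = {u: u for u in vertices}     # step 1: makeset(u) for all u
    rank = {u: 0 for u in vertices}

    def find(x):
        while x != parent[x]:             # climb to the root of x's tree
            x = parent[x]
        return x

    def union(x, y):                      # assumed union-by-rank rule
        rx, ry = find(x), find(y)
        if rx == ry:
            return
        if rank[rx] > rank[ry]:
            rx, ry = ry, rx               # make rx the root of lower rank
        parent[rx] = ry                   # hang the shallower tree under the other
        if rank[rx] == rank[ry]:
            rank[ry] += 1

    X = []                                # step 2: X starts empty
    for weight, u, v in sorted(edges):    # steps 3 and 4
        if find(u) != find(v):            # endpoints in different blocks
            X.append((u, v, weight))
            union(u, v)
    return X


# Hypothetical 5-vertex graph, made up for this example.
graph_edges = [(1, "A", "B"), (2, "B", "C"), (3, "A", "C"),
               (4, "C", "D"), (5, "D", "E"), (6, "C", "E")]
X = kruskal("ABCDE", graph_edges)
total_weight = sum(w for _, _, w in X)
```

Here the edges A-C and C-E are rejected because their endpoints are already connected, leaving four edges with a total weight of 12.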

Below are the implementation details of the algorithm.

How to store a set? (How to store connected blocks)

Example:

{B, E}

{A, C, D, F, G, H}

Each connected block is stored as a directed tree, and two items are saved for it: the root node of the tree and its rank (the height of the tree).

Root: the node whose parent pointer points to itself.

Rank: the height of the subtree hanging from that node.

There is also a useful notation: for a vertex x in the tree, p(x) denotes the parent node of x.

Below is the function implementation.

makeset(x)

1. p(x) = x; // constant-time operation

2. rank(x) = 0;

find(x)

1. while x ≠ p(x) do // the time taken is proportional to the height of the tree

       x = p(x);

2. return x;

Examples of the state after performing these operations:

After makeset(A), makeset(B), ..., makeset(G):

Each vertex becomes an isolated connected branch. The number in the upper right corner indicates the rank of the tree.

After union(A, D), union(B, E), union(C, F):

After union(C, G), union(E, A):

Note that the rank in the upper right corner of a merged connected branch can change. During merging, the rank is kept as small as possible: the root of lower rank is attached under the root of higher rank.

After union(B, G).
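The sequence of operations listed above can be replayed in code. The sketch below follows the makeset and find pseudocode from this post; the body of union is not listed in the post, so the union-by-rank rule used here (with ties broken by attaching the first root under the second) is an assumption.

```python
p = {}       # p[x]: parent of x
rank = {}    # rank[x]: height of the subtree hanging from x


def makeset(x):
    p[x] = x
    rank[x] = 0


def find(x):
    while x != p[x]:
        x = p[x]
    return x


def union(x, y):
    # Assumed union-by-rank rule; the post does not list union's body.
    rx, ry = find(x), find(y)
    if rx == ry:
        return
    if rank[rx] > rank[ry]:
        p[ry] = rx                 # lower-rank root goes under higher-rank root
    else:
        p[rx] = ry
        if rank[rx] == rank[ry]:
            rank[ry] += 1          # equal ranks: the merged tree grows by one


for x in "ABCDEFG":
    makeset(x)
for x, y in [("A", "D"), ("B", "E"), ("C", "F"),
             ("C", "G"), ("E", "A"), ("B", "G")]:
    union(x, y)

roots = {find(x) for x in "ABCDEFG"}
```

With this tie-breaking rule, the replay ends with a single tree rooted at D with rank 2, while E and F keep rank 1. A different tie-breaking rule would give a different root but the same ranks.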

Description of rank:

Property 1: for any x, rank(x) < rank(p(x)). That is, the rank of any non-root node x is smaller than the rank of its parent node.

Property 2: any root node of rank k has at least 2^k nodes in its tree. That is, any connected branch of rank k contains at least 2^k nodes.

Property 3: if there are n elements overall, there can be at most n/2^k nodes of rank k.

Explanation of property 2: a root reaches rank k only when two trees whose roots both have rank k-1 are merged. By induction each of those trees has at least 2^(k-1) nodes, so the merged tree has at least 2^k nodes.

Explanation of property 3: each node of rank k has at least 2^k nodes in its subtree, and the subtrees of different rank-k nodes do not overlap, so there are at most n/2^k nodes of rank k.
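The three properties can also be checked empirically. The sketch below performs random unions and then tests each property. It is an illustration only: the union-by-rank rule, the element count of 64, the number of unions and the random seed are all assumptions chosen for the example.

```python
import random
from collections import Counter

random.seed(0)                      # arbitrary seed, for repeatability
n = 64
p = list(range(n))                  # p[x]: parent of x
rank = [0] * n


def find(x):
    while x != p[x]:
        x = p[x]
    return x


def union(x, y):                    # assumed union-by-rank rule
    rx, ry = find(x), find(y)
    if rx == ry:
        return
    if rank[rx] > rank[ry]:
        rx, ry = ry, rx             # make rx the root of lower rank
    p[rx] = ry
    if rank[rx] == rank[ry]:
        rank[ry] += 1


for _ in range(200):
    union(random.randrange(n), random.randrange(n))

tree_size = Counter(find(x) for x in range(n))   # root -> number of nodes
nodes_of_rank = Counter(rank)                    # k -> number of nodes of rank k

prop1 = all(rank[x] < rank[p[x]] for x in range(n) if p[x] != x)
prop2 = all(size >= 2 ** rank[r] for r, size in tree_size.items())
prop3 = all(count <= n / 2 ** k for k, count in nodes_of_rank.items())
```

All three flags should come out true for any sequence of unions, since the properties hold by the arguments above rather than by luck of the seed.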


Algorithm efficiency analysis:

Kruskal(G)

1. for all u ∈ V do

       makeset(u); // initialization: make each vertex its own connected block

2. X = {}; // X starts as the empty set

3. sort the edges E by weight;

4. for all edges (u, v) ∈ E in increasing order of weight do // decide whether edge (u, v) can be added to the result

       if find(u) ≠ find(v) then // the two endpoints are in different connected blocks, so keep the edge and merge the blocks

           add edge (u, v) to X;

           union(u, v);


In the above algorithm:

makeset(): each call completes in constant time.

Sort edges: sorting the edges by weight takes O(|E| log |V|). (This is the standard O(|E| log |E|) bound for comparison sorting; since |E| <= |V|^2, log |E| = O(log |V|).)

find(): walks up from the given vertex to the root of its tree, so the time taken is proportional to the height of the tree, which is at most log |V|.

Note: it is worth considering how to determine the number of find() executions.

Counting from the vertices' point of view, it is hard to obtain an exact answer, because the number of edges incident to each vertex varies, and considering the vertices one by one is not easy.

In fact, the number of find() executions is tied to the number of edges. The loop body of the algorithm runs once per edge, in order of edge weight, and for each edge we must examine the two vertices it connects.

Therefore, the number of find() executions is twice the number of edges. Each find() costs O(log |V|), while union, apart from locating the two roots, completes in constant time.

So

union and find operations: O(|E| log |V|)
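Putting the pieces together gives a rough tally of the total running time. This is a sketch under two assumptions that the post does not state explicitly: the graph is connected, so |E| >= |V| - 1, and union actually merges two blocks at most |V| - 1 times.

```latex
\begin{align*}
T(G) &= \underbrace{O(|V|)}_{\text{makeset}}
      + \underbrace{O(|E|\log|V|)}_{\text{sorting}}
      + \underbrace{2|E|\cdot O(\log|V|)}_{\text{find}}
      + \underbrace{(|V|-1)\cdot O(\log|V|)}_{\text{union}} \\
     &= O(|E|\log|V|).
\end{align*}
```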

 


In fact, the idea of this algorithm is very simple: each time we select the smallest remaining edge. If it meets the condition, we add it to the result; if not, we move on to the next smallest one.

What needs more thought is the implementation: which data structure to use for storage, and how to decide whether two vertices belong to the same connected branch.

The greedy method only provides the basic strategy; how it is implemented is critical to solving the problem efficiently.

 

If you have read this far, you may feel that the efficiency of this algorithm is not very impressive: the time spent in union and find depends heavily on the number of vertices n.

If n is large, it will take a lot of time.

So, can the efficiency of this algorithm be improved?

The answer is yes. The technique used is

amortized analysis, which is a wonderful idea.

For the union and find operations in this problem, the bound derived in this article is O(|E| log |V|). When find is also changed to use path compression (covered in the chapter referenced below), amortized analysis shows that the bound becomes O(|E| log* n), where n = |V|.
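The post defers the details to a follow-up, but the change to the data structure that usually goes with this amortized bound is path compression in find. The sketch below is an assumption about what that follow-up covers, not the author's code: after locating the root, find points every node on the search path directly at it, so later calls on those nodes are short.

```python
parent = {}


def makeset(x):
    parent[x] = x


def find(x):
    """Return the root of x and point every node on the path directly at it."""
    root = x
    while root != parent[root]:       # first pass: locate the root
        root = parent[root]
    while x != root:                  # second pass: compress the path
        parent[x], x = root, parent[x]
    return root


# Build the chain a -> b -> c -> d by hand, then compress it with one find.
for name in "abcd":
    makeset(name)
parent["a"], parent["b"], parent["c"] = "b", "c", "d"
root = find("a")
```

After the single call `find("a")`, the nodes a, b and c all point directly at d. The extra work done by one slow find is what pays for the cheaper finds that follow.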

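What is log* n? It is the number of times the logarithm must be applied to n to bring it down to 1 or below. Even when n is the number of particles in the universe, log* n <= 8 (in fact, log* n <= 5 for every n up to 2^65536).

That is, the log n factor is replaced by a value that never exceeds 8 in practice.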
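To get a feel for how slowly log* n grows, it can be computed directly. The sketch below assumes base-2 logarithms; the function name `log_star` and the figure of 10^80 for the number of atoms in the universe (a commonly quoted rough estimate) are assumptions made for illustration.

```python
import math


def log_star(n):
    """Iterated logarithm (base 2): how many times log2 must be applied
    to n before the value drops to 1 or below."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


atoms = 10 ** 80          # rough estimate, used here as an assumption
steps = log_star(atoms)
```

Under these assumptions `steps` comes out as 5, comfortably within the bound of 8 quoted above.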

How about that? A considerable improvement in efficiency...

If you are interested, please watch for my next blog post, which will explain the amortized analysis for this problem in detail!

References:

http://en.wikipedia.org/wiki/Minimum_spanning_tree

http://www.cs.berkeley.edu/~vazirani/algorithms/chap5.pdf

 

If something is wrong, I hope you can point it out.

 

This is an original article. If you repost it, please credit the source: http://www.cnblogs.com/yanlingyin/

 

Thanks!

 
