An application of the shortest-path algorithm (Dijkstra's algorithm): word conversion (the word ladder problem) (repost)

Source: Internet
Author: User

One, the problem description

In an English word list, some words are very similar: one can be turned into another by changing a single character. For example: hive-->five; wine-->line; line-->nine; nine-->mine .....

This leads to a problem: given one word as the starting word (the source vertex of a graph) and another word as the ending word, turn the starting word into the ending word with the fewest transformations, where each transformation changes exactly one character.

This is in fact a shortest-path problem.

In shortest-path problems, finding the shortest path from the source to a single destination has about the same complexity as finding the shortest paths from the source to every vertex in the graph. So to find the shortest path between two words, we simply compute the shortest paths from the source word to all words.

Given a list of all English words, about 89,000 of them, we need to find the words that can each be changed into at least 15 other words by replacing a single letter. How would a program do this?

Given two words, one as the source and the other as the destination, we need to find the words passed through when the source is turned into the destination with the fewest single-letter replacements.

For example: (zero-->five): (zero-->hero-->here-->hire-->fire-->five)

Two, algorithm analysis

Suppose all the words are stored in a TXT file, one word per line.

There are two main tasks: ① read the words from the file and construct a graph; ② implement the shortest-path algorithm on that graph (Dijkstra's algorithm).

If word A becomes word B by replacing one character, then word B also becomes word A by replacing one character (the relation is symmetric): "wine-->fine; fine-->wine". So the graph is undirected.

Analysis of the graph-construction algorithm:

Going further, assume the words have already been read into a List<String> and that the graph is stored as adjacency lists. Constructing the graph then amounts to building a Map<String, List<String>> from the List<String>.

Each key in the map is a word, and its value is the list of that word's "adjacent words". An adjacent word is any word obtained by replacing exactly one character of the key.

For example, the adjacent words of wine are: fine, line, nine .....
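To make the shape of this map concrete, here is a minimal sketch that builds it for a handful of words and then answers the first question from the problem description (which words have at least a given number of adjacent words). The class name `AdjacencyDemo`, the method names `differByOne`, `build` and `wordsWithAtLeast`, and the sample words are assumptions chosen for illustration; they are not part of the original program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; all names and the sample words are hypothetical.
class AdjacencyDemo {

    // True when a and b have equal length and differ in exactly one position.
    static boolean differByOne(String a, String b) {
        if (a.length() != b.length()) return false;
        int diffs = 0;
        for (int i = 0; i < a.length(); i++)
            if (a.charAt(i) != b.charAt(i)) diffs++;
        return diffs == 1;
    }

    // Builds word -> list of adjacent words by comparing every pair once.
    static Map<String, List<String>> build(List<String> words) {
        Map<String, List<String>> adj = new TreeMap<>();
        for (int i = 0; i < words.size(); i++)
            for (int j = i + 1; j < words.size(); j++) {
                String a = words.get(i), b = words.get(j);
                if (differByOne(a, b)) {
                    // The graph is undirected, so record the edge in both directions.
                    adj.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
                    adj.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
                }
            }
        return adj;
    }

    // Returns the words whose adjacency list has at least minNeighbors entries.
    static List<String> wordsWithAtLeast(Map<String, List<String>> adj, int minNeighbors) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : adj.entrySet())
            if (e.getValue().size() >= minNeighbors) result.add(e.getKey());
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("wine", "fine", "line", "nine", "mine", "hive", "five");
        Map<String, List<String>> adj = build(words);
        System.out.println(adj.get("wine"));          // prints [fine, line, nine, mine]
        System.out.println(wordsWithAtLeast(adj, 5)); // prints [fine]
    }
}
```

With the real word list, the same filter with a threshold of 15 would answer the "at least 15 other words" question; only the input changes.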

One of the most straightforward ideas is:

Since the words are stored in a List<String>, start with the 1st word and scan the 2nd through nth words in turn, checking whether each of them differs from the 1st word in exactly one character. This yields the adjacency list of the 1st word in the List<String>.

Continue with the 2nd word: scan the 3rd, 4th, ..., nth words in turn to complete the adjacency list of the 2nd word.

.......

The above procedure can be described by the following loops:

    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {  // n is the number of words in the word list
            // do something ...
        }

Clearly, this graph-construction algorithm runs in O(n^2) time. The code is as follows:

    public static Map<String, List<String>> computeAdjacentWords2(List<String> theWords) {
        Map<String, List<String>> adjWords = new TreeMap<>();
        String[] words = new String[theWords.size()];
        words = theWords.toArray(words);

        for (int i = 0; i < words.length; i++)
            for (int j = i + 1; j < words.length; j++) // compare every pair of words in the whole word list
                if (oneCharOff(words[i], words[j]))
                {
                    update(adjWords, words[i], words[j]); // undirected graph: edge i--j
                    update(adjWords, words[j], words[i]); // and edge j--i
                }
        return adjWords;
    }

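Note the call words = theWords.toArray(words), which converts the list into an array before the loops start. This can improve the execution efficiency of the program. Without the conversion, every access inside the two nested for loops would go through List.get, and because generics are erased at run time, each call involves a cast from Object to String. In addition, get(i) is not a constant-time operation for every List implementation (for a LinkedList it is linear), whereas array indexing always is.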

The two helper methods are as follows:

    // Returns true if one word can be turned into the other by replacing exactly one character.
    private static boolean oneCharOff(String word1, String word2) {
        if (word1.length() != word2.length()) // words of different lengths can never qualify
            return false;
        int diffs = 0;
        for (int i = 0; i < word1.length(); i++)
            if (word1.charAt(i) != word2.charAt(i))
                if (++diffs > 1)
                    return false;
        return diffs == 1;
    }

    // Adds a word to the adjacency list of key.
    private static <T> void update(Map<T, List<String>> m, T key, String value) {
        List<String> lst = m.get(key);
        if (lst == null) { // first occurrence of this key
            lst = new ArrayList<String>();
            m.put(key, lst);
        }
        lst.add(value);
    }

Dijkstra Algorithm Analysis:

As mentioned above, this is a shortest-path problem on an undirected graph whose edges carry no weights, and implementing Dijkstra's algorithm on an unweighted graph is much simpler than on a weighted one. It is simpler because the unweighted case needs only a queue: in the manner of a breadth-first traversal, the search spreads outward from the source and computes the distance from the source to every other vertex. Once a vertex of an unweighted graph has been visited and its predecessor recorded, that predecessor never changes again (see blog post). In a weighted graph, by contrast, the predecessor of a vertex may be updated more than once, so a more elaborate data structure is needed to greedily select the next vertex with the smallest distance.

    /**
     * Finds the shortest path from start to end using Dijkstra's algorithm (unweighted case).
     * @param adjcentWords the word graph, Map<String, List<String>>; key: a word, value: the words that differ from it by one character
     * @param start the starting word
     * @param end the ending word
     * @return the words passed through on the way from start to end
     */
    public static List<String> findChain(Map<String, List<String>> adjcentWords, String start, String end) {
        Map<String, String> previousWord = new HashMap<String, String>(); // key: a word, value: that word's predecessor
        Queue<String> queue = new LinkedList<>();

        queue.offer(start);
        while (!queue.isEmpty()) {
            String preWord = queue.poll();
            List<String> adj = adjcentWords.get(preWord);
            if (adj == null) continue; // a word with no adjacent words has no entry in the map
            for (String word : adj) {
                // the 'distance' (predecessor) of this word has not been set yet: this is the first visit, and it is set only once
                if (previousWord.get(word) == null) { // see the explanation of this check below the listing
                    previousWord.put(word, preWord);
                    queue.offer(word);
                }

            }
        }
        previousWord.put(start, null); // remember to record the (empty) predecessor of the source
        return geChainFromPreviousMap(previousWord, start, end);
    }

The reason for the if check previousWord.get(word) == null is that, as mentioned earlier, the predecessor of each vertex is set only once. The first time the traversal reaches word, its predecessor preWord is fixed permanently.

If the search later reaches word again from some other vertex, that vertex cannot become the predecessor of word, because the path to word through it cannot be shorter than the one already found. This is the idea behind breadth-first search!
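The "set the predecessor only once" rule can be seen end to end in a small, self-contained sketch. It is not the author's findChain: to stay short it scans the word list for neighbors on the fly instead of using a precomputed adjacency map, and it marks the source as visited before the loop starts. The class name `LadderDemo`, the method names and the sample words are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Illustrative sketch only; all names and the sample word list are hypothetical.
class LadderDemo {

    // True when a and b have equal length and differ in exactly one position.
    static boolean differByOne(String a, String b) {
        if (a.length() != b.length()) return false;
        int diffs = 0;
        for (int i = 0; i < a.length(); i++)
            if (a.charAt(i) != b.charAt(i)) diffs++;
        return diffs == 1;
    }

    // Breadth-first search from start; returns a shortest chain, or null if end is unreachable.
    static List<String> shortestChain(List<String> words, String start, String end) {
        Map<String, String> previous = new HashMap<>();
        previous.put(start, null); // the source is visited and has no predecessor
        Queue<String> queue = new ArrayDeque<>();
        queue.offer(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(end)) break; // the first time end is dequeued, its chain is final
            for (String next : words) {
                // A word already in the map was reached earlier by a path that is at least as short.
                if (!previous.containsKey(next) && differByOne(current, next)) {
                    previous.put(next, current);
                    queue.offer(next);
                }
            }
        }
        if (!previous.containsKey(end)) return null;
        LinkedList<String> chain = new LinkedList<>();
        for (String w = end; w != null; w = previous.get(w)) chain.addFirst(w); // walk back to the source
        return chain;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("zero", "hero", "here", "hire", "fire", "five", "wire", "herd");
        System.out.println(shortestChain(words, "zero", "five"));
        // prints [zero, hero, here, hire, fire, five]
    }
}
```

In this word list "wire" and "herd" offer detours, but because each word keeps the predecessor from its first visit, the longer routes through them never replace the shorter one.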

Three, improving the graph-construction algorithm

The improvement to the graph-construction algorithm gets its own section because it makes good use of the idea of classification: when processing a large amount of data, first divide the data into classes, then process the data within each class pair by pair.

The classes must cover all the data, which corresponds to a partition of the data set S in probability theory.

Constructing a graph from the words in a List<String> essentially means finding all the adjacent words of each word. Clearly, two words of different lengths can never be adjacent.

Therefore, all the words in the list can first be classified by length: words of length 1, words of length 2, ..., words of length N. This divides the word list into N sets that form a partition of it, because every word in the list belongs to exactly one of these N sets.

So the words are first classified by length, and then only the words within each class are compared with one another. The improved code is as follows:

    /**
     * Constructs the adjacency lists from the words.
     * @param theWords a List containing all the words
     * @return Map<String, List<String>>; key: a word, value: the words that differ from it by one character
     */
    public static Map<String, List<String>> computeAdjacentWords(
            List<String> theWords) {
        Map<String, List<String>> adjWords = new TreeMap<>();
        Map<Integer, List<String>> wordsByLength = new TreeMap<>(); // classification: key is a word length, value is the words of that length

        for (String word : theWords)
            update(wordsByLength, word.length(), word);

        for (List<String> groupWords : wordsByLength.values()) { // process the words group by group
            String[] words = new String[groupWords.size()];
            groupWords.toArray(words);

            for (int i = 0; i < words.length; i++)
                for (int j = i + 1; j < words.length; j++) // compare only the words within one group
                    if (oneCharOff(words[i], words[j])) {
                        update(adjWords, words[i], words[j]);
                        update(adjWords, words[j], words[i]);
                    }
        }

        return adjWords;
    }

The first for loop (for (String word : theWords)) performs the classification and stores the words, grouped by length, in a map. Each key of the map is a word length, and each value is the list of all words of that length, for example: <4, [five, line, good, high, ...]>

The nested for loops over i and j now compare only the words within one class. The nested for loops in the computeAdjacentWords2() method shown in section Two (algorithm analysis) above iterate over all the words.

As you can see, the improved algorithm performs fewer comparisons. From a time-complexity point of view, however, it is still O(n^2), and it uses an extra Map<Integer, List<String>> to store the classes.
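The saving can be checked with a little arithmetic: comparing every pair of n words costs n(n-1)/2 comparisons, while grouping by length costs the sum of that expression over the groups. The sketch below counts both for a tiny list. The class name `ComparisonCount`, its method names and the sample words are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; all names and the sample words are hypothetical.
class ComparisonCount {

    // Number of unordered pairs among n items: n(n-1)/2.
    static long pairs(long n) {
        return n * (n - 1) / 2;
    }

    // Comparisons made when every word is checked against every other word.
    static long allPairs(List<String> words) {
        return pairs(words.size());
    }

    // Comparisons made when only words of equal length are checked against each other.
    static long groupedPairs(List<String> words) {
        Map<Integer, Integer> sizeByLength = new HashMap<>();
        for (String w : words)
            sizeByLength.merge(w.length(), 1, Integer::sum); // count the words of each length
        long total = 0;
        for (int size : sizeByLength.values())
            total += pairs(size);
        return total;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "an", "at", "wine", "fine", "line", "zebra");
        System.out.println(allPairs(words));     // prints 21
        System.out.println(groupedPairs(words)); // prints 4
    }
}
```

If all the words happen to have the same length there is only one group and the two counts coincide, which is why the worst case remains O(n^2).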

Four, summary

This word-conversion problem made me realize the importance of graph algorithms. I used to feel that graph algorithms were lofty and out of reach, but their applications turn out to be very practical.

Dijkstra's algorithm is a typical greedy algorithm. For weighted graphs, an implementation of Dijkstra's algorithm needs a min-heap. The delMin operation of a min-heap takes O(log n) time in the worst case, which suits Dijkstra's greedy selection of the next vertex with the smallest distance. Note also that when a vertex is selected, the distances of all its adjacent vertices may be updated, and the heap then has to be adjusted; this can be viewed as performing a decreaseKey(weight) operation on those adjacent vertices. There is a catch, however: each of those adjacent vertices must first be located in the heap, and searching a min-heap for a given element is inefficient! (Why do most of the min-heap-based Dijkstra implementations on the Internet not consider locating the adjacent vertices and performing decreaseKey on them?) For this reason, Dijkstra's algorithm can be implemented with a more efficient Fibonacci heap or pairing heap.
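One common way around the decreaseKey problem, and a likely reason many heap-based implementations never search the heap, is "lazy deletion": rather than locating a vertex in the heap and lowering its key, a new entry with the smaller distance is pushed and outdated entries are skipped when they are popped. The sketch below shows this variant with java.util.PriorityQueue; it is not taken from the original article, and the class name `LazyDijkstra`, the `Edge` type and the sample graph are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only; all names and the sample graph are hypothetical.
class LazyDijkstra {

    // A directed edge to vertex `to` with a non-negative weight.
    static final class Edge {
        final int to;
        final int weight;
        Edge(int to, int weight) { this.to = to; this.weight = weight; }
    }

    // Returns the shortest distance from source to every vertex;
    // unreachable vertices keep Integer.MAX_VALUE.
    static int[] shortestDistances(List<List<Edge>> graph, int source) {
        int[] dist = new int[graph.size()];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;
        // Heap entries are {distance, vertex}; the same vertex may appear several times.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        heap.add(new int[] {0, source});
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            int d = top[0], u = top[1];
            if (d > dist[u]) continue; // stale entry: a shorter distance was found after it was pushed
            for (Edge e : graph.get(u)) {
                int candidate = d + e.weight;
                if (candidate < dist[e.to]) {
                    dist[e.to] = candidate;
                    heap.add(new int[] {candidate, e.to}); // push a new entry instead of decreaseKey
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        List<List<Edge>> graph = new ArrayList<>();
        for (int i = 0; i < 4; i++) graph.add(new ArrayList<>());
        graph.get(0).add(new Edge(1, 4));
        graph.get(0).add(new Edge(2, 1));
        graph.get(2).add(new Edge(1, 2));
        graph.get(1).add(new Edge(3, 5));
        System.out.println(Arrays.toString(shortestDistances(graph, 0))); // prints [0, 3, 1, 8]
    }
}
```

The trade-off is that the heap can hold up to one entry per edge instead of one per vertex, so each heap operation costs O(log E) rather than O(log V); in exchange, no search of the heap is ever needed.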

Secondly, there is the idea of classification: divide the big problem into several small classes and solve each class separately. "Comparing" (processing) only the related elements, rather than all the elements, effectively reduces the running time of the program, even though the worst-case time complexity is unchanged.

Five, complete code implementation

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Map;
    import java.util.Queue;
    import java.util.TreeMap;

    public class WordLadder {

        /*
         * Reads the words from a file into a List<String>. Assumes one word per line, with no duplicates.
         */
        public static List<String> read(final String filepath) {
            List<String> wordList = new ArrayList<String>();
            File file = new File(filepath);
            // try-with-resources closes both readers, even when opening or reading fails
            try (FileReader fr = new FileReader(file);
                 BufferedReader br = new BufferedReader(fr)) {
                String lines = null;
                while ((lines = br.readLine()) != null) {
                    String line = lines.trim();
                    if (line.isEmpty())
                        continue; // skip blank lines
                    int index = line.indexOf(" ");
                    // keep only the text before the first space; a line without a space is taken as a whole word
                    String word = (index == -1) ? line : line.substring(0, index);
                    wordList.add(word);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            return wordList;
        }

        /**
         * Constructs the adjacency lists from the words.
         * @param theWords a List containing all the words
         * @return Map<String, List<String>>; key: a word, value: the words that differ from it by one character
         */
        public static Map<String, List<String>> computeAdjacentWords(
                List<String> theWords) {
            Map<String, List<String>> adjWords = new TreeMap<>();
            Map<Integer, List<String>> wordsByLength = new TreeMap<>();

            for (String word : theWords)
                update(wordsByLength, word.length(), word);

            for (List<String> groupWords : wordsByLength.values()) {
                String[] words = new String[groupWords.size()];
                groupWords.toArray(words);

                for (int i = 0; i < words.length; i++)
                    for (int j = i + 1; j < words.length; j++)
                        if (oneCharOff(words[i], words[j])) {
                            update(adjWords, words[i], words[j]);
                            update(adjWords, words[j], words[i]);
                        }
            }
            return adjWords;
        }

        public static Map<String, List<String>> computeAdjacentWords2(List<String> theWords) {
            Map<String, List<String>> adjWords = new TreeMap<>();
            String[] words = new String[theWords.size()];
            words = theWords.toArray(words);

            for (int i = 0; i < words.length; i++)
                for (int j = i + 1; j < words.length; j++)
                    if (oneCharOff(words[i], words[j])) {
                        update(adjWords, words[i], words[j]); // i--j
                        update(adjWords, words[j], words[i]); // j--i
                    }
            return adjWords;
        }

        // Returns true if one word can be turned into the other by replacing exactly one character.
        private static boolean oneCharOff(String word1, String word2) {
            if (word1.length() != word2.length()) // words of different lengths can never qualify
                return false;
            int diffs = 0;
            for (int i = 0; i < word1.length(); i++)
                if (word1.charAt(i) != word2.charAt(i))
                    if (++diffs > 1)
                        return false;
            return diffs == 1;
        }

        // Adds a word to the adjacency list of key.
        private static <T> void update(Map<T, List<String>> m, T key, String value) {
            List<String> lst = m.get(key);
            if (lst == null) { // first occurrence of this key
                lst = new ArrayList<String>();
                m.put(key, lst);
            }
            lst.add(value);
        }

        /**
         * Finds the shortest path from start to end using Dijkstra's algorithm (unweighted case).
         * @param adjcentWords the word graph, Map<String, List<String>>; key: a word, value: the words that differ from it by one character
         * @param start the starting word
         * @param end the ending word
         * @return the words passed through on the way from start to end
         */
        public static List<String> findChain(Map<String, List<String>> adjcentWords, String start, String end) {
            Map<String, String> previousWord = new HashMap<String, String>(); // key: a word, value: that word's predecessor
            Queue<String> queue = new LinkedList<>();

            queue.offer(start);
            while (!queue.isEmpty()) {
                String preWord = queue.poll();
                List<String> adj = adjcentWords.get(preWord);
                if (adj == null) continue; // a word with no adjacent words has no entry in the map
                for (String word : adj) {
                    // the 'distance' (predecessor) of this word has not been set yet: this is the first visit, and it is set only once
                    if (previousWord.get(word) == null) {
                        previousWord.put(word, preWord);
                        queue.offer(word);
                    }
                }
            }
            previousWord.put(start, null); // remember to record the (empty) predecessor of the source
            return geChainFromPreviousMap(previousWord, start, end);
        }

        // Walks the predecessor map back from end to start; returns null if end was never reached.
        private static List<String> geChainFromPreviousMap(Map<String, String> previousWord, String start, String end) {
            LinkedList<String> result = null;

            if (previousWord.get(end) != null) {
                result = new LinkedList<>();
                for (String pre = end; pre != null; pre = previousWord.get(pre))
                    result.addFirst(pre);
            }
            return result;
        }
    }

The word TXT file format is processed as follows:

Http://www.cnblogs.com/hapjin/p/5445370.html
