Using a trie (prefix tree) in JavaScript, explained in detail

Source: Internet
Author: User
This article explains in detail how to use a trie (prefix tree) in JavaScript and what to watch out for when using one. A working example follows.





Introduction



A trie (the name comes from the word "retrieval"), also known as a prefix tree, word search tree, or dictionary tree, is a multi-way tree structure designed for fast retrieval. It can be viewed as a variant of a hash tree.



Its advantage is that it minimizes unnecessary string comparisons, which in some cases makes queries more efficient than with a hash table.



The core idea of a trie is to trade space for time: it uses the common prefixes of strings to reduce query time and so improve efficiency.



A trie also has drawbacks. Assuming we only deal with letters and digits, each node may need up to 52 + 10 = 62 child slots. To save memory, the children can be stored in a linked list or an array. In JS we use an array directly, because JS arrays are dynamic and come with built-in optimizations.
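To illustrate the memory trade-off, here is a minimal sketch of a different node layout that keeps its children in a Map keyed by character, so only characters that actually occur take up space. The names MapTrieNode and addChild are hypothetical and are not part of the implementation shown later in this article, which uses an array.

```javascript
// Hypothetical alternative node layout: children live in a Map keyed by character.
class MapTrieNode {
  constructor(value = "") {
    this.value = value;        // the single character held by this node
    this.children = new Map(); // only characters that actually occur are stored
  }
  // Return the existing child for ch, or create it on first use.
  addChild(ch) {
    let child = this.children.get(ch);
    if (!child) {
      child = new MapTrieNode(ch);
      this.children.set(ch, child);
    }
    return child;
  }
}

const root = new MapTrieNode();
root.addChild("a").addChild("b"); // path "ab"
root.addChild("a").addChild("c"); // reuses the "a" node and adds "c" below it
console.log(root.children.size);                   // 1
console.log(root.children.get("a").children.size); // 2
```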






Basic properties


    1. The root node contains no character; every node other than the root contains exactly one character.

    2. Concatenating the characters along the path from the root to a node gives the string that corresponds to that node.

    3. All children of a given node contain different characters.
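The three properties can be checked on a tiny hand-built tree. This is only a sketch: the plain-object layout and the helper names pathStrings and siblingsAreDistinct are assumptions made for illustration, not part of the implementation below.

```javascript
// A hand-built trie for "to" and "tea", using plain objects (hypothetical layout).
const root = {
  value: "", // property 1: the root holds no character
  children: [
    { value: "t", children: [
      { value: "o", children: [] },
      { value: "e", children: [{ value: "a", children: [] }] },
    ] },
  ],
};

// Property 2: concatenating the characters on the path gives the node's string.
function pathStrings(node, prefix = "", out = []) {
  const str = prefix + node.value;
  if (str !== "") out.push(str);
  for (const child of node.children) pathStrings(child, str, out);
  return out;
}

// Property 3: the children of one node must hold pairwise different characters.
function siblingsAreDistinct(node) {
  const seen = new Set(node.children.map((c) => c.value));
  return seen.size === node.children.length &&
    node.children.every((c) => siblingsAreDistinct(c));
}

console.log(pathStrings(root));         // [ 't', 'to', 'te', 'tea' ]
console.log(siblingsAreDistinct(root)); // true
```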


Program implementation


// by Situ Zhengmei
class Trie {
 constructor () {
  this.root = new TrieNode ();
 }
 isValid (str) {
  // Only letters and digits are allowed ("0" included)
  return /^[a-z0-9]+$/i.test(str);
 }
 insert (word) {
  // Add a word; returns false if the word contains invalid characters
  if (this.isValid (word)) {
   var cur = this.root;
   for (var i = 0; i < word.length; i ++) {
    var c = word.charCodeAt (i);
    c -= 48; // subtract the char code of "0"
    var node = cur.son [c];
    if (node == null) {
     node = (cur.son [c] = new TrieNode ());
     node.value = word.charAt (i);
     node.numPass = 1; // one string passes through this node so far
    } else {
     node.numPass ++;
    }
    cur = node;
   }
   cur.isEnd = true; // mark that a string ends at this node
   cur.numEnd ++; // the number of times this string has been inserted
   return true;
  } else {
   return false;
  }
 }
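 remove (word) {
   // Remove one occurrence of word; returns false if the word is not in the tree.
   // Only the counters change; the nodes themselves stay in place.
   if (this.isValid (word)) {
     var cur = this.root;
     var array = [], n = word.length;
     for (var i = 0; i < n; i ++) {
       var c = word.charCodeAt (i);
       c -= 48; // must be the same index that insert uses
       var node = cur.son [c];
       if (node) {
         array.push (node);
         cur = node;
       } else {
         return false;
       }
     }
     if (cur.numEnd === 0) {
       return false; // word is only a prefix of other words
     }
     array.forEach (function (el) {
       el.numPass --;
     });
     cur.numEnd --;
     if (cur.numEnd === 0) {
       cur.isEnd = false;
     }
     return true;
   } else {
     return false;
   }
 }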
 preTraversal (cb) {// Pre-order traversal
    function preTraversalImpl (root, str, cb) {
      cb (root, str);
      for (let i = 0, n = root.son.length; i <n; i ++) {
        let node = root.son [i];
        if (node) {
          preTraversalImpl (node, str + node.value, cb);
        }
      }
    }
    preTraversalImpl (this.root, "", cb);
  }
 // Check whether any string in the dictionary tree starts with the given prefix (including the prefix string itself)
 isContainPrefix (word) {
  if (this.isValid (word)) {
   var cur = this.root;
   for (var i = 0; i <word.length; i ++) {
    var c = word.charCodeAt (i);
    c -= 48; // subtract the char code of "0"
    if (cur.son [c]) {
     cur = cur.son [c];
    } else {
     return false;
    }
   }
   return true;
  } else {
   return false;
  }
 }
 isContainWord (word) {
  // Check whether a string exists in the dictionary tree as a whole word (not just as a prefix)
  if (this.isValid (word)) {
   var cur = this.root;
   for (var i = 0; i < word.length; i ++) {
    var c = word.charCodeAt (i);
    c -= 48; // subtract the char code of "0"
    if (cur.son [c]) {
     cur = cur.son [c];
    } else {
     return false;
    }
   }
   return cur.isEnd;
  } else {
   return false;
  }
 }
 countPrefix (word) {
  // Count the number of strings prefixed with the specified string
  if (this.isValid (word)) {
   var cur = this.root;
   for (var i = 0; i <word.length; i ++) {
    var c = word.charCodeAt (i);
    c -= 48; // subtract the char code of "0"
    if (cur.son [c]) {
     cur = cur.son [c];
    } else {
     return 0;
    }
   }
   return cur.numPass;
  } else {
   return 0;
  }
 }
 countWord (word) {
  // Count the number of times a string appears
  if (this.isValid (word)) {
   var cur = this.root;
   for (var i = 0; i <word.length; i ++) {
    var c = word.charCodeAt (i);
    c -= 48; // subtract the char code of "0"
    if (cur.son [c]) {
     cur = cur.son [c];
    } else {
     return 0;
    }
   }
   return cur.numEnd;
  } else {
   return 0;
  }
 }
}
class TrieNode {
 constructor () {
  this.numPass = 0; // how many words pass through this node
  this.numEnd = 0; // how many words end here
  this.son = [];
  this.value = ""; // value is a single character
  this.isEnd = false;
 }
}


Let's focus on TrieNode and on the insert method of Trie. Because a dictionary tree is mainly used for word frequency statistics, its nodes carry more attributes than usual; numPass and numEnd in particular are very important.



The insert method adds a word to the tree, and the same word may be inserted more than once. Before starting, we must check that the word is legal: it cannot contain special characters or whitespace. During insertion the word is split into single characters, one per node, and numPass is updated on every node along the path.
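To make the two counters concrete, the sketch below traces numPass and numEnd after a few insertions. It is a simplified stand-in, not the full class: makeNode is a hypothetical helper, and son is a plain object keyed by character rather than an index array.

```javascript
// Minimal stand-in for the counters described above (not the full Trie class).
function makeNode() {
  return { numPass: 0, numEnd: 0, isEnd: false, son: {} };
}

function insert(root, word) {
  let cur = root;
  for (const ch of word) {
    if (!cur.son[ch]) cur.son[ch] = makeNode();
    cur = cur.son[ch];
    cur.numPass++; // one more inserted word passes through this node
  }
  cur.isEnd = true;
  cur.numEnd++;    // one more copy of this exact word ends here
}

const root = makeNode();
["he", "her", "her"].forEach((w) => insert(root, w));

const h = root.son.h, e = h.son.e, r = e.son.r;
console.log(h.numPass, e.numPass, r.numPass); // 3 3 2
console.log(e.numEnd, r.numEnd);              // 1 2
```

Here numPass on a node answers "how many inserted words start with this prefix", while numEnd answers "how many times was exactly this word inserted".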



Optimization



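Every method now contains a c -= 48 operation. However, in the character table there are other characters between the digits and the uppercase letters, and between the uppercase and the lowercase letters, so this indexing wastes space unnecessarily. The characters can be mapped to a compact index instead: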


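// by Situ Zhengmei
getIndex (c) {
    if (c < 58) { // "0"-"9": char codes 48-57 -> indexes 0-9
      return c - 48
    } else if (c < 91) { // "A"-"Z": char codes 65-90 -> indexes 10-35
      return c - 65 + 10
    } else { // "a"-"z": char codes 97-122 -> indexes 36-61
      return c - 97 + 26 + 10
    }
  }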


Then, in each of the methods above, change c -= 48 to c = this.getIndex (c).
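As a quick check of such a mapping, the sketch below uses a standalone function that assumes a dense layout (digits at 0-9, uppercase at 10-35, lowercase at 36-61) and compares the largest index with what the plain c - 48 scheme would need. It is an illustration under that assumption, not a method of the class.

```javascript
// Standalone copy of the index mapping, assuming a dense layout:
// "0"-"9" -> 0-9, "A"-"Z" -> 10-35, "a"-"z" -> 36-61.
function getIndex(c) {
  if (c < 58) return c - 48;      // digits, char codes 48-57
  if (c < 91) return c - 65 + 10; // uppercase, char codes 65-90
  return c - 97 + 36;             // lowercase, char codes 97-122
}

const indexes = ["0", "9", "A", "Z", "a", "z"].map((ch) => getIndex(ch.charCodeAt(0)));
console.log(indexes); // [ 0, 9, 10, 35, 36, 61 ]

// With the plain "c - 48" scheme, "z" lands on index 74 instead of 61,
// so each child array may grow to 75 slots rather than 62.
console.log("z".charCodeAt(0) - 48); // 74
```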



Test


var trie = new Trie ();
   trie.insert ("I");
   trie.insert ("Love");
   trie.insert ("China");
   trie.insert ("China");
   trie.insert ("China");
   trie.insert ("China");
   trie.insert ("China");
   trie.insert ("xiaoliang");
   trie.insert ("xiaoliang");
   trie.insert ("man");
   trie.insert ("handsome");
   trie.insert ("love");
   trie.insert ("Chinaha");
   trie.insert ("her");
   trie.insert ("know");
   var map = {}
   trie.preTraversal (function (node, str) {
     if (node.isEnd) {
      map [str] = node.numEnd
     }
   })
   for (var i in map) {
     console.log (i + " appears " + map [i] + " times")
   }
   console.log ("Words and occurrences with Chin (including itself) prefix:");
   map = {} // start again with an empty map for the prefix query
   trie.preTraversal (function (node, str) {
     if (str.indexOf ("Chin") === 0 && node.isEnd) {
       map [str] = node.numEnd
     }
    })
   for (var i in map) {
     console.log (i + " appears " + map [i] + " times")
   }





Comparison of Trie tree and other data structures



Trie tree and binary search tree



The binary search tree is probably the first tree structure most of us meet. For n items, insert, find, and delete on a binary search tree usually take only O(log n) time. In the worst case every node has only one child and the tree degenerates into a linear list; insert, find, and delete then take O(n) time.



A lookup in a trie visits one node per character, so for a search string of length m it usually costs O(m). It can never cost more than the height of the trie, which is the length of the longest stored key, no matter how many keys the tree holds. It is easy to see that the worst case of a trie lookup is better than that of a binary search tree.
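The following sketch illustrates that the cost of a lookup depends on the length of the search string rather than on the number of stored keys. It uses a deliberately small trie built from nested plain objects; the "$" end marker and the function names are assumptions made for this example.

```javascript
// Children are kept directly on a plain object, one property per character.
function insert(root, word) {
  let cur = root;
  for (const ch of word) cur = cur[ch] || (cur[ch] = {});
  cur.$ = true; // "$" marks the end of a word (hypothetical marker)
}

// Count how many nodes a lookup walks through.
function lookupSteps(root, word) {
  let cur = root, steps = 0;
  for (const ch of word) {
    if (!cur[ch]) break;
    cur = cur[ch];
    steps++;
  }
  return steps;
}

const root = {};
for (let i = 0; i < 10000; i++) insert(root, "key" + i);

// 10000 keys are stored, yet the lookup takes one step per character.
console.log(lookupSteps(root, "key42")); // 5
```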



The examples here use strings, but a trie is actually strict about which keys suit it. If the keys are floating-point numbers, the trie may become extremely deep and its nodes hard to interpret. In that case a trie is not a suitable way to store the data, whereas a binary search tree has no such problem.



Trie tree and hash table



First consider hash collisions. We usually say that a hash table has O(1) complexity, but strictly speaking that holds only for a near-perfect hash table, and the hash function itself has to traverse the search string, which costs O(m). When different keys map to the same position (with separate chaining, that position holds a linked list), the cost of a lookup depends on the number of entries stored at that position, so in the worst case a hash table can degenerate into a singly linked list.



A trie can easily return its keys in alphabetical order (one pre-order traversal of the whole tree), unlike most hash tables, which generally keep their keys in no particular order.



In the best case a hash table hits its target in O(1). If the table is so large that it has to be kept on disk, a hash table lookup ideally needs only one disk access, whereas a trie needs as many disk accesses as the depth of the node.



A trie often needs more space than a hash table: when each node holds one character, a string cannot be stored as a single contiguous block. Node compression can significantly alleviate this problem; it is discussed below.



Improvement of Trie Tree



Bitwise Trie Tree (Bitwise Trie)



The principle is similar to an ordinary trie, except that the smallest unit stored by an ordinary trie is a character, whereas a bitwise trie stores single bits. Bit access is implemented directly by CPU instructions, so for binary data it is theoretically faster than an ordinary trie.
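A minimal sketch of the idea, assuming unsigned 8-bit integer keys: each level of the tree consumes one bit, most significant bit first, and every node is simply a two-element array. The constant BITS and this layout are choices made for the example, not a reference implementation.

```javascript
// Bitwise trie over unsigned 8-bit integers: each level consumes one bit,
// most significant bit first. BITS and the array-of-two layout are assumptions.
const BITS = 8;

function insert(root, n) {
  let cur = root;
  for (let i = BITS - 1; i >= 0; i--) {
    const bit = (n >> i) & 1;
    if (!cur[bit]) cur[bit] = [null, null];
    cur = cur[bit];
  }
}

function has(root, n) {
  let cur = root;
  // Stop early as soon as a branch is missing.
  for (let i = BITS - 1; i >= 0 && cur; i--) {
    cur = cur[(n >> i) & 1];
  }
  return Boolean(cur);
}

const root = [null, null];
[5, 6, 200].forEach((n) => insert(root, n));
console.log(has(root, 6), has(root, 7)); // true false
```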



Node compression



Branch compression: a stable trie mostly serves lookups and reads, so some of its branches can be compressed completely. For example, a branch that spells i, n, n without any other branching can be stored as a single node "inn" instead of a chain of nodes. The radix tree uses this principle to keep a trie from becoming too deep.
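Branch compression can be sketched as a bottom-up pass that merges every chain of single-child nodes where no word ends. The node layout { value, isEnd, children } and the function names are hypothetical, and a real radix tree also needs lookup and insert logic that understands multi-character nodes.

```javascript
// Nodes here are { value, isEnd, children: [] } (a hypothetical, simplified layout).
function node(value, isEnd = false, children = []) {
  return { value, isEnd, children };
}

// Merge every chain of single-child, non-terminal nodes into one node.
function compress(n) {
  n.children.forEach((child) => compress(child)); // compress subtrees first
  const isRoot = n.value === "";
  if (!isRoot && !n.isEnd && n.children.length === 1) {
    const only = n.children[0];
    n.value += only.value;
    n.isEnd = only.isEnd;
    n.children = only.children;
  }
  return n;
}

// Trie for "inn" and "int": i -> n -> (n | t)
const root = node("", false, [
  node("i", false, [node("n", false, [node("n", true), node("t", true)])]),
]);
compress(root);
console.log(root.children[0].value);                        // in
console.log(root.children[0].children.map((c) => c.value)); // [ 'n', 't' ]
```

The shared prefix "in" now takes one node instead of two; a branch with no forks at all would collapse into a single node in the same way.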



Node mapping table: this approach also applies when the nodes of the trie are almost completely fixed. If many of the node states in the trie repeat, the states can be represented with a multidimensional array of numbers (as in a Triple-Array Trie). The space overhead of storing the trie itself becomes smaller, although an additional mapping table is introduced.



Application of prefix tree



The prefix tree is easy to understand, and it has a wide range of applications.



(1) Fast retrieval of strings



The query time complexity of a dictionary tree is O(L), where L is the length of the string, so lookups are efficient. In cases such as prefix queries or heavy hash collisions, a dictionary tree can be more efficient than a hash table.



(2) Sorting strings



A pre-order traversal that visits the children of each node in alphabetical order outputs the words in sorted order. Common prefixes are stored only once, so they are not compared repeatedly.
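A minimal sketch of sorting by traversal, assuming a small trie whose children are stored in a plain object keyed by character (the helper names are hypothetical). Because object keys are not guaranteed to be in alphabetical order, the traversal sorts them before descending; note that duplicate words collapse into one entry.

```javascript
function insert(root, word) {
  let cur = root;
  for (const ch of word) {
    if (!cur.children[ch]) cur.children[ch] = { children: {}, isEnd: false };
    cur = cur.children[ch];
  }
  cur.isEnd = true;
}

// Pre-order traversal that visits children in character order.
function sortedWords(node, prefix = "", out = []) {
  if (node.isEnd) out.push(prefix);
  for (const ch of Object.keys(node.children).sort()) {
    sortedWords(node.children[ch], prefix + ch, out);
  }
  return out;
}

const root = { children: {}, isEnd: false };
["man", "her", "love", "he", "know"].forEach((w) => insert(root, w));
console.log(sortedWords(root)); // [ 'he', 'her', 'know', 'love', 'man' ]
```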



(3) The longest common prefix



The longest common prefix of inn and int is in: when the dictionary tree is traversed down to the letter n, the path walked so far is the common prefix of these words.
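One way to sketch this: starting at the root, keep descending while the current node has exactly one child and no word ends there. The object-based layout and the function names below are assumptions made for the example.

```javascript
function insert(root, word) {
  let cur = root;
  for (const ch of word) {
    if (!cur.children[ch]) cur.children[ch] = { children: {}, isEnd: false };
    cur = cur.children[ch];
  }
  cur.isEnd = true;
}

// Longest prefix shared by every word stored in the trie.
function longestCommonPrefix(root) {
  let cur = root, prefix = "";
  // Keep descending while exactly one branch exists and no word ends here.
  while (!cur.isEnd) {
    const keys = Object.keys(cur.children);
    if (keys.length !== 1) break;
    prefix += keys[0];
    cur = cur.children[keys[0]];
  }
  return prefix;
}

const root = { children: {}, isEnd: false };
["inn", "int"].forEach((w) => insert(root, w));
console.log(longestCommonPrefix(root)); // in
```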



(4) Auto-completion: match a prefix and show the suffixes



When we type appl into a dictionary or a search engine, it automatically shows a list of words that start with appl. This can be implemented with a dictionary tree: as mentioned above, a dictionary tree can find a common prefix, so we only need to traverse the subtree below it and display the remaining suffixes.
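A minimal auto-completion sketch under the same assumptions as before (children in a plain object, hypothetical function names): first walk down to the node for the prefix, then collect every word in the subtree below it.

```javascript
function insert(root, word) {
  let cur = root;
  for (const ch of word) {
    if (!cur.children[ch]) cur.children[ch] = { children: {}, isEnd: false };
    cur = cur.children[ch];
  }
  cur.isEnd = true;
}

// Return all stored words that start with prefix, in alphabetical order.
function complete(root, prefix) {
  let cur = root;
  for (const ch of prefix) {
    cur = cur.children[ch];
    if (!cur) return []; // no stored word has this prefix
  }
  const out = [];
  (function collect(node, word) {
    if (node.isEnd) out.push(word);
    for (const ch of Object.keys(node.children).sort()) {
      collect(node.children[ch], word + ch);
    }
  })(cur, prefix);
  return out;
}

const root = { children: {}, isEnd: false };
["apple", "apply", "applet", "ape", "bat"].forEach((w) => insert(root, w));
console.log(complete(root, "appl")); // [ 'apple', 'applet', 'apply' ]
console.log(complete(root, "x"));    // []
```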


