A trie (dictionary tree) data structure implemented in Java

Source: Internet
Author: User

While studying recently, I have often seen tries used to solve problems such as: "A 1 GB file contains one word per line, each word is at most 16 bytes, and memory is limited to 1 MB. Return the 100 most frequent words." One way to approach it is to combine a trie with sorting.
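Once per-word counts are available (for example from a trie that records how often each word was inserted), the "sort" half of that approach does not need a full sort: a min-heap of size k can keep only the k most frequent words. The following is a minimal sketch of that step under stated assumptions, not the code from this article. The class name TopWords and the method topK are made up for illustration, a plain Map stands in for the counts, and the sketch says nothing about how to stay within the 1 MB memory limit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of the article's trie class.
class TopWords {

    /** Returns the k most frequent words, most frequent first. */
    static List<String> topK(Map<String, Integer> counts, int k) {
        // Min-heap ordered by count: the least frequent kept word sits on top.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > k) {
                heap.poll(); // drop the least frequent word kept so far
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // ascending by count
        }
        Collections.reverse(result); // most frequent first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("china", 5);
        counts.put("love", 3);
        counts.put("xiaoliang", 2);
        counts.put("man", 1);
        System.out.println(topK(counts, 2)); // prints [china, love]
    }
}
```

With n distinct words the heap never holds more than k + 1 entries, so this step costs O(n log k) time instead of the O(n log n) of sorting every word.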

What is a trie? It is often called a dictionary tree. There are plenty of explanations online, so I will only add my own understanding: a trie factors out the common prefixes of words, layer by layer, until each node holds a single character that cannot be split any further.

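Take an example from the Internet.

For the words inn, int, at, age, adv, and ant, the trie has two branches under the root: the i branch, where inn and int share the path i-n, and the a branch, where at, age, adv, and ant share the node a.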

A node does not store a whole word. The word a node represents is spelled by the characters on the path from the root to that node. That is all there is to it.
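To make the sharing concrete, the sketch below builds a bare trie for the six example words and counts how many nodes are created. It is an illustration only; the class PrefixSharingDemo and the method countNodes are hypothetical names, and the code assumes lowercase a-z input.

```java
// Hypothetical demo class: counts the nodes a trie needs for a word list.
class PrefixSharingDemo {

    /** One trie node; children are indexed by letter ('a' -> 0 ... 'z' -> 25). */
    static class Node {
        Node[] children = new Node[26];
    }

    /** Inserts lowercase a-z words and returns how many nodes were created (root excluded). */
    static int countNodes(String[] words) {
        Node root = new Node();
        int created = 0;
        for (String word : words) {
            Node current = root;
            for (char c : word.toCharArray()) {
                int index = c - 'a';
                if (current.children[index] == null) {
                    current.children[index] = new Node(); // first word to use this path
                    created++;
                }
                current = current.children[index];
            }
        }
        return created;
    }

    public static void main(String[] args) {
        String[] words = {"inn", "int", "at", "age", "adv", "ant"};
        int letters = 0;
        for (String w : words) {
            letters += w.length();
        }
        // prints "17 letters stored in 12 nodes"
        System.out.println(letters + " letters stored in " + countNodes(words) + " nodes");
    }
}
```

The five saved nodes are exactly the shared prefixes: i and in are created once for inn and int, and a is created once for at, age, adv, and ant.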

What is a trie good for? It is a very fast structure for looking up words, and it is also a good choice for deduplicating and counting words. For example, a search engine's keyword suggestion (autocomplete) feature can be built on a trie: if the user types in, the trie described above lets us suggest inn and int with little effort. It also handles word-frequency counting and common-prefix lookups well. This article does not cover theory. It only gives a trie implemented in Java that supports insertion, lookup, traversal, and word association (finding the words that share a prefix). You can add other functions yourself. The trie class is as follows:
package com.algorithms;

import java.util.HashMap;

public class Trie_Tree {

    /**
     * Internal node class.
     * @author "ZHSHL"
     * @date 2014-10-14
     */
    private class Node {
        private int dumpli_num;  // times this exact string was inserted: 0, 1, 2, ...
        private int prefix_num;  // inserted strings that have this string as a prefix, itself included
        private Node[] childs;   // one slot per letter; a map or list could be used instead to save space
        private boolean isLeaf;  // true if a word ends at this node

        public Node() {
            dumpli_num = 0;
            prefix_num = 0;
            isLeaf = false;
            childs = new Node[26];
        }
    }

    private Node root; // root of the trie

    public Trie_Tree() {
        root = new Node();
    }

    /**
     * Inserts a string. Only the letters a-z are supported (case-insensitive).
     * @param words the string to insert
     */
    public void insert(String words) {
        insert(this.root, words);
    }

    /** Inserts a string, using a loop rather than recursion. */
    private void insert(Node root, String words) {
        words = words.toLowerCase(); // convert to lowercase
        char[] chrs = words.toCharArray();
        for (int i = 0, length = chrs.length; i < length; i++) {
            // The offset from 'a' is the child index, so the letter itself is stored implicitly.
            int index = chrs[i] - 'a';
            if (root.childs[index] == null) {
                root.childs[index] = new Node();
            }
            root.childs[index].prefix_num++;
            // If the string ends here, mark the node as a word.
            if (i == length - 1) {
                root.childs[index].isLeaf = true;
                root.childs[index].dumpli_num++;
            }
            root = root.childs[index]; // move down to the child and continue
        }
    }

    /**
     * Traverses the trie and collects every word with its number of occurrences.
     * @return map from word to number of occurrences
     */
    public HashMap<String, Integer> getAllWords() {
        return preTraversal(this.root, "");
    }

    /**
     * Pre-order traversal.
     * @param root subtree root
     * @param prefixs the characters on the path from the trie root to this node
     */
    private HashMap<String, Integer> preTraversal(Node root, String prefixs) {
        HashMap<String, Integer> map = new HashMap<String, Integer>();
        if (root != null) {
            if (root.isLeaf) { // the path so far is a word
                map.put(prefixs, root.dumpli_num);
            }
            for (int i = 0, length = root.childs.length; i < length; i++) {
                if (root.childs[i] != null) {
                    char ch = (char) (i + 'a');
                    String tempStr = prefixs + ch;
                    map.putAll(preTraversal(root.childs[i], tempStr)); // recurse into the child
                }
            }
        }
        return map;
    }

    /**
     * Determines whether a string is in the dictionary tree.
     * @return true if it exists, otherwise false
     */
    public boolean isExist(String word) {
        return search(this.root, word);
    }

    /** Looks up a string; true only if a word ends at the final node. */
    private boolean search(Node root, String word) {
        char[] chs = word.toLowerCase().toCharArray();
        for (int i = 0, length = chs.length; i < length; i++) {
            int index = chs[i] - 'a';
            if (index < 0 || index >= 26 || root.childs[index] == null) {
                return false; // not a letter, or no such path: lookup fails
            }
            root = root.childs[index];
        }
        return root.isLeaf; // a bare prefix of a longer word does not count as a word
    }

    /**
     * Gets the set of strings that start with a prefix, including the prefix itself if it is a word.
     * Similar to the word association of an input method.
     * @param prefix the string prefix
     * @return the strings and their numbers of occurrences, or null if the prefix is not present
     */
    public HashMap<String, Integer> getWordsForPrefix(String prefix) {
        return getWordsForPrefix(this.root, prefix);
    }

    private HashMap<String, Integer> getWordsForPrefix(Node root, String prefix) {
        String lower = prefix.toLowerCase();
        char[] chrs = lower.toCharArray();
        for (int i = 0, length = chrs.length; i < length; i++) {
            int index = chrs[i] - 'a';
            if (index < 0 || index >= 26 || root.childs[index] == null) {
                return null;
            }
            root = root.childs[index];
        }
        // Reuse the pre-order traversal, starting from the node the prefix leads to.
        return preTraversal(root, lower);
    }
}
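The prefix_num field makes one further query cheap: the number of inserted strings that start with a given prefix can be read directly from the node the prefix leads to, without traversing the subtree. The class above does not expose such a method, so the following is a stripped-down, self-contained sketch of the idea. The class PrefixCounter and its method names are hypothetical, and unlike the class above this sketch also counts every word at the root so that an empty prefix returns the total.

```java
import java.util.Locale;

// Hypothetical illustration of a prefix-count query; not part of Trie_Tree.
class PrefixCounter {

    private static class Node {
        int prefixNum;                  // inserted words whose path passes through this node
        Node[] children = new Node[26];
    }

    private final Node root = new Node();

    /** Inserts a word made of the letters a-z (case-insensitive). */
    public void insert(String word) {
        Node current = root;
        current.prefixNum++; // assumption of this sketch: the root counts every word
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            int index = c - 'a';
            if (current.children[index] == null) {
                current.children[index] = new Node();
            }
            current = current.children[index];
            current.prefixNum++;
        }
    }

    /** Returns how many inserted words (duplicates included) start with the prefix. */
    public int countWithPrefix(String prefix) {
        Node current = root;
        for (char c : prefix.toLowerCase(Locale.ROOT).toCharArray()) {
            int index = c - 'a';
            if (index < 0 || index >= 26 || current.children[index] == null) {
                return 0; // not a letter, or no word continues along this path
            }
            current = current.children[index];
        }
        return current.prefixNum;
    }

    public static void main(String[] args) {
        PrefixCounter counter = new PrefixCounter();
        for (int i = 0; i < 5; i++) {
            counter.insert("china");
        }
        counter.insert("chinaha");
        System.out.println(counter.countWithPrefix("chin")); // prints 6
    }
}
```

The query touches one node per character of the prefix, so its cost depends on the prefix length and not on how many words are stored.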

Here is the test class:
package com.algorithm.test;

import java.util.HashMap;

import com.algorithms.Trie_Tree;

public class Trie_Test {

    public static void main(String args[]) { // just used for test
        Trie_Tree trie = new Trie_Tree();
        trie.insert("i");
        trie.insert("love");
        trie.insert("china");
        trie.insert("china");
        trie.insert("china");
        trie.insert("china");
        trie.insert("china");
        trie.insert("xiaoliang");
        trie.insert("xiaoliang");
        trie.insert("man");
        trie.insert("handsome");
        trie.insert("love");
        trie.insert("chinaha");
        trie.insert("her");
        trie.insert("know");

        HashMap<String, Integer> map = trie.getAllWords();
        for (String key : map.keySet()) {
            System.out.println(key + " appears: " + map.get(key) + " times");
        }

        map = trie.getWordsForPrefix("chin");
        System.out.println("\n\nWords with the prefix chin (including the prefix itself) and their occurrence counts:");
        for (String key : map.keySet()) {
            System.out.println(key + " appears: " + map.get(key) + " times");
        }

        if (trie.isExist("xiaoming") == false) {
            System.out.println("\n\nNot in the dictionary tree: xiaoming");
        }
    }
}


Output:
love appears: 2 times
chinaha appears: 1 times
her appears: 1 times
handsome appears: 1 times
know appears: 1 times
man appears: 1 times
xiaoliang appears: 2 times
i appears: 1 times
china appears: 5 times

Words with the prefix chin (including the prefix itself) and their occurrence counts:
chinaha appears: 1 times
china appears: 5 times

Not in the dictionary tree: xiaoming

Summary: The main design question in the implementation is the structure of each node. Because there are only 26 letters, each node keeps its children in a 26-element array; a list, map, or other container would also work and could accommodate more complex keys. Note also that a node's prefix_num counts the inserted strings whose paths pass through the node, that is, the strings that have the node's string as a prefix. It is not the number of times the string ending at that node was inserted (that is dumpli_num), since an interior node need not correspond to a word at all. Finally, the traversal is a recursive pre-order traversal, which should not be difficult for anyone who has studied a little about data structures.
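The point about replacing the 26-slot array with another container can be sketched as follows. This is only one possible variant, not the article's implementation: the class MapTrie and its methods are hypothetical names, children are kept in a HashMap keyed by character, and words are treated as case-sensitive sequences of Java char values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-based variant of a trie node; accepts any character.
class MapTrie {

    private static class Node {
        Map<Character, Node> children = new HashMap<>(); // only existing edges take space
        int count;                                       // times a word ending here was inserted
    }

    private final Node root = new Node();

    /** Inserts a word; any characters are allowed and case is preserved. */
    public void insert(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            // Create the child for this character on first use, then step into it.
            current = current.children.computeIfAbsent(c, key -> new Node());
        }
        current.count++;
    }

    /** Returns how many times exactly this word was inserted (0 if never). */
    public int count(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.get(c);
            if (current == null) {
                return 0; // no inserted word follows this path
            }
        }
        return current.count;
    }

    public static void main(String[] args) {
        MapTrie trie = new MapTrie();
        trie.insert("C++");
        trie.insert("C++");
        trie.insert("c#");
        System.out.println(trie.count("C++")); // prints 2
        System.out.println(trie.count("C+"));  // prints 0
    }
}
```

The trade-off is the usual one: the array gives constant-time indexing for a small fixed alphabet, while the map pays some hashing overhead per step but stores only the edges that actually exist.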

