SuggestTree: a data structure for rank-ordered autocomplete suggestions

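With this data structure, when a user types a string, you can return the k highest-ranked strings that have that string as a prefix.

This is the same keyword-suggestion feature that search engines offer today.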
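To pin down what such a query should return, here is a minimal brute-force sketch that can serve as a reference for checking a tree-based implementation. It is not the SuggestTree itself. The function name `topKByPrefix` is hypothetical, and the sketch assumes that a larger integer means a better rank, which the article does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> Entry;

// Brute-force reference (hypothetical helper): scan every entry, keep those
// that start with `prefix`, and return the k entries with the highest value.
// Assumption: a larger value means a better rank.
std::vector<std::string> topKByPrefix(const std::map<std::string, int>& weights,
                                      const std::string& prefix, std::size_t k) {
    std::vector<Entry> hits;
    for (std::map<std::string, int>::const_iterator it = weights.begin();
         it != weights.end(); ++it) {
        // compare() checks the first prefix.size() characters of the key.
        if (it->first.compare(0, prefix.size(), prefix) == 0) {
            hits.push_back(Entry(it->first, it->second));
        }
    }
    // Highest value first; ties are broken alphabetically.
    std::sort(hits.begin(), hits.end(), [](const Entry& a, const Entry& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    if (hits.size() > k) hits.resize(k);

    std::vector<std::string> result;
    for (std::size_t i = 0; i < hits.size(); ++i) result.push_back(hits[i].first);
    assert(result.size() <= k);
    return result;
}
```

Each query scans the whole map and sorts the matches, which is the per-query cost the SuggestTree avoids by storing the answers in its nodes ahead of time.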

The data structure is built on top of a Ternary Search Tree (TST).

For background on what a Ternary Search Tree is, see the earlier post at http://blog.csdn.net/suwei19870312/article/details/7467522.

 

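The SuggestTree node:

class Node
{
public:
    vector<string> list;       // strings represented by this node and its children; characters [0, end) are identical in all of them, i.e. they share a prefix
    unsigned int count;        // number of strings in list (insert() also uses it to count the strings that pass through this node)
    unsigned int end;          // the first `end` characters of the strings in list are identical
    Node *left, *mid, *right;  // the three child pointers of the ternary search tree
};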

 

Building the SuggestTree:

Construction takes two steps:

1. Build a TST from the given set of strings.

2. Once the TST is built, add the strings to the nodes of the tree.

void Build(map<string, int>& iMap)
{
    root = NULL;
    vectorPairType lMapVector;
    lMapVector.insert(lMapVector.begin(), iMap.begin(), iMap.end());

    // Sort lMapVector by key so that the TST comes out balanced.
    sort(lMapVector.begin(), lMapVector.end(), ComparePairKey());
    // Cast before subtracting: size() is unsigned, so size() - 1 wraps around when the map is empty.
    hBuildTST(lMapVector, 0, static_cast<int>(lMapVector.size()) - 1);

    // Re-sort lMapVector by value (rank) before filling the node lists.
    sort(lMapVector.begin(), lMapVector.end(), ComparePairValue());
    vectorPairType::iterator ivter = lMapVector.begin();
    for (; ivter != lMapVector.end(); ++ivter)
    {
        addToList(ivter->first);
    }
}

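The input is a map of <keyword, rank> pairs.

First, the pairs are sorted by keyword, and the TST is built recursively in that order so that it comes out balanced. Otherwise, depending on the order in which the strings are inserted, the TST can degenerate into a one-sided tree, which hurts lookup performance.

Next, the pairs are sorted by rank, so that the k best-ranked strings are the ones stored in the TST nodes.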
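The article does not show `vectorPairType`, `ComparePairKey`, or `ComparePairValue`. The sketch below is one plausible way to define them, offered only as an assumption: in particular, it assumes that a larger value means a better rank, so the value comparator sorts in descending order. `keywordsBestFirst` is a hypothetical helper added for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> PairType;
typedef std::vector<PairType> vectorPairType;

// Assumed definition: orders pairs alphabetically by keyword.
struct ComparePairKey {
    bool operator()(const PairType& a, const PairType& b) const {
        return a.first < b.first;
    }
};

// Assumed definition: orders pairs by descending value, so the best-ranked
// keyword comes first (assumption: larger value = better rank).
struct ComparePairValue {
    bool operator()(const PairType& a, const PairType& b) const {
        return a.second > b.second;
    }
};

// Hypothetical helper: the keywords of `m`, best-ranked first.
std::vector<std::string> keywordsBestFirst(const std::map<std::string, int>& m) {
    vectorPairType v(m.begin(), m.end());
    std::sort(v.begin(), v.end(), ComparePairValue());
    std::vector<std::string> out;
    for (std::size_t i = 0; i < v.size(); ++i) out.push_back(v[i].first);
    assert(out.size() == m.size());
    return out;
}
```

If a smaller number denotes a better rank in your data, flip the comparison in `ComparePairValue`.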

 

Building the TST:

void hBuildTST(vectorPairType& irVP, int min, int max)
{
    if (min <= max)
    {
        // Insert the middle keyword first, then recurse into both halves.
        int mid = min + (max - min) / 2;
        insert(irVP[mid].first);
        hBuildTST(irVP, min, mid - 1);
        hBuildTST(irVP, mid + 1, max);
    }
}

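As mentioned above, the strings used to build the TST form a list sorted by keyword. The TST is built recursively: each call takes the middle keyword of the range [min, max] and inserts it into the TST, then handles the two remaining halves the same way.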
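To see the insertion order this recursion produces, the following sketch records the keys instead of inserting them into a tree. The function name `medianFirstOrder` is hypothetical. It mirrors the recursion of hBuildTST but is otherwise independent of the SuggestTree code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Appends the keys of sorted[lo..hi] to `order` in the sequence in which a
// median-first recursive build would insert them.
void medianFirstOrder(const std::vector<std::string>& sorted, int lo, int hi,
                      std::vector<std::string>& order) {
    if (lo > hi) return;
    assert(lo >= 0 && hi < static_cast<int>(sorted.size()));
    int mid = lo + (hi - lo) / 2;   // middle key of the current range
    order.push_back(sorted[mid]);
    medianFirstOrder(sorted, lo, mid - 1, order);   // left half
    medianFirstOrder(sorted, mid + 1, hi, order);   // right half
}
```

For the seven sorted keys a, b, c, d, e, f, g the recorded order is d, b, a, c, f, e, g: every key is inserted before the keys on either side of it in its range, which is what keeps the left and right branches of the tree balanced.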

 

Inserting a node:

void insert(string& suggestion)
{
    if (root == NULL)
    {
        root = new Node(suggestion);
        return;
    }

    Node* lpn = root;
    size_t i = 0;
    while (true)
    {
        string s = lpn->list[0];
        if (s.at(i) > suggestion.at(i))
        {
            if (lpn->left == NULL)
            {
                lpn->left = new Node(suggestion);
                return;
            }
            lpn = lpn->left;
        }
        else if (s.at(i) < suggestion.at(i))
        {
            if (lpn->right == NULL)
            {
                lpn->right = new Node(suggestion);
                return;
            }
            lpn = lpn->right;
        }
        else
        {
            // Walk along the characters stored in this node and split it at the first mismatch.
            while (++i < lpn->end)
            {
                if (i == suggestion.length() || s.at(i) != suggestion.at(i))
                {
                    lpn->mid = new Node(*lpn);
                    lpn->end = i;
                    break;
                }
            }
            lpn->count++;
            if (i == suggestion.length())
                return;

            if (lpn->mid == NULL)
            {
                lpn->mid = new Node(suggestion, lpn->list);
                return;
            }
            lpn = lpn->mid;
        }
    }
}

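The workflow of insert() closely resembles that of building an ordinary TST:

The suggestion string is compared against list[0] of the nodes already in the tree.

1. If suggestion[i] is less than list[0][i], move to the node's left child. If the left child is null, build a node from the suggestion string and make it the current node's left child.

2. If suggestion[i] is greater than list[0][i], move to the node's right child. If the right child is null, build a node from the suggestion string and make it the current node's right child.

3. If suggestion[i] == list[0][i], compare the following characters up to end. If suggestion ends or differs from list[0] before end, the node is split at that position; if suggestion has ended, that is, it is a prefix of list[0], insert() then returns. Otherwise, including the case where list[0] up to end is a prefix of suggestion, move to the node's middle child. If the middle child is null, build a new node from suggestion and the current node's list and make it the current node's middle child.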
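For comparison, here is a minimal sketch of the ordinary TST that the steps above are modeled on, with one character per node rather than a run of characters delimited by `end`. The names `TstNode`, `tstInsert`, and `tstContains` are hypothetical and do not appear in the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Classic ternary search tree node: exactly one character per node.
struct TstNode {
    char ch;
    bool isWord = false;                        // true if a word ends here
    std::unique_ptr<TstNode> left, mid, right;
    explicit TstNode(char c) : ch(c) {}
};

// Inserts word[i..] at or below `node`. The word must be non-empty.
void tstInsert(std::unique_ptr<TstNode>& node, const std::string& word,
               std::size_t i = 0) {
    assert(i < word.size());
    if (!node) node.reset(new TstNode(word[i]));
    if (word[i] < node->ch)        tstInsert(node->left, word, i);      // smaller: go left
    else if (word[i] > node->ch)   tstInsert(node->right, word, i);     // larger: go right
    else if (i + 1 < word.size())  tstInsert(node->mid, word, i + 1);   // equal: next character
    else                           node->isWord = true;                 // last character matched
}

// Returns true if `word` was inserted as a complete word.
bool tstContains(const TstNode* node, const std::string& word) {
    std::size_t i = 0;
    while (node != nullptr && i < word.size()) {
        if (word[i] < node->ch)        node = node->left.get();
        else if (word[i] > node->ch)   node = node->right.get();
        else if (i + 1 == word.size()) return node->isWord;
        else { node = node->mid.get(); ++i; }
    }
    return false;
}
```

The left and right moves are the same as in cases 1 and 2. The difference lies in case 3: an ordinary TST advances one character per middle step, whereas the SuggestTree node covers several characters at once and has to be split when a new string diverges partway through them.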

 

Adding the keywords to the nodes:

void addToList(string& suggestion)
{
    Node* lpn = root;
    size_t i = 0;
    while (true)
    {
        string s = lpn->list[0];
        if (s.at(i) > suggestion.at(i))
            lpn = lpn->left;
        else if (s.at(i) < suggestion.at(i))
            lpn = lpn->right;
        else
        {
            if (lpn->count > lpn->list.size())
            {
                // The list is smaller than the number of strings counted by insert():
                // resize it to hold at most k entries and restart the counter.
                lpn->list.resize(min(lpn->count, k));
                lpn->list[0] = suggestion;
                lpn->count = 1;
            }
            else if (lpn->count < lpn->list.size())
            {
                // There is still room: append the next suggestion.
                lpn->list[lpn->count++] = suggestion;
            }
            i = lpn->end;
            if (i == suggestion.length())
                return;
            lpn = lpn->mid;
        }
    }
}

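The flow of addToList() is similar to that of insert(). The difference is that it does not modify the structure of the TST; it only fills in the data stored in the nodes.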
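The reason for feeding the suggestions in rank order can be shown with a simplified stand-in. The sketch below swaps the tree for a plain map from each prefix to its list, which is not how the SuggestTree stores its data, and the name `fillTopK` is hypothetical. The bounded-append rule is the same idea: once a list holds k entries, later and therefore lower-ranked suggestions are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Takes suggestions already sorted best-rank-first and appends each one to
// the list of every prefix it has, unless that list already holds k entries.
std::map<std::string, std::vector<std::string> > fillTopK(
        const std::vector<std::string>& rankedBestFirst, std::size_t k) {
    assert(k > 0);
    std::map<std::string, std::vector<std::string> > lists;
    for (std::size_t n = 0; n < rankedBestFirst.size(); ++n) {
        const std::string& s = rankedBestFirst[n];
        for (std::size_t len = 1; len <= s.size(); ++len) {
            std::vector<std::string>& list = lists[s.substr(0, len)];
            if (list.size() < k) list.push_back(s);   // full lists stay untouched
        }
    }
    return lists;
}
```

With the ranked input cat, cart, car and k = 2, the list for the prefix "ca" ends up as cat, cart, while the list for "car" is cart, car. Because the best-ranked strings arrive first, no list ever needs to be re-sorted or have an entry evicted.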

 

Looking up a given prefix string in the SuggestTree:

Node* hgetBestSuggesttions(string& prefix)
{
    if (prefix.length() == 0)
        return NULL;
    Node* lpn = root;
    size_t i = 0;
    while (lpn != NULL)
    {
        string s = lpn->list[0];

        if (s.at(i) > prefix.at(i))
            lpn = lpn->left;
        else if (s.at(i) < prefix.at(i))
            lpn = lpn->right;
        else
        {
            // Compare the remaining characters stored in this node.
            while (++i < lpn->end)
            {
                if (i == prefix.length())
                    return lpn;
                else if (s.at(i) != prefix.at(i))
                    return NULL;
            }
            if (i == prefix.length())
                return lpn;
            lpn = lpn->mid;
        }
    }
    return NULL;  // fell off the tree: no stored string starts with prefix
}

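The search compares prefix against list[0][0-end] of each node it visits. As soon as it finds a node whose list[0][0-end] matches prefix, it returns that node, and the node's list is the desired set of top-ranked strings that share the prefix.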
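The inner comparison has three possible outcomes, which the sketch below isolates into a standalone function. The names `SegmentMatch` and `matchSegment` are hypothetical and not part of the article's code. The function assumes the caller has already established that s[i] equals prefix[i], as the lookup does before entering its inner loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class SegmentMatch {
    PrefixConsumed,   // the whole prefix matched: this node is the answer
    Mismatch,         // a character differs: no stored string has this prefix
    ContinueToMid     // the node's characters are used up: go to the middle child
};

// Compares `prefix` with the characters s[i+1 .. end) stored in one node.
// On return, `i` is the position at which the comparison stopped.
SegmentMatch matchSegment(const std::string& s, std::size_t end,
                          const std::string& prefix, std::size_t& i) {
    assert(end <= s.size() && i < end && i < prefix.size() && s[i] == prefix[i]);
    while (++i < end) {
        if (i == prefix.size()) return SegmentMatch::PrefixConsumed;
        if (s[i] != prefix[i])  return SegmentMatch::Mismatch;
    }
    return i == prefix.size() ? SegmentMatch::PrefixConsumed
                              : SegmentMatch::ContinueToMid;
}
```

For a node holding "cart" with end = 4, the prefix "ca" is consumed inside the node, "cb" is a mismatch, and "carts" uses up the node's characters and continues at the middle child with i = 4.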

 
