【天天資料結構和演算法】PHP中trie資料結構的使用情境和代碼執行個體

最後更新：2017-07-19 來源：互聯網

上載者：User

創建阿里雲帳戶，並獲得超過 40 款產品的免費試用版；而企業帳戶則可以享有總值 $1200 的免費試用版。立即註冊！

標籤：函數 xpl var_dump 沒有 pre clu foreach 系統 alt

一、trie介紹

Trie樹，又稱字典樹，單詞尋找樹或者首碼樹，是一種用於快速檢索的多叉樹結構，如英文字母的字典樹是一個26叉樹，數位字典樹是一個10叉樹。

Trie一詞來自retrieve，發音為/tri:/ “tree”，也有人讀為/tra?/ “try”。

Trie樹可以利用字串的公用首碼來節約儲存空間。如所示，該trie樹用10個節點儲存了6個字串tea，ten，to，in，inn，int。

在該trie樹中，字串in，inn和int的公用首碼是“in”，因此可以只儲存一份“in”以節省空間的。當然，如果系統中存在大量字串且這些字串基本沒有公用首碼，則相應的trie樹將非常消耗記憶體，這也是trie樹的一個缺點。

Trie樹的基本性質可以歸納為：

（1）根節點不包含字元，除根節點意外每個節點只包含一個字元。

（2）從根節點到某一個節點，路徑上經過的字元串連起來，為該節點對應的字串。

（3）每個節點的所有子節點包含的字串不相同。

二、trie的優點

1.尋找或匹配字串時時間複雜度只和樹的深度有關，和節點數量無關。

2.所以，在尋找海量資料，或匹配資料，或過濾資料中有很好的實現。

三、php代碼實現

Trie.php

<?php/** * Created by PhpStorm. * User: jysdhr * Date: 2017/7/4 * Time: 9:57 * Description:PHP實現trie字典資料結構 */include "TrieNode.php";class Trie{    private $root;    public function __construct()    {        $this->root = new TrieNode();    }    public function foreach_trie()    {        echo "<pre>";        print_r($this->root);    }    public function insert($str)    {        $this->__insert($this->root, $str);    }    public function search(string $str): bool    {        return $this->__search($this->root, $str);    }    private function __insert(&$node, $str)    {        if (strlen($str) == 0)            return;        //第一個字元，插入哪一個分叉        $k = ord(substr($str, 0, 1)) - ord(‘a‘);        if (!isset($node->childs[$k]) || $node->childs[$k] == NULL) {            //如果分叉不存在則重新開闢分叉            $node->childs[$k] = new TrieNode();            //記錄字元            $node->childs[$k]->nodeChar = $k;            $node->childs[$k]->is_end = strlen($str) == 1 ? true : false;        }        $nextWord = substr($str, 1);        $this->__insert($node->childs[$k], $nextWord);    }    /**     * @Description:尋找str是否存在樹中     * @User:jysdhr     */    private function __search($node, $str)    {        if (strlen($str) == 0)            return false;        //首先對str進行拆分        $k = ord(substr($str, 0, 1)) - ord(‘a‘);        if (isset($node->childs[$k])) {            $nextWord = substr($str, 1);            if (strlen($str) == 1) {                //匹配最後一個字元                if ($node->childs[$k]->is_end)                    return true;            }            return $this->__search($node->childs[$k], $nextWord);        }        return false;    }}

TrieNode.php

<?php/** * Created by PhpStorm. * User: jysdhr * Date: 2017/7/4 * Time: 10:01 * Description: */class TrieNode{    public  $nodeChar,$childs,$is_end;    public function __construct()    {        $this->childs = array();    }}

testTrie.php

<?php// 測試檔案demo.phpinclude "Trie.php";$str = file_get_contents(‘bbe.txt‘);//將整個檔案內容讀入到一個字串中$badword = explode(" ", $str);//轉換成數組$trie = new Trie();foreach ($badword as $word)    $trie->insert($word);// array_combine() 函數通過合并兩個數組來建立一個新數組，其中的一個數組是鍵名，另一個數組的值為索引值。如果其中一個數組為空白，或者兩個數組的元素個數不同，則該函數返回 false。// array_fill() 函數用給定的值填充數組，返回的數組有 number 個元素，值為 value。返回的數組使用數字索引，從 start 位置開始並遞增。如果 number 為 0 或小於 0，就會出錯。//$badword1 = array_combine($badword,array_fill(0,count($badword),‘*‘));$test_str = ‘knowledgeasdad‘;$start_time = microtime(true);var_dump(in_array($test_str,$badword));$end_time = microtime(true);echo ($end_time-$start_time).‘</br>‘;$start_time1 = microtime(true);var_dump($trie->search($test_str));$end_time1 = microtime(true);echo ($end_time1-$start_time1).‘</br>‘;$start_time2 = microtime(true);foreach ($badword as $value){    if ($value == $test_str){        echo ‘1‘;        break;    }}$end_time2 = microtime(true);echo ($end_time2-$start_time2).‘</br>‘;?>

以上是匹配聖經中的一個單詞的執行個體，in_array是遍曆匹配，當匹配單詞越晚出現，耗時越久，當匹配單詞不存在時，耗時不可接受。

trie的表現比較穩定，無論是否存在，或匹配單詞出現的早晚都與已耗用時間無太大影響，只與單詞長度有關。（效果最好，最穩定）

遍曆尋找基本與in_array效果相當。

四、總結

所以在項目開發中，在做到敏感詞屏蔽，統計詞頻，字典等功能時，可以考慮運用trie資料結構產生一個trie樹序列化儲存起來，待更新詞庫時重新維護trie樹，是不是覺得也不是很困難呢，加油。

【天天資料結構和演算法】PHP中trie資料結構的使用情境和代碼執行個體

本文章原先以中文撰寫並發佈於 aliyun.com，亦設英文版本，僅作資訊用途。本網站不對文章的準確性，完整性或可靠性或其任何翻譯作出任何明示或暗示的陳述或保證。如對該文章有任何疑慮或投訴，請傳送電郵至 info-contact@alibabacloud.com 並提供相關疑慮或投訴的詳細說明。職員會於 5 個工作天內與您聯絡，一經驗證之後，即會刪除該侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

【天天資料結構和演算法】PHP中trie資料結構的使用情境和代碼執行個體

聯繫我們

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support