An API for AC automaton (Aho-Corasick) matching in Golang

Source: Internet
Author: User

We match keywords against the pages our crawler fetches, but as the crawler has grown, keyword matching can no longer keep up: the data queue has reached about 1 million items. We deployed the keyword matching service in Docker on multiple nodes, and it now runs on 10 servers. The code logic has already been optimized, and the matching uses the AC automaton (Aho-Corasick) algorithm. We also read the source of the Python Ahocorasick implementation, and its author did a fine job: the code quality is high and the algorithm is implemented very clearly.

However, the Python Ahocorasick module offers few options. My colleague Xiao June, who is also working on AC automaton logic for his side of the business, ran into a problem. For example, with the keyword iphone and the text "iphone iphone5 iphone6", the Python Ahocorasick module returns iphone but does not match it inside iphone5 and iphone6. According to the AC automaton principle, those occurrences should match, and the Java version does match all of them. My guess is that the Python Ahocorasick module favors match precision, which is why it does not report iphone inside iphone5.
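To make the iphone5 point concrete, here is a minimal byte-level Aho-Corasick automaton in Go. It is only an illustrative sketch, not the Python module, the Java version, or the gansidui library used later, and the names `NewAutomaton`, `FindAll`, and `Match` are hypothetical. The detail that matters is that each state's output list also includes the outputs reachable through its failure links, so a single pass over the text reports every occurrence, including "iphone" at the start of "iphone5" and "iphone6".

```go
package main

import "fmt"

// Match records one keyword occurrence (hypothetical type for this sketch).
type Match struct {
	Pattern int // index into the pattern list given to NewAutomaton
	Start   int // byte offset where the occurrence begins in the text
}

// node is one automaton state.
type node struct {
	next map[byte]int // trie edges (goto function)
	fail int          // failure link: longest proper suffix that is also a trie prefix
	out  []int        // patterns ending here, including those inherited via the fail chain
}

// Automaton is a minimal byte-level Aho-Corasick automaton; patterns are assumed non-empty.
type Automaton struct {
	nodes    []node
	patterns []string
}

// NewAutomaton builds the trie, then computes failure links breadth-first.
func NewAutomaton(patterns []string) *Automaton {
	a := &Automaton{nodes: []node{{next: map[byte]int{}}}, patterns: patterns}
	for pi, p := range patterns {
		cur := 0
		for i := 0; i < len(p); i++ {
			nxt, ok := a.nodes[cur].next[p[i]]
			if !ok {
				a.nodes = append(a.nodes, node{next: map[byte]int{}})
				nxt = len(a.nodes) - 1
				a.nodes[cur].next[p[i]] = nxt
			}
			cur = nxt
		}
		a.nodes[cur].out = append(a.nodes[cur].out, pi)
	}
	// Depth-1 states fail to the root (the zero value). Deeper states are handled in
	// BFS order, so a state's fail target (always shallower) is complete before use.
	var queue []int
	for _, child := range a.nodes[0].next {
		queue = append(queue, child)
	}
	for len(queue) > 0 {
		u := queue[0]
		queue = queue[1:]
		for c, v := range a.nodes[u].next {
			f := a.nodes[u].fail
			for f != 0 {
				if _, ok := a.nodes[f].next[c]; ok {
					break
				}
				f = a.nodes[f].fail
			}
			if w, ok := a.nodes[f].next[c]; ok {
				a.nodes[v].fail = w
			} // otherwise fail stays 0 (the root)
			a.nodes[v].out = append(a.nodes[v].out, a.nodes[a.nodes[v].fail].out...)
			queue = append(queue, v)
		}
	}
	return a
}

// FindAll scans text once and reports every occurrence, including overlapping ones
// and ones embedded in longer words, ordered by where they end.
func (a *Automaton) FindAll(text string) []Match {
	var res []Match
	cur := 0
	for i := 0; i < len(text); i++ {
		c := text[i]
		// Follow failure links until some state has an edge for c (or we reach the root).
		for cur != 0 {
			if _, ok := a.nodes[cur].next[c]; ok {
				break
			}
			cur = a.nodes[cur].fail
		}
		if nxt, ok := a.nodes[cur].next[c]; ok {
			cur = nxt
		}
		for _, pi := range a.nodes[cur].out {
			res = append(res, Match{Pattern: pi, Start: i - len(a.patterns[pi]) + 1})
		}
	}
	return res
}

func main() {
	a := NewAutomaton([]string{"iphone"})
	for _, m := range a.FindAll("iphone iphone5 iphone6") {
		fmt.Println(a.patterns[m.Pattern], "at byte", m.Start)
	}
}
```

This reports iphone at bytes 0, 7, and 15. Matching here is case-sensitive, so text such as "Iphone6" would only match after lowercasing both the keywords and the text.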

http://xiaorui.cc/?p=1535


Python's GIL has always limited its CPU-bound performance, and this kind of many-keyword matching is not a good fit for multiprocessing either. Today a colleague shared his experience with Golang, so I picked Go back up (I had abandoned it once before) to solve the performance problem. Afterwards, the matcher is exposed as an HTTP API service for the business layer to call.

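```shell
export GOROOT=/usr/lib/go
export GOPATH=$HOME/go
go get github.com/gansidui/ahocorasick
go build ac.go
```

These are GOPATH-era commands. On Go 1.16 and later, where module mode is the default, create a module first (for example with go mod init) and run go get and go build inside it.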

ac.go:

```go
package main

import (
	"fmt"

	"github.com/gansidui/ahocorasick"
)

func main() {
	// Build the automaton once from the keyword dictionary.
	ac := ahocorasick.NewMatcher()
	dictionary := []string{"hello", "world", "nima", "google", "golang", "c++", "xiaorui"}
	ac.Build(dictionary)

	// Match returns indices into dictionary for the keywords found in the text.
	ret := ac.Match("hello golang google, I love golang!!! xiaorui xiaorui.cc")
	for _, i := range ret {
		fmt.Println(dictionary[i])
	}
}
```