Marty Schoch (@mschoch) is an engineer at Couchbase, maker of the high-performance NoSQL distributed database of the same name. Working with Go for almost two and a half years, he has been using it to prototype new solutions at Couchbase. This talk introduces Bleve, a text indexing and search library for Go. The slides for this talk have been posted here.
Bleve (pronounced Bleh-vee) is a modern text indexing library written for Go. It supports a variety of features commonly found in search indexers, including filtering, ranking, and faceting.
When you say "search index," the names that come to mind are typically Lucene, Elasticsearch, and Solr. Those systems are great, especially if you're already using Java and the JVM, but sometimes you don't want to pull in that dependency, or you want to avoid standing up yet another external service that can complicate deployment.
The Couchbase team wondered: how easy would it be to build a Go library that supported the most commonly used text analysis components of Lucene and could use an off-the-shelf key-value (KV) store as its underlying data store? Thus, Bleve was born.
Here are some of the key points of how they approached building Bleve:
- They initially focused on the most commonly used text analysis components of Lucene.
- Go interfaces allow users to fill in the gaps with components for their own specific languages and domains.
- They avoided coming up with a new custom file format. There are many interesting KV stores that can serve as the underlying data store. They currently support LevelDB, Bolt, and ForestDB.
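The design points above (pluggable analysis behind Go interfaces, with an off-the-shelf KV store underneath) can be illustrated with a small standard-library sketch. The `KVStore`, `Analyzer`, and `Index` types below are hypothetical names invented for this illustration; they are not Bleve's actual interfaces, and the in-memory map only stands in for a real store such as LevelDB or Bolt.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// KVStore is a hypothetical minimal key-value interface (not Bleve's API).
type KVStore interface {
	Get(key string) ([]string, bool)
	Set(key string, val []string)
}

// memStore is an in-memory KVStore standing in for a real KV store.
type memStore map[string][]string

func (m memStore) Get(key string) ([]string, bool) { v, ok := m[key]; return v, ok }
func (m memStore) Set(key string, val []string)    { m[key] = val }

// Analyzer turns raw text into index terms. A user could supply an
// implementation for a specific language or domain.
type Analyzer interface {
	Analyze(text string) []string
}

// simpleAnalyzer lowercases text and splits it on non-letter, non-digit runes.
type simpleAnalyzer struct{}

func (simpleAnalyzer) Analyze(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Index keeps, for each term, a sorted list of the document IDs containing it.
type Index struct {
	store    KVStore
	analyzer Analyzer
}

// Add analyzes text and records docID under every resulting term.
func (ix *Index) Add(docID, text string) {
	for _, term := range ix.analyzer.Analyze(text) {
		ids, _ := ix.store.Get(term)
		pos := sort.SearchStrings(ids, docID)
		if pos < len(ids) && ids[pos] == docID {
			continue // docID already recorded for this term
		}
		ids = append(ids, "")
		copy(ids[pos+1:], ids[pos:])
		ids[pos] = docID
		ix.store.Set(term, ids)
	}
}

// Lookup returns the IDs of documents containing the exact term.
func (ix *Index) Lookup(term string) []string {
	ids, _ := ix.store.Get(term)
	return ids
}

func main() {
	ix := &Index{store: memStore{}, analyzer: simpleAnalyzer{}}
	ix.Add("m1", "Bleve is a text indexing library")
	ix.Add("m2", "Indexing text in Go")
	fmt.Println(ix.Lookup("indexing")) // [m1 m2]
	fmt.Println(ix.Lookup("go"))       // [m2]
}
```

Because `Index` depends only on the two interfaces, swapping the store or the analyzer requires no change to the indexing logic, which is the property the bullets describe.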
Here's the set of features Bleve currently supports:
- You can index any Go data structure (strings, numeric values, and dates are supported).
- Search: Term, Phrase, Match, MatchPhrase, Boolean, Fuzzy, Numeric range, and Date range queries.
- Search results with TF/IDF scoring, contextual snippets, and term highlighting.
- Search result faceting (by term, numeric value, and date).
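To give a feel for what TF/IDF scoring means, here is a minimal sketch using one common textbook variant: the square root of the term frequency multiplied by `1 + ln(N / (df + 1))`. This formula and the `tfidf` function are assumptions chosen for illustration; Bleve's actual scoring may differ in its details.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tfidf scores every document that contains term. It uses one common
// variant of the formula, which is an assumption of this sketch:
//   score = sqrt(term frequency) * (1 + ln(N / (df + 1)))
// where N is the number of documents and df the number containing term.
func tfidf(docs map[string]string, term string) map[string]float64 {
	freq := make(map[string]int) // only documents containing term get an entry
	for id, text := range docs {
		for _, w := range strings.Fields(strings.ToLower(text)) {
			if w == term {
				freq[id]++
			}
		}
	}
	n, df := float64(len(docs)), float64(len(freq))
	idf := 1 + math.Log(n/(df+1))
	scores := make(map[string]float64, len(freq))
	for id, f := range freq {
		scores[id] = math.Sqrt(float64(f)) * idf
	}
	return scores
}

func main() {
	docs := map[string]string{
		"a": "go go search",
		"b": "go index",
		"c": "text search",
	}
	// Document "a" mentions the term more often, so it scores higher than "b".
	fmt.Println(tfidf(docs, "go"))
}
```

The intuition carries over to any variant: repeated terms raise a document's score, and terms that appear in many documents contribute less.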
Getting Started
Installing Bleve is easy. Just use the go get command:
$ go get github.com/blevesearch/bleve/...
Including the trailing /... will also install some helpful command-line utilities.
In just a few lines of code, we can create our first index:
The mapping is a default IndexMapping. The index mapping is responsible for describing how your documents should be mapped into the index. The default mapping is designed to work well out of the box, but you'll want to revisit it to improve the quality of your search results.
The call to the New() function takes two parameters. The first is the path to a directory where the index will be stored, and the second is the mapping to be used for this index.
The call to the Index() method takes two parameters. The first is a unique identifier for the document, and the second is the document (a Go struct) to be indexed.
Now that we've created an index, we want to open it and search:
The call to the Open() function takes a single parameter, the path to the index. The mapping isn't needed, as it was serialized into the index at the time of creation.
The query describes what we're looking for. In this case, it is a TermQuery, the simplest kind of query. Term queries look for an exact match of the specified term in the index.
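"Exact match" is worth a small illustration, because indexed terms are the output of text analysis rather than the raw text. The sketch below assumes a stand-in analyzer that lowercases and splits on whitespace; `analyze`, `termMatch`, and `matchAny` are hypothetical helpers, not Bleve functions. It contrasts an exact term lookup with a lookup that analyzes the query text first.

```go
package main

import (
	"fmt"
	"strings"
)

// analyze is a stand-in analyzer: lowercase, then split on whitespace.
func analyze(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// buildTerms returns the set of terms produced by analyzing text.
func buildTerms(text string) map[string]bool {
	terms := make(map[string]bool)
	for _, t := range analyze(text) {
		terms[t] = true
	}
	return terms
}

// termMatch looks the term up exactly as given, with no analysis.
func termMatch(indexed map[string]bool, term string) bool {
	return indexed[term]
}

// matchAny analyzes the query text first, then checks whether any
// resulting term is present in the index.
func matchAny(indexed map[string]bool, query string) bool {
	for _, t := range analyze(query) {
		if indexed[t] {
			return true
		}
	}
	return false
}

func main() {
	indexed := buildTerms("Bleve text indexing")
	fmt.Println(termMatch(indexed, "bleve")) // true: matches the indexed term
	fmt.Println(termMatch(indexed, "Bleve")) // false: the index holds "bleve"
	fmt.Println(matchAny(indexed, "Bleve"))  // true: query is analyzed first
}
```

Under these assumptions, an exact term lookup only succeeds when the query term already has the same form as the indexed term.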
The request describes how the results should be returned. It can control how many results are returned, and whether or not stored fields or facets should also be returned. In this case we use a default request, which will return the first matching documents.
When we run this example we get:
$ ./search_index
1 matches, showing 1 through 1, took 70.722µs
    1. m1 (0.216978)
This shows that the one document we put into the index does match this query.
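The "showing 1 through 1" line reflects the request's control over how many results come back. A minimal sketch of that idea, assuming hits are ranked by descending score and then sliced by an offset and a page size, looks like this. The `Hit` type and `page` function are hypothetical and only illustrate the concept.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a hypothetical search hit: a document ID and its score.
type Hit struct {
	ID    string
	Score float64
}

// page ranks hits by descending score and returns up to size hits,
// starting at offset from. The input slice is left unmodified.
func page(hits []Hit, from, size int) []Hit {
	sorted := append([]Hit(nil), hits...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Score > sorted[j].Score
	})
	if from < 0 {
		from = 0
	}
	if from >= len(sorted) || size <= 0 {
		return nil
	}
	end := from + size
	if end > len(sorted) {
		end = len(sorted)
	}
	return sorted[from:end]
}

func main() {
	hits := []Hit{{"a", 0.1}, {"b", 0.9}, {"c", 0.5}}
	fmt.Println(page(hits, 0, 2)) // [{b 0.9} {c 0.5}]
	fmt.Println(page(hits, 2, 2)) // [{a 0.1}]
}
```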
Indexing Real World Data
To see more of the features in action, let's index the GopherCon India schedule. We'll map the data into the structure below:
Now let's try a more interesting search. This time we'll do a phrase search for "quality search results".
When we run this example we get:
$ ./phrase_search_schedule
1 matches, showing 1 through 1, took 1.73394ms
    1. bleve_-_modern_text_indexing_for_go (1.033644)
        description
            …earch component. But delivering high quality search results requires a long list of text analysis and indexing techniques. With the bleve library, we bring advanced text indexing and search to your Go…
        summary
            bleve - modern text indexing for Go
        speaker
            Martin Schoch
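A phrase search differs from a plain term search in that the terms must appear next to each other, in order. One way to sketch this, assuming the index records the word position of each term occurrence, is shown below. The `positions` and `hasPhrase` helpers are hypothetical and use simple whitespace tokenization; they are not Bleve's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// positions maps each lowercased term to the word offsets where it occurs.
func positions(text string) map[string][]int {
	pos := make(map[string][]int)
	for i, w := range strings.Fields(strings.ToLower(text)) {
		pos[w] = append(pos[w], i)
	}
	return pos
}

// contains reports whether x is present in xs.
func contains(xs []int, x int) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// hasPhrase reports whether the phrase terms occur at consecutive
// positions, in order, somewhere in the indexed text.
func hasPhrase(pos map[string][]int, phrase string) bool {
	terms := strings.Fields(strings.ToLower(phrase))
	if len(terms) == 0 {
		return false
	}
	for _, start := range pos[terms[0]] {
		matched := true
		for offset := 1; offset < len(terms); offset++ {
			if !contains(pos[terms[offset]], start+offset) {
				matched = false
				break
			}
		}
		if matched {
			return true
		}
	}
	return false
}

func main() {
	pos := positions("delivering high quality search results requires a long list")
	fmt.Println(hasPhrase(pos, "quality search results")) // true
	fmt.Println(hasPhrase(pos, "search quality"))         // false: wrong order
}
```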
Now let's try one more example. So far all the queries we've executed have been built programmatically, but sometimes it's useful to allow end users to build their own queries. To do this we use a QueryStringQuery:
This particular query string shows many of the options in use:
- Prefixing with + or - changes the clause to a must or must not (the default is should).
- Prefixing with fieldname: restricts matches to a particular field (the default is _all).
- Placing the term in quotes results in a PhraseQuery.
- Suffixing a term with ~N performs a FuzzyQuery with edit distance N (default 2).
When we run this example we get:
$ ./query_string_search_schedule
1 matches, showing 1 through 1, took 10.540776ms
    1. bleve_-_modern_text_indexing_for_go (0.338882)
        description
            …ist of text analysis and indexing techniques. With the bleve library, we bring advanced text indexing and search to your Go applications. This talk will start with a brief introduction to text search …
        summary
            bleve - modern text indexing for Go
        speaker
            Martin Schoch
        duration
            25
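The query string rules listed above can be made concrete with a toy parser. The sketch below handles only the four rules described in the list; the `Clause` type and the `parse` and `tokenize` functions are hypothetical, and the example query string is made up for illustration rather than taken from the talk. Bleve's real query string syntax supports more than this.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Clause is a hypothetical parsed query clause.
type Clause struct {
	Occur     string // "should", "must", or "must_not"
	Field     string // "_all" unless a fieldname: prefix is given
	Text      string
	Phrase    bool
	Fuzziness int // 0 means an exact term
}

// tokenize splits q on spaces that are not inside double quotes.
func tokenize(q string) []string {
	var toks []string
	var cur strings.Builder
	inQuote := false
	for _, r := range q {
		switch {
		case r == '"':
			inQuote = !inQuote
			cur.WriteRune(r)
		case r == ' ' && !inQuote:
			if cur.Len() > 0 {
				toks = append(toks, cur.String())
				cur.Reset()
			}
		default:
			cur.WriteRune(r)
		}
	}
	if cur.Len() > 0 {
		toks = append(toks, cur.String())
	}
	return toks
}

// parse applies the four rules: +/- prefix, fieldname: prefix,
// quoted phrase, and ~N fuzzy suffix (N defaults to 2).
func parse(q string) []Clause {
	var clauses []Clause
	for _, tok := range tokenize(q) {
		c := Clause{Occur: "should", Field: "_all"}
		switch {
		case strings.HasPrefix(tok, "+"):
			c.Occur, tok = "must", tok[1:]
		case strings.HasPrefix(tok, "-"):
			c.Occur, tok = "must_not", tok[1:]
		}
		if i := strings.Index(tok, ":"); i > 0 && !strings.Contains(tok[:i], `"`) {
			c.Field, tok = tok[:i], tok[i+1:]
		}
		if len(tok) >= 2 && strings.HasPrefix(tok, `"`) && strings.HasSuffix(tok, `"`) {
			c.Phrase, tok = true, tok[1:len(tok)-1]
		} else if i := strings.LastIndex(tok, "~"); i > 0 {
			c.Fuzziness = 2 // default edit distance
			if n, err := strconv.Atoi(tok[i+1:]); err == nil {
				c.Fuzziness = n
			}
			tok = tok[:i]
		}
		c.Text = tok
		if c.Text != "" {
			clauses = append(clauses, c)
		}
	}
	return clauses
}

func main() {
	// A made-up query string exercising each rule.
	for _, c := range parse(`+description:text summary:"text indexing" -speaker:bob blev~1 go`) {
		fmt.Printf("%+v\n", c)
	}
}
```

Each clause could then be turned into the corresponding term, phrase, or fuzzy query and combined in a boolean query according to its `Occur` value.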
Putting it all Together
Bleve also includes a set of optional HTTP handlers. These handlers map all the major Bleve operations to HTTP endpoints and assume that your data and index mappings are encoded in JSON documents. By combining the GopherCon India schedule index with these HTTP handlers, it's very simple to build a web-based search interface.
Here we searched for the term "go":
We can see the search results, including stored fields and snippets for the talk description with matching terms highlighted. Also, on the right-hand side we see two facets, one for the day of the talk and another for the duration of the talk. By checking these boxes we can easily add or remove filters and drill deeper into the results.
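The facet counts and checkbox filters described above boil down to grouping the hits by a field value and then narrowing the hits to a selected value. A minimal sketch follows; the `Talk` struct, its field names, and the sample data are hypothetical and are not the schedule structure used in the talk.

```go
package main

import "fmt"

// Talk is a hypothetical search hit; the field names are assumptions.
type Talk struct {
	ID       string
	Day      string
	Duration int
}

// termFacet counts the hits under each distinct value returned by key.
func termFacet(hits []Talk, key func(Talk) string) map[string]int {
	counts := make(map[string]int)
	for _, t := range hits {
		counts[key(t)]++
	}
	return counts
}

// filter keeps only the hits whose key equals the selected facet value.
func filter(hits []Talk, key func(Talk) string, value string) []Talk {
	var kept []Talk
	for _, t := range hits {
		if key(t) == value {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	hits := []Talk{
		{"t1", "Day 1", 25},
		{"t2", "Day 1", 45},
		{"t3", "Day 2", 25},
	}
	byDay := func(t Talk) string { return t.Day }
	byDuration := func(t Talk) string { return fmt.Sprintf("%d min", t.Duration) }

	fmt.Println(termFacet(hits, byDay))      // counts per day
	fmt.Println(termFacet(hits, byDuration)) // counts per duration

	// Checking the "Day 1" box narrows the results to that facet value.
	fmt.Println(len(filter(hits, byDay, "Day 1")))
}
```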
A hosted version of the application is available for you to try out yourself.
Roadmap
Bleve is still very much under active development. However, a very useful set of functionality is already available. We hope to wrap up a few key features and then prepare for a 1.0 stable release:
- Search result sorting (currently results are sorted only by score)
- Improved spelling suggestions and fuzzy search
- Performance (so far, the focus has been on features and API design)
One More Thing...
In anticipation of GopherCon India we created an initial analyzer for Hindi. It's still experimental, but the foundation is in place for you to help make it better.
Join the Community
The community around Bleve is growing. We can't accomplish all the goals for this project ourselves and need help from a community of users interested in improving support for their own languages and search domains. Join us at blevesearch.com!