Implementing autocomplete in Solr (Part 1: the facet method)

Source: Internet
Author: User

Most people have seen an autocomplete feature in action. Solr provides mechanisms for building one. Today I will show how to use faceting to add autocomplete.

Imagine you want to offer users suggestions in your online store, such as product names. Assume the index is structured as follows:

<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>  
<field name="name" type="text" indexed="true" stored="true" multiValued="false" />
<field name="description" type="text" indexed="true" stored="true" multiValued="false" />


The text type is defined as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


Before you begin, decide whether you want to suggest single words or full names. That choice determines which field the suggestions should be drawn from.

Word suggestions
For single-word suggestions, the field must be tokenized. In this example the name field is enough. Note, however, that if the field's analysis chain includes stemming, the suggestions will be stems rather than whole words, so in that case it is better to use another type without stemming.

Full-name suggestions
For full-name suggestions we need a different field configuration, ideally an untokenized field. However, we cannot use a string-type field, because a string field has no analysis chain and we still want the values lowercased. For this reason, we define the following field:


<field name="name_auto" type="text_auto" indexed="true" stored="true" multiValued="false" />

The text_auto type is defined as follows:

<fieldType name="text_auto" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


To leave the original field untouched, copy its contents into the new field:

<copyField source="name" dest="name_auto" />
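
To make the difference between the two field choices concrete, here is a minimal Python sketch that imitates what each analysis chain indexes and what a prefix lookup would then suggest. It is only an approximation: the helper names are hypothetical, the product names are made up for illustration, and the effect of WordDelimiterFilterFactory is ignored.

```python
from collections import Counter

def word_terms(name):
    # Rough stand-in for whitespace tokenizing plus lowercasing (the "text" type).
    return [token.lower() for token in name.split()]

def full_name_terms(name):
    # Rough stand-in for KeywordTokenizerFactory plus LowerCaseFilterFactory:
    # the whole value becomes a single lowercased term (the "text_auto" type).
    return [name.lower()]

def suggest(docs_terms, prefix):
    # Count, per term, how many documents contain a term starting with the prefix.
    counts = Counter()
    for terms in docs_terms:
        for term in set(terms):  # count each document once per term
            if term.startswith(prefix):
                counts[term] += 1
    return sorted(counts.items())

# Hypothetical product names, chosen only for illustration.
names = ["Hard disk", "Hard disk Samsung", "Hard disk Seagate", "Hard disk Toshiba"]

word_suggestions = suggest([word_terms(n) for n in names], "har")
full_suggestions = suggest([full_name_terms(n) for n in names], "har")

print(word_suggestions)  # [('hard', 4)]
print(full_suggestions)  # four entries, one per full product name
```

With the tokenized field the user is offered single words; with the keyword-tokenized field the user is offered complete product names.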


How to Use 
To use this data, we have prepared a simple query statement:



Replace the following placeholders:
Field: the field the suggestions are drawn from. In this example, name or name_auto.
User_query: the characters typed by the user.

Here you can set rows=0 so that only the facet results are returned, without the matching documents. This is optional.
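
As a rough sketch of how such a request could be assembled, the Python snippet below builds a query string from the two placeholders. The parameter set (q=*:*, rows, facet, facet.field, facet.prefix) is an assumption based on the parameters this article mentions, not the article's exact query; the function name and the /select path are hypothetical as well.

```python
from urllib.parse import urlencode

def build_suggest_query(field, user_query, rows=0):
    """Build the query string for a facet-based autocomplete request (sketch)."""
    params = [
        ("q", "*:*"),            # match every document; only the facets matter
        ("rows", rows),          # 0 = return facet counts without documents
        ("facet", "true"),       # switch faceting on
        ("facet.field", field),  # the field suggestions are drawn from
        # facet.prefix is compared with indexed terms as-is, so lowercase the
        # input to line up with the LowerCaseFilterFactory used at index time.
        ("facet.prefix", user_query.lower()),
    ]
    return urlencode(params)

query = build_suggest_query("name_auto", "Har")
# Assumed handler path; adjust to your own Solr core.
print("/select?" + query)
```

Lowercasing on the client side matters because the prefix is not run through the field's analyzer.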

An example of a query can be written as follows:



The query returns the following response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <result name="response" numFound="4" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="name_auto">
        <int name="hard disk">1</int>
        <int name="hard disk samsung">1</int>
        <int name="hard disk seagate">1</int>
        <int name="hard disk toshiba">1</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
  </lst>
</response>
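
A client then only needs the names and counts under facet_fields. The following Python sketch shows one way to pull them out of an XML response of this shape using the standard library; the function name is hypothetical and the sample is a trimmed, well-formed version of the response above.

```python
import xml.etree.ElementTree as ET

# Trimmed sample shaped like the response above (two suggestions only).
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="4" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="name_auto">
        <int name="hard disk">1</int>
        <int name="hard disk samsung">1</int>
      </lst>
    </lst>
  </lst>
</response>"""

def extract_suggestions(xml_text, field):
    """Return (suggestion, document count) pairs for one facet field."""
    root = ET.fromstring(xml_text)
    path = f".//lst[@name='facet_fields']/lst[@name='{field}']/int"
    return [(node.get("name"), int(node.text)) for node in root.findall(path)]

suggestions = extract_suggestions(SAMPLE_RESPONSE, "name_auto")
print(suggestions)  # [('hard disk', 1), ('hard disk samsung', 1)]
```

The count that comes with each suggestion is what lets the interface show how many results a suggestion would return.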


Additional features
Here are some commonly used extras.

The first is showing the user additional information, such as the number of results they will get when they select a suggestion. The facet counts already provide this number, which makes it an easy and interesting feature to add.

Another is sorting with the facet.sort parameter, depending on your requirements. Suggestions can be sorted by document count (the default; set the parameter to true) or alphabetically (set it to false). Newer Solr releases spell these two values count and index.

We can also set facet.mincount so that only suggestions matching at least the specified number of documents are shown.

Another useful feature is that suggestions can be restricted not only by what the user types but also by other attributes, such as a category. For example, to show only suggestions related to household products, on the assumption that these users are not interested in DVDs, we add the parameter fq=department:homeapplications (assuming the index has such a department field). With this filter, suggestions are drawn only from the selected department rather than from the whole index.
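
These optional parameters can be combined in one request. The Python sketch below is an illustration under assumptions: the function name is hypothetical, the department field is taken from the example above and may not exist in your schema, and facet.sort uses the count and index values accepted by newer Solr releases.

```python
from urllib.parse import urlencode

def build_filtered_suggest_query(field, prefix, department=None,
                                 min_count=1, sort_by_count=True):
    """Build a suggestion query with sorting, a minimum count and a filter (sketch)."""
    params = [
        ("q", "*:*"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", prefix),
        ("facet.mincount", min_count),  # hide suggestions below this count
        # "count" sorts by number of documents, "index" sorts alphabetically.
        ("facet.sort", "count" if sort_by_count else "index"),
    ]
    if department is not None:
        # Filter query: only documents from this department feed the facets.
        params.append(("fq", f"department:{department}"))
    return urlencode(params)

filtered_query = build_filtered_suggest_query(
    "name_auto", "har", department="homeapplications")
print(filtered_query)
```

Because the filter is sent as fq rather than folded into q, it narrows the set of documents the facet counts are computed from without changing the main query.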

Like any approach, this one has advantages and disadvantages. The advantages: it is easy to use, needs no additional components, and can narrow the results to a small scope that better matches the user's needs; it also returns a result count for each suggestion. The disadvantages: extra types and fields must be added, and because it relies on faceting it puts a heavy load on the server.

PS: I tested this myself. Because a request is sent for every letter typed, on a large index the facet computation takes up a lot of memory, and with little memory available (2 GB in my case) it is easy to run out of memory (OOM). Use this feature with caution.

Someone online recommended facet.prefix. I have no pressing need for it at the moment, but it is a good place to start if you do.
