Solr: numeric characters cannot be searched

Source: Internet
Author: User
Tags: solr, solr query

Question one: The testers told me that numbers could not be searched, so I started looking for the reason. The field is defined like this:

<fields>

* * *
<field name="productName" type="text" indexed="true" stored="true" />
* * *
</fields>

Configuration of the fieldType "text":
<fieldType name="text" class="solr.TextField" positionIncrementGap="...">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
  </analyzer>
</fieldType>

The problem appears whenever a productName contains digits. For example, a product called 'Gaga 123' cannot be found by searching for digits such as 1, 2, 3, or 12.

The same was true for '123 Gaga'. For a long time I could not find the cause, and I did not know how to track it down, so I asked around. My guess was that tokenization was the problem, so I also looked through the Solr admin interface to see what I could find.

Finally, someone in a QQ group told me that solr.LowerCaseTokenizerFactory filters out digits: it splits the text at every non-letter character and discards those characters. The Analysis page in the Solr admin interface shows how a value is tokenized under the current schema.xml. Choosing the relevant field there and trying a sample value confirmed that LowerCaseTokenizerFactory was the culprit. I then looked for an alternative. After some searching and experimenting, the following configuration finally solved the problem that numbers cannot be searched. (The corresponding field was also changed to this type.)

<fieldType name="Text_inclunum" class="solr.TextField" positionIncrementGap="...">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
  </analyzer>
</fieldType>
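
To make the difference between the two tokenizers concrete, here is a minimal plain-Java sketch. It only approximates the behavior described above and does not use the Lucene classes themselves; the class name TokenizerSketch and both method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java approximation of two Solr tokenizers (hypothetical helper, not the Lucene code). */
final class TokenizerSketch {

    /** Roughly LowerCaseTokenizerFactory: keep runs of letters, lowercased; drop everything else. */
    static List<String> letterLowerCaseTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter (digit, space, punctuation) ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Roughly WhitespaceTokenizerFactory: split on whitespace and change nothing else. */
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(letterLowerCaseTokens("Gaga 123")); // [gaga]
        System.out.println(whitespaceTokens("Gaga 123"));      // [Gaga, 123]
    }
}
```

With the letter-only tokenizer the token 123 never reaches the index, so no digit query can match. The whitespace tokenizer keeps 123 but also keeps the original letter case, which is why matching becomes case-sensitive.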

The products in our database also have a pinyin field, and its values are stored in uppercase. If I search for AMXL, the corresponding pinyin matches and the product amoxicillin is found. (All Solr queries go against the "all" field, and the pinyin field is copied into "all".)

But searching for lowercase amxl finds nothing, because WhitespaceTokenizerFactory, unlike LowerCaseTokenizerFactory, does not lowercase tokens, so matching is now case-sensitive. So in the program that builds the Solr query I call toUpperCase() on the query value. That solved the problem that lowercase letters could not be searched.

Question two:

The next day I found that this had introduced a new problem. If a product is named 'd amoxicillin' (a lowercase d followed by the Chinese product name) and I search for 'd amoxicillin', the product is not found. At first I did not know why, so I put the value into Solr's Analysis page and found the cause: my program had converted the query to 'D amoxicillin', while Solr had indexed 'd amoxicillin' with the lowercase letter. So searching with the full product name, for example one picked from auto-complete, finds nothing.

Having solved the problem with numbers, I had now run into a problem with lowercase letters. This time I did not find a solution on the Solr side, so I decided to modify the program. The idea is to decide in the program whether to uppercase the Solr query value: if the value contains Chinese characters, leave its case unchanged; otherwise, convert it to uppercase.

That way, products whose names contain numbers or lowercase letters can be found, and all-letter queries can still be matched against the pinyin. (solr.EdgeNGramFilterFactory with minGramSize="1" maxGramSize="50" side="front" splits each token into prefixes, from left to right.)
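
As a rough illustration of the left-to-right splitting, the sketch below generates front edge n-grams in plain Java. It is an approximation of what the filter produces, not the Lucene implementation; the class EdgeNGramSketch and the method frontEdgeNGrams are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java approximation of a front edge n-gram filter (hypothetical helper). */
final class EdgeNGramSketch {

    /** Returns the prefixes of token whose length is between minGramSize and maxGramSize. */
    static List<String> frontEdgeNGrams(String token, int minGramSize, int maxGramSize) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGramSize, token.length());
        for (int len = minGramSize; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(frontEdgeNGrams("123", 1, 50));  // [1, 12, 123]
        System.out.println(frontEdgeNGrams("AMXL", 1, 50)); // [A, AM, AMX, AMXL]
    }
}
```

Only prefixes are produced, so in this sketch a token such as 123 yields 1, 12 and 123, but not 2, 3 or 23.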

I then searched the web for a regular expression that checks whether a string contains Chinese characters:

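import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringUtils {

    /**
     * Determine if a string contains Chinese.
     * @param str the string to check
     * @return true if at least one character is in the range U+4E00 to U+9FA5
     */
    public static boolean isContainsChinese(String str) {
        Matcher matcher = Pattern.compile("[\u4e00-\u9fa5]").matcher(str);
        boolean flg = false;
        if (matcher.find()) {
            flg = true;
        }
        return flg;
    }

    /** Uppercase the query value unless it contains Chinese; null becomes "". */
    public static String toUpperOrNot(String temp) {
        if (temp == null) return "";
        if (StringUtils.isContainsChinese(temp)) {
            return temp;
        } else {
            return temp.toUpperCase();
        }
    }
}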

Then call toUpperOrNot() wherever the program sets the Solr query value. It is best to also apply the escaping described in the tip that follows.

Tip: if the query value contains characters that are special in the Solr query syntax, they need to be escaped:

// Characters in a Solr query value that need to be escaped
public static final String NEAD_TO_CONVERT_CHAR = "([/:()!])";

public static String convertMeaningChar(String temp) {
    if (temp == null) return "";
    // Put a backslash in front of every matched character
    temp = temp.replaceAll(NEAD_TO_CONVERT_CHAR, "\\\\$1");
    return temp;
}
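
The two steps can be combined into one call. The sketch below is a self-contained approximation under stated assumptions: the class QueryValueSketch and the method prepare are hypothetical names, the helpers are re-declared locally so the example compiles on its own, and the escaped character set is the one from the tip above rather than the full Solr query syntax.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that prepares a user-entered value for a Solr query. */
final class QueryValueSketch {

    // One character in the range U+4E00 to U+9FA5, as in the check above.
    private static final Pattern CHINESE = Pattern.compile("[\u4e00-\u9fa5]");

    // Same character set as in the tip; other special characters are not covered here.
    private static final String SPECIAL_CHARS = "([/:()!])";

    static boolean containsChinese(String value) {
        return CHINESE.matcher(value).find();
    }

    /** Uppercase the value unless it contains Chinese characters. */
    static String toUpperOrNot(String value) {
        if (value == null) return "";
        return containsChinese(value) ? value : value.toUpperCase();
    }

    /** Prefix each special character with a backslash. */
    static String escape(String value) {
        if (value == null) return "";
        return value.replaceAll(SPECIAL_CHARS, "\\\\$1");
    }

    /** Case handling first, then escaping. */
    static String prepare(String value) {
        return escape(toUpperOrNot(value));
    }

    public static void main(String[] args) {
        System.out.println(prepare("amxl"));      // AMXL
        System.out.println(prepare("gaga(123)")); // GAGA\(123\)
        // Lowercase d followed by a Chinese product name: case is left unchanged.
        System.out.println(prepare("d\u963f\u83ab\u897f\u6797"));
    }
}
```

Because escaping only inserts backslashes, the order of the two steps does not change the result; uppercasing first simply mirrors the order described above.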


