Implementing Full-text Search in Django (II): Integrating Haystack

In the previous article we learned the basics of using Whoosh for Chinese full-text search, so by now you could build indexes, update them, and run searches from Django by hand.

In Django, data such as articles typically lives in a database model, for example:

    # models.py
    from django.db import models

    class Blog(models.Model):
        title = models.CharField(u'title', max_length=200, blank=True)
        content = models.TextField(u'content', blank=True)

        def __unicode__(self):
            return self.title

        class Meta:
            verbose_name = u'blog'

We want users to be able to search the content field of Blog. You could, of course, write that yourself with Whoosh directly.

Now, however, there is a search app built specifically for Django, and it is the protagonist of this article: django-haystack.

django-haystack is a third-party app dedicated to adding full-text search to Django. It lets you index, search, and work with the contents of your models with very little effort.

django-haystack is designed as a full-text search framework supporting four backends: Whoosh, Solr, Xapian, and Elasticsearch. That means if you decide against Whoosh later, you can always swap in Xapian or another engine without changing your code.

Like Django itself, django-haystack is very easy to use.


First install django-haystack. You can run pip install django-haystack directly; however, I suggest downloading the latest 2.0 release from the official site, because pip installs the 1.x version.
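One step the original walkthrough leaves implicit: haystack has to be registered in INSTALLED_APPS before any of the configuration below will work. A minimal sketch (the app name 'blog' is an assumption for the example model):

    # settings.py: register haystack so Django can find its app and management commands
    INSTALLED_APPS = (
        # ... Django's and your own apps ...
        'haystack',  # required by django-haystack
        'blog',      # assumed name of the app containing the Blog model
    )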


Take the Blog model above as an example.

1. Create a search_indexes.py in the app directory, with the following code:

    # search_indexes.py
    from haystack import indexes
    from models import Blog

    class BlogIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)

        def get_model(self):
            return Blog

        def index_queryset(self, using=None):
            """Used when the entire index for the model is updated."""
            # Decide which records get indexed; here we simply return all of them.
            return self.get_model().objects.all()
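Once the index exists, you can also query it directly from Python, not just through the web form. A minimal sketch using haystack's SearchQuerySet (the keyword is illustrative):

    # Query the index directly; 'content' is haystack's engine-agnostic
    # shortcut for the document field defined above.
    from haystack.query import SearchQuerySet
    from models import Blog

    results = SearchQuerySet().models(Blog).filter(content=u'django')
    for result in results:
        print result.object.title  # result.object is the matched Blog instance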

2. In the template directory, create templates/search/indexes/<appname>/blog_text.txt. The role of this template is to decide what goes into the indexed text field, i.e. which content will be searchable and available in later templates.
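The template body itself is not reproduced above; for the Blog model it would typically just render the fields you want searchable, following the haystack convention:

    {# templates/search/indexes/<appname>/blog_text.txt #}
    {{ object.title }}
    {{ object.content }}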


3. Add the following configuration to settings.py:

    # settings.py
    import os

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
            # PROJECT_PATH is assumed to be your project's base directory
            'PATH': os.path.join(PROJECT_PATH, 'whoosh_index'),
        },
    }

4. In the template directory, create templates/search/search.html. Its contents begin as follows (the original listing breaks off at {% if query %}; a sketch of the missing results section follows step 5):

    <form method="get" action="">
        <table>
            {{ form.as_table }}
            <tr>
                <td>&nbsp;</td>
                <td>
                    <input type="submit" value="Search">
                </td>
            </tr>
        </table>

        {% if query %}

5. Add one line to urls.py:

    url(r'^search/', include('haystack.urls')),
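For completeness, here is roughly what fills in the {% if query %} block, based on the standard haystack example template (a sketch, not the original article's code):

        {% if query %}
            <h3>Results</h3>
            {% for result in page.object_list %}
                <p>{{ result.object.title }}</p>
            {% empty %}
                <p>No results found.</p>
            {% endfor %}
        {% endif %}
    </form>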


6. Finally, run manage.py rebuild_index on the command line to create the index. (Later, manage.py update_index can refresh the index without rebuilding it from scratch.)

Well, that's all the basic configuration. If that's not enough for you, you can chew through the official documentation and get much more fine-grained control.

Then run the server and open the search page.

Enter the keyword "中央" ("central")... (you need to put some data into the Blog table beforehand).

Strange, why are there no results?


Oh, right, remember what we said about Chinese search in the previous article: by default Whoosh does not support Chinese, so some extra processing is needed.

It's easy. Create the chineseanalyzer.py we talked about in the previous article and save it into haystack's backends folder, \lib\site-packages\haystack\backends. The code is as follows:

    # chineseanalyzer.py
    import jieba
    from whoosh.analysis import RegexAnalyzer
    from whoosh.analysis import Tokenizer, Token

    class ChineseTokenizer(Tokenizer):
        def __call__(self, value, positions=False, chars=False,
                     keeporiginal=False, removestops=True,
                     start_pos=0, start_char=0, mode='', **kwargs):
            # assert isinstance(value, text_type), "%r is not unicode" % value
            t = Token(positions, chars, removestops=removestops, mode=mode,
                      **kwargs)
            seglist = jieba.cut(value, cut_all=True)  # segment the text with jieba
            for w in seglist:
                t.original = t.text = w
                t.boost = 1.0
                if positions:
                    t.pos = start_pos + value.find(w)
                if chars:
                    t.startchar = start_char + value.find(w)
                    t.endchar = start_char + value.find(w) + len(w)
                yield t

    def ChineseAnalyzer():
        return ChineseTokenizer()
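Before wiring this into haystack, you can sanity-check the analyzer on its own (a minimal sketch; the sample sentence is illustrative):

    # -*- coding: utf-8 -*-
    # Quick standalone test of the analyzer, outside Django/haystack.
    from chineseanalyzer import ChineseAnalyzer

    analyzer = ChineseAnalyzer()
    for token in analyzer(u'中文全文检索测试'):
        print token.text  # each segmented word produced by jieba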
Then copy whoosh_backend.py in \lib\site-packages\haystack\backends to whoosh_cn_backend.py.

Yes, we have to provide a Whoosh backend of our own, but there is no need to write one from scratch; a few small changes are enough.

Open whoosh_cn_backend.py and modify it. First, near the top of the file, import our analyzer:

    # in whoosh_cn_backend.py
    from chineseanalyzer import ChineseAnalyzer

Then find the build_schema function; this is where the schema, and hence the analyzer, is constructed. Locate this line:

    schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=StemmingAnalyzer(), field_boost=field_class.boost)

We can see it uses the default StemmingAnalyzer. Replace it with our analyzer module:

    schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=ChineseAnalyzer(), field_boost=field_class.boost)

Save it, and you're done.

Now point the configuration in settings.py at the new backend:

    # settings.py
    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.whoosh_cn_backend.WhooshEngine',
            'PATH': os.path.join(PROJECT_PATH, 'whoosh_index'),
        },
    }

Rebuild the index and search again: this time the expected results come back (the original article shows a screenshot of the working search page here).

It could almost wrap up here.


However, to give everyone a better feel for why Chinese word segmentation matters, let's test the regular-expression approach mentioned above as a comparison.

Change the line above to:

    schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=RegexAnalyzer(ur"([\u4e00-\u9fa5])|(\w+(\.?\w+)*)"), field_boost=field_class.boost)

Here we use Whoosh's RegexAnalyzer with a regular expression. Strictly speaking, this is not word segmentation at all; it merely recognizes Chinese characters, one token per character.
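You can see the difference directly (a minimal sketch; the sample word is illustrative): the regex analyzer emits every Chinese character as a separate token, whereas the jieba-based analyzer keeps real words together.

    # -*- coding: utf-8 -*-
    from whoosh.analysis import RegexAnalyzer

    # Each Chinese character matches [\u4e00-\u9fa5] on its own, so
    # u'中央' is split into u'中' and u'央' instead of staying one word.
    analyzer = RegexAnalyzer(ur"([\u4e00-\u9fa5])|(\w+(\.?\w+)*)")
    print [t.text for t in analyzer(u'中央')]  # [u'中', u'央']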

Now look at the search results (shown as a screenshot in the original):

As you can see, when searching for "中央", the third hit does not actually contain the word "中央", yet it still appears in the results.


Yes, some of you may have guessed why.

That record contains no "中央", but it does contain the characters "中" and "央" separately. Because no Chinese word segmentation is applied, each character is indexed on its own, so the record gets matched anyway. Clearly, Chinese word segmentation is necessary; otherwise you can end up with a large number of irrelevant results.



Finally, with Whoosh plus haystack it is easy to add full-text search functionality to Django. It is all Python code, easy to install and easy to integrate; I recommend it to everyone.



Original link: http://blog.csdn.net/wenxuansoft/article/details/8170714#reply
