Implementing Full-text Search in Django (II): Integrating Haystack

In the previous article we learned the basics of using Whoosh for Chinese full-text search, so by now you could build indexes, update them, and run searches from Django by hand.

In Django, data such as articles typically lives in a database model, for example:

    # models.py
    from django.db import models

    class Blog(models.Model):
        title = models.CharField(u'title', max_length=200, blank=True)
        content = models.TextField(u'content', blank=True)

        def __unicode__(self):
            return self.title

        class Meta:
            verbose_name = u'blog'

We want users to be able to search the content field of Blog. You could, of course, write that yourself with Whoosh directly.

Now, however, there is a search app built specifically for Django, and it is the protagonist of this article: django-haystack.

django-haystack is a third-party app dedicated to adding full-text search to Django. It lets you index, search, and work with the contents of your models with very little effort.

django-haystack is designed as a full-text search framework supporting four backends: Whoosh, Solr, Xapian, and Elasticsearch. That means if you decide against Whoosh later, you can always swap in Xapian or another engine without changing your code.

Like Django itself, django-haystack is very easy to use.


First install django-haystack. You can run pip install django-haystack directly; however, I suggest downloading the latest 2.0 release from the official site, because pip installs the 1.x version.
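One step the original walkthrough leaves implicit: haystack has to be registered in INSTALLED_APPS before any of the configuration below will work. A minimal sketch (the app name 'blog' is an assumption for the example model):

    # settings.py: register haystack so Django can find its app and management commands
    INSTALLED_APPS = (
        # ... Django's and your own apps ...
        'haystack',  # required by django-haystack
        'blog',      # assumed name of the app containing the Blog model
    )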


Take the Blog model above as an example.

1. Create a search_indexes.py in the app directory, with the following code:

    # search_indexes.py
    from haystack import indexes
    from models import Blog

    class BlogIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)

        def get_model(self):
            return Blog

        def index_queryset(self, using=None):
            """Used when the entire index for the model is updated."""
            # Decide which records get indexed; here we simply return all of them.
            return self.get_model().objects.all()
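Once the index exists, you can also query it directly from Python, not just through the web form. A minimal sketch using haystack's SearchQuerySet (the keyword is illustrative):

    # Query the index directly; 'content' is haystack's engine-agnostic
    # shortcut for the document field defined above.
    from haystack.query import SearchQuerySet
    from models import Blog

    results = SearchQuerySet().models(Blog).filter(content=u'django')
    for result in results:
        print result.object.title  # result.object is the matched Blog instance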

2. In the template directory, create templates/search/indexes/<appname>/blog_text.txt. The role of this template is to decide what goes into the indexed text field, i.e. which content will be searchable and available in later templates.
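The template body itself is not reproduced above; for the Blog model it would typically just render the fields you want searchable, following the haystack convention:

    {# templates/search/indexes/<appname>/blog_text.txt #}
    {{ object.title }}
    {{ object.content }}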


3. Add the following configuration to settings.py:

    # settings.py
    import os

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
            # PROJECT_PATH is assumed to be your project's base directory
            'PATH': os.path.join(PROJECT_PATH, 'whoosh_index'),
        },
    }

4. In the template directory, create templates/search/search.html. Its contents begin as follows (the original listing breaks off at {% if query %}; a sketch of the missing results section follows step 5):

    <form method="get" action="">
        <table>
            {{ form.as_table }}
            <tr>
                <td>&nbsp;</td>
                <td>
                    <input type="submit" value="Search">
                </td>
            </tr>
        </table>

        {% if query %}

5. Add one line to urls.py:

    url(r'^search/', include('haystack.urls')),
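For completeness, here is roughly what fills in the {% if query %} block, based on the standard haystack example template (a sketch, not the original article's code):

        {% if query %}
            <h3>Results</h3>
            {% for result in page.object_list %}
                <p>{{ result.object.title }}</p>
            {% empty %}
                <p>No results found.</p>
            {% endfor %}
        {% endif %}
    </form>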


6. Finally, run manage.py rebuild_index on the command line to create the index. (Later, manage.py update_index can refresh the index without rebuilding it from scratch.)

Well, that's all the basic configuration. If that's not enough for you, you can chew through the official documentation and get much more fine-grained control.

Then run the server and open the search page.

Enter the keyword "中央" ("central")... (you need to put some data into the Blog table beforehand).

Strange, why are there no results?


Oh, right, remember what we said about Chinese search in the previous article: by default Whoosh does not support Chinese, so some extra processing is needed.

It's easy. Create the chineseanalyzer.py we talked about in the previous article and save it into haystack's backends folder, \lib\site-packages\haystack\backends. The code is as follows:

    # chineseanalyzer.py
    import jieba
    from whoosh.analysis import RegexAnalyzer
    from whoosh.analysis import Tokenizer, Token

    class ChineseTokenizer(Tokenizer):
        def __call__(self, value, positions=False, chars=False,
                     keeporiginal=False, removestops=True,
                     start_pos=0, start_char=0, mode='', **kwargs):
            # assert isinstance(value, text_type), "%r is not unicode" % value
            t = Token(positions, chars, removestops=removestops, mode=mode,
                      **kwargs)
            seglist = jieba.cut(value, cut_all=True)  # segment the text with jieba
            for w in seglist:
                t.original = t.text = w
                t.boost = 1.0
                if positions:
                    t.pos = start_pos + value.find(w)
                if chars:
                    t.startchar = start_char + value.find(w)
                    t.endchar = start_char + value.find(w) + len(w)
                yield t

    def ChineseAnalyzer():
        return ChineseTokenizer()
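Before wiring this into haystack, you can sanity-check the analyzer on its own (a minimal sketch; the sample sentence is illustrative):

    # -*- coding: utf-8 -*-
    # Quick standalone test of the analyzer, outside Django/haystack.
    from chineseanalyzer import ChineseAnalyzer

    analyzer = ChineseAnalyzer()
    for token in analyzer(u'中文全文检索测试'):
        print token.text  # each segmented word produced by jieba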
Then copy whoosh_backend.py in \lib\site-packages\haystack\backends to whoosh_cn_backend.py.

Yes, we have to provide a Whoosh backend of our own, but there is no need to write one from scratch; a few small changes are enough.

Open whoosh_cn_backend.py and modify it. First, near the top of the file, import our analyzer:

    # in whoosh_cn_backend.py
    from chineseanalyzer import ChineseAnalyzer

Then find the build_schema function; this is where the schema, and hence the analyzer, is constructed. Locate this line:

    schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=StemmingAnalyzer(), field_boost=field_class.boost)

We can see it uses the default StemmingAnalyzer. Replace it with our analyzer module:

    schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=ChineseAnalyzer(), field_boost=field_class.boost)

Save it, and you're done.

Now point the configuration in settings.py at the new backend:

    # settings.py
    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.whoosh_cn_backend.WhooshEngine',
            'PATH': os.path.join(PROJECT_PATH, 'whoosh_index'),
        },
    }

Rebuild the index and search again: this time the expected results come back (the original article shows a screenshot of the working search page here).

It could almost wrap up here.


However, to give everyone a better feel for why Chinese word segmentation matters, let's test the regular-expression approach mentioned above as a comparison.

Change the line above to:

    schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=RegexAnalyzer(ur"([\u4e00-\u9fa5])|(\w+(\.?\w+)*)"), field_boost=field_class.boost)

Here we use Whoosh's RegexAnalyzer with a regular expression. Strictly speaking, this is not word segmentation at all; it merely recognizes Chinese characters, one token per character.
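You can see the difference directly (a minimal sketch; the sample word is illustrative): the regex analyzer emits every Chinese character as a separate token, whereas the jieba-based analyzer keeps real words together.

    # -*- coding: utf-8 -*-
    from whoosh.analysis import RegexAnalyzer

    # Each Chinese character matches [\u4e00-\u9fa5] on its own, so
    # u'中央' is split into u'中' and u'央' instead of staying one word.
    analyzer = RegexAnalyzer(ur"([\u4e00-\u9fa5])|(\w+(\.?\w+)*)")
    print [t.text for t in analyzer(u'中央')]  # [u'中', u'央']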

Now look at the search results (shown as a screenshot in the original):

As you can see, when searching for "中央", the third hit does not actually contain the word "中央", yet it still appears in the results.


Yes, some of you may have guessed why.

That record contains no "中央", but it does contain the characters "中" and "央" separately. Because no Chinese word segmentation is applied, each character is indexed on its own, so the record gets matched anyway. Clearly, Chinese word segmentation is necessary; otherwise you can end up with a large number of irrelevant results.



Finally, with Whoosh plus haystack it is easy to add full-text search functionality to Django. It is all Python code, easy to install and easy to integrate; I recommend it to everyone.



Original link: http://blog.csdn.net/wenxuansoft/article/details/8170714#reply
