Django full-text retrieval

Source: Internet
Author: User
Tags install django

After two months, I finally completed all the major functions. Over the past week I consulted a lot of material to implement full-text retrieval, and I am recording it here today so that I don't get stuck on it again later.

First, I will introduce the Django full-text search setup I used. I checked various materials on the Internet and found that the search engines generally used with Django are Whoosh, Sphinx, and Xapian. The last two, Sphinx and Xapian, can be looked up on Baidu Encyclopedia. They are mostly used by large projects. After all, they are written in C++, so their efficiency needs no comment, but using either of them here would be like taking an ox cleaver to kill a chicken. This time I used the first one, Whoosh. Why?

Whoosh is a full-text search component implemented in pure Python.

Main features

  • Pythonic API.
  • Pure Python implementation with no binary packages, so the program will not crash inexplicably.
  • Indexing by field.
  • Indexing and search are both very fast; its authors describe it as the fastest pure-Python full-text search engine.
  • Good architecture. The scoring module, word segmentation module, storage module, and other modules are pluggable.
  • Powerful query language (implemented through pyparsing).
  • Spelling checker implemented in pure Python (described at the time as the only pure-Python spelling check implementation).

In fact, the most important thing for me is its ease of use and simplicity. After all, for a graduation project, time is a major constraint.

The next step is word segmentation. If you searched for the user's entire input as a single string, it would be no better than a fuzzy query against the database, so I need a tool that can split text into words. Whoosh uses regular expressions internally for word segmentation. That works well for English, but this project mainly retrieves Chinese text, so we need a reliable Chinese word segmentation dictionary. Intelligent and accurate Chinese word segmentation is not easy to achieve. There are many commercial word segmentation dictionaries in China, and there are also some free Chinese word segmentation libraries, such as the "jieba" library I use, which bills itself as "the best Python Chinese word segmentation component" and is hosted on GitHub.
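To make the problem concrete, here is a minimal sketch that contrasts a regex tokenizer with a dictionary-based segmenter. It uses greedy forward maximum matching over a tiny vocabulary invented for this example. This is a toy stand-in, not how jieba works internally, and the function names are hypothetical.

```python
import re

# Toy vocabulary, invented for illustration; real segmenters such as jieba
# ship a large dictionary plus statistical models.
TOY_DICT = {"全文", "检索", "全文检索", "搜索", "引擎", "搜索引擎", "中文", "分词"}


def regex_tokens(text):
    """Split on runs of word characters, roughly how an English-oriented tokenizer behaves."""
    return re.findall(r"\w+", text)


def forward_max_match(text, vocabulary, max_len=4):
    """Greedy forward maximum matching: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing in the dictionary matches.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


sentence = "中文全文检索搜索引擎"
print(regex_tokens(sentence))                 # the whole sentence comes back as one token
print(forward_max_match(sentence, TOY_DICT))  # split into dictionary words
```

Because Chinese has no spaces between words, the regex approach treats the whole sentence as a single term, so a query for one word inside it would not match. A dictionary-aware segmenter produces separate terms that can be indexed and searched individually.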

Now that both the search engine and the word segmentation library are available, the next step is to integrate the two into our project. This brings us to today's main character: django-haystack, a third-party app that supports the Whoosh, Solr, Xapian, and Elasticsearch full-text search engines. This means that if you do not want to use Whoosh, you can switch to another engine such as Xapian at any time with little or no change to your code. Like Django itself, django-haystack is easy to use.
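As a rough illustration of what "switching engines" means in practice, the sketch below builds the connection settings for several backends. The helper function, the server addresses, and the index directory names are assumptions made for this example. The engine paths follow the haystack documentation as I understand it, and the Xapian path comes from the separate xapian-haystack package, so check them against the version you install.

```python
import os


def haystack_connections(backend, base_dir):
    """Return a HAYSTACK_CONNECTIONS dict for the chosen backend (hypothetical helper)."""
    engines = {
        "whoosh": {
            "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
            "PATH": os.path.join(base_dir, "whoosh_index"),
        },
        "solr": {
            "ENGINE": "haystack.backends.solr_backend.SolrEngine",
            "URL": "http://127.0.0.1:8983/solr",  # placeholder address
        },
        "elasticsearch": {
            "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
            "URL": "http://127.0.0.1:9200/",  # placeholder address
            "INDEX_NAME": "haystack",
        },
        "xapian": {
            # Assumption: path as used by the separate xapian-haystack package.
            "ENGINE": "xapian_backend.XapianEngine",
            "PATH": os.path.join(base_dir, "xapian_index"),
        },
    }
    return {"default": engines[backend]}


# In settings.py this could be used as:
# HAYSTACK_CONNECTIONS = haystack_connections("whoosh", BASE_DIR)
print(haystack_connections("whoosh", os.getcwd()))
```

Only the settings change between engines. The search index class and the view code stay the same.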

Now let's integrate them. First, install the packages:

    pip install whoosh
    pip install jieba
    pip install django-haystack

Then we create search_indexes.py in the app directory. The code is as follows:

    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    # @Date: 14:15:13
    # @Author: jonnyF (fuhuixiang@jonnyf.com)
    # @Link: http://jonnyf.com
    from dlnubuy.models import Product
    from haystack import indexes


    class ProductIndex(indexes.SearchIndex, indexes.Indexable):
        # The main document field, built from a template
        text = indexes.CharField(document=True, use_template=True)
        # Index the pdname and description fields
        pdname = indexes.CharField(model_attr='pdname')
        description = indexes.CharField(model_attr='description')

        def get_model(self):
            return Product

        def index_queryset(self, using=None):
            return self.get_model().objects.all()

Note that the file name must be search_indexes.py; otherwise an error is reported!

Create templates/search/indexes/<appname>/product_text.txt. This template defines the content that goes into the text field, which may also be useful in later templates.
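The original post does not show the template's contents. As a sketch only, the script below writes one plausible version: one indexed field per line, using the `object` variable that haystack makes available when it renders the template. The helper name and the template body are assumptions, and the app label `dlnubuy` is taken from the import in search_indexes.py.

```python
from pathlib import Path

# Assumed template body: one indexed field per line. Haystack renders the
# template with the model instance available as `object`.
TEMPLATE_BODY = "{{ object.pdname }}\n{{ object.description }}\n"


def write_index_template(project_dir, app_label="dlnubuy"):
    """Create templates/search/indexes/<app_label>/product_text.txt under project_dir."""
    path = (Path(project_dir) / "templates" / "search" / "indexes"
            / app_label / "product_text.txt")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(TEMPLATE_BODY, encoding="utf-8")
    return path


# Example: write_index_template(".") creates the file relative to the current directory.
```

The file name follows the pattern <model name>_<field name>.txt, which is why the Product model's text field uses product_text.txt.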

Then configure settings.py (haystack must also be listed in INSTALLED_APPS):

    # Full-text search
    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'dlnubuy.whoosh_cn_backend.WhooshEngine',
            'PATH': os.path.join(BASE_DIR, 'whoosh_index'),
        },
    }
    # Automatic index update
    HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

In this way the search engine is wired into the project, but the key point is that it still uses its own built-in tokenizer instead of jieba, so the next step is to bring the two together. Note that I modified the Whoosh backend here, and for ease of porting I put it under my app's directory so that it no longer depends on the local environment.

First, copy whoosh_backend.py from the ../Python27/Lib/site-packages/haystack/backends directory into the app directory and rename it whoosh_cn_backend.py. The file name must match the one configured in settings.py. Not the whole file needs changing; only a few places are modified, as follows:

    # Add the jieba analyzer import after the last global import
    from jieba.analyse import ChineseAnalyzer

    # Modify the schema field definition to use the jieba analyzer
    schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=ChineseAnalyzer(), field_boost=field_class.boost, sortable=True)

Then rebuild the index with python manage.py rebuild_index. Once that is done, the search is ready to use. As for automatic index updates, I don't think it is necessary to update the index every time the database is modified, so I changed it to run manage.py update_index only when one of the retrieved fields has been modified.
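The post does not show how the "only when a retrieved field changes" decision is made. One possible way to express it, purely as a sketch: compare the indexed fields before and after a save. The helper name and the sample `price` field are hypothetical; the indexed field names come from search_indexes.py.

```python
INDEXED_FIELDS = ("pdname", "description")  # the fields indexed in search_indexes.py


def needs_reindex(before, after, fields=INDEXED_FIELDS):
    """Return True when any indexed field differs between two field snapshots (dicts)."""
    return any(before.get(name) != after.get(name) for name in fields)


old = {"pdname": "Keyboard", "description": "Mechanical", "price": 30}
new_price = dict(old, price=25)                        # only a non-indexed field changed
new_text = dict(old, description="Mechanical, backlit")  # an indexed field changed

print(needs_reindex(old, new_price))
print(needs_reindex(old, new_text))
```

When the check is true, the index could be refreshed by running manage.py update_index by hand or, from code, with Django's `call_command("update_index")`.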

The following is the backend logic that runs after the search box is submitted:

    from django.shortcuts import render_to_response
    from django.template import Context
    from haystack.forms import SearchForm


    # Global search
    def full_search(request):
        sform = SearchForm(request.GET)
        posts = sform.search()
        template = 'product_search_list.html'
        c = Context({'posts': posts})
        return render_to_response(template, c)

Because I render the results page with a template, I will not post that code here (it is too simple to need any attention). It mainly loops over the posts variable in the template, and each item is a SearchResult. Here is a brief list of some SearchResult attributes:

Attribute Reference

The class exposes the following useful attributes/properties:

  • app_label - The application the model is attached to.
  • model_name - The model's name.
  • pk - The primary key of the model.
  • score - The score provided by the search engine.
  • object - The actual model instance (lazy loaded).
  • model - The model class.
  • verbose_name - A prettier version of the model's class name for display.
  • verbose_name_plural - A prettier version of the model's plural class name for display.
  • searchindex - Returns the SearchIndex class associated with this result.
  • distance - On geo-spatial queries, this returns a Distance object representing the distance the result was from the focused point.

Here I use the object attribute, with {{ i.object.description }} in the template to display the details. The rest depends on the specifics of each project.
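For readers who prefer to see the same attribute access in Python rather than in a template, here is a small sketch. The stand-in objects only mimic the pk, score, and object attributes of SearchResult; they are not real haystack results, and the helper name is hypothetical.

```python
from types import SimpleNamespace


def summarize_results(posts):
    """Collect the attributes a results page typically shows, best score first."""
    rows = [
        {
            "pk": result.pk,
            "score": result.score,
            "description": result.object.description,
        }
        for result in posts
    ]
    return sorted(rows, key=lambda row: row["score"], reverse=True)


# Stand-ins that mimic SearchResult attributes; not real haystack objects.
fake_posts = [
    SimpleNamespace(pk=1, score=0.4, object=SimpleNamespace(description="USB keyboard")),
    SimpleNamespace(pk=2, score=0.9, object=SimpleNamespace(description="Wireless keyboard")),
]
for row in summarize_results(fake_posts):
    print(row["pk"], row["score"], row["description"])
```

Keep in mind that object is lazy loaded, so reading it for each real result may trigger a database lookup.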

The above is the whole procedure and logic for implementing full-text search. Note that automatic index updating differs considerably between haystack 1.x and 2.x, so pay attention to the version when consulting other materials. I am recording it here today so that I don't forget it. By JonnyF
