Quick full-text search for Django using Solr


 

Use Solr to quickly implement full-text search for Django. Source address: http://fzuslideblog.appspot.com/2010/03/25/django_solr_search.html

Django does not ship with full-text search, but there are many options for adding it, such as Sphinx, Lucene, and Xapian. Here we use Solr, a Lucene-based full-text search server, to quickly build a Django full-text search application.

Open-source projects to be used:

Solr: http://lucene.apache.org/solr/ (Apache License)

django-solr-search: http://code.google.com/p/django-solr-search/ (BSD License)

pymmseg-cpp: http://code.google.com/p/pymmseg-cpp/ (MIT License)

django-solr-search is the Django plug-in used to connect to the Solr server. pymmseg-cpp is a Python wrapper around the mmseg Chinese word segmentation module.

1. Configure the environment

First install Solr: download and unpack it, change into the apache-solr-1.4.0/example directory, and run:

java -jar start.jar

Then open http://localhost:8983/solr/. If the welcome page is displayed, everything is OK.
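The same check can be scripted. The following is a minimal sketch, assuming Python 3's standard library (urllib.request, the successor of urllib2) and the default example URL given above. The function name solr_is_up and its injectable opener argument are illustrative only and are not part of Solr or django-solr-search.

```python
from urllib.request import urlopen
from urllib.error import URLError

SOLR_URL = 'http://localhost:8983/solr/'  # URL of the bundled example server


def solr_is_up(url=SOLR_URL, opener=urlopen, timeout=5):
    """Return True if the Solr welcome page answers with HTTP 200."""
    try:
        response = opener(url, timeout=timeout)
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error status, ...
        return False
    try:
        return response.getcode() == 200
    finally:
        response.close()
```

Calling solr_is_up() with no arguments probes the local example server; passing a different opener makes the function easy to exercise without a running Solr.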

Download django-solr-search, then either install it with its setup script or copy the solango directory into the project that needs full-text search.

Modify settings.py and add the solango app as follows:

INSTALLED_APPS = (
    ...
    'solango',
    ...
)

 

Add the Solr paths:

SOLR_ROOT = 'path/apache-solr-1.4.0/example/'
SOLR_SCHEMA_PATH = SOLR_ROOT + 'solr/conf/schema.xml'
SOLR_DATA_DIR = SOLR_ROOT + 'solr/data'
SOLR_DEFAULT_OPERATOR = 'OR'

Finally, run the following command in the project to let Django start the Solr server:

python manage.py solr --start

Check whether the connection is successful:

python manage.py shell

>>> import solango
>>> solango.connection.is_available()
True

If the result is True, the environment is ready.

 

2. Define the document model

First, let's look at the model we want to make searchable:

import datetime

from django.db import models
from django.utils.translation import ugettext_lazy as _


class ElectronicComponent(models.Model):
    '''
    Electronic components products
    '''
    p_name = models.CharField(_('Product Name'), max_length=200,
                              help_text=_("Example: adjustable capacitor"))
    partno = models.CharField(_('Part Number'), max_length=200,
                              help_text=_("alphanumeric characters only (letters, digits and underscores)"))
    dc = models.CharField(_('Date Code'), max_length=10, default='',
                          help_text=_("digits and '+', '-', '/' only"))
    qty = models.IntegerField(_('Quantity'), null=True, blank=True,
                              help_text=_("digits only"))
    mfg = models.CharField(_('Manufactory'), max_length=200, default='',
                           help_text=_("manufactory"))
    pack = models.CharField(_('Packaging'), max_length=20, default='',
                            help_text=_("packaging"))
    desp = models.TextField(_('Description'), default='',
                            help_text=_("description"))
    date_update = models.DateTimeField(_('Last Update'), default=datetime.datetime.now)
    special_attrib = models.ManyToManyField('basicbbvalue')
    cate_id = models.ForeignKey('elecompcategory')
    # icuser is the project's own user model, defined elsewhere
    u = models.ForeignKey(icuser)

    def __unicode__(self):
        return self.partno

 

Based on the model above, define the document model for full-text search:

from models import ElectronicComponent

import solango


class ElectronicComponentDocument(solango.SearchDocument):
    '''
    Non-IC full-text search model
    '''
    p_name = solango.fields.CharField(copy=True)
    partno = solango.fields.CharField(copy=True)
    mfg = solango.fields.CharField(copy=True)
    pack = solango.fields.CharField(copy=True)
    desp = solango.fields.CharField(copy=True)
    cate_id = solango.fields.CharField(copy=True)
    user = solango.fields.CharField(copy=True)

    def transform_user(self, instance):
        # Index the owner stored in the model's `u` field under `user`
        return instance.u


solango.register(ElectronicComponent, ElectronicComponentDocument)

 

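ElectronicComponentDocument is the full-text search model derived from the original ElectronicComponent model. A transform_<field> method, such as transform_user here, returns the value that should be stored in that field of the document, which is useful when the document field does not map directly to a model attribute.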
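To make the transform_<field> convention concrete, here is a minimal, self-contained sketch of how such a hook could be dispatched. It is not solango's actual implementation; MiniDocument, to_dict, ComponentDocument and FakeComponent are hypothetical names used only for illustration, and the sketch needs neither Django nor Solr.

```python
class MiniDocument(object):
    """Hypothetical stand-in for a search document class."""
    field_names = ()

    def to_dict(self, instance):
        """Build the dict of values to index for one model instance."""
        values = {}
        for name in self.field_names:
            # Prefer a transform_<name> hook; otherwise read the attribute.
            transform = getattr(self, 'transform_%s' % name, None)
            if transform is not None:
                values[name] = transform(instance)
            else:
                values[name] = getattr(instance, name)
        return values


class ComponentDocument(MiniDocument):
    field_names = ('partno', 'user')

    def transform_user(self, instance):
        # The model keeps the owner in `u`; index it under `user`.
        return str(instance.u)


class FakeComponent(object):
    """Plain object standing in for a model instance."""
    partno = 'AD620'
    u = 'alice'


print(ComponentDocument().to_dict(FakeComponent()))
```

The point of the hook is that the document field name (user) and the model attribute (u) can differ, and that the indexed value can be computed rather than copied.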

3. Create an index

After defining the model above, you can create the index. First, check the defined fields by running:

python manage.py solr --fields

 

########## Fields ###########

<field name="mfg" type="string" indexed="true" stored="true" omitNorms="false" required="false" multiValued="false"/>
<field name="url" type="string" indexed="true" stored="true" omitNorms="false" required="false" multiValued="false"/>
<field name="text" type="text" indexed="true" stored="true" omitNorms="false" required="false" multiValued="true"/>
<field name="site_id" type="integer" indexed="true" stored="true" omitNorms="false" required="true" multiValued="false"/>
<field name="desp" type="string" indexed="true" stored="true" omitNorms="false" required="false" multiValued="false"/>
<field name="partno" type="string" indexed="true" stored="true" omitNorms="false" required="false" multiValued="false"/>
<field name="cate_id" type="string" indexed="true" stored="true" omitNorms="false" required="false" multiValued="false"/>
<field name="user" type="string" indexed="true" stored="true" omitNorms="false" required="false" multiValued="false"/>
<field name="model" type="string" indexed="true" stored="true" omitNorms="false" required="true" multiValued="false"/>
<field name="p_name" type="string" indexed="true" stored="true" omitNorms="false" required="false" multiValued="false"/>
<field name="id" type="string" indexed="true" stored="true" omitNorms="false" required="true" multiValued="false"/>
<field name="pack" type="string" indexed="true" stored="true" omitNorms="false" required="false" multiValued="false"/>

######## Copy fields ########

<copyField source="mfg" dest="text"/>
<copyField source="desp" dest="text"/>
<copyField source="partno" dest="text"/>
<copyField source="cate_id" dest="text"/>
<copyField source="user" dest="text"/>
<copyField source="p_name" dest="text"/>
<copyField source="pack" dest="text"/>
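For orientation, the shape of this listing can be approximated with a few lines of Python. This is only a sketch of the output format, not the code behind the solr --fields command; the helper names field_xml and copy_field_xml and the shortened field list are assumptions made for the example.

```python
from xml.sax.saxutils import quoteattr


def field_xml(name, type_='string', required=False, multi_valued=False):
    """Render one <field> element in the style of schema.xml."""
    return ('<field name=%s type=%s indexed="true" stored="true" '
            'omitNorms="false" required="%s" multiValued="%s"/>'
            % (quoteattr(name), quoteattr(type_),
               str(required).lower(), str(multi_valued).lower()))


def copy_field_xml(source, dest='text'):
    """Render one <copyField> element that copies `source` into `dest`."""
    return '<copyField source=%s dest=%s/>' % (quoteattr(source), quoteattr(dest))


copied = ['p_name', 'partno', 'mfg']   # fields declared with copy=True
lines = [field_xml(name) for name in copied]
# The catch-all `text` field receives the copies, so it must be multi-valued.
lines.append(field_xml('text', type_='text', multi_valued=True))
lines.extend(copy_field_xml(name) for name in copied)
print('\n'.join(lines))
```

Every field declared with copy=True yields one field element plus one copyField element targeting text, which is why searching the single text field covers all of them.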

 

 

The above are the fields defined in the document model. If an old index already exists, first clear it with:

python manage.py solr --flush

Then synchronize schema.xml with the document model we defined:

python manage.py solr --schema

Finally, restart Solr and create the index:

python manage.py solr --reindex

 

4. Query and word segmentation

Facet queries are easy to implement, which is one of the reasons for choosing Solr as the full-text search server.

Run a test against the index just created:

In [1]: from solango import connection

In [2]: from solango.solr.query import Query

In [3]: q = Query({'facet.field': 'cate_id'}, q='ad')

In [4]: r = connection.select(q)

In [5]: r.count
Out[5]: 25

In [6]: facet_dict = {}

In [7]: for facet in r.facets:
   ...:     facet_dict[facet.name] = []
   ...:     for value in facet.values:
   ...:         if value.count > 0:
   ...:             facet_dict[facet.name].append({'name': value.name, 'value': value.value, 'count': value.count})
   ...:
   ...:

In [8]: facet_dict
Out[8]:
{u'cate_id': [{'count': 16,
               'name': u'\u6307\u793a\u706f',
               'value': u'\u6307\u793a\u706f'},
              {'count': 4,
               'name': u'\u5176\u4ed6\u4e94\u91d1\u3001\u5de5\u5177',
               'value': u'\u5176\u4ed6\u4e94\u91d1\u3001\u5de5\u5177'},
              {'count': 3,
               'name': u'\u5355\u7247\u673amcu',
               'value': u'\u5355\u7247\u673amcu'},
              {'count': 1,
               'name': u'\u5e72\u7c27\u7ba1',
               'value': u'\u5e72\u7c27\u7ba1'},
              {'count': 1,
               'name': u'\u793a\u6ce2\u5668',
               'value': u'\u793a\u6ce2\u5668'}],
 u'model': [{'count': 25,
             'name': u'electroniccomponent',
             'value': u'product_electroniccomponent'}]}

In [9]: for one in facet_dict['cate_id']:
   ...:     print one['name'], one['count']
   ...:
   ...:

Indicator Light 16
Other hardware and tools 4
MCU 3
Reed 1
Oscilloscope 1

 

In the preceding example, the cate_id (category) field is added as a facet field and "ad" is used as the query keyword. A total of 25 records are retrieved, and at the same time we learn that they fall into five different categories and how many records each category contains.

Facet queries therefore provide an effective solution for category-based search.
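At the HTTP level, a facet query is an ordinary Solr request with facet=true and facet.field set. The sketch below assumes Solr's JSON response writer (wt=json) instead of the XML output that django-solr-search parses. It builds such a URL and converts the flat [value, count, value, count, ...] list that Solr returns for a facet field into a list of dictionaries like the one above. The helper names and the sample payload are made up for illustration.

```python
import json
from urllib.parse import urlencode


def facet_url(keyword, field, base='http://localhost:8983/solr/select'):
    """Build a select URL that asks Solr for facet counts on `field`."""
    params = [('q', keyword), ('facet', 'true'),
              ('facet.field', field), ('wt', 'json')]
    return base + '?' + urlencode(params)


def facet_counts(response, field):
    """Pair up Solr's flat [value, count, ...] list, dropping empty buckets."""
    flat = response['facet_counts']['facet_fields'][field]
    pairs = zip(flat[0::2], flat[1::2])
    return [{'name': value, 'count': count}
            for value, count in pairs if count > 0]


# Hand-written sample shaped like the facet part of a Solr JSON response.
sample = json.loads('{"facet_counts": {"facet_fields": '
                    '{"cate_id": ["indicator light", 16, "MCU", 3, "relay", 0]}}}')
print(facet_url('ad', 'cate_id'))
print(facet_counts(sample, 'cate_id'))
```

Filtering out zero counts mirrors the `if value.count > 0` test in the interactive session, so that only categories that actually contain matches are offered to the user.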

Defining further kinds of queries has little to do with Django; it depends entirely on what Solr supports.

Word segmentation is an independent concern, and there are many options. Here we use pymmseg as an example:

>>> from pymmseg import mmseg
>>> mmseg.dict_load_defaults()
>>> text = 'electronic circuit resistor capacitor'
>>> algor = mmseg.Algorithm(text)
>>> for tok in algor:
...     print '%s [%d..%d]' % (tok.text, tok.start, tok.end)
...
electronic circuit [0..12]
resistor [12..18]
capacitor [18..24]
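pymmseg wraps a C++ implementation of the MMSEG algorithm. As a much simpler illustration of dictionary-based segmentation, the sketch below uses plain greedy forward maximum matching, which is not MMSEG itself (MMSEG adds further disambiguation rules). The three-word dictionary and the Chinese sample string are assumptions: the string was chosen so that its UTF-8 byte offsets line up with the offsets printed above, which suggests the original input was Chinese text.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching.

    At each position take the longest dictionary word (up to max_len
    characters); fall back to a single character if nothing matches.
    Returns (word, start, end) tuples with UTF-8 byte offsets.
    """
    tokens = []
    pos = 0        # position in characters
    byte_pos = 0   # position in UTF-8 bytes
    while pos < len(text):
        word = text[pos]  # fallback: one character
        for size in range(min(max_len, len(text) - pos), 1, -1):
            candidate = text[pos:pos + size]
            if candidate in dictionary:
                word = candidate
                break
        width = len(word.encode('utf-8'))
        tokens.append((word, byte_pos, byte_pos + width))
        pos += len(word)
        byte_pos += width
    return tokens


words = {'电子电路', '电阻', '电容'}   # tiny hypothetical dictionary
for word, start, end in forward_max_match('电子电路电阻电容', words):
    print('%s [%d..%d]' % (word, start, end))
```

Each Chinese character occupies three bytes in UTF-8, which is why a four-character word spans offsets 0 to 12.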

 

5. Template display and sorting

For template display, django-solr-search also provides a concise method: after defining the document model, add a corresponding template. Here is the official example.

Specify the template for the document when defining the document model:

class EntryDocument(solango.SearchDocument):
    ... fields ...

    class Media:
        template = "coltrane/entry_document.html"

    ... transforms ...

solango.register(Entry, EntryDocument)

 

Template:

<div class="searchdocument">
    <h3>Entry: <a href="{{ document.fields.url.value }}">{{ document.fields.title.value }}</a></h3>
    <p>
        {{ document.highlight|safe }}
    </p>
    <ul class="sublinks">
        <li>at {{ document.fields.date.value|date:"N j Y" }}</li>
        <li class="last"><a href="{{ document.fields.url.value }}">permalink</a></li>
    </ul>
</div>

 

On other pages, use the render_html method to embed the document template:

{% for doc in paginator.results.documents %}
    {{ doc.render_html|safe }}
{% endfor %}

 

Of course, you can also render the template in the usual way, as before.

View code:

from django.shortcuts import render_to_response
from django.template import RequestContext
# Assumption: solr_conn is solango's connection object
from solango import connection as solr_conn
from solango.solr.query import Query


def solr_search(request):
    '''
    Use Solr for full-text search
    '''
    keywords = request.GET['q']
    q = Query({'facet.field': 'cate_id'}, q=keywords)
    r = solr_conn.select(q)
    facet_dict = {}

    # Collect the non-empty facet values for each facet field
    for facet in r.facets:
        facet_dict[facet.name] = []
        for value in facet.values:
            if value.count > 0:
                facet_dict[facet.name].append({'name': value.name, 'value': value.value, 'count': value.count})

    return render_to_response('product/productlist_solr.html',
                              {'documents': r.documents,
                               'cate_id_dict': facet_dict['cate_id'],
                               'keyword': keywords},
                              context_instance=RequestContext(request))
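Note that request.GET['q'] raises an exception when the q parameter is missing. One possible guard is a small helper like the one sketched here. It is written against a plain dictionary so that it runs without Django (Django's QueryDict supports the same .get call); the name clean_keywords and the length limit are assumptions, not part of the original view.

```python
def clean_keywords(params, key='q', max_length=200):
    """Return the trimmed search string, or None if nothing usable was sent."""
    raw = params.get(key, '')
    keywords = ' '.join(raw.split())   # collapse runs of whitespace
    if not keywords:
        return None
    return keywords[:max_length]


print(clean_keywords({'q': '  adjustable   capacitor '}))
print(clean_keywords({}))
```

A view could call clean_keywords(request.GET) first and render an empty result page when it returns None instead of querying Solr.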

 

Template reference:

{% for cate_id in cate_id_dict %}
<td><a href="/product/category/{{ cate_id.name }}.html">{{ cate_id.name }} ({{ cate_id.count }})</a></td>
{% endfor %}

{% autoescape off %}
{% for one in documents %}
<tr class="{% cycle 'row1' 'row2' %}">
    <td><input type="checkbox" name="checkone" value="{{ one.id }}"/></td>
    <td align="left" class="iclist"><a href="/productmenu/{{ one.id }}.html"><span class="partno">{{ one.partno }}</span></a></td>
    <td>{{ one.mfg }}</td>
    <td>{{ one.desp }}</td>
    <td class="red"><a href="/inquire/ask_attach/?partno={{ one.id }}">{% trans 'inquiry' %}</a></td>
</tr>
{% endfor %}
{% endautoescape %}

 


Sorting is in fact controlled by Solr. Add the following to settings.py to configure the available sort orders:

SEARCH_SORT_PARAMS = {
        "score desc": "Relevance",
        "date desc" : "Date" # Added date
}
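A view can treat this dictionary as a whitelist before handing a sort order to Solr, whose sort request parameter accepts values such as "score desc". A minimal sketch follows; pick_sort and sort_choices are hypothetical helpers and not part of django-solr-search.

```python
SEARCH_SORT_PARAMS = {
    "score desc": "Relevance",
    "date desc": "Date",
}


def pick_sort(requested, allowed=SEARCH_SORT_PARAMS, default="score desc"):
    """Return `requested` if it is a configured sort order, else the default."""
    return requested if requested in allowed else default


def sort_choices(allowed=SEARCH_SORT_PARAMS):
    """(value, label) pairs, for example for a drop-down in the search form."""
    return sorted(allowed.items())


print(pick_sort("date desc"))
print(pick_sort("price asc"))
print(sort_choices())
```

Falling back to the default for unknown values keeps arbitrary user input out of the query that is sent to Solr.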

 

6. Index update

Keeping the index up to date is also crucial for an application. Two basic update methods are provided.

ImmediateIndexer

This is the default: the index is updated immediately and no settings are required. However, building Lucene indexes is not fast, so this method is only suitable when the data volume is small and writes and deletes are infrequent.

DBQueuedIndexer

This method creates a table in the database that records a queue of data operations; the queued changes are then applied to the index at scheduled times.

You need to add the following to settings.py:

SEARCH_INDEXER = "solango.indexing.DBQueuedIndexer"

Then run the following on a schedule:

python manage.py solr --index-queued
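The idea behind a queued indexer can be sketched in a few lines. The class below keeps the queue in memory rather than in a database table and is not solango's DBQueuedIndexer; QueuedIndexer, PrintBackend and their methods are hypothetical names that only illustrate the pattern of recording operations first and replaying them later in one batch.

```python
from collections import OrderedDict


class QueuedIndexer(object):
    """Hypothetical in-memory stand-in for a database-backed index queue."""

    def __init__(self):
        self._queue = OrderedDict()   # object id -> pending operation

    def post_save(self, obj_id):
        self._enqueue(obj_id, 'index')

    def post_delete(self, obj_id):
        self._enqueue(obj_id, 'delete')

    def _enqueue(self, obj_id, operation):
        # Only the latest operation per object matters; move it to the end.
        self._queue.pop(obj_id, None)
        self._queue[obj_id] = operation

    def flush(self, backend):
        """Replay the queue against `backend`; return the number of operations."""
        pending = list(self._queue.items())
        self._queue.clear()
        for obj_id, operation in pending:
            getattr(backend, operation)(obj_id)
        return len(pending)


class PrintBackend(object):
    """Stand-in for the search server; real code would talk to Solr here."""

    def index(self, obj_id):
        print('index', obj_id)

    def delete(self, obj_id):
        print('delete', obj_id)


queue = QueuedIndexer()
queue.post_save(1)
queue.post_save(2)
queue.post_save(1)     # saved twice, indexed once
queue.post_delete(2)   # the delete replaces the earlier save
queue.flush(PrintBackend())
```

Collapsing repeated operations per object is one design choice that makes batching cheaper than immediate indexing; whether solango does the same is not stated in this article.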

Looking back at how Solr-backed search for Django works, the principle is not complicated. The plug-in is essentially an HTTP client wrapper:

It uses urllib2 to request a URL on the Solr server, reads the XML text that Solr returns, parses it, wraps the results in objects, and renders them in a template.
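That request, read, and parse cycle can be sketched with the standard library alone. The example assumes Python 3 (urllib.request replaces urllib2) and Solr's default XML response layout with a result element containing doc elements. The function names are illustrative and the sample response is hand-written; only single-valued fields are handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch(keyword, base='http://localhost:8983/solr/select'):
    """Request the XML result for `keyword` from a running Solr server."""
    with urlopen(base + '?' + urlencode({'q': keyword})) as response:
        return response.read()


def parse(xml_text):
    """Return (numFound, list of documents as dicts) from a Solr XML response."""
    result = ET.fromstring(xml_text).find('result')
    docs = [dict((field.get('name'), field.text) for field in doc)
            for doc in result.findall('doc')]
    return int(result.get('numFound')), docs


SAMPLE = """<response>
  <result name="response" numFound="1" start="0">
    <doc><str name="partno">AD620</str><str name="mfg">ADI</str></doc>
  </result>
</response>"""

print(parse(SAMPLE))   # against a live server: parse(fetch('ad'))
```

The resulting dictionaries are the kind of plain data a template can render, which is the last step in the chain described above.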

The above is a quick way to add full-text search to a Django application, mainly using django-solr-search. The background of this plug-in is also interesting: it was originally developed for the website of The Washington Times, a newspaper that has not been profitable in the 20 years since its founding.

 

Official documentation: http://www.screeley.com/djangosolr/index.html

 

In addition to django-solr-search, other third-party full-text search plug-ins are available, such as Haystack. If you are interested, they are worth a look.

 
