MongoDB implementation of keyword-based article retrieval (C# edition)

Source: Internet
Author: User
Tags mongodb

My goals are:

Articles can be searched by one or more keywords.

Related articles can be found from the keyword list of a given article.

Query results are sorted in descending order of relevance.

Queries are fast enough. (In theory, keyword retrieval is much faster than full-text search.)

I searched widely online and found no reliable approach: most solutions only support retrieval by a single word, and almost none provide code for the C# driver. So I developed the following scheme myself:

First, use a tagging, word segmentation, or keyword extraction component to extract keywords from each article, then store them as an array in the article's Keywords field.
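A minimal sketch of the document shape this implies. The real Domain.Entity.Article class is not shown in this article, so the definition below is hypothetical; it only assumes the property names used by the search code (Id, Title, IsDeleted, Keywords):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of the article entity; the actual class is not shown in the article.
public class Article
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public bool IsDeleted { get; set; }

    // Extracted keywords, serialized as an array in the stored document.
    // May be null; the search code filters such articles out.
    public List<string> Keywords { get; set; }
}
```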

Core Search Code:

/// <summary>
/// Gets a map of article IDs to titles based on the given keywords.
/// Note: this method returns the best-matching items, ordered by degree of match, and still returns results even when no keyword matches.
/// Another note: when querying by an article's own keywords, the results will normally contain the original article, so request one more item than you need and remove the original from the results.
/// </summary>
/// <param name="limitNum">Maximum number of results</param>
/// <param name="keywords">Keyword collection</param>
/// <returns>Dictionary mapping article ID to title</returns>
public async Task<Dictionary<Guid, string>> GetArticleDicByKeywordsAsync(int limitNum, IEnumerable<string> keywords)
{
    var list = await Database.GetCollection<Domain.Entity.Article>("Article")
        .Aggregate()
        .Match(q => !q.IsDeleted && q.Keywords != null)
        // Count = number of the article's keywords that appear in the query keywords.
        .Project(q => new { q.Id, q.Title, Count = q.Keywords.Count(t => keywords.Contains(t)) })
        .SortByDescending(q => q.Count)
        .Limit(limitNum)
        .ToListAsync();
    return list.ToDictionary(f => f.Id, f => f.Title);
}
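To make the ranking explicit, the same scoring can be reproduced in memory with LINQ to Objects. This only illustrates what the pipeline computes and is not the MongoDB query itself; the RankingSketch class and its tuple shape are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RankingSketch
{
    // Scores each article by how many of its keywords appear in the query,
    // then returns the titles of the top limitNum articles, best match first.
    public static List<string> TopTitles(
        IEnumerable<(string Title, string[] Keywords)> articles,
        IEnumerable<string> keywords,
        int limitNum)
    {
        var wanted = new HashSet<string>(keywords);
        return articles
            .Where(a => a.Keywords != null)
            .Select(a => new { a.Title, Count = a.Keywords.Count(k => wanted.Contains(k)) })
            .OrderByDescending(x => x.Count)
            .Take(limitNum)
            .Select(x => x.Title)
            .ToList();
    }
}
```

For example, with article A tagged (mongodb, csharp), B tagged (mongodb) and C tagged (java), the query keywords mongodb and csharp give counts of 2, 1 and 0. A limit of 2 yields A then B, while a limit of 3 also returns C even though it matches nothing.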

Note: the query must use the Aggregate() method rather than the commonly used Find() method. Calling Project() after Find() does not change the object type operated on in the method chain, so the target of the subsequent SortByDescending() is still the Article class, not the anonymous class defined in Project(). In addition, the expression passed to SortByDescending() may only select an object property; calculations are not allowed. Find() therefore cannot meet our requirement. I was stuck here for a long time before I found the Aggregate() method.

For keyword search, it is usually enough to pass the keywords in. However, the returned results may include items that match no keyword at all, so it is best to check each item's degree of match and filter out the items with no match.
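A minimal in-memory sketch of that filtering step, assuming each result carries its match count (the MatchFilterSketch name and the tuple shape are hypothetical). Inside the aggregation pipeline, the equivalent would presumably be a second Match stage on Count placed after Project; that variant has not been verified here against driver 2.2.3:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MatchFilterSketch
{
    // Keeps only results that matched at least one keyword, preserving their order.
    public static List<(string Title, int Count)> DropUnmatched(
        IEnumerable<(string Title, int Count)> ranked)
    {
        return ranked.Where(r => r.Count > 0).ToList();
    }
}
```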

For related-article search, pass in the Keywords value of the original article directly and request one more result than you need, because the original article will very likely be in the returned list, usually at the top. Filter out the original article's ID from the results, then call Take to return the number of items you need. (Take covers the case where the original article does not appear in the list, which is extremely rare but possible.)
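The post-processing described here can be sketched as follows. The RelatedArticlesSketch class is hypothetical, and it assumes the ranked results are available as an ordered sequence of ID and title pairs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RelatedArticlesSketch
{
    // ranked: results of a search for (wanted + 1) items, best match first.
    // Removes the original article if it is present, then trims to the wanted count.
    public static List<KeyValuePair<Guid, string>> ExcludeOriginal(
        IEnumerable<KeyValuePair<Guid, string>> ranked,
        Guid originalId,
        int wanted)
    {
        return ranked
            .Where(p => p.Key != originalId)
            .Take(wanted)
            .ToList();
    }
}
```

The sketch works on an ordered sequence rather than a Dictionary because Dictionary<TKey, TValue> does not document its enumeration order. If the ranking order matters to the caller, returning an ordered list from the search method may be the safer choice.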

To improve query efficiency, you can also create an index in advance. The code is as follows:

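var c = Database.GetCollection<Domain.Entity.Article>("Article");
// Caution: DropAll() removes every existing index on the collection except the default _id index.
c.Indexes.DropAll();
// Ascending index on the Keywords array field.
await c.Indexes.CreateOneAsync(
    Builders<Domain.Entity.Article>.IndexKeys.Ascending(q => q.Keywords));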
Reference from: http://mongodb.github.io/mongo-csharp-driver/2.2/reference/driver/admin/#creating-an-index

The official C# driver version used is MongoDB.Driver 2.2.3.
