Multi-condition greedy weighted matching algorithm for SQL like statements)

Source: Internet
Author: User
Tags rtrim

Last blog: Multi-condition greedy Matching Algorithm for SQL like statements.

Directly join the question.

The previous blog only implements multiple keywords "as many matches as possible ".

However, in practical applications, as many matches as possible are not necessarily reasonable.

Take the phrase "how to register a user on a csdn Website" as an example. It is divided into three words: "csdn", "Registration", and "user. Assume that a record in the database matches the words "csdn" and "register", and the other record matches the words "register" and "user. The number of matching words in the two records is both.AlgorithmThese two records are equivalent, but obviously they are not equal! The term "csdn" is crucial in this sentence and is a prerequisite. Therefore, records matching the words "csdn" and "Registration" take precedence over those matching the words "Registration" and "user.

Based on the above facts, we are aware that keywords are not equal. We should assign them a weight with a higher priority.

SQL like statement multi-condition greedy match weighted reverse version:

Gocreate function get_strarraylength (@ STR varchar (1024), -- string to be split @ split varchar (10) -- separator) returns INTAS begin declare @ location int declare @ start int declare @ length int set @ STR = ltrim (rtrim (@ Str) set @ location = charindex (@ split, @ Str) set @ length = 1 while @ location <> 0 begin set @ start = @ location + 1 Set @ location = charindex (@ split, @ STR, @ start) set @ length = @ Length + 1 end return @ length end go create Function get_strarraystrofindex (@ STR varchar (1024), -- string to be split @ split varchar (10), -- separator number @ index int -- get the nth element) returns varchar (1024) asbegin declare @ location int declare @ start int declare @ next int declare @ seed int set @ STR = ltrim (rtrim (@ Str )) set @ start = 1 Set @ next = 1 Set @ seed = Len (@ split) set @ location = charindex (@ split, @ Str) while @ location <> 0 and @ index> @ next begin set @ start = @ location + @ Seed Set @ Location = charindex (@ split, @ STR, @ start) set @ next = @ next + 1 end if @ location = 0 select @ location = Len (@ Str) + 1 -- there are two situations: 1. The character string does not have a separator number. 2. The character string contains a separator number. After jumping out of the while loop, @ location is 0, by default, there is a separator behind the string. Return substring (@ STR, @ start, @ location-@ start) endgocreate procedure proc_common_superlike -- Name of the primary key field of the table to be queried @ primarykeyname varchar (999 ), -- Name of the table to be queried @ talbename varchar (999), -- Name of the field of the table to be queried, that is, the field where the content is located @ contentfieldname varchar (999 ), -- query the number of records (top *). The higher the number of matching records, the higher the ranking. @ selectnumber varchar (999), -- match the separator @ splitstring varchar (999 ), -- match the character combination string @ words varchar (999) asdeclare @ sqlfirst varchar (999) Declare @ sqlcenter varchar (999) Declare @ sqllast varchar (999) beginset @ sqlcenter = ''declare @ next int declare @ arraylength intset @ next = 1 Set @ arraylength = DBO. get_strarraylength (@ words, @ splitstring) while @ next <= @ arraylengthbegin -- construct an SQL query condition (intermediate part) set @ sqlcenter = @ sqlcenter + 'select' + @ primarykeyname + ', '+ convert (varchar (999), @ arraylength-@ next + 1) + 'as wordpower from' + @ talbename + 'where' + @ contentfieldname + 'like ''' % '+ DBO. get_strarraystrofindex (@ words, @ splitstring, @ next) + '% ''Union all' set @ next = @ next + 1end -- process the intermediate part of the SQL statement, remove the last useless statement set @ sqlcenter = left (@ sqlcenter, (LEN (@ sqlcenter)-10 )) -- construct the starting part of the SQL statement: Set @ sqlfirst = 'select top' + @ selectnumber + ''+ @ primarykeyname + ', count (*) + sum (wordpower) as finalpower from ('-- construct the end part of the SQL statement set @ sqllast =') as t_temp group by '+ @ primarykeyname + 'order by finalpower DESC' -- concatenate a complete SQL statement, execute execute (@ sqlfirst + @ sqlcenter + @ sqllast) End


The call method is the same as the first version:

 
Execute proc_common_superlike 'id', 't_ test', 'content', '20', '|', 'I | o | C'


The name of the primary key field in the ID table.


T_test table name.


Content Matching content field name.


20 select 20 records (the matching degree from top to bottom is getting lower ).


| Delimiter of a keyword.


I | o | C has three keywords: I, O, and C, which are separated by |.

 


The difference is that the keyword has the concept of weight.


Rule: The three keywords I | o | C are weighted down in sequence. The weight of I is 3, the weight of O is 2, and the weight of C is 1.


That is to say, place important keywords in front, and those keywords in front of the back. The higher the front, the higher the weight.

 


Note:

 


This algorithm uses the comprehensive results of weight + matching numbers as the judgment criteria, instead of blindly greedy.


For example, there are five keywords A | B | c | d | E, according to the rules, the weight is: A-5, B-4, C-3, D-2, E-1, assume that a record matches the three keywords A, B, and C, and the other record matches the Four keywords B, C, D, and E.


Follow the algorithm:


The final weight of the first record is 5 (a weight) + 4 (B weight) + 3 (C weight) + 3 (matching number) = 15


The final weight of the second record is 4 (B weight) + 3 (C weight) + 2 (D weight) + 1 (e weight) + 4 (matching number) = 14


Therefore, the first record is preferentially selected. Although the number of matches for the first record is not more than that for the second record, the final weight is high.

 


Original algorithm. You can repost it for any purpose. Just specify the source.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.