About an algorithm for using one SQL statement to find the records that satisfy multiple tag conditions simultaneously

Last Update: 2014-06-19 Source: Internet

Author: User


Table structure

Tag table: {tag_id, tag_name} # tags

News table: {news_id, title, ...} # news items

Newstags table: {tag_id, news_id} # relation table linking each news item to its tags
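As a concrete sketch of this structure, the three tables could be declared as below. The example uses Python's built-in sqlite3 module as a stand-in for MySQL; the column types, the UNIQUE constraint on tag_name and the secondary index on news_id are assumptions, not part of the original design. The composite primary key on (tag_id, news_id) keeps a tag from being attached to the same news item twice, which matters whenever matches are found by counting tags per news_id.

```python
import sqlite3

# Hypothetical DDL for the three tables. Column types, NOT NULL / UNIQUE
# constraints and the extra index are assumptions for illustration.
SCHEMA = """
CREATE TABLE tags (
    tag_id   INTEGER PRIMARY KEY,
    tag_name TEXT NOT NULL UNIQUE
);
CREATE TABLE news (
    news_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
-- One row per (tag, news) pair; the composite key rules out duplicate pairs
-- and also serves lookups by tag_id.
CREATE TABLE newstags (
    tag_id  INTEGER NOT NULL REFERENCES tags (tag_id),
    news_id INTEGER NOT NULL REFERENCES news (news_id),
    PRIMARY KEY (tag_id, news_id)
);
-- Secondary index for lookups by news_id ("which tags does this news have?").
CREATE INDEX idx_newstags_news_id ON newstags (news_id);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the tag, news and news-tag relation tables on the given connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['news', 'newstags', 'tags']
```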

Explanation:

Each news item can have multiple tags, for example:

News A: {tag1, tag2, tag3, tag4}

News B: {tag1, tag6, tag7, tag8}

News C: {tag8, tag9, tag10, tag1}

News ...: {tag..., ...}

Goal: find the records that have both tag1 and tag8 at the same time.

An optimized SQL query for MySQL:

SELECT n.title
FROM (
    SELECT c.news_id
    FROM (
        SELECT tag_id
        FROM tags
        WHERE tag_name IN ('tag1', 'tag8')
    ) b
    JOIN newstags c ON b.tag_id = c.tag_id
    GROUP BY c.news_id
    HAVING COUNT(DISTINCT c.tag_id) = 2
) a
LEFT JOIN news n ON a.news_id = n.news_id;
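To check the three-layer query against the example data, the sketch below loads news A, B and C into an in-memory SQLite database (standing in for MySQL) and runs the query unchanged. The table layout and the titles "News A" to "News C" are assumptions made for the demo; SQLite assigns the ids. Only News B and News C carry both tag1 and tag8, so they are the expected result.

```python
import sqlite3

# Example data from the article: news title -> its tags (titles are assumed).
NEWS_TAGS = {
    "News A": ["tag1", "tag2", "tag3", "tag4"],
    "News B": ["tag1", "tag6", "tag7", "tag8"],
    "News C": ["tag8", "tag9", "tag10", "tag1"],
}

# The three-layer query, run on SQLite instead of MySQL.
QUERY = """
SELECT n.title
FROM (
    SELECT c.news_id
    FROM (
        SELECT tag_id
        FROM tags
        WHERE tag_name IN ('tag1', 'tag8')
    ) b
    JOIN newstags c ON b.tag_id = c.tag_id
    GROUP BY c.news_id
    HAVING COUNT(DISTINCT c.tag_id) = 2
) a
LEFT JOIN news n ON a.news_id = n.news_id
"""


def load_example(conn):
    """Create minimal tables and insert NEWS_TAGS (ids are generated by SQLite)."""
    conn.executescript("""
        CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT UNIQUE);
        CREATE TABLE news (news_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE newstags (tag_id INTEGER, news_id INTEGER,
                               PRIMARY KEY (tag_id, news_id));
    """)
    for title, tag_names in NEWS_TAGS.items():
        news_id = conn.execute(
            "INSERT INTO news (title) VALUES (?)", (title,)).lastrowid
        for name in tag_names:
            # Reuse the tag row if another news item already created it.
            conn.execute("INSERT OR IGNORE INTO tags (tag_name) VALUES (?)", (name,))
            tag_id = conn.execute(
                "SELECT tag_id FROM tags WHERE tag_name = ?", (name,)).fetchone()[0]
            conn.execute("INSERT INTO newstags VALUES (?, ?)", (tag_id, news_id))


conn = sqlite3.connect(":memory:")
load_example(conn)
titles = sorted(row[0] for row in conn.execute(QUERY))  # row order is not guaranteed
print(titles)  # ['News B', 'News C']
```

COUNT(DISTINCT c.tag_id) is used instead of a plain row count so that an accidental duplicate (tag_id, news_id) row cannot make a news item with only one of the tags look like a match.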

Principle:

This is a three-level nested query. It shrinks the working record set quickly and is intended to use as little memory as possible.

1. The innermost layer retrieves the tag_ids that need to be compared.

2. The second layer joins these tag_ids with the Newstags table to find the news_ids that carry all of them at the same time. The key step is to GROUP BY news_id and, in the HAVING clause, keep only the news_ids whose count is 2 (that is, news items for which both tag records exist).

3. The outermost layer takes these news_ids (the final, smallest record set) and left joins them with the news table to retrieve the content of the matching news records.

This gives you a template that works for any number of tag names: to intersect a different number of tags, change the tag list in the innermost IN (...) and the count in the second layer's HAVING clause so that it equals the number of tags.
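A minimal sketch of that parameter change, assuming the query is issued from application code: the hypothetical helper build_tag_query below emits one placeholder per tag name and binds the tag count to the HAVING clause, so the caller never pastes tag names into the SQL text. It uses sqlite3's "?" placeholders; many Python MySQL drivers use "%s" instead. The reduced example data in make_example_db is also an assumption.

```python
import sqlite3


def build_tag_query(tag_names):
    """Hypothetical helper: SQL + parameters for news carrying ALL of tag_names."""
    names = sorted(set(tag_names))  # de-duplicate so the HAVING count is right
    if not names:
        raise ValueError("at least one tag name is required")
    placeholders = ", ".join("?" for _ in names)  # one "?" per tag, never raw text
    sql = f"""
        SELECT n.title
        FROM (
            SELECT c.news_id
            FROM (SELECT tag_id FROM tags WHERE tag_name IN ({placeholders})) b
            JOIN newstags c ON b.tag_id = c.tag_id
            GROUP BY c.news_id
            HAVING COUNT(DISTINCT c.tag_id) = ?
        ) a
        LEFT JOIN news n ON a.news_id = n.news_id
    """
    # The tag names fill the IN (...) list; the tag count fills HAVING.
    return sql, [*names, len(names)]


def make_example_db():
    """Reduced version of the article's example data (ids are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT UNIQUE);
        CREATE TABLE news (news_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE newstags (tag_id INTEGER, news_id INTEGER,
                               PRIMARY KEY (tag_id, news_id));
        INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2'), (8, 'tag8'), (9, 'tag9');
        INSERT INTO news VALUES (1, 'News A'), (2, 'News B'), (3, 'News C');
        -- News A: tag1, tag2; News B: tag1, tag8; News C: tag8, tag9, tag1
        INSERT INTO newstags VALUES (1, 1), (2, 1), (1, 2), (8, 2), (8, 3), (9, 3), (1, 3);
    """)
    return conn


conn = make_example_db()
for tags in (["tag1", "tag8"], ["tag1", "tag8", "tag9"]):
    sql, params = build_tag_query(tags)
    print(tags, sorted(row[0] for row in conn.execute(sql, params)))
```

A tag name that does not exist in the tag table simply never contributes a row, so the HAVING count cannot be reached and the result is empty, which is the correct answer for an intersection.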

This algorithm is not optimal, but if you want a single, template-style SQL statement, it is, in my humble opinion, about as good as it gets.

The drawback of this algorithm is that if each tag is attached to a large number of records and many tags are intersected, the join in the second layer consumes a lot of memory: it first produces one row for every (tag_id, news_id) pair of every requested tag, then groups this whole set by news_id and counts each group.
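The cost can be made visible by counting rows. In the sketch below (SQLite again standing in for MySQL, with synthetic data as an assumption: a "common" tag on 1,000 news items and a "rare" tag on every 100th one), the second layer's join yields 1,010 rows even though only 10 news items survive the HAVING filter. join_and_result_sizes is a hypothetical helper written for this illustration.

```python
import sqlite3


def join_and_result_sizes(conn, tag_names):
    """Rows produced by the second layer's join vs. news_ids kept by HAVING."""
    placeholders = ", ".join("?" for _ in tag_names)
    joined = f"""
        SELECT c.news_id AS news_id, c.tag_id AS tag_id
        FROM (SELECT tag_id FROM tags WHERE tag_name IN ({placeholders})) b
        JOIN newstags c ON b.tag_id = c.tag_id
    """
    # Size of the intermediate set before GROUP BY.
    join_rows = conn.execute(
        f"SELECT COUNT(*) FROM ({joined}) j", tag_names).fetchone()[0]
    # Number of news_ids that carry every requested tag.
    matches = conn.execute(
        f"""SELECT COUNT(*) FROM (
                SELECT news_id FROM ({joined}) j
                GROUP BY news_id
                HAVING COUNT(DISTINCT tag_id) = ?
            ) g""",
        [*tag_names, len(tag_names)]).fetchone()[0]
    return join_rows, matches


# Synthetic data (assumed): "common" is on all 1,000 news items, "rare" on 10.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT UNIQUE);
    CREATE TABLE newstags (tag_id INTEGER, news_id INTEGER,
                           PRIMARY KEY (tag_id, news_id));
    INSERT INTO tags VALUES (1, 'common'), (2, 'rare');
""")
conn.executemany("INSERT INTO newstags VALUES (1, ?)", [(i,) for i in range(1, 1001)])
conn.executemany("INSERT INTO newstags VALUES (2, ?)", [(i,) for i in range(100, 1001, 100)])
print(join_and_result_sizes(conn, ["common", "rare"]))  # (1010, 10)
```

The intermediate size is the sum of the per-tag record counts, so a single very common tag dominates the work regardless of how selective the other tags are.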

If you want to push search efficiency and memory consumption to the extreme, you can make the following changes. The idea is as follows:

1. Change the tag table by adding a reference-count field, for example: Tag table: {tag_id, tag_name, links} (links can effectively be regarded as a manually maintained index).

2. Maintaining this field is more troublesome. For efficiency, links is usually refreshed periodically with a full-table update: if records are frequently inserted into or deleted from the news table, you can recount the references of every tag early each morning and update the links field.

3. Build a dynamic multi-layer SQL statement, that is, generate an n-layer nested query according to the number of tags.
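The periodic full-table refresh of links in step 2 can be a single correlated UPDATE. The sketch below runs it on SQLite as a stand-in for MySQL; the exact column definition of links and the sample data are assumptions, and scheduling the job (for example with cron or MySQL's event scheduler) is left out.

```python
import sqlite3

# Assumed schema: the tag table with the extra "links" column from step 1.
SETUP = """
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT UNIQUE,
                       links INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE newstags (tag_id INTEGER, news_id INTEGER,
                           PRIMARY KEY (tag_id, news_id));
    INSERT INTO tags (tag_id, tag_name) VALUES (1, 'tag1'), (2, 'tag2'), (8, 'tag8'), (9, 'tag9');
    INSERT INTO newstags VALUES (1, 1), (2, 1), (1, 2), (8, 2), (8, 3), (9, 3), (1, 3);
"""

# Full-table refresh: recount every tag's references in one statement.
# A tag no longer used by any news item gets 0, since COUNT(*) over no rows is 0.
REFRESH_LINKS = """
    UPDATE tags
    SET links = (SELECT COUNT(*) FROM newstags nt WHERE nt.tag_id = tags.tag_id)
"""


def refresh_links(conn):
    """Recompute links for all tags inside one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(REFRESH_LINKS)


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
refresh_links(conn)
print(dict(conn.execute("SELECT tag_name, links FROM tags ORDER BY tag_id")))
```

Because links is only refreshed on a schedule, it is an approximation between runs; for ordering tags by selectivity that is usually good enough.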

Implementation principle (the SQL statement itself is not written out here):

1. Look up the tag_ids for the given tag_names and sort them by links in ascending order.

2. Take the first tag_id (the one with the fewest references) and look up its rows in the Newstags table; the resulting news_ids form the smallest possible starting record set. Then repeat this step with the next tag_id against that record set, so the set keeps shrinking. Once all tags have been matched, you have found, within the smallest possible range, all records that carry every one of these tag_ids.
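Since the article leaves the SQL unwritten, here is one possible sketch of it, assuming the n-layer nesting is done with IN subqueries: the hypothetical helpers build_nested_query and find_news sort the tag_ids by links, put the rarest tag in the innermost layer, and wrap one layer per additional tag. SQLite stands in for MySQL, and the sample data with its links values is assumed. Whether the database really evaluates the innermost layer first is up to its query optimizer (MySQL, for example, may rewrite IN subqueries as semi-joins), so this expresses the intended order rather than guaranteeing it.

```python
import sqlite3


def build_nested_query(tag_ids):
    """Hypothetical helper: one nested IN-subquery layer per tag_id.

    tag_ids must already be sorted by links ascending, so the rarest tag
    ends up in the innermost layer.
    """
    sql = "SELECT news_id FROM newstags WHERE tag_id = ?"
    params = [tag_ids[0]]
    for tag_id in tag_ids[1:]:
        # Wrap the previous layer: keep only its news_ids that also carry tag_id.
        sql = f"SELECT news_id FROM newstags WHERE tag_id = ? AND news_id IN ({sql})"
        params = [tag_id] + params  # this layer's "?" precedes the inner ones in the text
    return f"SELECT title FROM news WHERE news_id IN ({sql})", params


def find_news(conn, tag_names):
    """Hypothetical helper: titles of news items carrying every name in tag_names."""
    names = sorted(set(tag_names))
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    # Step 1: tag_ids for the given names, fewest references (links) first.
    rows = conn.execute(
        f"SELECT tag_id FROM tags WHERE tag_name IN ({placeholders}) "
        "ORDER BY links, tag_id", names).fetchall()
    if len(rows) < len(names):
        return []  # an unknown tag name means no news item can match all tags
    # Step 2: narrow the record set layer by layer, starting from the rarest tag.
    sql, params = build_nested_query([row[0] for row in rows])
    return sorted(row[0] for row in conn.execute(sql, params))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT UNIQUE,
                       links INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE news (news_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE newstags (tag_id INTEGER, news_id INTEGER,
                           PRIMARY KEY (tag_id, news_id));
    INSERT INTO tags VALUES (1, 'tag1', 3), (2, 'tag2', 1), (8, 'tag8', 2), (9, 'tag9', 1);
    INSERT INTO news VALUES (1, 'News A'), (2, 'News B'), (3, 'News C');
    INSERT INTO newstags VALUES (1, 1), (2, 1), (1, 2), (8, 2), (8, 3), (9, 3), (1, 3);
""")
print(find_news(conn, ["tag1", "tag8", "tag9"]))
```

For tag1, tag8 and tag9, the innermost layer starts from tag9 (links = 1), which leaves a single candidate before tag8 and tag1 are checked.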

The core of the algorithm is to obtain the smallest record set in the first step (so that subsequent comparisons are reduced as much as possible) and then keep shrinking the record set until all comparisons are done.
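The effect of starting from the smallest set can be shown without a database. This plain-Python sketch narrows a candidate set one tag at a time in a given order and counts one membership check per remaining candidate per step; the three news_id sets are synthetic assumptions and progressive_intersection is a hypothetical helper. Both orders return the same 10 news_ids, but smallest-first needs 20 checks while largest-first needs 1,100.

```python
def progressive_intersection(id_sets):
    """Narrow the candidate set one tag at a time, in the given order.

    Returns (matching ids, number of membership checks performed).
    """
    result = set(id_sets[0])
    checks = 0
    for ids in id_sets[1:]:
        if not result:
            break  # nothing left to narrow
        checks += len(result)  # one lookup per remaining candidate
        result = {news_id for news_id in result if news_id in ids}
    return result, checks


# Synthetic news_id sets per tag (assumed for illustration).
common = set(range(1, 1001))       # 1,000 news items
medium = set(range(10, 1001, 10))  # 100 news items
rare = set(range(100, 1001, 100))  # 10 news items

for label, order in (("smallest first", [rare, medium, common]),
                     ("largest first", [common, medium, rare])):
    ids, checks = progressive_intersection(order)
    print(label, len(ids), checks)
```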

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
