Problems a classification algorithm must solve

Source: Internet
Author: User
In website construction, classification algorithms are used everywhere. Designing an online store involves classifying goods. Designing a publishing system involves classifying columns or channels. Designing a software download site involves classifying software. Classification is, in short, a very common problem.

I often interview programmers, and I almost always ask them about classification algorithms. Here are a few questions I ask often. Can you answer them easily? ^_^

1. Classification usually comes down to representing and traversing a tree. So: if you use one database table to represent a category tree, how many fields should it have?
2. How do you quickly rebuild the tree from this table?
3. How do you determine whether one category is a descendant of another?
4. How do you find all products in a category?
5. How do you generate the path of a category?
6. How do you add a new category?

These questions are not easy to answer if there is no limit on the number of levels or the number of categories. This article attempts to solve them.

Data structure of the classification

As we know, a classification is really a tree. You may have studied tree algorithms in a data structures course. Because websites rely heavily on databases, we will focus on how to store a tree in a database.

To simplify the problem, we assume each node stores only one piece of information: its name. We also need to number each node. There are many ways to do this. Databases commonly use an automatically generated number, which Access, SQL Server, and Oracle all support in some form. Call the number field ID.

To record that node ID1 is the parent of node ID2, we keep a field that says which category each node belongs to, and we call it FatherID. For ID2 here, FatherID is ID1. This gives the following table definition for the category catalog:

CREATE TABLE [catalog] (
    [ID] [int] NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
    [FatherID] [int] NOT NULL
);

Convention: top-level categories use -1 as their FatherID. Category -1 is a virtual category and has no record in the database.
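The adjacency-list table and the -1 convention can be tried out without a database server. The following is a minimal sketch in Python using the built-in sqlite3 module and an in-memory database. Python and SQLite are assumptions here, since the article itself targets Access, SQL Server, or Oracle through ASP. The category names are hypothetical sample data.

```python
import sqlite3

def make_catalog() -> sqlite3.Connection:
    """Build the catalog table in an in-memory SQLite database.

    The rows are hypothetical sample data, not taken from the article.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE catalog ("
        " ID INTEGER NOT NULL,"
        " Name TEXT NOT NULL,"
        " FatherID INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO catalog (ID, Name, FatherID) VALUES (?, ?, ?)",
        [
            (1, "Books", -1),      # top level: parent is the virtual root -1
            (2, "Software", -1),   # top level
            (3, "Fiction", 1),     # child of Books
            (4, "Computing", 1),   # child of Books
            (5, "Databases", 4),   # child of Computing, grandchild of Books
        ],
    )
    return conn

conn = make_catalog()

# The virtual root -1 has no row of its own...
root_rows = conn.execute("SELECT COUNT(*) FROM catalog WHERE ID = -1").fetchone()[0]
print(root_rows)  # 0

# ...but it is the FatherID of every top-level category.
top_level = [name for (name,) in conn.execute(
    "SELECT Name FROM catalog WHERE FatherID = -1 ORDER BY ID")]
print(top_level)  # ['Books', 'Software']
```

Each row stores only its own parent. Whole tree shapes, such as Books > Computing > Databases, exist only as chains of FatherID links, which is why the later steps need recursion.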

How to rebuild the tree

The biggest advantage of the catalog definition above is that it makes the classification tree easy to rebuild. To show the algorithm clearly, first consider a simpler question: how to display the direct children of a category. To query the direct children of category FID, the SQL statement is very simple:
SELECT Name FROM catalog WHERE FatherID = FID
To display these categories, we simply wrap each one in an <li> element:
<%
Rem oConn --- database connection; must already be open when GetChildren is called
Rem FID ----- number of the current category

Function GetChildren(oConn, FID)
    Dim strSQL, rsCatalog
    strSQL = "SELECT ID, Name FROM catalog WHERE FatherID = " & FID
    Set rsCatalog = oConn.Execute(strSQL)
%>
<ul>
<%
    Do While Not rsCatalog.EOF
%>
    <li><%= rsCatalog("Name") %></li>
<%
        rsCatalog.MoveNext    ' advance to the next row, otherwise the loop never ends
    Loop
%>
</ul>
<%
    rsCatalog.Close
End Function
%>
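For readers without an ASP server, here is a rough Python equivalent of the one-level GetChildren. It again assumes sqlite3 and hypothetical sample rows. Unlike the ASP version, it returns the HTML as a string instead of writing it to the response. It also binds FID as a query parameter rather than concatenating it into the SQL text.

```python
import html
import sqlite3

def make_catalog() -> sqlite3.Connection:
    """In-memory catalog with hypothetical sample rows (FatherID -1 means top level)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE catalog "
                 "(ID INTEGER NOT NULL, Name TEXT NOT NULL, FatherID INTEGER NOT NULL)")
    conn.executemany("INSERT INTO catalog VALUES (?, ?, ?)",
                     [(1, "Books", -1), (2, "Software", -1),
                      (3, "Fiction", 1), (4, "Computing", 1)])
    return conn

def get_children(conn: sqlite3.Connection, fid: int) -> str:
    """Return the direct children of category `fid` as an HTML <ul> list."""
    rows = conn.execute(
        "SELECT ID, Name FROM catalog WHERE FatherID = ? ORDER BY ID", (fid,))
    # html.escape guards against category names containing < or &
    items = "".join(f"<li>{html.escape(name)}</li>" for _id, name in rows)
    return f"<ul>{items}</ul>"

conn = make_catalog()
print(get_children(conn, 1))  # <ul><li>Fiction</li><li>Computing</li></ul>
```

Only one level is produced. Books' grandchildren would not appear here even if they existed.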

Now let's see how to display every category under FID, at all depths. This requires recursion, but all it takes is one more call inside GetChildren for each child: GetChildren oConn, rsCatalog("ID"). The query must also select the ID column so that each child's ID is available.
<%
Rem oConn --- database connection, already open
Rem FID ----- number of the current category

Function GetChildren(oConn, FID)
    Dim strSQL, rsCatalog    ' local, so each recursive call gets its own recordset
    strSQL = "SELECT ID, Name FROM catalog WHERE FatherID = " & FID
    Set rsCatalog = oConn.Execute(strSQL)
%>
<ul>
<%
    Do While Not rsCatalog.EOF
%>
    <li><%= rsCatalog("Name") %><% GetChildren oConn, rsCatalog("ID") %></li>
<%
        rsCatalog.MoveNext
    Loop
%>
</ul>
<%
    rsCatalog.Close
End Function
%>

The modified GetChildren displays every subcategory of FID, at any depth. To display the whole tree, call it with the virtual root -1:
<%
Rem strConn -- the database connection string; modify it as appropriate

Dim oConn
Set oConn = Server.CreateObject("ADODB.Connection")
oConn.Open strConn
GetChildren oConn, -1    ' -1 is the virtual root, so this renders the whole tree
oConn.Close
Set oConn = Nothing
%>
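The recursive version can be sketched the same way in Python. As before, sqlite3 and the sample rows are assumptions. The function issues one SELECT per category it visits, exactly like the recursive ASP GetChildren, and it is called with -1 to render the whole tree.

```python
import html
import sqlite3

def make_catalog() -> sqlite3.Connection:
    """In-memory catalog with hypothetical sample rows (FatherID -1 means top level)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE catalog "
                 "(ID INTEGER NOT NULL, Name TEXT NOT NULL, FatherID INTEGER NOT NULL)")
    conn.executemany("INSERT INTO catalog VALUES (?, ?, ?)",
                     [(1, "Books", -1), (2, "Software", -1), (3, "Fiction", 1),
                      (4, "Computing", 1), (5, "Databases", 4)])
    return conn

def get_children(conn: sqlite3.Connection, fid: int) -> str:
    """Render all descendants of `fid` as nested <ul> lists, one SELECT per category."""
    rows = conn.execute(
        "SELECT ID, Name FROM catalog WHERE FatherID = ? ORDER BY ID", (fid,)
    ).fetchall()
    if not rows:
        return ""  # leaf category: emit nothing instead of an empty <ul></ul>
    items = "".join(
        f"<li>{html.escape(name)}{get_children(conn, child_id)}</li>"  # recurse
        for child_id, name in rows
    )
    return f"<ul>{items}</ul>"

conn = make_catalog()
# -1 is the virtual root, so this renders the whole tree.
print(get_children(conn, -1))
```

One small design difference from the ASP code is that leaf categories produce no empty `<ul></ul>` pair. Otherwise, the traversal order and the number of queries are the same.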

How to find all products in a category
Now we turn to the fourth question raised earlier. The third question is left as an exercise. Assume the product table is defined as follows:

CREATE TABLE [product] (
    [ID] [int] NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
    [FatherID] [int] NOT NULL
);

Here ID is the product number, Name is the product name, and FatherID is the category the product belongs to.

For the fourth question, the obvious approach is to find all subcategories of the category and then query the products in every one of them. Implementing this is actually quite involved. The code is roughly as follows:

<%
Function GetAllID(oConn, FID)
    Dim strTemp, strSQL, rsCatalog

    Rem Start with the category itself, so products filed directly under FID are included
    strTemp = CStr(FID)

    strSQL = "SELECT ID FROM catalog WHERE FatherID = " & FID
    Set rsCatalog = oConn.Execute(strSQL)
    Do While Not rsCatalog.EOF
        strTemp = strTemp & "," & GetAllID(oConn, rsCatalog("ID"))   ' recursive call
        rsCatalog.MoveNext
    Loop
    rsCatalog.Close

    GetAllID = strTemp
End Function

Rem strConn -- the database connection string; modify it as appropriate

Dim oConn, FID, strSQL, rsProduct
Set oConn = Server.CreateObject("ADODB.Connection")
oConn.Open strConn

Rem CLng forces a numeric value, so the query string cannot inject SQL
FID = CLng(Request.QueryString("FID"))

strSQL = "SELECT TOP 100 * FROM product WHERE FatherID IN (" & GetAllID(oConn, FID) & ")"
Set rsProduct = oConn.Execute(strSQL)
%>

<ul>
<%
Do While Not rsProduct.EOF
%>
    <li><%= rsProduct("Name") %></li>
<%
    rsProduct.MoveNext
Loop
%>
</ul>

<%
rsProduct.Close
oConn.Close
%>
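The same "collect every descendant ID, then query with IN" approach can be sketched in Python as follows. It assumes sqlite3 and hypothetical sample categories and products. SQLite has no TOP, so the sketch uses LIMIT instead. It also binds one `?` placeholder per ID instead of splicing the IDs into the SQL string.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """In-memory catalog and product tables filled with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    for table in ("catalog", "product"):
        conn.execute(f"CREATE TABLE {table} "
                     "(ID INTEGER NOT NULL, Name TEXT NOT NULL, FatherID INTEGER NOT NULL)")
    conn.executemany("INSERT INTO catalog VALUES (?, ?, ?)",
                     [(1, "Books", -1), (2, "Software", -1), (3, "Fiction", 1),
                      (4, "Computing", 1), (5, "Databases", 4)])
    # For products, FatherID is the category the product is filed under.
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(100, "A Novel", 3), (101, "SQL Primer", 5),
                      (102, "Algorithms Text", 4), (103, "Zip Tool", 2)])
    return conn

def get_all_ids(conn: sqlite3.Connection, fid: int) -> list:
    """Return fid followed by the IDs of all its descendants (one SELECT per category)."""
    ids = [fid]
    children = conn.execute(
        "SELECT ID FROM catalog WHERE FatherID = ? ORDER BY ID", (fid,)).fetchall()
    for (child_id,) in children:
        ids.extend(get_all_ids(conn, child_id))  # recursive call
    return ids

def products_in(conn: sqlite3.Connection, fid: int, limit: int = 100) -> list:
    """Names of up to `limit` products in category fid or any of its subcategories."""
    ids = get_all_ids(conn, fid)
    placeholders = ",".join("?" * len(ids))  # e.g. "?,?,?,?" for four IDs
    sql = (f"SELECT Name FROM product WHERE FatherID IN ({placeholders}) "
           "ORDER BY ID LIMIT ?")  # LIMIT plays the role of TOP 100
    return [name for (name,) in conn.execute(sql, (*ids, limit))]

conn = make_db()
print(get_all_ids(conn, 1))  # [1, 3, 4, 5]
print(products_in(conn, 1))  # ['A Novel', 'SQL Primer', 'Algorithms Text']
```

Binding one placeholder per ID runs into the statement-size problem the article raises below. SQLite, for example, caps the number of bound parameters per statement, and the default cap was 999 before version 3.32.0.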

This algorithm has many drawbacks. Here are a few of them:

1. Because the algorithm queries the database once for every subcategory under FID, it becomes very expensive when there are many categories. It also builds a very large strSQL. With 1,000 categories, the IN list alone can hold 1,000 IDs, and it is questionable whether such a statement can be executed at all.

2. IN clauses with long value lists are known to perform poorly in SQL, and this algorithm cannot avoid one.
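Drawback 1 is easy to measure. The sketch below assumes Python's sqlite3 and a hypothetical flat catalog of 1,000 categories, made up of one top-level category with 999 children. It counts how many SELECT statements the recursive GetAllID approach issues and how long the resulting IN list becomes. CountingConn is a made-up helper for illustration only.

```python
import sqlite3

class CountingConn:
    """Hypothetical helper: wraps a sqlite3 connection and counts execute() calls."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.queries = 0

    def execute(self, sql: str, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)

def get_all_ids(db, fid: int) -> list:
    """Same recursive traversal as GetAllID: fid plus every descendant ID."""
    ids = [fid]
    for (child_id,) in db.execute(
            "SELECT ID FROM catalog WHERE FatherID = ?", (fid,)).fetchall():
        ids.extend(get_all_ids(db, child_id))
    return ids

raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE catalog "
            "(ID INTEGER NOT NULL, Name TEXT NOT NULL, FatherID INTEGER NOT NULL)")
n = 1000
# Hypothetical flat catalog: category 1 is top level, categories 2..n are its children.
raw.executemany("INSERT INTO catalog VALUES (?, ?, ?)",
                [(1, "Root", -1)] + [(i, f"Category {i}", 1) for i in range(2, n + 1)])

db = CountingConn(raw)
ids = get_all_ids(db, 1)
in_list = ",".join(str(i) for i in ids)  # the text GetAllID would splice into strSQL

print(db.queries)    # 1000  (one SELECT per category)
print(len(ids))      # 1000  (values in the IN list)
print(len(in_list))  # 3892  (characters in the IN list alone)
```

The flat shape is deliberate. A chain of 1,000 nested categories would issue the same number of queries, but it would recurse 1,000 levels deep, which would likely exceed Python's default recursion limit of 1,000.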

I find that more than 80% of programmers favor this kind of algorithm and use it heavily in many systems. The careful ones notice that their programs are slow but cannot find the cause. They repeatedly check how efficiently their SQL executes and upgrade their machines, yet the gain is very small.

The fundamental problem is the algorithm itself: once the algorithm is fixed, there is little room left for optimization. Next we introduce an algorithm that is more than 10 times as efficient as the one above.

