SQL story Excerpt (iv) Preliminary ———— information mining

Source: Internet
Author: User
Tags filter date final insert interbase join joins query
The original idea of writing this book comes from the communication and learning with colleagues when working in the first company. But the trigger for releasing the book was a post on a report on the latest information on the CSDN. The questions in the post can be done in both subqueries and joins. Because of the conditions, I can not answer the details, the post of the friend can not understand my intention, let me regret. So I decided to put the idea of writing a book into action, and put this book on the CSDN, communicate with you and make progress together. Today, just to see the problem similar to the day, heart feeling, decided to discuss it in detail here.
In practice, we sometimes need to set up a data table to store the data that is changed, and to count the information we need. One of the problems is that the filter condition of the final result is from the data after the packet statistics. This kind of application is common in website database, accounting system, real time system, data warehouse and data mining. In fact, the proposition itself has included data mining. Now, let's look at the following example

Case 4-4-1: Latest Quote

Netizen Kikilyq asks:

I have a table:computer_price, the format is as follows:

Goods Price Dates

--------------------------------

HP Computer 20000 5.21

HP Computer 20050 5.23

NEC Computer 31200 5.3

NEC Computer 32000 5.5





Query results requirements: To find out the latest price of each computer;

The results of the table above are:

Goods Price Dates

---------------------------------

HP Computer 20050 5.23

NEC Computer 32000 5.5

Help, fix the problem?





According to the problem, we first set up a data table, after analysis, the table data should be marked by the name and date of the goods, so set the two fields as the main keyword:

The SQL Server version of the script is as follows

CREATE TABLE [dbo]. [Goods] (

[Goods] [Char] (a) COLLATE chinese_prc_ci_as not NULL,

[Price] [Money] Not NULL,

[Date_time] [DateTime] Not NULL,

PRIMARY KEY (goods, date_time)

) on [PRIMARY]

The InterBase version of the script is as follows

CREATE TABLE Goods

(

Goods CHAR (Ten) not NULL,

Price NUMERIC (4) is not NULL,

Date_time TIMESTAMP not NULL,

PRIMARY KEY (goods, date_time)

)

After creating the table, please insert the data yourself.

In this issue, the price of the computer in the final report is determined by the subsequent quote, which is the latest line of data for the quote date. Typically belong to the type mentioned earlier. Intuitively, I'm going to try to pick out the latest quote dates for each computer, which is simpler:

SELECT goods, MAX (date_time)

From goods

GROUP by goods

return results

Goods

---------- ------------------------------------------------------

HP 2002-05-23 00:00:00.000

NEC 2002-05-05 00:00:00.000

Obviously, as long as the various brands of computers on the above date of the price display, is the result we want. How about writing this straight?

SELECT goods, Price, MAX (date_time)

From goods

GROUP by goods

I think this statement does not have to try, a bit of experience programmers will find that the price column is not in the statistical function, nor the group by the column, the database system can not execute such statements. Such columns must be removed from another dataset, so the first thing I think of is a self-join. But I believe most friends will think of subqueries first. Now let's look at how the subquery is done, after all, it is more intuitive. The most lazy way is to directly express price as an inserted scalar quantum query:

SELECT L.goods,

(SELECT r.price from goods R WHERE r.goods = l.goods and R.date_time = MAX (l.date_time)) As Price,

MAX (l.date_time) as Current_date_time

From goods L

GROUP by L.goods

Unfortunately, this statement can only be executed in SQL Server, and the interbase prompt is invalid column reference (invalid column reference). But another way to write a generic version. Now, we say to the system, I'm going to take some rows out of the table, a line for each brand's computer, and the date is the latest quote date for this brand:

SELECT l.goods, L.price, L.date_time

From goods L

WHERE L.date_time =

(SELECT

MAX (R.date_time)

From goods R

WHERE r.goods = L.goods

GROUP by R.goods)

This one

I think it's possible to write a few different variants of the subquery, but it's not the same. This version appears to be somewhat unreliable because the main query has only date_time in the Where condition, and it does not seem to be able to accurately distinguish each row of data. But rest assured, here is a "weird" cross-reference, the main query's record to meet the date is equal to the return value of the subquery, and the subquery's cargo name (goods column) depends on the main query's cargo name (goods column). In this way, the subquery returns its correct date for the current brand, which is the stunt of the related subquery and causes it to be inefficient in many situations. My interest in subqueries ends here, as I said in the previous article, the join query is a good technique, so is it possible for this query to be implemented by joins? The previous analysis of subqueries here will help us to think. Now if we have two datasets, one has the biggest date, one has the price, put them in a joint, can not it? These two result sets are in the subquery above, the question now is how do we join them, obviously, there is a join condition is r.goods = L.goods, this also determines the final result set of one of the unique identification--goods column, and the filter conditions of the Date column copy L.date_ Time = MAX (r.date_time), plus the columns in the group by result set, then there is:

SELECT l.goods, L.price, L.date_time

From goods L

JOIN Goods R

On l.goods = R.goods

GROUP by L.price, L.goods, L.date_time

Having l.date_time = MAX (r.date_time)

Some friends may not understand this group by why, simply say. We made a goods of the table (two datasets are L and R respectively), a Cartesian product is generated. Some people call it the data explosion. But after the ore is blown up, it's convenient for us to find gold in it. With join conditions, each brand in the L data set corresponds to all of its quotation dates in the R dataset. Now we want to group the R dataset, select the latest quotation date for each brand, filter l data set should use

GROUP by R.date_time

Having l.date_time = MAX (r.date_time)

Since we're going to take three columns out of the L dataset, we're going to include them in group by, and because we have l.goods = R.goods this join condition, r.goods can not write in group by. Because of all the reasons I mentioned in the previous article, I personally prefer to use joins. Of course, in terms of efficiency, when there's a lot of data in each group, the efficiency of the join query is offset by a huge Cartesian product, where the subquery is used (if this is a table that records the real-time work of the production line, this is usually the case), and the data for each group in the table is very small, But a lot of data groups (such as large online bookstores, users may be an astronomical, relative to each user to buy the book is not much, the subquery will still be the main query for each row repeatedly traverse the entire table, the efficiency is too low, and the connection as long as the processing of the Cartesian product in advance, the cost of space in exchange for the advantages of time, The join query should then be used.

The return result sets for each of the above scenarios are:

Goods Price Date_time

---------- --------------------- ------------------------------------------------------

HP 20050.0000 2002-05-23 00:00:00.000

NEC 31000.0000 2002-05-5 00:00:00.000

Except as specified, the above scripts are available in Ms SQL Server7.0 above and InterBase6.0.1 versions. The method of designing the above script can be widely used in all kinds of SQL programming with statistic data as query condition. Starting with this, we can also discuss two interesting aspects: join queries and data groupings.

Attached: Originally this join query script should also be affixed to the kikilyq of the problem behind, but because I foolishly posted two should not post (a mistake, a paste heavy), CSDN Web page does not allow me to Kikilyq post followed. Please KIKILYQ come here and read it. I'm sorry.

About SQL Story: This book is intended to present some common problems in SQL programming solutions, summed up a number of relational database design and SQL language programming patterns, through practical problems to help readers improve database programming ability, this book will also dabble in the theoretical field of relational algebra. I plan to collect some typical SQL programming problems on the CSDN as an important part of this book. The authors ensure that no one's work is plagiarized and that all answers and analyses are by no means Isian. If there is a master to guide the maze, I will definitely in the text to point out and to thank. Welcome you to provide material, this first express gratitude. In addition, I would like to have an Oracle master to work with me to complete the Oracle part of this book. Share with me the joy of success and the hardships of labor.

The following is an outline of my preliminary "SQL Story", which means that the current examples of each article will be categorized in the appropriate place in the book, and may now have a change in the serial number arrangement.
Preface to the first chapter of basic principles one, before the code two, coding principles three, the SQL Script design principle Chapter Two, the truth of the relational database one, the set in the vector space Two, the information in the set three, set operation four, the real world The third chapter is not simple simple query one, the choice information two, the conditional filtration three, orderly and disorderly four, Information statistics and data mining the fourth chapter space "multiplication"--join query one, Cartesian product and information explosion two, higher dimensions of space three, join query four, join and where the fifth chapter of SQL "clause"--subquery, clause grammar two, cost evaluation three, flexible query sixth chapter of the collection of the written operation- Inert, delete and update one, design method two, delete three, insert four, rewrite the seventh Chapter Ultimate Weapon--cursor one, the cursor why two, cursors how to do three, the cursor stunt Four, the process of code eighth impossible task One, interesting problem collection two, infinite point set three, infinite information set four, Exploration Collection The Nineth chapter is not just sql--. Database management One, reasonable index two, persisted code--stored procedure, storage function and trigger three, view usage four, the reasonable way of data storage PostScript attached: alternative sound





Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.