Troubleshooting data duplication problems caused by innerjoin paging

Source: Internet
Author: User
Tags repetition

August 9, 2016 The beautiful Tanabata morning, I came to the company refreshed. After a while, customer service house mm I, said a VIP store orders in the export report an order has duplicate lines. So I quickly started to find out the problem. After a day of careful follow-up (and of course the interruption of all sorts of things), the cause of the problem is finally discovered ...

  

There is an order for the main form, and an order commodity table order_item; Order and Order_item are a one-to-many relationship: one of the order Order_no will correspond to multiple items item_id, and the item_id of an item only corresponds to an order order_no. So where's the problem? The experienced classmate may have guessed what the reason is, but let's take a step at a pace:

First of all, it must be a recurrence problem. Fortunately, after repeated execution, as the merchant said, the corresponding product line of the order was repeated. So, how to troubleshoot it? It is obvious to print the information for this order in different parts of the application, to see where it begins to repeat itself, to trace the source from one step to the other, and to see that there is a repetition from the source where the order number is obtained. I began to think that the order itself has a special place, so let the merchant only export this order, but no repetition; This indicates that the order is likely to be the result of an export with multiple orders. I also suspect that the duplicate line of this order has a certain detail place is not the same as the repetition, but after doing the comparison found exactly the same;

now it seems a little confusing. So, I think I should print the corresponding SQL and the information of the order appearing at the same time. after repeatedly running multiple times, printing the relevant information and looking closely, it was found that the order appeared once in the following two queries, the first of the first SQL results, and the last of the second SQL result. There is a problem with the boundary.

SelectO.order_no fromorder O, Order_item iwhereO.order_no=I.order_no and' O '. ' shop_id '= 14044068  and' O '. ' Book_time '>= 1470672001  and' O '. ' Book_time ' <= 1470705254  and' I '. ' goods_id '= 290864764 Order  byI.order_nodescLimit1400, -;SelectO.order_no fromorder O, Order_item iwhereO.order_no=I.order_no and' O '. ' shop_id '= 14044068  and' O '. ' Book_time ' >= 1470672001  and' O '. ' Book_time ' <= 1470705254  and' I '. ' goods_id '= 290864764 Order  byI.order_nodescLimit1350, -;

  

  I finally realized that it was probably a problem caused by Join paging. Is there a subtle Bug in code paging? Read the code, did not see the problem, the question came again: why the order alone will be repeated, the others do not repeat it? Why doesn't the search page repeat, and the export report repeats? As a matter of case, the order number is not special, then the other order number can also be duplicated, and the search and export common interface logic, repeated checks several limit X, 50, did not find the duplicate of the other order number; So, the focus is back to the two SQL.

What's the mystery? I even suspect that there is a gap between the primary key ID value, print out the ID value and see no special rules, quickly denied the idea. So I print out these two SQL results directly in the DBA interface system, looking at the repetitive orderno sort output on the interface, suddenly a flash of light, realized ...

I executed the following SQL, directly print out all the OrderNo under this query condition, and view the position of the duplicate order number.

SelectO.order_no fromorder O, Order_item iwhereO.order_no=I.order_no and' O '. ' shop_id '= 14044068  and' O '. ' Book_time ' >= 1470672001  and' O '. ' Book_time ' <= 1470705254  and' I '. ' goods_id '= 290864764 Order  byI.order_no;

  

Do you get it? This order number corresponds to two items, so it is found that the order number will appear two times, and these two times appear at the boundary of two queries. When the offset of 1399, detected once, at 1400 of the offset, detected once, therefore, this order number will be detected two times, and according to this order number to obtain additional information will be exported two times, the order export is each limit X, 50 will be incrementally exported to the report, Duplicate rows appear in the final report (which is returned to the front end in the search interface). The other order number is not problematic because the order number at the 50*x,50*x-1 is exactly one item, only once, thus avoiding duplicate lines. In fact, the probability of exporting a duplicate row of a report order is still relatively high, as long as in 50*x, 50*x-1 appear the order number is corresponding to multiple items, it is likely to have duplicate lines in the final report. This is still to see the Luck drop ~ ~

The cause of the problem is fixed, and the solution is simple. Change the Select O.order_no to select DISTINCT (O.ORDER_NO), and, of course, the corresponding count is also changed to count (Distinct (O.ORDER_NO)).

Experience Summary of Troubleshooting:

1. The focus of the troubleshooting issue is to find: location and cause;

2. Log is very important to play critical information log in critical path and critical state, better than ten dozen code analysis and definitely should be maybe. Critical paths will help you to eliminate errors, narrow down the problem, reduce the guesswork you need to make, and troubleshoot issues with certainty, and key information or keywords can quickly lock down the code location where the problem occurs.

3. Analyze the possible causes based on experience and then take technical measures to find the location and cause. The method of troubleshooting is usually : Reproduce the problem, repeat execution, static analysis, guess the reason, breakpoint operation;

4. Is the problem object specific? Probe individually for the object itself;

5. The borders must be valued, often in the most problematic places;

6. Let's go, sleep!

PS: Even the Tanabata ah ... Although there is no beauty to accompany, can not be such a joke ah ...

Troubleshooting data duplication problems caused by innerjoin paging

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.