Actual combat: How about the second billion data?

Source: Internet
Author: User
actual combat: How the billions of data in seconds.

Author: Hangwei
Original: http://www.cnblogs.com/hangwei/p/4399238.html


Data background


First of all, the project is Siemens China in the implementation of the deployment of the MES project, because the project is operating on the production line (3 years+), data accumulation is very large. In the project database, about billions of data table has more than 5, TENS data table more than 10, millions data table, a lot of ...


(historical issues, the implementation of unmanned supervision, unmanned monitoring of the database this performance problem.) PS: I just got into the job soon ...) Not much to say, directly affixed to Siemens China's developers in our development of the SSRS report in the SQL statement:


SELECT distinct B.materialid as matl_def_id, c.descript, case if right (B.mesorderid, a) < ' 001000000000 ' then right ( B.mesorderid, 9)
else right (B.mesorderid,) end as pom_order_id, A.lotname, a.sourcelotname as Comlot,
E.defid as Commaterials, e.descript as Commatdes, D.vendorid, D.datecode,d.snnote, B.onplantid,a.sncust
From
(
Select M.lotname, M.sourcelotname, M.opetypeid, m.operationdate,n.sncust from View1 m
Left join Co_sn_link_customer as N on N.snmes=m.lotname
where
(M.lotname in (select Val from fn_string_to_table (@sn, ', ', 1)) or (@sn) = ") and
(M.sourcelotname in (select Val from fn_string_to_table (@BatchID, ', ', 1)) or (@BatchID) = ')
and (N.sncust like '% ' + @SN_ext + '% ' or (@SN_ext) = ')
) A
Left Join
(
SELECT * from Table1 where sntype = ' INTSN '
and snrulename = ' productsnrule '
and onplantid= @OnPlant
) b on b.sn = A.lotname
INNER JOIN Mmdefinitions as C on c.defid = B.materialid
Left join Table1 as D on d.sn = A.sourcelotname
INNER JOIN Mmdefinitions as E on e.defid = D.materialid
Where NOT EXISTS (
Select distinct Lotname, sourcelotname from Elcv_assemble_ops
where lotname = a.sourcelotname and sourcelotname = A.lotname
)
and (D.datecode in (select Val from fn_string_to_table (@DCode, ', ', 1)) or (@DCode) = ")
and (D.snnote like '% ' + @SNNote + '% ' or (@SNNote) = ')
and (case is right (B.mesorderid, 9) < ' 001000000000 ' then right (B.mesorderid,)
else Right (B.mesorderid, +) end) in (select Val from fn_string_to_table (@order_id, ', ', 1)) or (@order_id) = ')
and (E.defid in (select Val from fn_string_to_table (@comdef, ', ', 1)) or (@comdef) = ")


--view1 is a nested two-layer view (for confidentiality, the actual name may be different), there is an billions of data in the table and a few Tens data tables do left connection query
--table1 is a table with more than 15 million data records


This query statement, actually through my detection and investigation, the front-end of the B/s system can not find the results, half an hour, an hour .... Because I looked directly at SQL Query Analyzer, there was no result for half an hour.


(The reason is that in the face of an million-meter data sheet and 3 Tens data sheets to do a full-table scan query) not feeling, Siemens China's quality (or sense of responsibility) so.


The following is my analysis and take the detour (thinking misunderstanding), I hope you are also alert.


Exploration and misunderstanding


First, the index of the related table is not built, and the index is built.


Once the index is complete, the situation is still the same, and the query speed has barely improved. Later, I think of the related tens data above the table, have not established table partition. Then consider establishing a table partition and data replication scheme.


Here it is necessary to explain: our report is a dedicated database server, the data from the production line subscription. is often called "Read and write separation."


If you create a table partition directly on the original table, you will find that the thing that executes the table partition is directly deadlocked. The reason: Table partitioning operation itself will lock the table, the production line is still pushing data, so it is easy to "block", "deadlock".


I think the good plan is: Create a new table (empty table), build a table partition on the new table, and then copy the data.


I'm going to do this. Wait a minute. I seem to have entered a serious misunderstanding.


Analysis: The original SQL statement and business requirements, is the production line of data to do product and serial number of the trace, the key is that there is no regular "conditions" in the query conditions (such as date, number), rushed to do the table partition, here almost meaningless. Instead, it degrades query performance.


Good risk. The first step is to do a SQL statement analysis.


I. Analysis of the original SQL statements


1. The Where condition of the query statement, with a large number of fragments of @var in ... or (@var = ")

2. Where condition has like '% ' + @var + '% '

3. Where condition has case ... End function

4. Multiple connections to the same table query, and the use of their own nested view chart, is not necessary, whether or not to replace.

5. The SQL statement has an * number, and the view also has an * number appearing


Two. Optimized design


The first is to rewrite with stored procedures, the advantage is flexible design.


The core idea is to get a temporary table with one or more query conditions (at least one for the query criteria), and to update the temporary table for each query condition if the collection is found, and then simply determine if the temporary table has a value. And so on, you can create multiple temporary tables to summarize the query criteria.



  


There are at least two advantages to doing this:


1. Omit the judgment of the variable = @var or (@var = ");

2. Discard SQL stitching to improve code readability.


Then there is the writing of stored procedures, the process to note:


1. Try to find a way to use temporary table scan instead of full table scan;

2. Discard in and not statements, using exists and not exists substitution;

3. And the customer to confirm that the fuzzy query is necessary, if not necessary, remove the like statement;

4. Pay attention to the establishment of appropriate, consistent with the scene of the index;

5. Step on the "*" number;

6. Avoid functional operation of the field in the Where condition;

7. For reports with low real-time requirements, allow dirty reads (with (NOLOCK)).


Three. Stored Procedures


If you want to refer to the details of the optimized design fragment, see the SQL code:


Because the code section is too long, the length of this article is limited, please go to http://www.cnblogs.com/hangwei/p/4399238.html view


While sacrificing the readability of the code, it creates a performance value. My level is limited, also please advise.


Finally, after you replace the SSRS report with this stored procedure, SQL Query Analyzer is second-check. b/S front end spents 1-2 seconds.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.