Attilax Summary of the implementation of packet aggregation groupby for atitit data storage

Source: Internet
Author: User

Attilax Summary of the implementation of packet aggregation groupby for atitit data storage

1. Aggregation Operations 1

1.1. A, scalar aggregation Stream aggregation 1

1.2. b, hash aggregation 2

1.3. All optimal plan choices are based on existing statistics to evaluate 3

1.4. Reference 3

1. Aggregation Operations

aggregation is what we're writing . T-SQL statements are often encountered, let us analyze the characteristics of some commonly used aggregate operators and the items that can be optimized.

1.1. a, scalar aggregation Stream Aggregation

scalar aggregation is a common way of aggregating data, such as the following aggregate functions used in the statements we write: MAX (),MIN (), AVG (),COUNT (),SUM ()

The output of these data result items above is basically generated by the flow aggregation, and this operator is also known as: Scalar aggregation

in fact, Stream Aggregation The most common way for this algorithm is to group ( GROUP BY) calculates that the scalar calculations above also take advantage of this feature, except that the whole is formed into a large group to be aggregated.

I understand it through the following code

Clear the current aggredate results

Clear The current group by columns for Each input row

Begin

If the input row does not match the current group by columns

Begin

output The current aggreagate results (if any )

Clear the current aggreagate results

Set the current group by columns to the input row

End

Update the aggregate results with the input row

End

The flow aggregation operator is a simple process that maintains an aggregation group and aggregated values, scans the data in the table sequentially, ignores if the aggregation group can not match, and if so, joins the aggregation group and updates the aggregated value result item.

As an example,

SELECT Shipaddress,shipcity,count (*)

From Orders

GROUP by Shipaddress,shipcity

So here we have summed up an optimization method for flow aggregation: avoid sorting as much as possible, but avoiding sorting requires grouping (groupby) fields within the index coverage.

1.2. b, hash aggregation

The method of flow aggregation above needs to be ordered in advance, we know that sorting is a very large consumption process, so it is not suitable for large table grouping aggregation operation, in order to solve this problem, another kind of aggregation operation is introduced: hash aggregation

The so-called hash aggregation internal method is the same as the hash connection mechanism mentioned earlier in this article.

Hash aggregation does not require sorting and excessive memory consumption, and it is easy to execute the plan in parallel, leveraging multiple CPU synchronization, but one drawback is that the process is blocked, and that the hash aggregation does not produce any results until the full input.

Therefore, it is a good application scenario to use hash aggregation in large data tables.

We know that sequencing is a very resource-intensive thing, but using hash matching requires only a different column value to be extracted, so there is no doubt that the hash matching algorithm is the notch above algorithm here compared to performance.

1.3. all optimal planning choices are based on existing statistical information to assess

and the above about these two columns of content distribution type How does SQL Server know? This is where the powerful statistics of SQL Server are supported.

in the SQL Server is not a fixed statement that will form a specific plan, and the resulting specific plan is not always optimal, which is related to the content distribution, the amount of data, the type of data in the database's existing data table, and the statistical information that records these details.

all optimal planning choices are based on existing statistical information to assess , if our statistical information is not updated in a timely manner, then the optimal execution plan will not be the best, and sometimes the worst.

1.4. References

SQL Server Tuning Series Basics (Summary of common operators-three ways to analyze physical connections) - finger tips - Blog Park . html

author ::  Nickname : Old Wow's paw claw of Eagle idol wreck iconoclast image-smasher

Bird catcherWang"Bird Catcher  King of kings of Kings devout pious religiousFaithDefenders DefenderOf the Faith. Kara CarlaRed Cloak CaracallaRed Cloak

abbreviation: : Emir Attilax Akbar Emil Attilax Akbar

Full Name::EmirAttilax AkbarbinMahmud bin Attila binSolomon Al Rapanui

Emil Attilax Akbar this Mahmoud this Attila Solomon    al Rapa Nui  

common name : Span style= "font-family: Song body; Color: #333333; font-size:12pt "> Etila ( Ayron"      Email:[email protected]

reprint Please indicate source: attilax 's Column http://www.cnblogs.com/attilax/

--atiend

Attilax Summary of the implementation of packet aggregation groupby for atitit data storage

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.