Generate a histogram-observe the histogram to help you understand the data mode

Last Update:2018-12-05 Source: Internet

Author: User

Tags integer division

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Itzik Ben-Gan

Similarly, with histograms, you can analyze the sampling of performance counters (such as CPU utilization and memory) set on the server in the network. For example, assume that you record the CPU usage of a network server every five minutes, and the value measured on a certain day changes between 21% and 100%. You can generate a histogram with four levels-20 to 40, 40 to 60, 60 to 80, and 80 to 100-to obtain the number of samples of each level. If your server is overloaded on that day, most samples will belong to the fourth level.

Generate performance counter Histogram

The following is a problem involving histograms. You can see if you can solve it before reading my solution. A scheduled SQL Server proxy job periodically records performance counters of a network server in a table named samples. Run the code shown in Listing 1 to create a samples table and populate it with data. After sampling (DT), each row of the table contains the measurement ID (measid), measurement value (value), and filtering columns (100 bytes) representing other columns in the table ). (For example, this table usually contains a serverid column, so that you can record measurement samples of multiple servers. In this example, the table contains only the data of one server .)

Assume that measid 1 is the percentage of CPU utilization, and measid 2 is the memory usage in MB. Your users need histograms to help them analyze performance data for a certain metric over a period of time. The user provides the following parameters: Level (@ numsteps), measurement ID (@ measid), and date range (@ fromdt-including the start date, @ Todt-excluding the end date ). Your task is to generate the histograms of these given parameters. Note that you do not need to include a level of 0 in the result. For example, assume that you provide the following parameters:

DECLARE @numsteps int, @measid  int, @fromdt datetime, @todt  datetimeSELECT @numsteps=5, @measid=1,  @fromdt='20030101', @todt='20030102'

In the samples table, the minimum measurement value of measid 1 is 26 and the maximum measurement value is 50 during the given period. In the @ numsteps parameter, the number of requests is 5. First, you need to calculate the lower limit and upper limit of each range within the five levels. Because the lower limit of the range is not included, the following range can be obtained after calculation: 25 to 30, 30 to 35, 35 to 40, 40 to 45, 45 to 50. Figure 1 shows the level and range of a given parameter.

You need to write code that tells you how many measurements each level has-in this example, your code should generate a measurement that matches Level 1 (26), two measurements that match level 2 (33, 35), and two measurements that match level 5 (47, 50 ). Figure 2 shows the output produced by using this set of specific parameters. The following are some solutions for this problem.

Solution 1: Use the level table

The first solution involves writing a query that generates the derived table Steps, which contains the level number and the value range of each level. After such a table is generated, the solution is simple. You must complete the following operations:ValueColumns join the derived table Steps to the Samples table, group the results based on the level number, and calculate the number of rows in each group. The most difficult part of this solution is to compile the query that generates the derived table. To generate a level number, you can use a secondary table named Nums, which has a range of 1 to <Max_number_of_possible_steps>. Run the script in Listing 2 to create a Nums auxiliary table and fill it with 1000 integers.

To calculate the lower limit and upper limit of each level range, You Need To concatenate two query results: one is for Nums query to return the level number, and the other is for Samples query, returns the minimum and maximum values. The generated query may be shown in listing 3. After running this query, you will notice in the results shown in Figure 3 that the query returns the same common minimum and maximum values for each level number. This is only the intermediate result of the solution. Later, you will use the regular minimum and maximum values with the level number to calculate the minimum and maximum values of the level.

Now, you need to replace the asterisks in the SELECT list with the returned expression. In addition, you need to replace the level number with the lower limit (not included) and upper limit (included) of the level range ). The level number is very simple, yesN.

The following code is used to construct an expression for calculating the lower limit:

mn + <step_size>*(n-1) - 1

Where,MnIs the standard minimum measurement value,NYes level number,Step_sizeIs the size of the coverage range of each level. The expression used to calculate step_size is as follows:

(mx-mn+1)/@numsteps

By combining these two expressions, you can obtain

mn + (mx-mn+1)/@numsteps*(n-1) - 1

If the obtained value range can always be divided by the number of levels, you can use the above expression, but this is not always the case in actual application. The following expression uses the numerical division and rounding methods, which is applicable to the value range that cannot be divisible by the number of levels:

mn + CAST(round(1.0*(mx-mn+1)/@numsteps*(n-1), 0) AS int) - 1

You can compare the first operand of the division operation with the value 1.0.1.0) To perform fractional Division (rather than integer division), which means that SQL server does not cut off the fractional part. Then, you need to round the result to get the upper and lower limits of the entire range. Then, SQL Server converts the rounded result to an integer to remove the meaningless zero after the decimal point. You can calculate the upper limit in a similar way:

mn + CAST(round(1.0*(mx-mn+1)/@numsteps*n, 0) AS int) - 1

The only difference between the above expression and the previous expression is that it converts the level size to the level number.(N)Multiply, insteadN-Multiply by 1. To test and calculate the complete query of the derived steps table, run the code in Listing 4. Note that the result is the same as Figure 1.

Please try the parameters in the code of this solution to test the impact of changing the number of levels on the results, and so on. Remember that the query returned to the level table is only part of the solution. Now, you can see the missing, expressed as <Query that calculates steps> Query section:

SELECT step, count(*) AS cntFROM Samples JOIN (<query that calculatessteps>) AS Steps  ON value > f AND value <= tWHERE measid = @measid  AND dt >= @fromdt AND dt < @todtGROUP BY step

Replace the missing part with the query shown in Listing 4. Listing 5 shows the complete solution, which generates the histogram shown in 2 that you need. For the request counters and time periods, among all the measurements divided into five levels, there is a measurement (26) within the range of the first level, there are two measurements (33, 35) belongs to the second level. Two measurements (47 and 50) belong to the fifth level.

If you like me to use modular development methods to simplify code and maintain it, You can compile a user-defined function (UDF ), this function takes the number of levels, measurement ID, and date range as parameters, and returns the steps table. Run the code shown in Listing 6 to create the fn_steps function.

To test the function, run the following code and verify that the result is 1:

SELECT * FROM fn_steps(1, 5, '20030101', '20030102')

Now, you can use the fn_steps table instead of the derived table steps, as shown in the code in listing 7. You will get the histogram 2.

Solution 2: Real-time computing level Number

There is another solution to this problem. Instead of generating a level table, this solution calculates the level number in real time. When using this solution, you do not need to use the level Number Auxiliary table. Note the FROM clause in listing 8 (page 18th. You will concatenate two derived tables-one table named S, containing rows from Samples that match the provided parameters, and the other table named R, contains the minimum measurement value and the size of the entire range. The following expression (written in the SELECT list) is used to calculate the level Number:

floor((value-mn) / (1.0*range/@numsteps)) + 1AS step

The logic used by this expression is similar to the logic you used to calculate the upper and lower limits in the previous solution. Expression(Value-mn)It can calculate the position in the range, and(1.0 * range/@ numsteps)The level can be calculated. Divide the second operand by the first operand and Add 1 to the result to obtain the level number. Since you have attached a level number (matching the condition from the Samples table) to each row, you can use this query in the derived table named RS, in this way, the results can be grouped by the calculated level number. Listing 8 shows the complete solution. In this solution, the external query groups the results by level number and calculates the number of rows for each level.

Performance

Compare the two solutions-one using the secondary table and modular method, and the other using the real-time computing level number-we can see that the code for the second solution is shorter and more concise. To find out which solution is better, you can use the script shown in listing 9 to enter 1 million rows into the Samples table for testing. These rows contain metric IDs ranging from 1 to 10. Each counter records one sample per minute (1440 samples per day) and the date range is from January 1, January 1-20, 2003 to January 1, March 11. I used a period of time ranging from one day to one month to compare the two solutions shown in listing 5 and listing 8. In my tests, both of these solutions are doing well, and the second solution is slightly better in terms of duration and I/O, so I prefer the second solution.

Error, comment, suggestion law | privacy | Advertisement

Go to the original English page

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More