Optimizing Program Performance (I)

Last Update:2018-12-05 Source: Internet

Author: User


Preface

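A polynomial in one variable x with N coefficients a_0, a_1, ..., a_(N-1) is defined as:

p(x) = a_0 + a_1 x + a_2 x^2 + ... + a_(N-1) x^(N-1)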

Polynomials are the simplest smooth curves. "Simple" means that a polynomial is built only from multiplication and addition. "Smooth" matches the everyday sense of the word; in mathematical terms, a polynomial is infinitely differentiable, that is, its derivatives of every order exist. In fact, the derivative of a polynomial is itself a polynomial. Simplicity and smoothness give polynomials an important role in numerical analysis, graph theory, and computer graphics. Polynomial evaluation is a core technique for solving many problems. In numerical analysis, for example, mathematical libraries often use polynomials to compute approximate values of trigonometric functions.
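As a small illustration of the last point, the sketch below approximates sin(x) with its degree-7 Taylor polynomial. The function name `sin_taylor7` and the choice of a Taylor polynomial are assumptions made for this example; real math libraries typically combine range reduction with more carefully tuned coefficients, so this version is only accurate for small |x|.

```c
/* Hypothetical example: approximate sin(x) near 0 with the degree-7
   Taylor polynomial x - x^3/3! + x^5/5! - x^7/7!. */
double sin_taylor7(double x)
{
    /* c[i] is the coefficient of x^i; even powers vanish for sine. */
    static const double c[8] = {
        0.0, 1.0, 0.0, -1.0 / 6.0, 0.0, 1.0 / 120.0, 0.0, -1.0 / 5040.0
    };
    double result = 0.0, p = 1.0;   /* p holds x^i */
    for (int i = 0; i < 8; i++, p *= x)
        result += c[i] * p;
    return result;
}
```

For x = 0.5 the truncation error of this polynomial is below 1e-8; it grows quickly as |x| increases.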

Now, let's use the C language to write a function that evaluates a polynomial.

Direct Algorithm

Following the definition of a polynomial, we can compute its value directly with a loop:

double poly(double a[], double x)
{
  double result = 0, p = 1;
  for (int i = 0; i < N; i++, p *= x) result += a[i] * p;
  return result;
}

This algorithm requires 2N multiplications and N additions.
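The operation count can be checked with an instrumented variant of the loop. This is a minimal sketch: the name `poly_counted`, the explicit length parameter `n`, and the two counters are additions for illustration and are not part of the original function.

```c
#include <assert.h>

/* Direct evaluation of a[0] + a[1]*x + ... + a[n-1]*x^(n-1),
   instrumented to count floating-point operations.
   The counters and the explicit length n are illustrative additions. */
double poly_counted(const double a[], int n, double x, long *muls, long *adds)
{
    assert(n >= 0 && muls && adds);
    double result = 0, p = 1;   /* p holds x^i */
    *muls = *adds = 0;
    for (int i = 0; i < n; i++) {
        result += a[i] * p;     /* one multiplication, one addition */
        p *= x;                 /* one more multiplication */
        *muls += 2;
        *adds += 1;
    }
    return result;
}
```

With a = {1, 2, 3} and x = 2, the function returns 17 and reports 6 multiplications and 3 additions, matching 2N and N for N = 3.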

Qin Jiushao's Algorithm (Horner's Method)

We can reduce the number of multiplications by using Qin Jiushao's algorithm, better known outside China as Horner's method, which rewrites the polynomial in nested form:

p(x) = a_0 + x (a_1 + x (a_2 + ... + x (a_(N-2) + x a_(N-1)) ...))

The corresponding C-language program is as follows:

double polyh(double a[], double x)
{
  double result = 0;
  for (int i = N - 1; i >= 0; i--) result = result * x + a[i];
  return result;
}

This algorithm requires N multiplications and N additions. The number of multiplications is half that of the direct algorithm.
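To see how the nested evaluation proceeds, the sketch below records every intermediate value of `result`. The name `polyh_trace`, the length parameter `n`, and the `steps` output array are hypothetical additions for this example; the loop body itself mirrors polyh.

```c
#include <assert.h>

/* Horner-style evaluation that also stores each intermediate result.
   steps must have room for n values; steps[k] is the value of result
   after processing coefficient a[n-1-k]. */
double polyh_trace(const double a[], int n, double x, double steps[])
{
    assert(n >= 0);
    double result = 0;
    for (int i = n - 1; i >= 0; i--) {
        result = result * x + a[i];   /* one multiplication, one addition */
        steps[n - 1 - i] = result;
    }
    return result;
}
```

With a = {1, 2, 3, 4} and x = 2, the recorded values are 4, 11, 24, and 49, and 49 is indeed 1 + 2*2 + 3*4 + 4*8.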

It seems that by using Qin Jiushao's algorithm, we have greatly improved the program's performance.

Let the facts speak: let's run some tests.

Test Program

The following test program, poly.c, compares the performance of the two algorithms:

#include <stdio.h>
#include <time.h>

#define N 23456789

void initialize(double a[])
{
  for (int i = 0; i < N; i++) a[i] = i - 12345678.9012345;
}

double poly(double a[], double x)
{
  double result = 0, p = 1;
  for (int i = 0; i < N; i++, p *= x) result += a[i] * p;
  return result;
}

double polyh(double a[], double x)
{
  double result = 0;
  for (int i = N - 1; i >= 0; i--) result = result * x + a[i];
  return result;
}

int main(int argc, char *argv[])
{
  static double a[N];
  initialize(a);
  double result = 0, (*func)(double*, double) = (argc > 1) ? polyh : poly;
  clock_t elapsed = clock();
  for (int i = 0; i < 1234; i++) result += func(a, 2.34 / (i - 1234567) - 1);
  elapsed = clock() - elapsed;
  printf("%p %g %11f\n", func, result, (double)elapsed / CLOCKS_PER_SEC);
}

This test program compares the two algorithms by evaluating a polynomial with more than 23 million terms (N = 23456789) 1234 times, each time with a different value of x. The parameters are chosen carefully so that no floating-point overflow occurs during evaluation. In C, a double-precision floating-point number typically occupies eight bytes, so the program needs about 180 MB of memory to run.
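The memory estimate can be checked with a short calculation. This is a minimal sketch that assumes 8-byte doubles, as stated in the text; the helper names `array_bytes` and `bytes_to_mib` are made up for this example.

```c
#include <stddef.h>

/* Bytes occupied by an array of n doubles. */
size_t array_bytes(size_t n)
{
    return n * sizeof(double);
}

/* Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes). */
double bytes_to_mib(size_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}
```

For N = 23456789 this gives 187,654,312 bytes, which is about 188 MB in decimal units or about 179 MiB.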

Test Results

We tested both the 32-bit Windows operating system and the 64-bit Linux operating system. The results are as follows:

D:\work>cl /O2 poly.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

poly.cpp
Microsoft (R) Incremental Linker Version 10.00.40219.01
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:poly.exe
poly.obj

D:\work>poly
00871000 1.4271e+029  107.531000
D:\work>poly h
00871080 1.4271e+029  127.686000
D:\work>poly h
00871080 1.4271e+029  127.686000
D:\work>poly
00871000 1.4271e+029  106.860000
D:\work>poly
00871000 1.4271e+029  107.935000
D:\work>poly h
00871080 1.4271e+029  128.662000

ben@vbox:~/work> gcc -std=c99 -O2 poly.c
ben@vbox:~/work> ./a.out
0x400640 1.4271e+29  125.240000
ben@vbox:~/work> ./a.out
0x400640 1.4271e+29  124.820000
ben@vbox:~/work> ./a.out h
0x400680 1.4271e+29  160.890000
ben@vbox:~/work> ./a.out h
0x400680 1.4271e+29  162.210000
ben@vbox:~/work> ./a.out
0x400640 1.4271e+29  125.450000
ben@vbox:~/work> ./a.out h
0x400680 1.4271e+29  162.230000

In the output above, the first column is the value of the func variable, that is, the address of the poly or polyh function. The second column is the sum of the polynomial values. The third column is the running time in seconds. The results are summarized in the following table:

Operating system     Windows (32-bit)           Linux (64-bit)
Algorithm            Direct       Qin Jiushao   Direct      Qin Jiushao
Function address     00871000     00871080      0x400640    0x400680
Sum of results       1.4271e+029  1.4271e+029   1.4271e+29  1.4271e+29
Run 1 (s)            107.531      127.686       125.24      160.89
Run 2 (s)            106.860      127.686       124.82      162.21
Run 3 (s)            107.935      128.662       125.45      162.23
Average (s)          107.442      128.011       125.17      161.78
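The averages in the table, and the relative slowdown they imply, can be reproduced with two small helpers. This is only a sketch; the function names `mean3` and `time_ratio` are hypothetical and chosen for this example.

```c
/* Arithmetic mean of three timings, in seconds. */
double mean3(double t1, double t2, double t3)
{
    return (t1 + t2 + t3) / 3.0;
}

/* How many times longer the slower run takes than the faster one. */
double time_ratio(double slower, double faster)
{
    return slower / faster;
}
```

Applied to the averages in the table, the ratio is about 1.19 on Windows (128.011 / 107.442) and about 1.29 on Linux (161.78 / 125.17), so the algorithm with fewer multiplications takes roughly 19% and 29% longer, respectively.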

Test Description

Surprised? On both 32-bit Windows and 64-bit Linux, repeated tests show that Qin Jiushao's algorithm is slower than the direct algorithm, and the running times of the same algorithm on the same operating system are very close from run to run. On our 64-bit Linux system, which runs inside VirtualBox, each algorithm is somewhat slower than on 32-bit Windows. In every test, regardless of algorithm or operating system, the sum of the polynomial values is the same. That is expected; if the sums differed, something would be wrong.

On Windows, the CPU usage during the test is about 50%, because the CPU is dual-core and the test program uses only one core.

While the test runs, there are 83 processes and the memory usage is 965 MB. After the test completes, the number of processes drops by one and the memory usage falls to 795 MB.

On Linux, VirtualBox assigns only one CPU to the guest operating system, so the test keeps that CPU almost fully busy, and the memory usage of the test program is about 179 MB.

Test Environment

The tests were run on a Dell trend 1520 with a single dual-core Intel Core 2 Duo CPU, running the Windows Vista Home Premium SP2 (32-bit) operating system.

The Linux tests were run on the openSUSE 12.1 (64-bit) operating system inside Oracle VM VirtualBox.

Conclusion

The discussion above shows that minimizing the number of arithmetic operations in a computation does not necessarily improve performance. The reason for this result is left to the next article.

