Optimizing Program Performance (I)

Source: Internet
Author: User
Preface

We know that a polynomial is defined as:

$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{N-1} x^{N-1}$

Polynomials are the simplest smooth curves. "Simple" means that a polynomial consists only of multiplications and additions. "Smooth" matches the everyday sense of the word; in mathematical terms, a polynomial is infinitely differentiable, that is, all of its higher-order derivatives exist. In fact, the derivative of a polynomial is again a polynomial. Simplicity and smoothness give polynomials a significant role in numerical analysis, graph theory, and computer graphics. Polynomial evaluation is a core technique in solving many problems. In numerical analysis, for example, math libraries often use polynomial functions to approximate the values of trigonometric functions.

Now, let's write a C function that evaluates a polynomial.

Direct Algorithm

Compute the value directly with a loop, following the definition of a polynomial:

double poly(double a[], double x)
{
  double result = 0, p = 1;
  for (int i = 0; i < N; i++, p *= x) result += a[i] * p;
  return result;
}

This algorithm requires 2N multiplications and N additions.

Qin Jiushao's Algorithm (Horner's Method)

We can reduce the number of multiplications by using Qin Jiushao's algorithm, which is known as Horner's method in the West and to some people in China:

$p(x) = a_0 + x\,(a_1 + x\,(a_2 + \cdots + x\,(a_{N-2} + x\,a_{N-1})\cdots))$

The corresponding C-language program is as follows:

double polyh(double a[], double x)
{
  double result = 0;
  for (int i = N - 1; i >= 0; i--) result = result * x + a[i];
  return result;
}

This algorithm requires N multiplications and N additions, half as many multiplications as the direct algorithm.

It seems that by using Qin Jiushao's algorithm we have greatly improved the program's performance.

But let the facts speak: let's run some tests.

Test Program

The following test program, poly.c, compares the performance of the two algorithms:

#include <stdio.h>
#include <time.h>

#define N 23456789

void initialize(double a[])
{
  for (int i = 0; i < N; i++) a[i] = i - 12345678.9012345;
}

double poly(double a[], double x)
{
  double result = 0, p = 1;
  for (int i = 0; i < N; i++, p *= x) result += a[i] * p;
  return result;
}

double polyh(double a[], double x)
{
  double result = 0;
  for (int i = N - 1; i >= 0; i--) result = result * x + a[i];
  return result;
}

int main(int argc, char *argv[])
{
  static double a[N];
  initialize(a);
  double result = 0, (*func)(double*, double) = (argc > 1) ? polyh : poly;
  clock_t elapsed = clock();
  for (int i = 0; i < 1234; i++) result += func(a, 2.34 / (i - 1234567) - 1);
  elapsed = clock() - elapsed;
  printf("%p %g %11f\n", func, result, (double)elapsed / CLOCKS_PER_SEC);
}

This test program compares the two algorithms by evaluating a polynomial with more than 23 million terms (N = 23456789) 1234 times, each time with a different value of the independent variable. The parameters are chosen carefully so that no floating-point overflow occurs during evaluation. In C, a double-precision floating-point number typically occupies eight bytes, so the program needs about 180 MB of memory to run.

Test Results

We ran the test on both a 32-bit Windows operating system and a 64-bit Linux operating system. The results are as follows:

D:\work> cl /O2 poly.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

poly.cpp
Microsoft (R) Incremental Linker Version 10.00.40219.01
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:poly.exe
poly.obj

D:\work> poly
00871000 1.4271e+029  107.531000
D:\work> poly h
00871080 1.4271e+029  127.686000
D:\work> poly h
00871080 1.4271e+029  127.686000
D:\work> poly
00871000 1.4271e+029  106.861
D:\work> poly
00871000 1.4271e+029  107.935000
D:\work> poly h
00871080 1.4271e+029  128.662000

ben@vbox:~/work> gcc -std=c99 -O2 poly.c
ben@vbox:~/work> ./a.out
0x400640 1.4271e+29  125.240000
ben@vbox:~/work> ./a.out
0x400640 1.4271e+29  124.820000
ben@vbox:~/work> ./a.out h
0x400680 1.4271e+29  160.890000
ben@vbox:~/work> ./a.out h
0x400680 1.4271e+29  162.210000
ben@vbox:~/work> ./a.out
0x400640 1.4271e+29  125.450000
ben@vbox:~/work> ./a.out h
0x400680 1.4271e+29  162.230000

In the output above, the first column is the value of the func variable, that is, a pointer to either the poly or the polyh function. The second column is the sum of the polynomial values. The third column is the running time in seconds. The results are summarized in the following table:

Operating system     Windows (32-bit)             Linux (64-bit)
Algorithm            Direct        Qin Jiushao    Direct        Qin Jiushao
Function address     00871000      00871080       0x400640      0x400680
Calculation result   1.4271e+029   1.4271e+029    1.4271e+29    1.4271e+29
Run 1 (s)            107.531       127.686        125.24        160.89
Run 2 (s)            106.860       127.686        124.82        162.21
Run 3 (s)            107.935       128.662        125.45        162.23
Average (s)          107.442       128.011        125.17        161.78

Test Description

Surprising, isn't it? On both 32-bit Windows and 64-bit Linux, repeated tests show that Qin Jiushao's algorithm is slower than the direct algorithm, and the times for repeated runs of the same algorithm on the same operating system are very close to each other. The same algorithm runs a little slower on the 64-bit Linux system than on the 32-bit Windows system; keep in mind that the Linux system here runs as a guest in VirtualBox. In every test, regardless of algorithm or operating system, the sum of the polynomial values is the same. That is expected; a difference would point to a bug.

While the test program was running on Windows, we also observed its memory and CPU usage. Because the CPU is dual-core, CPU usage during the test was about 50%. There were 83 processes and memory usage was 965 MB. After the test completed, the number of processes dropped by one and memory usage fell to 795 MB.

We also observed the test program while it ran on the Linux operating system. Because VirtualBox assigns only one CPU to the Linux guest, that CPU was almost fully used during the test, and the memory usage of the program was about 179 MB.

Test Environment

The tests were run on a Dell Inspiron 1520, which has a single dual-core Intel Core 2 Duo CPU and runs the Windows Vista Home Premium SP2 (32-bit) operating system.

The Linux tests ran on openSUSE 12.1 (64-bit) as a guest in Oracle VM VirtualBox.

Conclusion

The discussion above shows that minimizing the number of arithmetic operations in a computation does not necessarily improve performance. The cause of this behavior will be covered in the next article.

References
  1. Wikipedia: Polynomial
  2. Wikipedia: Horner's method
  3. Wikipedia: Qin Jiushao's algorithm
  4. C++ Reference: clock
