Code optimization example: converting a fraction to a decimal

Source: Internet
Author: User

http://community.csdn.net/Expert/topic/5563/5563568.xml

Description:

Converting a fraction to a decimal is something I believe many people can do. A computer, however, cannot calculate with fractions directly: the fraction has to be converted to a single- or double-precision floating-point number first, and that makes the result inexact. So, given a fraction N/D, where N is the numerator and D is the denominator (N and D are integers), devise an exact conversion method and write a program that finds the exact decimal form of N/D. If the decimal repeats forever, enclose the repeating part in parentheses; if it terminates, leave the repeating part out. For example:
1/3 = 0.(3)
22/5 = 4.4
1/7 = 0.(142857)
2/2 = 1.0
3/8 = 0.375
45/56 = 0.803(571428)

As a test case, compute N/D = 1/1000000007.

Analysis:

I think there is little left to optimize, because writing the output takes up too large a share of the run time.

First, here is a straightforward program:

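#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
typedef __int64 longlong;
#else
typedef long long longlong;
#endif

/* Print the integer part, the decimal point and the non-repeating digits.
   intpart holds both: its low `shift` decimal digits are the non-repeating part.
   Limitation kept from the original: 10^shift and intpart must fit in an int. */
void output_intpart(int intpart, int shift) {
    int i, v;
    v = 1;
    for (i = 0; i < shift; i++) v *= 10;
    printf("%d.", intpart / v);
    if (shift > 0) {
        printf("%0*d", shift, intpart % v);
    }
}

int main(int argc, char *argv[]) {
    int n, d;
    longlong ln;
    int count2 = 0, count5 = 0;
    int shift, i;
    int intpart;
    if (argc < 3) {
        fprintf(stderr, "usage: %s n d\n", argv[0]);
        return 1;
    }
    n = atoi(argv[1]);
    d = atoi(argv[2]);
    if (n < 0 || d <= 0) return 1;
    /* Strip the factors 2 and 5 from d; they only produce non-repeating digits. */
    while (d % 2 == 0) { count2++; d /= 2; }
    while (d % 5 == 0) { count5++; d /= 5; }
    /* Scale n so that ln / d equals n * 10^shift / (original d). */
    if (count2 > count5) {
        shift = count2;
        ln = n;
        for (i = 0; i < count2 - count5; i++) ln *= 5;
    } else if (count2 < count5) {
        shift = count5;
        ln = n;
        for (i = 0; i < count5 - count2; i++) ln *= 2;
    } else {
        shift = count2;
        ln = n;
    }
    intpart = (int)(ln / d);
    n = (int)(ln % d);
    output_intpart(intpart, shift);
    if (n > 0) {
        /* 64-bit, because cur * 10 does not fit in 32 bits when d is large. */
        longlong cur = n;
        printf("(");
        do {
            int digit;
            cur *= 10;
            digit = (int)(cur / d);
            printf("%d", digit);
            cur = cur % d;
        } while (cur != n);
        printf(")");
    }
    printf("\n");
    return 0;
}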

Leaving input and output aside, we can see that:

The main part of the code is the loop inside if (n > 0).
Division is relatively slow, so one thing to consider is replacing the division with a multiplication.
Since d is a constant for the whole loop, this is possible (see http://blog.csdn.net/mathe/archive/2006/09/01/1153575.aspx).
However, the resulting code still has to be written in assembly; otherwise it will not be efficient.

Another feasible approach: the loop repeatedly computes decompositions of the form
10 * a = u * d + v (1 <= v < d).
If we compute this decomposition in advance for every a (1 <= a < d), the loop needs no multiplication or division at all (only additions and subtractions). The problem is that a large d requires a lot of memory; for a large d, a 2 GB address space is not enough.
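For a small d the table idea can be sketched as follows. This is only an illustration under stated assumptions: the names step_table, build_step_table and digits_by_table are hypothetical, d is assumed to be at least 1 and well below 2^32, and n is assumed to be less than d. The table costs 5 bytes per remainder, which shows why the approach breaks down for a denominator near 10^9.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table: for each remainder a in [0, d),
   10 * a = digit[a] * d + next[a]. */
typedef struct {
    uint8_t  *digit;  /* u in the text */
    uint32_t *next;   /* v in the text */
} step_table;

/* Fills the table using only additions and subtractions.
   Returns 0 on success, -1 if memory runs out. */
static int build_step_table(step_table *t, uint32_t d) {
    uint32_t a, v = 0;  /* v tracks (10 * a) mod d */
    uint8_t u = 0;      /* u tracks (10 * a) / d   */
    t->digit = malloc(d);
    t->next = malloc((size_t)d * sizeof *t->next);
    if (t->digit == NULL || t->next == NULL) {
        free(t->digit);
        free(t->next);
        return -1;
    }
    for (a = 0; a < d; a++) {
        t->digit[a] = u;
        t->next[a] = v;
        v += 10;                          /* 10 * (a + 1) = 10 * a + 10 */
        while (v >= d) { v -= d; u++; }
    }
    return 0;
}

/* Writes the first count digits after the decimal point of n/d (n < d). */
static void digits_by_table(const step_table *t, uint32_t n,
                            size_t count, char *out) {
    uint32_t cur = n;
    size_t i;
    for (i = 0; i < count; i++) {
        out[i] = (char)('0' + t->digit[cur]);
        cur = t->next[cur];
    }
    out[count] = '\0';
}
```

With a table built for d = 7, calling digits_by_table(&t, 1, 6, out) walks the remainders 1, 3, 2, 6, 4, 5 and fills out with the six digits of the repeating section of 1/7.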

Another feasible approach is to change
cur *= 10
into
cur *= 10^k (1 <= k <= 10)
so that each step yields several digits instead of one. (Of course, within each step we still have to check every digit to determine whether the cycle has completed.) However, this approach can speed things up by a factor of 10 at most.
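A minimal sketch of the multi-digit step, assuming k = 9 so that each group of digits fits in 32 bits and cur * 10^9 fits in 64 bits. The function name digits_by_chunks is hypothetical, n < d < 2^32 is assumed, and cycle detection is deliberately left out: the caller is assumed to know how many digits it wants.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes the first count digits after the decimal point of n/d,
   producing 9 digits per division. Assumes 0 <= n < d < 2^32,
   so cur * 10^9 < 2^32 * 10^9 < 2^64. */
static void digits_by_chunks(uint32_t n, uint32_t d, size_t count, char *out) {
    const uint64_t pow9 = 1000000000u;      /* 10^9 */
    uint64_t cur = n;
    size_t done = 0;
    while (done < count) {
        uint64_t t = cur * pow9;
        uint32_t q = (uint32_t)(t / d);     /* next 9 digits; q < 10^9 because cur < d */
        size_t take = count - done < 9 ? count - done : 9;
        char chunk[10];
        cur = t % d;
        snprintf(chunk, sizeof chunk, "%09u", (unsigned)q);
        memcpy(out + done, chunk, take);
        done += take;
    }
    out[count] = '\0';
}
```

The loop does one 64-bit division and one remainder per 9 digits instead of per digit, which is where the bounded speed-up mentioned above comes from.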

 Medie2005 (Arnold): However, I use a mathematical method to obtain the length of the repeating section in advance, so the check that mathe mentions ("Of course, in each loop, we need to check each of them to determine whether a loop has occurred.") is not required in my method. My tests confirm what mathe said: the speed-up is only about 10 times. As for whether it can be optimized further, I have relaxed the target from the original 10 seconds to 30 seconds. If someone reaches 30 seconds

Computing the cycle length in advance is not difficult; it requires factoring the integer d. Because the range of d is not large, we only need to precompute all primes not greater than 2^16.
Suppose
d = 2^a * 3^b * 5^c * p1^d1 * p2^d2 * ... * pk^dk
where p1, p2, ..., pk are primes not less than 7.
If gcd(n, d) = 1, then a cycle length of n/d (the result does not depend on n) is
L(d) = 3^s(b,2) * (p1-1) * p1^(d1-1) * (p2-1) * p2^(d2-1) * ... * (pk-1) * pk^(dk-1)
where s(b,2) is b-2 when b >= 2, and 0 otherwise.
However, L(d) is not necessarily the minimum cycle length of n/d; the minimum may be a proper factor of L(d).
As long as we find a number x with 10^x = 1 (mod d), where the factors 2 and 5 have already been removed from d, x is a cycle length of n/d.
Of course, computing the minimum repeating section may not matter much. For example, writing 0.(3) as 0.(33) is acceptable.
If we do need the minimum repeating section, then since L(d) does not have many factors, it is not hard to test for each factor x whether 10^x = 1 (mod d). Of course, there are still plenty of tricks in this computation.
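The minimum cycle length can be sketched as below. Note the substitution: instead of the L(d) formula above, this sketch starts from Euler's phi(d), which is also a valid exponent (10^phi(d) = 1 mod d), and then strips prime factors while the congruence still holds. It uses plain trial division rather than a precomputed prime table. The names pow_mod and cycle_length are hypothetical, and d is assumed to be at least 1, below 2^32, and already free of the factors 2 and 5.

```c
#include <stdint.h>

/* (base^exp) mod m for m < 2^32, so every product fits in 64 bits. */
static uint32_t pow_mod(uint32_t base, uint32_t exp, uint32_t m) {
    uint64_t result = 1 % m, b = base % m;
    while (exp > 0) {
        if (exp & 1) result = result * b % m;
        b = b * b % m;
        exp >>= 1;
    }
    return (uint32_t)result;
}

/* Smallest x > 0 with 10^x = 1 (mod d). Assumes gcd(d, 10) == 1. */
static uint32_t cycle_length(uint32_t d) {
    uint32_t phi = d, rest = d, x, p;
    /* Euler's phi(d) by trial division. */
    for (p = 2; (uint64_t)p * p <= rest; p++) {
        if (rest % p == 0) {
            phi -= phi / p;
            while (rest % p == 0) rest /= p;
        }
    }
    if (rest > 1) phi -= phi / rest;
    /* Remove each prime factor of phi as long as 10^(x/p) is still 1. */
    x = phi;
    rest = phi;
    for (p = 2; (uint64_t)p * p <= rest; p++) {
        if (rest % p == 0) {
            while (rest % p == 0) rest /= p;
            while (x % p == 0 && pow_mod(10, x / p, d) == 1 % d) x /= p;
        }
    }
    if (rest > 1 && pow_mod(10, x / rest, d) == 1 % d) x /= rest;
    return x;
}
```

For 45/56 the stripped denominator is 7, and cycle_length(7) gives the 6 digits of the repeating section 571428.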

After this, I do not think many other opportunities remain. The only thing left to try is that at every step, after
cur *= 10^k
we need to compute
cur /= d;
and this step could turn the division into a multiplication.
Here is a program that converts division of unsigned values into multiplication:
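#include <stdio.h>
#include <stdlib.h>

/* Returns the multiplier m for dividing 32-bit values by x (x > 0).
   L equals 2^bits throughout; w = m * x - L is the rounding error of
   m = L / x + 1. m can need 33 bits (for example x = 7); the cast then
   keeps only the low 32 bits. */
unsigned int_inv(unsigned x) {
    unsigned long long L;
    unsigned long long M = 1ull << 32;
    unsigned w;
    int bits = 0;
    L = 1ull;
    while (L < x) { L *= 2; bits++; }
    do {
        w = (unsigned)((L / x) * x + x - L);
        /* m is exact for every 32-bit dividend once w * 2^32 <= L */
        if (L / w >= M) return (unsigned)(L / x + 1);
        bits++; L <<= 1;
    } while (bits < 64);
    return 0;
}

int main(int argc, char *argv[]) {
    unsigned n;
    if (argc < 2) return 1;
    n = (unsigned)atoi(argv[1]);
    if (n == 0) return 1;
    printf("%u\n", int_inv(n));
    return 0;
}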
The above program outputs a number m for any input n.
Any division a / n can then be replaced by taking the high 32 bits of a * m and shifting them right by (bits - 32), which is the same as (a * m) >> bits; here bits is the final value of bits in int_inv.
However, this calculation is still most efficient when written in assembly.
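For comparison, here is a minimal portable sketch of the same idea. It does not reproduce the search in int_inv; it swaps in the well-known round-up multiplier method (Granlund and Montgomery) that compilers use for division by a constant, which keeps the multiplier within 32 bits at the cost of one extra add and shift. The names udiv_magic, udiv_magic_init and udiv_apply are hypothetical, and d is assumed to be at least 1.

```c
#include <stdint.h>

/* Hypothetical precomputed constants for dividing by a fixed d. */
typedef struct {
    uint32_t m;   /* multiplier */
    int sh1;      /* first shift:  min(l, 1)     */
    int sh2;      /* second shift: max(l - 1, 0) */
} udiv_magic;

/* Precompute the constants for a divisor d >= 1. */
static udiv_magic udiv_magic_init(uint32_t d) {
    udiv_magic g;
    int l = 0;
    while (l < 32 && ((uint64_t)1 << l) < d) l++;   /* l = ceil(log2(d)) */
    /* m = floor(2^32 * (2^l - d) / d) + 1, which always fits in 32 bits */
    g.m = (uint32_t)(((((uint64_t)1 << l) - d) << 32) / d + 1);
    g.sh1 = l < 1 ? l : 1;
    g.sh2 = l > 1 ? l - 1 : 0;
    return g;
}

/* Computes n / d with one multiplication, two shifts and two additions. */
static uint32_t udiv_apply(uint32_t n, udiv_magic g) {
    uint32_t t1 = (uint32_t)(((uint64_t)g.m * n) >> 32);  /* high half of m * n */
    return (t1 + ((n - t1) >> g.sh1)) >> g.sh2;
}
```

For d = 7 the multiplier is 613566757, the low 32 bits of the 33-bit value that the search in int_inv arrives at for the same divisor.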

I tried the divide-by-constant optimization.
For simplicity, I change the algorithm so that the repeating section is produced in reverse order. That lets us design the algorithm so that the compiler can do the optimization by itself.
First, we need a function that computes the modular inverse of any number modulo 10. This is simple:
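/* Modular inverse of a modulo b; requires b > 1 and gcd(a, b) == 1. */
unsigned inv(unsigned a, unsigned b) {
    int s, t;
    a = a % b;
    if (a == 1) return 1;
    s = inv(b, a);          /* s * b = 1 (mod a) */
    t = (s * b - 1) / a;    /* so s * b - 1 = t * a, and -t * a = 1 (mod b) */
    return b - t;
}

/* Inverse of p modulo 10. */
unsigned inv10(unsigned p) {
    return inv(p, 10);
}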

The performance of this code does not matter, so I will not optimize it.
Then we can replace the code inside if (n > 0) with:
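int u = inv10(d);
int w;
int index[10];
longlong cur = n;
/* index[c] is the digit w that makes cur + w * d divisible by 10
   when cur % 10 == c; index[0] is 0. */
for (w = 0; w < 10; w++) {
    index[w] = ((10 - w) * u) % 10;
}
printf("(");
do {
    w = index[(int)(cur % 10)];
    cur = (cur + (longlong)w * d) / 10;
    // printf("%d", w);
} while (cur != n);
/* The digits are not printed while timing; only the last one is. */
printf(")%d", w);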

In this way, every division in the loop is a division by the constant 10.
Unfortunately, I found that this code runs at almost the same speed as the original.
A brief analysis shows the main reason: cur is declared as long long, a 64-bit integer, which falls outside the range the compiler optimizes.
After changing the declaration of cur to int (with this change d = 1000000007 can no longer be used, because the multiplication overflows),
the case d = 50000017 takes only 0.7 s (without printing the result), compared with 4.6 s for the original, which is several times faster.
So this optimization is very effective, but it only works within a limited range of values.
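Why the backward step is valid: the forward loop satisfies 10 * r = q * d + r', where q is the digit and r' the next remainder. Going backward from r', the digit q is the only value in 0..9 that makes r' + q * d divisible by 10, and the previous remainder is (r' + q * d) / 10. The sketch below generates the digits both ways so the two can be compared. The names digits_forward and digits_reverse are hypothetical; n < d < 2^32, gcd(d, 10) = 1, and len being a cycle length of n/d are all assumptions. It uses 64-bit arithmetic for clarity, so it shows correctness rather than speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Forward: digit = floor(10 * r / d), then r = (10 * r) mod d. */
static void digits_forward(uint32_t n, uint32_t d, size_t len, char *out) {
    uint64_t cur = n;
    size_t i;
    for (i = 0; i < len; i++) {
        cur *= 10;
        out[i] = (char)('0' + cur / d);
        cur %= d;
    }
    out[len] = '\0';
}

/* Reverse: produces the last digit first and stores it at its final position.
   Returns 1 if the walk ends back at n (len was a cycle length), else 0. */
static int digits_reverse(uint32_t n, uint32_t d, size_t len, char *out) {
    unsigned index[10], u, c;
    uint64_t cur = n;
    size_t i;
    /* u = inverse of d modulo 10, found by trying 1..9 */
    for (u = 1; u < 10 && (d % 10 * u) % 10 != 1; u++) {}
    if (u == 10) return 0;                 /* d is not coprime to 10 */
    for (c = 0; c < 10; c++) index[c] = ((10 - c) * u) % 10;
    for (i = len; i-- > 0; ) {
        unsigned w = index[cur % 10];      /* digit making cur + w * d end in 0 */
        cur = (cur + (uint64_t)w * d) / 10;
        out[i] = (char)('0' + w);
    }
    out[len] = '\0';
    return cur == n;
}
```

For 1/7 the reverse walk visits the remainders 1, 5, 4, 6, 2, 3 and emits 7, 5, 8, 2, 4, 1, which is 142857 read backward.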

The loop can be modified slightly so that it no longer overflows.
The code is as follows; the only remaining problem is that the digits of the repeating section come out in reverse order:
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
typedef __int64 longlong;
#else
typedef long long longlong;
#endif

/* Modular inverse of a modulo b; requires b > 1 and gcd(a, b) == 1. */
unsigned inv(unsigned a, unsigned b) {
    int s, t;
    a = a % b;
    if (a == 1) return 1;
    s = inv(b, a);
    t = (s * b - 1) / a;
    return b - t;
}

unsigned inv10(unsigned p) {
    return inv(p, 10);
}

/* Print the integer part, the decimal point and the non-repeating digits. */
void output_intpart(int intpart, int shift) {
    int i, v;
    v = 1;
    for (i = 0; i < shift; i++) v *= 10;
    printf("%d.", intpart / v);
    if (shift > 0) {
        printf("%0*d", shift, intpart % v);
    }
}

int main(int argc, char *argv[]) {
    int n, d;
    longlong ln;
    int count2 = 0, count5 = 0;
    int shift, i;
    int intpart;
    if (argc < 3) {
        fprintf(stderr, "usage: %s n d\n", argv[0]);
        return 1;
    }
    n = atoi(argv[1]);
    d = atoi(argv[2]);
    if (n < 0 || d <= 0) return 1;
    while (d % 2 == 0) { count2++; d /= 2; }
    while (d % 5 == 0) { count5++; d /= 5; }
    if (count2 > count5) {
        shift = count2;
        ln = n;
        for (i = 0; i < count2 - count5; i++) ln *= 5;
    } else if (count2 < count5) {
        shift = count5;
        ln = n;
        for (i = 0; i < count5 - count2; i++) ln *= 2;
    } else {
        shift = count2;
        ln = n;
    }
    intpart = (int)(ln / d);
    n = (int)(ln % d);
    output_intpart(intpart, shift);
    if (n > 0) {
        int u = inv10(d);
        int w;
        int index[10];
        int cur = n;
        int dd10 = d / 10, dm10 = d % 10;
        for (w = 0; w < 10; w++) {
            index[w] = ((10 - w) * u) % 10;
        }
        printf("(");
        do {
            int curm10 = cur % 10;
            int curd10 = cur / 10;
            w = index[curm10];
            /* Equals (cur + w * d) / 10 without forming the product w * d,
               so every intermediate value stays below d. */
            cur = curd10 + dd10 * w + (curm10 + dm10 * w) / 10;
            // printf("%d", w);
        } while (cur != n);
        printf(")%d", w);
    }
    printf("\n");
    return 0;
}

Finally, we can deal with file input and output.
Writing the data to the file in blocks speeds up file output significantly.
For example:
#define BUFFER_LEN (4096)
...
char buf[BUFFER_LEN];
int used = 0;
FILE *fout = fopen("out.txt", "wb");
...
do {
    int curm10 = cur % 10;
    int curd10 = cur / 10;
    w = index[curm10];
    /* With dd10 = d / 10 and dm10 = d % 10 this equals (cur + w * d) / 10. */
    cur = curd10 + dd10 * w + (curm10 + dm10 * w) / 10;
    buf[used++] = (char)('0' + w);
    if (used == BUFFER_LEN) {
        /* Flush a full block in a single call. */
        fwrite(buf, BUFFER_LEN, 1, fout);
        used = 0;
    }
    // printf("%d", w);
} while (cur != n);
if (used > 0) {
    fwrite(buf, used, 1, fout);
}
...
With this modification, the running time of the df3 version, including file output, drops from
2m42s to 39 s.
On my computer, simply copying the output file with the cp command takes about the same time, which shows that there is no room left for optimization.

Now we can design an algorithm that essentially reaches the speed limit and still outputs the digits in normal order. The method is simple:
i) Compute the cycle length in advance.
ii) Use the algorithm above to compute the digits of the repeating section. They come out in reverse order, so we write each block (for example, 4 KB) of digits into the array buf back to front. From the cycle length we can compute where in the file this block belongs, and write it there after calling fseek. Note that the file offset at which each block is written should be aligned to a multiple of 4 KB.
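Steps i) and ii) can be sketched as one function. This is an illustration, not the original author's code: the name write_cycle_forward and its parameters are hypothetical, the cycle length len is assumed to have been computed beforehand, and the block size is a parameter (at most 4096) so that the alignment logic can be tried with small values.

```c
#include <stdio.h>

/* Hypothetical helper. Writes the len repeating digits of n/d in normal
   order, starting at file offset base, while generating them back to front.
   Assumes 0 < n < d < 2^31, gcd(d, 10) == 1, len is a cycle length of n/d,
   0 < block <= 4096, and that out is a binary stream that allows seeking
   past its current end (true for ordinary files on POSIX and Windows).
   Returns 0 on success, -1 on an I/O error or a wrong len. */
static int write_cycle_forward(FILE *out, long base, int n, int d,
                               long len, long block) {
    char buf[4096];
    int index[10];
    int dd10 = d / 10, dm10 = d % 10;
    int u, w, cur = n;
    long remaining = len;

    /* u = inverse of d modulo 10, found by trying 1..9 */
    for (u = 1; u < 10 && (dm10 * u) % 10 != 1; u++) {}
    if (u == 10) return -1;
    for (w = 0; w < 10; w++) index[w] = ((10 - w) * u) % 10;

    while (remaining > 0) {
        long end = base + remaining;             /* one past the last unwritten digit */
        long start = (end - 1) / block * block;  /* previous aligned offset */
        long count, i;
        if (start < base) start = base;
        count = end - start;
        for (i = count - 1; i >= 0; i--) {       /* fill the block back to front */
            int curm10 = cur % 10;
            w = index[curm10];
            cur = cur / 10 + dd10 * w + (curm10 + dm10 * w) / 10;
            buf[i] = (char)('0' + w);
        }
        if (fseek(out, start, SEEK_SET) != 0) return -1;
        if (fwrite(buf, 1, (size_t)count, out) != (size_t)count) return -1;
        remaining -= count;
    }
    return cur == n ? 0 : -1;
}
```

Only the first and last blocks can be partial; every other write starts on a multiple of block and is exactly block bytes long.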

 
