The multiplication and division article mentioned a special kind of multiplication: squaring. It is special because the two operands are the same number, so when you visit one digit you already have the information for both operands.
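General multiplication costs O(N^2) because two nested loops are needed to pair every digit of one operand with every digit of the other. In a square, each product of two different digits a_i*a_j appears twice, so it can be computed once and doubled, and squaring then needs roughly N^2/2 digit multiplications instead of N^2. (Counting only the products between the low half and the high half of the number gives N^2/4, but the products between different digits inside each half are needed as well.)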
The order of growth does not change, but the number of operations saved grows with N, so the performance gain is considerable. After all, a journey of a thousand miles is built from small steps.
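To make the saving concrete, the square can be written out digit by digit. The symbols here are assumptions for illustration: B is the digit base (for example 2^16) and a_i are the digits of the n-digit number A, least significant first.

```latex
A = \sum_{i=0}^{n-1} a_i B^{i}
\quad\Longrightarrow\quad
A^{2} = \sum_{i=0}^{n-1} a_i^{2} B^{2i}
      + 2 \sum_{0 \le i < j \le n-1} a_i a_j B^{i+j}

% Worked case n = 3:
(a_0 + a_1 B + a_2 B^{2})^{2}
  = a_0^{2} + 2 a_0 a_1 B + (a_1^{2} + 2 a_0 a_2) B^{2}
  + 2 a_1 a_2 B^{3} + a_2^{2} B^{4}
```

The first sum has n terms and the second has n(n-1)/2, so a square needs n(n+1)/2 digit multiplications where a general product needs n^2. For n = 3 that is 6 instead of 9.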
Next, the idea behind the implementation:
1. Split the super-long integer A of n digits into two segments L1 and L2 of length n/2 each (when n is odd, either segment may be the one that is a digit longer). Example: L1 holds digits [0, n/2) and L2 holds digits [n/2, n).
2. Square every digit of each segment and store the result at position i' = i + i.
3. Multiply each digit i of L1 by each digit j of L2, double the product, and add it at position k = i + j. The products of two different digits inside the same segment (i < j) must be doubled and added in the same way; if they are left out, the result is only correct for n <= 2.
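The steps above can be sketched as a small self-contained routine. This is a minimal sketch under assumptions, not the article's Hbigint code: the name square_digits, the little-endian base-2^16 digit layout, and the 64-bit accumulator are all hypothetical choices made for illustration. Carries are resolved immediately, so no separate format pass is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the article's Hbigint API).
 * Squares the n-digit number a[0..n-1] (least significant digit first,
 * base 2^16) and writes the 2n-digit result to out[0..2n-1]. */
void square_digits(const uint16_t *a, size_t n, uint16_t *out)
{
    size_t i, j, k;

    assert(a != NULL && out != NULL && n > 0);

    for (k = 0; k < 2 * n; ++k)
        out[k] = 0;

    for (i = 0; i < n; ++i) {
        /* Step 2: the digit square a[i]^2 lands at position i + i. */
        uint64_t t = (uint64_t)out[i + i] + (uint64_t)a[i] * a[i];
        uint64_t carry = t >> 16;
        out[i + i] = (uint16_t)(t & 0xFFFFu);

        /* Step 3: each cross product a[i]*a[j] with i < j is computed
         * once, doubled, and added at position i + j. */
        for (j = i + 1; j < n; ++j) {
            t = (uint64_t)out[i + j] + 2 * ((uint64_t)a[i] * a[j]) + carry;
            out[i + j] = (uint16_t)(t & 0xFFFFu);
            carry = t >> 16;
        }

        /* Push the remaining carry into the higher digits. */
        for (k = i + n; carry != 0 && k < 2 * n; ++k) {
            t = (uint64_t)out[k] + carry;
            out[k] = (uint16_t)(t & 0xFFFFu);
            carry = t >> 16;
        }
    }
}
```

Calling square_digits on the digits {1, 2, 3} gives {1, 4, 10, 12, 9, 0}: the 10 at position 2 is 2*2 + 2*(1*3), and the 12 at position 3 is 2*(2*3), a product of two digits that a half-against-half loop alone would miss.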
Code View:
/* Self-squaring: product = bia * bia */
int mulhbint(Hbigint *product, Hbigint *bia)
{
    Hbigint bit;                     /* the result is built in a temporary */
    register un_short *pworda = bia->pbigint;
    register un_short *pprd = NULL;
    long result = 0, i = 0, j = 0;

    /* Initialize the temporary large integer */
    if ((result = inithbint(&bit, bia->length << 1)) != return_ok_bint)
        return result;

    bit.length = bia->length << 1;
    bit.sign = 1;
    pprd = bit.pbigint;

    /* Digit squares: a[i] * a[i] goes to position i + i.
       The bound is the input length, not the result length. */
    for (i = 0; i < bia->length; ++i) {
        pprd[i + i] = pworda[i] * pworda[i];
    }

    /* Cross products: every pair i < j is computed once and doubled.
       Pairs inside the same half are included, not only low half x high half. */
    for (i = 0; i < bia->length; ++i) {
        for (j = i + 1; j < bia->length; ++j) {
            pprd[i + j] += ((pworda[i] * pworda[j]) << 1);
            /* If the slot exceeds the largest value one unit can represent
               (2^16 in this case), run one format pass */
            if (pprd[i + j] >> bit_pre_word)
                formathbint(&bit);
        }
    }

    formathbint(&bit);               /* resolve carries still pending, e.g. when length is 1 */
    trimhbint(&bit);                 /* remove leading invalid zeros */
    extendhbint(product, bit.length);
    assignhbint(product, &bit);
    deletehbint(&bit);               /* release the temporary */
    return return_ok_bint;
}
Outlook:
Strictly speaking, the program performs N digit squares plus one doubled product for each pair of different digits, N(N-1)/2 in all (N^2/4 of them pair a digit of the low half with a digit of the high half). Compared with the plain O(N^2) multiplication the improvement is limited, but the approach suggests a route to a parallel algorithm, because the data can be decomposed and the granularity of the decomposition can be tuned to the configuration of the processor. Next, I plan to parallelize all of the algorithms in the super-long integer part, partly to improve my own skills. Corrections are welcome. After all, it is only truly good when everyone does well!
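One possible way to decompose the data, shown here only as a hedged sketch and not necessarily the scheme the author has in mind, is by output column: each column sum depends only on the input digits, so disjoint column ranges can be handed to different workers and the carries resolved afterwards in one sequential pass. The names square_columns and resolve_carries, the base-2^16 digits, and the 64-bit column sums are assumptions for illustration. No threading API is used; the ranges are simply independent calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills col[k] for k in [k0, k1) with the column sum
 * of the square of a[0..n-1], i.e. the sum of a[i]*a[j] over i + j == k.
 * Ranges that do not overlap touch disjoint parts of col, so they could
 * run on different workers. Assumes n is small enough for 64-bit sums. */
void square_columns(const uint16_t *a, size_t n,
                    size_t k0, size_t k1, uint64_t *col)
{
    size_t k, i;

    assert(n > 0 && k0 <= k1 && k1 <= 2 * n - 1);

    for (k = k0; k < k1; ++k) {
        uint64_t sum = 0;
        size_t lo = (k >= n) ? k - n + 1 : 0;  /* smallest i with k - i < n */

        /* Pairs i < j with i + j == k are computed once and doubled. */
        for (i = lo; 2 * i < k; ++i)
            sum += (uint64_t)a[i] * a[k - i];
        sum *= 2;

        /* Even columns also receive one digit square. */
        if (k % 2 == 0)
            sum += (uint64_t)a[k / 2] * a[k / 2];

        col[k] = sum;
    }
}

/* Sequential pass: turns the 2n-1 column sums into 2n base-2^16 digits. */
void resolve_carries(const uint64_t *col, size_t n, uint16_t *out)
{
    uint64_t carry = 0;
    size_t k;

    for (k = 0; k < 2 * n; ++k) {
        uint64_t t = carry + ((k < 2 * n - 1) ? col[k] : 0);
        out[k] = (uint16_t)(t & 0xFFFFu);
        carry = t >> 16;
    }
}
```

The granularity is the size of each [k0, k1) range. The middle columns contain more products than the outer ones, so equal-width ranges do not give equal work; a real scheduler would have to balance that.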
Implementation of the basic arithmetic algorithms for super-long integers: the squaring chapter