Big integer algorithm [13] single-digit multiplication, integer multiplication
★Introduction
Recently, wxWidgets has been involved, and procrastination has been interrupted for a long time. This time we will talk about single-digit multiplication. We will talk about Comba and Karatsuba multiplication. These two algorithms are suitable for processing large integers, but for a large integer and a single-precision number, the effect is poor, because the amount of computing is too large. In fact, single-digit multiplication is only a special case of baseline multiplication, and there is no nested cyclic carry. Therefore, we can reduce the computational workload through optimization. In addition, unlike full multiplication, single-digit multiplication does not require any temporary variable storage and memory allocation (except for the increase in target precision ).
★Algorithm Concept
The single-digit multiplication is similar to the calculation of 1234567890*8. The second number has only one digit. In a large integer, it is a single-precision variable and only needs to execute O (n) single-precision Multiplication can complete the main calculation. After each single-precision multiplication is completed, carry transfer is executed. The specific implementation ideas are as follows:
Calculate z = x * y, where z and x are bignum and y are unsigned Single-precision numbers.
1. olduse = z-> used records the number of digits used by the current z, which is used to assist in High-position clearing.
2. Increase the target precision by 1: z-> used = x-> used + 1
3. z-> sign = x-> sign (Because y is the unsigned number, the result symbol is only related to x)
4. initialize carry: u = 0
5. Perform a loop between I from 0 to x-> used-1:
5.1 r = u + x (I) * y // r is a double-precision variable
5.2 z (I) = r & BN_MASK0 // take the lower half part as the standard (in 32 bits, BN_MASK0 = 0 xFFFFFFFF, similarly to other situations, it is 2 ^ n-1)
5.3 u = r> biL // take the high half part as the carry
6. z (x-> used) = u // pass the last carry
7. Reset the remaining high positions
8. Compress excess bits
Here, y is the unsigned number. If you want to calculate the product of-y, you can reverse the z symbol after calculation.
★Implementation
Like the Comba multiplication, the overall implementation is provided first. The details are described later. Considering the computing efficiency and portability, the key code of the fifth cycle is written in the macro definition, and the multiplier is expanded according to step 1, 4, 8, 16, and 32 to reduce the overhead of loop control.
int bn_mul_word(bignum *z, const bignum *x, const bn_digit y){ int ret; size_t i, olduse; bn_digit u, *px, *pz; olduse = z->used; z->sign = x->sign; BN_CHECK(bn_grow(z, x->used + 1)); u = 0; px = x->dp; pz = z->dp; for(i = x->used; i >= 32; i -= 32) { MULADDC_WORD_INIT MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_STOP } for(; i >= 16; i -= 16) { MULADDC_WORD_INIT MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_STOP } for(; i >= 8; i -= 8) { MULADDC_WORD_INIT MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_STOP } for(; i >= 4; i -= 4) { MULADDC_WORD_INIT MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_CORE MULADDC_WORD_STOP } for(; i > 0; i--) { MULADDC_WORD_INIT MULADDC_WORD_CORE MULADDC_WORD_STOP } *pz++ = u; for(i = x->used + 1; i < olduse; i++) *pz++ = 0; z->used = x->used + 1; bn_clamp(z);clean: return ret;}
The above is the overall implementation of single-digit multiplication. The key is in macro definition. The following describes the implementation methods in different environments.
★Single and Double Precision Variables
In this case, bn_digit and bn_udbl are defined at the same time, which is the easiest to implement. The three macros are defined as follows:
#define MULADDC_WORD_INIT \{ \ bn_udbl r; \#define MULADDC_WORD_CORE \ \ r = u + (bn_udbl)(*px++) * y; \ *pz++ = (bn_digit)r; \ u = (bn_digit)(r >> biL); \#define MULADDC_WORD_STOP \}
This situation is completely implemented based on the train of thought, and the specific principles are not described in detail.
★Only single-precision Variables
In this case, bn_udbl is not defined. Single-precision multiplication needs to be converted into four semi-precision multiplication for calculation, which is relatively complicated. The specific implementation principle is similar to the multiplier of Comba, refer to here: http://www.cnblogs.com/starrybird/p/4441022.html. The specific macro implementation is as follows:
#define MULADDC_WORD_INIT \{ \ bn_digit a0, a1, b0, b1; \ bn_digit t0, t1, r0, r1; \#define MULADDC_WORD_CORE \ \ a0 = (*px << biLH) >> biLH; \ b0 = ( y << biLH) >> biLH; \ a1 = *px++ >> biLH; \ b1 = y >> biLH; \ r0 = a0 * b0; \ r1 = a1 * b1; \ t0 = a1 * b0; \ t1 = a0 * b1; \ r1 += (t0 >> biLH); \ r1 += (t1 >> biLH); \ t0 <<= biLH; \ t1 <<= biLH; \ r0 += t0; \ r1 += (r0 < t0); \ r0 += t1; \ r1 += (r0 < t1); \ r0 += u; \ r1 += (r0 < u); \ *pz++ = r0; \ u = r1; \#define MULADDC_WORD_STOP \}
★Use of inline assembly
The inline assembly details of C are not much to be said, and you won't be able to skip it if you do.
VC x86:
#define MULADDC_WORD_INIT \{ \ __asm mov esi, px \ __asm mov edi, pz \ __asm mov ecx, u \#define MULADDC_WORD_CORE \ \ __asm lodsd \ __asm mul y \ __asm add eax, ecx \ __asm adc edx, 0 \ __asm mov ecx, edx \ __asm stosd \#define MULADDC_WORD_STOP \ \ __asm mov px, esi \ __asm mov pz, edi \ __asm mov u, ecx \}#endif
GCC x86:
#define MULADDC_WORD_INIT \{ \ asm \ ( \ "movl %3, %%esi \n\t" \ "movl %4, %%edi \n\t" \ "movl %5, %%ecx \n\t" \#define MULADDC_WORD_CORE \ \ "lodsl \n\t" \ "mull %6 \n\t" \ "addl %%ecx, %%eax \n\t" \ "adcl $0, %%edx \n\t" \ "movl %%edx, %%ecx \n\t" \ "stosl \n\t" \#define MULADDC_WORD_STOP \ \ "movl %%esi, %0 \n\t" \ "movl %%edi, %1 \n\t" \ "movl %%ecx, %2 \n\t" \ :"=m"(px),"=m"(pz),"=m"(u) \ :"m"(px),"m"(pz),"m"(u),"m"(y) \ :"%eax","%ecx","%edx","%esi","%edi" \ ); \}
GCC x64:
#define MULADDC_WORD_INIT \{ \ asm \ ( \ "movq %3, %%rsi \n\t" \ "movq %4, %%rdi \n\t" \ "movq %5, %%rcx \n\t" \#define MULADDC_WORD_CORE \ \ "lodsq \n\t" \ "mulq %6 \n\t" \ "addq %%rcx, %%rax \n\t" \ "adcq $0, %%rdx \n\t" \ "movq %%rdx, %%rcx \n\t" \ "stosq \n\t" \#define MULADDC_WORD_STOP \ \ "movq %%rsi, %0 \n\t" \ "movq %%rdi, %1 \n\t" \ "movq %%rcx, %2 \n\t" \ :"=m"(px),"=m"(pz),"=m"(u) \ :"m"(px),"m"(pz),"m"(u),"m"(y) \ :"%rax","%rcx","%rdx","%rsi","%rdi" \ ); \}
★Summary
The algorithm is also very simple, just follow the Baseline Multiplication method, and pay attention to the key points for optimization. Next we will talk about the calculation of Square.
[Back to the directory of this series]
Copyright Notice
Original blog, reproduced must contain this statement, to maintain the integrity of this article, and in the form of hyperlink to indicate the author Starrybird and the original address of this article: http://www.cnblogs.com/starrybird/p/4489859.html