Big integer algorithm [13] single-digit multiplication, integer multiplication

Last Update:2015-05-10 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

★Introduction

Recently, wxWidgets has been involved, and procrastination has been interrupted for a long time. This time we will talk about single-digit multiplication. We will talk about Comba and Karatsuba multiplication. These two algorithms are suitable for processing large integers, but for a large integer and a single-precision number, the effect is poor, because the amount of computing is too large. In fact, single-digit multiplication is only a special case of baseline multiplication, and there is no nested cyclic carry. Therefore, we can reduce the computational workload through optimization. In addition, unlike full multiplication, single-digit multiplication does not require any temporary variable storage and memory allocation (except for the increase in target precision ).

★Algorithm Concept

The single-digit multiplication is similar to the calculation of 1234567890*8. The second number has only one digit. In a large integer, it is a single-precision variable and only needs to execute O (n) single-precision Multiplication can complete the main calculation. After each single-precision multiplication is completed, carry transfer is executed. The specific implementation ideas are as follows:

Calculate z = x * y, where z and x are bignum and y are unsigned Single-precision numbers.

1. olduse = z-> used records the number of digits used by the current z, which is used to assist in High-position clearing.

2. Increase the target precision by 1: z-> used = x-> used + 1

3. z-> sign = x-> sign (Because y is the unsigned number, the result symbol is only related to x)

4. initialize carry: u = 0

5. Perform a loop between I from 0 to x-> used-1:

5.1 r = u + x (I) * y // r is a double-precision variable

5.2 z (I) = r & BN_MASK0 // take the lower half part as the standard (in 32 bits, BN_MASK0 = 0 xFFFFFFFF, similarly to other situations, it is 2 ^ n-1)

5.3 u = r> biL // take the high half part as the carry

6. z (x-> used) = u // pass the last carry

7. Reset the remaining high positions

8. Compress excess bits

Here, y is the unsigned number. If you want to calculate the product of-y, you can reverse the z symbol after calculation.

★Implementation

Like the Comba multiplication, the overall implementation is provided first. The details are described later. Considering the computing efficiency and portability, the key code of the fifth cycle is written in the macro definition, and the multiplier is expanded according to step 1, 4, 8, 16, and 32 to reduce the overhead of loop control.

int bn_mul_word(bignum *z, const bignum *x, const bn_digit y){    int ret;    size_t i, olduse;    bn_digit u, *px, *pz;    olduse = z->used;    z->sign = x->sign;    BN_CHECK(bn_grow(z, x->used + 1));    u = 0;    px = x->dp;    pz = z->dp;    for(i = x->used; i >= 32; i -= 32)    {        MULADDC_WORD_INIT        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_STOP    }    for(; i >= 16; i -= 16)    {        MULADDC_WORD_INIT        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_STOP    }    for(; i >= 8; i -= 8)    {        MULADDC_WORD_INIT        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_STOP    }    for(; i >= 4; i -= 4)    {        MULADDC_WORD_INIT        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_CORE    MULADDC_WORD_CORE        MULADDC_WORD_STOP    }    for(; i > 0; i--)    {        MULADDC_WORD_INIT        MULADDC_WORD_CORE        MULADDC_WORD_STOP    }    *pz++ = u;    for(i = x->used + 1; i < olduse; i++)        *pz++ = 0;    z->used = x->used + 1;    bn_clamp(z);clean:    return ret;}

The above is the overall implementation of single-digit multiplication. The key is in macro definition. The following describes the implementation methods in different environments.

★Single and Double Precision Variables

In this case, bn_digit and bn_udbl are defined at the same time, which is the easiest to implement. The three macros are defined as follows:

#define MULADDC_WORD_INIT                          \{                                                  \    bn_udbl r;                                     \#define MULADDC_WORD_CORE                          \                                               \    r = u + (bn_udbl)(*px++) * y;                  \    *pz++ = (bn_digit)r;                           \    u = (bn_digit)(r >> biL);                      \#define MULADDC_WORD_STOP                          \}

This situation is completely implemented based on the train of thought, and the specific principles are not described in detail.

★Only single-precision Variables

In this case, bn_udbl is not defined. Single-precision multiplication needs to be converted into four semi-precision multiplication for calculation, which is relatively complicated. The specific implementation principle is similar to the multiplier of Comba, refer to here: http://www.cnblogs.com/starrybird/p/4441022.html. The specific macro implementation is as follows:

#define MULADDC_WORD_INIT                           \{                                                   \    bn_digit a0, a1, b0, b1;                        \    bn_digit t0, t1, r0, r1;                        \#define MULADDC_WORD_CORE                           \                                                \    a0 = (*px << biLH) >> biLH;                     \    b0 = (  y << biLH) >> biLH;                     \    a1 = *px++ >> biLH;                             \    b1 =  y    >> biLH;                             \    r0 = a0 * b0;                                   \    r1 = a1 * b1;                                   \    t0 = a1 * b0;                                   \    t1 = a0 * b1;                                   \    r1 += (t0 >> biLH);                             \    r1 += (t1 >> biLH);                             \    t0 <<= biLH;                                    \    t1 <<= biLH;                                    \    r0 += t0;                                       \    r1 += (r0 < t0);                                \    r0 += t1;                                       \    r1 += (r0 < t1);                                \    r0 += u;                                        \    r1 += (r0 < u);                                 \    *pz++ = r0;                                     \    u = r1;                                         \#define MULADDC_WORD_STOP                           \}

★Use of inline assembly

The inline assembly details of C are not much to be said, and you won't be able to skip it if you do.

VC x86:

#define MULADDC_WORD_INIT                           \{                                                   \    __asm   mov   esi, px                           \    __asm   mov   edi, pz                           \    __asm   mov   ecx, u                            \#define MULADDC_WORD_CORE                           \                                                \    __asm   lodsd                                   \    __asm   mul   y                                 \    __asm   add   eax, ecx                          \    __asm   adc   edx, 0                            \    __asm   mov   ecx, edx                          \    __asm   stosd                                   \#define MULADDC_WORD_STOP                           \                                                \    __asm   mov   px,  esi                          \    __asm   mov   pz,  edi                          \    __asm   mov   u,   ecx                          \}#endif

GCC x86:

#define MULADDC_WORD_INIT                           \{                                                   \    asm                                             \    (                                               \       "movl %3, %%esi       \n\t"                  \       "movl %4, %%edi       \n\t"                  \       "movl %5, %%ecx       \n\t"                  \#define MULADDC_WORD_CORE                           \                                                \       "lodsl                \n\t"                  \       "mull %6              \n\t"                  \       "addl %%ecx, %%eax    \n\t"                  \       "adcl $0, %%edx       \n\t"                  \       "movl %%edx, %%ecx    \n\t"                  \       "stosl                \n\t"                  \#define MULADDC_WORD_STOP                           \                                                    \       "movl %%esi, %0       \n\t"                  \       "movl %%edi, %1       \n\t"                  \       "movl %%ecx, %2       \n\t"                  \       :"=m"(px),"=m"(pz),"=m"(u)                   \       :"m"(px),"m"(pz),"m"(u),"m"(y)               \       :"%eax","%ecx","%edx","%esi","%edi"          \    );                                              \}

GCC x64:

#define MULADDC_WORD_INIT                           \{                                                   \    asm                                             \    (                                               \       "movq %3, %%rsi       \n\t"                  \       "movq %4, %%rdi       \n\t"                  \       "movq %5, %%rcx       \n\t"                  \#define MULADDC_WORD_CORE                           \                                                \       "lodsq                \n\t"                  \       "mulq %6              \n\t"                  \       "addq %%rcx, %%rax    \n\t"                  \       "adcq $0, %%rdx       \n\t"                  \       "movq %%rdx, %%rcx    \n\t"                  \       "stosq                \n\t"                  \#define MULADDC_WORD_STOP                           \                                                    \       "movq %%rsi, %0       \n\t"                  \       "movq %%rdi, %1       \n\t"                  \       "movq %%rcx, %2       \n\t"                  \       :"=m"(px),"=m"(pz),"=m"(u)                   \       :"m"(px),"m"(pz),"m"(u),"m"(y)               \       :"%rax","%rcx","%rdx","%rsi","%rdi"          \    );                                              \}

★Summary

The algorithm is also very simple, just follow the Baseline Multiplication method, and pay attention to the key points for optimization. Next we will talk about the calculation of Square.

[Back to the directory of this series]

Copyright Notice
Original blog, reproduced must contain this statement, to maintain the integrity of this article, and in the form of hyperlink to indicate the author Starrybird and the original address of this article: http://www.cnblogs.com/starrybird/p/4489859.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Big integer algorithm [13] single-digit multiplication, integer multiplication

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Big integer algorithm [13] single-digit multiplication, integer multiplication

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support