Multi-precision PI computing Assembly implementation

Last Update:2018-12-05 Source: Internet

Author: User

Tags mul

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The PI program on vier gourdon is a good start for computing, piclassic. I downloaded the C program and compiled it with/O2. I found that it is still a little faster than I did without changing my mind. Of course it cannot be compared with the above program. It takes 770 ms, and I have rewritten it to calculate 10000 bits for 465 Ms. It is rewriting rather than rewriting. Visible/O2 is sometimes inferior to manual optimization.

When I write this program, its efficiency is not the main thing to learn. It is mainly to understand the idea of multi-precision computing and simulate big data with multiple bytes, of course, some three or two lines of code may also calculate pi
However, it is hard to understand this.
Data such as 3. 14159265... can be defined as follows:
X = x (0) + x (1)/B ^ 1 +... + x (n-1)/B ^ (n-1), base B (visible, x (0) is the only positive integer)
For example, when B = 10
3 1 4 1 5 9 2 6 5...
X (0) x (1) x (2) x (3) x (4) x (5) x (6) x (7) x (8)
When B = 100
3 14 15 92 65...
X (0) x (1) x (2) x (3) x (4 )...
...
When B = 10000
3 1415 9265...
X (0) x (1) x (2 ).....
It can be seen that when X (I) is of the DWORD type at the same time, for pi data storage, base 10000 requires less storage space than base 10.

There are many formulas for pi calculation. We choose a simple and relatively efficient guass formula:
PI/4 = 12 * arctan (1/18) + 8 * arctan (1/57)-5 * arctan (1/239) (Gauss)
More formulas can be referred to: http://en.wikipedia.org/wiki/Pi
For efficiency, the current Chudnovsky + FFT is invincible in PI computation. However, if there is no difficult mathematical skills, FFT cannot be thoroughly tested and understood, the efficient big data library usually involves the multiplication of large numbers in the Fast Fourier Transform (FFT. So we don't talk about efficiency. If it goes through simple optimization, it will still be very fast. For example, some Division operations are not required, but we will use it for Division learning, or use mpich2 to implement multi-core computing.
Oh, no more... You can't be impetuous .......

The following program calculates pi to 10000 digits after the decimal point (of course, more digits can be obtained after a slight modification ).
........................................ .............................

; **************************************** ********************
; * -- = -- * Simple ASM Program to compute PI with your digits.
; * -- = -- * By G-spider 2010.11.27
; * -- = -- * Web: http://blog.csdn.net/G_Spider
; * -- = --*---------------------------------------------------
; * -- = -- * Ml/C/coff pi. ASM
; * -- = -- * Link/subsystem: Console pi. OBJ
; * -- = -- * Please ensure the above integrity !!
; **************************************** ********************

. 386
. Model flat, stdcall
Option Casemap: None

Include windows. inc
Include user32.inc
Include kernel32.inc
Include msvcrt. inc

Includelib user32.lib
Includelib kernel32.lib
Includelib msvcrt. Lib
; **************************************** *******************
. Data
Num dd 0
Flag dd 0
FMT dB '%. 4D', 0
Fmt1 dB '% d.', 0dh, 0ah, 0
Fmt2 dB ': % d', 0dh, 0ah, 0
Fmt3 dB 0dh, 0ah, 0
Szpause dB 'pause', 0
; Nbdigits dd 10000
B dd 10000
LB dd 4
Maxdiv dd 450
_ SIZE dd 2502; 2 + nbdigits/lb (precise)
P dd 239
M dd 12, 8,-5

Dwlm dd 1000; 1000 in milliseconds, and 1000000 in microseconds
Fmt6 dB/
'================================================ =========== ', 0dh, 0ah ,/
'Simple ASM Program to compute PI with invalid digits. ', 0dh, 0ah ,/
'Pi/4 = 12 * arctan (1/18) + 8 * arctan (1/57)-5 * arctan (1/239) (Gauss )',/
0dh, 0ah, 'by G-spider @ 2010', 0dh, 0ah ,/
'Computation of 10000 digits of PI ', 0dh, 0ah ,/
'Total computation time :',/
'% 6lld Ms', 0dh, 0ah ,/
'================================================ ============ ', 0dh, 0ah, 0

. Data?
X dd?
Y dd?
_ Sizechar dd?
Lppi dd?
Lparctan dd?
Lpbuffer1 dd?
Lpbuffer2 dd?

;-------------------------------------
Dqtickcounter1 DQ?
Dqtickcounter2 DQ?
Dqfreq DQ?
Dqtime DQ?

; **************************************** *******************
. Code
;-----------------------------------------------------------
_ Settointeger proc N: DWORD, LPX: DWORD, integer: DWORD
MoV eax, 1
MoV edX, n
MoV ESI, LPX
@@:
MoV dword ptr [ESI + eax * 4], 0
INC eax
CMP eax, EDX
Jl @ B
MoV eax, integer
MoV [esi], eax
XOR eax, eax
RET
_ Settointeger endp
;-----------------------------------------------------------
_ Iszero proc N: DWORD, LPX: DWORD
If all values are 0, 1 is returned.
XOR eax, eax
MoV ESI, LPX
MoV ECx, n
@@:
MoV edX, [ESI + eax * 4]
Test edX, EDX
Jnz exit
INC eax
CMP eax, ECx
Jl @ B
MoV eax, 1
RET
Exit:
XOR eax, eax
RET

_ Iszero endp
;-----------------------------------------------------------
_ Add proc N: DWORD, LPX: DWORD, lpy: DWORD
; X + = y
MoV EDI, LPX
MoV ESI, lpy
XOR ECx, ECx; carry
MoV eax, n
Dec eax; n-1 subscript

@@:

MoV edX, [ESI + eax * 4]
Add edX, ECx
Add edX, [EDI + eax * 4]

CMP edX, B
Jl a111

MoV ECx, 1
Sub edX, B
MoV [EDI + eax * 4], EDX
Dec eax
JNS @ B; jump if not negative
A111:
XOR ECx, ECx
MoV [EDI + eax * 4], EDX
Dec eax
JNS @ B; jump if not negative
XOR eax, eax
RET

_ Add endp
;-----------------------------------------------------------
_ Sub proc N: DWORD, LPX: DWORD, lpy: DWORD
; X-= y
MoV EDI, LPX
MoV ESI, lpy
MoV eax, n

@@:

Dec Emax; n-1
MoV EBX, [EDI + eax * 4]
Sub EBX, [ESI + eax * 4]
MoV [EDI + eax * 4], EBX; X [I]-= Y [I]

Cmp ebx, 0; X [I] <0
Jge a000

Test eax, eax
JZ a001

Add EBX, B
MoV [EDI + eax * 4], EBX; X [I] + = B
Dec eax
Sub dword ptr [EDI + eax * 4], 1; X [I-1] --
INC eax
A000:
CMP eax, 0
JG @ B
A001:
XOR eax, eax
RET
_ Sub endp
;-----------------------------------------------------------
_ Mul proc N: DWORD, LPX: DWORD, Q: DWORD
; X * = Q
XOR ECx, ECx; carry = 0
MoV EDI, LPX
MoV eax, n

@@:
Dec Emax; n-1
MoV EBX, [EDI + eax * 4]
Imul EBX, Q; xi = xi * q
Push eax
Pop eax
Add EBX, ECx

Cmp ebx, B
Jl a000
;******************************

Push eax
MoV eax, EBX
CDQ
Idiv B; eax = eax/B = xi/B
MoV ECx, eax; ECx = carry
Imul eax, B; carry * B
Sub EBX, eax; XI-= (carry * B)
Pop eax
MoV [EDI + eax * 4], EBX
Test eax, eax
Jnz @ B
;******************************
JMP exit

A000:
MoV ECx, 0
MoV [EDI + eax * 4], EBX
Test eax, eax
Jnz @ B

Exit:
XOR eax, eax
RET

_ Mul endp
;-----------------------------------------------------------
_ Div proc N: DWORD, LPX: DWORD, D: DWORD, lpy: DWORD
; Y = x/d
XOR ECx, ECx; carry = 0
XOR eax, eax
MoV ESI, LPX
MoV EDI, lpy

@@:

MoV EBX, ECx
Imul EBX, B
Add EBX, [ESI + eax * 4]; EBX = x [I] + carry * B
Push eax ;*************

MoV eax, EBX
CDQ
Idiv D; eax = EBX/d
MoV edX, eax; save eax to edX vendors

Imul eax, D; eax = eax * d
MoV ECx, EBX
Sub ECx, eax; carry = xi-eax * d

Pop eax ;*************
MoV [EDI + eax * 4], EDX
INC eax
CMP eax, n
Jl @ B
XOR eax, eax
RET

_ Div endp

_ Arccot proc _ p: DWORD, N: DWORD, LPX: DWORD, buf1: DWORD, buf2: DWORD
Local @ sign, P2, K
MoV @ sign, 0
MoV eax, _ p
Imul eax, _ p
MoV P2, eax
MoV K, 3

Invoke _ settointeger, N, LPX, 0
Invoke _ settointeger, N, buf1, 1
Invoke _ Div, N, buf1, _ p, buf1
Invoke _ add, N, LPX, buf1

@@:

Invoke _ iszero, N, buf1
Test eax, eax
Jnz exit
MoV eax, maxdiv
. If P <eax
Invoke _ Div, N, buf1, P2, buf1
. Else
Invoke _ Div, N, buf1, _ p, buf1
Invoke _ Div, N, buf1, _ p, buf1
. Endif
Invoke _ Div, N, buf1, K, buf2
. If @ sign
Invoke _ add, N, LPX, buf2
. Else
Invoke _ Sub, N, LPX, buf2
. Endif
Add K, 2
Neg @ sign
Add @ sign, 1
JMP @ B

Exit:
XOR eax, eax
RET

_ Arccot endp

_ Print proc N: DWORD, LPX: DWORD
Local @ I
MoV ESI, LPX
Dec n
MoV @ I, 1
Invoke crt_printf, offset fmt1, dword ptr [esi]

@@:
MoV eax, @ I
Invoke crt_printf, offset FMT, dword ptr [ESI + eax * 4]
MoV eax, @ I
MoV ECx, 10
CDQ
Idiv ECx
Test edX, EDX
Jnz a000
MoV edX, @ I
SHL edX, 2
Invoke crt_printf, offset fmt2, EDX
A000:
Inc dword ptr @ I
MoV eax, @ I
CMP eax, n
Jl @ B

Invoke crt_printf, offset fmt3

XOR eax, eax
RET

_ Print endp

_ Pic1c proc
Local @ nbarctan, @ flag

MoV @ nbarctan, 3
MoV eax, _ size
SHL eax, 2
MoV _ sizechar, eax
;-----------------------------------
Invoke crt_malloc, _ sizechar
. If eax
MoV lppi, eax
. Else
Invoke MessageBox, null, 0, 0
. Endif
Invoke crt_malloc, _ sizechar
. If eax
MoV lparctan, eax
. Else
Invoke MessageBox, null, 0, 0
. Endif
Invoke crt_malloc, _ sizechar
. If eax
MoV lpbuffer1, eax
. Else
Invoke MessageBox, null, 0, 0
. Endif
Invoke crt_malloc, _ sizechar
. If eax
MoV lpbuffer2, eax
. Else
Invoke MessageBox, null, 0, 0
. Endif
;-----------------------------------
; Start of Time Test
Invoke queryperformancecounter, ADDR dqtickcounter1
; //////////////////////////////////////// //////////
Invoke _ settointeger, _ size, lppi, 0
XOR ECx, ECx
@@:
Push ECx
Invoke _ arccot, P [ECx * 4], _ size, lparctan, lpbuffer1, lpbuffer2
Pop ECx
MoV eax, M [ECx * 4]
And eax, 80000000
. If eax! = 0
Neg M [ECx * 4]
MoV @ flag, 1
. Else
MoV @ flag, 0
. Endif
Push ECx
Invoke _ Mul, _ size, lparctan, M [ECx * 4]
. If @ flag
Invoke _ Sub, _ size, lppi, lparctan
. Else
Invoke _ add, _ size, lppi, lparctan
. Endif
Pop ECx
INC ECx
CMP ECx, @ nbarctan
Jl @ B

Invoke _ Mul, _ size, lppi, 4
; //////////////////////////////////////// //////////
; End of time test
Invoke queryperformancecounter, ADDR dqtickcounter2
Invoke queryperformancefrequency, ADDR dqfreq
MoV eax, dword ptr dqtickcounter1
MoV edX, dword ptr dqtickcounter1 [4]
Sub dword ptr dqtickcounter2, eax
Sub dword ptr dqtickcounter2 [4], EDX
Finit
Fild dqfreq
Fild dqtickcounter2
Fimul dwlm
Fdivr
Fistp dqtime; the 64-bit value in dqtime is the time interval (in milliseconds)
Invoke crt_printf, offset fmt6, dqtime

;---------------------------------------------------
Invoke _ print, _ size, lppi; output results to the console
;----------------------------------------------------

Invoke crt_free, lppi
Invoke crt_free, lparctan
Invoke crt_free, lpbuffer1
Invoke crt_free, lpbuffer2
XOR eax, eax
RET

_ Pic1c endp

; **************************************** *******************
Start:
Invoke _ pic1c
Invoke crt_system, offset szpause
Invoke exitprocess, 0
End start

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More