JavaScript, Ruby, and C performance at a Glance (3): On assembly

Source: Internet
Author: User

In posts (1) and (2) of this series, the prime sieve algorithm was written in four ways: JavaScript in the browser, node.js, Ruby, and C. The end result: C is the fastest and node.js is second; JS in the browser is not slow but is extremely unstable, so it ranks third; Ruby is the slowest.

Now we rewrite the sieve algorithm in assembly language on 64-bit Linux and see whether the ultimate weapon, assembly language, can optimize the prime sieve any further.

If you have forgotten the algorithm's logic, no matter: the node.js, Ruby, and C sieve code is posted again below.

First, node.js:

    function sieve(n) {
        var a = new Int8Array(n + 1);
        var max = Math.floor(Math.sqrt(n));
        var p = 2;
        while (p <= max) {
            for (var i = 2 * p; i <= n; i += p)
                a[i] = 1;
            while (a[++p]); /* empty */
        }
        while (a[n]) n--;
        return n;
    }

Then Ruby:

    def sieve(n)
        a = Array.new(n + 1)
        max = Math.sqrt(n).to_i
        p = 2
        while p <= max do
            i = 2 * p
            while i <= n do
                a[i] = 1
                i += p
            end
            while a[p += 1] == 1 do end
        end
        while a[n] do n -= 1 end
        n
    end

And finally the C code:

    ULL sieve(ULL n) {
        char *a = malloc(n + 1);
        if (!a) return 0;
        memset(a, 0, n + 1);
        ULL max = sqrtl(n);
        ULL p = 2;
        while (p <= max) {
            for (ULL i = 2 * p; i <= n; i += p)
                a[i] = 1;
            while (a[++p]); /* empty */
        }
        while (a[n]) n--;
        return n;
    }
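For checking the logic on small inputs, here is a minimal self-contained sketch of the same sieve. It is not the code that was benchmarked: the name `sieve_small` is hypothetical, the loop bound `p * p <= n` is swapped in for `sqrtl` so that nothing from the math library is needed, `calloc` replaces `malloc` plus `memset`, and the buffer is freed before returning.

```c
#include <stdlib.h>

typedef unsigned long long ULL;

/* Hypothetical variant of sieve(): returns the largest prime <= n,
 * or 0 if n < 2 or the allocation fails. */
ULL sieve_small(ULL n) {
    if (n < 2) return 0;
    char *a = calloc(n + 1, 1);      /* zero-filled: 0 means "not crossed out" */
    if (!a) return 0;
    ULL p = 2;
    while (p * p <= n) {             /* same bound as p <= sqrt(n) */
        for (ULL i = 2 * p; i <= n; i += p)
            a[i] = 1;                /* cross out every multiple of p */
        while (a[++p]);              /* advance p to the next prime */
    }
    while (a[n]) n--;                /* walk down to the largest prime */
    free(a);
    return n;
}
```

Calling `sieve_small(100)` should give 97, the largest prime not exceeding 100.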

The following is an attempt to rewrite the sieve function in assembly. Points to note:

    1. There is no need to call the sqrt family of standard C library functions; the floating-point fsqrt instruction can be used directly.
    2. The vast majority of memory variables can be kept in registers to speed up access.
    3. Only the algorithm inside the sieve function matters here, so the C code calls the assembly sieve and each language plays to its strengths; otherwise I would have to write prologue code in assembly to read the input parameters, which is not worth it.
    4. Mind the calling interface between assembly and C: on 64-bit Linux, parameters are not pushed onto the stack. The sieve has only one parameter, so it is passed in rdi, and the return value is placed in rax.
    5. mmap must be called to request enough memory for the sieve table. Note that the error handling here is not very detailed; for more, see this cat's blog series "Linux next 64-bit assembly system call".
    6. Finally, and importantly: never optimize code while you are still writing it! This holds in every programming language and matters especially in assembly, or you will end up in a hopeless mess. Nobody writes optimized code straight off; the functional logic has to be right before optimization is even considered. This cat first wrote the most conservative code, with every variable in memory, loaded when needed and stored back after use. Once the logic was correct (at which point sieving 100000000 took 4xxx ms), the memory variables were moved into registers step by step.
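The memory request in point 5 can be sketched from C with the libc `mmap` wrapper. This is only an illustration under assumptions: the helper names `alloc_flags` and `free_flags` are hypothetical, and the flags `MAP_PRIVATE | MAP_ANONYMOUS` are assumed, matching a mapping made with fd -1 and offset 0. On Linux an anonymous mapping starts out zero-filled, so the sieve table needs no separate clearing step.

```c
#define _DEFAULT_SOURCE              /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: request len bytes of readable, writable memory
 * for the sieve table. Returns NULL on failure. */
char *alloc_flags(size_t len) {
    void *addr = mmap(NULL,                        /* let the kernel pick the address */
                      len,
                      PROT_READ | PROT_WRITE,      /* numeric value 3 on Linux */
                      MAP_PRIVATE | MAP_ANONYMOUS, /* assumed flags */
                      -1,                          /* no backing file */
                      0);                          /* offset */
    return addr == MAP_FAILED ? NULL : (char *)addr;
}

/* Hypothetical helper: release the mapping; returns 0 on success. */
int free_flags(char *addr, size_t len) {
    return munmap(addr, len);
}
```

The libc wrapper reports failure as `MAP_FAILED` with `errno` set, whereas the raw system call returns a negative error number in rax, which is why the assembly has to compare rax against the top of the address range.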

To be clear, the code can certainly be optimized further, but this cat stops here and hopes it serves as a starting point for others. Now the results: the sieve written in assembly is the fastest, beating the C code. Its best run took 37xx milliseconds on this cat's Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80GHz, which is 100-200 milliseconds faster than the C version, and it is very stable.

Finally, here are the C file main.c and the assembly file sieve.s.
main.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdbool.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    typedef unsigned long long ULL;

    ULL sieve(ULL n);

    int main(int argc, char **argv) {
        ULL n = 0;
        if (argc < 2) {
            printf("usage %s n\n", argv[0]);
            return 1;
        }
        sscanf(argv[1], "%llu", &n);
        if (n == 0) {
            puts("wrong number format");
            return 2;
        } else if (n < 0) {   /* never true: ULL is unsigned */
            puts("must + number");
            return 3;
        }
        clock_t start = clock();
        ULL result = sieve(n);
        if (result == -1) {   /* the assembly sieve returns -1 when mmap fails */
            puts("sieve calc failed!");
            return 4;
        }
        double end = ((1.0 * (clock() - start)) / CLOCKS_PER_SEC) * 1000.0;
        printf("max p is %llu (take %f ms)\n", result, end);
        return 0;
    }
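Because `n` is unsigned, the `n < 0` branch in main.c can never be taken, and `sscanf` with `%llu` quietly accepts input such as `-5`. A stricter parse could look like the sketch below; `parse_positive` is a hypothetical helper that is not part of the original program, and its return codes are made up for illustration.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a positive decimal integer from s.
 * Returns 0 on success, -1 for a negative number, -2 for a malformed
 * number, -3 for zero or a value out of range. */
int parse_positive(const char *s, unsigned long long *out) {
    while (isspace((unsigned char)*s)) s++;    /* skip leading blanks */
    if (*s == '-') return -1;                  /* reject negatives explicitly */
    errno = 0;
    char *end;
    unsigned long long v = strtoull(s, &end, 10);
    if (end == s || *end != '\0') return -2;   /* no digits, or trailing junk */
    if (errno == ERANGE || v == 0) return -3;  /* overflow or zero */
    *out = v;
    return 0;
}
```

In `main`, the return value could then select between the "wrong number format" and "must + number" messages.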

sieve.s, the assembly:

    section .data
        n:    dq 0
        len:  dq 0
        addr: dq 0
        p:    dq 2
        max:  dq 0
        i:    dq 2

    section .text
        global sieve

    sieve:
        push rbp
        push rbx
        push rcx
        mov rbp, rsp
        mov [n], rdi            ; save 1st arg to n
        inc rdi
        mov [len], rdi          ; mmap len = n + 1
        mov eax, 9              ; call syscall mmap
        mov rdi, 0
        mov rsi, [len]
        mov rdx, 3
        mov r10, -              ; flags: value cut off in the source (presumably MAP_PRIVATE|MAP_ANONYMOUS, 0x22)
        mov r8, -1
        mov r9, 0
        syscall
        cmp rax, 0xfffffffffffff001   ; mmap error
        jb next
        mov rax, -1             ; return -1
        jmp quit
    next:                       ; save mmap return addr
                                ; FIXME: mmap space always 0???
        fild qword [n]          ; calc sqrt(n) and save result to max
        fsqrt
        fistp qword [max]
        mov r15, [p]            ; r15 = p
        mov r14, [max]          ; r14 = max
        mov r13, [n]            ; r13 = n
        mov r12, [i]            ; r12 = i
    enter_while:
        cmp r15, r14            ; if p <= max
        ja quit_while
        mov rbx, r15
        shl rbx, 1
        mov r12, rbx
    enter_for:
        cmp r12, r13
        ja quit_for
        mov byte [rax + r12], 1
        add r12, r15
        jmp enter_for
    quit_for:
        inc r15
        mov cl, byte [rax + r15]
        test cl, cl
        jnz quit_for
        jmp enter_while
    quit_while:
        mov cl, byte [rax + r13]
        test cl, cl
        jz pre_quit
        dec r13
        jmp quit_while
    pre_quit:
        mov rax, r13
    quit:
        mov rsp, rbp
        pop rcx
        pop rbx
        pop rbp
        ret
