Comparing the Efficiency of C, C++, Perl, Assembly, and Java

Source: Internet
Author: User

Background

I was rummaging through the books crammed into my bookcase, all of them about programming and system administration. After so many years as a programmer, I have been through countless projects and programs. New friends always ask me, the so-called "veteran", which language is the best. I always put on a knowing look and repeat the "knowledge" I heard from my elders or read in famous books. From the first day we learned programming, our computer teachers told us that COBOL is good at business data processing and FORTRAN is for scientific computing. Similar "knowledge", such as "assembly language is much faster than C" and "Java is a very inefficient language environment", has been accepted without question by generation after generation of programmers.

Then an idea came to me: what if I implemented the same application in several programming languages and compared them to see which one is the most efficient?

To be honest, I think the idea is a bit silly. Who would write the same program over and over in different languages? It is the kind of thing you do to kill time on a rainy day. Besides, a quantitative comparison of a language's weaknesses and strengths offers little guidance for the tool choices we will face in future projects. Really, the main reason for doing it is fun.

Setting the Problem

What kind of problem should be chosen for such a test? This is a critical question, and it most likely decides whether the test is fair. Each language has its own strengths, and programmers have their own preferences. On the Internet and in real life, the debate over which language is better has never stopped, and among the camps formed by programmers of various schools, quite a few treat a language as a god. Don't believe it? Go to the CSDN Java forum and say "Java's execution efficiency is too low", and you will immediately be buried under a hail of bricks. The same goes for preferences and arguments about operating systems: praise Windows on a Linux forum and the response will be fierce beyond words. In this sense, a programmer's attachment to a programming language is like a soldier's love of firearms or a driver's love of racing cars: it has become something spiritual. Mr. Cai Xuexiao put it well: some people fight Microsoft and others put it on a pedestal. This is a purely emotional attachment, but it can get in the way of normal, scientific thinking.

As you can probably guess, this article is bound to draw fire from champions of every camp.

Okay, back to business. First, the test should use the elements most common to all the programming languages involved. What are they? Assignment, array operations, loops, conditionals, and so on, of course. I/O is also important to any programming language. Second, the running time must be long; otherwise the test is extremely unfair to interpreted languages: the compiled program would finish before the interpreter had even been loaded into memory. Finally, the program must not be too complex. Apart from the fact that I lack the perseverance to implement a complex algorithm in several languages, the more complex the program, the larger the role the algorithm plays in the test and the more factors there are that affect running speed. A complex algorithm also makes heavier use of the development tools' extension libraries, so the test would turn into a contest between add-on libraries, which is not what I want.

With these factors in mind, I designed a simple task: search a given text file for a given string, count the occurrences, and print the count. As a programmer, you will immediately see that this algorithm involves basic language elements such as conditionals, loops, and array operations, which satisfies the first condition. To satisfy the second condition, I prepared a 2 GB text file containing more than 15 million lines of text. That guarantees enough running time (though not too much), so no program can finish in the blink of an eye. Finally, as everyone knows, searching a string for a substring is a classic example in data structures textbooks (and a frequent exam question), so the algorithm meets the simplicity requirement. To give every program the same environment, I rebooted the machine before each test to rule out caching effects.
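Every contestant below implements the same textbook idea: try each starting position and compare the pattern character by character. What follows is a minimal C sketch of that naive search over an in-memory buffer. It assumes the text is already loaded, and the function name count_occurrences is hypothetical. Like the contest programs, it skips past each match, so occurrences are counted without overlap.

```c
#include <stddef.h>
#include <string.h>

/* Naive textbook substring search (hypothetical helper): try every start
   position, compare the pattern character by character, and jump past each
   match so occurrences are counted without overlap. Worst case O(n * m). */
long count_occurrences(const char *text, size_t n, const char *pat)
{
    size_t m = strlen(pat);
    size_t i = 0;
    long count = 0;

    if (m == 0 || m > n)
        return 0;
    while (i <= n - m) {
        size_t j = 0;
        while (j < m && text[i + j] == pat[j])   /* stop at first mismatch */
            j++;
        if (j == m) {          /* all m characters matched */
            count++;
            i += m;
        } else {
            i++;
        }
    }
    return count;
}
```

Reading a 2 GB file in pieces adds a buffer-boundary problem on top of this loop. The C and assembly programs deal with that separately.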

Preparation

The competition must be fair, so first the hardware platform has to be the same for everyone. I found a fairly good machine (a server): two Pentium III 800 MHz processors and 1 GB of memory. It came with Windows Server installed and almost no other applications. Being a bit lazy, I didn't reinstall the OS and used it as it was.

First contestant: Perl

If someone else had given me this problem, I would have chosen Perl immediately. It is purely a text-processing problem, and what could be more suitable than Perl, a language designed specifically for text processing? In fact, it took me two minutes to write the few lines of code below that solve it. This also shows that choosing the programming language suited to the task matters more than choosing your favorite one.

#!/usr/bin/perl
$filename = "D:/access.log_";
$count = 0;
open(FILE, "<$filename") or die "Cannot open $filename: $!";
while (<FILE>)
{
    @match_list = ($_ =~ /hit/g);
    $count = $count + @match_list;
}
close(FILE);
print "count = $count\n";
exit;

Perl was invented by the linguist Larry Wall. It was originally used to process text files on UNIX (Perl stands for Practical Extraction and Report Language). Later, people realized that HTML pages consist mostly of text, which made Perl a natural fit for CGI programs that generate dynamic pages. With the rise of the Internet, Perl grew bigger and bigger. Its syntax resembles C, so it is easy to pick up, and its regular-expression processing is more powerful than most people would expect. Consider a requirement such as "pick out the lines whose first and third words contain Tom or ABC, where the latter appears at least twice, separated by 8 or 4 letters or spaces". I bet you are rereading that sentence. Conditions like this are expressed as regular expressions, and Perl needs only a single statement for them. How many statements would C need?
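Here is a partial answer to that question. C has no regular expressions in its standard library, but POSIX systems provide `<regex.h>`. The sketch below assumes such a system, so it would not build with the Visual C++ used for these tests. It counts the matches of a pattern in one line, roughly what Perl's `/.../g` does. The function name count_regex_matches is hypothetical. Even for a pattern as simple as "hit", compiling the expression, looping over the matches, and freeing the expression take about a dozen statements.

```c
#include <regex.h>

/* Counts non-overlapping matches of an extended regular expression in one
   line (hypothetical helper, POSIX only). Returns -1 if the pattern does
   not compile. */
int count_regex_matches(const char *line, const char *pattern)
{
    regex_t re;
    regmatch_t m;
    int count = 0;
    int eflags = 0;
    const char *p = line;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, &m, eflags) == 0) {
        count++;
        /* Resume after the match; on an empty match step one byte so the
           loop cannot get stuck. */
        if (m.rm_eo > 0)
            p += m.rm_eo;
        else if (*p != '\0')
            p += 1;
        else
            break;
        eflags = REG_NOTBOL;   /* later searches do not start a line */
    }
    regfree(&re);
    return count;
}
```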

Here is a brief explanation of the program above, so that programmers who have never used Perl can get a feel for it.

The first line (the "#!" or shebang line) is for UNIX. Perl is an interpreted scripting language, so this line tells the system which interpreter should run the script.

The fourth line is to open the file.

The loop that follows reads the file one line at a time. The first statement in the loop collects every "hit" in the current line into an array. The second takes the number of elements in that array (an array used in a numeric context yields its length) and adds it to the running total. When the loop finishes, so does the task. Not bad, right? "/hit/g" is about the simplest regular expression there is.
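For comparison, here is the same line-by-line loop written in C. This sketch assumes a POSIX system for `getline()`, which grows the line buffer as needed. The function name count_by_line is hypothetical. `strstr()` plays the role of the regular expression, and advancing past each match mimics the `/g` modifier.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads fp one line at a time, like Perl's while (<FILE>), and counts
   non-overlapping occurrences of pat in each line (hypothetical helper). */
long count_by_line(FILE *fp, const char *pat)
{
    char *line = NULL;        /* getline() allocates and grows this */
    size_t cap = 0;
    size_t patlen = strlen(pat);
    long count = 0;

    if (patlen == 0)
        return 0;
    while (getline(&line, &cap, fp) != -1) {
        const char *p = line;
        while ((p = strstr(p, pat)) != NULL) {
            count++;
            p += patlen;      /* continue after this match, as /g does */
        }
    }
    free(line);
    return count;
}
```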

Today's Perl has long outgrown its image as a simple scripting language. Modern Perl has nearly all the features of a general-purpose language, and with its module system it can be used to build a wide range of applications. It has also gained some object-oriented features. Most people still use it to process large amounts of text, but Perl is also used for large applications, especially on the Web. It is worth mentioning that Perl is cross-platform as well.

On the test platform, the Perl 5.8 interpreter scanned the 15 million lines of text in 8 minutes 18 seconds and produced the correct result.

Second contestant: pure C

Maybe it's my age, but I really like C. What I like most is using pointers and type casts to manipulate data however I want; I have even assembled a long integer from pieces of data through pointers. This may be controversial, but I think Java's decision to drop the lovable pointer is basically running away from the problem. Pointers are hard to master, so you are simply not allowed to use them, and in the end efficiency is sacrificed.

For this task, C should be a good choice. The following is a pure C string-search program built with VC (Visual C++). It is a console program, to avoid interference from a graphical interface, and it was compiled with the speed-optimization option.

#include <stdio.h>
#include <string.h>

int main(void)
{
    int len = 2048;
    char filename[20];       // file name
    char buff[10000];        // file buffer
    char hit[5];
    FILE *fd;
    int i, j, flag = 0, over = 0;
    int max, readed;
    int count = 0;           // the final result
    strcpy(&filename[0], "d:/access.log_");
    strcpy(&hit[0], "hit");
    buff[0] = 0x0;
    buff[1] = 0x0;
    // Open the file (the assignment must be parenthesized before comparing):
    if ((fd = fopen(&filename[0], "rb")) == NULL)
    {
        printf("error: can not open file %s", &filename[0]);
        return 1;
    }
    // Read the file in blocks of len bytes, placed after the two carried bytes
    while (over != 1)
    {
        readed = fread(&buff[2], 1, len, fd);
        if (readed < len)
        {
            over = 1;
            max = readed;
        }
        else
        {
            max = len;
        }
        for (i = 0; i < max; i++)
        {
            for (j = 0; j < 3; j++)
            {
                if (hit[j] != buff[i + j])
                {
                    flag = 0;    // exit at the first mismatch; flag is 0
                    break;
                }
                else
                {
                    flag = 1;    // stays 1 only if all three characters match
                }
            }
            if (flag == 1)
            {
                count++;
                i += j - 1;      // skip past the match (the loop adds 1)
            }
            else
            {
                if (j == 0)
                {
                    i += j;
                }
                else
                {
                    i += j - 1;  // resume at the mismatched character
                }
            }
        }
        // Move the last two characters to the front so a "hit" spanning two blocks is not missed.
        buff[0] = buff[max];
        buff[1] = buff[max + 1];
    }
    fclose(fd);
    printf("count: %d\n", count);
    return 0;
}

The program uses the standard string-search algorithm familiar from textbooks, but it is much longer than the Perl program, isn't it? That's because Perl does most of the work for you. However, you may be pleased by this program's running times. The fastest run took only 2 minutes 10 seconds and the slowest 2 minutes 20 seconds to search the 15 million lines of text, for an average of a little over 2 minutes 15 seconds. Why do the times vary? I don't know the exact reason, but anyone who has studied operating systems will understand that run times are exactly repeatable only on a single-tasking system.

Some experienced readers may point out that the buffer is only 2048 bytes and that a larger one would be faster. True, and I'm sure experts could write even faster programs, but that is not what matters here. What matters is how efficiently different languages do the same job. And the very fact that you can tune this program for speed says something: in a C program, such details are freely under your control.
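The two-byte carry used in the C and assembly programs generalizes to any pattern of length m: keep the last m - 1 bytes of each block at the front of the buffer before reading the next one. The sketch below illustrates that idea and is not the author's code. The function name count_in_chunks and its chunk parameter are hypothetical. The sketch also remembers where the last match ended, so counts stay non-overlapping even for patterns such as "aa" that can overlap themselves. It also lets you try different buffer sizes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts non-overlapping occurrences of pat in a stream read chunk bytes at
   a time (hypothetical helper). The last (m - 1) bytes of each block move to
   the front of the buffer, so a match straddling two blocks is still found.
   Returns -1 if the buffer cannot be allocated. */
long count_in_chunks(FILE *fp, const char *pat, size_t chunk)
{
    size_t m = strlen(pat);
    size_t have = 0;    /* bytes in buf: carried tail plus new data */
    size_t start = 0;   /* first position where a new match may begin */
    long count = 0;
    char *buf;

    if (m == 0 || chunk == 0)
        return 0;
    buf = malloc(m - 1 + chunk);
    if (buf == NULL)
        return -1;

    for (;;) {
        size_t got = fread(buf + have, 1, chunk, fp);
        size_t i = start;
        size_t keep, from;

        if (got == 0)
            break;
        have += got;
        while (i + m <= have) {           /* every complete window */
            if (memcmp(buf + i, pat, m) == 0) {
                count++;
                i += m;                   /* jump past the match */
            } else {
                i++;
            }
        }
        /* Keep the tail that could still start a match. */
        keep = have < m - 1 ? have : m - 1;
        from = have - keep;
        memmove(buf, buf + from, keep);
        have = keep;
        start = i > from ? i - from : 0;
    }
    free(buf);
    return count;
}
```

Calling it with chunk set to 2048 corresponds to the block size of the C program. A larger value trades memory for fewer read calls.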

Third contestant: C++

C++ is a close relative of C. I simply ported the C code above and changed the file input to a stream object; the algorithm is exactly the same as in the C version. For compilation, besides the speed optimization, the C++-specific options were enabled as well. The resulting executable is larger than the C one, which shows that the stream code I added is more complex than the standard C library. Indeed, C++ is the most complex of the popular programming languages. You have to think about complicated class and inheritance relationships and about the order of initialization and of constructor calls, and then there are polymorphism and dynamic binding. C++ is also my favorite language. It offers object-oriented code reuse and an adequate safety model, but it is indeed less efficient than pure C. Most operating system kernels, after all, are written in pure C and, complex as they are, rarely use object-oriented techniques. Why? Not because object-oriented techniques are bad, nor because kernels are not complex enough; the main consideration is efficiency.

#include <cstdio>
#include <cstring>
#include <fstream>

int main()
{
    int len = 2048;
    char filename[20];       // file name
    char buff[10000];        // line buffer
    char hit[5];
    int i, j, flag = 0;
    int max;
    int count = 0;           // the final result
    std::strcpy(&filename[0], "d:/access.log_");
    std::strcpy(&hit[0], "hit");
    buff[0] = 0x0;
    buff[1] = 0x0;
    // Open the file with an input stream:
    std::ifstream input(&filename[0]);
    // Read the file line by line (assumes every line is shorter than len):
    while (input.getline(&buff[2], len))
    {
        max = (int)std::strlen(&buff[2]);
        for (i = 0; i < max; i++)
        {
            for (j = 0; j < 3; j++)
            {
                if (hit[j] != buff[i + j])
                {
                    flag = 0;    // exit at the first mismatch; flag is 0
                    break;
                }
                else
                {
                    flag = 1;    // stays 1 only if all three characters match
                }
            }
            if (flag == 1)
            {
                count++;
                i += j - 1;      // skip past the match (the loop adds 1)
            }
            else
            {
                if (j == 0)
                {
                    i += j;
                }
                else
                {
                    i += j - 1;  // resume at the mismatched character
                }
            }
        }
    }
    std::printf("count: %d\n", count);
    return 0;
}

On the test platform, the C++ program took from 4 minutes 25 seconds (fastest) to 5 minutes 40 seconds to search the 15 million lines of text. It found 10951968 occurrences of "hit" in the 2 GB file, which is correct.

Fourth contestant: Assembly

I had expected the assembly program to reach unprecedented speed and leave the other contestants far behind; that hope kept me going through the painstaking coding. But the test results were a great disappointment. The program is written entirely in machine instructions and, apart from the buffer, takes up only a few hundred bytes. It uses exactly the same algorithm as the C program, and its best time for scanning the 15 million lines of text was 2 minutes 14.56 seconds, which is no better than C's fastest runs. On average, the assembly program and the C program are neck and neck. I suspect this result will surprise most people. From the day we entered this profession, we were told that assembly is the fastest language you can master: the code is hard to read, but the performance is worth the price. Judging from this test, do you think the code below is worth writing just to match C in speed and functionality?

; Stack segment
stsg segment stack 'S'
        dw 64 dup (?)
stsg ends

; Data segment
data segment
rlength equ 2048
fname   db 'access.log_', 0
hit     db 'hit$'
fd      dw ?                    ; file handle
resault db 'count: $'           ; result prompt
count   dd 0                    ; holds the result (32 bits)
disflag db 0                    ; display flag
buff    db 5000 dup (0)         ; buffer
data ends

; Code segment
code segment
main proc far
        assume cs:code, ds:data, ss:stsg, es:nothing
        mov ax, data
        mov ds, ax
        cld                     ; make lods step forward through the buffer
; My code starts here:
        mov ah, 3dh             ; open the file
        lea dx, fname
        mov al, 00h             ; open mode: read only
        int 21h                 ; do it
; Errors are not handled here!
; CF = 0 means success, CF = 1 means error; AX holds the file handle or the error code.
        mov fd, ax              ; save the file handle

read:   mov ah, 3fh             ; read from the file
        mov bx, fd              ; file handle
        mov cx, rlength         ; number of bytes to read
        lea dx, buff            ; address of the read buffer
        add dx, 2               ; skip two bytes (boundary fix: a "hit" may straddle the rlength boundary)
        int 21h                 ; read
; AX holds the number of bytes actually read
; After reading, scan the buffer
        push ax                 ; save the byte count
        cmp ax, 0
        jz allend               ; end of file: exit

        sub dx, 2               ; move the pointer back two bytes
        mov si, dx
        add dx, 2               ; restore the pointer
        add dx, ax              ; compute the end of the data
lod3:   cmp si, dx              ; end of data reached: read the next block
        jz ovr
        lods buff
        lea bx, hit
        cmp al, [bx]
        jnz lod3                ; not 'h': try the next byte

        cmp si, dx
        jz ovr
        lods buff
        cmp al, [bx+1]
        jnz retry               ; first byte matched, second did not

        cmp si, dx              ; second byte matched: compare the third
        jz ovr
        lods buff
        cmp al, [bx+2]
        jnz retry               ; third byte did not match
; Found a "hit"
        push bx
        lea bx, count
        add word ptr [bx], 1    ; increment the counter (low word)
        adc word ptr [bx+2], 0  ; propagate the carry (high word)
        pop bx
        jmp lod3
retry:  dec si                  ; the mismatched byte may itself be an 'h': rescan it
        jmp lod3

ovr:    mov ah, [si-1]          ; carry the last two bytes to the buffer front
        mov byte ptr buff+1, ah
        mov ah, [si-2]
        mov byte ptr buff, ah

        pop ax                  ; restore this read's byte count
        cmp ax, rlength         ; a full block means there may be more data
        jz read
; That was the last block of the file

allend: mov ah, 3eh             ; close the file
        mov bx, fd              ; file handle
        int 21h

        mov ah, 9               ; display the result prompt
        lea dx, resault
        int 21h

; Convert the binary result to decimal and display it
        mov bx, word ptr count
        call tern

        mov ax, 4c00h           ; return to DOS
        int 21h
; End of the main program
main endp

tern proc                       ; converts BX to decimal and displays it
        mov cx, 10000
        call dec_div
        mov cx, 1000
        call dec_div
        mov cx, 100
        call dec_div
        mov cx, 10
        call dec_div
        mov cx, 1
        call dec_div
        ret
tern endp

dec_div proc
        mov ax, bx
        mov dx, 0
        div cx
        mov bx, dx
        mov dl, al
        add dl, 30h
        mov ah, disflag         ; read the flag
        cmp ah, 0
        jnz disp                ; a nonzero digit was already shown: display
        cmp dl, 30h
        jz nodisp
        mov disflag, 1          ; leading zeros are suppressed until the first nonzero digit
disp:   mov ah, 2
        int 21h
nodisp: ret
dec_div endp
code ends
        end main

You're probably too lazy to read the code above. In fact, it cannot display the result correctly. The routine that converts the final count into displayable ASCII digits handles only 16-bit binary values, but the count is more than 10 million, so the displayed value is wrong. Displaying the result has nothing to do with how fast the program runs, so I was too lazy to write a 32-bit conversion routine. So be it.
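A 32-bit version of the tern/dec_div idea is short in C. The sketch below uses hypothetical function names. It divides by descending powers of ten starting at 10^9, so it can show the whole dd counter maintained by add/adc. It also shows why the program fails. The low 16 bits of 10951968 are 7456, so printing only `word ptr count`, as the program does, would show 7456. Unlike dec_div, the sketch always prints the units digit, so a count of zero appears as "0" rather than as nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes value in decimal to out (room for 11 chars) and returns the number
   of digits (hypothetical helper). Like tern/dec_div it divides by
   descending powers of ten and suppresses leading zeros, but it starts at
   10^9 so every 32-bit value fits. */
size_t to_decimal(uint32_t value, char *out)
{
    uint32_t divisor = 1000000000u;  /* largest power of ten below 2^32 */
    int started = 0;                 /* plays the role of disflag */
    size_t n = 0;

    while (divisor > 0) {
        unsigned digit = (unsigned)(value / divisor);
        value %= divisor;
        /* the units digit is always written, so 0 prints as "0" */
        if (digit != 0 || started || divisor == 1) {
            out[n++] = (char)('0' + digit);
            started = 1;
        }
        divisor /= 10;
    }
    out[n] = '\0';
    return n;
}

/* The part of the counter the original program actually converts. */
uint16_t low_word(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}
```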

Fifth contestant: Java

Java is a contestant that cannot be left out. So many people love it. Half of them love it for its object-oriented design and good cross-platform support. The other half love it simply because Java's family name is not "Micro(soft)", which shows how ideology colors programmers' views of a language. As a language, I do prefer Java: its syntax is clean and concise, and so is its environment. Using a virtual machine (the JVM) to achieve portability is not a great new idea. Isn't it much like the BASIC interpreters of 30 years ago? And don't tell me about intermediate code: almost all interpreters translate language elements into intermediate code. The JVM just splits the work into two steps, and the underlying mechanism is much the same. Still, the JVM has given Java unprecedented portability, and it is a very clean system that is pleasing to the eye. I can't help mentioning the J2EE enterprise application framework, though. I wonder how many people can really understand Sun's J2EE "theoretical works"? They are packed with newly coined concepts and flowery language. The Java enterprise framework is truly complicated. It cannot match the later .NET Framework in that respect, but it is more than enough to discourage most beginners. In a word, there is too much of it. In fact, Java's enterprise applications have not been as successful as people imagined. iPlanet faded away with the collapse of the e-commerce hype and has now been renamed "SunONE", in the words of a Sun employee.

Back to the language itself: Java can be thought of as a purified C++. It drops some of the "non-object-oriented" features that C++ added for compatibility with C. For some features that C++ implements directly, such as multiple inheritance, it provides other mechanisms instead. As for the implementation mechanism, a Java program is first compiled into .class files, and this cross-platform intermediate code can then be "compiled once, run anywhere". Of course, it has to run inside a JVM, but then everything, even graphics, carries over unchanged: a circle you draw on a PC screen with a Java program is still a circle on a Java PDA.
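A toy stack machine can illustrate the "intermediate code" idea. A compiler emits simple opcodes once, and any platform with an interpreter for them can run the result. The sketch below is purely illustrative. Its opcodes and the run function are hypothetical, not JVM bytecode, and unlike a real VM it does not verify the code it runs.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine (not JVM bytecodes). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Interprets code until OP_HALT and returns the value on top of the stack.
   Assumes well-formed code that never uses more than 64 stack slots. */
long run(const long *code)
{
    long stack[64];
    size_t sp = 0;      /* number of values on the stack */
    size_t pc = 0;      /* index of the next instruction */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:
            stack[sp++] = code[pc++];   /* the operand follows the opcode */
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
        default:
            return sp > 0 ? stack[sp - 1] : 0;
        }
    }
}
```

For instance, the sequence PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT evaluates (2 + 3) * 4 and yields 20.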

For this test I wrote the following Java code. It uses Java's file stream classes together with basic language elements such as loops, conditionals, and array operations. The environment was J2SE 1.3.1_06. The Java program took 8 minutes 21 seconds to scan the 15 million lines of text. That is the slowest of the languages tested and essentially on par with Perl, a purely interpreted language. And that was with the HotSpot-optimized JVM that ships with J2SE.

import java.io.*;

public class LangTest
{
    public static void main(String[] args)
    {
        String filename = "D:/access.log_";
        try
        {
            count(filename);
        }
        catch (IOException e)
        {
            System.err.println(e.getMessage());
        }
    }

    public static void count(String filename) throws IOException
    {
        long count = 0;
        String strline = "";
        char[] hit = {'h', 'i', 't'};   // string to be searched
        char[] buff;

        Reader in = new FileReader(filename);   // use FileReader to construct a Reader object
        LineNumberReader line = null;
        try
        {
            line = new LineNumberReader(in);    // create a LineNumberReader object
            while ((strline = line.readLine()) != null)
            {
                // A line has been read; count the "hit"s in it:
                int i = 0, j = 0, max = 0, flag = 0;
                buff = strline.toCharArray();   // convert to a character array
                max = strline.length();

                for (i = 0; i < max; i++)
                {
                    for (j = 0; j < 3; j++)
                    {
                        // Unlike the C buffers, buff ends exactly at max, so check the bound first
                        if (i + j >= max || hit[j] != buff[i + j])
                        {
                            flag = 0;   // exit at the first mismatch; flag is 0
                            break;
                        }
                        else
                        {
                            flag = 1;   // stays 1 only if all three characters match
                        }
                    }
                    if (flag == 1)
                    {
                        count++;
                        i += j - 1;     // skip past the match (the loop adds 1)
                    }
                    else
                    {
                        if (j == 0)
                        {
                            i += j;
                        }
                        else
                        {
                            i += j - 1; // resume at the mismatched character
                        }
                    }
                }
            }
            System.out.println("count: " + count);
        }
        catch (IOException e)
        {
            System.err.println(e.getMessage());
        }
        finally
        {
            try
            {
                if (in != null) in.close();
            }
            catch (IOException e)
            {
            }
        }
    }
}

Thinking in Java, in the translation by Mr. Hou Jie, says on page 1: "With the original Java interpreter, Java is about 20 to 50 times slower than C." I was skeptical when I read this; if it were true, Java would hardly be worth using. After trying it myself, I think it is more accurate to say that Java in the J2SE environment is 2 to 3 times slower than C. Moreover, more and more hardware JVMs are appearing, which gives Java more and more opportunities. But I also have a worry: the variety of JVM vendors may lead to compatibility problems. For example, I once read an article discussing a Java program that runs on IBM's JVM but not on Sun's. I hope Java can grow up healthy.
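For reference, here are the ratios implied by the times reported in this article. The baseline is C's average of roughly 2 minutes 15 seconds, and this is simple arithmetic on the article's own figures, not new measurements:

```latex
\begin{aligned}
t_{\mathrm{C}} &\approx 2\ \mathrm{min}\ 15\ \mathrm{s} = 135\ \mathrm{s} \\
t_{\mathrm{C++}} / t_{\mathrm{C}} &\approx 265/135 \text{ to } 340/135 \approx 1.96 \text{ to } 2.52 \\
t_{\mathrm{Perl}} / t_{\mathrm{C}} &= 498/135 \approx 3.69 \\
t_{\mathrm{Java}} / t_{\mathrm{C}} &= 501/135 \approx 3.71
\end{aligned}
```

By these numbers, Java was closer to 3.7 times slower than C in this particular test. That is somewhat more than the 2 to 3 times estimated above, but still far from the 20 to 50 times quoted for the original interpreter.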

Summary

This article really has two basic messages for readers who are new to programming:

1. Set aside ideology and choose the programming language best suited to the job. Every popular language has its reason to exist.

2. In programming, if you have an idea, try it yourself and you will reach your own conclusions.

By now you should see that the test results above are not what matters. What matters is that you understand the characteristics of these languages, which may add a little "experience" to your future programming career.

Postscript

I had also wanted to test another popular interpreted language, Python, and the up-and-coming C#, and to run all the tests on Linux as well. In the end, though, laziness won and I gave up. Fortunately, Python is similar to Perl and C# is similar to Java, so the results above can serve as a rough reference for them.

In fact, this test contains one big unfairness, which I'm sure readers have already spotted. The C and assembly programs read raw blocks into a buffer and compare byte values without regard to lines, checking only the buffer boundary with a pointer. C++ and the other languages instead use convenient streams that read data line by line and do much more work. Every character must be checked for a carriage return or line feed, and reading by line also means each read fills less of the buffer. So the other languages are at a serious disadvantage. This does not affect the conclusion, however, because the test itself shows that the more convenient the approach, the lower the efficiency. And people always have to get things done, don't they?
