Why doesn't C run faster than Java?

Source: Internet
Author: User

Back when I was first writing ActiveX controls and Java applets, Lao Meng raised this question, and my intuition told me that Java runs faster than C.
If you don't believe it, install VC6 and VJ6, write two programs with the same functionality, and run them: you will find that VJ is in no way inferior to VC in either run speed or compile speed, and WFC beats MFC.
Java's I/O handling is said to be worse than C's, but I have tested this at length. On the same Linux 2.6 server, Apache and Resin deliver essentially the same throughput; there is no performance gap. nginx is the exception, but not everyone can write nginx. Mustang also uses epoll. The only real difference on the Java side is that it cannot read data directly in the Linux kernel; the data still has to be copied from kernel space to user space.
Java object pools are a design mistake. I often see code like this at sohu: create an object, put it into memcache, and fetch it back from memcache when it is needed. What people overlook is that connecting to memcache, fetching the object, and then deserializing it easily costs hundreds of CPU instructions, while creating a Java object takes only about 10; and when that pooled object should be released is itself a problem. It is simpler and more efficient to just create a new object each time.
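As a minimal, hypothetical sketch of this trade-off (the PointPool class and its methods are invented for illustration, not taken from any library): a hand-rolled pool recycles instances, which brings extra bookkeeping and aliasing hazards that a plain new never has.

```java
import java.util.ArrayDeque;

class Point {
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

// Hypothetical pool: recycles Point instances instead of allocating fresh ones.
class PointPool {
    private final ArrayDeque<Point> free = new ArrayDeque<>();

    Point acquire(int x, int y) {
        Point p = free.poll();            // reuse a released instance if any
        if (p == null) p = new Point(0, 0);
        p.x = x; p.y = y;
        return p;
    }

    // Caller must promise never to touch p again -- the lifetime problem
    // mentioned above.
    void release(Point p) { free.push(p); }
}

public class PoolDemo {
    public static void main(String[] args) {
        PointPool pool = new PointPool();
        Point a = pool.acquire(1, 2);
        pool.release(a);
        Point b = pool.acquire(3, 4);              // same instance, recycled
        System.out.println(a == b);                // pooling aliases objects
        System.out.println(new Point(3, 4) == new Point(3, 4)); // plain new never does
    }
}
```

The aliasing shown on the last two lines is exactly why pooling "when will this object be released" becomes a correctness question, not just a performance one.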
Quoted from another site
In a modern JVM, allocation is much faster than the best malloc implementations. In HotSpot 1.4.2 and later, the common code path for new Object() is about 10 machine instructions (data provided by Sun; see References), whereas the best-performing malloc implementations in C require on average between 60 and 100 instructions per call (Detlefs et al.; see References). And allocation performance is not a negligible slice of overall performance: measurements show that for many real-world C and C++ programs, such as Perl and Ghostscript, 20% to 30% of total execution time is spent in malloc and free. If you don't believe it, open up a C program: a great many of its statements are malloc and free.
This seemingly reasonable assertion (that it is cheaper to clean up garbage in large batches than a little at a time) is backed by data. One study (Zorn; see References) measured the effect of replacing malloc with the conservative Boehm-Demers-Weiser (BDW) garbage collector in a number of common C applications. The result: many of the programs ran faster with garbage collection than with a traditional allocator. (BDW is a conservative, non-moving garbage collector, which severely limits its ability to optimize allocation and reclamation and to improve memory locality; a precise, relocating collector such as those used in the JVM can do better.)

Allocation in the JVM was not always this fast. Early JVMs did indeed have poor allocation and garbage collection performance, which is of course where the "JVM allocation is slow" belief comes from. Early on we saw a lot of "allocation is slow" advice (like everything else in early JVMs, it really was slow), and performance consultants offered all sorts of tricks to avoid allocation, such as object pools. (Public service announcement: for anything but the most heavyweight objects, object pooling is now a serious performance loss for all objects, and it takes skill to use a pool without creating a concurrency bottleneck.) Much has changed since JDK 1.0, however: the generational collector introduced in JDK 1.2 enables a much simpler allocation path, greatly improving performance.

Generational garbage collection

A generational garbage collector divides the heap into multiple generations. Most JVMs use two: a "young generation" and an "old generation". Objects are allocated in the young generation; if they survive a certain number of garbage collections, they are considered "long-lived" and promoted to the old generation.

HotSpot offers a choice of three young-generation collectors (serial copying, parallel copying, and parallel scavenge). All of them are "copying" collectors and share several important traits. A copying collector divides the memory space down the middle into two halves and uses only one half at a time. Initially, that half is one large block of available memory. When the allocator receives an allocation request, it returns the first N bytes of the unused portion and advances the pointer separating the "used" part from the "free" part, as in the pseudocode in Listing 1. When the half in use fills up, the garbage collector copies all the live objects (not the garbage) to the bottom of the other half (compacting the heap into one contiguous block), and allocation then continues from that half.

Listing 1. The allocator's behavior with a copying collector

void *malloc(int n) {
    if (heapTop - heapStart < n)
        doGarbageCollection();

    void *wasStart = heapStart;
    heapStart += n;
    return wasStart;
}

From this pseudocode we can see why a copying collector allows such fast allocation: allocating a new object is nothing more than checking whether enough space is left in the heap and, if so, bumping a pointer. No searching through free lists, no best-fit or first-fit, no lookaside lists; just take the next N bytes off the top of the heap.

What about reclamation?

Allocation, however, is only half of memory management; reclamation is the other half. And for most objects, the direct garbage collection cost is zero, because a copying collector never needs to visit or copy dead objects, only live ones. Objects that become garbage shortly after allocation therefore contribute no workload at all to the collection cycle.

In typical object-oriented programs, the vast majority of objects (between 92% and 98%, depending on the study) "die young": they usually become garbage soon after they are allocated, before the next garbage collection. (This property is called the generational hypothesis; it has been empirically tested in many object-oriented languages and shown to hold.) So not only is allocation fast, but for most objects reclamation is free as well.
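The copying scheme described above can be sketched as a toy semi-space collector. All names here are invented for illustration, and a real collector traces references from roots rather than checking membership in a set, but the key property survives: only live objects are copied, and dead objects are simply abandoned with the old half-space.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy semi-space ("copying") collector: survivors are copied to the other
// half and end up compacted; the garbage is never even looked at.
public class CopyingGcSketch {
    static List<String> fromSpace = new ArrayList<>();
    static List<String> toSpace = new ArrayList<>();
    static Set<String> roots = new HashSet<>();   // stand-in for real root tracing

    static void collect() {
        for (String obj : fromSpace)
            if (roots.contains(obj))    // reachable: copy to the other half
                toSpace.add(obj);
        // Dead objects incur zero work -- they are abandoned wholesale.
        List<String> tmp = fromSpace;
        fromSpace = toSpace;            // swap the halves
        toSpace = tmp;
        toSpace.clear();
    }

    public static void main(String[] args) {
        fromSpace.addAll(Arrays.asList("a", "b", "c", "d"));
        roots.addAll(Arrays.asList("b", "d"));    // only b and d are still live
        collect();
        System.out.println(fromSpace);            // survivors, now contiguous
    }
}
```

Note that the cost of collect() is proportional to the number of survivors, not to the amount of garbage, which is exactly why "die young" objects are free to reclaim.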

Thread-local allocation

If the allocator were implemented exactly as in Listing 1, the shared heapStart field would quickly become a significant concurrency bottleneck, since every allocation would need a lock to protect it. To avoid this, most JVMs use thread-local allocation blocks: each thread claims a larger block of memory from the heap and then serves small allocation requests sequentially out of its own local block. As a result, a thread acquires the shared heap lock far less often, which greatly improves concurrency. (Solving this problem in a traditional malloc implementation is harder and more expensive; building both thread support and garbage collection into the platform facilitates this kind of cooperation.)
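A minimal sketch of the thread-local scheme, assuming the heap is modeled as nothing more than a shared integer pointer (all class and field names here are illustrative): each thread pays the synchronized cost only when it refills its block, and bump-allocates lock-free in between.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class TlabSketch {
    // Shared "heap top" pointer -- the only contended state.
    static final AtomicInteger sharedHeapTop = new AtomicInteger(0);
    static final int BLOCK = 1024;            // bytes reserved per refill

    static class ThreadLocalBlock {
        int top, end;

        ThreadLocalBlock() { refill(); }

        // The slow path: grab a whole block from the shared heap at once.
        void refill() {
            top = sharedHeapTop.getAndAdd(BLOCK);
            end = top + BLOCK;
        }

        // The fast path: a pure pointer bump, no synchronization at all.
        int allocate(int n) {
            if (top + n > end) refill();
            int addr = top;
            top += n;
            return addr;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable worker = () -> {
            ThreadLocalBlock tlab = new ThreadLocalBlock();
            for (int i = 0; i < 10_000; i++)
                tlab.allocate(16);            // 10,000 allocations per thread
        };
        Thread t1 = new Thread(worker), t2 = new Thread(worker);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Each thread touched the shared pointer only ~once per 64 allocations
        // (1024 / 16), instead of once per allocation.
        System.out.println(sharedHeapTop.get() >= 2 * 10_000 * 16);
    }
}
```

With a 1024-byte block and 16-byte objects, the shared atomic is hit roughly once every 64 allocations instead of on every one, which is the whole point of the technique.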

 
Stack allocation

C++ gives programmers the choice of allocating objects on the heap or on the stack. Stack-based allocation is more efficient: allocation is cheaper, reclamation costs literally nothing, and the language helps tie an object's lifetime to its scope, reducing the risk of forgetting to free it. On the other hand, in C++ you must be very careful when storing or sharing references to stack-based objects, because stack-based objects are automatically freed when the stack frame unwinds, leading to dangling pointers.

Another advantage of stack-based allocation is that it is far more cache-friendly. On modern processors the cost of a cache miss is significant, so if the language and runtime can help a program achieve better data locality, performance improves. The top of the stack is almost always "hot" in the cache, while the top of the heap is almost always "cold" (because it has probably been a long time since that memory was last used). As a result, allocating an object on the heap tends to cause more cache misses than allocating it on the stack.

Worse, heap allocation involves a particularly nasty memory interaction on a cache miss. When memory is allocated from the heap, its contents are garbage regardless of what was last stored there. If the block just allocated at the top of the heap is not in the cache, execution stalls while that memory is loaded into the cache; then the values we just went to the trouble of loading are immediately overwritten with zeros or other initial values, wasting a great deal of memory activity. (Some processors, such as Azul's Vega, include hardware support for accelerating heap allocation.)

Escape Analysis

The Java language provides no explicit way to allocate an object on the stack, but that does not stop the JVM from using stack allocation where appropriate. The JVM can use escape analysis, a technique that identifies objects confined to a single thread for their entire lifetime, whose lifetime is moreover bound to the lifetime of a given stack frame. Such objects can safely be allocated on the stack instead of the heap. Better still, for small objects the JVM can optimize the allocation away entirely and simply hold the object's fields in registers.

Listing 2 shows an example where escape analysis can optimize away a heap allocation. The Component.getLocation() method makes a defensive copy of the Component's location, so that a caller cannot inadvertently change the component's actual position. getDistanceFrom() first calls getLocation() on the other component, which entails an object allocation, and then uses the x and y fields of the Point returned by getLocation() to compute the distance between the two components.

Listing 2. A typical defensive-copy approach to returning a compound value

public class Point {
    private int x, y;
    public Point(int x, int y) {
        this.x = x; this.y = y;
    }
    public Point(Point p) { this(p.x, p.y); }
    public int getX() { return x; }
    public int getY() { return y; }
}

public class Component {
    private Point location;
    public Point getLocation() { return new Point(location); }

    public double getDistanceFrom(Component other) {
        Point otherLocation = other.getLocation();
        int deltaX = otherLocation.getX() - location.getX();
        int deltaY = otherLocation.getY() - location.getY();
        return Math.sqrt(deltaX * deltaX + deltaY * deltaY);
    }
}
getLocation() cannot know what its caller will do with the Point it returns; the caller might keep a reference to it, for example by putting it in a collection, so getLocation() codes defensively. In this example, however, getDistanceFrom() does nothing of the sort: it uses the Point only briefly and then discards it, which seems like a waste of a perfectly good object.

A clever JVM can see what is going on and optimize the allocation of the defensive copy away. First, the call to getLocation() is inlined, as are the calls to getX() and getY(), so that getDistanceFrom() effectively behaves like Listing 3.


Listing 3. Pseudocode showing the result of applying inlining optimizations to getDistanceFrom()

public double getDistanceFrom(Component other) {
    Point otherLocation = new Point(other.x, other.y);
    int deltaX = otherLocation.x - location.x;
    int deltaY = otherLocation.y - location.y;
    return Math.sqrt(deltaX * deltaX + deltaY * deltaY);
}

At this point, escape analysis can show that the object allocated on the first line never escapes its basic block, and that getDistanceFrom() never modifies the state of the other component. (Escaping would mean the object reference was stored in the heap or passed to unknown code that might keep a copy.)
