Server-side I/O performance: Node, PHP, Java, and Go

See the original article: Server-side I/O Performance: Node vs. PHP vs. Java vs. Go.

Understanding your application's input/output (I/O) model can mean the difference between an application that handles the load you planned for and one that crumples in the face of real-world usage. If your application is small and never serves high loads, it may matter little. But as your application's load grows, working with the wrong I/O model can get you into a world of hurt.

As with most situations where multiple approaches are possible, the point is not which one is better, but understanding the tradeoffs. Let's take a tour of the I/O landscape and see what we can glean from it.

In this article, we'll compare Node, Java, Go, and PHP with Apache, discuss how these different languages model their I/O, the advantages and disadvantages of each model, and conclude with some preliminary benchmarks. If you're concerned about the I/O performance of your next web application, you've found the right article.

I/O basics: a quick refresher

To understand the factors involved with I/O, we must first review the concepts down at the operating system level. While you are unlikely to deal with most of these concepts directly, you deal with them indirectly through your application's runtime environment all the time. And the details matter.

System calls

First, we have system calls, which can be described like this:

Your program (in "user land", as they say) must ask the operating system kernel to perform an I/O operation on its behalf.
A "syscall" is the means by which your program asks the kernel to do something. The details of how this is implemented vary between operating systems, but the basic concept is the same: some specific instruction transfers control from your program into the kernel (like a function call, but with some special sauce for dealing with this exact scenario). Generally speaking, syscalls are blocking, meaning your program waits for the kernel to return back to your code.
The kernel performs the underlying I/O operation on the physical device in question (disk, network card, etc.) and replies to the syscall. In the real world, the kernel might have to do a number of things to fulfill your request, including waiting for the device to be ready, updating its internal state, etc., but as an application developer you don't need to care about that; that's the kernel's job.
Blocking and non-blocking calls
Now, I just said above that syscalls are blocking, and that is true in the general sense. However, some calls are categorized as "non-blocking", which means that the kernel takes your request, puts it in a queue or buffer somewhere, and then immediately returns without waiting for the actual I/O to occur. So it "blocks" only for a very brief period of time, just long enough to enqueue your request.

Some examples (of Linux syscalls) might help clarify:

- read() is a blocking call: you pass it a file handle and a buffer to put the data it reads into, and the call returns when the data is there. Note that this has the advantage of being elegant and simple.
- epoll_create(), epoll_ctl(), and epoll_wait() are calls that, respectively, let you create a group of handles to listen on, add/remove handles to/from that group, and then block until there is any activity. This allows you to efficiently control a large set of I/O operations with a single thread. This is great if you need that functionality, but as you can see, it's certainly more complex to use.

It's important to understand the order of magnitude of difference in timing here. If a CPU core is running at 3GHz, without getting into optimizations, it performs 3 billion cycles per second (or 3 cycles per nanosecond). A non-blocking syscall might take on the order of tens of cycles to complete, or "a relatively few nanoseconds". A call that blocks on information being received over the network might take much longer, say 200 milliseconds (1/5 of a second). So suppose, for example, the non-blocking call took 20 nanoseconds and the blocking call took 200,000,000 nanoseconds. Your process just waited 10 million times longer for the blocking call.

The kernel provides the means to do both blocking I/O ("read from this network connection and give me the data") and non-blocking I/O ("tell me when any of these network connections have new data"). Depending on which mechanism is used, the calling process blocks for dramatically different lengths of time.

The next key thing is what to do when a large number of threads or processes start to block.

For our purposes, there is not a huge difference between a thread and a process. In reality, the most noticeable execution-related difference is that threads share the same memory, while each process has its own memory space, so separate processes tend to take up a lot more memory. But when we talk about scheduling, it really boils down to a list of things (threads and processes alike) that each need to get a slice of execution time on the available CPU cores. If you have 300 threads running on 8 cores, you divide the time up so that each one gets its share, with each core running for a short period and then switching to the next thread. This is done through a "context switch", where the CPU switches from one running thread/process to the next.

These context switches have a cost: they take some time. In fast cases it may be less than 100 nanoseconds, but depending on the implementation details, processor speed/architecture, CPU cache, etc., it's not uncommon for a switch to take 1000 nanoseconds or more.

And the more threads (or processes) there are, the more context switching. When we're talking about thousands of threads, with hundreds of nanoseconds for each switch, things can get very slow.

However, a non-blocking call essentially tells the kernel to "call me only when you have some new data or any of these connections has an event." These non-blocking calls are designed to efficiently handle large amounts of I / O load and reduce context switching.

Still with me so far? Because now comes the fun part: let's look at how some popular languages use these tools, and draw some conclusions about the tradeoffs between ease of use and performance... and other interesting tidbits.

Please note that while the examples shown in this article are trivial (and partial, showing only the relevant bits), database access, external caching systems (memcached, etc.), and anything that requires I/O all end up performing I/O calls under the hood, with the same effects as the simple examples shown. Also, in the scenarios where the I/O is described as "blocking" (PHP, Java), the reads and writes of the HTTP request and response are themselves blocking calls: once again, more I/O hidden in the system, with its attendant performance issues to take into account.

There are many factors to consider when choosing a programming language for a project, and even more when you consider only performance. But if you're concerned that your application will be primarily I/O-bound, if I/O performance is make-or-break for your project, then these are the things you need to know.

The "keep it simple" approach: PHP

Back in the 1990s, a lot of people were wearing Converse shoes and writing CGI scripts in Perl. Then PHP came along, and, much as many people enjoyed using it, it made creating dynamic web pages far easier.

The model PHP uses is fairly simple. There are some variations on it, but basically a PHP server looks like this:

An HTTP request comes in from a user's browser and hits your Apache web server. Apache creates a separate process for each request, with some optimizations to re-use them in order to minimize how many processes it has to create (creating a process is, relatively speaking, slow). Apache then calls PHP and tells it to run the appropriate .php file on disk. The PHP code executes and makes blocking I/O calls. If you call file_get_contents() in PHP, behind the scenes it triggers read() syscalls and waits for the results to come back.

And of course the actual code is simply embedded right into your page, and the operations are blocking:

<?php

// blocking file I/O
$file_data = file_get_contents('/path/to/file.dat');

// blocking network I/O
$curl = curl_init('');
$result = curl_exec($curl);

// more blocking network I/O
$result = $db->query('SELECT id, data FROM examples ORDER BY id DESC LIMIT 100');

In terms of how this integrates with the rest of the system, it looks like this:

Pretty simple: one process per request, and I/O calls simply block. Advantage? It's simple and it works. Disadvantage? Hit it with 20,000 clients concurrently and your server will burst into flames. This approach doesn't scale well, because the tools provided by the kernel for dealing with high-volume I/O (epoll, etc.) are not being used. And to add insult to injury, running a separate process for each request tends to use a lot of system resources, especially memory, which is often the first thing you run out of in a scenario like this.

Note: Ruby's approach is very similar to PHP's, and in a broad, general sense they can be considered the same for our purposes.

Multi-threaded approach: Java
So then Java came along, right around the time you bought your first domain name and it was cool to just randomly say "dot com" after a sentence. And Java had multithreading built into the language (especially at the time it was created), which was awesome.

Most Java web servers start a new thread of execution for each incoming request, and then finally call the function you wrote as an application developer in that thread.

Performing I / O operations in a Java Servlet often looks like this:

public void doGet(HttpServletRequest request,
    HttpServletResponse response) throws ServletException, IOException
{
    // blocking file I/O
    InputStream fileIs = new FileInputStream("/path/to/file");

    // blocking network I/O
    URLConnection urlConnection = (new URL("")).openConnection();
    InputStream netIs = urlConnection.getInputStream();

    // more blocking network I/O
    out.println("...");
}
Since our doGet method above corresponds to one request and runs in its own thread, instead of a separate process for each request requiring its own memory, we have a separate thread. This has some nice perks, like being able to share state and cached data between threads, because they can access each other's memory, but the impact on how it interacts with the scheduler is still almost identical to the PHP example before. Each request gets a new thread, and the various I/O operations block inside that thread until the request is fully handled. Threads are pooled to minimize the cost of creating and destroying them, but still, thousands of connections means thousands of threads, which is bad news for the scheduler.

An important milestone: in Java 1.4 (upgraded significantly again in 1.7), Java gained the ability to do non-blocking I/O calls. Most applications, web and otherwise, don't use it, but at least it's available. Some Java web servers try to take advantage of this in various ways; however, the vast majority of deployed Java applications still work as described above.
Java gets us closer, and certainly has some good out-of-the-box functionality for I/O, but it still doesn't really solve the problem of what to do when you have a heavily I/O-bound application that is getting dragged to the ground by many thousands of blocking threads.

Non-blocking I / O as first-class citizens: Node
When it comes to better I/O, Node.js is definitely the belle of the ball. Anyone who has ever had even the briefest introduction to Node has been told that it's "non-blocking" and that it handles I/O efficiently. And this is true in a general sense. But the devil is in the details, and the means by which this witchcraft is achieved matter when it comes to performance.

Essentially, the paradigm Node implements is not to say "write your code here to handle the request", but rather "write your code here to start handling the request". Each time you need to do something that involves I/O, you make the request and give Node a callback function that it will call when it's done.

Typical Node code for doing an I/O operation in a request goes like this:

http.createServer(function (request, response) {
    fs.readFile('/path/to/file', 'utf8', function (err, data) {
        response.end(data);
    });
});
As you can see, there are two callback functions. The first will be called when the request starts, and the second will be called when the file data is available.

What this does is basically give Node the opportunity to efficiently handle I/O in between these callback functions. A scenario where this is even more relevant is making a database call in Node, but I won't bother with the example because it's exactly the same principle: you start the database call and give Node a callback function; Node performs the I/O operation separately using non-blocking calls, and then invokes your callback when the data you asked for is available. This mechanism of queueing up I/O calls, letting Node handle them, and then getting a callback is called the "event loop". And it works pretty well.

There is a catch to this model, however. Under the hood, the reason for it has a lot more to do with how the V8 JavaScript engine (Chrome's JS engine, which Node uses) is implemented than anything else. All the JS code you write runs in a single thread. Think about that for a moment. It means that while I/O is performed using efficient non-blocking techniques, your JS that is doing CPU-bound operations runs in a single thread, and each chunk of code blocks the next. A common example of where this might come up is looping over database records to process them in some way before outputting them to the client. Here's an example that shows how that works:

var handler = function (request, response) {

    connection.query('SELECT ...', function (err, rows) {

        if (err) { throw err };

        for (var i = 0; i < rows.length; i++) {
            // process each row of records
        }

        response.end(...); // output the results
    });

};
While Node does handle the I/O efficiently, that for loop in the example above is using CPU cycles inside your one and only main thread. This means that if you have 10,000 connections, that loop could bring your entire application to a crawl, depending on how long it takes. Each request must get its share of time in the main thread, one at a time.

The premise this whole concept is based on is that the I/O operations are the slowest part, so it's most important to handle them efficiently, even if it means doing other processing serially. This is true in some cases, but not in all.

The other point, and while this is only an opinion, is that it can be quite tiresome writing a bunch of nested callbacks, and some argue that it makes the code noticeably harder to follow. It's not uncommon to see callbacks nested four, five, or even more levels deep inside Node code.

We're back to the trade-offs again. The Node model works well if your main performance problem is I/O. However, its Achilles' heel (translator's note: from Greek mythology, a fatal weakness) is that, if you're not careful, you might handle an HTTP request, put some CPU-intensive code in a function, and end up making every connection crawl.

True non-blocking: Go
Before I get into the Go section, I should disclose that I'm a Go fan. I've used Go for many projects, I'm openly an advocate of its productivity advantages, and I see them in my work when I use it.

That said, let's look at how it handles I/O. One key feature of the Go language is that it contains its own scheduler. Instead of each thread of execution corresponding to a single OS thread, Go works with the concept of "goroutines". The Go runtime can assign a goroutine to an OS thread and have it execute, or suspend it so that it is not associated with any OS thread, depending on what that goroutine is doing. Each request that comes into Go's HTTP server is handled in a separate goroutine.

The scheduler works like this:

This is implemented at various points in the Go runtime: the I/O calls to write/read/connect/etc. put the current goroutine to sleep, and the goroutine is woken back up, with the information it needed, when further action can be taken.

In effect, the Go runtime is doing something not terribly different from what Node does, except that the callback mechanism is built into the implementation of the I/O calls and interacts with the scheduler automatically. It is also not subject to the restriction that all your handler code has to run in the same thread; Go will automatically map your goroutines onto as many OS threads as it deems appropriate, based on the logic in its scheduler. The resulting code looks like this:

func ServeHTTP(w http.ResponseWriter, r *http.Request) {

    // the underlying network call here is non-blocking
    rows, err := db.Query("SELECT ...")

    for _, row := range rows {
        // handle rows
        // each request is in its own goroutine
    }

    w.Write(...) // write the response, also non-blocking
}

As you can see above, the basic code structure we end up with resembles the simpler approaches, and yet achieves non-blocking I/O under the hood.

In most cases, this ends up being "the best of both worlds". Non-blocking I/O is used for all of the important things, but your code looks like it's blocking, so it tends to be simpler to understand and maintain. The interaction between the Go scheduler and the OS scheduler handles the rest. It's not complete magic, and if you build a large system, it's worth putting in the time to understand more detail about how it works; but at the same time, the "out-of-the-box" environment works well and scales well.

Go may have its shortcomings, but in general, the way it handles I / O is not among them.

Lies, damned lies, and benchmarks

It's difficult to get exact timings on the context switching involved in these various models, and it could also be argued that it matters much less to you. So instead, I'll give some basic benchmarks that compare the HTTP server performance of these server environments. Bear in mind that many factors are involved in the performance of the entire end-to-end HTTP request/response path, and the numbers presented here are just a few samples I put together to make a basic comparison.

For each of these environments, I wrote the appropriate code to read in a 64k file of random bytes, run a SHA-256 hash on it N times (N being specified in the URL's query string, e.g., .../test.php?n=100), and print the resulting hash in hex. I chose this because it's a very simple way to run the same benchmark with some consistent I/O and a controlled way to increase the CPU usage.

For more details on the environments used, see the benchmark notes.

First, let's look at some low-concurrency examples. Running 2000 iterations with 300 concurrent requests and only one hash per request (N=1) gives us this:
Times are the mean number of milliseconds to complete a request across all concurrent requests. Lower is better.

It's hard to draw a conclusion from just this one graph, but to me it seems that, at this volume of connections and computation, we're seeing times that have more to do with the general execution of the languages themselves than with the I/O. Note that the languages considered "scripting languages" (loose typing, dynamic interpretation) perform the slowest.

But what happens if we increase N to 1000, still with 300 concurrent requests? The same load, but 1000x the hash iterations (significantly more CPU load):
Times are the mean number of milliseconds to complete a request across all concurrent requests. Lower is better.

All of a sudden, Node's performance drops significantly, because the CPU-intensive operations in each request are blocking each other. And interestingly enough, PHP's performance gets much better (relative to the others) in this test and beats Java. (It's worth noting that in PHP, the SHA-256 implementation is written in C, but the execution path spends far more time in that loop, since we're doing 1000 hash iterations now.)

Now let's try 5000 concurrent connections (with N=1), or as close to that as I could get. Unfortunately, for most of these environments, the failure rate was not insignificant. For this chart, we'll look at the total number of requests per second. The higher the better:

The total number of requests per second. The higher the better.

And the picture looks quite different. It's a guess, but it looks like at high connection volume, the per-connection overhead of spawning new processes, and the additional memory associated with it in PHP+Apache, becomes the dominant factor and caps PHP's performance. Clearly, Go is the winner here, followed by Java and Node, and finally PHP.

Conclusion
With all of that said, it's clear that as languages have evolved, the solutions for large-scale applications doing lots of I/O have evolved with them.

To be fair, despite the descriptions in this article, both PHP and Java do have implementations of non-blocking I/O available for web applications. But these approaches are not nearly as common as those described above, and the attendant operational overhead of maintaining servers using such approaches would need to be taken into account. Not to mention that your code would have to be structured in a way that works with such environments; a "normal" PHP or Java web application usually won't run without significant modifications in such an environment.

As a comparison, if we consider a few significant factors that affect performance as well as ease of use, we get this:

Language    Threads vs. processes    Non-blocking I/O    Ease of use
PHP         Processes                No
Java        Threads                  Available           Requires callbacks
Node.js     Threads                  Yes                 Requires callbacks
Go          Threads (goroutines)     Yes                 No callbacks needed
Threads are generally going to be more memory efficient than processes, since they share a memory space, while each process has its own. Combining that with the factors related to non-blocking I/O, we can see that, at least with respect to the factors considered above, the general setup as it relates to I/O improves as we move down the list. So if I had to pick a winner in the contest above, it would certainly be Go.

Even so, in practice, the environment you choose to build your application in is closely tied to the familiarity your team has with that environment and the overall productivity you can achieve with it. So it may not make sense for every team to just dive in and start developing web applications and services in Node or Go. Indeed, finding developers, or the familiarity of your in-house team, is often cited as the main reason not to use a different language and/or environment. That said, the times have changed quite a bit over the past fifteen years or so.

Hopefully the above helps paint a clearer picture of what's happening under the hood, and gives you some ideas for dealing with real-world scalability in your application. Happy inputting and outputting!


The copyright of this translation belongs to the author dogstar.
This website uses the CC BY-NC-SA 3.0 license agreement; please credit Ai translation ( when reprinting.
