Node, PHP, Java, and Go: A Server-side I/O Performance Showdown

Source: Internet
Author: User

Understanding your application's input/output (I/O) model can mean the difference between an application that handles its planned load gracefully and one that crumples under real-world usage. If your application is small and does not serve high loads, it may matter little. But as your application's load rises, working with the wrong I/O model can get you into a world of hurt.

As with most situations where multiple approaches are possible, the point is not which approach is better, but understanding how to weigh the trade-offs. Let's take a tour of the I/O landscape and see what we can learn from it.

In this article, we'll compare Node, Java, Go, and PHP with Apache, discuss how the different languages model their I/O, the advantages and disadvantages of each model, and draw some preliminary benchmark conclusions. If you're concerned about the I/O performance of your next web application, this article is for you.

I/O Basics: A Quick Review

To understand the factors involved with I/O, we must first review the concepts down at the operating system level. While you are unlikely to deal with many of these concepts directly, you deal with them indirectly through your application's runtime environment all the time. And the details matter.

System Calls

First, we have system calls, which can be described as follows: Your program (in "user land," as they say) must ask the operating system kernel to perform an I/O operation on its behalf. A "syscall" is the means by which your program asks the kernel to do something. The details of how this is implemented vary between operating systems, but the basic concept is the same: there is some specific instruction that transfers control from your program to the kernel (like a function call, but with special sauce for dealing with this situation). Generally speaking, syscalls are blocking, meaning your program waits for the kernel to return back to your code. The kernel performs the underlying I/O operation on the physical device in question (disk, network card, etc.) and replies to the syscall. In the real world, the kernel may have to do a number of things to fulfill your request, including waiting for the device to be ready and updating its internal state, but as an application developer, you don't need to care about that; that's the kernel's job.

Blocking Calls vs. Non-blocking Calls

Now, I just said above that syscalls are blocking, and that is true in a general sense. Some calls, however, are categorized as "non-blocking," meaning the kernel takes your request, puts it in a queue or buffer somewhere, and then immediately returns without waiting for the actual I/O to occur. So it "blocks" only for a very brief time, just long enough to enqueue your request.

Some examples (of Linux syscalls) may help clarify:

- read() is a blocking call: you pass it a file handle and a buffer to hold the data it reads, and the call returns when the data is there. This has the advantage of being nice and simple.
- epoll_create(), epoll_ctl(), and epoll_wait() are calls that, respectively, let you create a group of handles to listen on, add/remove handles to/from that group, and then block until there is any activity on them. This allows you to efficiently control a large number of I/O operations with a single thread. This is great if you need the functionality, but as you can see, it's certainly more complex to use.

It's important to understand the order of magnitude of difference in timing here. If a CPU core is running at 3GHz, then without getting into optimizations, it is performing 3 billion cycles per second (or 3 cycles per nanosecond). A non-blocking syscall might take on the order of tens of nanoseconds to complete, or "a relatively few nanoseconds." A call that blocks for information being received over the network might take much longer, say, 200 milliseconds (1/5 of a second). If, for example, the non-blocking call took 20 nanoseconds and the blocking call took 200,000,000 nanoseconds, your process just waited 10 million times longer for the blocking call.

The kernel provides the means to do both blocking I/O ("read from this network connection and give me the data") and non-blocking I/O ("tell me when any of these network connections have new data"). Which mechanism is used blocks the calling process for dramatically different lengths of time.

Scheduling

The third thing that's critical to follow is what happens when you have a lot of threads or processes that start blocking.

For our purposes, there is not a huge difference between a thread and a process. In real life, the most noticeable performance-related difference is that threads share the same memory while each process has its own memory space, so separate processes tend to take up a lot more memory. But when we talk about scheduling, it boils down to a list of things (threads and processes alike) that each need to get a slice of execution time on the available CPU cores. If you have 300 threads running on 8 cores, you divide the time up so that each one gets its share, with each core running one thread for a short period and then switching to the next. This is done through a "context switch," where the CPU switches from one running thread/process to the next.

These context switches have a cost associated with them: they take time. In fast cases it may be less than 100 nanoseconds, but it is not uncommon for a switch to take 1000 nanoseconds or longer, depending on the implementation details, processor speed/architecture, CPU caches, and so on.

And the more threads (or processes) there are, the more context switching. When we're talking about thousands of threads, with hundreds of nanoseconds for each switch, things can get very slow.

However, non-blocking calls in essence tell the kernel, "Only call me when you have some new data or an event on one of these connections." These non-blocking calls are designed to efficiently handle large I/O loads and reduce context switching.

Still with me so far? Because now comes the fun part: let's look at how some popular languages use these tools and draw some conclusions about the trade-offs between ease of use and performance ... and other interesting tidbits.

Please note that while the examples shown in this article are trivial (and partial, showing only the relevant bits), database access, external caching systems (memcache and the like), and anything else that requires I/O all end up performing I/O operations under the hood, with the same effects as the examples shown. Also, for the scenarios where I/O is described as "blocking" (PHP, Java), the reads and writes of the HTTP requests and responses are themselves blocking calls: again, more I/O hidden in the system, with its attendant performance issues to take into account.

There are many factors that go into choosing a programming language for a project, and even more when you consider only performance. However, if you are concerned that your program will be constrained primarily by I/O, if I/O performance is make-or-break for your project, these are things you need to know.

The "Keep It Simple" Approach: PHP

Back in the 90's, a lot of people were wearing Converse shoes and writing CGI scripts in Perl. Then PHP came along, and many people loved it, because it made making dynamic web pages much easier.

The model PHP uses is fairly simple. There are some variations, but basically a PHP server looks like this:

An HTTP request comes in from a user's browser and hits your Apache web server. Apache creates a separate process for each request (with some optimizations to reuse them, since process creation is relatively slow, in order to minimize how often it has to do it). Apache then invokes PHP and tells it to run the appropriate .php file on the disk. The PHP code executes and makes blocking I/O calls as it goes. If you call file_get_contents() in PHP, under the hood that makes the read() syscall and waits for the results.

And of course the actual code is simply embedded right into your page, and the operations block:


<?php

// blocking file I/O
$file_data = file_get_contents('/path/to/file.dat');

// blocking network I/O
$curl = curl_init('http://...');
$result = curl_exec($curl);

// some more blocking network I/O
$result = $db->query('SELECT id, data FROM examples ORDER BY id DESC LIMIT ...');

?>

And in terms of how it integrates with the rest of the system, it looks like this:

Pretty simple: one request, one process. I/O just blocks. Advantage? Simple, and it works. Disadvantage? Hit it with 20,000 clients concurrently and your server is dead. This approach doesn't scale well because the tools provided by the kernel for dealing with high-volume I/O (epoll, etc.) are not being used. And to add insult to injury, running a separate process for each request tends to use a lot of system resources, especially memory, which is often the first thing you run out of in a scenario like this.

Note: Ruby's approach is very similar to PHP's, and in a broad, general, hand-wavy way they can be considered the same for our purposes.

The Multithreaded Approach: Java

So Java came along, right about the time you bought your first domain name and it was cool to just randomly say "dot com" after a sentence. And Java has multithreading built into the language, which (especially for when it was created) is pretty awesome.

Most Java web servers work by starting a new thread of execution for each request that comes in, and then in that thread eventually calling the function that you, the application developer, wrote.

Doing I/O in a Java servlet tends to look something like this:

public void doGet(HttpServletRequest request,
    HttpServletResponse response) throws ServletException, IOException
{
    // blocking file I/O
    InputStream fileIs = new FileInputStream("/path/to/file");

    // blocking network I/O
    URLConnection urlConnection = (new URL("http://...")).openConnection();
    InputStream netIs = urlConnection.getInputStream();

    // yet more blocking network I/O
    PrintWriter out = response.getWriter();
    out.println("...");
}

Since our doGet method above corresponds to one request and is run in its own thread, instead of a separate process for each request requiring its own memory, we have a separate thread. This has some nice perks, like being able to share state and cached data between threads, because they can access each other's memory. But the impact on how it interacts with the scheduler is still almost identical to what is being done in the PHP example previously: each request gets a new thread, and the various I/O operations block inside that thread until the request is fully handled. Threads are pooled to minimize the cost of creating and destroying them, but still, thousands of connections mean thousands of threads, which is bad for the scheduler.

An important milestone is that in version 1.4 (and significantly upgraded again in 1.7), Java gained the ability to do non-blocking I/O calls. Most applications, web and otherwise, don't use it, but at least it's available. Some Java web servers try to take advantage of this in various ways; however, the vast majority of deployed Java applications still work as described above.

Java gets us closer, and certainly has some good out-of-the-box functionality for I/O, but it still doesn't really solve the problem of what to do when you have a heavily I/O-bound application that is being dragged to the ground by thousands of blocking threads.

Non-blocking I/O as a First-Class Citizen: Node

When it comes to better I/O, Node.js is definitely the belle of the ball. Anyone who has ever had even the simplest introduction to Node has been told that it is "non-blocking" and that it handles I/O efficiently. And in a general sense, this is true. But the devil is in the details, and how this sorcery is achieved matters when it comes to performance.

Essentially, the paradigm shift that Node implements is that instead of essentially saying "write your code here to handle the request," it says "write code here to start handling the request." Each time you need to do something that involves I/O, you make the request and give Node a callback function that it will call when it's done.

Typical Node code for doing an I/O operation in a request looks like this:

http.createServer(function (request, response) {
    fs.readFile('/path/to/file', 'utf8', function (err, data) {
        response.end(data);
    });
});

As you can see, there are two callback functions here. The first gets called when a request starts, and the second gets called when the file data is available.

What this does is basically give Node an opportunity to efficiently handle the I/O in between these callbacks. A scenario where it would be even more relevant is making a database call in Node, but I won't bother with that example because it's exactly the same principle: you start the database call and give Node a callback; it performs the I/O operation separately using non-blocking calls and then invokes your callback when the data you asked for is available. This mechanism of queuing up I/O calls, letting Node handle them, and then getting a callback is called the "event loop." And it works pretty well.

There is a catch to this model, however. Under the hood, the reason for it has much more to do with how the V8 JavaScript engine (Chrome's JS engine, which Node uses) is implemented than anything else. All the JS code you write runs in a single thread. Think about that for a moment. It means that while I/O is performed using efficient non-blocking techniques, your JS code that is doing CPU-bound operations runs in a single thread, each chunk of code blocking the next. A common example of where this might come up is looping over database records to process them in some way before outputting them to the client. Here's an example that shows how that works:

var handler = function (request, response) {

    connection.query('SELECT ...', function (err, rows) {

        if (err) { throw err; }

        for (var i = 0; i < rows.length; i++) {
            // do processing on each row
        }

        response.end(...); // write out the results

    });

};


While Node does handle the I/O efficiently, that for loop in the example above is using CPU cycles inside your one and only main thread. This means that if you have 10,000 connections, that loop could bring your entire application to a crawl, depending on how long it takes. Each request must share a slice of time in the main thread, one after another.

The premise this whole concept is based on is that the I/O operations are the slowest part, so it is most important to handle those efficiently, even if it means doing other processing serially. This is true in some cases, but not in all.

The other point is that, while this is only an opinion, writing a bunch of nested callbacks can be quite tiresome, and some argue that it makes the code significantly harder to follow. It's not uncommon to see callbacks nested four, five, or even more levels deep inside Node code.

And so we're back to the trade-offs. The Node model works well if your main performance problem is I/O. However, its Achilles' heel (yes, the fatal weakness from Greek mythology) is that if you're not careful, you might put CPU-intensive code into a function that handles an HTTP request and end up bringing every connection to a crawl.

Truly Non-blocking: Go

Before I get into the Go section, it's only proper that I disclose that I am a Go fanboy. I've used Go on many projects, I'm openly a proponent of its productivity advantages, and I see them in my work when I use it.

That said, let's look at how it deals with I/O. One key feature of the Go language is that it contains its own scheduler. Instead of each thread of execution corresponding to a single OS thread, Go works with the concept of "goroutines." The Go runtime can assign a goroutine to an OS thread and have it execute, or suspend it so it is not associated with any OS thread, depending on what that goroutine is doing. Each request that comes in to Go's HTTP server is handled in a separate goroutine.

Here's a schematic of how that scheduler works:

This is implemented at various points in the Go runtime: the I/O calls to make a request, write, read, connect, etc. put the current goroutine to sleep, along with the information needed to wake the goroutine back up when further action can be taken.

In effect, what the Go runtime is doing is not all that different from what Node is doing, except that the callback mechanism is built into the implementation of the I/O calls and interacts with the scheduler automatically. It also doesn't suffer from the restriction of having all your handler code run in the same thread; Go will automatically map your goroutines onto as many OS threads as it deems appropriate, based on the logic in its scheduler. The resulting code looks like this:

func ServeHTTP(w http.ResponseWriter, r *http.Request) {

    // the underlying network call here is non-blocking
    rows, err := db.Query("SELECT ...")
    if err != nil {
        // handle the error
    }

    for _, row := range rows {
        // do something with the rows,
        // each request in its own goroutine
    }

    w.Write(...) // write the response, also non-blocking

}


As you can see above, the basic code structure looks like the simpler approaches, and yet achieves non-blocking I/O under the hood.

In most cases, this ends up being "the best of both worlds." Non-blocking I/O is used for all the important things, but your code looks like it is blocking, so it tends to be simpler to understand and maintain. The interaction between the Go scheduler and the OS scheduler handles the rest. It's not complete magic, and if you build a large system, it's worth putting in the time to understand more detail about how it works; but at the same time, the environment you get "out of the box" works and scales quite well.

Go may have its downsides, but generally speaking, the way it handles I/O is not among them.

Lies, Damned Lies, and Benchmarks

It is difficult to give exact timings on the context switching involved with these various models, so instead I'll give you some benchmarks comparing the overall HTTP server performance of these server environments. Bear in mind that many factors are involved in the performance of the entire end-to-end HTTP request/response path, and the numbers presented here are just some samples I put together to give a basic comparison.

For each of these environments, I wrote the appropriate code to read in a 64k file of random bytes, run a SHA-256 hash on it N times (with N specified in the URL's query string, e.g. .../test.php?n=100), and print the resulting hash in hex. I chose this because it's a very simple way to run the same benchmark with some consistent I/O and a controlled way to increase the CPU usage.

See the benchmark notes for a bit more detail on the environments used.

First, let's look at some low-concurrency examples. Running 2000 iterations with 300 concurrent requests, and one hash per request (N = 1), gives us this:

Times are the mean number of milliseconds to complete a request across all concurrent requests. Lower is better.

It's hard to draw a conclusion from just this one graph, but to me it seems that, at this volume of connections and computation, we're seeing times that have more to do with the general execution of the languages themselves than with the I/O. Note that the languages considered "scripting languages" (loosely typed, dynamically interpreted) perform the slowest.

But what happens if we increase N to 1000, still with 300 concurrent requests? The same load, but with 1000 hash iterations per request instead of one (and significantly more CPU load):

Times are the mean number of milliseconds to complete a request across all concurrent requests. Lower is better.

All of a sudden, Node's performance drops significantly, because the CPU-intensive operations in each request are blocking each other. And interestingly enough, PHP's performance gets much better (relative to the others) and beats Java in this test. (It's worth noting that in PHP, the SHA-256 implementation is written in C, and the execution path spends a lot more time in that loop, since we're now doing 1000 hash iterations.)

Now let's try 5000 concurrent connections (with N = 1), or as close to that as I could get. Unfortunately, for most of these environments, the failure rate was not insignificant. For this chart, we'll look at the total number of requests per second. The higher the better:

The total number of requests per second. The higher the better.

And the picture looks quite different. It's a guess, but it looks like at high connection volume, the per-connection overhead of spawning new processes, along with the additional memory that involves in PHP + Apache, becomes a dominant factor and limits PHP's performance. Clearly, Go is the winner here, followed by Java and Node, and finally PHP.

Conclusion

With all of that, it's clear that as languages have evolved, the solutions for dealing with large-scale applications that do lots of I/O have evolved with them.

To be fair, both PHP and Java, despite the descriptions in this article, do have implementations of non-blocking I/O available for use in web applications. But these approaches are not as common as the ones described above, and the attendant operational overhead of maintaining servers using such approaches would need to be taken into account. Not to mention that your code must be structured in a way that works with such environments; your "normal" PHP or Java web application usually won't run without significant modifications in such an environment.

As a comparison, if we consider a few significant factors that affect performance as well as ease of use, we get this:

Language    Thread or Process      Non-blocking I/O    Ease of Use
PHP         Process                No                  -
Java        Thread                 Available           Requires callbacks
Node.js     Thread                 Yes                 Requires callbacks
Go          Thread (goroutines)    Yes                 No callbacks needed

Threads are generally going to be more memory-efficient than processes, since they share the same memory space while processes don't. Combining that with the factors related to non-blocking I/O, we can see that, at least with the factors considered above, the general setup as it relates to I/O improves as we move down the list. So if I had to pick a winner in the above contest, it would certainly be Go.

Even so, in practice, choosing an environment in which to build your application is closely tied to your team's familiarity with that environment and the overall productivity you can achieve with it. So it may not make sense for every team to just dive in and start developing web applications and services in Node or Go. Indeed, finding developers, or the familiarity of your in-house team, is often cited as the main reason not to use a different language and/or environment. That said, times have changed over the past fifteen years or so, a lot.

Hopefully the above helps paint a clearer picture of what's happening under the hood, and gives you some ideas for dealing with real-world scalability for your application. Happy inputting and outputting!
