Server-side I/O Performance Showdown: Node vs. PHP vs. Java vs. Go


Original: Server-side I/O Performance: Node vs. PHP vs. Java vs. Go
Author: Brad Peabody
Translator: The Wild Goose Startled the Cold

Abstract: This article briefly introduces basic I/O concepts, then compares the I/O performance of Node, PHP, Java, and Go, and offers suggestions for choosing between them. The following is the translation.

Understanding your application's input/output (I/O) model can mean the difference between an application that handles its load gracefully and one that crumples under real-world traffic. Perhaps your application is small and does not need to support much load, so there is less at stake. But as your application's traffic load increases, using the wrong I/O model can have very serious consequences.



In this article, we'll compare Node, Java, Go, and PHP with Apache, discuss how the different languages model their I/O, the advantages and disadvantages of each model, and conclude with some basic performance benchmarks. If you care about the I/O performance of your next web application, this article will help you.

I/O Basics: A Quick Refresher

To understand the factors involved in I/O, we must first review the concepts at the operating system level. While you are unlikely to deal with many of these concepts directly, you deal with them indirectly, through your application's runtime environment, all the time. And the details matter.

System Calls

First, let's take a look at system calls:

A system call is the means by which your application asks the operating system kernel to perform an I/O operation on its behalf.

A system call is how your program asks the kernel to do something. The implementation details vary between operating systems, but the basic concept is the same: a specific instruction transfers control from your program into the kernel. Generally speaking, system calls are blocking, meaning your program waits until the kernel returns a result.

The kernel performs the low-level I/O operation on the physical device (disk, network card, etc.) and replies to the system call. In the real world, the kernel might have to do a number of things to fulfill your request, including waiting for the device to be ready and updating its internal state, but as an application developer you don't need to care about any of that; it's the kernel's job.



Blocking Calls vs. Non-blocking Calls

As I said above, system calls are generally blocking. However, some calls are "non-blocking," meaning the kernel puts the request into a queue or buffer somewhere and then returns immediately, without waiting for the actual I/O to occur. So the call "blocks" only very briefly, just long enough to enqueue your request.

To illustrate this point, here are a few examples (Linux system calls):

read() is a blocking call. We pass it a file handle and a buffer to hold the data it reads, and the call returns when the data is in the buffer. It has the advantage of being elegant and simple.

epoll_create(), epoll_ctl(), and epoll_wait() are calls that, respectively, let you create a group of handles to listen on, add and remove handles from that group, and block until any of them has activity. These system calls let you efficiently control a large number of I/O operations with a single thread. Useful as these features are, they are quite complex to use.

It's important to understand the orders of magnitude involved here. If a CPU core runs at 3 GHz, it performs 3 billion cycles per second (that is, 3 cycles per nanosecond). A non-blocking system call might take on the order of 10 or so cycles, a few nanoseconds. A call that blocks on information arriving over the network might take far longer, say 200 milliseconds (1/5 of a second). If, say, the non-blocking call took 20 nanoseconds and the blocking call took 200,000,000 nanoseconds, your process just waited 10 million times longer for the blocking call.
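That arithmetic can be made concrete with a tiny helper in Go (a sketch using the article's illustrative figures, not measured values):

```go
package main

import "fmt"

// WaitRatio returns how many times longer a blocking call keeps the
// process waiting compared to a non-blocking call, given the cost of
// each in nanoseconds.
func WaitRatio(blockingNs, nonBlockingNs int) int {
	return blockingNs / nonBlockingNs
}

func main() {
	// ~20 ns for a non-blocking syscall vs. ~200 ms (200,000,000 ns)
	// for a blocking network read, per the figures above.
	fmt.Println("blocking call waits", WaitRatio(200_000_000, 20), "times longer")
	// → blocking call waits 10000000 times longer
}
```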



The kernel provides both blocking I/O ("read from this network connection and give me the data") and non-blocking I/O ("tell me when any of these network connections has new data"), and the two mechanisms block the calling process for vastly different lengths of time.

Scheduling

The third thing that is critical to understand is what happens when a lot of threads or processes start blocking.

For our purposes, there is not a huge difference between threads and processes. In real life, the most noticeable performance-related difference is that because threads share the same memory while each process has its own memory space, separate processes tend to consume more memory. But when we talk about scheduling, what it really boils down to is a list of things, each of which needs to get a slice of execution time on the available CPU cores. If you have 300 threads running on 8 cores, you have to divide the time up so that each thread gets its share, with each core running for a short period and then switching to the next thread. This is done via a context switch, which moves the CPU from one thread/process to the next.

These context switches have a cost; they take time. Fast cases may be under 100 nanoseconds, but 1000 nanoseconds or more is not unusual, depending on the implementation details, processor speed and architecture, CPU cache, and other hardware and software.

The more threads (or processes) there are, the more context switching. With thousands of threads, each costing hundreds of nanoseconds per switch, the system becomes very slow.

Non-blocking calls, however, essentially tell the kernel, "call me only when new data or an event arrives on one of these connections." These non-blocking calls are designed to efficiently handle large I/O loads and reduce context switching.

It's worth noting that although the examples in this article are small, database access, external caching systems (memcached and the like), and anything else that requires I/O all end up performing some kind of I/O call under the hood, just as these examples do.

There are many factors that go into choosing a programming language for a project, and there are many factors even when you only consider performance. But if you're concerned that your program will be primarily I/O-bound, and if I/O performance will make or break your project, then the following are the things you need to know.

"Keep It Simple": PHP

Back in the '90s, a lot of people were wearing Converse shoes and writing CGI scripts in Perl. Then PHP came along, and a lot of people liked it; it made producing dynamic web pages much easier.

The model PHP uses is fairly simple. There are some variations, but the average PHP server generally works like this:

A user's browser issues an HTTP request, which arrives at an Apache web server. Apache creates a separate process for each request, reusing them with some optimizations to minimize the work it must do (creating processes is relatively slow).

Apache invokes PHP and tells it to run the appropriate .php file on disk.

The PHP code executes and makes blocking I/O calls. A file_get_contents() call in PHP results in a read() system call under the hood and waits for the result.

<?php

// blocking file I/O
$file_data = file_get_contents('/path/to/file.dat');

// blocking network I/O
$curl = curl_init('http://example.com/example-microservice');
$result = curl_exec($curl);

// some more blocking network I/O
$result = $db->query('SELECT id, data FROM examples ORDER BY id DESC LIMIT 100');

?>

The integration with the system looks like this:



Pretty simple: one process per request, with blocking I/O calls. Advantage? It's simple and it works. Disadvantage? Hit it with 20,000 concurrent clients and the server melts down. This approach doesn't scale well because the tools the kernel provides for dealing with high-volume I/O (epoll, etc.) are not being used. To make matters worse, running a separate process per request tends to use a lot of system resources, especially memory, which is usually the first thing to run out.

* Note: Ruby's situation is very similar to PHP's here.

Multithreading: Java

Then Java came along. Java has multithreading built into the language, which (especially for the time it was created) is quite impressive.

Most Java Web servers start a new thread of execution for each request, and then invoke a function written by the developer in this thread.

Performing I/O in a Java servlet often looks like this:

public void doGet(HttpServletRequest request,
    HttpServletResponse response) throws ServletException, IOException
{
    // blocking file I/O
    InputStream fileIs = new FileInputStream("/path/to/file");

    // blocking network I/O
    URLConnection urlConnection = (new URL("http://example.com/example-microservice")).openConnection();
    InputStream netIs = urlConnection.getInputStream();

    // some more blocking network I/O
    out.println("...");
}

The doGet method above corresponds to one request and runs in its own thread, rather than in a separate process with its own memory. Each request gets a new thread, which blocks on the various I/O operations until request handling is complete. Applications use a thread pool to minimize the cost of creating and destroying threads, but thousands of connections still mean thousands of threads, which is bad news for the scheduler.

It's worth noting that Java 1.4 added the ability to make non-blocking I/O calls (and it was significantly upgraded again in 1.7). Most applications don't use this feature, but it is at least available. Some Java web servers try to take advantage of it, but the vast majority of deployed Java applications still work as described above.



Java gives us a lot of out-of-the-box I/O functionality, but it has no good solution for the case where a large number of blocked threads are performing a large amount of I/O.

Non-blocking I/O as a First-class Citizen: Node

A popular option with comparatively good I/O performance is Node.js. Anyone with even a passing familiarity with Node knows that it is "non-blocking" and handles I/O efficiently. That is true in a general sense, but the details and the way it is implemented are what matter.

When you need to do something that involves I/O, you make the request and supply a callback function, which Node will invoke once the request has completed.

Typical code that performs I/O operations in a request is as follows:

http.createServer(function(request, response) {
    fs.readFile('/path/to/file', 'utf8', function(err, data) {
        response.end(data);
    });
});

As you can see, there are two callback functions here. The first is invoked when a request starts, and the second is invoked when the file data is available.

In this way, Node handles the I/O for these callbacks efficiently. An even better illustration is calling a database from Node. Your program starts the database operation and gives Node a callback; Node performs the I/O separately using non-blocking calls, and then invokes your callback when the requested data is available. This mechanism of queuing up I/O calls, letting Node handle them, and then getting a callback is called the "event loop." And it works quite well.



There is, however, a catch in this model. Under the hood, the reason has to do with the implementation of the V8 JavaScript engine (Node uses Chrome's JS engine): all the JS code you write runs on a single thread. Think about that for a moment. It means that while I/O is performed with efficient non-blocking techniques, your JS code that does CPU-bound work runs in a single thread, and each chunk of code blocks the next from running. A common example is looping over database records, processing them somehow, and then sending them to the client. The following code shows how that might look:

var handler = function(request, response) {

    connection.query('SELECT ...', function(err, rows) {
        if (err) { throw err };

        for (var i = 0; i < rows.length; i++) {
            // do processing on each row
        }

        response.end(...); // write out the results

    })

};

Although Node handles I/O efficiently, the for loop in the example above uses CPU cycles on the one and only main thread. This means that if you have 10,000 connections, that loop could grind your entire application to a halt. Every request must spend a slice of time on the main thread, one after another.

The premise behind this whole concept is that the I/O operations are the slowest part, so it is most important to handle those efficiently, even if it means doing other processing serially. This is true in some cases, but not in all.

Another point is that writing a bunch of nested callbacks is tedious, and some consider such code downright ugly. It's not uncommon in Node code to see callbacks nested four, five, or even more levels deep.

It's time to weigh the pros and cons again. If your main performance problem is I/O, the Node model will serve you well. The downside, however, is that if you put CPU-intensive code into a function that handles an HTTP request, you can inadvertently bring every connection to a crawl.

Natively Non-blocking: Go

Before I get into Go, full disclosure: I'm a Go fan. I've used Go on many projects.

Let's look at how Go handles I/O. A key feature of the Go language is that it contains its own scheduler. Instead of a one-to-one correspondence between threads of execution and operating system threads, it uses the concept of "goroutines." The Go runtime assigns a goroutine to an operating system thread and controls when it runs or is suspended. Each request to a Go HTTP server is handled in a separate goroutine.

The scheduler works as follows:



In fact, aside from the callback mechanism being built into the implementation of the I/O calls and interacting with the scheduler automatically, what the Go runtime does is not so different from Node. It is also not subject to the restriction of having all your handler code run on the same thread; Go automatically maps your goroutines onto whatever OS threads its scheduler deems appropriate, based on the logic in that scheduler. So the code looks like this:

func ServeHTTP(w http.ResponseWriter, r *http.Request) {

    // the underlying network call here is non-blocking
    rows, err := db.Query("SELECT ...")

    for _, row := range rows {
        // do something with the rows,
        // each request in its own goroutine
    }

    w.Write(...) // write the response, also non-blocking

}

As shown above, this basic code structure is simpler and also implements non-blocking I/O.

In most cases, this really is "the best of both worlds." Non-blocking I/O is used for all the important things, but the code looks blocking, so it tends to be simpler to understand and maintain. What remains is the interplay between the Go scheduler and the OS scheduler. It's not magic, and if you're building a large system, it's worth taking the time to understand how it works; at the same time, it "just works" and scales well out of the box.

Go may have its share of drawbacks, but generally speaking, the way it handles I/O is not among them.

Performance Benchmarks

It is difficult to accurately time the context switching involved in these various models, and I could argue it wouldn't be very useful to you anyway. Instead, I'll present a basic performance comparison of HTTP services in these server environments. Keep in mind that many factors are involved in end-to-end HTTP request/response performance.

I wrote a small piece of code for each environment that reads 64k of random bytes from a file and runs a SHA-256 hash on it N times (N is specified in the URL's query string, e.g. .../test.php?n=100), printing the result in hex. I chose this because it is a very simple way to run a benchmark with some consistent I/O and a controlled way of increasing CPU usage.

First, let's look at a low-concurrency example: 2000 iterations with 300 concurrent requests, one hash per request (N=1). The results look like this:



Times are the average number of milliseconds to complete a request across all concurrent requests. Lower is better.

It's hard to draw a conclusion from this chart alone, but personally I think that with this volume of connections and this little computation, what we're seeing relates mostly to the general execution of each language itself. Note that the scripting languages are the slowest performers.

But what happens if we increase N to 1000, still with 300 concurrent requests? Same load, but 1000 times as many hash iterations (significantly more CPU load):



Times are the average number of milliseconds to complete a request across all concurrent requests. Lower is better.

Suddenly, Node's performance drops significantly, because the CPU-intensive operations in each request are blocking each other. Interestingly, PHP's performance gets much better (relative to the others) in this test, even beating Java. (It's worth noting that in PHP, the SHA-256 implementation is written in C, and the execution path spends much more time in that loop, since we're now doing 1000 hash iterations.)

Now let's try 5,000 concurrent connections (with N=1). Unfortunately, for most of these environments, the failure rate was not insignificant. For this chart, let's look at the number of requests handled per second; higher is better:



Requests handled per second. Higher is better.

And the picture looks quite different from the one above. I'm guessing that at this higher connection volume, the per-connection overhead of spawning new processes and the additional memory they consume in PHP + Apache became the dominant factor in PHP's performance. Clearly, Go is the winner here, followed by Java, Node, and finally PHP.

While the factors involved in overall throughput are many and vary widely from application to application, the more you understand about what's going on under the hood and the trade-offs involved, the better off your application will be.

Summary

To sum up: as languages have evolved, so have the solutions for handling large-scale applications that do lots of I/O.

To be fair, both PHP and Java have non-blocking I/O implementations available for web applications. But these are not nearly as widely used as the approaches described above, and they carry maintenance overhead. Not to mention that the application's code must be structured to suit such an environment.

Let's compare several important factors that affect performance and ease of use:

Language    Threads vs. Processes    Non-blocking I/O    Ease of Use
PHP         Processes                No                  -
Java        Threads                  Available           Requires callbacks
Node.js     Threads                  Yes                 Requires callbacks
Go          Threads (goroutines)     Yes                 No callbacks needed

Because threads share the same memory space while processes do not, threads are generally much more memory-efficient than processes. In the list above, as you move down, the I/O-related factors improve one after another. So if I had to pick a winner from the comparison above, it would definitely be Go.

Even so, in practice, the environment you choose to build your application in is closely tied to how familiar your team is with that environment and the overall productivity the team can achieve with it. So it may not make sense for every team to just dive in and start developing web applications and services in Node or Go.

Hopefully this helps you get a better idea of what's happening under the hood and gives you some ideas for dealing with application scalability.

