Node, PHP, Java, and Go: server I/O performance compared


As in most scenarios with multiple possible solutions, the point is not which one is better, but understanding how to weigh the tradeoffs. Let's take a tour of the I/O landscape and see what we can learn from it.

In this article, we'll compare Node, Java, Go, and PHP (with Apache), discuss how each of these languages models its I/O, the advantages and disadvantages of each model, and conclude with some preliminary benchmarks. If you care about the I/O performance of your next web application, this article is for you.

I/O Fundamentals: A quick review

To understand the factors closely tied to I/O, we must first review the underlying operating-system concepts. While you rarely deal with most of these concepts directly, you deal with them indirectly through your application's runtime environment all the time. And the devil is in the details.

System calls

First, we have system calls, which can be described as follows:

    • Your program (in "user land," as they say) must ask the operating system kernel to perform I/O operations on its behalf.
    • A "syscall" is the means by which your program asks the kernel to do something. The implementation details vary between operating systems, but the basic concept is the same: there is some specific instruction that transfers control from your program over to the kernel (like a function call, but with some special sauce for dealing with this situation). Generally speaking, syscalls are blocking, meaning your program waits for the kernel to return back to your code.
    • The kernel performs the underlying I/O operation on the physical device in question (disk, network card, etc.) and replies to the syscall. In the real world, the kernel might have to do any number of things to fulfill your request, including waiting for the device to be ready and updating its internal state, but as an application developer, you don't need to care about that; it's the kernel's job.
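
To make the user-land/kernel boundary concrete, here is a minimal sketch in Go (Linux) that reads a file through raw syscalls rather than the usual library helpers; the file path is just an arbitrary example. Each call hands control to the kernel and blocks until the kernel returns:

package main

import (
    "fmt"
    "syscall"
)

func main() {
    // open(2): ask the kernel for a file descriptor (blocks until done)
    fd, err := syscall.Open("/etc/hostname", syscall.O_RDONLY, 0)
    if err != nil {
        panic(err)
    }
    defer syscall.Close(fd)

    // read(2): the kernel fills our buffer; we wait until it returns
    buf := make([]byte, 4096)
    n, err := syscall.Read(fd, buf)
    if err != nil {
        panic(err)
    }
    fmt.Printf("read %d bytes: %s", n, buf[:n])
}

Higher-level helpers like PHP's file_get_contents() or Go's os.ReadFile() ultimately bottom out in exactly these calls.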

Blocking calls and non-blocking calls

Now, I just said above that syscalls are blocking, and that's true in the general sense. However, some calls are categorized as "non-blocking," meaning the kernel takes your request, puts it in a queue or buffer somewhere, and then immediately returns without waiting for the actual I/O to occur. So it "blocks" only for a very brief period, just long enough to enqueue your request.

Some examples (of Linux syscalls) might help clarify:

    • read() is a blocking call: you pass it a file handle and a buffer to deposit the data it reads into, and the call returns when the data is there. Note that this has the advantage of being elegant and simple.
    • epoll_create(), epoll_ctl(), and epoll_wait() are calls that, respectively, let you create a group of handles to listen on, add and remove handles to and from that group, and then block until there is any activity. This allows you to efficiently control a large number of I/O operations with a single thread. This is great if you need the functionality, but as you can see, it's also certainly more complex to use.
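
To make the epoll pattern above concrete, here is a minimal sketch in Go (Linux-only), driving the raw epoll syscalls through the standard syscall package. As a toy stand-in for a pile of network sockets, it watches a single descriptor, stdin (fd 0); a real server would register its connection descriptors instead:

package main

import (
    "fmt"
    "syscall"
)

func main() {
    // epoll_create1(2): create an epoll instance
    epfd, err := syscall.EpollCreate1(0)
    if err != nil {
        panic(err)
    }
    defer syscall.Close(epfd)

    // epoll_ctl(2): register stdin (fd 0) for "readable" events
    stdin := 0
    ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(stdin)}
    if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, stdin, &ev); err != nil {
        panic(err)
    }

    // epoll_wait(2): one thread blocks until ANY registered fd has activity
    events := make([]syscall.EpollEvent, 64)
    for {
        n, err := syscall.EpollWait(epfd, events, -1)
        if err != nil {
            panic(err)
        }
        for i := 0; i < n; i++ {
            fmt.Printf("fd %d is ready to read\n", events[i].Fd)
        }
    }
}

One loop, one thread, arbitrarily many descriptors: this is the primitive that the event-driven environments discussed below are ultimately built on.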

It's important to grasp the order of magnitude of the timing differences here. If a CPU core runs at 3GHz, then leaving aside optimizations the CPU can do, it performs 3 billion cycles per second (or 3 cycles per nanosecond). A non-blocking syscall might take on the order of tens of cycles to complete, or "a relatively few nanoseconds." A call that blocks on information being received over a network might take far longer, say, 200 milliseconds (1/5 of a second). Suppose the non-blocking call took 20 nanoseconds and the blocking call took 200,000,000 nanoseconds. Your process just waited 10 million times longer for the blocking call.

The kernel provides the means to do both blocking I/O ("read from this network connection and give me the data") and non-blocking I/O ("tell me when any of these network connections have new data"). Which mechanism is used blocks the calling process for dramatically different lengths of time.

Scheduling

The third thing that's critical to follow is what happens when a lot of threads or processes start blocking.

For our purposes, there is not a huge difference between a thread and a process. In real life, the most noticeable performance-related difference is that threads share the same memory, while each process has its own memory space, so separate processes tend to take up a lot more memory. But when we're talking about scheduling, it boils down to a list of things (threads and processes alike) that each need to get a slice of execution time on the available CPU cores. If you have 300 threads running on 8 cores, you divide the time up so that each one gets its share, with each core running for a short period and then switching to the next thread. This is done through a "context switch," where the CPU switches from running one thread/process to the next.

These context switches have a cost associated with them: they take time. In fast cases, a switch may take less than 100 nanoseconds, but depending on the implementation details, processor speed/architecture, CPU cache state, and so on, it's not uncommon for one to take 1000 nanoseconds or longer.

And the more threads (or processes) there are, the more context switching. When we're talking about thousands of threads, with each switch costing hundreds of nanoseconds, things can get very slow.

Non-blocking calls, however, essentially tell the kernel, "only call me when you have some new data or an event on any one of these connections." These non-blocking calls are designed to efficiently handle large I/O loads and reduce context switching.

Still with me so far? Because now comes the fun part: let's look at how some popular languages use these tools, and draw some conclusions about the tradeoffs between ease of use and performance... along with other interesting tidbits.

Note that while the examples shown in this article are trivial (and incomplete, showing only the code for the relevant parts), database access, external caching systems (memcache and friends), and anything else that requires I/O all end up performing some kind of I/O operation under the hood, with the same effects as the examples shown. Likewise, for the cases where I/O is described as "blocking" (PHP, Java), the reads and writes of the HTTP request and response are themselves blocking calls: once again, more I/O hidden in the system, with its attendant performance issues to take into account.

There are a lot of factors that go into choosing a programming language for a project. And there are even more factors when you consider performance alone. But if you're concerned that your program will be constrained primarily by I/O, if I/O performance is make-or-break for your project, these are the things you need to know.

The "keep it simple" approach: PHP

Back in the 90's, a lot of people were wearing Converse shoes and writing CGI scripts in Perl. Then PHP came along, lots of people liked using it, and it made creating dynamic web pages much easier.

The model PHP uses is fairly simple. There are some variations, but essentially a PHP server looks like this:

An HTTP request comes in from a user's browser and hits your Apache web server. Apache creates a separate process for each request, with some optimizations to reuse them in order to minimize how many it has to create (process creation is, relatively speaking, slow). Apache calls PHP and tells it to run the appropriate .php file on disk. The PHP code executes and makes blocking I/O calls. If you call file_get_contents() in PHP, it triggers a read() syscall underneath and waits for the result to come back.

And of course, the actual code is simply embedded right into your page, and operations are blocking:

<?php

// blocking file I/O
$file_data = file_get_contents('/path/to/file.dat');

// blocking network I/O
$curl = curl_init('http://example.com/example-microservice');
$result = curl_exec($curl);

// more blocking network I/O
$result = $db->query('SELECT id, data FROM examples ORDER BY id DESC LIMIT 100');

?>

In terms of how this integrates with the system, it looks like this:

[Diagram: each incoming request gets its own Apache/PHP process, and every I/O call that process makes blocks in the kernel.]

Pretty simple: one request, one process. I/O just blocks. Advantage? It's simple and it works. Disadvantage? Hit it with 20,000 concurrent clients and your server falls over. This approach doesn't scale well because the tools the kernel provides for dealing with high-volume I/O (epoll, etc.) aren't being used. And to add insult to injury, running a separate process for each request tends to use a lot of system resources, especially memory, which is usually the first thing you run out of in a scenario like this.

Note: Ruby's approach is very similar to PHP's, and in a broad, general, hand-wavy way they can be considered the same for our purposes.

The multithreaded approach: Java

So Java came along, right about the time you bought your first domain name and it was cool to just randomly say "dot com" after a sentence. And Java had multithreading built into the language (especially for its time of creation), which was awesome.

Most Java web servers work by starting a new thread of execution for each incoming request, and then in that thread eventually calling the function that you, as the application developer, wrote.

Performing I/O operations in a Java servlet often looks like this:

public void doGet(HttpServletRequest request,
    HttpServletResponse response) throws ServletException, IOException
{
    // blocking file I/O
    InputStream fileIs = new FileInputStream("/path/to/file");

    // blocking network I/O
    URLConnection urlConnection = (new URL("http://example.com/example-microservice")).openConnection();
    InputStream netIs = urlConnection.getInputStream();

    // more blocking network I/O
    PrintWriter out = response.getWriter();
    out.println("...");
}

Since our doGet method above corresponds to one request and runs in its own thread, instead of a separate process for each request requiring its own dedicated memory, we have a separate thread. This has some nice perks, like being able to share state, cached data, and so on between threads, because they can access each other's memory. But the impact on how it interacts with the scheduler is still almost identical to what's being done in the PHP example above. Each request gets a new thread, and the various I/O operations block inside that thread until the request is fully handled. Threads are pooled to minimize the cost of creating and destroying them, but still, thousands of connections means thousands of threads, which is bad news for the scheduler.

An important milestone was that Java 1.4 (upgraded again significantly in 1.7) gained the ability to do non-blocking I/O calls. Most applications, web and otherwise, don't use it, but at least it's available. Some Java web servers try to take advantage of this in various ways; however, the vast majority of deployed Java applications still work as described above.

Java gets us a step closer, and certainly has some good out-of-the-box functionality for I/O, but it still doesn't really solve the problem of what to do when you have a heavily I/O-bound application that's being dragged into the ground by many thousands of blocking threads.

Non-blocking I/O as a first-class citizen: Node

The poster child for better I/O is Node.js. Anyone who has had even the briefest introduction to Node has been told that it's "non-blocking" and that it handles I/O efficiently. And in a general sense, that's true. But the devil is in the details, and how this wizardry is implemented matters when it comes to performance.

Essentially, the paradigm shift Node implements is that instead of saying "write your code here to handle the request," it says "write your code here to start handling the request." Each time you need to do something that involves I/O, you make the request and provide a callback function that Node will call when the operation is done.

Typical Node code doing an I/O operation in a request looks like this:

var http = require('http');
var fs = require('fs');

http.createServer(function(request, response) {
    fs.readFile('/path/to/file', 'utf8', function(err, data) {
        response.end(data);
    });
}).listen(8080);

As you can see, there are two callback functions here. The first gets called when a request starts, and the second gets called when the file data is available.

What this does is basically give Node an opportunity to efficiently handle the I/O in between those callbacks. A scenario where this is even more relevant is making a database call in Node, but I won't bother with that tiresome example because it's exactly the same principle: you start the database call and hand Node a callback function, Node performs the I/O separately using non-blocking calls, and then invokes your callback when the data you asked for is available. This mechanism of queuing up I/O calls, letting Node handle them, and then getting a callback is called the "event loop." And it works quite well.

There's a catch to this model, however. Under the hood, the reason for it has much more to do with how the V8 JavaScript engine (Chrome's JS engine, which Node uses) is implemented than anything else. All the JS code you write runs in a single thread. Think about that for a second. It means that while I/O is performed with efficient non-blocking techniques, your CPU-bound JS code runs in one thread, and each chunk of code blocks the next. A common example of where this comes up is looping over database records to process them in some way before outputting them to the client. Here's an example showing how that works:

var handler = function(request, response) {
    connection.query('SELECT ...', function(err, rows) {
        if (err) { throw err };
        for (var i = 0; i < rows.length; i++) {
            // do processing on each row
        }
        response.end(...); // write out the results
    });
};

While Node does handle the I/O efficiently, that for loop in the example above is using CPU cycles inside your one and only main thread. This means that if you have 10,000 connections, that loop could bring your entire application to a crawl, depending on how long each pass takes. Every request has to share a slice of time on the main thread, one after the other.

The premise this whole concept is based on is that the I/O operations are the slowest part, so it's most important to handle those efficiently, even if it means doing other processing serially. This is true in some cases, but not in all.

The other point, and while this is only an opinion, is that writing a bunch of nested callbacks can be quite annoying, and some argue it makes code significantly harder to follow. It's not uncommon to see code nested four, five, or even more levels deep inside Node code.

We're back to the trade-offs again. The Node model works well if your main performance problem is I/O. However, its Achilles' heel is that if you're not careful, you can put CPU-intensive code in a function handling an HTTP request and end up bringing every connection to a crawl.

True non-blocking: Go

Before I get to the Go section, it's only fair that I disclose that I'm a Go fan. I've used it for many projects, I'm openly a proponent of its productivity advantages, and I've seen them in my work while using it.

That said, let's look at how it deals with I/O. One key feature of the Go language is that it contains its own scheduler. Instead of each thread of execution corresponding to a single OS thread, Go works with the concept of "goroutines." Based on what a goroutine is doing, the Go runtime can assign it to an OS thread and have it execute, or suspend it so it isn't associated with an OS thread at all. Each request that comes in to Go's HTTP server is handled in a separate goroutine.

This scheduler works as follows:

This is implemented at various points in the Go runtime: the I/O calls that perform the write/read/connect/etc. requests put the current goroutine to sleep, and then wake it back up with the relevant information when further action can be taken.

In effect, what the Go runtime is doing is not all that different from what Node does, except that the callback mechanism is built into the implementation of the I/O calls and interacts with the scheduler automatically. It also isn't subject to the restriction of having all your handler code run in the same thread; Go automatically maps your goroutines onto as many OS threads as its scheduler's logic deems appropriate. The resulting code looks like this:

func ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // the underlying network call here is non-blocking
    rows, err := db.Query("SELECT ...")
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer rows.Close()
    for rows.Next() {
        // do something with each row,
        // each request runs in its own goroutine
    }
    w.Write(...) // write the response, also non-blocking
}

As you can see above, the basic code structure we end up with resembles the simpler approaches, and yet it achieves non-blocking I/O under the hood.
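
To see the goroutine-per-request model end to end, here is a minimal but complete sketch (the file path is an illustrative stand-in, and the database is left out to keep it self-contained). Go's net/http runs each handler invocation in its own goroutine, so the seemingly blocking read only parks that one goroutine:

package main

import (
    "fmt"
    "net/http"
    "os"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // looks blocking, but only this goroutine is parked;
        // the OS thread stays free to run other goroutines
        data, err := os.ReadFile("/path/to/file.dat")
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        fmt.Fprintf(w, "read %d bytes\n", len(data))
    })
    http.ListenAndServe(":8080", nil)
}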

In most cases, this ends up being "the best of both worlds." Non-blocking I/O is used for all the important things, but your code looks like it's blocking, so it tends to be simpler to understand and maintain. The interaction between the Go scheduler and the OS scheduler handles the rest. It's not complete magic, and if you build a large system, it's worth putting in the time to understand the details of how it works; but at the same time, the environment you get "out of the box" works and scales quite well.

Go may have its faults, but generally speaking, the way it handles I/O is not among them.

Lies, damned lies, and benchmarks

It's difficult to do exact timings of the context switching involved in these various models, and one could argue it matters less to you than the end result anyway. So instead, I'll give some basic benchmarks comparing HTTP server performance across these server environments. Bear in mind that many factors are involved in the performance of the full end-to-end HTTP request/response path, and the numbers presented here are just some samples I put together to make a basic comparison.

For each of these environments, I wrote the appropriate code to read a 64k file of random bytes, run a SHA-256 hash on it N times (N being specified in the URL's query string, e.g., .../test.php?n=100), and print the resulting hash in hex. I chose this because it's a very simple way to run the same benchmark with some consistent I/O and a controlled way of increasing the CPU load.

For more details on the environments used, refer to the benchmark setup notes.
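
The benchmark sources aren't reproduced in this article, but the Go variant would plausibly look something like the sketch below; the file path and the exact shape of the hashing loop are my assumptions, not the published benchmark code:

package main

import (
    "crypto/sha256"
    "fmt"
    "net/http"
    "os"
    "strconv"
)

func main() {
    http.HandleFunc("/test", func(w http.ResponseWriter, r *http.Request) {
        // N comes from the query string, e.g. /test?n=100
        n, err := strconv.Atoi(r.URL.Query().Get("n"))
        if err != nil || n < 1 {
            n = 1
        }

        // the consistent I/O part: read a 64k file of random bytes
        data, err := os.ReadFile("/tmp/random-64k.dat") // hypothetical path
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }

        // the tunable CPU part: run SHA-256 N times
        sum := sha256.Sum256(data)
        for i := 1; i < n; i++ {
            sum = sha256.Sum256(sum[:])
        }

        // print the resulting hash in hex
        fmt.Fprintf(w, "%x\n", sum[:])
    })
    http.ListenAndServe(":8080", nil)
}

Load was then generated against each server; with a tool like ApacheBench that would look like ab -n 2000 -c 300 "http://localhost:8080/test?n=1" (the exact tool behind the published numbers isn't stated here).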

First, let's look at a low-concurrency example. Running 2,000 iterations with 300 concurrent requests, and one hash per request (N=1), gives us this:

[Chart] Times are the mean number of milliseconds to complete a request across all concurrent requests; lower is better.

It's hard to draw a conclusion from just this graph, but it seems to me to indicate that, at this volume of connections and computation, we're seeing times that relate more to the general execution speed of the languages themselves than to the I/O. Note that the languages considered "scripting languages" (loose typing, dynamic interpretation) perform the slowest.

But what happens if we increase N to 1000, still with 300 concurrent requests? The same load, but 1000x the hash iterations (significantly more CPU load):

[Chart] Times are the mean number of milliseconds to complete a request across all concurrent requests; lower is better.

All of a sudden, Node's performance drops significantly, because the CPU-intensive operations in each request are blocking one another. Interestingly enough, PHP's performance gets much better (relative to the others) in this test, and it beats Java. (It's worth noting that in PHP, the SHA-256 implementation is written in C, and the execution path is spending much more time in that loop, since we're now doing 1000 hash iterations.)

Now let's try 5,000 concurrent connections (with N=1), or as close to that as I could get. Unfortunately, for most of these environments, the failure rates were not insignificant. For this chart, we'll look at the total number of requests per second. The higher the better:

[Chart] Total number of requests per second; higher is better.

And the picture looks quite different. It's a guess, but it looks like at high connection volume, the per-connection overhead of spawning new processes and the additional memory tied up by PHP + Apache becomes a dominant factor and tanks PHP's performance. Clearly, Go is the winner here, followed by Java and Node, with PHP last.

Conclusion

In summary, it's clear that as languages have evolved, so have the solutions for dealing with large-scale applications that perform lots of I/O.

To be fair, both PHP and Java, despite the descriptions in this article, do have implementations of non-blocking I/O available for web applications. But those approaches are not as common as the ones described above, and they come with the attendant operational overhead of maintaining servers that use them. Not to mention that your code has to be structured in a way that fits those environments; a "normal" PHP or Java web application usually won't run without significant modification in such an environment.

As a comparison, if we consider just a few significant factors that affect performance as well as ease of use, we get this:

Language     Threads or processes     Non-blocking I/O     Ease of use
PHP          Processes                No                   —
Java         Threads                  Available            Requires callbacks
Node.js      Threads                  Yes                  Requires callbacks
Go           Threads (goroutines)     Yes                  No callbacks needed

Threads are generally going to be more memory-efficient than processes, since they share the same memory space, while processes don't. Combining that with the factors related to non-blocking I/O, we can see that, at least with respect to the factors considered above, the general I/O setup improves as we move down the list. So if I had to pick a winner in the contest above, it would certainly be Go.

Even so, in practice, the environment you choose to build your application in is closely tied to your team's familiarity with that environment and the overall productivity you can achieve with it. So it may not make sense for every team to simply dive in and start developing web applications and services in Node or Go. Indeed, finding developers, or the familiarity of your in-house team, is often cited as the main reason not to use a different language and/or environment. That said, times have changed dramatically over the past fifteen years.

Hopefully the above helps you better understand what's happening under the hood and gives you some ideas for handling real-world scalability in your application. Happy inputting and outputting!

Originally from: http://www.codeceo.com/article/server-i-o-performance-competition.html

