Last update time: 2015-10-23
Even if you understand how the non-blocking features of Java NIO work (Selector, Channel, Buffer, etc.), designing a non-blocking server is still hard. Non-blocking IO presents several challenges compared to blocking IO. This non-blocking server tutorial discusses the main challenges of non-blocking servers and describes some potential solutions for them.
Good information about designing non-blocking servers is hard to find, so the solutions provided in this tutorial are based on my own work experience and ideas. If you have alternative or better ideas, I would be happy to hear about them! You can write a comment under this article, email me, or find me on Twitter.
The ideas described in this tutorial are designed around Java NIO. However, I believe these ideas can be reused in other languages as long as they have some kind of Selector-like construct. As far as I know, such constructs are provided by the underlying operating system, so there is a good chance you can use them in other languages too.
Non-blocking Server - GitHub Repository
I have created some simple proof-of-concept implementations of the ideas presented in this tutorial and put them in a GitHub repository so you can look at them. Here is the GitHub repository address:
https://github.com/jjenkov/java-nio-server
Non-blocking IO Pipelines
A non-blocking IO pipeline is a chain of components that handle non-blocking IO. This includes both reading and writing IO in a non-blocking fashion. Here is a simplified illustration of a non-blocking IO pipeline:
A component uses a Selector to check when a Channel has data to read. The component then reads the input data and generates some output based on the input. That output is written to a Channel again.
A non-blocking IO pipeline does not need to both read and write data. Some pipelines may only read data, and some may only write data.
The diagram above only shows a single component. A non-blocking IO pipeline may have more than one component processing incoming data. The length of a non-blocking IO pipeline depends on what the pipeline needs to do.
A non-blocking IO pipeline may also read from multiple Channels at the same time, for example from multiple SocketChannel instances.
The control flow shown above is also simplified. It is the component that initiates the reading of data from the Channel via the Selector. It is not the Channel that pushes the data into the Selector and from there into the component, even though that is what the diagram above suggests.
Non-blocking vs. Blocking IO Pipelines
The biggest difference between a non-blocking and a blocking IO pipeline is how data is read from the underlying Channel (socket or file).
IO pipelines typically read data from some stream (from a socket or file) and split that data into coherent messages. This is similar to tokenizing a stream of data for parsing, except that here you break the stream into larger messages. I will call the component that breaks the stream into messages a Message Reader. Here is an illustration of a Message Reader breaking a stream into messages:
A blocking IO pipeline can use an InputStream-like interface where one byte at a time can be read from the underlying Channel, and where that interface blocks until there is data ready to read. This results in a blocking Message Reader implementation.
Using a blocking IO interface to a stream simplifies the implementation of a Message Reader a lot. A blocking Message Reader never has to handle situations where no data was read from the stream, or where only a partial message was read and message parsing needs to be resumed later.
Similarly, a blocking Message Writer (a component that writes messages to a stream) never has to handle the situation where only part of a message was written, and where message writing has to be resumed later.
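To make that simplicity concrete, here is a minimal sketch of a blocking Message Reader. It assumes a hypothetical length-prefixed message format (a 4-byte length header followed by the message body); the format and class name are illustrative, not taken from the GitHub repository:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// A blocking Message Reader for a hypothetical length-prefixed format:
// a 4-byte length header followed by the message body.
public class BlockingMessageReader {

    private final DataInputStream in;

    public BlockingMessageReader(InputStream in) {
        this.in = new DataInputStream(in);
    }

    // Blocks until one complete message has been read. No partial-message
    // bookkeeping is needed: readFully() simply waits until all bytes arrive.
    public byte[] readMessage() throws IOException {
        int length = in.readInt();          // blocks until 4 header bytes arrive
        byte[] message = new byte[length];
        in.readFully(message);              // blocks until the whole body arrives
        return message;
    }
}
```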
Blocking IO Pipeline Drawbacks
While a blocking Message Reader is easy to implement, it has the unfortunate drawback of requiring a separate thread for each stream that needs to be split into messages. This is necessary because the IO interface of each stream blocks until there is data to read. That means a single thread cannot attempt to read from one stream and, if it has no data, read from another stream. As soon as a thread attempts to read data from a stream, the thread blocks until there is actually some data to read.
If the IO pipeline is part of a server that has to handle lots of concurrent connections, the server needs one thread per active inbound connection. This may not be a problem if the server only has a few hundred concurrent connections at any time. But if the server has millions of concurrent connections, this design does not scale well. Each thread takes between 320KB (32-bit JVM) and 1024KB (64-bit JVM) of memory for its stack, so 1,000,000 threads would take up to 1TB of memory! And that is before the server has used any memory for processing the incoming messages (for example, memory allocated for objects used during message processing).
To cut down on the number of threads, many servers use a design where the server keeps a pool of threads (for example, 100) which read messages from the inbound connections one at a time. The inbound connections are kept in a queue, and the threads process messages from each connection in the sequence the connections were put into the queue. This design is illustrated here:
However, this design requires that the inbound connections send data reasonably often. If the inbound connections may be inactive for longer periods, a large number of inactive connections may actually block all the threads in the thread pool. That means the server becomes slow to respond, or even unresponsive.
Some servers try to mitigate this problem by having some elasticity in the number of threads in the thread pool. For example, if the thread pool runs out of threads, it may start more threads to handle the load. This solution means that it takes a larger number of slow connections to make the server unresponsive. But remember, there is still an upper limit to how many threads you can have running. So this would not scale well with 1,000,000 slow connections.
Basic Non-blocking IO Pipeline Design
A non-blocking IO pipeline can use a single thread to read messages from multiple streams. This requires that the streams can be switched to non-blocking mode. When in non-blocking mode, a stream may return 0 or more bytes when you attempt to read from it. 0 bytes are returned if the stream has no data to read; 1 or more bytes are returned when the stream actually has some data to read.
To avoid checking streams that have 0 bytes to read, we use the Java NIO Selector. One or more SelectableChannel instances can be registered with a Selector. When you call the Selector's select() or selectNow() method, it gives you only the SelectableChannel instances that actually have data to read. This design is illustrated here:
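As a minimal sketch (standard Java NIO API; the buffer size and the processing step are placeholders), registering channels and draining the read-ready ones could look like this:

```java
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class ReadLoop {

    // A channel must be in non-blocking mode before it can be registered.
    public static void register(Selector selector, SocketChannel channel)
            throws Exception {
        channel.configureBlocking(false);
        channel.register(selector, SelectionKey.OP_READ);
    }

    public static void readReadyChannels(Selector selector) throws Exception {
        // selectNow() returns immediately with the number of ready channels.
        int readyCount = selector.selectNow();
        if (readyCount == 0) return;

        Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
        while (keys.hasNext()) {
            SelectionKey key = keys.next();
            keys.remove();                  // selected keys must be removed manually
            if (key.isReadable()) {
                SocketChannel channel = (SocketChannel) key.channel();
                ByteBuffer buffer = ByteBuffer.allocate(4 * 1024);
                int bytesRead = channel.read(buffer); // 0 is possible; -1 means closed
                // ... pass the buffer to the Message Reader for this channel ...
            }
        }
    }
}
```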
Reading Partial Messages
When we read a block of data from a SelectableChannel, we do not know if that data block contains less or more than a full message. A data block could potentially contain a partial message (less than a full message), a full message, or more than a full message, for example 1.5 or 2.5 messages. The various partial-message possibilities are illustrated here:
There are two challenges when working with partial messages:
- Detecting if a full message is present in the data block.
- What to do with a partial message until the rest of the message arrives.
Detecting Full Messages
The Message Reader looks at the data in the data block to see if it contains at least one full message. If the data block contains one or more full messages, those messages can be sent down the pipeline for processing. This process of looking for full messages is repeated a lot, so it has to be as fast as possible. Whenever there is a partial message in a data block, either by itself or after one or more full messages, that partial message needs to be stored until the rest of the message arrives from the Channel. Detecting full messages and storing partial messages is the responsibility of the Message Reader. To avoid mixing message data from different Channel instances, we use one Message Reader per Channel. The design looks like this:
After retrieving a Channel instance which has data to read from the Selector, the Message Reader associated with that Channel reads the data and attempts to break it into messages. If any full messages result from this, they can be passed down the read pipeline to whatever component needs to process them.
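Here is a minimal sketch of full-message detection, assuming the same hypothetical 4-byte length-prefixed format as in the blocking example above; a real Message Reader would be specific to its protocol:

```java
import java.nio.ByteBuffer;

// Sketch of full-message detection for a hypothetical length-prefixed
// format (4-byte length header + body). Assumes the buffer has been
// flipped into read mode.
public class MessageDetector {

    // Returns the total size of the first full message in the buffer,
    // or -1 if only a partial message has arrived so far.
    public static int findFullMessage(ByteBuffer buffer) {
        if (buffer.remaining() < 4) {
            return -1;                      // not even the header is complete yet
        }
        int bodyLength = buffer.getInt(buffer.position()); // peek, don't consume
        int totalLength = 4 + bodyLength;
        return buffer.remaining() >= totalLength ? totalLength : -1;
    }
}
```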
A Message Reader is of course protocol-specific. A Message Reader needs to know the message format of the messages it is trying to read. If our server implementation is to be reusable across protocols, it needs to be able to have the Message Reader implementation plugged in, possibly by accepting a Message Reader factory as a configuration parameter.
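As a sketch of what such pluggability could look like, here are two hypothetical interfaces (the names and signatures are illustrative; the GitHub repository defines its own):

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical interface for a protocol-specific Message Reader.
// The server feeds it newly read bytes; it returns any full messages.
public interface MessageReader {
    List<byte[]> read(ByteBuffer newData);
}

// A factory lets the server create one Message Reader per Channel
// without knowing anything about the wire protocol.
interface MessageReaderFactory {
    MessageReader createMessageReader();
}
```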
Storing Partial Messages
Now that we have established that it is the responsibility of the Message Reader to store partial messages until a full message has been received, we need to figure out how this storage of partial messages should be implemented.
There are two design considerations:
- We want to copy message data as little as possible. The more copying, the lower the performance.
- We want full messages to be stored in consecutive byte sequences, to make parsing the messages easier.
A Buffer Per Message Reader
Obviously, partial messages need to be stored in some kind of buffer. The straightforward implementation is simply to have one buffer internally in each Message Reader. However, how big should that buffer be? It would need to be big enough to store the biggest allowed message. So if the biggest allowed message is 1MB, the internal buffer in each Message Reader would need at least 1MB of space. Using 1MB per connection does not really work when we reach millions of connections: 1,000,000 x 1MB is still 1TB of memory! And what if the maximum message size is 16MB? What about 128MB?
Resizable Buffers
Another option is to implement a resizable buffer for use inside the Message Reader. A resizable buffer starts small, and if a message gets too big for the buffer, the buffer is expanded. That way each connection does not need e.g. a 1MB buffer. Each connection only takes as much memory as it needs to hold the next message.
There are several ways to implement a resizable buffer. All of them have advantages and disadvantages, so we will discuss them in the following sections.
Resizing by Copy
The first way to implement a resizable buffer is to start with a small buffer of e.g. 4KB. If a message does not fit into the 4KB buffer, a larger buffer of e.g. 8KB is allocated, and the data from the 4KB buffer is copied into the bigger buffer.
The advantage of the resize-by-copy approach is that all data for a message is kept together in a single consecutive byte array. This makes parsing the message much easier.
The disadvantage of the resize-by-copy approach is that a lot of data will be copied for bigger messages.
To reduce data copying you can analyze the sizes of the messages flowing through your system to find some buffer sizes that reduce the amount of copying. For example, you might see that most messages are smaller than 4KB because they contain very small requests and responses. That means the first buffer size should be 4KB.
Then you might see that if a message is bigger than 4KB, it is often because it contains a file. You might then notice that most of the files flowing through the system are smaller than 128KB. Then it makes sense to make the second buffer size 128KB.
Finally you might see that once a message is above 128KB, there is no real pattern in how big the messages are, so the final buffer size should simply be the maximum message size.
With these three buffer sizes based on the sizes of the messages flowing through your system, you will have reduced data copying somewhat. Messages below 4KB will never be copied. For 1,000,000 concurrent connections that results in 1,000,000 x 4KB = 4GB, which is possible in most servers today (2015). Messages between 4KB and 128KB will be copied once, with only the 4KB needing to be copied into the 128KB buffer. Messages between 128KB and the maximum message size will be copied twice: the first time 4KB is copied, and the second time 128KB is copied, so a total of 132KB of copying for the biggest messages. Assuming there are not that many messages above 128KB, this may be acceptable.
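A minimal sketch of this three-tier resize-by-copy strategy might look as follows; the maximum message size here is an assumed constant:

```java
import java.util.Arrays;

// Sketch of the copy-based resize strategy with the three tiers from the
// text: 4KB -> 128KB -> maximum message size (an assumed constant here).
public class ResizableBuffer {

    private static final int SMALL  = 4 * 1024;
    private static final int MEDIUM = 128 * 1024;
    private static final int MAX_MESSAGE_SIZE = 1024 * 1024; // assumption: 1MB max

    private byte[] data = new byte[SMALL];
    private int used = 0;

    public void append(byte[] src, int offset, int length) {
        while (used + length > data.length) {
            grow();
        }
        System.arraycopy(src, offset, data, used, length);
        used += length;
    }

    private void grow() {
        int newSize;
        if (data.length < MEDIUM) {
            newSize = MEDIUM;               // first copy: at most 4KB of data moved
        } else if (data.length < MAX_MESSAGE_SIZE) {
            newSize = MAX_MESSAGE_SIZE;     // second copy: at most 128KB moved
        } else {
            throw new IllegalStateException("Message exceeds maximum size");
        }
        data = Arrays.copyOf(data, newSize); // copies the bytes written so far
    }
}
```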
Once a message has been fully processed, the allocated memory should be freed again. That way the next message received from the same connection starts with the smallest buffer size again. This is necessary to make sure the memory can be shared more efficiently between connections. Most likely not all connections will need big buffers at the same time.
I have a complete tutorial on how to implement such a memory buffer supporting resizable arrays here: http://tutorials.jenkov.com/java-performance/resizable-array.html. The tutorial also contains a link to a GitHub repository with code showing a working implementation.
Resizing by Append
Another way to resize a buffer is to make the buffer consist of multiple arrays. When you need to resize the buffer, you simply allocate another byte array and write the data into that.
There are two ways to grow such a buffer. One way is to allocate separate byte arrays and keep a list of these byte arrays. Another way is to allocate slices of a larger, shared byte array, and then keep a list of the slices allocated to the buffer. Personally, I feel the slices approach is slightly better, but the difference is small.
The advantage of growing a buffer by appending separate arrays or slices is that no data needs to be copied during writing. All data can be copied directly from the socket (Channel) into an array or slice.
The disadvantage of growing a buffer this way is that the data is not stored in a single, consecutive array. This makes message parsing harder, since the parser needs to look out for the end of every individual array as well as the end of all arrays at the same time. Since you need to look for the end of a message in the written data, this model is not too easy to work with.
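Here is a minimal sketch of the append-based approach using a list of separately allocated arrays (the slice-based variant would look similar, handing out ranges of one shared array instead):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a composite buffer grown by appending separate byte arrays.
// Growing never copies existing data, but the message is not contiguous.
public class CompositeBuffer {

    private static final int BLOCK_SIZE = 4 * 1024;

    private final List<byte[]> blocks = new ArrayList<>();
    private int usedInLastBlock = BLOCK_SIZE; // forces allocation on first write

    // With NIO you would instead wrap the last block in a ByteBuffer and
    // read from the Channel directly into it, avoiding even this copy.
    public void append(byte[] src, int offset, int length) {
        while (length > 0) {
            if (usedInLastBlock == BLOCK_SIZE) {
                blocks.add(new byte[BLOCK_SIZE]); // grow by allocating, not copying
                usedInLastBlock = 0;
            }
            byte[] last = blocks.get(blocks.size() - 1);
            int toCopy = Math.min(length, BLOCK_SIZE - usedInLastBlock);
            System.arraycopy(src, offset, last, usedInLastBlock, toCopy);
            usedInLastBlock += toCopy;
            offset += toCopy;
            length -= toCopy;
        }
    }
}
```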
TLV Encoded Messages
Some protocol message formats are encoded using a TLV format (Type, Length, Value). That means that when a message arrives, the total length of the message is stored at the beginning of the message. That way you immediately know how much memory to allocate for the whole message.
TLV encoding makes memory management easier. You know immediately how much memory to allocate for the message, and no memory is wasted at the end of a partially used buffer.
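As an illustration, decoding a hypothetical TLV header (2-byte type, 4-byte length; real TLV encodings vary, as discussed below) could look like this:

```java
import java.nio.ByteBuffer;

// Sketch of decoding a hypothetical TLV header: 2-byte type + 4-byte length.
// Knowing the length up front lets you allocate the full buffer immediately.
public class TlvHeader {

    public final int type;
    public final int length;

    private TlvHeader(int type, int length) {
        this.type = type;
        this.length = length;
    }

    // Returns null if the 6 header bytes have not all arrived yet.
    public static TlvHeader tryDecode(ByteBuffer buffer) {
        if (buffer.remaining() < 6) return null;
        int type   = buffer.getShort() & 0xFFFF;  // unsigned 16-bit type
        int length = buffer.getInt();             // 32-bit value length
        return new TlvHeader(type, length);
    }
}
```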
One disadvantage of TLV encoding is that you allocate all the memory for a message before all the data of that message has arrived. A few slow connections sending big messages can thus allocate all the memory you have available, making your server unresponsive.
A workaround for this problem is to use a message format that contains multiple TLV fields internally. Then memory is allocated for each field, not for the whole message, and memory is only allocated as the fields arrive. Still, a big field can have the same effect on your memory management as a big message.
Another option is to time out messages which have not arrived fully within e.g. 10-15 seconds. This can make your server recover from a coincidental, simultaneous arrival of many big messages, but it will still make the server unresponsive for a while. Additionally, an intentional DoS attack can still result in full allocation of your server's memory.
TLV encodings exist in different variations. Exactly how many bytes are used to specify the type and the length of a field depends on the individual TLV encoding. There are also TLV encodings that place the length of the field first, then the type, and then the value (an LTV encoding). While the field order is different, it is still a TLV variation.
The fact that TLV encoding makes memory management easier is one of the reasons why HTTP 1.1 is such a terrible protocol. That is one of the problems they are trying to fix in HTTP 2.0, where data is transported in LTV encoded frames. This is also why we have designed our own network protocol for our projects using a TLV encoding.
Writing Partial Messages
In a non-blocking IO pipeline, writing data is also a challenge. When you call write(ByteBuffer) on a Channel in non-blocking mode, there is no guarantee of how many of the bytes in the ByteBuffer are actually written. The write(ByteBuffer) method returns how many bytes were written, so it is possible to keep track of how much of a message has been sent. And that is the challenge: keeping track of partially written messages so that in the end all bytes of a message have been sent.
To manage the writing of partial messages to a Channel, we will create a Message Writer. Just like with the Message Reader, we need one Message Writer per Channel we write messages to. Inside each Message Writer we keep track of exactly how many bytes of the message currently being written have been written so far.
If more messages arrive at a Message Writer than it can write directly to the Channel, the messages are queued internally in the Message Writer. The Message Writer then writes the messages to the Channel as fast as the Channel can take them.
Here is a diagram showing how partial messages are written so far:
For the Message Writer to be able to send a message that was only partially sent earlier, the Message Writer needs to be called from time to time, so it can send more data.
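A minimal sketch of such a Message Writer, with an internal queue and partial-write tracking via the ByteBuffer position, could look like this (names are illustrative):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of a Message Writer that queues messages and tracks partial writes.
// One instance per Channel.
public class MessageWriter {

    private final Queue<ByteBuffer> queue = new ArrayDeque<>();
    private ByteBuffer inProgress;   // message currently being written, if any

    public void enqueue(ByteBuffer message) {
        queue.add(message);
    }

    // Called whenever the Channel is ready for writing. write() may accept
    // fewer bytes than remain; the ByteBuffer position tracks the progress.
    public void write(SocketChannel channel) throws IOException {
        while (true) {
            if (inProgress == null) {
                inProgress = queue.poll();
                if (inProgress == null) return;    // nothing left to write
            }
            channel.write(inProgress);             // may write only part
            if (inProgress.hasRemaining()) return; // channel full; resume later
            inProgress = null;                     // message fully written
        }
    }

    public boolean isEmpty() {
        return inProgress == null && queue.isEmpty();
    }
}
```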
If you have a lot of connections, you will have a lot of Message Writer instances. Checking e.g. a million Message Writer instances to see if they can write any data is slow. First of all, many of the Message Writer instances may not have any messages to send. We do not want to check those instances. Second, not all Channel instances may be ready to write data to. We do not want to waste time attempting to write data to a Channel that cannot accept any data anyway.
To check if a Channel is ready for writing, you can register the Channel with a Selector. However, we do not want to register all Channel instances with the Selector. Imagine having 1,000,000 connections which are mostly idle, and all 1,000,000 connections registered with the Selector. Then, when you call select(), most of these Channel instances will be write-ready (they are mostly idle, remember?), and you would have to check the Message Writers of all those connections for data to write.
To avoid checking all Message Writer instances for messages, and all Channel instances which have no messages to be sent to them anyway, we use this two-step approach:
- When a message is written to a Message Writer, the Message Writer registers its associated Channel with the Selector (if it is not already registered).
- When your server has time, it checks the Selector to see which of the registered Channel instances are ready for writing. For each write-ready Channel, its associated Message Writer is asked to write data to the Channel. If a Message Writer writes all of its messages to its Channel, the Channel is unregistered from the Selector again.
This little two-step approach makes sure that only Channel instances which actually have messages to be written are registered with the Selector.
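Sketched in code, the two steps could look like this, reusing the MessageWriter sketch from above; note that a real implementation would combine OP_WRITE with any existing interest ops instead of replacing them:

```java
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Sketch of the two-step approach: register for OP_WRITE only when a
// Message Writer has queued data, and unregister once the queue is drained.
public class WriteRegistration {

    // Step 1: when a message is enqueued, register the channel for writes.
    // If the channel is also registered for reads, you would instead do
    // key.interestOps(key.interestOps() | SelectionKey.OP_WRITE).
    public static void onMessageEnqueued(Selector selector, SocketChannel channel)
            throws Exception {
        channel.register(selector, SelectionKey.OP_WRITE);
    }

    // Step 2: after writing, unregister channels whose writers are empty,
    // so an idle channel is never reported as "ready to write" again.
    public static void afterWrite(SelectionKey key, MessageWriter writer) {
        if (writer.isEmpty()) {
            key.cancel();   // removes the registration from the Selector
        }
    }
}
```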
Putting It All Together
As you can see, a non-blocking server needs to check for incoming data from time to time to see if any new full messages have been received. The server may need to check multiple times until one or more full messages have been received; checking once is not enough. Similarly, a non-blocking server needs to check from time to time whether there is any data to write. If there is, the server needs to check if any of the corresponding connections are ready to have that data written to them. Checking only when a message is queued for the first time is not enough, since the message may have been only partially written. All in all, a non-blocking server ends up with three "pipelines" it needs to execute regularly:
- The read pipeline, which checks for incoming data from the open connections.
- The process pipeline, which processes any full messages received.
- The write pipeline, which checks if any outgoing messages can be written to any of the open connections.
These three pipelines are executed repeatedly in a loop. You may be able to optimize the execution somewhat. For example, if no messages are queued, you can skip the write pipeline. Or, if no new, full messages were received, you may be able to skip the process pipeline. Here is a diagram illustrating the full server loop:
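A minimal sketch of that loop, with the three pipelines as hypothetical interfaces (see the GitHub repository for the real implementation), could look like this:

```java
// Hypothetical pipeline interfaces; names are illustrative only.
interface ReadPipeline    { void checkSockets(); }
interface ProcessPipeline { void processMessages(); }
interface WritePipeline   { void flushResponses(); }

// Sketch of the full server loop: the three "pipelines" executed repeatedly.
public class ServerLoop implements Runnable {

    private final ReadPipeline readPipeline;       // detects new full messages
    private final ProcessPipeline processPipeline; // handles full messages
    private final WritePipeline writePipeline;     // flushes queued responses

    public ServerLoop(ReadPipeline r, ProcessPipeline p, WritePipeline w) {
        this.readPipeline = r;
        this.processPipeline = p;
        this.writePipeline = w;
    }

    @Override
    public void run() {
        while (true) {
            readPipeline.checkSockets();       // read data, detect full messages
            processPipeline.processMessages(); // may enqueue responses
            writePipeline.flushResponses();    // write to write-ready channels
        }
    }
}
```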
If you still find this a bit complicated, remember to check out the GitHub repository: https://github.com/jjenkov/java-nio-server
Maybe seeing the code in action will help you understand how to implement this.
Server Threading Model
The non-blocking server implementation in the GitHub repository uses a threading model with two threads. The first thread accepts incoming connections from a ServerSocketChannel. The second thread processes the accepted connections, meaning it reads the messages, processes them, and writes the responses back to the connections. This two-thread model is illustrated here:
The server processing loop explained in the previous section is executed by the processing thread.
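A stripped-down sketch of the two-thread model could look like this; the port, queue size, and wiring are placeholders, and the repository's actual classes differ:

```java
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the two-thread model: one thread accepts connections,
// one thread runs the processing loop.
public class TwoThreadServer {

    public static void main(String[] args) throws Exception {
        BlockingQueue<SocketChannel> newConnections = new ArrayBlockingQueue<>(1024);

        Thread acceptThread = new Thread(() -> {
            try {
                ServerSocketChannel serverChannel = ServerSocketChannel.open();
                serverChannel.bind(new InetSocketAddress(9999)); // placeholder port
                while (true) {
                    newConnections.put(serverChannel.accept());  // blocking accept
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        Thread processorThread = new Thread(() -> {
            while (true) {
                SocketChannel channel = newConnections.poll();
                if (channel != null) {
                    // switch the channel to non-blocking mode and register it
                    // with the Selector (see the earlier ReadLoop sketch)
                }
                // ... execute one cycle of the read/process/write loop here ...
            }
        });

        acceptThread.start();
        processorThread.start();
    }
}
```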
Translated from: http://tutorials.jenkov.com/java-nio/non-blocking-server.html
Java NIO, Part 11: Non-blocking Server