1. The difference between BIO, NIO, and AIO
BIO: one connection, one thread. When a client issues a connection request, the server must start a thread to handle it, so thread overhead is large. Pseudo-asynchronous IO: connection requests are handed to a thread pool, so one thread serves many connections, but threads remain a scarce resource. NIO: one request, one thread. Client connection requests are registered with a multiplexer (Selector); the multiplexer polls, and only when a connection actually has an I/O request does a thread start to process it.
AIO: one effective request, one thread. The client's I/O requests are first completed by the OS, which then notifies the server application to start a thread for processing.
BIO is stream-oriented while NIO is buffer-oriented; BIO streams are blocking while NIO is non-blocking; BIO streams are one-way while NIO channels are bidirectional.
Characteristics of NIO: an event-driven model; a single thread can handle multiple tasks; non-blocking I/O, where reads and writes no longer block but may simply return 0; block (buffer) based transfer is more efficient than stream based transfer; advanced I/O features such as zero-copy; I/O multiplexing greatly improves the scalability and practicality of Java network applications; built on the Reactor threading model.
In the Reactor pattern, the event dispatcher waits for an event or for an application or operation to reach a certain state, then passes that event to a previously registered event handler or callback function, which performs the actual read or write. For a read in Reactor: register the read-ready event and its handler; the event dispatcher waits for events; when the event arrives, it activates the dispatcher, which calls the handler registered for that event; the handler performs the actual read, processes the data, registers new events if needed, and returns control.
2. The composition of NIO
Buffer: interacts with a Channel; data is read from the channel into the buffer and written from the buffer to the channel. flip(): flips the buffer, setting limit to the current position and position to 0, in effect switching from write mode to read mode. clear(): clears the buffer, resetting position to 0 and limit to capacity. rewind(): rewinds the buffer, setting position back to 0 so the data can be re-read.
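A minimal sketch of that read/write mode switch, using only the standard java.nio API:

```java
import java.nio.ByteBuffer;

public class BufferModes {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);   // position=0, limit=capacity=16 (write mode)
        buf.put("hello".getBytes());                // write mode: position advances to 5

        buf.flip();                                 // limit=5, position=0 -> read mode
        byte[] dst = new byte[buf.remaining()];
        buf.get(dst);                               // read mode: position moves toward limit

        buf.rewind();                               // position=0, limit stays 5: data can be re-read
        buf.clear();                                // position=0, limit=16: ready to write again
        System.out.println(new String(dst));        // prints "hello"
    }
}
```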
DirectByteBuffer saves one copy from kernel space to user space. However, creating and destroying direct buffers is more expensive and less predictable, so a memory pool is typically used to improve performance. Direct buffers are best used for large, long-lived buffers that are subject to the underlying system's native I/O operations. For small data volumes or less demanding cases, consider HeapByteBuffer, which is managed by the JVM.
Channel: represents an open connection between an I/O source and a target. It is bidirectional, but it cannot access data directly; it can only interact with a Buffer. Judging from the source code, FileChannel's read and write methods both cause the data to be copied twice.
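A small sketch of the Channel/Buffer interaction described above; the file name "data.txt" is purely illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ChannelRead {
    public static void main(String[] args) throws Exception {
        try (FileChannel channel = FileChannel.open(Paths.get("data.txt"), StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(1024);
            while (channel.read(buf) != -1) {        // data flows channel -> buffer
                buf.flip();                          // switch buffer to read mode
                while (buf.hasRemaining()) {
                    System.out.print((char) buf.get());
                }
                buf.clear();                         // switch back to write mode
            }
        }
    }
}
```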
Selector allows a single thread to manage multiple Channels. The open() method creates a Selector; the register() method registers a channel with the multiplexer along with the event types to monitor: read, write, connect, accept. Registering an event yields a SelectionKey, which represents the registration relationship between a SelectableChannel and the Selector. The wakeup() method makes the first selection operation that has not yet returned return immediately. Typical reasons for calling wakeup: a new channel or event has been registered; a channel has been closed or deregistered; a higher-priority event (such as a timer event) has fired and needs to be handled promptly.
On Linux, the Selector implementation class is EPollSelectorImpl, which delegates to EPollArrayWrapper; the latter encapsulates the three native epoll methods. EPollSelectorImpl's implRegister method registers the event with the epoll instance by calling epoll_ctl and adds the mapping between the registered file descriptor (fd) and its SelectionKey to fdToKey, the map that maintains the fd-to-SelectionKey relationship.
fdToKey can sometimes grow very large, because many channels are registered on the selector (e.g. millions of connections) and expired or invalid channels are not closed in time. fdToKey is always read serially, and reading happens only inside the select method, which is not thread-safe.
Pipe: a one-way data connection between two threads; data is written to the sink channel and read from the source channel.
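A small sketch of the sink/source direction using java.nio.channels.Pipe (both ends shown in one thread for brevity):

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class PipeDemo {
    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();

        // writer side: data is written to the sink channel
        pipe.sink().write(ByteBuffer.wrap("ping".getBytes()));

        // reader side: data is read from the source channel
        ByteBuffer buf = ByteBuffer.allocate(16);
        pipe.source().read(buf);
        buf.flip();
        System.out.println(new String(buf.array(), 0, buf.limit())); // "ping"
    }
}
```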
The server-side setup process for NIO: Selector.open() opens a Selector; ServerSocketChannel.open() creates the server channel; bind() binds it to a port and configureBlocking(false) switches it to non-blocking mode; register() registers the channel and the events of interest with the Selector; select() polls for ready events.
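A minimal sketch of that setup sequence; port 8080 is illustrative and error handling is omitted:

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class NioServer {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();                       // 1. open a Selector
        ServerSocketChannel server = ServerSocketChannel.open();   // 2. create the server channel
        server.bind(new InetSocketAddress(8080));                  // 3. bind a port...
        server.configureBlocking(false);                           //    ...and set non-blocking mode
        server.register(selector, SelectionKey.OP_ACCEPT);         // 4. register interest in ACCEPT

        while (true) {
            selector.select();                                     // 5. poll for ready events
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    ((SocketChannel) key.channel()).read(buf);     // real code would process the data
                }
            }
        }
    }
}
```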
3. The characteristics of Netty
Netty is a high-performance, asynchronous, event-driven NIO framework that supports TCP, UDP, and file transfer. It uses a more efficient socket layer, internally handles the 100% CPU spike caused by epoll empty polling, avoids the pitfalls of using NIO directly, and simplifies NIO programming. It provides a variety of decoders/encoders to handle TCP packet sticking/splitting automatically, can use separate accept/process thread pools to improve connection efficiency, and offers simple support for reconnection and heartbeat detection. The number of I/O threads, TCP parameters, and TCP receive/send buffers are configurable; direct (off-heap) memory is used instead of heap memory; ByteBuf objects are recycled through a memory pool; reference counting releases objects that are no longer referenced in time and reduces GC frequency; single-threaded serialization and an efficient Reactor threading model are used; volatile, CAS and atomic classes, thread-safe containers, and read/write locks are used extensively.

4. The Netty threading model
Netty receives and processes user requests through the Reactor model and internally maintains two thread pools: the boss pool and the worker pool. Threads in the boss pool handle the accept event of incoming requests; when an accept event is received, the corresponding socket is wrapped in a NioSocketChannel and handed to the worker pool, whose threads handle the read and write events of the request, delegating to the corresponding handlers.
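A hedged sketch of that boss/worker split with the Netty 4 API; the port and the echo handler are illustrative, not part of the original text:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class NettyServer {
    public static void main(String[] args) throws Exception {
        EventLoopGroup boss = new NioEventLoopGroup(1);   // boss pool: handles ACCEPT events
        EventLoopGroup worker = new NioEventLoopGroup();  // worker pool: handles read/write of accepted channels
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(boss, worker)
             .channel(NioServerSocketChannel.class)
             .childHandler(new ChannelInitializer<SocketChannel>() {
                 @Override
                 protected void initChannel(SocketChannel ch) {
                     ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                         @Override
                         public void channelRead(ChannelHandlerContext ctx, Object msg) {
                             ctx.writeAndFlush(msg);       // trivial echo, just to have a handler
                         }
                     });
                 }
             });
            b.bind(8080).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            worker.shutdownGracefully();
        }
    }
}
```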
Single-threaded model: all I/O operations are performed by one thread, i.e. multiplexing, event dispatching, and processing all happen on a single Reactor thread, which both receives client connection requests and initiates connections, and sends/reads requests and replies. One NIO thread handling hundreds of links at the same time cannot keep up and becomes slow, and if the thread enters a dead loop the whole program becomes unavailable, so this model is not appropriate for high-load, highly concurrent scenarios.
Multithreaded model: one NIO thread (the acceptor) is solely responsible for listening on the server side and receiving client TCP connection requests; a NIO thread pool is responsible for network I/O operations, i.e. reading, decoding, encoding, and sending messages. One NIO thread can handle N links simultaneously, but each link is bound to exactly one NIO thread to prevent concurrent-access problems. However, a single acceptor thread may become a performance bottleneck when there are millions of concurrent client connections or security authentication is required.
Master-slave multithreaded model: an acceptor thread pool binds the listening port and receives client connections; after accepting, the SocketChannel is removed from the Reactor thread of the main pool and re-registered to a thread in the sub pool, which handles the I/O read/write operations. This ensures that the MainReactor is only responsible for access work such as authentication and handshaking.
5. Causes of and solutions for TCP packet sticking/splitting
TCP handles data as a stream: a complete application-level packet may be split by TCP into multiple packets for sending, and several small packets may also be combined into one large packet.
Causes of TCP sticking/splitting: if the application writes more bytes than the socket send buffer can hold, splitting occurs; if the application writes less data than the socket buffer size, the network card may send the data of several writes at once, causing sticking; TCP segments data by MSS, so splitting occurs when (TCP packet length - TCP header length) > MSS; IP fragmentation occurs when the payload of an Ethernet frame exceeds the MTU (1500 bytes).
Solutions: fixed-length messages — the FixedLengthFrameDecoder class; appending a special delimiter at the end of each packet — the LineBasedFrameDecoder class (line separator) or the DelimiterBasedFrameDecoder class (custom delimiter); splitting a message into a header and a body — the LengthFieldBasedFrameDecoder class, which handles sticking/splitting in several variants: a header containing the length field, a length field at the front followed by a header, and headers with multiple extension fields. A pipeline sketch follows below.
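A hedged sketch of wiring the length-field solution into a Netty pipeline; the frame size limit and field offsets are illustrative assumptions, not values from the original text:

```java
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.LengthFieldBasedFrameDecoder;
import io.netty.handler.codec.LengthFieldPrepender;
import io.netty.handler.codec.string.StringDecoder;
import io.netty.handler.codec.string.StringEncoder;

public class FramingInitializer extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        // inbound: a 4-byte length field at offset 0, stripped before the body is passed on
        ch.pipeline().addLast(new LengthFieldBasedFrameDecoder(65535, 0, 4, 0, 4));
        ch.pipeline().addLast(new StringDecoder());
        // outbound: prepend a 4-byte length field so the peer can reassemble frames
        ch.pipeline().addLast(new LengthFieldPrepender(4));
        ch.pipeline().addLast(new StringEncoder());
    }
}
```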
6. What serialization protocols do you know?
Serialization (encoding) turns an object into binary form (a byte array), mainly for network transmission and data persistence; deserialization (decoding) restores the original object from a byte array read from the network or disk, mainly to decode objects received over the network and complete remote calls.
Key factors affecting serialization performance: the size of the serialized stream (network bandwidth usage), serialization speed (CPU usage), and cross-language support (for integrating heterogeneous systems and switching development languages).
Java's default serialization: cannot cross languages, produces an overly large serialized stream, and performs poorly.
XML. Advantages: human- and machine-readable; element and attribute names can be specified. Disadvantages: the serialized data contains only the data itself and the class structure, not type identity or assembly information; only public properties and fields are serialized, not methods; files are large, the format is complex, and transmission consumes bandwidth. Applicable scenarios: storing data as configuration files, real-time data conversion.
JSON, a lightweight data interchange format. Advantages: good compatibility, a relatively simple data format, easy to read and write, small serialized size, good extensibility; compared with XML, the protocol is simpler and parsing is faster. Disadvantages: less descriptive than XML, unsuitable where performance requirements are at the millisecond level, extra space overhead. Applicable scenarios (as an alternative to XML): access across firewalls, high extensibility requirements, browser-based AJAX requests, relatively small transfer payloads, and services with relatively relaxed real-time requirements (e.g. second level).
Fastjson uses an "assumed ordered fast matching" algorithm. Advantages: a simple API and the fastest JSON library in the Java language. Disadvantages: too much focus on speed at the expense of standards compliance and functionality; code quality is not high and documentation is incomplete. Scenarios: protocol interaction, web output, Android clients.
Thrift is not only a serialization protocol but also an RPC framework. Advantages: small serialized size, fast, supports many languages and rich data types, strong compatibility when data fields are added or removed, supports binary compressed encoding. Disadvantages: fewer users; access across firewalls is insecure; not human-readable, so debugging is relatively difficult; hard to use with other transport-layer protocols (such as HTTP); cannot read or write data directly against the persistence layer, i.e. not well suited as a persistence serialization protocol. Application scenario: RPC solutions for distributed systems.
Avro, a subproject of Hadoop, solves JSON's verbosity and lack of an IDL. Advantages: rich data types, simple and flexible integration with dynamic languages, self-describing schemas, fast data parsing, a compact and compressible binary format, support for remote procedure calls (RPC), cross-language implementations. Disadvantage: not intuitive for users accustomed to statically typed languages. Scenarios: persisted data formats for Hive, Pig, and MapReduce in Hadoop.
Protobuf describes the data structure in a .proto file, and a code generation tool produces the POJO and the related Protobuf methods and properties for that structure. Advantages: small serialized stream and high performance; a structured data storage format (like XML or JSON); forward compatibility can be achieved by identifying fields with numbered tags; structured documents are easier to manage and maintain. Disadvantages: code must be generated with a tool, and relatively few languages are supported — officially only Java, C++, and Python. Scenarios: RPC calls with high performance requirements, good properties for access across firewalls, persistence of application-layer objects.
Others: Protostuff is based on the Protobuf protocol but needs no .proto files — objects can be wrapped directly; JBoss Marshalling can serialize Java classes directly without requiring the java.io.Serializable interface; MessagePack is an efficient binary serialization format; Hessian is a lightweight remoting-over-HTTP tool with a binary protocol; Kryo is a fast, Java-only binary serialization library that requires classes to be registered before serialization (Output) and deserialization (Input).
7. How to choose a serialization protocol
Consider the specific scenario. For calls between company-internal systems where services tolerate latencies above 100 ms, the XML-based SOAP protocol is worth considering. For browser-based AJAX and for communication between mobile apps and servers, JSON is the first choice. JSON is also a good option when performance requirements are not very high, in dynamically typed languages, or when the transferred payload is small. In environments where debugging is difficult, using JSON or XML can greatly improve debugging efficiency and reduce development cost. When performance and compactness matter most, Protobuf, Thrift, and Avro compete with one another. Protobuf and Avro are the primary options for persisting terabyte-scale data: if the persisted data is stored in a Hadoop subproject, Avro is the better choice; for non-Hadoop projects built with statically typed languages, Protobuf better matches the habits of statically typed language engineers, while Avro's design leans toward dynamic languages and suits dynamic-language scenarios better. Thrift is a good choice if a complete RPC solution is needed. If different transport-layer protocols must be supported after serialization, or in high-performance scenarios requiring access across firewalls, Protobuf can be the first choice.
Protobuf data types: bool, double, float, int32, int64, string, bytes, enum, message. Protobuf qualifiers: required — the field must be assigned a value and cannot be empty; optional — the field may or may not be assigned; repeated — the field can repeat any number of times (including 0); an enum field can take only one value from a specified set of constants.
Basic Protobuf rules: every message must have at least one required field, may have zero or more optional fields, and a repeated field may contain zero or more entries; tag numbers in [1,15] are encoded in one byte (use them for frequently used fields), tags in [16,2047] take two bytes, and tag numbers must not be repeated; message types can be nested to any depth, and nested message types replace the group construct.
Protobuf message upgrade rules: never change the numeric tag of an existing field; existing required fields cannot be removed, while optional and repeated fields can be removed but their tags must not be reused; newly added fields must be optional or repeated; fields added with the required qualifier cannot be read or written by programs built against the old version.
The compiler generates a .java file for each message type, including a dedicated Builder class used to create instances of the message. For example: UserProto.User.Builder builder = UserProto.User.newBuilder(); ... builder.build();
Usage in Netty: ProtobufVarint32FrameDecoder is the decoder that handles half packets; ProtobufDecoder(UserProto.User.getDefaultInstance()) is the decoder for the message type defined in the generated UserProto.java; ProtobufVarint32LengthFieldPrepender prepends a varint32 length field to the Protobuf message header to mark the message length; ProtobufEncoder is the encoder. A pipeline sketch follows below.
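A hedged sketch combining those handlers, assuming UserProto.User is the class generated from the .proto file mentioned above (field names in the comment are illustrative):

```java
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.protobuf.ProtobufDecoder;
import io.netty.handler.codec.protobuf.ProtobufEncoder;
import io.netty.handler.codec.protobuf.ProtobufVarint32FrameDecoder;
import io.netty.handler.codec.protobuf.ProtobufVarint32LengthFieldPrepender;

public class ProtobufInitializer extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        ch.pipeline().addLast(new ProtobufVarint32FrameDecoder());                        // handle half packets
        ch.pipeline().addLast(new ProtobufDecoder(UserProto.User.getDefaultInstance()));  // decode to UserProto.User
        ch.pipeline().addLast(new ProtobufVarint32LengthFieldPrepender());                // prepend varint32 length
        ch.pipeline().addLast(new ProtobufEncoder());                                     // encode outgoing messages
        // business handlers would follow; building a message looks like:
        // UserProto.User user = UserProto.User.newBuilder()/* setters depend on your .proto */.build();
    }
}
```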
Converting a StringBuilder (or any CharSequence) to a ByteBuf: the copiedBuffer() method.

8. Netty's zero-copy implementation
Netty receives and sends ByteBuffers using direct buffers, reading and writing sockets with off-heap memory directly, so no second copy of the byte buffer is needed. Heap memory requires one extra copy: the JVM copies the heap buffer into direct memory before writing to the socket. The ByteBuf is allocated by ChannelConfig, whose default ByteBufAllocator creates direct buffers.
The CompositeByteBuf class combines several ByteBufs into one logical ByteBuf, avoiding the traditional approach of merging small buffers into a large one via memory copies. The addComponents method merges a header and a body into one logical ByteBuf; the two ByteBufs still exist separately inside the CompositeByteBuf, which is a whole only logically.
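A small sketch of that logical merge, assuming Netty 4.1's Unpooled helpers; the "HEAD"/"BODY" contents are illustrative:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.CompositeByteBuf;
import io.netty.buffer.Unpooled;

public class CompositeDemo {
    public static void main(String[] args) {
        ByteBuf header = Unpooled.copiedBuffer("HEAD".getBytes());
        ByteBuf body = Unpooled.copiedBuffer("BODY".getBytes());

        // no memory copy: header and body still exist separately inside the composite
        CompositeByteBuf message = Unpooled.compositeBuffer();
        message.addComponents(true, header, body);    // true -> advance the writer index

        System.out.println(message.readableBytes());  // 8, presented as one logical buffer
        message.release();                            // releases the underlying components too
    }
}
```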
Wrapping a file in a FileRegion and transferring it with FileChannel.transferTo sends the file directly to the target Channel, avoiding the memory copies caused by the traditional write-in-a-loop approach.
With the wrap methods, a byte[] array, ByteBuf, ByteBuffer, and so on can be wrapped into a Netty ByteBuf object, avoiding a copy operation.
The JDK Selector bug: if the result of a Selector poll is empty and there is no wakeup or new message to process, an empty poll occurs and CPU utilization reaches 100%.
Netty's workaround: count the select operations in each cycle; each empty select increments a counter, and if N consecutive empty polls occur within a period, the epoll dead-loop bug is considered triggered. The Selector is then rebuilt: Netty checks whether the rebuild was requested by another thread; if not, the original SocketChannels are removed from the old Selector, re-registered with the new Selector, and the old Selector is closed.

9. In what ways is Netty's high performance reflected?
Heartbeat: on the server side, idle (inactive) sessions are cleared periodically (Netty 5); on the client side, it detects whether the session has been disconnected or the peer restarted, and measures network latency. The IdleStateHandler class is used to detect session state; a sketch follows below.
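A hedged sketch of idle detection with IdleStateHandler; the 60/30-second timeouts and the "PING" payload are illustrative assumptions:

```java
import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.timeout.IdleState;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;

public class HeartbeatInitializer extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        // readerIdle=60s, writerIdle=30s, allIdle disabled -- illustrative values
        ch.pipeline().addLast(new IdleStateHandler(60, 30, 0));
        ch.pipeline().addLast(new ChannelDuplexHandler() {
            @Override
            public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
                if (evt instanceof IdleStateEvent) {
                    IdleStateEvent e = (IdleStateEvent) evt;
                    if (e.state() == IdleState.READER_IDLE) {
                        ctx.close();                     // server side: drop a dead session
                    } else if (e.state() == IdleState.WRITER_IDLE) {
                        ctx.writeAndFlush("PING");       // client side: send a heartbeat
                    }
                } else {
                    super.userEventTriggered(ctx, evt);
                }
            }
        });
    }
}
```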
Serial lock-free design: messages are processed within the same thread as much as possible, with no thread switching, avoiding multithreaded contention and synchronization locks. On the surface this serial design seems to give low CPU utilization and low concurrency, but by tuning the NIO thread pool parameters, multiple serialized threads can run concurrently, and this locally lock-free serial design performs better than a one-queue, multiple-worker-threads model.
Reliability: link validity detection — an idle-link detection mechanism with read/write idle timeouts; memory protection — reusing ByteBufs through the memory pool and decoding protection for ByteBuf; graceful shutdown — stop accepting new messages, run pre-exit processing, and release resources.
Netty security: supported security protocols are SSL v2 and v3 and TLS, with SSL one-way authentication, two-way authentication, and third-party CA authentication.
Efficient concurrent programming: extensive and correct use of volatile, wide use of CAS and atomic classes, use of thread-safe containers, and improved concurrency through read/write locks. The three elements of I/O communication performance: the transport (AIO), the protocol (HTTP, etc.), and the threading model (master-slave multithreading).
The role of flow control (the traffic shaper): it prevents downstream network elements from being overwhelmed and traffic from being interrupted when upstream and downstream elements have unbalanced performance, and it prevents the communication module from accepting messages so fast that the back-end business threads cannot process them in time, which would otherwise cause lockup problems.
TCP parameter configuration: SO_RCVBUF and SO_SNDBUF — the usual recommended value is 128K or 256K; TCP_NODELAY — the Nagle algorithm automatically coalesces small packets in the buffer into larger packets, preventing a flood of small packets from congesting the network and thereby improving efficiency, but it should be disabled for latency-sensitive applications. A sketch follows below.
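A small sketch of setting those parameters on a ServerBootstrap; the 128 KB values mirror the recommendation above and SO_BACKLOG is an illustrative extra:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelOption;

public class TcpTuning {
    static void configure(ServerBootstrap b) {
        b.option(ChannelOption.SO_BACKLOG, 1024)             // accept queue length (illustrative)
         .childOption(ChannelOption.SO_RCVBUF, 128 * 1024)   // receive buffer, ~128K as recommended
         .childOption(ChannelOption.SO_SNDBUF, 128 * 1024)   // send buffer, ~128K as recommended
         .childOption(ChannelOption.TCP_NODELAY, true);      // disable Nagle for latency-sensitive apps
    }
}
```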
10. NioEventLoopGroup source-code analysis
NioEventLoopGroup (actually MultithreadEventExecutorGroup) internally maintains an array children[] of type EventExecutor, whose default size is the number of processor cores * 2; this constitutes the thread pool. NioEventLoopGroup overrides the newChild method used to initialize each EventExecutor, so the actual type of the children elements is NioEventLoop.
When a thread starts, it calls the SingleThreadEventExecutor constructor and executes NioEventLoop's run method, which first calls hasTasks() to check whether the current taskQueue has any elements. If the taskQueue is not empty, selectNow() is executed, which ultimately calls Selector.selectNow() and returns immediately; if the taskQueue is empty, the select(oldWakenUp) method is executed.
The select(oldWakenUp) method addresses the epoll bug described above. selectCnt records how many times Selector.select has been executed and whether Selector.selectNow() has been run. If the epoll empty-polling bug is triggered, Selector.select(timeoutMillis) keeps returning immediately and selectCnt grows; when selectCnt reaches the threshold (default 512), the rebuildSelector method is executed and the Selector is rebuilt, fixing the 100% CPU bug.
The rebuildSelector method first creates a new Selector via the openSelector method, then cancels all SelectionKeys of the old Selector, and finally re-registers the old Selector's channels with the new Selector. After the rebuild, selectNow must be called again to check for ready SelectionKeys.
Next, the processSelectedKeys method is called to handle I/O tasks. When selectedKeys != null, processSelectedKeysOptimized is called: it iterates over the selectedKeys array, which holds the SelectionKeys of the ready I/O events, and calls processSelectedKey for each one; processSelectedKey handles the OP_READ, OP_WRITE, and OP_CONNECT events.
Finally, the runAllTasks method is called to handle non-I/O tasks. It first calls fetchFromScheduledTaskQueue to move tasks in the scheduledTaskQueue whose delay has expired into the taskQueue, then takes tasks from the taskQueue and checks the elapsed time after every 64 tasks; if the elapsed time exceeds the preset execution time, it stops running non-I/O tasks so that too many non-I/O tasks do not starve the I/O tasks.
Each NioEventLoop corresponds to one thread and one Selector; a NioServerSocketChannel actively registers itself with a NioEventLoop's Selector, and the NioEventLoop is responsible for event polling.
Outbound events are request events: the initiator is the Channel, the handler is Unsafe, and they propagate from tail to head. Inbound events are notification events: the initiator is Unsafe, the handler is the Channel, and they propagate from head to tail.
Memory management: a large block of memory, an Arena, is pre-allocated; an Arena consists of several Chunks, and each Chunk consists of 2048 pages by default. A Chunk organizes its pages as an AVL tree: each leaf node represents one page and each interior node represents a memory region, with every node recording its offset within the Arena. When a region is allocated, its node is marked, indicating that all nodes beneath it are allocated as well. Memory requests larger than 8 KB are allocated from a PoolChunkList, while PoolSubpage handles allocations smaller than 8 KB by splitting one page into multiple segments.
ByteBuf features: supports automatic expansion (up to 4 MB), guaranteeing that put/write methods do not throw; achieves zero-copy through the built-in composite buffer type; needs no flip() to switch between read and write modes because the read and write indices are separate; supports method chaining; uses reference counting based on AtomicIntegerFieldUpdater for memory reclamation; PooledByteBuf uses a binary-tree-backed memory pool to manage allocation and release centrally, so a new buffer object does not have to be created for every use, whereas UnpooledHeapByteBuf creates a new buffer object every time.
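A small sketch of the separate read/write indices and the reference-counted pooled buffer described above:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class ByteBufDemo {
    public static void main(String[] args) {
        ByteBuf buf = PooledByteBufAllocator.DEFAULT.buffer(16); // pooled; grows automatically if needed
        buf.writeBytes("netty".getBytes());                      // advances writerIndex only

        // no flip() needed: readerIndex and writerIndex are tracked independently
        byte[] out = new byte[buf.readableBytes()];
        buf.readBytes(out);                                      // advances readerIndex only
        System.out.println(new String(out));

        buf.release();  // reference count drops to 0 -> buffer is returned to the pool
    }
}
```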
My knowledge is limited; if anything above is wrong, please point it out, thank you.
If you have a better suggestion, you can leave a message for us to discuss and make progress together.
I sincerely thank you for your patience in reading this blog post.