We have recently been working on a handheld (mobile) online service whose design requirement is to support 1 million long-lived connections from mobile phones. So far we have built only a single-server version; the cluster version has not been developed yet.
Hardware: 2U × 8C × 2.4 GHz, 16 GB RAM, RAID 5.
Software: RHEL 5.5 (64-bit), JDK 1.6.0_30 (64-bit), and MySQL 5.5 (64-bit).
The program is written in Java on top of the Netty NIO framework. A single JVM can only run a limited number of threads, at most in the tens of thousands. With traditional blocking IO, each connection needs two threads, one for reading and one for writing, so 500,000 connections would require 1 million threads, far beyond what the Java virtual machine can handle.
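To illustrate why NIO avoids the thread-per-connection problem, here is a minimal sketch that uses the plain JDK `java.nio` Selector API instead of Netty. Netty's NIO transport is built on the same Selector mechanism, but Netty's own API is not shown here. A single thread multiplexes every connection; the class name `SelectorEchoServer` and the echo behaviour are hypothetical, chosen only for illustration, and the code sticks to APIs that already exist in JDK 1.6.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

/**
 * Hypothetical minimal echo server: one thread and one Selector serve every
 * connection, instead of one or two threads per connection.
 */
public class SelectorEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private volatile boolean running = true;

    public SelectorEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        // Port 0 lets the OS pick a free port; backlog 1024 absorbs connection bursts.
        server.socket().bind(new InetSocketAddress("127.0.0.1", 0), 1024);
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() {
        return server.socket().getLocalPort();
    }

    /** Ask the event loop to exit; wakeup() unblocks a pending select(). */
    public void stop() {
        running = false;
        selector.wakeup();
    }

    private static void closeQuietly(SelectionKey key) {
        key.cancel();
        try {
            key.channel().close();
        } catch (IOException ignored) {
            // best-effort close
        }
    }

    public void run() {
        // One buffer is enough because only this thread ever touches it.
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try {
            while (running) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) {
                        continue;
                    }
                    if (key.isAcceptable()) {
                        SocketChannel ch = server.accept();
                        if (ch != null) {
                            // A new connection is just one more key on the same Selector.
                            ch.configureBlocking(false);
                            ch.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel ch = (SocketChannel) key.channel();
                        try {
                            buf.clear();
                            if (ch.read(buf) < 0) { // peer closed the connection
                                closeQuietly(key);
                                continue;
                            }
                            buf.flip();
                            // Simplification: a real server would register OP_WRITE
                            // instead of looping when the send buffer is full.
                            while (buf.hasRemaining()) {
                                ch.write(buf);
                            }
                        } catch (IOException e) {
                            closeQuietly(key); // reset or broken connection
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            for (SelectionKey k : selector.keys()) {
                closeQuietly(k);
            }
            try {
                selector.close();
            } catch (IOException ignored) {
                // best-effort cleanup
            }
        }
    }
}
```

Usage: start `new Thread(server)` once, then open as many client connections as the OS allows; the server's thread count stays at one. The per-connection cost is then a registered key, a socket, and whatever state the application attaches, which is why memory rather than threads becomes the limit.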
In testing, the number of connections the server could hold was directly tied to the JVM heap size: the larger the heap, the more connections it could keep. We also added some startup parameters, such as enabling parallel garbage collection. When we finally reached 500,000 long connections, the JVM's -Xmx was 12 GB.
Because our service has to keep some business state for each connection, it uses more memory; without that business logic, the number of connections could be pushed higher. It certainly cannot grow without limit, though.
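The figures above allow a rough per-connection budget: dividing the 12 GB heap (HotSpot's `g` suffix means 1024³ bytes) by 500,000 connections gives about 25 KB of heap per connection. This is only an upper bound on the average, because it counts the whole heap, including GC headroom and shared structures, as if it belonged to connections. The class `ConnectionBudget` and the 16 KiB "what if" figure are hypothetical.

```java
/** Hypothetical back-of-envelope heap budget, using the figures from the test above. */
public class ConnectionBudget {

    /** Average heap bytes available per connection (integer division). */
    public static long bytesPerConnection(long heapBytes, long connections) {
        if (connections <= 0) {
            throw new IllegalArgumentException("connections must be positive");
        }
        return heapBytes / connections;
    }

    /** How many connections fit if each one costs perConnectionBytes of heap. */
    public static long maxConnections(long heapBytes, long perConnectionBytes) {
        if (perConnectionBytes <= 0) {
            throw new IllegalArgumentException("perConnectionBytes must be positive");
        }
        return heapBytes / perConnectionBytes;
    }

    public static void main(String[] args) {
        long heap = 12L * 1024 * 1024 * 1024; // -Xmx12g
        long perConn = bytesPerConnection(heap, 500000L);
        System.out.println(perConn + " bytes (~" + perConn / 1024 + " KiB) per connection");
        // Hypothetical what-if: per-connection state trimmed to 16 KiB.
        System.out.println(maxConnections(heap, 16L * 1024) + " connections");
    }
}
```

Running `main` prints `25769 bytes (~25 KiB) per connection` and then `786432 connections`, which shows how directly trimming per-connection business state translates into more connections on the same heap.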
CPU usage is low while connections are being established and almost zero while they are simply being held; the main cost is memory.
Our test client was written in Erlang, with each machine opening 63,000 connections to the server, so we needed eight or more client machines. Because we are not very familiar with Erlang, our attempts to exceed 65,535 connections from a single machine by using virtual network interfaces did not succeed, which is why the load test required so many machines.
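The roughly 64K ceiling comes from TCP rather than from Erlang: every connection from one client IP to the same server IP and port needs a distinct local port, and port numbers are 16-bit (on Linux the usable ephemeral range is set by `net.ipv4.ip_local_port_range` and is smaller still). The usual workaround is to give the client several IP addresses and bind each socket to one of them before connecting. The sketch below shows that idea in Java rather than Erlang; `MultiIpLoadClient`, its methods, and the address list are hypothetical. It also computes the machine count: 500,000 / 63,000, rounded up, is 8.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical load-client helper: spreads outgoing connections over several local IPs. */
public class MultiIpLoadClient {

    /** Client machines needed if each can hold perMachine connections (ceiling division). */
    public static int machinesNeeded(int targetConnections, int perMachine) {
        return (targetConnections + perMachine - 1) / perMachine;
    }

    /**
     * Opens count connections to server, binding them to localIps in round-robin order.
     * Each local IP has its own ephemeral-port space, so N addresses allow roughly
     * N times as many connections to a single server IP:port.
     */
    public static List<SocketChannel> connectRoundRobin(List<InetAddress> localIps,
                                                        InetSocketAddress server,
                                                        int count) throws IOException {
        List<SocketChannel> channels = new ArrayList<SocketChannel>(count);
        try {
            for (int i = 0; i < count; i++) {
                InetAddress local = localIps.get(i % localIps.size());
                SocketChannel ch = SocketChannel.open();
                channels.add(ch); // tracked so it is closed on failure
                // Port 0: the kernel picks a free ephemeral port on this local IP.
                ch.socket().bind(new InetSocketAddress(local, 0));
                ch.connect(server); // blocking connect
            }
        } catch (IOException e) {
            for (SocketChannel ch : channels) {
                try {
                    ch.close();
                } catch (IOException ignored) {
                    // best-effort cleanup
                }
            }
            throw e;
        }
        return channels;
    }
}
```

On a real load machine `localIps` would hold the addresses of the extra (virtual) interfaces; with only one address the method degenerates to an ordinary client.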
Overall the test results met our expectations, but there is one important problem. Once the JVM uses more than 8 GB of heap, a full GC takes about 8 seconds, roughly 1 GB per second, and sometimes a full GC takes more than 10 seconds. The service is completely paused during that time. We could, of course, shorten the pauses by setting the JVM's maximum GC pause-time parameter, but because I configured throughput as the top priority, a full GC collects the entire heap. Even so, the pause is much longer than I expected.
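To track how long collections actually take, cumulative GC counts and times can be read at runtime from the standard `java.lang.management` API instead of only from GC logs. The sketch below reports each collector's count, total time, and average time per collection; `GcPauseReport` and its methods are hypothetical, and the average is only a crude proxy for pause length (with concurrent collectors the reported time need not equal stop-the-world time). The pause-time and throughput settings mentioned above presumably correspond to HotSpot's `-XX:MaxGCPauseMillis` and `-XX:GCTimeRatio`, but the post does not say which flags were actually used.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper that reports cumulative GC counts and times for this JVM. */
public class GcPauseReport {

    /** Average milliseconds per collection; 0 when there is nothing meaningful to average. */
    public static long averageMillis(long totalMillis, long count) {
        return (count > 0 && totalMillis >= 0) ? totalMillis / count : 0L;
    }

    /** One line per collector: "<name>: count=<n> time=<ms>ms avg=<ms>ms". */
    public static String report() {
        StringBuilder sb = new StringBuilder();
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : beans) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            long time = gc.getCollectionTime();   // cumulative milliseconds, -1 if undefined
            sb.append(gc.getName())
              .append(": count=").append(count)
              .append(" time=").append(time).append("ms")
              .append(" avg=").append(averageMillis(time, count)).append("ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Sampling `report()` periodically (for example from a scheduled task) and comparing successive values shows how much time the old-generation collector is consuming. At the observed rate of roughly 1 GB per second, a full 12 GB heap would extrapolate to around 12 seconds, consistent with the occasional pauses over 10 seconds.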
Single-server long connections: 500,000+