Developing a High-Performance (High-Concurrency) Web Site in Java

Source: Internet
Author: User

JVM
Correct JVM parameter configuration for an application running in a Java EE container directly affects the performance and processing capacity of the whole system. JVM tuning is mainly memory-management tuning, and it falls into the following 4 areas:
1. Heap size (HeapSize): how much memory the Java virtual machine may use, and its strategy for using it, is critical.
2. Garbage collector (GarbageCollector): Java offers 4 garbage collection algorithms (policies), each with its own configuration parameters.
3. Stack size (StackSize): the stack is the JVM memory area that holds each thread's method calls. Every thread has its own stack, so the stack size limits the number of threads that can be created.
4. Debug/log: the JVM can also be configured to write logs while it runs and after it crashes. This is critical, because the appropriate parameters are chosen on the basis of each JVM's log output.
JVM configuration tips are easy to find on the web, but I recommend reading 2 official documents from Sun, which give a better understanding of the configuration parameters:
1. Java HotSpot VM Options: http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
2. Troubleshooting Guide for Java SE 6 with HotSpot VM: http://www.oracle.com/technetwork/java/javase/index-137495.html
In addition, not everyone works with these JVM parameters every day. If you forget the key ones, you can run java -X (capital X) to print a summary.
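As a rough illustration of the heap and garbage collector points above, the sketch below reads back what the running JVM was actually given. The class name JvmSettingsReport and the launch line in the comment are my own placeholders (the sizes are not recommendations); the calls themselves are standard java.lang.management APIs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launch line (placeholder sizes, not recommendations):
//   java -Xms512m -Xmx512m -Xss512k -verbose:gc -XX:+HeapDumpOnOutOfMemoryError -jar app.jar
class JvmSettingsReport {

    // Maximum heap the JVM will attempt to use (governed by -Xmx), in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Names of the garbage collectors active in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB): " + maxHeapMb());
        System.out.println("collectors:    " + collectorNames());
    }
}
```

Running it before and after changing -Xmx or the collector flag is a quick way to confirm that the container really picked up the settings you intended.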

JDBC
The parameters for MySQL JDBC were also described in previous articles. In both single-machine and cluster environments, the JDBC configuration parameters can have a significant impact on database operations.
Some so-called high-performance open-source Java ORM frameworks simply enable, by default, many of the parameters that JDBC already offers:
1. For example: autoReconnect, prepStmtCacheSize, cachePrepStmts, useNewIO, blobSendChunkSize.
2. For example, in a cluster environment: roundRobinLoadBalance, failOverReadOnly, autoReconnectForPools, secondsBeforeRetryMaster.
Details can be found in the JDBC section of MySQL's official reference manual:
http://dev.mysql.com/doc/refman/5.1/zh/connectors.html#cj-jdbc-reference
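To make the parameter list above concrete, here is a minimal sketch of how such properties are usually appended to a MySQL JDBC URL. The host, database name, and values are made up for illustration, and the helper class MysqlJdbcUrlSketch is hypothetical; check each property against the manual for your driver version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MysqlJdbcUrlSketch {

    // Joins driver properties into the query-string part of a JDBC URL.
    // Values are assumed to be simple tokens that need no URL encoding.
    static String buildUrl(String hostAndPort, String database, Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:mysql://")
                .append(hostAndPort).append('/').append(database);
        char separator = '?';
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(separator).append(e.getKey()).append('=').append(e.getValue());
            separator = '&';
        }
        return url.toString();
    }

    // Example values only; tune them against your own workload.
    static String exampleUrl() {
        Map<String, String> props = new LinkedHashMap<String, String>();
        props.put("cachePrepStmts", "true");
        props.put("prepStmtCacheSize", "250");
        props.put("autoReconnect", "true");
        return buildUrl("db.example.com:3306", "shop", props);
    }
}
```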

Database connection pool (DataSource)
Frequent interaction between an application and the database can become a system bottleneck and add significant overhead that hurts performance. A JDBC connection pool is responsible for allocating, managing, and releasing database connections. It lets an application reuse an existing connection instead of establishing a new one, so the application does not have to open and close connections constantly. It can also release connections that have been idle longer than the maximum idle time, which avoids connection leaks caused by connections that were never released. This technique can significantly improve the performance of database operations.
One point deserves attention here:
Connections obtained from a pool still need to be closed. The pool starts up in advance and obtains its connections from the database, after which the application no longer deals with the database directly. Using a pool is a matter of "borrowing": a connection the application obtains from the pool is on loan and must be returned. Imagine 20 buckets beside a pool of water. Anyone who needs water can take a bucket, but if 20 people each take one and do not bring it back, everyone who comes later has to wait until a bucket is returned, and the resource becomes congested. In the same way, the Connection object an application gets is allocated from the pool, and once the work is done it must be returned, so that the "borrow and return" rule is always kept.
Resources:
http://dev.mysql.com/doc/refman/5.1/zh/connectors.html#cj-connection-pooling
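The borrow-and-return rule can be sketched with a toy pool. This is not how a production pool is implemented; BorrowReturnPool is a hypothetical class that hands out plain objects through a BlockingQueue, just to show what happens when a borrower does not give a resource back.

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// A toy pool of reusable resources; a real application would use a pooling DataSource.
class BorrowReturnPool<T> {
    private final BlockingQueue<T> idle;

    // The pool is filled up front, like the 20 buckets beside the water.
    BorrowReturnPool(Collection<T> resources) {
        this.idle = new ArrayBlockingQueue<T>(resources.size(), false, resources);
    }

    // Waits up to timeoutMs for a free resource; returns null if none was returned in time.
    T borrow(long timeoutMs) {
        try {
            return idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Puts a borrowed resource back so that waiting callers can proceed.
    void giveBack(T resource) {
        idle.offer(resource);
    }
}
```

With a real pooled DataSource the same rule usually shows up as closing the Connection in a finally block or a try-with-resources statement, which hands it back to the pool instead of closing the physical connection.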

Data access
Beyond database server optimization, it is worth thinking about which kind of data belongs in which kind of store. Storage in the future is likely to be mixed: cache, NoSQL, DFS, and database will all be present in one system. Tableware and everyday clothes both have to be kept at home, but not in the same piece of furniture; nobody puts cutlery and clothes in the same closet. Data is the same: different types of data need a suitable storage environment. Files and pictures should be classified first by how hot they are (how often they are accessed) or by file size. Strongly relational data that needs transaction support belongs in a traditional database; weakly relational data that does not need transactions is a candidate for NoSQL; mass file storage can use a network-backed DFS; and whether to cache depends on the size of the individual items and the read/write ratio.
Read/write separation also deserves attention. In most database and NoSQL environments reads outnumber writes, so the design has to spread reads across multiple machines and also consider data consistency between those machines. With MySQL, a one-master, multiple-slave setup combined with MySQL Proxy, or with some of the JDBC parameters (roundRobinLoadBalance, failOverReadOnly, autoReconnectForPools, secondsBeforeRetryMaster), lets later application development separate reads from writes, spread heavy read pressure across multiple machines, and keep the data consistent.
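The routing idea behind read/write separation can be sketched in a few lines. This is only an application-level illustration under my own assumptions, not what MySQL Proxy or the JDBC driver does internally; ReadWriteRouter and the server names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ReadWriteRouter {
    private final String master;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String master, List<String> replicas) {
        this.master = master;
        this.replicas = new ArrayList<String>(replicas);
    }

    // All writes go to the single master.
    String targetForWrite() {
        return master;
    }

    // Reads rotate across the replicas; falls back to the master when there are none.
    String targetForRead() {
        if (replicas.isEmpty()) {
            return master;
        }
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }
}
```

Replicas in an asynchronous replication setup can lag behind the master, so a read that must see a write made a moment ago may still need to be sent to the master.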

Caching
Broadly speaking, caches fall into 2 kinds: local caches and distributed caches.
1. Local cache. In Java, a local cache means holding data in static, in-process structures and reading it from there when needed. For high-concurrency environments, ConcurrentHashMap or CopyOnWriteArrayList is recommended as the local cache. A cache is ultimately a use of system memory, so the amount of memory given to it has to be kept in proportion; overusing memory for caching is counterproductive and slows the whole system down.
2. Distributed cache. Commonly used in distributed environments to store each machine's cached data centrally. Besides caching, it can also serve as a means of data synchronization and transfer in a distributed system. memcached and Redis are the most commonly used.
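A local cache of the first kind can be sketched on top of ConcurrentHashMap. LocalCacheSketch and its loader function are hypothetical names of mine, and the sketch deliberately has no size limit or expiry.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class LocalCacheSketch<K, V> {
    private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<K, V>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    // The loader stands in for the slow lookup (database, remote call) being cached.
    LocalCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value; the loader runs only when the key is not cached yet.
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Number of times the slow loader actually ran.
    int loadCount() {
        return loads.get();
    }
}
```

Because nothing is ever evicted here, a real cache would also need a bound on entries or memory, which is exactly the proportion concern raised in point 1.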
Different storage media have very different read/write efficiency, so the question is how to use caches in the system to bring your data closer to the CPU. There is a picture by Google's technical guru Jeff Dean (REF) that is worth always keeping in mind (figure not reproduced here).

Concurrency/Multithreading
In high-concurrency environments, developers are advised to use the concurrency package that ships with the JDK (java.util.concurrent). Since JDK 1.5, the utility classes under java.util.concurrent have simplified multithreaded development. They fall into the following main groups:
1. Thread pools. The thread pool interfaces (Executor, ExecutorService) and implementation classes (ThreadPoolExecutor, ScheduledThreadPoolExecutor) in the JDK's thread pool framework manage the queuing and scheduling of tasks and allow controlled shutdown. Running a thread consumes CPU resources, and creating and destroying threads adds further overhead, so a thread pool is not only an effective way to manage multiple threads but also improves thread efficiency.
2. Local queues. The package provides an efficient, scalable, thread-safe, non-blocking FIFO queue (ConcurrentLinkedQueue). Five implementations in java.util.concurrent support the extended BlockingQueue interface, which defines blocking versions of put and take: LinkedBlockingQueue, ArrayBlockingQueue, SynchronousQueue, PriorityBlockingQueue, and DelayQueue. Together these classes cover most common producer-consumer, messaging, parallel-task, and related concurrent designs.
3. Synchronizers. Four classes aid common special-purpose synchronization idioms. Semaphore is a classic concurrency tool. CountDownLatch is a very simple yet very common utility for blocking until a given number of signals, events, or conditions hold. CyclicBarrier is a reusable multiway synchronization point that is useful in some styles of parallel programming. Exchanger allows two threads to exchange objects at a rendezvous point, which is useful in several pipeline designs.
4. Concurrent collections. The package also supplies Collection implementations designed for multithreaded contexts: ConcurrentHashMap, ConcurrentSkipListMap, ConcurrentSkipListSet, CopyOnWriteArrayList, and CopyOnWriteArraySet. When many threads are expected to access a given collection, a ConcurrentHashMap is usually better than a synchronized HashMap, and a ConcurrentSkipListMap is usually better than a synchronized TreeMap. A CopyOnWriteArrayList is better than a synchronized ArrayList when the expected number of reads and traversals greatly exceeds the number of updates to the list.
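Three of the groups above (a thread pool, a synchronizer, and a concurrent collection) can be combined in one small sketch. The word-counting task and the class name WordCountPoolSketch are only examples of mine; the 10-second wait limit is likewise an arbitrary assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class WordCountPoolSketch {

    // Counts words in parallel: one pooled task per word, merged into a concurrent map.
    static ConcurrentHashMap<String, Integer> count(String[] words, int threads) {
        final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<String, Integer>();
        final CountDownLatch done = new CountDownLatch(words.length);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (final String word : words) {
                pool.execute(() -> {
                    try {
                        counts.merge(word, 1, Integer::sum); // atomic per-key update
                    } finally {
                        done.countDown(); // signal completion even if the task failed
                    }
                });
            }
            // Block until every task has signalled, or give up after 10 seconds.
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown(); // controlled shutdown: no new tasks, queued ones still complete
        }
        return counts;
    }
}
```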

Queues
Queues can be divided into 2 classes: local queues and distributed queues.
Local queues: commonly used for bulk writes of data that is not time-sensitive. Data collected over a period of time, or up to a certain count, is buffered and then written in one batch. This can be implemented with a BlockingQueue or a List/Map.
Related information: Sun Java API.
Distributed queues: generally used as message middleware, the bridge for communication between systems and subsystems in a distributed environment. In Java EE environments the most widely used are Apache ActiveMQ and Sun's OpenMQ.
Lightweight MQ middleware has been introduced here before, for example Kestrel and Redis (Ref http://www.javabloger.com/article/mq-kestrel-redis-for-java.html). I also recently heard that LinkedIn's search technology team has released an MQ product, Kafka (Ref http://sna-projects.com/kafka), which is worth keeping an eye on.
Related information:
1. ActiveMQ: http://activemq.apache.org/getting-started.html
2. OpenMQ: http://mq.java.net/about.html
3. Kafka: http://sna-projects.com/kafka
4. JMS articles: http://www.javabloger.com/article/category/jms
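The batched-write use of a local queue described above can be sketched as follows. BatchWriterSketch is a hypothetical class; the in-memory list of flushed batches stands in for whatever bulk write (database insert, file append) the real system performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BatchWriterSketch {
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<String>();
    private final int batchSize;
    private final List<List<String>> flushed = new ArrayList<List<String>>();

    BatchWriterSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    // Producers call this from any thread; the unbounded queue never blocks them.
    void add(String record) {
        buffer.offer(record);
    }

    // Called periodically by a single writer thread: moves up to batchSize records out at once.
    int flush() {
        List<String> batch = new ArrayList<String>();
        buffer.drainTo(batch, batchSize);
        if (!batch.isEmpty()) {
            flushed.add(batch); // stand-in for one bulk write
        }
        return batch.size();
    }

    // Batches written so far, for inspection.
    List<List<String>> flushedBatches() {
        return flushed;
    }
}
```

In practice flush would be driven by a timer, for example a ScheduledThreadPoolExecutor, so that records are written either when enough have accumulated or when enough time has passed.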

NIO
NIO was introduced in JDK 1.4. Before Java 1.4, the JDK provided stream-oriented I/O, in which, for example, reading or writing a file processes data one byte at a time: an input stream produces one byte of data and an output stream consumes one byte of data. Stream-oriented I/O is very slow, and a packet is either received as a whole datagram or not at all. Java NIO's non-blocking technique is in effect the reactor pattern: the program is notified automatically when content arrives, so it does not have to wait or spin in a busy loop, which greatly improves system performance. In practice NIO is mostly used in 2 areas: (1) file reads and writes, and (2) network data streams. NIO has several core objects that need to be mastered: (1) selectors (Selector), (2) channels (Channel), and (3) buffers (Buffer).
Some remarks of my own:
1. Within Java NIO, memory-mapped files are an efficient technique. They can be used to separate the cold and hot data held in a cache, by moving some of the cold data out to a mapped file. This approach is much faster than regular stream-based or channel-based I/O. The data in the file is made to appear as the contents of an in-memory array, and only the part of the file actually being read or written is mapped into memory; the whole file is not read in.
2. The MySQL JDBC driver can also use NIO to access the database, which can improve system performance.
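The memory-mapped file technique from remark 1 can be sketched with FileChannel.map. The class MappedReadSketch, the temporary file, and its contents are assumptions for illustration; only the requested region is mapped, not the whole file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedReadSketch {

    // Maps only the requested region of the file and decodes it as UTF-8.
    static String readRegion(Path file, long offset, int length) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer region = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] bytes = new byte[length];
            region.get(bytes); // copy the mapped bytes into a regular array
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a small temporary file and reads back its second half through a mapping.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("mapped-sketch", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, "cold-data:hot-data".getBytes(StandardCharsets.UTF_8));
            return readRegion(tmp, 10, 8); // skip the 10 bytes of "cold-data:"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```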

Long connection/Servlet 3.0
The long connection here means long polling. Previously, a browser (client) that wanted to follow server-side data changes had to keep requesting the server, so a large number of clients inevitably put heavy pressure on the server; in-site forum messages are a typical example. The Servlet 3.0 specification offers a new feature for this: asynchronous request processing, which can hold a long connection open. Using Servlet 3.0 asynchronous requests in this way can greatly ease the pressure on the server.
The principle is as follows. A request comes in and is suspended with a wait timeout. If a background event fires during the wait, the result is returned to the client. If the timeout expires and no event has occurred, the request is also returned to the client. Either way, the client then issues a new request, and the client-server interaction repeats in this cycle.
It is like telling me: "If someone comes looking for me, let me know right away and I will come and see them." Before, you had to keep asking me whether anyone was looking for you, whether or not anyone actually was, and that wore out both the one asking and the one being asked.
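The wait-for-event-or-timeout contract described above can be modelled without a servlet container. The sketch below is not the Servlet 3.0 API: LongPollSketch is a hypothetical class that parks the caller on a BlockingQueue, whereas a real asynchronous servlet would suspend the request and free the container thread. It only shows the two ways a long-poll request ends.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class LongPollSketch {
    static final String NO_EVENT = "no-event";
    private final BlockingQueue<String> events = new LinkedBlockingQueue<String>();

    // Background side: an event occurs and is handed to whoever is waiting.
    void publish(String event) {
        events.offer(event);
    }

    // Request side: wait until an event arrives or the timeout expires.
    // The client is expected to call again after either outcome.
    String waitForEvent(long timeoutMs) {
        try {
            String event = events.poll(timeoutMs, TimeUnit.MILLISECONDS);
            return event != null ? event : NO_EVENT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return NO_EVENT;
        }
    }
}
```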

Log
Log4j is widely used. During initial testing the log level is generally set to INFO, and after the system really goes live it is generally set to ERROR. At any stage, though, what goes into the log deserves attention: developers generally rely on log output to find problems and to optimize system performance. The log is both the system's running status report and the basis for troubleshooting errors.
Simply put, logs are written to different destinations according to the policies and levels you define, so that they can be analyzed and managed. Without an output strategy, many machines running for a long time will produce a big pile of messy logs, and when something goes wrong you will not know where to start. The log output strategy is therefore the key to using logs well.
References: http://logging.apache.org/log4j/1.2/manual.html
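The effect of the level setting can be shown in a dependency-free way. Note that this sketch uses the JDK's own java.util.logging rather than Log4j, with SEVERE playing the role of Log4j's ERROR; LogLevelSketch, the logger names, and the messages are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class LogLevelSketch {

    // Collects what would have been written, so the effect of the level is visible.
    static class ListHandler extends Handler {
        final List<String> lines = new ArrayList<String>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                lines.add(record.getLevel() + " " + record.getMessage());
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    // Emits one INFO and one SEVERE record and returns the ones that passed the threshold.
    static List<String> run(boolean production) {
        Logger logger = Logger.getLogger("sketch." + (production ? "prod" : "test"));
        logger.setUseParentHandlers(false); // keep the console quiet
        logger.setLevel(production ? Level.SEVERE : Level.INFO);
        ListHandler handler = new ListHandler();
        handler.setLevel(Level.ALL);
        logger.addHandler(handler);
        try {
            logger.info("order 42 accepted");
            logger.severe("payment gateway unreachable");
            return handler.lines;
        } finally {
            logger.removeHandler(handler);
        }
    }
}
```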

Package/Deploy
In the IDE it is a good idea to split different types of functional modules into separate projects, which makes it easier to deploy different jars to different environments. Consider this scenario: the system must periodically fetch from a remote SP about 100 news items a day and the weather forecast for some cities. The daily data volume is small, but front-end access is highly concurrent, so the system architecture clearly needs to separate reads from writes.
If the web application and the scheduled fetch module are packaged together in one project, then scaling out means every machine runs both the web application and the timer. Because the functional modules are not separated, the timer on every machine does the same work and produces duplicate data in the database.
If the web application and the timer are developed as 2 projects, they can be packaged and deployed separately, for example 10 web instances to one timer. The front-end request load is spread out, and data is not written more than once.
Another benefit is sharing. Both the web application and the timer need to read the database, so both projects would otherwise contain database access code, and the code logic becomes messy. If a DAL layer is extracted into its own jar, the developers of the web and timer modules only need to reference that jar and develop their business logic against its interfaces, without worrying about the specific database operations, which are implemented by other developers. The division of labor in development becomes very clear, and the teams do not interfere with each other.
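The shared DAL idea can be sketched as one interface used by both projects. All names here (NewsDao, InMemoryNewsDao, NewsFetchJob, NewsPage) are hypothetical, and the in-memory map merely stands in for real database access inside the DAL jar.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Shared DAL jar: the only place that knows how news is stored.
interface NewsDao {
    void save(String id, String title);
    List<String> latestTitles(int limit);
}

// Stand-in implementation; a real DAL would talk to the database here.
class InMemoryNewsDao implements NewsDao {
    private final Map<String, String> rows = new LinkedHashMap<String, String>();

    public synchronized void save(String id, String title) {
        rows.put(id, title); // keyed by id, so saving the same item twice keeps one row
    }

    public synchronized List<String> latestTitles(int limit) {
        List<String> all = new ArrayList<String>(rows.values());
        return new ArrayList<String>(all.subList(Math.max(0, all.size() - limit), all.size()));
    }
}

// Timer project: deployed once, writes through the DAL interface.
class NewsFetchJob {
    private final NewsDao dao;
    NewsFetchJob(NewsDao dao) { this.dao = dao; }
    void store(String id, String title) { dao.save(id, title); }
}

// Web project: deployed on many machines, reads through the same DAL interface.
class NewsPage {
    private final NewsDao dao;
    NewsPage(NewsDao dao) { this.dao = dao; }
    String render(int limit) { return String.join(", ", dao.latestTitles(limit)); }
}
```

Neither NewsFetchJob nor NewsPage mentions SQL or a storage engine, so the DAL implementation can change without touching the web or timer projects.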

Framework
The popular so-called lightweight SSH (Struts/Spring/Hibernate) stack is not lightweight for many small and medium projects. Developers have to maintain not only code but also cumbersome XML configuration files, and a single incorrectly written configuration file can keep the whole project from running. There are a great many products that need no configuration files and can replace the SSH (Struts/Spring/Hibernate) stack; I have introduced a number of them before (REF).
I am not blindly opposed to using the SSH (Struts/Spring/Hibernate) stack. In my view it does serve to standardize development, but using it does not improve performance by much.
The SSH stack suits companies that have a very large number of projects and a team that keeps growing. They need to choose technologies that are commercially recognized and familiar to many developers, and SSH (Struts/Spring/Hibernate) is mature, so it is the first choice.
For a small team with strong technical experts, however, a more concise framework can truly speed up development. Dropping the SSH stack early in favor of more concise technology is the wiser choice for small-team development.