What you need to focus on when developing high-performance websites with Java

Source: Internet
Author: User
Tags: connection pooling, Java SE

At recent industry technology conferences organized by IT media, many sites shared the inner workings of their technology with their peers. These ranged from giants such as Facebook and Baidu down to sites that had only just launched. The technology and sheer processing power of Facebook, Baidu and other large sites is certainly refreshing. But not every site has billions of users generating traffic and a huge amount of data to store, so technologies such as MapReduce/parallel computing and HBase/column stores are not indispensable everywhere. Technology always exists to support the business. Use what fits your current operating environment; there is no need to chase fashion or to force a connection with whatever technology happens to be popular.

Recent technical conferences have focused mostly on these large sites. In fact, the technical systems of small and medium-sized portals also deserve exploration and attention. Not all of the world's engineers work for large portals. Many more serve little-known small and medium-sized websites that are just getting started, and they make up more than 60% of the engineering population. So while the spotlight is on large portals, the development and practical experience of small and medium-sized sites is also worth sharing.

Whether it is a large portal or a small or medium-sized vertical site, every site pursues stability, performance and scalability. The experience shared by large sites is worth learning from. In concrete practice, however, it does not apply to every site. I won't say much about sites built in other languages, but for the Java stack I can offer a few words:

JVM
Correctly configuring the parameters of the JVM that the JEE container runs in directly affects the performance and processing capacity of the whole system. JVM tuning is mainly about memory management, and the optimization work falls into the following four areas:
1. Heap size: the heap size can also be thought of as the JVM's memory-usage strategy, and it is crucial.
2. Garbage collector: choose among the four garbage collection algorithms (policies) available in Java, and configure their related parameters.
3. Stack size: the stack is the JVM's per-thread instruction memory area. Each thread has its own stack, so the stack size limits how many threads can be created.
4. Debug/log: you can also configure logging output while the JVM is running and after it crashes. This is critical; configure the appropriate parameters according to the log output each JVM provides.
JVM configuration tips are everywhere on the web. However, I recommend reading two official Sun articles, which give a good understanding of the configuration parameters:
1. Java HotSpot VM Options
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
2. Troubleshooting Guide for Java SE 6 with HotSpot VM
http://www.oracle.com/technetwork/java/javase/index-137495.html
I also doubt that every engineer works with these JVM parameters every day. If you forget a key parameter, run java -X (uppercase X) to list the non-standard options.
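After changing these flags, you can check which settings a running JVM actually picked up by reading them back through the standard java.lang.management API. The following is a minimal sketch. The class name JvmSettings and the flag values in the usage line below are illustrative assumptions, not recommendations.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Reads back the memory and GC settings the running JVM actually uses, so you
// can confirm that flags such as -Xmx or a -XX:+Use...GC option took effect.
final class JvmSettings {

    /** Maximum heap the JVM will attempt to use (governed by -Xmx), in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Names of the active garbage collectors (they depend on the GC selected). */
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .toList();
    }

    /** The JVM options this process was launched with, e.g. -Xss512k. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Collectors:     " + collectorNames());
        System.out.println("JVM arguments:  " + jvmArguments());
    }
}
```

For example, run it as `java -Xms512m -Xmx512m -Xss512k -XX:+UseG1GC JvmSettings.java` and compare the printed values with the flags you passed. Recent JDKs can launch a single source file directly.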

JDBC
The MySQL JDBC parameters were also described in an earlier article. Whether on a single machine or in a cluster, using the JDBC configuration parameters sensibly has a significant impact on database operations.
Some so-called high-performance open-source Java ORM frameworks simply enable many JDBC parameters that are off by default:
1. For example: autoReconnect, prepStmtCacheSize, cachePrepStmts, useNewIO, blobSendChunkSize, etc.
2. In a clustered environment, for example: roundRobinLoadBalance, failOverReadOnly, autoReconnectForPools, secondsBeforeRetryMaster.
For details, refer to the official MySQL JDBC (Connector/J) user's manual:
http://dev.mysql.com/doc/refman/5.1/zh/connectors.html#cj-jdbc-reference
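These properties are normally appended to the Connector/J URL as query parameters. The sketch below shows one way to assemble such a URL. The host names, database name and property values are placeholders (assumptions). Check each property against the manual for your driver version before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds a MySQL Connector/J URL carrying some of the properties listed above.
// Hosts, database and values are placeholders, not tuning advice.
final class MySqlUrl {

    static String build(String hosts, String database, Map<String, String> props) {
        StringJoiner query = new StringJoiner("&", "?", "");
        query.setEmptyValue("");                     // no "?" when there are no properties
        props.forEach((k, v) -> query.add(k + "=" + v));
        return "jdbc:mysql://" + hosts + "/" + database + query;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();  // keeps insertion order
        props.put("cachePrepStmts", "true");          // cache prepared statements client-side
        props.put("prepStmtCacheSize", "250");        // statements cached per connection
        props.put("failOverReadOnly", "true");        // read-only after failing over
        props.put("secondsBeforeRetryMaster", "30");  // wait before retrying the primary

        // With several hosts, the first is the primary and the rest are failover hosts.
        String url = build("db-master:3306,db-replica:3306", "shop", props);
        System.out.println(url);
        // With the driver on the classpath you would then call, for example:
        // Connection c = DriverManager.getConnection(url, "user", "password");
    }
}
```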

Database connection pooling (DataSource)
Frequently opening database connections creates bottlenecks and significant overhead. A JDBC connection pool is responsible for allocating, managing and releasing database connections. It lets an application reuse an existing connection instead of establishing a new one, so the application does not have to open and close connections constantly. The pool can also release connections that have been idle longer than the maximum idle time, which avoids connection leaks caused by connections that were never released. This can significantly improve the performance of database operations.
There is one point worth noting here:
Connections obtained from a pool must still be closed. The pool opens its connections to the database in advance, so the application no longer connects to the database directly. Instead, it "borrows" a connection from the pool, and whatever is borrowed must be returned. Think of 20 buckets next to a pool of water: anyone who needs water takes a bucket. If 20 people have taken buckets and not put them back, the next person who needs water can only wait until someone returns one. Everyone must return their bucket, or the people behind them wait and resources jam. Likewise, when an application gets a database connection, the Connection object is allocated from the pool. Once the work is done, the connection must be returned (by calling close()) so that the pool keeps to the "borrow and return" principle.
Reference:
http://dev.mysql.com/doc/refman/5.1/zh/connectors.html#cj-connection-pooling
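To make the bucket analogy concrete, here is a deliberately tiny, hypothetical pool built on ArrayBlockingQueue. SimplePool is not a real library class. Borrowing takes a resource out of the queue, waiting if none are free, and close() puts it back. The sketch only illustrates the borrow/return discipline. A production system should use an established JDBC connection pool, where calling close() on a pooled Connection likewise returns it to the pool.

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical teaching pool: no validation, idle eviction or leak detection.
final class SimplePool<T> {

    private final BlockingQueue<T> idle;   // the "buckets" currently on the shelf

    SimplePool(Collection<T> resources) {
        // capacity = number of resources; throws if the collection is empty
        this.idle = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    /** Borrows a resource, waiting up to the timeout; returns null if none came back in time. */
    Lease borrow(long timeout, TimeUnit unit) throws InterruptedException {
        T resource = idle.poll(timeout, unit);
        return resource == null ? null : new Lease(resource);
    }

    /** Number of resources not currently lent out. */
    int available() {
        return idle.size();
    }

    /** A borrowed resource (meant for one thread); close() hands it back exactly once. */
    final class Lease implements AutoCloseable {
        private final T resource;
        private boolean returned;

        private Lease(T resource) {
            this.resource = resource;
        }

        T get() {
            return resource;
        }

        @Override
        public void close() {
            if (!returned) {              // ignore a second close() of the same lease
                returned = true;
                idle.offer(resource);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SimplePool<String> pool = new SimplePool<>(List.of("conn-1", "conn-2"));
        try (SimplePool<String>.Lease lease = pool.borrow(1, TimeUnit.SECONDS)) {
            System.out.println("using " + lease.get());
        }                                 // returned here, even if the body throws
        System.out.println("available: " + pool.available());
    }
}
```

Lease implements AutoCloseable, so try-with-resources guarantees the return even when the body throws. That is exactly the habit the analogy argues for.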

Data access
Beyond optimizing the database server and data access, it is worth thinking about which kind of data belongs where. Storage in the future is likely to be mixed: cache, NoSQL, DFS and databases may all coexist in one system. At home, tableware and everyday clothes both need storing, but not in the same kind of furniture; nobody puts dishes and clothes in the same cupboard. Likewise, different types of data in a system need storage environments suited to them. Files and images can first be categorized by how often they are accessed or by file size. Strongly relational data that needs transactions suits a traditional database. Weakly relational data that does not need transactions can go to NoSQL. Massive file storage can use a DFS that supports networked storage. Whether to cache depends on the size of individual data items and on the read/write ratio.
Read/write separation also deserves attention. Whether in a database or a NoSQL environment, reads usually far outnumber writes. The design should therefore spread reads across multiple machines while also keeping data consistent between them. With MySQL, you can use a one-master, many-replica setup. Add MySQL Proxy, or use JDBC parameters (roundRobinLoadBalance, failOverReadOnly, autoReconnectForPools, secondsBeforeRetryMaster) in application development. Either way, you can separate reads from writes and spread heavy read load over several machines while still keeping the data consistent.
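If the splitting is done in application code rather than by MySQL Proxy or driver parameters, the core routing decision can be sketched as below. The class and host names are hypothetical. Classifying statements by a leading SELECT is a simplification; for instance, it would misroute SELECT ... FOR UPDATE. Replication lag also means that reads which must see a just-committed write should go to the master.

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Application-side read/write splitting: writes go to the master,
// reads are spread round-robin over the replicas. Names are hypothetical.
final class ReadWriteRouter {
    private final String master;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String master, List<String> replicas) {
        this.master = master;
        this.replicas = List.copyOf(replicas);
    }

    /** Picks a target host for one SQL statement. */
    String route(String sql) {
        boolean isRead = sql.stripLeading().toLowerCase(Locale.ROOT).startsWith("select");
        if (!isRead || replicas.isEmpty()) {
            return master;                     // writes (and reads with no replicas)
        }
        // floorMod keeps the index valid even after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }

    public static void main(String[] args) {
        ReadWriteRouter router = new ReadWriteRouter("db-master", List.of("db-r1", "db-r2"));
        System.out.println(router.route("UPDATE news SET title = ? WHERE id = ?")); // db-master
        System.out.println(router.route("SELECT * FROM news"));                   // db-r1
        System.out.println(router.route("select * from weather"));                // db-r2
    }
}
```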

Cache
At a high level, caches fall into two types: local caches and distributed caches.
1. Local cache: in Java, this means putting data into a static collection and taking it out of that collection when needed. For highly concurrent environments, ConcurrentHashMap or CopyOnWriteArrayList is recommended as the local cache. Caching consumes system memory, so the amount of memory it uses must stay within an appropriate proportion. Exceeding that proportion is counterproductive and makes the whole system run inefficiently.
2. Distributed cache: generally used in distributed environments, where the cache for each machine is stored centrally. It serves not only as a cache but also as a means of data synchronization and transfer within a distributed system. The most widely used are memcached and Redis.
Data stored on different media is read and written at very different speeds. When deciding how to use caching so that your data sits closer to the CPU, keep in mind the well-known latency chart by Google's Jeff Dean (REF).
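Here is a minimal sketch of the local cache from point 1, assuming ConcurrentHashMap and a hypothetical loader function. The simple entry cap stands in for "an appropriate proportion of memory". It is not a real eviction policy, and the size check is only approximate under heavy concurrency.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal in-process cache. Once maxEntries is reached, values are still
// loaded but no longer stored. Real caches add eviction and expiry.
final class LocalCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final int maxEntries;
    private final Function<K, V> loader;

    LocalCache(int maxEntries, Function<K, V> loader) {
        this.maxEntries = maxEntries;
        this.loader = loader;
    }

    V get(K key) {
        V cached = map.get(key);
        if (cached != null) {
            return cached;                     // served from memory
        }
        if (map.size() >= maxEntries) {
            return loader.apply(key);          // cache full: load without storing
        }
        // computeIfAbsent applies the loader at most once per key, even under contention
        return map.computeIfAbsent(key, loader);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        LocalCache<Integer, String> cityWeather =
                new LocalCache<>(1_000, id -> "forecast-for-" + id); // hypothetical loader
        System.out.println(cityWeather.get(42)); // loaded
        System.out.println(cityWeather.get(42)); // served from the cache
    }
}
```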

Concurrency/Multithreading
In highly concurrent environments, developers are advised to use the concurrency package that ships with the JDK (java.util.concurrent). The utility classes in java.util.concurrent, available since JDK 1.5, simplify multithreaded development. They fall into the following main parts:
1. Thread pools: the thread pool interfaces (Executor, ExecutorService) and implementation classes (ThreadPoolExecutor, ScheduledThreadPoolExecutor). The JDK's thread pool framework manages task queuing and scheduling and allows controlled shutdown. Running a thread consumes CPU, and creating and destroying threads also costs CPU. A thread pool therefore both manages multithreading effectively and makes threading more efficient by reusing threads.
2. Queues: the ConcurrentLinkedQueue class provides an efficient, scalable, thread-safe, non-blocking FIFO queue. Five implementations in java.util.concurrent support the extended BlockingQueue interface, which defines blocking versions of put and take: LinkedBlockingQueue, ArrayBlockingQueue, SynchronousQueue, PriorityBlockingQueue and DelayQueue. These classes cover the most common usage contexts for producer-consumer, messaging, parallel tasking and related concurrent designs.
3. Synchronizers: four classes aid common special-purpose synchronization idioms. Semaphore is a classic concurrency tool. CountDownLatch is a very simple yet very common utility for blocking until a given number of signals, events or conditions hold. CyclicBarrier is a resettable multiway synchronization point, useful in some styles of parallel programming. Exchanger allows two threads to exchange objects at a rendezvous point and is useful in several pipeline designs.
4. Concurrent collections: this package also supplies Collection implementations designed for multithreaded contexts: ConcurrentHashMap, ConcurrentSkipListMap, ConcurrentSkipListSet, CopyOnWriteArrayList and CopyOnWriteArraySet. When many threads are expected to access a given collection, ConcurrentHashMap is normally preferable to a synchronized HashMap, and ConcurrentSkipListMap is normally preferable to a synchronized TreeMap. CopyOnWriteArrayList is preferable to a synchronized ArrayList when the expected number of reads and traversals greatly outnumbers the number of updates to the list.
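The sketch below combines two of these parts. It runs independent tasks on a fixed thread pool (Executors.newFixedThreadPool returns a ThreadPoolExecutor) and uses a CountDownLatch to wait for all of them before shutting the pool down. Summing squares is only a placeholder for real work, and the class name is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// A fixed thread pool runs the tasks; a CountDownLatch lets the caller
// wait until every task has finished.
final class ParallelSum {

    static long sumOfSquares(int n, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(n);
        AtomicLong total = new AtomicLong();
        try {
            for (int i = 1; i <= n; i++) {
                final long x = i;
                pool.execute(() -> {
                    try {
                        total.addAndGet(x * x);   // thread-safe accumulation
                    } finally {
                        done.countDown();         // always signal, even if the task fails
                    }
                });
            }
            done.await();                          // block until all n tasks have run
            return total.get();
        } finally {
            pool.shutdown();                       // controlled shutdown: accept no new tasks
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sumOfSquares(100, 4));  // 338350
    }
}
```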

Queue
Queues can be divided into two kinds: local queues and distributed queues.
Local queue: commonly used for bulk writes of data that need not be written immediately. Incoming data can be buffered in an array and written in one batch once a certain amount has accumulated. This can be implemented with a BlockingQueue or a List/Map.
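The local-queue batching idea can be sketched with a LinkedBlockingQueue and its drainTo(collection, maxElements) method. BatchBuffer and the writer callback are hypothetical names. In a real system, the callback would perform one batched database write. A scheduled task would also call flush() so that leftover items are not held indefinitely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Producers enqueue records cheaply; each flush hands up to batchSize of them
// to the writer at once. The writer stands in for a batched database insert.
final class BatchBuffer<T> {
    private final LinkedBlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final int batchSize;
    private final Consumer<List<T>> writer;

    BatchBuffer(int batchSize, Consumer<List<T>> writer) {
        this.batchSize = batchSize;
        this.writer = writer;
    }

    /** Called by request threads; does not block because the queue is unbounded. */
    void add(T item) {
        queue.add(item);
        if (queue.size() >= batchSize) {
            flush();
        }
    }

    /** Drains up to batchSize items and hands them to the writer; returns how many. */
    int flush() {
        List<T> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize);
        if (!batch.isEmpty()) {
            writer.accept(batch);
        }
        return batch.size();
    }

    public static void main(String[] args) {
        BatchBuffer<String> buffer = new BatchBuffer<>(100,
                batch -> System.out.println("writing " + batch.size() + " rows"));
        for (int i = 0; i < 250; i++) {
            buffer.add("row-" + i);
        }
        buffer.flush();   // e.g. called by a timer so the last partial batch is written
    }
}
```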
Reference: the Sun Java API documentation.
Distributed queue: generally used as message middleware, acting as a bridge for communication between subsystems in a distributed environment. In JEE environments, the most widely used are Apache ActiveMQ and Sun's OpenMQ.
I have previously introduced lightweight MQ middleware such as Kestrel and Redis (ref. http://www.javabloger.com/article/mq-kestrel-redis-for-java.html). I also recently heard that LinkedIn's search team has released an MQ product, Kafka (ref. http://sna-projects.com/kafka), which is worth keeping an eye on.
Related information:
1. ActiveMQ http://activemq.apache.org/getting-started.html
2. OpenMQ http://mq.java.net/about.html
3. Kafka http://sna-projects.com/kafka
4. JMS articles http://www.javabloger.com/article/category/jms

NIO
NIO appeared in JDK 1.4. Before Java 1.4, the JDK offered only stream-oriented I/O, which processes data one byte at a time when reading or writing a file: an input stream produces one byte of data, and an output stream consumes one byte of data. Stream-oriented I/O is very slow. Java NIO's non-blocking technique is essentially the Reactor pattern. You are notified automatically when there is something to process, instead of blocking or spinning in a busy loop, which greatly improves system performance. In practice, NIO is used in two areas: file reads and writes, and network data streams. The core NIO objects to master are selectors (Selector), channels (Channel) and buffers (Buffer).
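The three core objects can be seen working together in a few lines. This sketch uses a Pipe instead of a network socket only so that it runs without a server. With sockets, the Selector/Channel/Buffer steps are the same, just repeated inside a loop. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

// Channel (the pipe's source end) registered non-blocking with a Selector,
// which reports readiness; the data is then read into a ByteBuffer.
final class SelectorDemo {

    static String receiveOnce(String message) throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {

            source.configureBlocking(false);               // required before register()
            source.register(selector, SelectionKey.OP_READ);

            sink.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));

            selector.select();                             // blocks until a channel is ready
            ByteBuffer buffer = ByteBuffer.allocate(256);
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isReadable()) {
                    ((Pipe.SourceChannel) key.channel()).read(buffer);
                }
            }
            selector.selectedKeys().clear();               // a Reactor loop must clear handled keys
            buffer.flip();                                 // switch the buffer from filling to draining
            return StandardCharsets.UTF_8.decode(buffer).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(receiveOnce("hello, nio"));
    }
}
```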
My two cents:
1. Within Java NIO, memory-mapped files are an efficient way to separate cold and hot data held in a cache and to handle the cold portion. This approach is much faster than ordinary stream-based or channel-based I/O. It makes the file's data appear as the contents of an in-memory array, and only the portions of the file actually read or written are brought into memory, not the entire file.
2. MySQL's JDBC driver can also use NIO to communicate with the database (the useNewIO parameter mentioned above), which can improve system performance.
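For point 1, a memory-mapped read can be sketched with FileChannel.map. The file contents and offsets below are invented for the example. The point is that only the requested region is mapped and then accessed like an array.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Reads one slice of a file through a memory mapping. Only the region
// [offset, offset + length) is mapped (it must lie within the file); the OS
// pages it in on demand, so the whole file is never loaded into the heap.
final class MappedRead {

    static String readRegion(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer region = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] bytes = new byte[length];
            region.get(bytes);          // reads come straight from the mapped pages
            return new String(bytes, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("cold-data", ".dat");  // stand-in for a large cold-data file
        file.toFile().deleteOnExit();
        Files.writeString(file, "header|cold-record-0001|trailer", StandardCharsets.US_ASCII);
        System.out.println(readRegion(file, 7, 16));            // prints cold-record-0001
    }
}
```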

Long Connections/Servlet 3.0
The "long connection" here means long polling. Previously, a browser (client) that needed to follow data changes on the server had to query the server over and over. With many clients, this inevitably put heavy pressure on the server; site-wide message notifications in a forum are one example. The Servlet 3.0 specification adds a new feature, asynchronous request processing, which lets the server hold a connection open. Using Servlet 3.0 asynchronous requests can greatly relieve the pressure on the server.
The principle of the Servlet 3.0 approach is to suspend the request and set a wait timeout. If a server-side event fires, the result is returned to the client as the response to that request. If no event occurs before the timeout expires, the request returns to the client anyway. The client then issues the request again, and this interaction with the server repeats.
It is like this: you tell me once, "If anyone comes looking for me, let me know right away and I'll come and see them." Otherwise you would have to keep asking me, over and over, whether anyone was looking for you, and both the one asking and the one being asked would be worn out.
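The suspend/complete/timeout cycle can be simulated with the JDK's CompletableFuture, which keeps the example runnable without a servlet container. This is an analogy, not the Servlet API. In a real Servlet 3.0 application, the parked object would be the AsyncContext returned by request.startAsync(), with AsyncContext.setTimeout(...) and complete() playing the roles shown here. LongPollHub is a hypothetical name.

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;

// Each poll() parks a pending reply instead of holding a thread; publish()
// completes every parked reply, and unanswered ones complete with a timeout
// marker so the client can simply poll again.
final class LongPollHub {
    static final String TIMEOUT = "<timeout>";

    private final Queue<CompletableFuture<String>> waiting = new ConcurrentLinkedQueue<>();

    /** A client asks "anything for me?"; the answer arrives on an event or on timeout. */
    CompletableFuture<String> poll(long timeout, TimeUnit unit) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        waiting.add(reply);
        reply.whenComplete((value, error) -> waiting.remove(reply)); // forget it once answered
        return reply.completeOnTimeout(TIMEOUT, timeout, unit);
    }

    /** An event happened (e.g. a new forum message): answer everyone waiting. */
    void publish(String event) {
        CompletableFuture<String> reply;
        while ((reply = waiting.poll()) != null) {
            reply.complete(event);
        }
    }

    public static void main(String[] args) {
        LongPollHub hub = new LongPollHub();
        CompletableFuture<String> client = hub.poll(30, TimeUnit.SECONDS);
        hub.publish("you have a new message");
        System.out.println(client.join());
    }
}
```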

Log
Log4j is the tool most people use. Right after a system goes live, the log level is usually set to INFO. Once the system is running stably, the level is usually raised to ERROR. Whatever the stage, though, log output deserves attention: developers rely on it to find problems and to optimize system performance. Logs report the system's operating status and provide the basis for troubleshooting.
Put simply, logs are sent to different destinations according to the policies and levels you define, so that they are easy to analyze and manage. Without an output policy, many machines running for a long time produce a big pile of messy logs that leave you at a loss when an error occurs. The output policy is therefore the key to using logs well.
Reference: http://logging.apache.org/log4j/1.2/manual.html
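The "different levels to different destinations" policy can be sketched with java.util.logging from the JDK, swapped in here for Log4j so the example needs no external library. In Log4j 1.2 the same idea is expressed as several appenders with different Threshold settings. The logger name and the in-memory error sink are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Routes one logger's output by level: INFO and above to the console,
// only SEVERE records to a separate "error" sink. Uses java.util.logging.
final class LevelRouting {

    /** Collects records in memory; a stand-in for an error-log file. */
    static final class ListHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override public synchronized void publish(LogRecord r) {
            if (isLoggable(r)) {                 // honours this handler's level
                messages.add(r.getLevel() + " " + r.getMessage());
            }
        }
        @Override public void flush() { }
        @Override public void close() { }
    }

    static Logger configure(String name, Handler errorSink) {
        Logger logger = Logger.getLogger(name);
        logger.setUseParentHandlers(false);     // don't also print via the root logger
        logger.setLevel(Level.INFO);            // e.g. INFO after launch, SEVERE once stable

        Handler console = new ConsoleHandler();
        console.setLevel(Level.INFO);
        errorSink.setLevel(Level.SEVERE);       // only errors reach the error sink

        logger.addHandler(console);
        logger.addHandler(errorSink);
        return logger;
    }

    public static void main(String[] args) {
        ListHandler errors = new ListHandler();
        Logger log = configure("demo.orders", errors);
        log.info("order 17 accepted");
        log.severe("payment gateway timed out");
        System.out.println("error sink: " + errors.messages);
    }
}
```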

Package/Deploy
When designing the code, it is best to split functional modules of different types, at a coarse grain, into separate projects in the IDE. That makes it easy to deploy different JAR packages in different environments. Consider this scenario: every day, a scheduled job fetches 100 news items and the weather forecast for some cities from a remote SP (service provider). The daily data volume is small, but front-end concurrent access is very high, so the system architecture clearly needs read/write separation.
Suppose the web functionality and the scheduled fetching are packaged in one project. Then every machine has to run both the web application and the timer. Because the modules are not separated, the timer runs on every machine, and the data in the database gets duplicated.
If the web application and the timer are developed as two projects, they can be packaged and deployed separately. For example, ten web servers can share one timer machine, which spreads the front-end request load without writing data twice. Another benefit is code sharing. Both the web and timer applications need to read the database. If each project carries its own database code, the logic becomes confusing. Instead, extract a DAL (data access layer) JAR. Developers of the web and timer modules then only need to reference that JAR and write their business logic against its interfaces, without dealing with the specific database operations, which other developers handle. Development tasks are divided very clearly, and the teams do not interfere with each other.
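The DAL split can be sketched as one interface shared by two modules. Every name here (News, NewsDao, InMemoryNewsDao, NewsFetchJob, NewsPage) is hypothetical. In a real build, the interface would sit in its own JAR, and the web and timer projects would each depend on it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Both modules depend only on the NewsDao interface; the implementation is
// chosen at deploy time. All names are hypothetical.
final class NewsModulesDemo {
    public static void main(String[] args) {
        NewsDao dao = new InMemoryNewsDao();      // one shared implementation
        new NewsFetchJob(dao).runOnce(List.of(new News("Rain expected"), new News("Metro opens")));
        System.out.println(new NewsPage(dao).headlines(10));
    }
}

record News(String title) { }

/** Would live in the shared DAL jar. */
interface NewsDao {
    void saveAll(List<News> items);
    List<News> latest(int limit);
}

/** Test double; the production jar would hold a JDBC-backed implementation. */
final class InMemoryNewsDao implements NewsDao {
    private final List<News> store = Collections.synchronizedList(new ArrayList<>());

    @Override public void saveAll(List<News> items) {
        store.addAll(items);
    }

    @Override public List<News> latest(int limit) {
        synchronized (store) {                    // guard the compound read
            int from = Math.max(0, store.size() - limit);
            return List.copyOf(store.subList(from, store.size()));
        }
    }
}

/** Timer project: deployed on one machine, the only writer. */
final class NewsFetchJob {
    private final NewsDao dao;
    NewsFetchJob(NewsDao dao) { this.dao = dao; }
    void runOnce(List<News> fetched) { dao.saveAll(fetched); } // items fetched from the provider
}

/** Web project: deployed on many machines, read-only. */
final class NewsPage {
    private final NewsDao dao;
    NewsPage(NewsDao dao) { this.dao = dao; }
    List<String> headlines(int n) {
        return dao.latest(n).stream().map(News::title).toList();
    }
}
```

The timer project is the only writer and is deployed once, so the duplicate-write problem of the single-project layout disappears. Meanwhile, the web project can be scaled out freely.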

Framework
The popular SSH (Struts/Spring/Hibernate) "lightweight" frameworks are not lightweight at all for many small and medium-sized projects. Developers must maintain not only the code but also cumbersome XML configuration files, and a single misconfigured file can stop the whole project from working. Plenty of products can replace the SSH (Struts/Spring/Hibernate) frameworks without configuration files, and I have introduced some of them before (REF).
I am not blindly against the SSH (Struts/Spring/Hibernate) frameworks. In my view, their real value is enforcing standardized development; not using them will not by itself gain you much performance.
The SSH frameworks suit large teams in companies that keep growing. Such teams need to choose technologies that are widely recognized in the market and familiar to many developers. The SSH (Struts/Spring/Hibernate) frameworks are mature, so they are the first choice there.
But small teams with strong technical skills can choose more concise frameworks that genuinely speed up development. For such teams, passing on the SSH frameworks early in favor of more concise technology is the wiser choice.
