Java Web Development: High-Concurrency Processing

Source: Internet
Author: User
Tags database load balancing memcached mysql host jboss

How Java web sites handle database design, caching, and architecture under high concurrency and high load (Java tutorials, processing large amounts of data in Java, high-load data in Java)

One: The database is the first focus of a high-concurrency, high-load web site

Yes, the database comes first: it is the first single point of failure (SPOF) that most applications face. For Web 2.0 applications in particular, database response time is the first problem to solve.
MySQL is the most common choice. A site may start with a single MySQL host, but once the data grows beyond 1 million rows, MySQL performance drops sharply. The usual optimization is master-slave (M-S) replication, with queries and updates sent to separate servers. I recommend an M-M-slaves layout: 2 master MySQL servers and multiple slaves. Note that although there are 2 masters, only 1 is active at any time, and you can switch between them when needed. The reason for using 2 masters is to make sure the master does not become the system's SPOF.
The slaves can be load balanced further, and can be combined with LVS to spread SELECT operations evenly across the slaves.
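
The read/write split described above can be sketched in application code as follows. This is only a minimal illustration under assumptions: the class name `ReadWriteRouter`, the host names, and the rule "statements starting with SELECT are reads" are all hypothetical, and a real deployment would more likely do this in LVS or a database proxy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

/** Sends writes to the single active master and spreads reads over the slaves (round-robin). */
final class ReadWriteRouter {
    private final String activeMaster;          // hypothetical host name of the active master
    private final List<String> slaves;          // hypothetical host names of the slaves
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String activeMaster, List<String> slaves) {
        this.activeMaster = activeMaster;
        this.slaves = new ArrayList<>(slaves);
    }

    /** Returns the host that should run the statement. */
    String hostFor(String sql) {
        // Assumption: anything that starts with SELECT is a read.
        boolean isRead = sql.trim().toLowerCase(Locale.ROOT).startsWith("select");
        if (!isRead || slaves.isEmpty()) {
            return activeMaster;
        }
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), slaves.size());
        return slaves.get(index);
    }
}
```

Because replication lags, a read that must see a write made a moment ago may still need to go to the master; this sketch ignores that case.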
The architecture above can cope with a certain amount of load, but as the user base keeps growing and your user table exceeds 10 million rows, the master becomes the SPOF again. You cannot keep adding slaves indefinitely, because the cost of replication rises steeply. What then? My approach is table partitioning, done at the business level. The simplest example is user data: split it by some rule, such as by ID, across different database clusters.

A global database is used for metadata queries. The drawback is that every query costs one extra lookup. For example, to look up the user nightsailer, you first ask the global database group for the cluster ID that nightsailer belongs to, and then go to that cluster to fetch nightsailer's actual data.
Each cluster can use the M-M layout or the M-M-slaves layout. This structure is extensible: as the load increases, you simply add a new MySQL cluster.
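
The two-step lookup can be sketched like this. It is a minimal illustration that keeps the global metadata in memory; `ShardDirectory`, its methods, and the JDBC URLs are hypothetical stand-ins for the global database and the clusters.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for the global metadata database. */
final class ShardDirectory {
    // user name -> cluster ID (the metadata kept in the global database)
    private final Map<String, Integer> userToCluster = new ConcurrentHashMap<>();
    // cluster ID -> connection URL of that cluster (URLs are hypothetical)
    private final Map<Integer, String> clusterUrls = new ConcurrentHashMap<>();

    /** Registering a new cluster is all it takes to add capacity. */
    void addCluster(int clusterId, String jdbcUrl) {
        clusterUrls.put(clusterId, jdbcUrl);
    }

    void assignUser(String userName, int clusterId) {
        if (!clusterUrls.containsKey(clusterId)) {
            throw new IllegalArgumentException("unknown cluster " + clusterId);
        }
        userToCluster.put(userName, clusterId);
    }

    /** Step 1: find the user's cluster ID. Step 2: return the cluster to query for the real data. */
    Optional<String> clusterUrlFor(String userName) {
        Integer clusterId = userToCluster.get(userName);
        if (clusterId == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(clusterUrls.get(clusterId));
    }
}
```

Since every request pays for the first step, the directory lookup is a natural candidate for caching.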

Points to note:
1. Disable all auto_increment fields.
2. IDs must be allocated centrally, using a common algorithm.
3. You need a good way to monitor the load of the MySQL hosts and the running state of the services. If you have more than 30 MySQL databases running, you know what I mean.
4. Do not use persistent connections (do not use pconnect). Instead, use a third-party database connection pool such as SQL Relay, or simply write your own, because the MySQL connection pool in php4 often causes problems.
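
One way to allocate IDs centrally without auto_increment is to hand out blocks of IDs from a single allocator. The article does not say which algorithm it uses, so the sketch below is an assumption, and the names `CentralIdAllocator` and `NodeIdGenerator` are hypothetical. In production the central counter would live in one shared place (for example a dedicated table), not in JVM memory.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Central allocator that reserves contiguous blocks of IDs. */
final class CentralIdAllocator {
    private final AtomicLong nextBlockStart = new AtomicLong(1);
    private final int blockSize;

    CentralIdAllocator(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    /** Reserves the range [start, start + blockSize) and returns start. */
    long reserveBlock() {
        return nextBlockStart.getAndAdd(blockSize);
    }

    int blockSize() {
        return blockSize;
    }
}

/** Per-node generator: consumes its reserved block locally, then asks for another. */
final class NodeIdGenerator {
    private final CentralIdAllocator allocator;
    private long next; // next ID to hand out
    private long end;  // exclusive end of the current block; 0 means no block yet

    NodeIdGenerator(CentralIdAllocator allocator) {
        this.allocator = allocator;
    }

    synchronized long nextId() {
        if (next >= end) {
            next = allocator.reserveBlock();
            end = next + allocator.blockSize();
        }
        return next++;
    }
}
```

IDs stay unique across all clusters, but they are not strictly ordered in time, and a node that restarts leaves a gap for the unused part of its block.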

Two: Static HTML in the system architecture of a high-concurrency, high-load web site

As we all know, the most efficient and least expensive page is a purely static HTML page, so we try to serve as many pages of our site as possible as static pages. The simplest method is actually the most effective one. But for sites with a lot of content and frequent updates, we cannot do this by hand, which is why content management systems (CMS) exist. The news channels of the portals we visit every day, and even their other channels, are managed and published through a CMS. A CMS can take simple input and automatically generate static pages, and it also offers channel management, permission management, automatic content capture, and other functions. For a large web site, an efficient, manageable CMS is essential.

Beyond portals and content publishing sites, community sites with heavy interaction should also go static wherever possible to improve performance. Rendering posts and articles to static pages in real time, and re-rendering them when they are updated, is a widely used strategy. The Mop hodgepodge forum uses it, and so does the NetEase community.

Static HTML is also a form of caching. For content that is queried from the database very frequently but updated rarely, consider making it static. Forum-wide settings are an example: mainstream forums let administrators manage them in the back end and store them in the database, the front-end code reads them constantly, yet they change very rarely. You can regenerate this content as static files whenever it is updated in the back end, which avoids a large number of concurrent database requests.


A static HTML solution for web sites
When a request for a servlet resource arrives at the web server, we normally populate the specified JSP page to answer it:

HTTP request ---> web server ---> servlet ---> business logic ---> data access ---> populate JSP ---> response

After the page has been made static:

HTTP request ---> web server ---> servlet ---> HTML ---> response
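
The shortened request path can be sketched without the servlet API as a small store that serves the generated HTML file if it exists and renders it only once otherwise. This is a minimal sketch under assumptions: `StaticPageStore` is a hypothetical name, the `renderer` function stands in for the servlet plus JSP pipeline, and page IDs are assumed to be simple names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Function;

/** Serves pre-generated HTML files and falls back to dynamic rendering on a miss. */
final class StaticPageStore {
    private final Path root;
    private final Function<String, String> renderer; // stand-in for business logic + JSP

    StaticPageStore(Path root, Function<String, String> renderer) {
        this.root = root;
        this.renderer = renderer;
    }

    private Path fileFor(String pageId) {
        // Reject anything that could escape the root directory.
        if (!pageId.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("unsafe page id: " + pageId);
        }
        return root.resolve(pageId + ".html");
    }

    /** Fast path: return the static file. Slow path: render once and store the result. */
    String serve(String pageId) {
        Path file = fileFor(pageId);
        try {
            if (Files.exists(file)) {
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return regenerate(pageId);
    }

    /** Call this whenever the underlying content is updated. */
    String regenerate(String pageId) {
        Path target = fileFor(pageId);
        String html = renderer.apply(pageId);
        try {
            Files.createDirectories(root);
            // Write to a temporary file first so readers never see a half-written page.
            Path tmp = Files.createTempFile(root, "page", ".tmp");
            Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return html;
    }
}
```

Once the files exist, the web server can serve them directly, so the servlet is only involved when a page is missing or has to be regenerated.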

Three: Caching, load balancing, and storage in a high-concurrency, high-load web site

Caching is another big problem. I generally use memcached for a cache cluster, typically with around 10 machines (a 10 GB memory pool). Be aware that the cache must never spill into swap; it is best to turn off Linux swap altogether.
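
A common way to put such a cache in front of the database is the cache-aside pattern: read from the cache, fall back to the database on a miss, and store the result. The sketch below shows the pattern only. `SimpleCache`, `InMemoryCache`, and `CachedUserRepository` are hypothetical names, and the in-memory map stands in for a real memcached client, whose API is not shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Minimal cache abstraction; a memcached client would sit behind it in production. */
interface SimpleCache {
    String get(String key);
    void set(String key, String value);
    void delete(String key);
}

/** In-memory stand-in used only to keep the example self-contained. */
final class InMemoryCache implements SimpleCache {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    public String get(String key) { return data.get(key); }
    public void set(String key, String value) { data.put(key, value); }
    public void delete(String key) { data.remove(key); }
}

/** Cache-aside lookup of user profiles. */
final class CachedUserRepository {
    private final SimpleCache cache;
    private final Function<String, String> database; // stand-in for the real SELECT

    CachedUserRepository(SimpleCache cache, Function<String, String> database) {
        this.cache = cache;
        this.database = database;
    }

    String findProfile(String userId) {
        String key = "user:" + userId;
        String cached = cache.get(key);
        if (cached != null) {
            return cached;                 // hit: no database access
        }
        String fresh = database.apply(userId);
        if (fresh != null) {
            cache.set(key, fresh);         // miss: store for the next reader
        }
        return fresh;
    }

    /** Call after an update so the next read reloads from the database. */
    void invalidate(String userId) {
        cache.delete("user:" + userId);
    }
}
```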


Load balancing/acceleration

When caching comes up, some people first think of making pages static, the so-called static HTML. I consider that common sense rather than the key point. What follows page staticization is load balancing and acceleration for the static service. I think lighttpd + Squid is the best way to do it.
LVS <-------> lighttpd ====> squid(s) ==== lighttpd

This is the setup I use most often. Note that I do not use Apache: unless there is a specific need, I do not deploy it, because I generally use php-fastcgi with lighttpd, which performs much better than apache+mod_php.

Squid can also be used to solve problems such as file synchronization, but you have to monitor the cache hit rate and push it above 90% as far as possible.
There is a lot more to say about Squid and lighttpd, which I will not go into here.
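
The hit-rate arithmetic is simple: hits divided by total lookups. The sketch below shows it as a small counter with an alert threshold; `HitRateMonitor` is a hypothetical name, and in practice the numbers would come from Squid's own statistics rather than from application code.

```java
import java.util.concurrent.atomic.LongAdder;

/** Counts cache hits and misses and reports the hit rate. */
final class HitRateMonitor {
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void recordHit() { hits.increment(); }

    void recordMiss() { misses.increment(); }

    /** Returns hits / (hits + misses), or 0.0 before any lookup has been recorded. */
    double hitRate() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 0.0 : (double) h / total;
    }

    /** True when the hit rate is below the target, for example 0.90. */
    boolean belowTarget(double target) {
        return hitRate() < target;
    }
}
```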


Storage
Storage is also a big problem. One kind is small-file storage, such as images. The other is large-file storage, such as search engine indexes, where a single file is typically larger than 2 GB.
The simplest way to store small files is to distribute them with lighttpd. Alternatively, simply use Red Hat's GFS: the advantage is that it is transparent to the application, the disadvantage is the high cost, and you also have to deal with buying a disk array. In my project the storage capacity is 2-10 TB, and I used distributed storage. There you have to handle file replication and redundancy yourself, which allows a different level of redundancy for each file. Google's GFS paper is a useful reference.
For large-file storage, refer to the approach used by Nutch, which has since been split out as the Hadoop subproject. (You can Google it.)

Four: Separating the image server in the system architecture of a high-concurrency, high-load web site

Using Apache to separate out the image server

In its initial phase, the application may be deployed on a single server (for cost reasons).

What should be separated out second? Everyone has their own considerations. My project team focused on saving bandwidth: however good the servers and however high the bandwidth, once concurrency arrives it is easy to be overwhelmed.

Five: Database clusters and database/table hashing in the system architecture of a high-concurrency, high-load web site

Large web sites have complex applications that must use databases. Under heavy traffic the database bottleneck shows up quickly, and a single database soon cannot keep up with the application, so we need a database cluster or database/table hashing.

For database clusters, many databases have their own solutions. Oracle, Sybase, and others offer good ones, and the master/slave replication provided by the commonly used MySQL is a similar approach. Whichever database you use, implement the solution that goes with it.

The database clusters mentioned above are constrained by the database type in terms of architecture, cost, and extensibility, so we also need to improve the system architecture from the application side, and database/table hashing is the most common and most effective way to do so. We split the database by business, application, or functional module, so that different modules map to different databases or tables. Then, for a given page or function, we hash the data into smaller databases or tables according to some policy; the user table, for example, can be hashed by user ID. This improves system performance at low cost and scales well. The Sohu forum uses this kind of architecture: the databases for forum users, settings, posts, and other information are separated, and then posts and users are hashed into databases and tables by board and by ID. In the end, a simple change in the configuration file lets the system add a low-cost database at any time to boost performance.
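
Hashing the user table by user ID can be sketched as a function from ID to database and table name. The naming scheme (`user_db_N.user_M`), the modulo rule, and the class name `UserShardResolver` are assumptions for illustration; the article does not describe the exact policy.

```java
/** Maps a user ID to a database and table using a simple modulo hash. */
final class UserShardResolver {
    private final int databaseCount;
    private final int tablesPerDatabase;

    UserShardResolver(int databaseCount, int tablesPerDatabase) {
        if (databaseCount <= 0 || tablesPerDatabase <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        this.databaseCount = databaseCount;
        this.tablesPerDatabase = tablesPerDatabase;
    }

    /** Returns a name such as "user_db_1.user_3" (hypothetical naming scheme). */
    String locate(long userId) {
        long slots = (long) databaseCount * tablesPerDatabase;
        // floorMod keeps the slot non-negative even for negative IDs.
        int slot = (int) Math.floorMod(userId, slots);
        int database = slot / tablesPerDatabase;
        int table = slot % tablesPerDatabase;
        return "user_db_" + database + ".user_" + table;
    }
}
```

With 2 databases of 4 tables each, user 13 falls into slot 5, which is `user_db_1.user_1`. A plain modulo remaps most users when the counts change, so adding databases later requires either migrating data or a lookup table like the global directory described in section One.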


Classification of cluster software:
Cluster software generally falls into three categories, according to its focus and the problem it solves: high-performance clusters (HPC), load-balancing clusters (LBC), and high-availability clusters (HAC).
A high-performance cluster (HPC) uses multiple machines in the cluster to complete a single task, finishing it much faster and more reliably than one machine could and making up for the limits of single-machine performance. Such clusters are widely used where data volumes are large and the computation is complex, such as weather forecasting and environmental monitoring.
A load-balancing cluster (LBC) uses multiple machines in the cluster to complete many small tasks in parallel. In general, the more people use an application, the longer the response time of each request and the more machine performance suffers. With a load-balancing cluster, any machine in the cluster can answer a user's request: when a request arrives, the cluster picks the machine that has the lowest load and can provide the best service, and lets it accept and answer the request. This improves the availability and stability of the system. This kind of cluster is used heavily by web sites.
A high-availability cluster (HAC) relies on redundancy within the cluster: when one machine fails, a backup machine quickly takes over its service until the failed machine is repaired and returns, which maximizes the availability of the cluster's services. Such systems are widely used in banking, telecommunications, and other fields with high reliability requirements.
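
The "pick the machine with the lowest load" rule of a load-balancing cluster can be sketched as follows. This is an illustration only: `Backend` and `LeastLoadBalancer` are hypothetical names, and load is approximated by the number of requests currently in flight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** One machine in the cluster, with a counter of requests it is currently handling. */
final class Backend {
    final String name;
    final AtomicInteger activeRequests = new AtomicInteger();

    Backend(String name) {
        this.name = name;
    }
}

/** Chooses the backend with the fewest active requests. */
final class LeastLoadBalancer {
    private final List<Backend> backends;

    LeastLoadBalancer(List<Backend> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = new ArrayList<>(backends);
    }

    /** Picks the least-loaded backend (the first one wins a tie) and marks one request as started. */
    Backend acquire() {
        Backend chosen = backends.get(0);
        for (Backend candidate : backends) {
            if (candidate.activeRequests.get() < chosen.activeRequests.get()) {
                chosen = candidate;
            }
        }
        chosen.activeRequests.incrementAndGet();
        return chosen;
    }

    /** Marks one request on the given backend as finished. */
    void release(Backend backend) {
        backend.activeRequests.decrementAndGet();
    }
}
```

The choice and the increment are not one atomic step, so under heavy concurrency two requests may pick the same backend; for a balancing heuristic that is usually acceptable.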
2. Current state of database clusters
A database cluster applies computer clustering technology to the database. Although every vendor claims that its architecture is flawless, that does not change the fact that Oracle leads and the others follow: in cluster solutions, Oracle RAC remains ahead of the other database vendors, including Microsoft, and it can meet requirements for high availability, high performance, database load balancing, and easy expansion.

IO-based third-party HA (high availability) cluster
At present, mainstream database cluster technology falls into the six categories above. Some solutions are developed by the database vendors themselves, some by third-party cluster companies, and some jointly by database vendors and third-party cluster companies. The functionality and architecture differ from one implementation to another.
RAC (Real Application Clusters) is a technology introduced in the Oracle9i database and the core technology behind Oracle's support for grid computing environments. It resolves a long-standing tension in traditional database applications: high performance and high scalability versus low price. Oracle has long dominated the cluster database market with RAC.

Six: Caching in the system architecture of a high-concurrency, high-load web site

Anyone who works in technology has come across the word cache, and caches are used in many places. Caching is also very important in web site architecture and web development. Here we first describe the two most basic kinds of cache; advanced and distributed caches are described later.
Architecture-level caching: anyone familiar with Apache knows that it provides its own cache module, and Squid can also be added for caching. Both can effectively improve Apache's response time.
Application-level caching: the memory cache available on Linux is a commonly used caching interface that can be used in web development. In Java, for example, you can call MemoryCache to cache and share data, and some large communities use this kind of architecture. In addition, every web development language has its own cache modules and methods: PHP has the PEAR Cache module, Java has even more, and although I am not very familiar with .NET, I am sure it has them too.
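
The simplest form of caching inside the application is an in-process map whose entries expire after a fixed time. The sketch below is not the MemoryCache interface mentioned above; `ExpiringCache` is a hypothetical class, and the clock is passed in so that expiry can be tested without waiting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** In-process cache whose entries expire a fixed time after they are written. */
final class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // for example System::currentTimeMillis

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    /** Returns the cached value, or null if it is absent or has expired. */
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key, entry); // drop the stale entry lazily
            return null;
        }
        return entry.value;
    }
}
```

In normal use the cache would be created as `new ExpiringCache<String, String>(60_000, System::currentTimeMillis)`. Expired entries are only removed when they are read, so a long-running application would also need periodic cleanup or a size limit.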



Java open source caching frameworks
JBossCache/TreeCache: JBossCache is a replicated, transactional cache that lets you cache enterprise application data to improve performance. Cached data is replicated automatically, which makes it easy to cluster JBoss servers. JBossCache can run as an MBean service in the JBoss application server or another Java EE container, and it can also run standalone. JBossCache consists of two modules: TreeCache and TreeCacheAOP. TreeCache is a tree-structured, replicated, transactional cache. TreeCacheAOP is an object-oriented cache that uses AOP to manage POJOs dynamically.
OSCache: designed by OpenSymphony, the OSCache tag library is a groundbreaking JSP custom tag application that provides fast in-memory caching within existing JSP pages. OSCache is a widely used, high-performance J2EE caching framework that can serve as a general caching solution for any Java application. OSCache has the following features. Caching of any object: you can cache parts of JSP pages or HTTP requests without restriction, and any Java object can be cached. A comprehensive API: the OSCache API gives you programmatic control over all OSCache features. Persistent caching: the cache can be written to disk at will, so data that is expensive to create stays cached even across application restarts. Cluster support: clustered cache data can be configured with a single parameter, with no code changes. Expiration of cache entries: you have maximum control over the expiration of cached objects, including pluggable refresh policies (if the default behavior is not what you need).
JCache: JCache is a standard specification (JSR 107) that describes a way to cache Java objects temporarily in memory, covering object creation, shared access, spooling, invalidation, consistency across JVMs, and so on. It can be used to cache the data read most frequently in a JSP, such as a product catalog or a price list. With JCache, most queries respond faster because the data is already cached (internal tests indicate response times about 15 times faster).
Ehcache: Ehcache originated from Hibernate and is used in Hibernate as a data caching solution.
Java Caching System (JCS): JCS is a subproject of the Jakarta Turbine project. It is a composite caching tool that can cache objects in memory and on disk, and it supports expiration times for cached objects. You can also build a distributed caching architecture with JCS for high-performance applications. Objects that are accessed frequently and are expensive to obtain each time can be kept temporarily in the cache to improve service performance, and JCS is a good tool for that. A cache helps read operations significantly more than write operations.
SwarmCache: SwarmCache is a simple and powerful distributed caching mechanism. It uses IP multicast to communicate efficiently between cache instances, and it is an ideal choice for quickly improving the performance of clustered web applications.
ShiftOne: the ShiftOne Object Cache is a Java library that provides basic object caching. The implemented policies are first-in, first-out (FIFO), least recently used (LRU), and least frequently used (LFU). All policies can limit the maximum number of elements and their maximum lifetime.
WhirlyCache: WhirlyCache is a fast, configurable in-memory object cache. It can speed up a web site or application by caching objects that would otherwise have to be built by querying a database or through other expensive processing.
Jofti: Jofti can index and search objects in a cache layer (it supports Ehcache, JBossCache, and OSCache) or in any storage structure that supports the Map interface. The framework handles additions, deletions, and changes to indexed objects transparently and provides an easy-to-use query facility for searching.
cache4j: cache4j is a Java object cache with a simple API and a fast implementation. Its features include in-memory caching, a design for multithreaded environments, two implementations (synchronized and blocking), several eviction policies (LFU, LRU, FIFO), and the option of storing objects through strong references or soft references.
Open Terracotta: an open source, JVM-level clustering framework that provides HTTP session replication, distributed caching, POJO clustering, and cross-cluster JVM coordination for distributed applications (done through code injection, so you do not need to modify anything).
SCCache: the object cache system used by Shop.com. SCCache is an in-process cache and a second-level shared cache. It stores cached objects on disk, supports associated keys, keys of any size, and data of any size, and can perform garbage collection automatically.
Shoal: Shoal is a Java-based, scalable, dynamic clustering framework that provides infrastructure for building fault-tolerant, reliable, and available Java applications. It can also be integrated into any Java product that needs clustering and distributed system support without being tied to a specific communication protocol. Shoal is the clustering engine for the GlassFish and JOnAS application servers.
simple-spring-memcached: simple-spring-memcached wraps calls to memcached and makes memcached client development very simple.
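
Several of the frameworks above offer LRU eviction. To show what that policy means, here is a minimal sketch built on the JDK's `LinkedHashMap`; it does not reproduce the API of any framework listed, and `LruCache` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Least-recently-used cache: when full, the entry that was accessed longest ago is evicted. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true makes get() move an entry to the "most recent" end.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion.
        return size() > maxEntries;
    }
}
```

With a capacity of 2, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, because `a` was used more recently. The class is not thread-safe; in a web application it would need to be wrapped with `Collections.synchronizedMap` or guarded by a lock.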
