Java Web Development High concurrency processing


Reposted from: http://blog.csdn.net/zhangzeyuaaa/article/details/44542161

Designing the database for high-concurrency, high-load Java websites (processing large volumes of data under high load)

First: The database comes first for a high-concurrency, high-load website

Yes, the first concern is the database, which is the first single point of failure (SPOF) that most applications face. For Web 2.0 applications in particular, database response time is the first problem to solve.
MySQL is usually the database of choice. A site typically starts with a single MySQL host, and once the data grows beyond about 1 million rows, MySQL performance can drop sharply. The usual optimization is master-slave (M-S) replication, with writes and queries handled on separate servers. I recommend an M-M-slaves layout: two MySQL masters and multiple slaves. Note that although there are two masters, only one is active at any given time, and you can switch between them when needed. The reason for using two masters is to ensure that the master does not become a SPOF of the system.
The slaves can be further load balanced, for example with LVS, which spreads SELECT operations evenly across the slaves.
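The read/write split described above (one active master out of two, reads spread over the slaves) can be sketched at the application level as a small router. This is a minimal sketch under assumptions: the class name `ReadWriteRouter` and the plain-string server names are hypothetical, and round-robin stands in for whatever scheduling LVS actually applies. It only decides which server each statement would go to.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical router: writes go to the one active master, reads rotate over the slaves. */
final class ReadWriteRouter {
    private final List<String> masters;   // the two masters (M-M); only one is active
    private final List<String> slaves;    // read replicas
    private final AtomicInteger activeMaster = new AtomicInteger(0);
    private final AtomicInteger nextSlave = new AtomicInteger(0);

    ReadWriteRouter(List<String> masters, List<String> slaves) {
        if (masters.size() != 2 || slaves.isEmpty()) {
            throw new IllegalArgumentException("expected 2 masters and at least 1 slave");
        }
        this.masters = List.copyOf(masters);
        this.slaves = List.copyOf(slaves);
    }

    /** INSERT/UPDATE/DELETE always go to the currently active master. */
    String forWrite() {
        return masters.get(activeMaster.get());
    }

    /** SELECTs are spread round-robin across the slaves. */
    String forRead() {
        int i = Math.floorMod(nextSlave.getAndIncrement(), slaves.size());
        return slaves.get(i);
    }

    /** Manual switchover to the standby master, e.g. for maintenance or after a failure. */
    void switchMaster() {
        activeMaster.updateAndGet(i -> 1 - i);
    }
}
```

Because only one master accepts writes at a time, switching is a single flag change in this sketch; a real deployment would also have to make sure replication to the standby has caught up before switching.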
The architecture above can handle a fair amount of load, but as the user base keeps growing and your user table exceeds 10 million rows, the master becomes a SPOF again. You cannot keep adding slaves indefinitely, because the cost of replication then rises steeply. What can you do? My approach is to partition the data at the business level. The simplest example is user data: split it by some rule, such as by ID, across different database clusters.

A global database is used for metadata lookups. The drawback is that every query needs one extra lookup. For example, to look up the user nightsailer, you first query the global database group for the ID of the cluster that holds nightsailer, and then query that cluster for nightsailer's actual data.
Each cluster can be M-M or M-M-slaves. This structure is scalable: as the load grows, you simply add a new MySQL cluster.
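The two-step lookup through the global metadata database can be sketched as follows. This is an illustration under assumptions: an in-memory map stands in for the global database, and the names `ShardDirectory`, `addCluster`, `assign`, and `clusterFor`, as well as the JDBC URLs in the usage, are hypothetical. In production the mapping would live in the replicated global database and would usually be cached.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the global database that maps a user to a cluster ID. */
final class ShardDirectory {
    private final Map<String, Integer> userToCluster = new ConcurrentHashMap<>();
    private final Map<Integer, String> clusterToJdbcUrl = new ConcurrentHashMap<>();

    /** Registers a MySQL cluster (M-M or M-M-slaves); new clusters can be added at any time. */
    void addCluster(int clusterId, String jdbcUrl) {
        clusterToJdbcUrl.put(clusterId, jdbcUrl);
    }

    /** Records which cluster holds a user's data (done once, when the user is created). */
    void assign(String userName, int clusterId) {
        if (!clusterToJdbcUrl.containsKey(clusterId)) {
            throw new IllegalArgumentException("unknown cluster " + clusterId);
        }
        userToCluster.put(userName, clusterId);
    }

    /** Step 1 of every query: ask the global directory which cluster holds the user. */
    String clusterFor(String userName) {
        Integer clusterId = userToCluster.get(userName);
        if (clusterId == null) {
            throw new IllegalStateException("no cluster assigned for " + userName);
        }
        // Step 2 (not shown): run the real query against the returned cluster
        return clusterToJdbcUrl.get(clusterId);
    }
}
```

Adding capacity then means calling `addCluster` for the new cluster and assigning new users to it; existing users stay where the directory says they are.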

Notes:
1. Disable all auto_increment fields.
2. IDs must be allocated centrally, using a common algorithm.
3. Have a good way to monitor the load on each MySQL host and the running state of the services. If you have more than 30 MySQL databases running, you will know what I mean.
4. Do not use persistent connections (do not use pconnect). Instead, use SQL Relay as a third-party database connection pool, or simply build one yourself, because MySQL connection pooling in PHP 4 is often problematic.
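Items 1 and 2 mean that rows no longer get their IDs from each MySQL server's auto_increment. One common way to allocate IDs centrally, shown here as a hedged sketch rather than the author's algorithm, is segment (block) allocation: a central counter hands out blocks of IDs, and each application node uses its block locally until it runs out. An `AtomicLong` stands in for the central counter (for example a one-row ID table on the global database); `CentralIdSource`, `BlockIdAllocator`, and the block size are illustrative assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Stand-in for the central allocator, e.g. a one-row counter table updated atomically. */
final class CentralIdSource {
    private final AtomicLong next = new AtomicLong(1);

    /** Reserves blockSize consecutive IDs and returns the first one. */
    long reserveBlock(int blockSize) {
        return next.getAndAdd(blockSize);
    }
}

/** Per-node allocator: one round trip to the central source per block, not per ID. */
final class BlockIdAllocator {
    private final CentralIdSource source;
    private final int blockSize;
    private long current;   // next ID to hand out
    private long end;       // exclusive upper bound of the current block

    BlockIdAllocator(CentralIdSource source, int blockSize) {
        this.source = source;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (current >= end) {
            // Block exhausted (or first call): fetch a fresh block from the central source
            current = source.reserveBlock(blockSize);
            end = current + blockSize;
        }
        return current++;
    }
}
```

IDs from this scheme are unique across all nodes and clusters, but they are not strictly increasing in time across nodes, since each node works through its own block.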

Second: Static HTML in the architecture of a high-concurrency, high-load website

As we all know, pure static HTML pages are the most efficient and the least expensive to serve, so we try to deliver as many pages of our site as possible as static pages; this simplest method is also the most effective. But for sites with a lot of frequently updated content, we cannot produce all of these pages by hand, which is why content management systems (CMS) exist. The news channels of the portals we visit every day, and often their other channels too, are managed and published through such information publishing systems. A CMS can at minimum turn entered content into static pages automatically, and it can also provide channel management, permission management, automatic content capture, and other functions. For a large website, an efficient and manageable CMS is essential.

Besides portals and information publishing sites, community sites with interactive requirements should also be made as static as possible, as a necessary means of improving performance. Converting community posts and articles into static pages in real time, and regenerating them whenever they are updated, is a widely used strategy. Mop's Dazahui ("hodgepodge") board uses this strategy, as does the NetEase community.

Static HTML is also a form of caching. For content that the system queries from the database frequently but that rarely changes, consider static HTML. Forum-wide settings are an example: mainstream forums let administrators manage this information in the backend and store it in the database. The front-end pages read it very often, but it is rarely updated, so you can regenerate it as static content whenever it is changed in the backend, avoiding a large number of database requests under high concurrency.
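Regenerating rarely changed settings as a static fragment at update time can be sketched like this. Assumptions: the settings arrive as a key/value map, the `SettingsPublisher` class, the `<dl>` markup, and UTF-8 output are hypothetical choices, and the front-end pages are assumed to include the generated fragment instead of querying the database. Values are HTML-escaped because they come from the admin backend.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical publisher: called by the admin backend whenever forum settings are saved. */
final class SettingsPublisher {
    static String escape(String s) {
        // Escape '&' first so the entities added below are not double-escaped
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Renders the settings once as an HTML fragment. */
    static String render(Map<String, String> settings) {
        StringBuilder html = new StringBuilder("<dl class=\"forum-settings\">\n");
        // TreeMap gives a stable order, so unchanged settings produce an identical file
        for (Map.Entry<String, String> e : new TreeMap<>(settings).entrySet()) {
            html.append("  <dt>").append(escape(e.getKey())).append("</dt><dd>")
                .append(escape(e.getValue())).append("</dd>\n");
        }
        return html.append("</dl>\n").toString();
    }

    /** Writes the fragment; page requests then read this file instead of the database. */
    static void publish(Map<String, String> settings, Path fragment) throws IOException {
        Files.writeString(fragment, render(settings), StandardCharsets.UTF_8);
    }
}
```

The database is then read once per settings change instead of once per page view.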

A static HTML solution for websites
Normally, when a request for a servlet resource reaches the Web server, we fill in the specified JSP page to respond to the request:

HTTP request -> Web server -> servlet -> business logic -> data access -> populate JSP -> response

Once the HTML has been generated statically:

HTTP request -> Web server -> servlet -> static HTML -> response

The static generation is implemented as follows:

Servlet:

public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    String chapterIdParam = request.getParameter("chapterId");
    if (chapterIdParam != null) {
        // Parse the ID first so that only a number can end up in the file name
        int chapterId = Integer.parseInt(chapterIdParam);
        String chapterFileName = "bookChapterRead_" + chapterId + ".html";
        String chapterFilePath = getServletContext().getRealPath("/") + chapterFileName;
        File chapterFile = new File(chapterFilePath);
        if (chapterFile.exists()) {
            // The static page already exists: tell the browser to go there
            response.sendRedirect(chapterFileName);
            return;
        }
        INovelChapterBiz novelChapterBiz = new NovelChapterBizImpl();
        // Chapter information
        NovelChapter novelChapter = novelChapterBiz.searchNovelChapterById(chapterId);
        int lastPageId = novelChapterBiz.searchLastChapterId(novelChapter.getNovelId().getId(), novelChapter.getId());
        int nextPageId = novelChapterBiz.searchNextChapterId(novelChapter.getNovelId().getId(), novelChapter.getId());
        request.setAttribute("novelChapter", novelChapter);
        request.setAttribute("lastPageId", lastPageId);
        request.setAttribute("nextPageId", nextPageId);
        // First request for this chapter: render the JSP into a static file, then redirect to it
        new CreateStaticHtmlPage().createStaticHtmlPage(request, response, getServletContext(),
                chapterFileName, chapterFilePath, "/bookread.jsp");
    }
}
The class that generates the static HTML page:

package com.jb.y2t034.thefifth.web.servlet;

import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletContext;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.WriteListener;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

/**
 * Creates a static HTML page.
 * Function: renders a JSP and saves the result as a static HTML file
 * Date: 2009-10-11
 * Location: Home
 * @author MAVK
 */
public class CreateStaticHtmlPage {
    /**
     * Generates a static HTML page from a JSP.
     * @param request        request object
     * @param response       response object
     * @param servletContext servlet context
     * @param fileName       file name
     * @param fileFullPath   full path of the file
     * @param jspPath        (relative) path of the JSP to render into the static file
     * @throws ServletException
     * @throws IOException
     */
    public void createStaticHtmlPage(HttpServletRequest request, HttpServletResponse response,
            ServletContext servletContext, String fileName, String fileFullPath, String jspPath)
            throws ServletException, IOException {
        final String charset = "GB2312";
        // Set the encoding of the HTML result stream (i.e. the HTML file encoding)
        response.setContentType("text/html;charset=" + charset);
        // Get the JSP resource
        RequestDispatcher rd = servletContext.getRequestDispatcher(jspPath);
        // Receives everything the JSP writes
        final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        // Captures output the JSP writes through getOutputStream()
        final ServletOutputStream servletOutputStream = new ServletOutputStream() {
            @Override
            public void write(byte[] b, int off, int len) {
                byteArrayOutputStream.write(b, off, len);
            }
            @Override
            public void write(int b) {
                byteArrayOutputStream.write(b);
            }
            // isReady()/setWriteListener() are abstract since Servlet 3.1;
            // remove them (and the WriteListener import) on older containers such as Tomcat 6
            @Override
            public boolean isReady() {
                return true;
            }
            @Override
            public void setWriteListener(WriteListener writeListener) {
                // Non-blocking I/O is not supported by this in-memory stream
            }
        };
        // Converts the character stream to bytes, using the same charset as the page
        final PrintWriter printWriter = new PrintWriter(
                new OutputStreamWriter(byteArrayOutputStream, charset));
        // Wraps the response so the JSP output lands in our buffers (two overridden methods)
        HttpServletResponse capturingResponse = new HttpServletResponseWrapper(response) {
            @Override
            public ServletOutputStream getOutputStream() {
                return servletOutputStream;
            }
            @Override
            public PrintWriter getWriter() {
                return printWriter;
            }
        };
        // Render the JSP into the wrapped response
        rd.include(request, capturingResponse);
        // Flush the writer so all buffered characters reach byteArrayOutputStream
        printWriter.flush();
        // Write everything captured in byteArrayOutputStream to the HTML file;
        // try-with-resources closes the stream even if writing fails
        try (FileOutputStream fileOutputStream = new FileOutputStream(fileFullPath)) {
            byteArrayOutputStream.writeTo(fileOutputStream);
        }
        // Redirect the client to the newly generated static file
        response.sendRedirect(fileName);
    }
}
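Under high concurrency, two requests for the same not-yet-generated chapter can render it at the same time, and a third request can be redirected to a file that is still being written by `FileOutputStream`. One way to avoid serving half-written pages, offered as a sketch and not part of the original code, is to write the captured bytes to a temporary file in the same directory and then atomically rename it into place. `StaticFileWriter.writeAtomically` is a hypothetical helper; `ATOMIC_MOVE` throws `AtomicMoveNotSupportedException` on file systems that cannot rename atomically.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: readers see either the previous file or the complete new one. */
final class StaticFileWriter {
    static void writeAtomically(Path target, byte[] content) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // The temp file must be on the same file system as the target for an atomic rename
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            Files.write(tmp, content);
            // On POSIX systems the rename replaces a copy written concurrently by another request;
            // both copies are complete, so whichever wins is fine
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp);   // no-op after a successful move
        }
    }
}
```

In `createStaticHtmlPage`, this would replace the `FileOutputStream` block with a call such as `StaticFileWriter.writeAtomically(Paths.get(fileFullPath), byteArrayOutputStream.toByteArray())`.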

Third: Caching, load balancing, and storage for high-concurrency, high-load sites

Caching is another big problem. I generally use memcached for the cache cluster, typically deploying around 10 machines (a memory pool of about 10 GB). Be aware that the cache must never be swapped out; it is best to turn off Linux swap entirely.
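With about ten memcached nodes, the client has to decide which node holds each key. Many memcached clients use consistent hashing for this, so that adding or removing a node remaps only a fraction of the keys. The sketch below illustrates that idea and is not any particular client's API; the `ConsistentHashRing` class, 100 virtual nodes per server, MD5 as the hash, and the host names in the usage are assumptions (11211 is memcached's default port).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

/** Minimal consistent-hash ring mapping cache keys to memcached servers. */
final class ConsistentHashRing {
    private static final int VIRTUAL_NODES = 100;   // points per server smooth the distribution
    private final TreeMap<Long, String> ring = new TreeMap<>();

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            // Use the first 8 bytes of the digest as a 64-bit position on the ring
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    void addServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    /** A key belongs to the first server point at or after its hash, wrapping around. */
    String serverFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers");
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```

When an eleventh node is added, only the keys that now fall on its points move, and all of them move to the new node; the rest of the cache stays warm.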

Load Balancing/acceleration

When caching comes up, some people think first of static pages, the so-called static HTML. I consider that common sense rather than a key point. Making pages static in turn requires load balancing and acceleration for the static content, and I think lighttpd + Squid is the best way to do this.
LVS <-------> lighttpd ====> Squid(s) ==== lighttpd

I often use the setup above. Note that I do not use Apache: unless there is a specific need, I do not deploy it, because I generally run php-fastcgi with lighttpd, whose performance is much better than Apache + mod_php.

Squid can also be used to solve problems such as file synchronization, but you must monitor the cache hit rate and push it above 90% whenever possible.
Squid and lighttpd deserve a much longer discussion, which I will not go into here.

Storage
Storage is also a big problem. One kind is small-file storage, such as images. The other is large-file storage, such as search engine indexes, where a single file is typically larger than 2 GB.
The simplest way to store small files is to distribute them with lighttpd. Or you can simply use Red Hat's GFS: the advantage is that it is transparent to the application, the disadvantage is the high cost, by which I mean that you have to buy a disk array. In my project the storage capacity was 2-10 TB, and I used distributed storage. It has to handle file replication and redundancy, so that each file can have its own level of redundancy; Google's GFS paper is a good reference.
For large-file storage, refer to the Nutch approach, which has since been spun off as the standalone Hadoop subproject (search for it).
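A distributed store that gives each file its own redundancy level needs a placement rule for the copies. The sketch below is one simple, hypothetical rule and not GFS's actual placement policy: hash the file ID to a starting node, then take the next `replicas` distinct nodes in order. `ReplicaPlacement` and the node names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical placement: choose `replicas` distinct storage nodes for a file. */
final class ReplicaPlacement {
    private final List<String> nodes;

    ReplicaPlacement(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    /** Per-file redundancy: hot or important files can ask for more copies than others. */
    List<String> nodesFor(String fileId, int replicas) {
        if (replicas < 1 || replicas > nodes.size()) {
            throw new IllegalArgumentException("replicas must be between 1 and " + nodes.size());
        }
        int start = Math.floorMod(fileId.hashCode(), nodes.size());
        List<String> chosen = new ArrayList<>(replicas);
        for (int i = 0; i < replicas; i++) {
            // Consecutive positions are distinct, so no node stores two copies of one file
            chosen.add(nodes.get((start + i) % nodes.size()));
        }
        return chosen;
    }
}
```

Because the rule is deterministic, any node can recompute where a file's copies live without a lookup; a real system would also need to rebalance when nodes are added or fail.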

Other:

In addition, a passport (unified login) service and the like also need consideration, but they are all relatively simple.

Fourth: Separating image servers in the architecture of a high-concurrency, high-load website

For any Web server, whether Apache, IIS, or another container, images consume the most resources, so we should separate images from pages. This is a strategy that virtually all large sites adopt: they have dedicated image servers, often many of them. This architecture reduces the load on the servers that handle page requests and ensures that the system does not crash because of image problems. The application servers and image servers can also be tuned differently; for example, on the image servers Apache can be configured to support as few ContentTypes and load as few modules (LoadModule) as possible, to keep resource consumption low and execution efficiency high.

Separating image servers with Apache
Rationale:
A start-up's applications are likely to be deployed on a single server (for cost reasons).
The first thing to separate is certainly the database from the application server.
What should be separated second? Everyone has their own considerations. My project team focused on saving bandwidth: no matter how good the server and how high the bandwidth, a surge of concurrent requests can still overwhelm it. That is the focus of this article. It describes our practice, which may not fit every situation, and is offered for reference.
Environment:
Web application server: 4 dual-core 2 GHz CPUs, 4 GB memory
Deployment: Windows 2003 / Apache HTTP Server 2.1 / Tomcat 6
Database server: 4 dual-core 2 GHz CPUs, 4 GB memory
Deployment: Windows 2003 / MS SQL Server 2000
Steps:
Step one: Add two ordinary servers, each with 2 dual-core 2 GHz CPUs and 2 GB memory, to act as resource servers.
Deployment: Tomcat 6 running a simple image upload application (remember to specify <distributable/> in web.xml), with the domain names res1.***.com and res2.***.com, using the AJP protocol.
Step two: Modify the Apache httpd.conf configuration.
The application's existing file upload URLs are:
1. /fileupload.html
2. /otherupload.html
Add the following configuration to httpd.conf:

<VirtualHost *:80>
    ServerAdmin <admin-address>@***.com
    ProxyPass /fileupload.html balancer://rescluster/fileupload lbmethod=byrequests stickysession=JSESSIONID nofailover=Off timeout=5 maxattempts=3
    ProxyPass /otherupload.html balancer://rescluster/otherupload.html lbmethod=byrequests stickysession=JSESSIONID nofailover=Off timeout=5 maxattempts=3
    # Load balancing
    <Proxy balancer://rescluster>
        BalancerMember ajp://res1.***.com:8009 smax=5 max=500 ttl=120 retry=300 loadfactor=100 route=tomcat1
        BalancerMember ajp://res2.***.com:8009 smax=5 max=500 ttl=120 retry=300 loadfactor=100 route=tomcat2
    </Proxy>

</VirtualHost>
Step three: Modify the business logic.
Store every uploaded file in the database as a full URL, for example a product image path: http://res1.***.com/upload/20090101/product120302005.jpg

Now you can rest easy: when bandwidth runs short, add more image servers (even dozens), and only a slight change to the Apache configuration file is needed.
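Step three means the application decides, at upload time, which resource server a file lives on and stores the full URL. A minimal sketch under assumptions: the article masks the real domain as `***`, so `example.com` is used as a placeholder; the date-based path mirrors the article's example URL; and the `ImageUrlBuilder` class and round-robin host choice are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that picks a resource server and builds the full URL saved in the database. */
final class ImageUrlBuilder {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    private final List<String> hosts;          // e.g. res1, res2, ... image servers
    private final AtomicInteger next = new AtomicInteger();

    ImageUrlBuilder(List<String> hosts) {
        this.hosts = List.copyOf(hosts);
    }

    /** Returns e.g. http://res1.example.com/upload/20090101/product120302005.jpg */
    String urlFor(String fileName, LocalDate uploadDay) {
        // Round-robin over the configured image servers
        String host = hosts.get(Math.floorMod(next.getAndIncrement(), hosts.size()));
        return "http://" + host + "/upload/" + uploadDay.format(DAY) + "/" + fileName;
    }
}
```

Since each stored URL already names its server, adding image servers only changes where new uploads go; existing URLs keep pointing at the server that holds the file.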

Fifth: Database clustering and database/table hashing in the architecture of a high-concurrency, high-load website

Large websites run complex applications that must use databases. Under heavy traffic the database soon becomes the bottleneck, and a single database quickly can no longer keep up with the application, so we need database clustering or database/table hashing.

For database clustering, many databases have their own solutions: Oracle, Sybase, and others have good ones, and the master/slave replication provided by MySQL is a similar approach. Whatever database you use, refer to its corresponding solution.

The database clusters mentioned above are constrained by the database type in terms of architecture, cost, and scalability, so we also need to improve the system architecture from the application's point of view, and database/table hashing is the most common and effective solution. We split the databases by business, application, or functional module, so that different modules use different databases or tables, and then hash the data of a given page or function into smaller databases according to some policy; for example, the user table can be split into tables by hashing the user ID. This improves system performance at low cost and scales well. Sohu's forum uses such an architecture: it separates the forum's users, settings, posts, and other information into different databases, then hashes posts and users into databases and tables by board and by ID. With a simple change in a configuration file, a low-cost database can then be added at any time to boost performance.
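Hashing the user table by user ID can be sketched as a pure function from ID to database and table. Assumptions: IDs are positive longs from a central allocator, there are `dbCount` databases with `tablesPerDb` tables each, and the `user_NN` table naming, `UserShard`, and `UserTableHash` are hypothetical. With plain modulo hashing, changing the number of databases later moves most rows, which is why such schemes often fix a generous number of logical tables up front and move whole tables between databases instead.

```java
/** Hypothetical location of one user's row: which database and which table. */
record UserShard(int db, String table) { }

final class UserTableHash {
    private final int dbCount;
    private final int tablesPerDb;

    UserTableHash(int dbCount, int tablesPerDb) {
        this.dbCount = dbCount;
        this.tablesPerDb = tablesPerDb;
    }

    UserShard locate(long userId) {
        // Spread IDs over dbCount * tablesPerDb logical tables, then group the tables by database
        int slot = (int) Math.floorMod(userId, (long) dbCount * tablesPerDb);
        int db = slot / tablesPerDb;
        int table = slot % tablesPerDb;
        return new UserShard(db, String.format("user_%02d", table));
    }
}
```

For example, with 4 databases of 8 tables each, user 42 falls in logical slot 42 mod 32 = 10, which is table `user_02` in database 1.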

Classification of cluster software:
In general, depending on their focus and the problems they solve, cluster software falls into three main categories: high-performance clusters (High Performance Cluster, HPC), load-balancing clusters (Load Balance Cluster, LBC), and high-availability clusters (High Availability Cluster, HAC).
A high-performance cluster (HPC) uses multiple machines in the cluster to accomplish the same task, completing it much faster and more reliably than a single machine and making up for a single machine's limited performance. This kind of cluster is widely used in environments with large data volumes and complex computation, such as weather forecasting and environmental monitoring.
A load-balancing cluster (LBC) uses multiple machines in the cluster to complete many small tasks in parallel. In general, as more people use an application, response times for user requests grow and machine performance suffers. With a load-balancing cluster, any machine in the cluster can respond to a user's request: when a user issues a service request, the cluster selects the machine with the lowest load, which can provide the best service, to accept and answer the request. This increases the availability and stability of the system. This kind of cluster is widely used for websites.
A high-availability cluster (HAC) uses the redundancy of the machines in the cluster: when one machine fails, a backup machine quickly takes over and starts the service, while the failed machine is repaired and brought back. This maximizes the availability of the cluster's services. Such systems are widely used in fields with high reliability requirements, such as banking and telecommunications.
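The load-balancing cluster's rule of sending each request to the machine with the lowest load can be sketched as least-connections selection (LVS, for instance, offers least-connection scheduling as one of its algorithms). Assumptions: the number of active requests is the load metric, and `LeastLoadedBalancer` with its `acquire`/`release` methods is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical least-connections balancer: each request goes to the node with the fewest active requests. */
final class LeastLoadedBalancer {
    private final Map<String, AtomicInteger> active = new ConcurrentHashMap<>();

    synchronized void addNode(String node) {
        active.putIfAbsent(node, new AtomicInteger());
    }

    /** Removing a failed node means new requests simply go to the remaining ones. */
    synchronized void removeNode(String node) {
        active.remove(node);
    }

    /** Picks the least-loaded node and counts the request against it. */
    synchronized String acquire() {
        String best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (Map.Entry<String, AtomicInteger> e : active.entrySet()) {
            int load = e.getValue().get();
            if (load < bestLoad) {
                best = e.getKey();
                bestLoad = load;
            }
        }
        if (best == null) {
            throw new IllegalStateException("no nodes available");
        }
        active.get(best).incrementAndGet();
        return best;
    }

    /** Call when the request finishes. */
    void release(String node) {
        AtomicInteger n = active.get(node);
        if (n != null) {
            n.decrementAndGet();
        }
    }
}
```

The `removeNode` path is also a crude form of the high-availability behavior described above: once a failed machine is taken out, its traffic goes to the remaining nodes.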
2. The state of database clustering
A database cluster applies computer clustering technology to databases. Although every vendor claims its architecture is perfect, this cannot change the fact that Oracle leads and the others are chasing it: in clustering solutions, Oracle RAC is still ahead of other database vendors, including Microsoft, and it meets requirements for high availability, high performance, database load balancing, and easy expansion.
Oracle's Real Application Clusters (RAC)
Microsoft SQL Cluster Server (MSCS)
IBM's DB2 UDB High Availability Cluster (UDB)
Sybase ASE High Availability Cluster (ASE)
MySQL High Availability Cluster (MySQL CS)
IO-based third-party HA (high availability) clusters
At present, the main database clustering technologies fall into the six categories above: some are developed by the database vendors themselves, some by third-party cluster companies, and some jointly by database vendors and third-party cluster companies. The functions and architectures of the various types of clusters differ.
RAC (Real Application Clusters) is a technology introduced with the Oracle9i database and is the core technology that lets Oracle databases support grid computing environments. It resolves an important conflict in traditional database applications: high performance and high scalability versus low price. For a long time, Oracle has dominated the clustered database market with its Real Application Clusters (RAC) technology.

Sixth: Caching in the architecture of a high-concurrency, high-load website

Anyone who works in technology has come across the word cache, and caches are used in many places; caching is also very important in website architecture and Web development. Here we first describe the two most basic kinds of cache; advanced and distributed caches are described later.
Architecture-level caching: anyone familiar with Apache knows that Apache provides its own cache module, and you can also add Squid for caching. Both can effectively improve Apache's response to requests.
Caching in Web application development: the memory caches available on Linux provide a common cache interface that can be used in Web development; for example, Java code can call MemoryCache to cache and share some data, and some large communities use this kind of architecture. In addition, every Web development language has its own cache modules and methods: PHP has the PEAR Cache module, Java has many more, and although I am not very familiar with .NET, I am sure it has them too.
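Application-level caching in Java can start as simply as an in-process map whose entries expire. The sketch below is a minimal illustration and not the API of any particular caching framework: `ExpiringCache`, its `getOrLoad` method, and the injected clock (used so that expiry can be tested) are assumptions. Concurrent misses for the same key may call the loader more than once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Minimal in-process cache: entries expire ttlMillis after they are loaded. */
final class ExpiringCache<K, V> {
    private record Entry<T>(T value, long expiresAt) { }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;   // System::currentTimeMillis in production

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, or loads it (e.g. from the database) if absent or expired. */
    V getOrLoad(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> e = map.get(key);
        if (e == null || e.expiresAt() <= now) {
            e = new Entry<>(loader.apply(key), now + ttlMillis);
            map.put(key, e);
        }
        return e.value();
    }

    /** Drop an entry immediately, e.g. after the underlying row is updated. */
    void invalidate(K key) {
        map.remove(key);
    }
}
```

This cache lives inside one JVM, so each application server has its own copy; sharing cached data across servers is what distributed caches such as memcached are for.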

Java open-source caching frameworks
JBossCache/TreeCache: JBossCache is a replicated, transactional cache that lets you cache enterprise application data to improve performance. Cached data is replicated automatically, making it easy to run clustered work across JBoss servers. JBossCache can run as an MBean service in JBoss Application Server or another Java EE container, and it can also run standalone. JBossCache consists of two modules: TreeCache, a tree-structured, replicated, transactional cache, and TreeCacheAOP, an "object-oriented" cache that uses AOP to manage POJOs dynamically.
OSCache: The OSCache tag library, designed by OpenSymphony, is a groundbreaking JSP custom tag application that provides fast in-memory caching within existing JSP pages. OSCache is a widely used, high-performance J2EE caching framework that can serve as a general caching solution for any Java application. Its features include: caching of any object, so you can cache parts of JSP pages or HTTP requests without restriction, and any Java object can be cached; a comprehensive API that gives you programmatic control over all OSCache features; persistent caching, so the cache can be written to disk at will, which lets expensive-to-create data stay cached even across application restarts; cluster support, so cached data can be clustered by setting a single configuration parameter without code changes; and expiration of cached entries, with maximum control over how cached objects expire, including pluggable refresh policies (if the default behavior does not meet your needs).
JCache: JCache is the standard specification JSR 107 (still upcoming when this was written; it was finalized in 2014), which describes a way to temporarily cache Java objects in memory, including object creation, shared access, spooling, invalidation, and consistency across JVMs. It can be used to cache the most frequently read data in JSPs, such as product catalogs and price lists. With JCache, most queries respond faster because the data is cached (internal tests indicate response times about 15 times faster).
Ehcache: Ehcache comes from Hibernate and is used in Hibernate as its data caching solution.
Java Caching System (JCS): JCS is a subproject of the Jakarta Turbine project. It is a composite caching tool that can cache objects in memory and on disk, with configurable expiration times for cached objects. You can also use JCS to build a distributed caching architecture for high-performance applications. Objects that are accessed frequently and are expensive to obtain each time can be kept temporarily in the cache to improve service performance, and JCS is a good tool for this. Caching helps read operations far more than write operations.
SwarmCache: SwarmCache is a simple yet powerful distributed caching mechanism. It uses IP multicast to communicate efficiently between cache instances, making it an ideal choice for quickly improving the performance of clustered Web applications.
ShiftOne: ShiftOne Object Cache is a Java library that provides basic object caching. The implemented policies are first-in, first-out (FIFO), least recently used (LRU), and least frequently used (LFU). All policies support a maximum number of elements and a maximum element lifetime.
WhirlyCache: Whirlycache is a fast, configurable in-memory object cache. It can speed up a website or application by caching objects that would otherwise have to be built by querying a database or running other expensive processes.
Jofti: Jofti can index and search objects in a cache layer (it supports Ehcache, JBossCache, and OSCache) or in storage structures that implement the Map interface. The framework makes adding and removing objects in the index transparent and provides easy-to-use query functions for searching.
cache4j: cache4j is a Java object cache with a simple API and a fast implementation. Its features include in-memory caching; a design for multithreaded environments, with two implementations, synchronized and blocking; multiple eviction policies (LFU, LRU, FIFO); and storage of objects through strong references and soft references.
Open Terracotta: Open Terracotta is an open-source, JVM-level clustering framework that provides HTTP session replication, distributed caching, POJO clustering, and coordination of distributed applications across the JVMs of a cluster (implemented through code injection, so you do not need to modify anything).
sccache: The object cache system used by shop.com. sccache is an in-process cache with a second-level, shared cache. It stores cached objects on disk, supports associated keys, keys of any size, and data of any size, and can collect garbage automatically.
Shoal: Shoal is a Java-based, extensible, dynamic clustering framework that provides infrastructure support for building fault-tolerant, reliable, and available Java applications. It can be integrated into any Java product that needs clustering and distributed-system support but does not want to be tied to a specific communication protocol. Shoal is the clustering engine for the GlassFish and JOnAS application servers.
Simple-Spring-Memcached: Simple-Spring-Memcached encapsulates calls to memcached, making memcached client development remarkably simple.
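The eviction policies mentioned for ShiftOne and cache4j (FIFO, LRU, LFU) are easy to illustrate. As a hedged sketch unrelated to either library's API, a bounded LRU cache can be built on `LinkedHashMap` in access order with `removeEldestEntry`; passing `false` for the access-order flag would turn the same class into a FIFO cache. `LruCache` is a hypothetical name, and the class is not thread-safe without external synchronization.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Bounded cache that evicts the least recently used entry once maxEntries is exceeded. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: get() moves an entry to the "most recently used" end
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true drops the least recently used entry
        return size() > maxEntries;
    }
}
```

For example, with a capacity of 2, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, because `a` was used more recently.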

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
