Java Big Data Processing tuning

Source: Internet
Author: User
Tags: mysql, query

Overall, large websites such as portals face heavy user traffic and highly concurrent requests. The basic solutions concentrate on a few aspects:
1. First, address network bandwidth and highly concurrent web requests: invest reasonably in servers and bandwidth, and make full use of software and hardware caching so that cacheable content is served from cache, reducing pressure on the compute and storage layers.

2. Second, layer the business servers and business support servers sensibly, use parallel computing and distributed algorithms for heavy computation, and implement them during development with the Java SDK and its concurrency package (java.util.concurrent).

3. Build the storage layer on distributed file servers and column-store servers to support storing and reading large volumes of data, and optimize the configuration parameters of the relational database.

4. Finally, keep adjusting and optimizing according to the system's running state and the different business scenarios on the platform.

Large-scale systems draw on a very wide range of technology, with high demands on everything from hardware to software, programming languages, databases, web servers and firewalls. Faced with heavy user traffic and highly concurrent requests, the basic approach is to use high-performance servers, high-performance databases, efficient programming languages and high-performance web containers. These alone, however, cannot solve high-load and high-concurrency problems, so the computation and load must be spread across many machines, using separate server clusters for distributed and parallel computing to absorb the pressure.

The most common techniques at each layer:

One, Application server load balancing
1. Link load balancing
When a domain name is resolved through DNS, client requests are resolved to different IP addresses and thus to different entry points, ideally the entry point likely to be fastest for that client.
2. Software load balancing
The work of generating a page for each request is assigned to one of the servers, distributing the load fairly and evenly. For software layer 4 switching, the common choice on Linux is LVS (Linux Virtual Server).
3. Hardware load balancing

Layer 4 switching uses the header information of layer 3 and layer 4 packets to identify traffic flows by application port range, and distributes all traffic in a range to the appropriate application server. The application ranges in layer 4 switching are determined by the source and destination IP addresses and the TCP and UDP ports. Among hardware layer 4 switches there are well-known products to choose from, such as Alteon and F5.

Latest: CDN acceleration. CDN stands for content delivery network (also called content distribution network). The goal is to add a new layer of network architecture on top of the existing Internet and publish the site's content to the network "edge" closest to users, so that users fetch the content they need from a nearby node and the site responds faster.

Two, Image server separation

For a web server, whether Apache, IIS or another container, images consume the most resources, so images should be separated from pages. Nearly all large sites adopt this strategy and run a dedicated image server, often many of them. This architecture reduces the pressure on the servers that handle page requests and ensures that image problems do not bring the system down. The application servers and image servers can then be tuned differently; for example, Apache on an image server can be configured to support as few ContentType entries and load as few modules (LoadModule) as possible, lowering resource consumption and raising execution efficiency.


Three, Page optimization
1. Reduce the number of requests
Merge CSS and JavaScript files to reduce the number of requests, or distribute resource files across multiple domain names to get around the browser's per-domain limit on concurrent downloads.
2. Compress CSS and JavaScript code
Shrink the files by removing line breaks and spaces from the code.
3. Optimize images
Crop and scale images to the size actually displayed to speed up loading.
4. Static HTML
Use FreeMarker to render database data into static HTML files to improve access speed. Applicable occasions: pages whose content does not need to be real-time, such as the home page, module home pages, news and announcements.

Four, Java design optimization
1. Design patterns
Singleton, proxy, flyweight, decorator and observer patterns.
2. Cache
For example, Ehcache can be combined with AOP to cache business-layer method results, with the class name, method name and parameters as the key and the result object as the value. Applicable occasions: the data is not updated frequently and the query method is fixed.
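The caching idea can be sketched with the JDK alone. This is only an illustration under stated assumptions: it uses a ConcurrentHashMap in place of Ehcache and an explicit call in place of an AOP interceptor, and the class MethodCache and its key format are hypothetical names, not part of any library.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical helper: caches method results keyed by class name, method name and arguments.
class MethodCache {
    private final Map<String, Object> store = new ConcurrentHashMap<>();

    // Builds the cache key from the class name, method name and argument values.
    static String key(Class<?> type, String method, Object... args) {
        return type.getName() + "#" + method + Arrays.deepToString(args);
    }

    // Returns the cached value, or runs the loader once and stores its result.
    @SuppressWarnings("unchecked")
    <T> T get(String key, Supplier<T> loader) {
        return (T) store.computeIfAbsent(key, k -> loader.get());
    }

    // Call this when the underlying data changes.
    void invalidate(String key) {
        store.remove(key);
    }
}
```

In a real setup the interceptor would build the key and call get around the business method, so the business code itself stays unaware of the cache.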
3. Buffering
For example: BufferedWriter in the JDK java.io package.
4. Multithreading
Applicable occasions: mass mailing, processing images in large batches, writing logs. These are typical producer-consumer scenarios.
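A minimal producer-consumer sketch using a BlockingQueue from java.util.concurrent. The class LogPipeline, the queue size of 16 and the "__STOP__" sentinel are assumptions made for the example; the in-memory list stands in for a slow log write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical example: one producer queues log lines, one consumer "writes" them.
class LogPipeline {
    private static final String STOP = "__STOP__"; // sentinel that ends the consumer

    static List<String> run(List<String> messages) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        List<String> written = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks until the producer has queued something.
                for (String line = queue.take(); !STOP.equals(line); line = queue.take()) {
                    written.add(line); // stands in for the slow write
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            // Producer side: put() blocks when the queue is full, throttling the caller.
            for (String message : messages) {
                queue.put(message);
            }
            queue.put(STOP);
            consumer.join(); // after join(), the consumer's writes are visible here
        } catch (InterruptedException e) {
            consumer.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while producing", e);
        }
        return written;
    }
}
```

The bounded queue is a deliberate choice: if the consumer falls behind, producers slow down instead of filling the heap.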
5. Object pool
For example: the C3P0 database connection pool, thread pools created through Executors, and the Apache Commons Pool object pool (formerly Jakarta Commons Pool).
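As an illustration of the thread pool case, the sketch below runs many small tasks on a fixed pool created through Executors, so a handful of threads is reused instead of one thread being created per task. The class PooledSum and the sum-of-squares workload are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PooledSum {
    // Sums the squares of 1..n; each square is a task executed by the shared pool.
    static long sumOfSquares(int n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long value = i;
                tasks.add(() -> value * value);
            }
            long total = 0;
            // invokeAll() waits for every task and returns the futures in task order.
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown(); // release the pooled threads
        }
    }
}
```

In a long-running server the pool would normally be created once and shared, not created per call as in this self-contained example.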
6. Distributed cache
The distributed cache framework Terracotta can share sessions, Ehcache caches and other data across nodes.

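Five, Java program coding optimization
1. String
The substring and replace methods of String can cause memory leak problems (in JDK 6 and earlier, substring shares the parent string's character array and keeps it alive). For simple delimiters, use StringTokenizer instead of split; use charAt instead of startsWith and endsWith for single-character checks; use StringBuilder instead of String concatenation, and initialize it with an estimated capacity.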
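The String tips can be sketched as follows. The class StringTips and its method names are hypothetical. Note that StringTokenizer skips empty tokens, so it is not a drop-in replacement for split when empty fields matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class StringTips {
    // Joins values with commas using one pre-sized StringBuilder instead of repeated "+".
    static String join(String[] values) {
        int capacity = 0;
        for (String value : values) {
            capacity += value.length() + 1; // +1 for the separator
        }
        StringBuilder sb = new StringBuilder(capacity);
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    // Single-character prefix check with charAt instead of startsWith.
    static boolean startsWithChar(String s, char c) {
        return !s.isEmpty() && s.charAt(0) == c;
    }

    // Splits on delimiter characters with StringTokenizer; empty tokens are skipped.
    static List<String> tokens(String s, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(s, delimiters);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }
}
```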
2. List
Prefer ArrayList for lookups and LinkedList for frequent insertions and deletions. If data is only appended at the end, ArrayList performs better than LinkedList.

For collections that implement the RandomAccess interface, traversal performance from high to low is: index loop > Iterator > enhanced for.
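One way to apply the traversal advice is to pick the loop style from the RandomAccess marker interface, as in this sketch (the class ListTraversal is a hypothetical name). The index loop suits ArrayList, while the iterator branch avoids the O(n) cost of get(i) on a LinkedList.

```java
import java.util.List;
import java.util.RandomAccess;

class ListTraversal {
    static long sum(List<Integer> values) {
        long total = 0;
        if (values instanceof RandomAccess) {
            // Index loop: get(i) is constant time, and size() is read only once.
            for (int i = 0, n = values.size(); i < n; i++) {
                total += values.get(i);
            }
        } else {
            // Enhanced for uses the list's Iterator, which walks the links in order.
            for (int value : values) {
                total += value;
            }
        }
        return total;
    }
}
```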
3. Map
The hashCode method of the key objects in a Map determines the performance of the collection.
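A sketch of a well-behaved map key: immutable fields, with equals and hashCode computed from the same fields so equal keys land in the same bucket and distinct keys spread out. The class UserKey and its fields are hypothetical.

```java
import java.util.Objects;

// Hypothetical composite key for use in a HashMap.
final class UserKey {
    private final String tenant;
    private final long userId;

    UserKey(String tenant, long userId) {
        this.tenant = tenant;
        this.userId = userId;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof UserKey)) {
            return false;
        }
        UserKey that = (UserKey) other;
        return userId == that.userId && Objects.equals(tenant, that.tenant);
    }

    // Must use the same fields as equals(); a constant or poorly distributed value
    // would pile entries into a few buckets and slow every lookup.
    @Override
    public int hashCode() {
        return Objects.hash(tenant, userId);
    }
}
```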
4. Optimize collection access code
Initializing a collection with an estimated capacity when creating it can improve performance. Where possible, access elements directly instead of through repeated method calls.
5. NIO
Use MappedByteBuffer instead of traditional IO for reading and writing files.
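A minimal sketch of reading and writing a file through MappedByteBuffer. The class MappedFileIo is a hypothetical name, and checked IOExceptions are wrapped in UncheckedIOException only to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileIo {
    // Writes the text by mapping the file into memory and copying the bytes in.
    static void write(Path file, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE beyond the current size grows the file to fit.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes.length);
            buffer.put(bytes);
            buffer.force(); // flush the mapped region to the storage device
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file through a read-only mapping.
    static String read(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return StandardCharsets.UTF_8.decode(buffer).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Memory mapping tends to pay off for large files and random access; for small sequential reads a buffered stream is usually simpler.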
6. References
Caches can be implemented with weak or soft references where appropriate, for example WeakHashMap.
7. Exceptions
Avoid placing try/catch blocks inside loop bodies where possible.
8. Use bitwise operations instead of multiplication by powers of two
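A small sketch of the bitwise substitutions (the class ShiftMath is hypothetical, and the division and parity cases are added here for illustration). Modern JIT compilers often apply these rewrites themselves, so measure before relying on them, and note that a right shift rounds negative numbers toward negative infinity while integer division rounds toward zero.

```java
class ShiftMath {
    // x * 8 expressed as a left shift by 3 bits.
    static int timesEight(int x) {
        return x << 3;
    }

    // x / 16 expressed as an arithmetic right shift; identical only for x >= 0.
    static int divideBySixteen(int x) {
        return x >> 4;
    }

    // Parity check with a mask instead of x % 2.
    static boolean isEven(int x) {
        return (x & 1) == 0;
    }
}
```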

Six, Database optimization
1. Table splitting
Split a large table into several tables by record ID modulo or by a time dimension.
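On the Java side, table splitting usually means computing the physical table name before building the SQL. The sketch below shows both rules; the class TableRouter and the "base_suffix" naming scheme are assumptions for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class TableRouter {
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyyMM");

    // Routes a record to one of tableCount tables by ID modulo, e.g. orders_2.
    static String byIdModulo(String baseTable, long recordId, int tableCount) {
        // floorMod keeps the index non-negative even for negative IDs.
        long index = Math.floorMod(recordId, (long) tableCount);
        return baseTable + "_" + index;
    }

    // Routes a record to a per-month table, e.g. orders_201503.
    static String byMonth(String baseTable, LocalDate date) {
        return baseTable + "_" + date.format(MONTH);
    }
}
```

Modulo routing spreads load evenly but makes changing the table count expensive; time-based routing makes archiving old tables easy but concentrates writes on the newest table.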
2. Partitioning
The Oracle database supports partitioning, and a table can be partitioned by rules on the values of a column.
3. Intermediate tables
Process the raw data into a set of intermediate tables shaped like the desired target data, query the intermediate tables directly, and refresh them with scheduled jobs. Applicable occasions: real-time requirements on the data are not high, such as data analysis.
4. Archiving history
Rarely used data can be moved to a history table based on time, leaving only frequently used data in the main table; this can also be implemented by serializing objects. Applicable occasions: historical data is accessed very rarely.
5. Column-oriented storage
Infobright, an open source data warehouse based on MySQL, stores data with a high compression ratio and can speed up queries 5 to 60 times. The free version does not support DML statements or high concurrency (only a dozen or so concurrent queries); data is imported by loading CSV files. Applicable occasions: the data is not updated frequently and real-time requirements are not high, such as data analysis.
6. Query cache
The MySQL query cache and the Oracle result cache can be enabled by modifying the database configuration file. The SQL statement is the key and the result set is the cached value; when the underlying table changes, the corresponding cache entries are invalidated. Applicable occasions: the data is not updated frequently and the query method is fixed. Note: joins are supported, but functions are not.
7. Indexes
For complex SQL or queries over large data sets that take a long time, create indexes on the fields used in the query conditions to improve query speed.
8. SQL optimization
In SELECT statements, list the needed column names instead of using * to reduce the columns queried. Use the IN keyword sparingly; replace it with LEFT JOIN or the EXISTS keyword where possible.
9. Stored procedures
Stored procedures need to be compiled only once. Applicable occasions: complex operations on the database, such as multi-table queries, calculations and updates.
10. Database server clusters with read/write separation.

Seven, JVM tuning
1. Determine the heap memory size (-Xmx, -Xms).
2. Distribute the young generation and old generation sensibly (-XX:NewRatio, -Xmn, -XX:SurvivorRatio).
3. Determine the size of the permanent generation (-XX:PermSize, -XX:MaxPermSize). This applies to Java 7 and earlier; Java 8 replaced the permanent generation with Metaspace.
4. Choose a garbage collector (CMS, G1, etc.) and configure it sensibly.
5. Disable explicit GC (-XX:+DisableExplicitGC).
6. Disable class metadata reclamation (-Xnoclassgc).
7. Disable class verification (-Xverify:none).
8. Example parameters for increasing JVM memory: -Xms256m -Xmx1024m -XX:MaxNewSize=512m -XX:MaxPermSize=512m
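After changing these flags it helps to confirm what the running JVM actually received. The sketch below reads the effective maximum heap and the startup arguments through standard JDK APIs; the class JvmSettings is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JvmSettings {
    // Maximum heap the JVM will try to use, in megabytes (reflects -Xmx when it is set).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // JVM options passed at startup, such as -Xms256m or -XX:+DisableExplicitGC.
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMb());
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```

Running it with, for example, java -Xms256m -Xmx1024m JvmSettings should list both options among the JVM arguments.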

Eight, Requirement trade-offs: if the requirement behind a module that causes a performance bottleneck is dispensable, consider dropping that requirement.

Nine, Performance tuning tools
1. JMeter, LoadRunner: performance testing, stress testing.
2. JConsole, JProfiler: monitoring heap information, threads, permanent generation usage, class loading, etc.
3. VisualVM: fault diagnosis, performance monitoring.
