Original article: http://www.javabloger.com/article/java-development-concern-those-things.html
At the industry technology conferences recently hosted by IT media, many sites have been sharing their technology with peers, from giants such as Facebook and Baidu down to sites that have only just launched. The technology and extraordinary processing power of large sites like Facebook and Baidu are impressive, but not every site has hundreds of millions of users or huge volumes of data to store, and not every site needs technologies such as MapReduce/parallel computing or HBase/column stores. Technology always exists to support operations. If what you have suits your current operating environment, there is no need to chase fashion and insist on some connection to whatever technology is popular.
At recent technical conferences we have paid most attention to these large sites, but the technical systems of small and medium-sized portals are also worth exploring. Not all engineers in the world work for large portals. Many more work in obscurity on fledgling small and medium-sized web services, and they make up more than 60% of the profession. So while we watch the large portals, the development and hands-on experience of small and medium-sized sites deserves to be shared as well.
Large portals and small and medium-sized vertical sites alike pursue stability, performance, and scalability. The experience shared by large sites is worth learning and borrowing from, but not all of it can be applied in practice at every site. I won't say much about sites built in other languages, but for systems developed in Java I can offer a few words:
JVM
The JVM parameters configured for the JEE container running on the JVM directly affect the performance and processing capacity of the whole system. JVM tuning is mainly about memory management, and there are four directions for optimization:
1. HeapSize: the size of the heap, that is, the Java virtual machine's strategy for using memory. This is crucial.
2. GarbageCollector: use the related parameters to choose among the four garbage collector algorithms (policies) in Java.
3. StackSize: the stack is the JVM's memory instruction area. Each thread has its own stack, so the stack size limits the number of threads.
4. Debug/Log: the JVM can also be configured to write logs while it runs and after it crashes. This is critical, because the appropriate parameters are chosen based on the JVM's log output.
JVM configuration tips are everywhere on the web, but I recommend reading two articles from Sun's official website to get a solid understanding of the configuration parameters:
1. Java HotSpot VM Options http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
2. Troubleshooting Guide for Java SE 6 with HotSpot VM http://www.oracle.com/technetwork/java/javase/index-137495.html
Also, I doubt every engineer works with these JVM parameters every day. If you forget the key parameters, you can run java -X (capital X) to print a reminder.
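To check which settings a running JVM actually picked up, something like the following minimal sketch may help. The class name JvmSettings is hypothetical; it relies only on the standard Runtime and RuntimeMXBean APIs.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class JvmSettings {
    /** Maximum heap the JVM will try to use, in megabytes (reflects -Xmx or the default). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Startup flags passed to this JVM, for example -Xms, -Xmx, -Xss or GC options. */
    static List<String> startupFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB): " + maxHeapMb());
        System.out.println("JVM flags: " + startupFlags());
    }
}
```

Started with, say, java -Xms256m -Xmx512m -Xss256k JvmSettings, it should list those three flags and report a maximum heap close to the -Xmx value.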
JDBC
The parameters for MySQL JDBC were also described in previous articles. Whether on a single machine or in a cluster, the configuration parameters used in JDBC can have a significant impact on database operations.
Some so-called high-performance Java ORM open source frameworks simply enable or adjust many of JDBC's default parameters:
1. For example: autoReconnect, prepStmtCacheSize, cachePrepStmts, useNewIO, blobSendChunkSize, and so on.
2. For example, in a cluster environment: roundRobinLoadBalance, failOverReadOnly, autoReconnectForPools, secondsBeforeRetryMaster.
Details can be found in MySQL's official user manual for JDBC:
http://dev.mysql.com/doc/refman/5.1/zh/connectors.html#cj-jdbc-reference
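These parameters are normally appended to the JDBC URL as a query string. The sketch below only assembles such a URL. The class name, the host db.example.com, the database appdb, and the chosen values are all illustrative assumptions, not tuning advice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class MySqlUrlBuilder {
    /** Joins connection properties into a jdbc:mysql URL query string. */
    static String build(String host, int port, String database, Map<String, String> props) {
        String query = props.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        String base = "jdbc:mysql://" + host + ":" + port + "/" + database;
        return query.isEmpty() ? base : base + "?" + query;
    }

    /** Example settings; host, database and values are placeholders. */
    static String exampleUrl() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("autoReconnect", "true");
        props.put("cachePrepStmts", "true");
        props.put("prepStmtCacheSize", "250");
        return build("db.example.com", 3306, "appdb", props);
    }

    public static void main(String[] args) {
        System.out.println(exampleUrl());
    }
}
```

Keeping the properties in a map makes it easy to use one set for a single machine and another for a cluster without editing long URL strings by hand.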
Database connection pool (DataSource)
Frequent interaction between an application and the database to open and close connections becomes a system bottleneck and adds significant overhead that hurts performance. A JDBC connection pool is responsible for allocating, managing, and releasing database connections. It lets the application reuse an existing connection instead of establishing a new one, so the application does not have to open and close database connections frequently. The pool can also release connections that have been idle longer than the maximum idle time, which avoids connection leaks caused by connections that were never released. This technique can significantly improve the performance of database operations.
There is one point I think is worth noting:
Connections obtained from a pool still need to be closed. The pool starts in advance and obtains its connections from the database, and after that the application no longer deals with the database directly. The application uses the pool on a "borrow" basis: resources taken from the pool are on loan and must be returned. Imagine 20 buckets beside a pool of water. Anyone who needs water can take a bucket, but if 20 people fetch water and nobody returns a bucket, everyone who arrives later can only wait until one is handed back. Whoever has finished must put the bucket back, or the people behind will keep waiting and the resource becomes congested. In the same way, when an application asks for a database connection, the Connection object is allocated from the "pool", and when the application has finished with it the connection must be returned, keeping to the "borrow it, return it" rule.
Reference:
http://dev.mysql.com/doc/refman/5.1/zh/connectors.html#cj-connection-pooling
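The bucket analogy can be sketched with a tiny generic pool. BorrowPool is a hypothetical teaching class, not a real connection pool: it only shows that a borrower gets nothing until someone hands a resource back.

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class BorrowPool<T> {
    private final BlockingQueue<T> idle;

    BorrowPool(Collection<T> resources) {
        // All resources start out idle; the capacity equals the number of "buckets".
        this.idle = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    /** Borrows a resource, or returns null if none is handed back within the timeout. */
    T borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** Returns a borrowed resource so that the next caller can use it. */
    void giveBack(T resource) {
        idle.offer(resource);
    }
}
```

With a real JDBC pool, closing a pooled Connection is typically what returns it to the pool, so wrapping the connection in try-with-resources is the usual way to make sure it always goes back.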
Data access
Optimizing the database server and data access means thinking about which data belongs where. Storage in the future is likely to be mixed: cache, NoSQL, DFS, and database will all exist in one system. The tableware we use and the clothes we wear every day both have to be kept at home, but not in the same kind of furniture, and nobody seems to keep cutlery and clothes in the same closet. Data in a system is the same: different types of data need appropriate storage environments. For files and pictures, classify first by how hot the access is, or by file size. Strongly relational data that needs transaction support belongs in a traditional database; weakly relational data that does not need transactions can go to NoSQL; massive file storage can use a DFS with network storage support; and whether to use a cache depends on the size of each data item and the read/write ratio.
Read/write separation is also worth noting. Whether in a database or a NoSQL environment, reads mostly outnumber writes, so the design must not only spread reads across multiple machines but also consider data consistency between those machines. With MySQL, use one master and multiple slaves, then add MySQL Proxy or borrow some JDBC parameters (roundRobinLoadBalance, failOverReadOnly, autoReconnectForPools, secondsBeforeRetryMaster). Later application development can then separate reads from writes, spreading heavy read pressure across multiple machines while keeping data consistent.
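At the application level, the routing idea behind read/write separation can be sketched as follows. ReadWriteRouter and the node names are hypothetical, and the sketch deliberately ignores replication lag, so it says nothing about how consistency is guaranteed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class ReadWriteRouter {
    private final String master;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String master, List<String> replicas) {
        this.master = master;
        this.replicas = new ArrayList<>(replicas);
    }

    /** Writes always go to the single master. */
    String forWrite() {
        return master;
    }

    /** Reads rotate over the replicas; with no replicas they fall back to the master. */
    String forRead() {
        if (replicas.isEmpty()) {
            return master;
        }
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }
}
```

MySQL Proxy or the JDBC parameters mentioned above do this routing outside the application code; the sketch only shows what is being decided for each statement.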
Caching
Broadly speaking, caches fall into two kinds: local caches and distributed caches.
1. Local cache. In Java, a local cache means keeping data in static data structures and reading it from there when needed. For highly concurrent environments, ConcurrentHashMap or CopyOnWriteArrayList is recommended as the local cache. Using a cache really means using system memory, and the share of memory given to it has to be kept in proportion: using more than is appropriate is counterproductive and makes the whole system run inefficiently.
2. Distributed cache. Commonly used in distributed environments, it stores the cache of every machine centrally. Besides caching, it can also serve as a means of data synchronization and transfer in a distributed system. Memcached and Redis are the most commonly used.
Read and write efficiency differs across storage media, so think about how to use caches in your system to bring your data closer to the CPU. There is a picture worth always keeping in mind, a classic from Google's technical expert Jeff Dean (REF):
[Figure: Jeff Dean's picture comparing access costs of different storage media]
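The local cache from item 1 could look roughly like the sketch below. LocalCache and its maxEntries limit are assumptions made for illustration; the limit is a crude way to keep memory use in proportion and is only approximate under heavy concurrency.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class LocalCache<K, V> {
    private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<>();
    private final int maxEntries;

    LocalCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Returns the cached value, loading it on a miss. */
    V get(K key, Function<K, V> loader) {
        V cached = entries.get(key);
        if (cached != null) {
            return cached;
        }
        if (entries.size() >= maxEntries) {
            // Cache is full: serve the value without storing it, to bound memory use.
            return loader.apply(key);
        }
        return entries.computeIfAbsent(key, loader);
    }

    int size() {
        return entries.size();
    }
}
```

A production cache would also need eviction and expiry, which this sketch leaves out on purpose.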
Concurrency/Multithreading
In highly concurrent environments, developers are advised to use the concurrency package (java.util.concurrent) that ships with the JDK. Since JDK 1.5, the utility classes under java.util.concurrent have simplified multithreaded development. They fall into the following main groups:
1. Thread pools. The thread pool interfaces (Executor, ExecutorService) and implementation classes (ThreadPoolExecutor, ScheduledThreadPoolExecutor) make up the JDK's thread pool framework, which manages task queuing and scheduling and allows controlled shutdown. Running a thread consumes CPU resources, and creating and destroying threads adds further overhead, so a thread pool is not only an effective way to manage multiple threads but also makes thread use more efficient.
2. Queues. The ConcurrentLinkedQueue class provides an efficient, scalable, thread-safe, non-blocking FIFO queue. Five implementations in java.util.concurrent support the extended BlockingQueue interface, which defines blocking versions of put and take: LinkedBlockingQueue, ArrayBlockingQueue, SynchronousQueue, PriorityBlockingQueue, and DelayQueue. These classes cover the most common usage contexts for producer-consumer, messaging, parallel tasking, and related concurrent designs.
3. Synchronizers. Four classes aid common special-purpose synchronization idioms. Semaphore is a classic concurrency tool. CountDownLatch is a very simple yet very common utility for blocking until a given number of signals, events, or conditions hold. CyclicBarrier is a resettable multiway synchronization point that is useful in some styles of parallel programming. Exchanger allows two threads to exchange objects at a rendezvous point, which is useful in several pipeline designs.
4. Concurrent collections. The package also provides Collection implementations designed for multithreaded contexts: ConcurrentHashMap, ConcurrentSkipListMap, ConcurrentSkipListSet, CopyOnWriteArrayList, and CopyOnWriteArraySet. When many threads are expected to access a given collection, ConcurrentHashMap is usually preferable to a synchronized HashMap, and ConcurrentSkipListMap is usually preferable to a synchronized TreeMap. CopyOnWriteArrayList is preferable to a synchronized ArrayList when the expected number of reads and traversals greatly outnumbers the number of updates to the list.
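As a small illustration of how a thread pool and a synchronizer work together, the following sketch runs many tiny tasks on a fixed pool and waits for them with a CountDownLatch. The class PooledSum and the task itself are made up for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

final class PooledSum {
    /** Sums 1..n by running one small task per number on a fixed-size thread pool. */
    static long sumTo(int n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(n);
        AtomicLong total = new AtomicLong();
        try {
            for (int i = 1; i <= n; i++) {
                final int value = i;
                pool.execute(() -> {
                    total.addAndGet(value);
                    done.countDown();
                });
            }
            done.await(); // blocks until every task has counted down
            return total.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            pool.shutdown(); // controlled shutdown: no new tasks are accepted
        }
    }

    public static void main(String[] args) {
        System.out.println(sumTo(100, 4));
    }
}
```

The pool reuses its four threads for all hundred tasks, which is the saving in thread creation and destruction that item 1 describes.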
Queues
Queues can be divided into two classes: local queues and distributed queues.
Local queues: commonly used for batch writes of data that is not time-sensitive. Incoming data can be buffered until a certain amount has accumulated and then written in batches. This can be implemented with a BlockingQueue or a List/Map.
Related information: Sun Java API.
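A local queue used for batch writes might be sketched like this. BatchBuffer is a hypothetical class; the actual bulk write to the database is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class BatchBuffer<T> {
    private final BlockingQueue<T> pending = new LinkedBlockingQueue<>();
    private final int batchSize;

    BatchBuffer(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Called by producers; does not block because the queue is unbounded. */
    void add(T item) {
        pending.add(item);
    }

    /** Removes up to batchSize items in arrival order for one bulk write. */
    List<T> nextBatch() {
        List<T> batch = new ArrayList<>(batchSize);
        pending.drainTo(batch, batchSize);
        return batch;
    }
}
```

A background thread or scheduled task would call nextBatch() periodically and write each non-empty batch in a single statement.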
Distributed queues: generally used as message middleware, forming the communication bridge between systems and subsystems in a distributed environment. In JEE environments the most widely used are Apache ActiveMQ and Sun's OpenMQ.
I have introduced lightweight MQ middleware to you before, for example Kestrel and Redis (Ref http://www.javabloger.com/article/mq-kestrel-redis-for-java.html). I have also heard recently that LinkedIn's search technology team has launched an MQ product, Kafka (Ref http://sna-projects.com/kafka), which is worth keeping an eye on.
Related information:
1. ActiveMQ http://activemq.apache.org/getting-started.html
2. OpenMQ http://mq.java.net/about.html
3. Kafka http://sna-projects.com/kafka
4. JMS articles http://www.javabloger.com/article/category/jms
NIO
NIO first appeared in JDK 1.4. Before Java 1.4, the JDK provided a stream-oriented I/O system in which, for example, reading or writing a file processed data one byte at a time: an input stream produced one byte of data and an output stream consumed one byte of data. Stream-oriented I/O is very slow, and a data packet has either been received in full or not received yet. Java NIO's non-blocking technique actually follows the Reactor pattern: the program is notified automatically when content arrives, so it does not have to wait in a busy loop, which greatly improves system performance. In practice NIO is mostly used in two areas: file reads and writes, and network data streams. There are three core objects in NIO to master: 1. selectors (Selector), 2. channels (Channel), 3. buffers (Buffer).
A few remarks of my own:
1. Within Java NIO, memory-mapped files are an efficient way to separate hot and cold data held in a cache, by moving some of the cold data out to files. This approach is much faster than regular stream-based or channel-based I/O. It makes the data in a file appear as the contents of an in-memory array; only the parts of the file that are actually read or written are mapped into memory, rather than the whole file being read in.
2. The MySQL JDBC driver can also use NIO to talk to the database (see the useNewIO parameter above), which can improve system performance.
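The memory-mapped file idea from the first remark can be sketched as below. MappedRead, the temporary file, and its contents are assumptions for the demo; only the requested region of the file is mapped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedRead {
    /** Maps only [offset, offset + length) of the file and decodes it as UTF-8. */
    static String slice(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer region = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] bytes = new byte[length];
            region.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Writes a small temporary file and reads three bytes from the middle of it. */
    static String demo() {
        try {
            Path tmp = Files.createTempFile("mapped", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, "hello nio world".getBytes(StandardCharsets.UTF_8));
            return slice(tmp, 6, 3);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

For large files of cold data the benefit is that the operating system pages in only the mapped region on demand instead of the program copying the whole file into the heap.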
Long Connection/Servlet 3.0
The long connection here means long polling. Previously, a browser (client) that cared about data changes on the server had to keep requesting the server, so a large number of clients inevitably put heavy pressure on the server, for example with in-site messages on a forum. The Servlet 3.0 specification now offers a new feature, asynchronous I/O communication, which keeps a long connection open. This technique, built on Servlet 3 asynchronous requests, can greatly ease the pressure on the server.
The principle in Servlet 3.0 is that the request is suspended with a wait timeout. If a background event fires in the meantime, the result is returned to the waiting client request. If the wait time is exceeded and no event has occurred, the request is returned to the client anyway, and the client issues the request again, so the client-server interaction repeats in a cycle.
It is like coming to me and saying that if anyone comes looking for you, I should tell you immediately so you can go and see them. Originally you had to keep asking me whether anyone was looking for you, whether or not anyone actually was, and both the one asking and the one being asked ended up exhausted.
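The suspend, notify, or time out cycle can be simulated in plain Java. The sketch below does not use the Servlet API at all: LongPollHub is a hypothetical stand-in in which a CompletableFuture plays the role of the suspended request. In a real container the equivalent would be built on the Servlet 3.0 asynchronous support (HttpServletRequest.startAsync and AsyncContext).

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class LongPollHub {
    static final String NO_EVENT = "no-event";
    private final Queue<CompletableFuture<String>> waiting = new ConcurrentLinkedQueue<>();

    /** A client parks its request here instead of asking again and again. */
    CompletableFuture<String> park() {
        CompletableFuture<String> request = new CompletableFuture<>();
        waiting.add(request);
        return request;
    }

    /** A background event answers every parked request at once. */
    void publish(String event) {
        CompletableFuture<String> request;
        while ((request = waiting.poll()) != null) {
            request.complete(event);
        }
    }

    /** Waits for the event, or gives up after the timeout so the client can ask again. */
    String await(CompletableFuture<String> request, long timeoutMillis) {
        try {
            return request.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            waiting.remove(request);
            return NO_EVENT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return NO_EVENT;
        } catch (ExecutionException e) {
            return NO_EVENT;
        }
    }
}
```

The client side then becomes a loop: park a request, handle the answer, and park again, whether the answer was an event or a timeout.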
Log
Log4j is what most people use. During early operation the log level is generally set to INFO, and once the system is truly live it is generally set to ERROR. At any stage, though, the log entries deserve attention: developers rely on log output to find problems and to optimize system performance. The log is both the system's status report and the basis for troubleshooting errors.
Simply put, logs are written to different destinations according to the policies and levels you define, so that they can be analyzed and managed. Without an output policy, many machines running for a long time will produce a big pile of messy logs, and when something goes wrong you will not know where to start. An output policy is therefore the key to using logs well.
References: http://logging.apache.org/log4j/1.2/manual.html
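The level policy can be illustrated with the JDK's built-in java.util.logging, used here as a stand-in for Log4j so that the sketch has no external dependency. Level.SEVERE plays roughly the role of Log4j's ERROR; the class LogPolicy and the environment name "production" are assumptions.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LogPolicy {
    /** Picks a level per environment: verbose before launch, errors only in production. */
    static Level levelFor(String environment) {
        return "production".equals(environment) ? Level.SEVERE : Level.INFO;
    }

    /** Returns a named logger with the level chosen for the environment. */
    static Logger configure(String name, String environment) {
        Logger logger = Logger.getLogger(name);
        logger.setLevel(levelFor(environment));
        return logger;
    }
}
```

With Log4j itself the same policy would normally live in its configuration file, with separate appenders deciding where each level is written.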
Package/Deploy
It is a good idea to split different kinds of functional modules into separate projects in the IDE, so that different jar packages can easily be deployed to different environments. Consider this scenario: every day the system must periodically fetch 100 news items and the weather forecasts for some cities from a remote SP. The daily data volume is small, but front-end access is highly concurrent, so the architecture clearly needs read/write separation.
If the web application and the scheduled fetch module are packaged in a single project, then scaling out means the web application and the timer are both deployed on every machine. Because the modules are not separated, the timer runs on each machine and writes duplicate data into the database.
If the web application and the timer are developed as two projects, they can be packaged and deployed separately: ten web instances to one timer, which spreads the pressure of front-end requests while the data is written only once.
Another benefit is sharing. In this scenario both the web application and the timer need to read the database, so both projects contain database code and the code logic still feels messy. If the DAL layer is extracted into its own jar, developers of the web and timer modules only need to reference that jar and develop the business logic against interfaces, without worrying about the specific database operations, which other developers implement. The division of labor during development is then very clear, and the teams do not interfere with one another.
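A sketch of that split, under the assumption of a hypothetical NewsDao interface shipped in the DAL jar: the timer and web modules see only the interface, and the in-memory implementation stands in for the real database code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Contract published by the hypothetical DAL jar. */
interface NewsDao {
    void save(String headline);
    List<String> latest(int limit);
}

/** Stand-in implementation; a real DAL jar would talk to the database here. */
final class InMemoryNewsDao implements NewsDao {
    private final List<String> rows = new CopyOnWriteArrayList<>();

    public void save(String headline) {
        rows.add(headline);
    }

    public List<String> latest(int limit) {
        List<String> copy = new ArrayList<>(rows);
        return copy.subList(Math.max(0, copy.size() - limit), copy.size());
    }
}

/** Timer module: the only writer, deployed once. */
final class FetchTimer {
    private final NewsDao dao;
    FetchTimer(NewsDao dao) { this.dao = dao; }

    void runOnce(List<String> fetched) {
        for (String headline : fetched) {
            dao.save(headline);
        }
    }
}

/** Web module: read-only, deployed on many machines. */
final class NewsPage {
    private final NewsDao dao;
    NewsPage(NewsDao dao) { this.dao = dao; }

    String render() {
        return String.join(" | ", dao.latest(2));
    }
}
```

Because both modules depend only on NewsDao, the database implementation can change inside the DAL jar without touching either of them.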
Framework
The popular SSH (Struts/Spring/Hibernate) "lightweight" framework is not lightweight at all for many small and medium projects. Developers have to maintain not only the code but also cumbersome XML configuration files, and a single mistake in one configuration file can stop the whole project from running. There are plenty of products that need no configuration files and can replace the SSH (Struts/Spring/Hibernate) framework, and I have introduced several of them to you before (REF).
I am not simply against using the SSH (Struts/Spring/Hibernate) framework. In my view its real value is standardizing development; using it does not improve performance much.
The SSH framework suits companies whose project teams run to a hundred people or more and are still growing. Such companies need to choose technology that is recognized by the market and familiar to many people, and because the SSH (Struts/Spring/Hibernate) framework is relatively mature it is the first choice.
But a small team with technical experts can choose a more concise framework that really speeds up development. For small-team development, abandoning the SSH framework early in favor of more concise technology is the wiser choice.
–end–