[Java Performance] Database Performance Best Practices: JPA and Read/Write Optimizations

Source: Internet
Author: User

Database Performance Best Practices

When an application relies on a database, the application's performance is bounded by the database's performance. For example, the database may have limited I/O capacity, or a missing index may force a SQL statement to scan the entire table. Problems like these cannot be solved by optimizing application code alone; you also need to understand how the database works and what its characteristics are.

Sample Database

The sample database holds stock price information for 128 stocks over 1 year (261 business days).

There are two tables: StockPrice and StockOptionPrice. The primary key of StockPrice is the stock symbol together with the date field, and the table has 33,408 records (128 * 261). StockOptionPrice stores 5 options per day for each stock; its primary key is the stock symbol, the date field, and an integer field that holds the option number. It has 167,040 records (128 * 261 * 5).
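The record counts can be checked with a few lines of Java. This is only an arithmetic sketch; the class and method names are hypothetical and are not part of the sample application.

```java
// Hypothetical helper: reproduces the row counts quoted for the sample database.
class SampleDatabaseSize {
    static final int STOCKS = 128;           // number of stock symbols
    static final int BUSINESS_DAYS = 261;    // business days in the sample year
    static final int OPTIONS_PER_DAY = 5;    // options stored per stock per day

    // One StockPrice row per (symbol, date) pair.
    static int stockPriceRows() {
        return STOCKS * BUSINESS_DAYS;
    }

    // One StockOptionPrice row per (symbol, date, option number) triple.
    static int stockOptionPriceRows() {
        return stockPriceRows() * OPTIONS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println("StockPrice rows:       " + stockPriceRows());
        System.out.println("StockOptionPrice rows: " + stockOptionPriceRows());
    }
}
```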

JPA

The JDBC driver that JPA uses has the most significant impact on JPA performance. Beyond that, several other factors also affect it.

JPA implementations improve performance by enhancing the bytecode of entity classes. In a Java EE environment this is transparent to the user. In a Java SE environment, however, you must make sure the bytecode enhancement is actually performed correctly. Otherwise, a variety of problems can hurt JPA performance, such as:

    • Fields that should be lazily loaded are eagerly loaded instead.
    • Fields are saved to the database redundantly.
    • Data that should be stored in the JPA cache is not stored there, resulting in unnecessary refetch operations.

Bytecode enhancement is usually performed as part of compilation. After the entity classes are compiled, they are passed to a postprocessor (which is implementation-specific; EclipseLink and Hibernate each use their own) that enhances the bytecode and produces the optimized class files.

Some JPA implementations can also enhance the bytecode dynamically as classes are loaded into the JVM. To do so, you specify an agent for the JVM as a startup argument. For example, to use this feature with EclipseLink, pass: -javaagent:path_to/eclipselink.jar

Transaction Handling

JPA can be used in both Java SE and Java EE applications. The difference lies in how transactions are handled.

In Java EE, JPA transactions are part of the application server's Java Transaction API (JTA) implementation. JTA offers two ways to demarcate transaction boundaries:

    • Container-managed transactions (CMT)
    • User-managed transactions (UMT)

As the names imply, CMT delegates transaction demarcation to the container, whereas UMT requires the application to specify the boundaries itself. When both are used sensibly, there is no significant performance difference between them. Used improperly, however, performance can differ, especially with UMT, where a transaction scope that is too large or too small can have a significant impact. Put another way, CMT offers a general-purpose, middle-of-the-road way of handling transaction boundaries that is usually safer, while UMT is more flexible, but that flexibility pays off only if the user understands transactions well.

    @Stateless
    public class Calculator {
        @PersistenceContext(unitName = "Calc")
        EntityManager em;

        @TransactionAttribute(REQUIRED)
        public void calculate() {
            Parameters p = em.find(...);
            ... perform expensive calculation ...
            em.persist(... answer ...);
        }
    }

The code above uses CMT (through the @TransactionAttribute annotation), so the transaction spans the entire method. If the isolation level is REPEATABLE READ, the data the transaction has read stays locked while the expensive calculation runs, which hurts performance.

UMT gives you more flexibility:

    @Stateless
    public class Calculator {
        @PersistenceContext(unitName = "Calc")
        EntityManager em;

        public void calculate() {
            UserTransaction ut = ... lookup ut in application server ...;
            ut.begin();
            Parameters p = em.find(...);
            ut.commit();
            ... perform expensive calculation ...
            ut.begin();
            em.persist(... answer ...);
            ut.commit();
        }
    }

The calculate method above does not use the @TransactionAttribute annotation. Instead, it demarcates two separate transactions inside the method and keeps the expensive calculation outside both of them. The same logic could be implemented with CMT by splitting the work into three methods, but UMT is clearly more convenient and flexible here.
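The difference between the two Calculator variants can be illustrated without an application server. The following is a plain-Java model under stated assumptions, not JPA or JTA code: a ReentrantLock stands in for the row locks held by a REPEATABLE READ transaction, and the class name TransactionScopeModel and its methods are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.IntUnaryOperator;

// Plain-Java model, not JPA: a ReentrantLock stands in for the row locks
// that a REPEATABLE READ transaction would hold on the data it has read.
class TransactionScopeModel {
    final ReentrantLock rowLock = new ReentrantLock();
    int stored;                                   // stands in for the persisted answer

    // One wide transaction (CMT style): the lock spans the calculation.
    void calculateInOneTransaction(int parameter, IntUnaryOperator expensive) {
        rowLock.lock();                           // "begin"
        try {
            int p = parameter;                    // "em.find(...)"
            stored = expensive.applyAsInt(p);     // calculation runs under the lock
        } finally {
            rowLock.unlock();                     // "commit"
        }
    }

    // Two narrow transactions (UMT style): the calculation runs unlocked.
    void calculateInTwoTransactions(int parameter, IntUnaryOperator expensive) {
        int p;
        rowLock.lock();                           // first "begin"
        try {
            p = parameter;                        // "em.find(...)"
        } finally {
            rowLock.unlock();                     // first "commit"
        }
        int answer = expensive.applyAsInt(p);     // no lock is held here
        rowLock.lock();                           // second "begin"
        try {
            stored = answer;                      // "em.persist(...)"
        } finally {
            rowLock.unlock();                     // second "commit"
        }
    }
}
```

In the first method the lock is held for as long as the calculation takes; in the second it is held only for the short read and write steps. The price of the second form is that the data read in the first transaction may have changed by the time the answer is stored.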

In a Java SE environment, the EntityManager supplies the transaction object, but the application still has to demarcate the transaction boundaries itself. For example:

    public void run() {
        for (int i = startStock; i < numStocks; i++) {
            EntityManager em = emf.createEntityManager();
            EntityTransaction txn = em.getTransaction();
            txn.begin();
            while (!curDate.after(endDate)) {
                StockPrice sp = createRandomStock(curDate);
                if (sp != null) {
                    em.persist(sp);
                    for (int j = 0; j < 5; j++) {
                        StockOptionPriceImpl sop = createRandomOption(sp.getSymbol(), sp.getDate());
                        em.persist(sop);
                    }
                }
                curDate.setTime(curDate.getTime() + msPerDay);
            }
            txn.commit();
            em.close();
        }
    }

In the code above, the entire while loop runs inside one transaction. As with JDBC transactions, there is always a tradeoff between the scope of a transaction and how often transactions are committed; the next section gives some figures for reference.
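To get a feel for that tradeoff, the sketch below counts how many commits a given transaction size implies for the sample data. It is arithmetic only, and it assumes that each per-stock transaction covers all 261 business days with one StockPrice row and five option rows per day; the class and method names are hypothetical.

```java
// Plain-Java model, not JPA: counts commits for a given transaction size.
class CommitFrequencyModel {
    static final int STOCKS = 128;
    static final int BUSINESS_DAYS = 261;
    static final int ROWS_PER_DAY = 1 + 5;    // one StockPrice plus five options

    // Rows written inside one per-stock transaction (assumed: every business day).
    static int rowsPerStockTransaction() {
        return BUSINESS_DAYS * ROWS_PER_DAY;
    }

    // Rows written for all stocks together.
    static long totalRows() {
        return (long) STOCKS * rowsPerStockTransaction();
    }

    // Commits needed when a transaction is committed every rowsPerCommit rows.
    static long commitsFor(long totalRows, int rowsPerCommit) {
        if (rowsPerCommit < 1) {
            throw new IllegalArgumentException("rowsPerCommit must be >= 1");
        }
        return (totalRows + rowsPerCommit - 1) / rowsPerCommit;   // ceiling division
    }
}
```

Committing once per stock gives 128 commits of 1,566 rows each, while committing after every row would give 200,448 commits. Which end of that range performs better depends on the database and is exactly the tradeoff described above.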

Summary

    1. If you understand UMT well, managing transactions explicitly with UMT can give better performance.
    2. If you want to use CMT for transaction management, you can narrow the scope of a transaction by splitting a method into several methods.

JPA Write Optimizations

JDBC has two key performance optimizations:

    • Reusing PreparedStatement objects
    • Using batch updates

JPA can perform both optimizations as well, but they are not enabled by calling JPA APIs directly, and the way to enable them differs between JPA implementations. For Java SE applications, you usually need to set implementation-specific properties in the persistence.xml file.

For example, in EclipseLink, the JPA reference implementation, reusing PreparedStatement objects requires adding a property to persistence.xml:

    <property name="eclipselink.jdbc.cache-statements" value="true"/>

If the JDBC driver provides its own statement pool, enabling it there is usually preferable to enabling the JPA-level cache. After all, JPA is built on top of the JDBC driver.
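What a statement cache saves can be shown with a small model. This is a plain-Java sketch, not how EclipseLink or any JDBC driver implements its cache; PreparedHandle is a hypothetical stand-in for java.sql.PreparedStatement, and no database is involved.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of a statement cache; PreparedHandle is a hypothetical
// stand-in for java.sql.PreparedStatement, and no database is involved.
class StatementCacheModel {
    static final class PreparedHandle {
        final String sql;
        PreparedHandle(String sql) { this.sql = sql; }
    }

    private final Map<String, PreparedHandle> cache = new HashMap<>();
    private int prepareCount;          // how many times a statement was "prepared"

    // Returns the cached handle for this SQL text, preparing it only on a miss.
    PreparedHandle prepare(String sql) {
        return cache.computeIfAbsent(sql, text -> {
            prepareCount++;
            return new PreparedHandle(text);
        });
    }

    int prepareCount() { return prepareCount; }
}
```

Executing the same INSERT text a hundred times costs one prepare step in this model instead of a hundred; only the parameter binding is repeated.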

To enable batch updates, add these properties to persistence.xml:

    <property name="eclipselink.jdbc.batch-writing" value="jdbc"/>
    <property name="eclipselink.jdbc.batch-writing.size" value="10000"/>

The size of a batch can be controlled not only through eclipselink.jdbc.batch-writing.size, but also by calling the flush method on the EntityManager, which makes all pending statements execute immediately.
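The interaction between a batch size and an explicit flush can be sketched as follows. This is a plain-Java model that only counts simulated round trips; it makes no JDBC or JPA calls, and the class name BatchWriterModel is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of batched writes; no JDBC or JPA calls are made.
class BatchWriterModel {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private int roundTrips;            // simulated trips to the database
    private int rowsWritten;

    BatchWriterModel(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    // Queues one statement; sends the batch once it reaches batchSize.
    void write(String statement) {
        pending.add(statement);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is queued right away, like an explicit flush call.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        roundTrips++;
        rowsWritten += pending.size();
        pending.clear();
    }

    int roundTrips() { return roundTrips; }
    int rowsWritten() { return rowsWritten; }
}
```

With a batch size of 10,000, writing 25,000 rows and then flushing takes three round trips in this model, compared with 25,000 when every row is sent on its own.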

The following table shows how execution time differs with each combination of optimization options:

    Optimization options                      Time
    No batch update, no statement cache       240 s
    No batch update, with statement cache     200 s
    Batch update, no statement cache          23.37 s
    Batch update, with statement cache        21.08 s
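
Dividing the timings in the table shows where the gain comes from: batching accounts for roughly a tenfold speedup, while the statement cache adds a factor of about 1.1 to 1.2. A small sketch of that arithmetic (the class and constant names are hypothetical):

```java
// Derives speedup factors from the timings in the table above.
class WriteTimingComparison {
    static final double NO_BATCH_NO_CACHE = 240.0;   // seconds
    static final double NO_BATCH_CACHE = 200.0;
    static final double BATCH_NO_CACHE = 23.37;
    static final double BATCH_CACHE = 21.08;

    // How many times faster the optimized run is than the baseline.
    static double speedup(double baselineSeconds, double optimizedSeconds) {
        return baselineSeconds / optimizedSeconds;
    }

    public static void main(String[] args) {
        System.out.printf("cache only:      %.2fx%n", speedup(NO_BATCH_NO_CACHE, NO_BATCH_CACHE));
        System.out.printf("batch only:      %.2fx%n", speedup(NO_BATCH_NO_CACHE, BATCH_NO_CACHE));
        System.out.printf("batch and cache: %.2fx%n", speedup(NO_BATCH_NO_CACHE, BATCH_CACHE));
    }
}
```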

Summary

    1. Like JDBC applications, JPA applications can improve performance by limiting the number of write calls made to the database.
    2. Statement caching can be done at either the JPA layer or the JDBC layer; if the JDBC driver provides it, prefer the JDBC-layer implementation.
    3. Batch updates in JPA can be controlled in two ways: declaratively (by adding properties to persistence.xml) or by calling the flush method.

JPA Read Optimization

Because the JPA cache is involved, JPA read operations are more complex than they might seem. And because JPA takes the cache into account, the SQL it generates is not always optimal.

JPA reads data in three situations:

    • When the EntityManager's find method is called
    • When a JPA query is executed
    • When an entity navigates to other entities it is related to

In the first two cases, you can control whether only some columns or the whole row of the entity's table is read, and whether the entities related to it are read as well.

Read as Little Data as Possible

You can mark a field as lazily loaded so that it is not read together with the rest of the object. When the entity is read, fields declared lazy are left out of the generated SQL statement. JPA reads such a field only later, when its getter method is called. For basic types, lazy loading is seldom worthwhile because they hold little data. For BLOB or CLOB fields, however, it is well worth it:

    @Lob
    @Column(name = "IMAGEDATA")
    @Basic(fetch = FetchType.LAZY)
    private byte[] imageData;

The imageData field above is lazily loaded because it is large and not often used. The benefits are:

    • The SQL executes faster.
    • Memory is saved and GC pressure is reduced.

Note also that the lazy-loading annotation (fetch = FetchType.LAZY) is only a hint to the JPA implementation. When the read actually happens, JPA may ignore it.
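The load-on-first-access behavior can be modeled in plain Java. This sketch is not what a JPA provider generates through bytecode enhancement; the Supplier merely stands in for the extra SQL statement issued on first access, and the class name LazyFieldModel is hypothetical.

```java
import java.util.function.Supplier;

// Plain-Java model of a lazily loaded field; the Supplier stands in for the
// extra SQL statement a JPA provider would issue on first access.
class LazyFieldModel {
    private final Supplier<byte[]> loader;
    private byte[] imageData;          // stays null until first requested
    private boolean loaded;
    private int loadCount;             // how many times the loader ran

    LazyFieldModel(Supplier<byte[]> loader) {
        this.loader = loader;
    }

    // The getter triggers the load once and then reuses the result.
    byte[] getImageData() {
        if (!loaded) {
            imageData = loader.get();
            loaded = true;
            loadCount++;
        }
        return imageData;
    }

    int loadCount() { return loadCount; }
}
```

If getImageData is never called, the loader never runs, which is the saving lazy loading is meant to provide for large, rarely used fields.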

The opposite of lazy loading is eager loading, which you can also request for particular fields. For example, to read an entity's related entities whenever the entity itself is read:

    @OneToMany(mappedBy = "stock", fetch = FetchType.EAGER)
    private Collection<StockOptionPriceImpl> optionsPrices;

Fields of type @OneToOne and @ManyToOne are eagerly loaded by default, so use fetch = FetchType.LAZY when you need to change that behavior. Like lazy loading, eager loading is only a hint to the JPA implementation.

When JPA reads an object that has related objects marked for eager loading, many JPA implementations do not use a JOIN to read everything in one SQL statement. They first execute one SQL statement to fetch the primary object and then generate one or more additional statements to read the related objects. This default behavior cannot be changed when the find method is used. With JPQL, you can use a JOIN.

When a JPQL query selects an entity, you cannot specify which of the entity's fields are retrieved. Take the following query:

    Query q = em.createQuery("SELECT s FROM StockPriceImpl s");

The generated SQL looks like this:

    SELECT <enumerated list of non-lazy fields> FROM StockPriceTable

This also means that if you do not need certain fields, the only option is to declare them lazy.

A JPQL JOIN can fetch an object and its related objects with a single SQL statement:

    Query q = em.createQuery("SELECT s FROM StockOptionImpl s " + "JOIN FETCH s.optionsPrices");

The JPQL above generates the following SQL:

    SELECT t1.<fields>, t0.<fields> FROM StockOptionPrice t0, StockPrice t1 WHERE ((t0.SYMBOL = t1.SYMBOL) AND (t0.PRICEDATE = t1.PRICEDATE))

JOIN FETCH is independent of whether a field is declared lazy or eager. When a JOIN FETCH covers lazily loaded fields, those fields are read too, so when the program later uses them, no further database read is needed.

When the program actually uses all the data retrieved by a JOIN FETCH, performance improves, because the JOIN FETCH reduces the number of SQL statements executed and the number of round trips to the database, which is often the bottleneck for database applications.
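The reduction in statements can be made concrete with a small model. This is plain Java, not JPA, and it assumes the worst case in which the provider issues one additional statement per parent object (the text above only says "one or more"); the class name JoinFetchModel and the symbol values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model, not JPA: counts the statements each loading strategy
// would send. Five options per stock mirrors the sample database.
class JoinFetchModel {
    private final List<String> symbols = new ArrayList<>();
    private int statements;            // simulated SQL statements executed

    JoinFetchModel(int stockCount) {
        for (int i = 0; i < stockCount; i++) {
            symbols.add("SYM" + i);
        }
    }

    // Simulates one SELECT against the parent table.
    private List<String> selectStocks() {
        statements++;
        return symbols;
    }

    // Simulates one SELECT against the child table for a single parent.
    private List<String> selectOptions(String symbol) {
        statements++;
        List<String> options = new ArrayList<>();
        for (int n = 1; n <= 5; n++) {
            options.add(symbol + "/option" + n);
        }
        return options;
    }

    // Strategy 1 (assumed): one statement for the parents, one more per parent.
    int loadWithSeparateSelects() {
        int optionsLoaded = 0;
        for (String symbol : selectStocks()) {
            optionsLoaded += selectOptions(symbol).size();
        }
        return optionsLoaded;
    }

    // Strategy 2: a single joined statement returns parents and children.
    int loadWithJoin() {
        statements++;
        return symbols.size() * 5;
    }

    int statements() { return statements; }
}
```

For 128 parents, the first strategy sends 129 statements in this model and the second sends one, while both load the same 640 child rows.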

However, the relationship between JOIN FETCH and the JPA cache is somewhat subtle, as described later in the discussion of the JPA cache.

Other Ways to Use JOIN FETCH

Besides writing JOIN FETCH directly in JPQL, you can also get the same effect by setting a query hint, which many JPA implementations support. For example:

    Query q = em.createQuery("SELECT s FROM StockOptionImpl s");
    q.setHint("eclipselink.join-fetch", "s.optionsPrices");

Some JPA implementations also provide a @JoinFetch annotation for the same purpose.

Fetch Groups

When an entity has several lazily loaded fields, JPA typically generates and executes a separate SQL statement for each one as it is needed. In that scenario, it would clearly be better to generate and execute a single SQL statement.

This behavior is not defined by the JPA standard, but most JPA implementations let you define a fetch group to achieve it: several lazily loaded fields are declared as one fetch group, and whenever one of them is loaded, the whole group is loaded. If you need this behavior, consult the documentation of your JPA implementation.
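The effect of a fetch group on the number of statements can be sketched in plain Java. This model does not use the fetch group API of any JPA provider; the class name FetchGroupModel, the field names, and the placeholder values are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Plain-Java model of a fetch group; not the API of any JPA provider.
class FetchGroupModel {
    private final Set<String> lazyFields = new LinkedHashSet<>();
    private final Map<String, Object> loaded = new HashMap<>();
    private final boolean useFetchGroup;
    private int statements;            // simulated SQL statements executed

    FetchGroupModel(boolean useFetchGroup, String... fields) {
        this.useFetchGroup = useFetchGroup;
        for (String field : fields) {
            lazyFields.add(field);
        }
    }

    // Reads a lazy field, loading either just that field or the whole group.
    Object get(String field) {
        if (!lazyFields.contains(field)) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        if (!loaded.containsKey(field)) {
            statements++;                          // one simulated statement per load
            if (useFetchGroup) {
                for (String member : lazyFields) { // the whole group arrives together
                    loaded.put(member, "value of " + member);
                }
            } else {
                loaded.put(field, "value of " + field);
            }
        }
        return loaded.get(field);
    }

    int statements() { return statements; }
}
```

Reading three lazy fields costs three statements without the group and one with it. The group only pays off when the fields really are used together; otherwise it reads data that is never needed.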

Batching and Queries

JPA can process query results much as JDBC processes a ResultSet:

    • Return all the records in the result set at once
    • Fetch one record from the result set at a time
    • Fetch N records from the result set at a time (similar to the JDBC fetch size)
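
The three modes in the list above differ only in how many rows each trip to the database brings back. The sketch below models that with an iterator over an in-memory list; it is plain Java rather than a JDBC ResultSet or a JPA query, and the class name ChunkedResultModel is hypothetical.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Plain-Java model of fetching a result set N rows at a time; the backing
// list stands in for rows that still live in the database.
class ChunkedResultModel<T> implements Iterator<T> {
    private final List<T> table;
    private final int fetchSize;
    private List<T> buffer = Collections.emptyList();   // rows from the last fetch
    private int bufferPos;             // next row to hand out from the buffer
    private int nextRow;               // next row to fetch from the table
    private int roundTrips;            // simulated trips to the database

    ChunkedResultModel(List<T> table, int fetchSize) {
        if (fetchSize < 1) {
            throw new IllegalArgumentException("fetchSize must be >= 1");
        }
        this.table = table;
        this.fetchSize = fetchSize;
    }

    @Override
    public boolean hasNext() {
        return bufferPos < buffer.size() || nextRow < table.size();
    }

    @Override
    public T next() {
        if (bufferPos == buffer.size()) {          // buffer exhausted: fetch more
            if (nextRow >= table.size()) {
                throw new NoSuchElementException();
            }
            int end = (int) Math.min((long) nextRow + fetchSize, table.size());
            buffer = table.subList(nextRow, end);
            nextRow = end;
            bufferPos = 0;
            roundTrips++;
        }
        return buffer.get(bufferPos++);
    }

    int roundTrips() { return roundTrips; }
}
```

A fetch size of 1 corresponds to reading one record at a time, and a fetch size at least as large as the result set corresponds to returning everything at once. Larger fetch sizes mean fewer round trips but more memory held per trip.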

As before, the fetch size is set in an implementation-specific way. In EclipseLink and Hibernate, for example:

    // EclipseLink
    q.setHint("eclipselink.jdbc.fetch-size", "100000");
    // Hibernate
    @BatchSize
    // Query here ...

You can also apply paging settings to a query:

    Query q = em.createNamedQuery("selectAll");
    q.setFirstResult(101);
    q.setMaxResults(100);
    List<? extends StockPrice> results = q.getResultList();

Because setFirstResult takes a zero-based position, this skips the first 101 rows and returns at most the next 100.
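The paging semantics are easy to get wrong by one, so here is a plain-Java sketch of what setFirstResult and setMaxResults select, using an in-memory list instead of a query. The JPA position passed to setFirstResult is numbered from 0; the class name PagingModel and its page method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of query paging over an in-memory list; not JPA code.
class PagingModel {
    // Mirrors setFirstResult(firstResult) followed by setMaxResults(maxResults):
    // firstResult is a zero-based position, maxResults caps the page length.
    static <T> List<T> page(List<T> rows, int firstResult, int maxResults) {
        if (firstResult < 0 || maxResults < 0) {
            throw new IllegalArgumentException("arguments must not be negative");
        }
        int from = Math.min(firstResult, rows.size());
        int to = (int) Math.min((long) from + maxResults, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }
}
```

With rows numbered 1 to 500, page(rows, 101, 100) returns rows 102 through 201, while page(rows, 100, 100) returns rows 101 through 200. A page that runs past the end of the data is simply shorter.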

Note also that the example above uses a named query (createNamedQuery()) rather than an ad hoc query (createQuery()). In many JPA implementations named queries are faster, because a named query corresponds to a PreparedStatement in the statement cache pool, so all that remains is to bind the parameters. The same could be done for ad hoc queries, but their JPQL is known only at run time, which makes this harder to implement, so many JPA implementations create a new statement object for each ad hoc query.

Summary

    1. JPA has several optimization options that limit (or increase) the amount of data read in a single database access.
    2. Set fields of BLOB and CLOB types to lazy loading.
    3. Related entities of a JPA entity can be loaded lazily or eagerly, depending on the needs of the application.
    4. When you need to eagerly load an entity's related entities, you can combine a named query with a JOIN. Keep in mind the effect this has on the JPA cache.
    5. Named queries are faster than ad hoc queries.
