10 Errors That Java Developers Often Make When Writing SQL
2015/03/10 | Category: Basic Technologies | Tags: SQL
This article was translated from the jOOQ blog by ImportNew contributor zer0black. You are welcome to join the translation team; for reproduction requirements, please see the end of the post.
I was pleasantly surprised to see how popular my recent article, "10 Mistakes Java Developers Make When Writing SQL", has been, both on my own blog and on my syndication partner DZone. Its popularity illustrates a few things:
- How important SQL is in professional Java development.
- How commonly basic SQL knowledge has been forgotten.
- How much SQL-centric libraries such as jOOQ or MyBatis meet a real market need, simply by embracing SQL. Amusingly, a user on Slick's mailing list mentioned my blog post; Slick is a non-SQL-centric database access library for Scala. Like LINQ (and LINQ-to-SQL), it focuses on language integration rather than on SQL statement generation.
In any case, the list of common mistakes I hastily compiled last time was far from complete. So I have prepared another ten, somewhat less common, mistakes that Java developers make when writing SQL.
1. Not using PreparedStatement
Interestingly, even after many years of JDBC, this mistake still shows up in blogs, forums, and mailing lists, although it is very easy to remember and to understand. Developers may avoid PreparedStatements for several reasons:
- They don't know about PreparedStatements.
- They think PreparedStatements are slower.
- They think writing a PreparedStatement takes too much effort.
Come on, let's bust those myths. In 96% of cases you are better off writing a PreparedStatement than a static statement. Why? Here are some simple reasons:
- By using bind values, you avoid syntax errors that originate from bad string formatting of inlined values.
- By using bind values, you avoid SQL injection vulnerabilities that are caused by bad string formatting.
- You avoid edge cases when inserting more "sophisticated" data types, such as timestamps or binary data.
- If you keep the PreparedStatement open instead of closing it immediately, you can reuse it as soon as new values are bound.
- In more sophisticated databases you can make use of adaptive cursor sharing (an Oracle term). This helps prevent the SQL statement from being hard-parsed every time a new bind value is set.
(Translator's note: on the cost of hard parsing. A hard parse means the SQL statement is fully parsed and an execution plan is generated from scratch, which consumes both CPU and SGA resources. It also involves latches on the library cache: a latch is a refinement of a lock, a lightweight serialization device that prevents the protected shared memory from being modified by more than one process at a time. Latches exist in limited numbers, so under heavy hard parsing, processes queue for latches more and more frequently and performance degrades sharply.)
In some special cases you do need to inline bind values, in order to give the cost-based optimizer a hint about the data set the query will run against. A typical case is a predicate on a "constant" value (such as a status flag with only a few distinct values), rather than a "variable" predicate such as:
- FIRST_NAME LIKE 'Jon%'
- AMOUNT > 19.95
Note that modern databases implement bind-variable peeking, so by default you can also use bind values for all your query parameters. Also note that higher-level APIs such as JPA CriteriaQuery or jOOQ will easily and transparently generate PreparedStatements and bind values for you when you write embedded JPQL or embedded SQL.
For more background information:
- On bind-variable peeking: an interesting article on this topic from Oracle expert Tanel Poder.
- On cursor sharing: an interesting question on Stack Overflow.
Solution:
By default, always use PreparedStatements instead of static statements, and never inline bind values into your SQL.
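To make the difference concrete: the JDBC fix is `PreparedStatement` with `?` markers. Since a JDBC example needs a database driver, here is a minimal self-contained sketch of the same idea using Python's built-in sqlite3 module, whose `?` placeholders behave like JDBC bind markers (the `customer` table and its contents are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, first_name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'O''Brien')")

name = "O'Brien"  # user input containing a single quote

# Bad: inlining the value breaks the statement (and invites SQL injection)
try:
    conn.execute("SELECT id FROM customer WHERE first_name = '" + name + "'")
except sqlite3.OperationalError as e:
    print("inlined value failed:", e)

# Good: bind the value; the driver handles quoting and escaping
row = conn.execute(
    "SELECT id FROM customer WHERE first_name = ?", (name,)
).fetchone()
print(row)  # (1,)
```

The same shape applies to JDBC: the concatenated string fails (or worse, executes attacker-controlled SQL), while the bound parameter is passed safely out of band.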
2. Returning too many columns
This mistake occurs very frequently, and it affects not only your database's execution plan but also your Java application. Let's look at the effects on the latter first:
Adverse effects on the Java application:
If you SELECT * (or a "default" set of 50 columns) in order to reuse the same data between different DAOs, huge amounts of data are read from the database into the JDBC ResultSet. Even if you never read those columns from the ResultSet, they are still transferred over the wire and loaded into memory by the JDBC driver. If you know you only need 2-3 columns, this is a serious waste of IO and memory.
This (the seriousness of the problem) is obvious, so be careful...
Adverse effects on the database execution plan:
These effects may actually be much worse than the impact on the Java application. A sophisticated database performs a great deal of SQL transformation when calculating the best execution plan for your query. Fortunately, parts of the request can be pruned away when they contribute nothing to the projection or to the filtering conditions. I recently wrote a blog post about this problem: how metadata schemas affect Oracle query transformations.
Now, let me show you an example of this mistake. Think of a complex query on two views:
```sql
SELECT *
FROM   customer_view c
JOIN   order_view o ON c.cust_id = o.cust_id
```
Each of the views referenced above, which join the relevant tables, may in turn join data from further tables such as CUSTOMER_ADDRESS, ORDER_HISTORY, ORDER_SETTLEMENT, and so on. With a SELECT * projection, your database has no choice but to fully load all those joined tables, when in fact the only data you are interested in may be:
```sql
SELECT c.first_name, c.last_name, o.amount
FROM   customer_view c
JOIN   order_view o ON c.cust_id = o.cust_id
```
A good database will automatically remove the unneeded joins when transforming your SQL statement, so it needs less IO and memory.
Solution:
Never use SELECT *. Do not reuse the same projection across different requests. Try to reduce the projection to the data you really need.
Note that this can be somewhat hard to achieve with object-relational mappers (ORMs).
3. Thinking that JOIN is a clause of SELECT
This is not a mistake in terms of performance or correctness. But SQL developers should be aware that the JOIN clause is not part of the SELECT statement. The SQL standard (1992) defines a table reference as follows:
```
6.3 <table reference>

<table reference> ::=
      <table name> [ [ AS ] <correlation name>
        [ <left paren> <derived column list> <right paren> ] ]
    | <derived table> [ AS ] <correlation name>
        [ <left paren> <derived column list> <right paren> ]
    | <joined table>

7.4 <from clause>

<from clause> ::=
    FROM <table reference> [ { <comma> <table reference> }... ]

7.5 <joined table>

<joined table> ::=
      <cross join>
    | <qualified join>
    | <left paren> <joined table> <right paren>

<cross join> ::=
    <table reference> CROSS JOIN <table reference>

<qualified join> ::=
    <table reference> [ NATURAL ] [ <join type> ] JOIN
      <table reference> [ <join specification> ]
```
Relational databases are very table-centric. Many operations are performed on physical tables, joined tables, or derived tables in one way or another. To write SQL effectively, it is important to understand that the SELECT .. FROM clause takes a comma-separated list of table references.
Depending on the complexity of the table reference, some databases also accept sophisticated table references in other statements, such as INSERT, UPDATE, DELETE, and MERGE. See Oracle's manual, which explains how to create updatable views.
Solution:
Always think of the FROM clause as a comma-separated list of table references. If you write a JOIN clause, think of that JOIN clause as part of one of these complex table references:
```sql
SELECT c.first_name, c.last_name, o.amount
FROM   customer_view c
  JOIN order_view o ON c.cust_id = o.cust_id
```
4. Using join syntax from before the ANSI-92 standard
Now that we have clarified how table references work (see the previous section), we should agree that pre-ANSI-92 join syntax should be avoided at all costs. In terms of execution plans, there is no difference between expressing a join predicate in a JOIN .. ON clause or in a WHERE clause. But from the point of view of readability and maintainability, a WHERE clause that mixes filter predicates with join predicates quickly sinks into a quagmire. Look at this simple example:
```sql
SELECT c.first_name, c.last_name, o.amount
FROM   customer_view c,
       order_view o
WHERE  o.amount > 100
AND    c.cust_id = o.cust_id
AND    c.language = 'en'
```
Can you spot the join predicate? What if we joined dozens of tables? It gets far worse when you use proprietary outer-join syntax, such as Oracle's (+) notation.
Solution:
Always use the ANSI-92 JOIN syntax. Never put join predicates into the WHERE clause. There is not the slightest benefit in the pre-ANSI-92 join syntax.
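To put the two styles side by side, here is a small self-contained sketch using Python's sqlite3 (the `customer` and `orders` tables and their contents are invented for illustration). Both queries return the same rows, but the ANSI-92 version keeps the join predicate out of the WHERE clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER, first_name TEXT, language TEXT);
    CREATE TABLE orders   (cust_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Jon', 'en'), (2, 'Ann', 'de');
    INSERT INTO orders   VALUES (1, 150.0), (1, 50.0), (2, 200.0);
""")

# Pre-ANSI-92: the join predicate is buried among the filters
old_style = conn.execute("""
    SELECT c.first_name, o.amount
    FROM   customer c, orders o
    WHERE  o.amount > 100
    AND    c.cust_id = o.cust_id
    AND    c.language = 'en'
""").fetchall()

# ANSI-92: the join predicate lives in the ON clause
new_style = conn.execute("""
    SELECT c.first_name, o.amount
    FROM   customer c
    JOIN   orders o ON c.cust_id = o.cust_id
    WHERE  o.amount > 100
    AND    c.language = 'en'
""").fetchall()

print(old_style == new_style)  # True
print(new_style)               # [('Jon', 150.0)]
```

The optimizer treats both the same way; the win is purely for the human reader, who can now tell joins from filters at a glance.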
5. Forgetting the ESCAPE clause in LIKE predicates
The SQL standard (1992) specifies the LIKE predicate as follows:
```
8.5 <like predicate>

<like predicate> ::=
    <match value> [ NOT ] LIKE <pattern>
      [ ESCAPE <escape character> ]
```
You should use the ESCAPE keyword whenever you allow user input to be used in your SQL queries. While it is rare for data to actually contain a percent sign (%), underscores (_) are quite common:
```sql
SELECT *
FROM   t
WHERE  t.x LIKE 'some!_prefix%' ESCAPE '!'
```
Solution:
When using LIKE predicates, also use a proper escape character.
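A minimal demonstration of why the ESCAPE clause matters, using Python's built-in sqlite3 (table and rows invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT)")
conn.execute("INSERT INTO t VALUES ('some_prefixed value')")
conn.execute("INSERT INTO t VALUES ('someXprefixed value')")

# Without ESCAPE, '_' is a wildcard matching any single character
both = conn.execute(
    "SELECT x FROM t WHERE x LIKE 'some_prefix%'"
).fetchall()
print(len(both))  # 2 -- the underscore matched the 'X' too

# With ESCAPE, '!_' matches a literal underscore only
literal = conn.execute(
    "SELECT x FROM t WHERE x LIKE 'some!_prefix%' ESCAPE '!'"
).fetchall()
print(literal)  # [('some_prefixed value',)]
```

Without the escape, a user searching for `some_prefix` silently gets extra rows; with it, only the literal underscore matches.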
6. Thinking that NOT (A IN (X, Y)) is the Boolean opposite of A IN (X, Y)
This is an important detail when NULLs are involved! Let's see what A IN (X, Y) really means:
A IN (X, Y)
is the same as A = ANY (X, Y)
is the same as A = X OR A = Y
At the same time, NOT (A IN (X, Y)) really means:
NOT (A IN (X, Y))
is the same as A NOT IN (X, Y)
is the same as A != ALL (X, Y)
is the same as A != X AND A != Y
Does that look like the Boolean opposite of the earlier predicate? It isn't. If either X or Y is NULL, the NOT IN condition results in UNKNOWN, whereas the IN condition may still return TRUE or FALSE.
In other words: even when A IN (X, Y) evaluates to TRUE or FALSE, NOT (A IN (X, Y)) can still evaluate to UNKNOWN rather than FALSE or TRUE. Note that this is also the case when the right-hand side of the IN predicate is a subquery.
Don't believe it? See for yourself on SQL Fiddle. It shows that the following query yields no rows:
```sql
SELECT 1
WHERE  1 IN (NULL)
UNION ALL
SELECT 2
WHERE  NOT (1 IN (NULL))
```
More details can be found in my previous blog post about SQL dialect incompatibilities in this very area.
Solution:
Beware of the NOT IN predicate whenever it might involve nullable columns.
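You can verify this three-valued logic without SQL Fiddle; here is a self-contained sketch using Python's built-in sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1 IN (NULL) evaluates to UNKNOWN, so WHERE filters the row out...
rows = conn.execute("""
    SELECT 1 WHERE 1 IN (NULL)
    UNION ALL
    SELECT 2 WHERE NOT (1 IN (NULL))
""").fetchall()
print(rows)  # [] -- neither branch returns a row

# ...whereas IN can still say TRUE when a non-NULL element matches:
rows = conn.execute("SELECT 1 WHERE 1 IN (1, NULL)").fetchall()
print(rows)  # [(1,)]

# But its "opposite" is UNKNOWN again, not FALSE:
rows = conn.execute("SELECT 1 WHERE NOT (2 IN (1, NULL))").fetchall()
print(rows)  # [] -- 2 IN (1, NULL) is UNKNOWN; NOT UNKNOWN is still UNKNOWN
```

The last query is the trap: intuitively "2 is not in the list" should be TRUE, but the NULL element makes the whole predicate UNKNOWN and the row disappears.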
7. Thinking that NOT (A IS NULL) is the same as A IS NOT NULL
Yes, we remember: SQL implements three-valued logic when it comes to NULL. That's why we can use the NULL predicate to check for NULL values. Right? Right.
But there are cases where the NULL predicate is easy to get wrong. Be aware that the following two predicates are equivalent only for row value expressions of degree 1:
NOT (A IS NULL)
is the same as A IS NOT NULL
If A is a row value expression with a degree greater than 1, the truth table is transformed such that:
- A IS NULL is true only if all values of A are NULL
- NOT (A IS NULL) is false only if all values of A are NULL
- A IS NOT NULL is true only if no value of A is NULL
- NOT (A IS NOT NULL) is false only if no value of A is NULL
More details can be found in my previous blog post.
Solution:
When using row value expressions, be aware that the NULL predicate may not do what you expect.
8. Not using row value expressions
Row value expressions are an awesome SQL feature. SQL is a very table-centric language, and tables are in turn row-centric. Row value expressions make it much easier to describe complex predicates, by creating ad-hoc rows that can be compared with other rows of the same degree and row type. A simple example is querying for a customer's first and last name at the same time:
```sql
SELECT c.address
FROM   customer c
WHERE (c.first_name, c.last_name) = (?, ?)
```
As you can see, this syntax is more concise than comparing each value on the left of the predicate with its counterpart on the right, one by one. This is especially true when many otherwise independent predicates would be combined with AND: row value expressions let you group predicates that belong together. They are most useful in JOIN expressions on composite foreign keys:
```sql
SELECT c.first_name, c.last_name, a.street
FROM   customer c
JOIN   address a
  ON  (c.id, c.tenant_id) = (a.id, a.tenant_id)
```
Unfortunately, not all databases support row value expressions in the same way. But the SQL standard already defined them in 1992, and if you use them, sophisticated databases such as Oracle or Postgres can use them to calculate better execution plans. This is explained on the Use The Index, Luke page.
Solution:
Use row value expressions whenever you can. They will make your SQL more concise and possibly even faster.
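As a small illustration, the first query of this section runs as-is on SQLite (which has supported row value expressions since version 3.15), here driven from Python's sqlite3 with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (first_name TEXT, last_name TEXT, address TEXT);
    INSERT INTO customer VALUES ('Jon',  'Doe',   '1 Main St');
    INSERT INTO customer VALUES ('Jane', 'Smith', '2 High St');
""")

# A degree-2 row value expression compared against two bind values
row = conn.execute(
    "SELECT address FROM customer WHERE (first_name, last_name) = (?, ?)",
    ("Jon", "Doe"),
).fetchone()
print(row)  # ('1 Main St',)
```

The same predicate written without row values would need `first_name = ? AND last_name = ?`; here the grouping makes it obvious that the two columns form one logical key.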
9. Not defining enough constraints
Once again I am going to cite Tom Kyte and Use The Index, Luke: you cannot have enough constraints in your metadata. First of all, constraints help you keep your data from corrupting, which by itself is very useful. But more importantly to me, constraints help the database perform SQL statement transformations, because the database can decide:
- Which values are equivalent
- Which clauses are redundant
- Which clauses are void (i.e. will not return any values)
Some developers may think that constraints make the database slower. On the contrary, they rarely do, unless you insert massive amounts of data, in which case you can either disable the constraints for the duration of the large operation, or load into a temporary, constraint-free "load table" first and then transfer the data into the real table.
Solution:
Define as many constraints as you can. They will help your database perform your queries better.
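As a small illustration of constraints catching corrupt data at the door, here is a self-contained sketch using Python's built-in sqlite3 (the schema is invented for illustration; note that SQLite only enforces foreign keys when the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        cust_id INTEGER NOT NULL REFERENCES customer(id),
        amount  REAL CHECK (amount > 0)
    );
""")

conn.execute("INSERT INTO customer VALUES (1, 'jon@example.com')")
conn.execute("INSERT INTO orders VALUES (1, 1, 19.95)")   # fine

for bad in [
    "INSERT INTO customer VALUES (2, NULL)",       # violates NOT NULL
    "INSERT INTO orders   VALUES (2, 99, 10.0)",   # violates the foreign key
    "INSERT INTO orders   VALUES (3, 1, -5.0)",    # violates the CHECK
]:
    try:
        conn.execute(bad)
    except sqlite3.IntegrityError as e:
        print("rejected:", e)
```

Each rejected statement would have silently corrupted the data in a constraint-free schema; with the constraints in place, the optimizer can also rely on facts such as "cust_id is never NULL".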
10. Thinking that 50ms is fast query speed
The NoSQL hype is still going strong, and many companies think they need faster, more scalable solutions like Twitter or Facebook, scaling out beyond ACID and the relational model. Some may succeed (like Twitter or Facebook), while others may be going astray:
Read this article: https://twitter.com/codinghorror/status/347070841059692545.
For companies that are still forced (or privileged) to use relational databases, don't fool yourself into thinking "relational databases are slow, even though the hype tells us they're fast". They are in fact extremely fast: parsing a 20kB query text and calculating a 2000-line execution plan for such a large statement takes less than 1ms. And if you and your DBA keep tuning the database, you will get the most out of it.
Databases can become slow for two reasons: when your application abuses a popular ORM, or when the ORM cannot generate fast SQL statements for your complex query logic. In those cases, you should consider choosing an API closer to the SQL core, such as JDBC, jOOQ, or MyBatis, to keep better control over your SQL statements.
So, do not think that a query time of 50ms is fast or even acceptable. It's not! If your application takes that long at runtime, check your execution plans. Such hidden dangers may explode in more complex contexts or on larger data sets.
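Checking the execution plan does not require a big setup; every database exposes one (EXPLAIN PLAN in Oracle, EXPLAIN in Postgres and MySQL). As a minimal sketch, SQLite's EXPLAIN QUERY PLAN, driven from Python with an invented table, shows the plan switching from a full scan to an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (cust_id INTEGER, amount REAL)")

# Full scan: there is no index on cust_id yet
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE cust_id = ?", (1,)
).fetchall()
print(plan)  # detail column reads like 'SCAN orders'

# After indexing, the plan changes to an index search
conn.execute("CREATE INDEX idx_orders_cust ON orders (cust_id)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE cust_id = ?", (1,)
).fetchall()
print(plan)  # detail column reads like 'SEARCH orders USING INDEX idx_orders_cust'
```

The exact wording of the plan text varies between SQLite versions, but the habit it illustrates does not: look at the plan before deciding that the query, or the database, is "just slow".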
Summary
SQL is fun, but also subtle in many ways, as my previous post about 10 mistakes has shown. Mastering SQL is worth the effort. Data is your most valuable asset, so treat it with respect and write better SQL statements.