10 common mistakes Java developers make when writing SQL statements
How well Java developers mix object-oriented thinking with imperative thinking depends on their:
Skill (anyone can write code imperatively)
Dogma (some apply the "pattern-pattern", i.e. patterns everywhere, each given a name)
Mood (at first, truly object-oriented code is much harder to write than imperative code)
However, when Java developers write SQL statements, everything changes. SQL is a declarative language that has nothing to do with either object-oriented or imperative thinking. In SQL it is easy to express a query, but it is not so easy to express it in the best or most correct way. Developers not only need to rethink their programming paradigm, they also need to think in terms of set theory.
Here are several common mistakes Java developers make when writing SQL with JDBC or jOOQ:
1. Forgetting about NULL
Misunderstanding NULL is probably the most common mistake Java developers make when writing SQL. This is partly because NULL is also called UNKNOWN; if it were only ever called UNKNOWN, it would be easier to understand. Another reason is that JDBC maps SQL NULL to Java null when fetching data or binding variables, which can lead to the assumption that NULL = NULL in SQL behaves like null == null in Java.
One of the stranger misunderstandings of NULL shows up when NULL predicates are used with row value expressions.
Another subtle problem arises from misunderstanding the meaning of NULL in NOT IN anti-joins.
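As a minimal sketch of the NOT IN pitfall (the author and book tables and their columns are only illustrative), the anti-join returns no rows at all as soon as the subquery produces a single NULL, whereas NOT EXISTS behaves the way most developers expect:
-- Returns no rows at all if any book.author_id is NULL, because
-- "id NOT IN (..., NULL)" evaluates to UNKNOWN for every row.
SELECT *
FROM author
WHERE id NOT IN (SELECT author_id FROM book);
-- NULL-safe alternative: express the anti-join with NOT EXISTS.
SELECT *
FROM author a
WHERE NOT EXISTS (
  SELECT 1 FROM book b WHERE b.author_id = a.id
);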
Solution
Keep training yourself. Think actively about what NULL means every time you write an SQL statement, and consider the following:
Is the predicate correct for NULL?
Does NULL affect the result of this function?
2. Processing data in Java memory
Some Java developers know SQL features quite well. The occasional JOIN, the odd UNION, no problem. But what about window functions, grouping sets, and the like? Many Java developers load SQL data into memory, transform it into some suitable collection type, and then execute tedious calculations on that collection in verbose loops (at least before Java 8 improved collection handling).
However, besides what the SQL standard offers, some SQL databases also support advanced OLAP features that tend to perform better and are much easier to write. One non-standard example is Oracle's MODEL clause. Just let the database do the processing and fetch only the final results into Java memory; after all, some very smart people have optimised these expensive products. So by moving such computations into the database, you gain two things:
Simplicity. It is often easier to write correct code in SQL than in Java.
Performance. The database is probably faster than your algorithm, and more importantly, you no longer need to transfer millions of records over the wire.
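As a minimal sketch (the order_item table and its columns are illustrative, and this uses a plain aggregate plus a window function rather than Oracle's MODEL clause), the kind of per-group maths that often ends up in a Java loop can usually be expressed directly in SQL, fetching only the results:
-- Per-customer revenue and its share of the grand total, computed
-- entirely in the database instead of looping over rows in Java.
SELECT customer_id,
       SUM(amount) AS revenue,
       SUM(amount) * 100.0 / SUM(SUM(amount)) OVER () AS revenue_share_pct
FROM order_item
GROUP BY customer_id
ORDER BY revenue DESC;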
Solution
Every time you implement a data-centric algorithm in Java, ask yourself: is there a way to let the database do this work and deliver only the results to me?
3. Using UNION instead of UNION ALL
It is a shame that UNION ALL needs an extra keyword compared to UNION. It would have been much better if the SQL standard had been defined to support:
UNION (duplicates allowed)
UNION DISTINCT (duplicates removed)
Removing duplicates is rarely needed (and sometimes it is even wrong), and for large result sets with many columns it can be very slow, because the two subselects need to be ordered and each tuple compared with its subsequent tuple.
Note that even though the SQL standard specifies INTERSECT ALL and EXCEPT ALL, hardly any database implements these useful operations.
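A minimal sketch of the difference, using illustrative tables:
-- UNION removes duplicates and therefore has to sort or hash and
-- compare all rows across both subselects.
SELECT first_name, last_name FROM customer
UNION
SELECT first_name, last_name FROM supplier;
-- UNION ALL simply concatenates the two results; if duplicates are
-- acceptable (or impossible), it is the faster choice.
SELECT first_name, last_name FROM customer
UNION ALL
SELECT first_name, last_name FROM supplier;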
Solution
Every time you write a UNION statement, consider whether you actually meant UNION ALL.
4. Using JDBC pagination to page through large results
Most databases support paging ordered results with clauses such as LIMIT .. OFFSET, TOP .. START AT, or OFFSET .. FETCH. Even where these clauses are not supported, filtering on ROWNUM (Oracle) or ROW_NUMBER() OVER() (DB2, SQL Server 2008 and earlier) is still available, and it is much faster than paging in memory. This is especially true for large datasets.
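A minimal sketch of the two styles (the book table and its ordering column are illustrative):
-- Standard SQL:2008 style, supported by most modern databases.
SELECT id, title
FROM book
ORDER BY id
OFFSET 40 ROWS FETCH NEXT 20 ROWS ONLY;
-- Emulation with ROW_NUMBER() where OFFSET .. FETCH is unavailable.
SELECT id, title
FROM (
  SELECT id, title, ROW_NUMBER() OVER (ORDER BY id) AS rn
  FROM book
) numbered
WHERE rn BETWEEN 41 AND 60;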
Solution
Just use those clauses, or a tool (such as jOOQ) that can emulate these paging clauses for you.
5. Joining data in Java memory
Since the early days of SQL, some developers have felt uneasy about expressing JOINs in their SQL. There is an inherent fear that JOIN is slow. That can be true if a cost-based optimiser chooses to perform a nested loop, possibly loading complete tables into database memory before creating the joined table source. But this rarely happens. With appropriate predicates, constraints, and indexes, MERGE JOIN and HASH JOIN operations are extremely fast; it all comes down to correct metadata (I cannot cite Tom Kyte often enough here). Nevertheless, quite a few Java developers still load two tables with separate queries into maps and join them in Java memory in one way or another.
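As a minimal sketch (the customer and orders tables are illustrative), the whole load-into-maps-and-match routine usually collapses into a single query:
-- Let the database perform the join instead of loading both tables
-- into Java maps and matching the keys by hand.
SELECT c.id, c.name, o.id AS order_id, o.total
FROM customer c
JOIN orders o ON o.customer_id = c.id
WHERE o.created_at >= DATE '2014-01-01';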
Solution
If you SELECT from several tables in several steps, think hard about whether you can express everything you need in a single statement.
6. Using DISTINCT or UNION to remove duplicates from an accidental Cartesian product
With heavy joining, it is easy to lose track of all the relations that play a role in an SQL statement. Specifically, if multi-column foreign key relationships are involved, you may forget to add the relevant predicate to a JOIN .. ON clause. This can result in duplicate records, though perhaps only in exceptional cases. Some developers then choose DISTINCT to remove those duplicates again. This is wrong in three ways:
It may cure the symptoms but not the problem, and in some edge cases it may not even cure the symptoms.
It is slow for large result sets with many columns, because DISTINCT performs an ORDER BY operation to remove duplicates.
It is slow for large Cartesian products, which still load a lot of data into memory.
Solution
As a rule of thumb, when you get unwanted duplicates, always review your JOIN predicates first. There is probably a subtle accidental Cartesian product hiding somewhere.
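A minimal sketch of the pitfall, assuming an illustrative composite key (order_id, line_no) shared by the two tables:
-- Incomplete ON clause: only half of the composite key is joined,
-- producing an accidental partial Cartesian product with duplicates.
SELECT o.order_id, i.amount
FROM order_line o
JOIN invoice_line i ON i.order_id = o.order_id;
-- Complete ON clause: join on the full composite key instead of
-- papering over the duplicates with DISTINCT.
SELECT o.order_id, i.amount
FROM order_line o
JOIN invoice_line i ON i.order_id = o.order_id
                   AND i.line_no  = o.line_no;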
7. Not using the MERGE statement
Strictly speaking, this is not really a mistake, but perhaps a lack of knowledge of, or a certain fear towards, the powerful MERGE statement. Some databases offer other forms of UPSERT statements, such as MySQL's ON DUPLICATE KEY UPDATE clause. But MERGE is truly powerful, most of all in databases that heavily extend the SQL standard in this area, such as SQL Server.
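A minimal sketch of an UPSERT with MERGE (tables and columns are illustrative; the exact syntax varies slightly between vendors):
-- Insert new customers and update existing ones in a single atomic
-- statement instead of chaining SELECT, INSERT and UPDATE.
MERGE INTO customer c
USING staged_customer s
  ON (c.id = s.id)
WHEN MATCHED THEN
  UPDATE SET c.name = s.name, c.email = s.email
WHEN NOT MATCHED THEN
  INSERT (id, name, email)
  VALUES (s.id, s.name, s.email);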
Solution
If you are UPSERTING by chaining INSERT and UPDATE, or by chaining SELECT .. FOR UPDATE with a subsequent INSERT or UPDATE, think again. Apart from risking race conditions, you may be able to express your intent with a simpler MERGE statement.
8. Using aggregate functions instead of window functions
Before window functions were introduced, the only way to aggregate data in SQL was a GROUP BY clause combined with aggregate functions in the projection. This works well in many cases, and if aggregated data needs to be enriched with regular data, the grouped query can be pushed down into a joined subquery.
However, SQL:2003 defined window functions, which have been implemented by many mainstream database vendors. Window functions aggregate over the result set without grouping it away, and each window can even have its own independent PARTITION BY clause, which is an extremely useful tool for reporting.
Using window functions will:
Lead to more readable SQL (fewer accidental GROUP BY clauses in subqueries)
Improve performance, as an RDBMS is likely to optimise window functions more easily
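A minimal sketch (the employee table is illustrative) of enriching every row with an aggregate, without grouping away the detail rows:
-- Each row keeps its detail columns, while AVG(..) OVER (..)
-- adds the per-department average alongside them.
SELECT employee_id,
       department_id,
       salary,
       AVG(salary) OVER (PARTITION BY department_id) AS avg_dept_salary
FROM employee;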
Solution
When you write a GROUP BY clause in a subquery, think about whether it could be expressed with a window function instead.
9. Using in-memory sorting for sort indirections
The SQL ORDER BY clause supports many types of expressions, including CASE statements, which are very useful for sort indirections. You should probably never sort data in Java memory just because you think that:
SQL sorting is too slow
SQL sorting cannot do it
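A minimal sketch of a sort indirection with CASE (the ticket table and its status values are illustrative):
-- Order 'open' tickets first, then 'pending', then everything else,
-- with a secondary sort by creation date, all inside the database.
SELECT id, status, created_at
FROM ticket
ORDER BY CASE status
           WHEN 'open'    THEN 1
           WHEN 'pending' THEN 2
           ELSE 3
         END,
         created_at DESC;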
Solution
If you sort any SQL data in Java memory, think about whether you can push that sorting into the database instead. This goes hand in hand with pushing paging into the database.
10. Inserting lots of records one by one
JDBC knows batching, and you should use it. Do not INSERT thousands of records one by one, creating a new PreparedStatement every time. If all of your records go to the same table, create a batch INSERT with a single SQL statement and multiple bind value sets. Depending on your database and its configuration, you may need to commit after a certain number of inserted records in order to keep the UNDO log from growing too large.
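At the SQL level, a minimal sketch of a single statement with several bind value sets (the book table is illustrative, and multi-row VALUES is not available in every database); with JDBC, the same effect is usually achieved by reusing one PreparedStatement and calling addBatch() for each row:
-- One statement, one round trip, many rows; the ? placeholders are
-- bound per row instead of parsing a fresh INSERT for every record.
INSERT INTO book (id, title, author_id)
VALUES (?, ?, ?),
       (?, ?, ?),
       (?, ?, ?);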
Solution
Always insert large datasets in batches.
Those were 10 common mistakes Java developers make when writing SQL statements. Do you recognise any of them? I hope you will pay special attention to them the next time you write SQL!