10 errors that Java programmers often make when writing SQL programs

Source: Internet
Author: User

Java Programmers need to mix Object-oriented Thinking and general imperative programming methods in programming. Whether the two can be perfectly combined depends entirely on the programmer's level:
Skills (anyone can easily learn imperative programming)
Mode (some people use "Mode-mode". For example, the mode can be applied anywhere and can be classified as a certain type of mode)
Mood (first of all, writing a good object-oriented program is much harder than imperative programs, and you have to spend some time)
But when Java programmers write SQL statements, everything is different. SQL is a descriptive language rather than an object-oriented or imperative programming language. It is very easy to write a query statement in SQL. However, similar statements in Java are not easy because programmers should not only repeat the programming paradigm, but also consider algorithm issues.
The following are common errors that Java Programmers Make when writing SQL statements (there is no specific order ):


1. Forget null

The misunderstanding of null when writing SQL statements by Java programmers may be the biggest error. It may be because (not the only reason) null is also called unknown. If it is called unknown, it is better to understand it. Another reason is that when you get something from the database or bind a variable, JDBC maps SQL null to null in Java. This leads to the misunderstanding of null = NULL (SQL) and null = NULL (Java.
The biggest misunderstanding of null is that when null is used as a row value expression Integrity Constraint.
Another misunderstanding occurs in applications where NULL is not in anti-joins.
Solution:
Train yourself well. When you write SQL statements, you have to keep thinking about the usage of null:
Is this null Integrity Constraint correct?
Does null affect the result?

2. process data in Java memory

Few Java developers can understand SQL well. Occasional join and odd union, okay. But what about window functions? What about grouping a set? Many Java developers load SQL data into the memory and convert the data into some similar Collection types, then, those sets use the boundary loop control structure (at least before the Java 8 collection upgrade) to execute annoying mathematical operations.
But some SQL databases support advanced (and are supported by SQL standards !) OLAP features, which provide better performance and easier to write. an (not standard) example is the awesome model clause of oracle. let the database do the processing and then only bring the results to the Java memory. after all, all the very smart guys have optimized these expensive products. therefore, by moving OLAP to the database, you will get two benefits:
Convenience. This is easier than writing correct SQL statements in Java.
Performance. The database should be faster than your algorithm. More importantly, you don't have to transmit millions of records.
Perfect method:
Every time you use Java to implement a data-centric algorithm, you can ask yourself: is there a way to replace the database with such troubles.

3. Use Union instead of Union all

It is so shameful that an extra keyword is required for Union all compared to union. If the SQL standard already specifies support, it may be better.
Union (repeated allowed)
Union distinct (deduplication)
Removing duplicate rows is not only rare (sometimes or even incorrect), but also slow for big data sets with many rows, because the two sub-select statements need to be sorted, in addition, each tuple also needs to be compared with its sub-sequence tuples.
Note that even if the SQL standard specifies intersect all and except T ALL, few databases will implement these useless set operators.
Solution:
Every time you write a Union statement, consider whether the Union all statement is actually required.

4. Paging for a large number of results through JDBC paging technology

Most databases support paging commands to achieve paging effect, such as limit... offset, top... start at, offset... fetch statements. Even if the database does not support these statements, it is possible to filter rownum (Oracle) or row number () over () (DB2, SQL server2008, etc ), these are faster than implementing paging in the memory. In processing a large amount of data, the effect is particularly obvious.
Correction:
If you only use these statements, a tool (such as jooq) can simulate the operations of these statements.

5. add data to the Java memory

Starting from the early stage of SQL, some developers still feel uneasy when using join statements in SQL. This is due to the inherent fear of slow join operations. If the cost-based optimization option is used to implement nested loops, all the tables may be loaded in the database memory before creating a connection table source. This may be true. However, the probability of this event is too low. Merge connections and hash connections through appropriate predictions, constraints, and indexes are fast. This is entirely about the correct metadata (here I cannot reference too many Tom Kyte ). In addition, there may still be many Java developers loading two tables and querying a ing separately, and adding them to the memory to some extent.
Correction:
If you have a query operation on various tables in each step, think about whether your query operation can be expressed in a single statement.

6. Use distinct or union to remove duplicates in a temporary Cartesian collection

Through complex connections, people may lose the concept of all the relationships that play a key role in SQL statements. In particular, if this involves a multi-column foreign key relationship, it is likely that you will forget to add the relevant judgment in the join... on clause. This will lead to repeated records, but it may only be in special circumstances. Some developers may choose distinct to remove these duplicate records. This is incorrect in three aspects:
It solves surface symptoms but does not solve the problem. It may also be unable to solve extreme symptoms.
It is slow for a large result set with many columns. Distinct performs the order by operation to eliminate duplicates.
For a large Cartesian Product set, it is very slow, but it still needs to load a lot of data into the memory.
Solution:
Based on experience, if you have obtained unnecessary duplicate records, check your join statement. There may be a Cartesian Product set that is hard to perceive somewhere.

7. Do not use the merge statement

This is not a mistake, but it may be lack of knowledge or lack of confidence in powerful merge statements. Some databases understand other forms of upsert statements, such as the repeated primary key update Statement of MySQL. However, merge is so powerful and important in the database that it can greatly expand the SQL standard, for example, SQL Server.
Solution:
If you want to use a combination of insert and update, or a combination of select... for update, and then update insert or update, think twice. You can use a simpler merge statement to stay away from Risky competition conditions.

8. Use Aggregate functions instead of window functions)

Before introducing window functions, aggregating data in SQL means that the group by statement is mapped to the aggregate function. In many cases, it works well. For example, if the aggregate data needs to concentrate the regular data, the group query is used in the join subquery.
However, window functions are defined in SQL: 2003, which is implemented in many mainstream databases. Window functions can aggregate data in the result set, but there is no group. In fact, each window function has its own independent partition by statement. This tool is too good for displaying reports.
Use window functions:
Make SQL easier to read (but the group by statement is not specialized in subqueries)
Improve performance, such as the ability of the relational database management system to easily optimize window functions
Solution:
When you use the group by statement in a subquery, consider whether you can use the window function.

9. Indirect sorting using memory

SQL order by statements support many types of expressions, including case statements, which are useful for indirect sorting. You may not sort data in the Java memory because you will think:
Slow SQL sorting
SQL sorting failed
Solution:
If you sort any SQL data in the memory, please think twice whether it cannot be sorted in the database. This is very useful for database paging data.

10. Insert a large number of records one by one

JDBC "understands" Batch Processing (batch), and you should not forget it. Do not use the insert statement to create a new preparedstatement object every time thousands or tens of thousands of records are recorded in and out of one record. If all your records are inserted into the same table, create an insert batch statement with an SQL statement and many value sets. You may need to submit a certain amount of insert records to ensure that the UNDO log is small, which depends on your database and database settings.
Solution:
Batch inserts a large amount of data.

Original article: http://www.oschina.net/translate/10-common-mistakes-java-developers-make-when-writing-sql

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.