10 common mistakes Java programmers make when writing SQL

Source: Internet
Author: User

Java programmers blend object-oriented thinking with general imperative programming. How well they combine the two depends on their:

Skill (anyone can easily learn imperative programming)

Pattern (some people apply the "pattern-pattern": the pattern of applying patterns everywhere and giving each of them a name)

State of mind (writing a good object-oriented program is much harder than writing an imperative one, at least at first, and it takes some effort)

But when Java programmers write SQL, everything changes. SQL is a declarative language, not an object-oriented or imperative one. Expressing a query in SQL is easy. Expressing the same logic in Java is not, because the programmer has to think not only about the programming paradigm but also about the algorithm.

The following are mistakes that Java programmers often make when writing SQL (in no particular order):

  1. Forgetting about NULL

Misunderstanding NULL is probably the biggest mistake a Java programmer can make when writing SQL. One reason (though not the only one) may be that NULL is also called UNKNOWN. If it were only ever called UNKNOWN, it would be easier to understand. Another reason is that JDBC maps SQL NULL to Java null when fetching data or binding variables. This can lead programmers to assume that NULL = NULL in SQL behaves like null == null in Java.

One of the more striking misconceptions arises when NULL predicates are applied to row value expressions.

Another, subtler problem arises from misunderstanding the meaning of NULL in NOT IN anti-joins.
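The difference between the two notions of equality can be sketched in plain Java. This is only a model of SQL's three-valued logic, not anything JDBC provides: the class `SqlNullDemo` and its methods are hypothetical names, and a Java `null` of type `Boolean` stands in for UNKNOWN.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class SqlNullDemo {
    // Models the SQL predicate "a = b". A null return value stands for UNKNOWN.
    static Boolean sqlEquals(Object a, Object b) {
        if (a == null || b == null) {
            return null; // any comparison with NULL is UNKNOWN
        }
        return a.equals(b);
    }

    // Models "value NOT IN (list)", i.e. value <> e1 AND value <> e2 AND ...
    static Boolean sqlNotIn(Object value, List<?> list) {
        boolean sawUnknown = false;
        for (Object element : list) {
            Boolean eq = sqlEquals(value, element);
            if (eq == null) {
                sawUnknown = true;  // UNKNOWN, unless a later element matches
            } else if (eq) {
                return false;       // a match makes the whole predicate FALSE
            }
        }
        return sawUnknown ? null : Boolean.TRUE;
    }

    public static void main(String[] args) {
        System.out.println(Objects.equals(null, null));          // Java: true
        System.out.println(sqlEquals(null, null));               // SQL model: null (UNKNOWN)
        System.out.println(sqlNotIn(1, Arrays.asList(2, null))); // SQL model: null (UNKNOWN)
    }
}
```

Because a WHERE clause keeps only rows for which the predicate is TRUE, a single NULL in the NOT IN list is enough to make such an anti-join return no rows at all.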

Workaround:

Train yourself. Whenever you write SQL, keep thinking about NULL:

Is this predicate correct with respect to NULL?

Does NULL affect the result?

  2. Processing data in Java memory

Few Java developers know SQL very well. The occasional JOIN, the odd UNION, fine. But window functions? Grouping sets? Many Java developers load SQL data into memory, transform it into some suitable collection type, and then run tedious computations over that collection with verbose loop structures (at least before the collection improvements in Java 8).

But some SQL databases support advanced OLAP features (which are part of the SQL standard!) that perform better and are much easier to write. One (non-standard) example is Oracle's awesome MODEL clause. Just let the database do the processing and fetch only the results into Java memory. After all, some very smart people have optimized these expensive products. By moving OLAP to the database, you gain two things:

Convenience. It is probably easier to write correct SQL than correct Java for this kind of work.

Performance. The database will probably be faster than your algorithm. More importantly, you do not have to transfer millions of records.

Workaround:

Every time you implement a data-centric algorithm in Java, ask yourself: is there a way to let the database do this work for me?
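As a rough sketch of letting the database do the aggregation, the following uses GROUPING SETS to compute subtotals per region, per product, and a grand total in one query. The table `sales` with columns `region`, `product` and `amount`, and the class `SalesReport`, are assumptions made for illustration. GROUPING SETS is not available in every database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class SalesReport {
    // Hypothetical table SALES(REGION, PRODUCT, AMOUNT).
    // One statement yields totals per region, per product, and overall.
    static final String SQL =
          "SELECT region, product, SUM(amount) AS total "
        + "FROM sales "
        + "GROUP BY GROUPING SETS ((region), (product), ())";

    // Fetches only the aggregated rows. Columns that are not part of a
    // given grouping set come back as NULL.
    static List<String> fetch(Connection connection) throws SQLException {
        List<String> lines = new ArrayList<>();
        try (PreparedStatement stmt = connection.prepareStatement(SQL);
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                lines.add(rs.getString("region") + " | "
                        + rs.getString("product") + " | "
                        + rs.getBigDecimal("total"));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // No database is needed just to inspect the statement.
        System.out.println(SQL);
    }
}
```

The Java side only formats a handful of result rows. The detail rows never leave the database.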

  3. Using UNION instead of UNION ALL

It is a shame that UNION ALL needs an extra keyword compared with UNION. It would have been better if the SQL standard had defined:

UNION (allowing duplicates)

UNION DISTINCT (removing duplicates)

Removing duplicates is not only rarely needed (and sometimes even wrong), it is also quite slow for large result sets, because the rows of the two subselects typically have to be sorted and each tuple compared with the tuple that follows it.

Note that even though the SQL standard specifies INTERSECT ALL and EXCEPT ALL, hardly any database implements these less useful set operations.

Workaround:

Every time you write UNION, consider whether you actually meant UNION ALL.
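The semantic difference between the two operators can be modelled in a few lines of Java. This is a sketch for illustration only, with hypothetical names (`UnionDemo`, `unionAll`, `union`). Real queries should of course run in the database, and SQL makes no ordering promise without ORDER BY.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnionDemo {
    // Model of UNION ALL: plain concatenation, duplicates are kept.
    static <T> List<T> unionAll(List<T> a, List<T> b) {
        return Stream.concat(a.stream(), b.stream())
                     .collect(Collectors.toList());
    }

    // Model of UNION: concatenation followed by an extra
    // duplicate-elimination pass over every row.
    static <T> List<T> union(List<T> a, List<T> b) {
        return Stream.concat(a.stream(), b.stream())
                     .distinct()
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 2);
        List<Integer> b = Arrays.asList(2, 3);
        System.out.println(unionAll(a, b)); // [1, 2, 2, 2, 3]
        System.out.println(union(a, b));    // [1, 2, 3]
    }
}
```

Note that the model of UNION also removes the duplicate that already existed within the first input, which is exactly what the SQL operator does.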

  4. Using JDBC pagination to page through large result sets

Most databases support some way of paginating ordered results, through clauses such as LIMIT .. OFFSET, TOP .. START AT, or OFFSET .. FETCH. Even in databases without these clauses, you can still filter on ROWNUM (Oracle) or ROW_NUMBER() OVER() (DB2, SQL Server 2008, and others), which is much faster than paginating in memory. The difference is especially noticeable with large amounts of data.

Workaround:

Just use those clauses, or a tool (such as jOOQ) that can emulate them for you.
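A minimal sketch of database-side pagination through JDBC might look as follows. It assumes a hypothetical table `book` with columns `id` and `title`, and a database that accepts the OFFSET .. FETCH syntax with bind variables. Other databases need LIMIT .. OFFSET or a ROW_NUMBER() filter instead.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class BookPager {
    // Hypothetical table BOOK(ID, TITLE). A stable ORDER BY is required,
    // otherwise the content of a "page" is not well defined.
    static final String PAGE_SQL =
          "SELECT id, title FROM book ORDER BY id "
        + "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";

    // Translates a 1-based page number into a row offset.
    static long offsetFor(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be positive");
        }
        return (long) (page - 1) * pageSize;
    }

    // Only the rows of the requested page are transferred to Java.
    static List<String> fetchPage(Connection connection, int page, int pageSize)
            throws SQLException {
        List<String> titles = new ArrayList<>();
        try (PreparedStatement stmt = connection.prepareStatement(PAGE_SQL)) {
            stmt.setLong(1, offsetFor(page, pageSize));
            stmt.setInt(2, pageSize);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    titles.add(rs.getString("title"));
                }
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(offsetFor(3, 20)); // 40
    }
}
```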

  5. Joining data in Java memory

Since the early days of SQL, some developers have felt uneasy about using JOINs, out of an inherent fear that JOINs are slow. That can be true if a cost-based optimizer chooses a nested loop and possibly loads complete tables into database memory before creating the joined table source, but this rarely happens. With appropriate predicates, constraints and indexes, MERGE JOIN and HASH JOIN operations are very fast. It is all about correct metadata (I cannot quote Tom Kyte often enough on this). Nevertheless, there are probably still quite a few Java developers who load two tables with separate queries into maps and join them in Java memory one way or another.

Workaround:

If you are selecting from several tables in several steps, think again about whether you can express your query in a single statement.
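As an illustration, a single JOIN can replace two separate queries and a map lookup in Java. The tables `author` (`id`, `name`) and `book` (`id`, `author_id`, `title`) and the class `AuthorBooks` are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AuthorBooks {
    // Hypothetical tables AUTHOR(ID, NAME) and BOOK(ID, AUTHOR_ID, TITLE).
    // The database matches the rows; Java never sees unmatched data.
    static final String JOIN_SQL =
          "SELECT a.name, b.title "
        + "FROM author a "
        + "JOIN book b ON b.author_id = a.id "
        + "ORDER BY a.name, b.title";

    // Groups the already-joined rows for display. No join logic lives here.
    static Map<String, List<String>> titlesByAuthor(Connection connection)
            throws SQLException {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try (PreparedStatement stmt = connection.prepareStatement(JOIN_SQL);
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                result.computeIfAbsent(rs.getString(1), k -> new ArrayList<>())
                      .add(rs.getString(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(JOIN_SQL);
    }
}
```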

  6. Using DISTINCT or UNION to remove duplicates from an accidental Cartesian product

With complex joins, people can lose track of all the relationships that play a role in a SQL statement. In particular, when multi-column foreign key relationships are involved, it is easy to forget to add the relevant predicates to the JOIN .. ON clause. This can result in duplicate records, but perhaps only in exceptional cases. Some developers then reach for DISTINCT to remove the duplicates. This is wrong in three ways:

It (perhaps) cures the symptoms but not the problem. In edge cases it may not even cure the symptoms.

It is slow for large result sets with many columns. DISTINCT performs an ORDER BY-like operation to remove duplicates.

It is slow for large Cartesian products, which still load a lot of data into memory.

Workaround:

As a rule of thumb, when you get unwanted duplicates, review your JOIN predicates. There is probably a subtle Cartesian product hiding somewhere.
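The effect of a forgotten predicate on a two-column key can be reproduced with a tiny in-memory model. The data and the class `JoinPredicateDemo` are invented for illustration. The nested loop merely stands in for whatever join algorithm a database would pick.

```java
import java.util.ArrayList;
import java.util.List;

class JoinPredicateDemo {
    // Hypothetical parent rows {tenantId, orderId}; the pair is the key.
    static final int[][] ORDERS = { {1, 10}, {2, 10} };
    // Hypothetical child rows {tenantId, orderId, itemId}.
    static final int[][] ITEMS = { {1, 10, 100}, {2, 10, 200} };

    // Joins ORDERS to ITEMS. With fullKey == false only orderId is compared,
    // mimicking an ON clause that forgot the tenantId predicate.
    static List<int[]> join(boolean fullKey) {
        List<int[]> result = new ArrayList<>();
        for (int[] order : ORDERS) {
            for (int[] item : ITEMS) {
                boolean match = order[1] == item[1]
                        && (!fullKey || order[0] == item[0]);
                if (match) {
                    result.add(new int[] { order[0], order[1], item[2] });
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(join(false).size()); // 4: a partial Cartesian product
        System.out.println(join(true).size());  // 2: the intended result
    }
}
```

In this model, a DISTINCT over only the item column would collapse the four rows back to two and hide the bug, while a DISTINCT over all three columns would remove nothing, because the extra rows pair each order with another tenant's item and are therefore all different.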

  7. Not using the MERGE statement

This is not really a mistake, but rather a lack of knowledge of, or confidence in, the powerful MERGE statement. Some databases offer other forms of UPSERT statement, such as MySQL's ON DUPLICATE KEY UPDATE clause. But MERGE is really powerful, most notably in databases that heavily extend the SQL standard, such as SQL Server.

Workaround:

If you are upserting by chaining INSERT and UPDATE, or by chaining SELECT .. FOR UPDATE with a subsequent INSERT or UPDATE, think again. You may be able to express the same thing with a simpler MERGE statement and avoid risky race conditions.
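A sketch of an upsert written as one MERGE statement is shown below. The table `customer` with columns `id` and `name` and the class `UpsertWithMerge` are assumptions. The exact MERGE syntax also differs between databases (the form of the USING clause, required casts for bind variables, statement terminators), so treat the SQL text as a template rather than something portable.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class UpsertWithMerge {
    // Hypothetical table CUSTOMER(ID, NAME). Insert-or-update happens in a
    // single statement instead of a SELECT followed by INSERT or UPDATE.
    static final String MERGE_SQL =
          "MERGE INTO customer t "
        + "USING (VALUES (?, ?)) AS s (id, name) "
        + "ON t.id = s.id "
        + "WHEN MATCHED THEN UPDATE SET name = s.name "
        + "WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name)";

    // Returns the number of rows affected, as reported by the driver.
    static int upsert(Connection connection, long id, String name)
            throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(MERGE_SQL)) {
            stmt.setLong(1, id);
            stmt.setString(2, name);
            return stmt.executeUpdate();
        }
    }

    public static void main(String[] args) {
        System.out.println(MERGE_SQL);
    }
}
```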

  8. Using aggregate functions instead of window functions

Before window functions were introduced, the only way to aggregate data in SQL was a GROUP BY clause together with aggregate functions in the projection. This works well in many cases. When aggregated data has to be enriched with regular data, the grouped query can be pushed down into a joined subquery.

But SQL:2003 defined window functions, which many mainstream databases have since implemented. Window functions can aggregate data over a result set that is not grouped. In fact, each window function has its own, independent PARTITION BY clause, which makes it an awesome tool for reporting.

Using window functions will:

Make SQL more readable (fewer dedicated GROUP BY clauses in subqueries)

Improve performance, as an RDBMS is likely to optimize window functions more easily

Workaround:

When you write a GROUP BY clause in a subquery, think again about whether a window function could do the job.
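For instance, a running balance per account needs no grouped subquery when a window function is available. The table `transactions` with columns `account_id`, `booked_at` and `amount` and the class `RunningBalance` are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class RunningBalance {
    // Hypothetical table TRANSACTIONS(ACCOUNT_ID, BOOKED_AT, AMOUNT).
    // Every row keeps its detail columns and gains an aggregated column;
    // the window restarts for each account.
    static final String SQL =
          "SELECT account_id, booked_at, amount, "
        + "SUM(amount) OVER (PARTITION BY account_id ORDER BY booked_at "
        + "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS balance "
        + "FROM transactions "
        + "ORDER BY account_id, booked_at";

    // Reads the detail rows together with their running balance.
    static List<String> fetch(Connection connection) throws SQLException {
        List<String> lines = new ArrayList<>();
        try (PreparedStatement stmt = connection.prepareStatement(SQL);
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                lines.add(rs.getLong("account_id") + " | "
                        + rs.getTimestamp("booked_at") + " | "
                        + rs.getBigDecimal("amount") + " | "
                        + rs.getBigDecimal("balance"));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(SQL);
    }
}
```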

  9. Using in-memory sorting for sort indirections

The SQL ORDER BY clause supports many kinds of expressions, including CASE expressions, which are very useful for sort indirections. You should probably never sort data in Java memory just because you believe that:

SQL sorting is too slow

SQL sorting cannot do what you need

Workaround:

If you sort any SQL data in memory, think again about whether you could push the sorting into the database. This goes well with pushing pagination into the database.
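A sort indirection of this kind might be sketched as follows, ordering tickets by a business priority that does not sort correctly as text. The table `ticket` with columns `id`, `priority` and `created_at`, the priority values, and the class `TicketQueue` are all assumptions.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class TicketQueue {
    // Hypothetical table TICKET(ID, PRIORITY, CREATED_AT) where PRIORITY
    // holds 'HIGH', 'MEDIUM' or 'LOW'. The CASE expression maps each label
    // to a rank, so the database returns rows in business order.
    static final String SQL =
          "SELECT id, priority FROM ticket "
        + "ORDER BY CASE priority "
        + "WHEN 'HIGH' THEN 1 "
        + "WHEN 'MEDIUM' THEN 2 "
        + "ELSE 3 END, created_at";

    // The list arrives already sorted; no Comparator is needed in Java.
    static List<Long> fetchIdsInOrder(Connection connection) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement stmt = connection.prepareStatement(SQL);
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                ids.add(rs.getLong("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(SQL);
    }
}
```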

  10. Inserting lots of records one by one

JDBC knows batching, and you should not forget it. Do not INSERT thousands of records one by one, creating a new PreparedStatement each time. If all of your records go into the same table, create one batch INSERT with a single SQL statement and multiple sets of bind values. Depending on your database and its configuration, you may need to commit after a certain number of inserted records to keep the UNDO log slim.

Workaround:

Always use batches to insert large amounts of data.
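A minimal sketch of a batched insert with periodic commits could look like this. The table `person` with a single `name` column, the class `BatchInsert`, and the choice to commit after every batch are assumptions. A suitable batch size depends on the database and driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchInsert {
    // Hypothetical table PERSON(NAME).
    static final String INSERT_SQL = "INSERT INTO person (name) VALUES (?)";

    // Reuses ONE PreparedStatement for all rows, sends the rows in batches
    // of batchSize, and commits after each batch to keep undo data small.
    // Returns the number of batches sent to the database.
    static int insertAll(Connection connection, List<String> names, int batchSize)
            throws SQLException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        connection.setAutoCommit(false);
        int batches = 0;
        try (PreparedStatement stmt = connection.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (String name : names) {
                stmt.setString(1, name);
                stmt.addBatch();            // queue the row, no round trip yet
                if (++pending == batchSize) {
                    stmt.executeBatch();    // one round trip for the whole batch
                    connection.commit();
                    batches++;
                    pending = 0;
                }
            }
            if (pending > 0) {              // flush the final, partial batch
                stmt.executeBatch();
                connection.commit();
                batches++;
            }
        }
        return batches;
    }
}
```

With five names and a batch size of two, for example, the method queues five rows and calls executeBatch three times.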
