Original address: http://www.nowamagic.net/librarys/veda/detail/2217
Last week I made some remarks about ORM, and afterwards I was asked to clarify what I meant. I have in fact written about ORM before, but that was in the context of a larger discussion about SQL, and I should not have mixed the two topics. So in this article I will focus on ORM itself. I will also try to keep it brief, because my SQL article made it obvious that people tend to stop reading at the first sentence that makes them angry (and then leave a comment, whether or not their concern is addressed later on).
What is an anti-pattern?
I was pleased to find that Wikipedia has a fairly comprehensive list of anti-patterns, drawn from the programming world and beyond. I call ORM an anti-pattern because the author who defined the term gave two conditions that distinguish an anti-pattern from an ordinary bad habit, and ORM meets both of them:
- It seems useful at first, but in the long run its drawbacks outweigh its benefits.
- A proven and repeatable alternative exists.
The first condition explains why ORM has become so maddeningly (to me) widespread: it looks like a good idea at first glance, and by the time the problems become apparent it is hard to walk away from.
What does this mean for ORM?
My main target here is ActiveRecord, the pattern made famous by Ruby on Rails and since ported to many other languages. However, the same problems exist in other ORM layers, such as Hibernate for Java and Doctrine for PHP.
The advantages of ORM
- Simplicity: Some ORM layers tell you that they "eliminate the need for SQL". I still see this promise being repeated. Others claim, more realistically, that they reduce the need for handwritten SQL while still letting you use it when you need to. For simple models, and early in a project, this is a real advantage: with an ORM you will undoubtedly get started faster. However, you will be heading in the wrong direction.
- Code generation: Removing hand-written code from the model layer opens the door to code generation. From a simple description of the schema, scaffolding can generate a working interface for all your tables. Even more magically, you can change the schema description and regenerate the code, eliminating the CRUD boilerplate. Again, this does work at the beginning.
- "Good enough" performance: I have not seen any ORM layer claim superior performance. It is generally accepted that performance is being traded for agility of code. If things get slow, you can always override your ORM methods with more efficient handwritten SQL. Can't you?
The ORM Problem
1. Insufficient abstraction
The most obvious problem with ORM as an abstraction is that it does not fully abstract away the implementation details. The documentation of every major ORM library is full of references to SQL concepts. Some introduce these concepts without naming their SQL equivalents, while others treat the library as little more than a set of procedural functions for generating SQL.
The whole point of an abstraction is that it is supposed to simplify. An abstraction of SQL that still requires you to understand SQL doubles what you need to learn: first you have to work out what SQL you are trying to run, and then you have to learn the ORM's API to get it to write that SQL for you. In Hibernate, you even have to learn a third language to write complex queries: HQL, which is almost (but not quite) SQL and is translated into SQL behind the scenes.
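To illustrate the point, here is a minimal sketch of a toy query builder in Python. The Query class and its join, where and to_sql methods are hypothetical and are not the API of ActiveRecord, Hibernate, Doctrine or any other real library; the posts and authors tables are made up as well. Notice that you cannot call it correctly without already knowing what a join and a WHERE clause are.

```python
class Query:
    """Toy query builder (hypothetical, not the API of any real ORM)."""

    def __init__(self, table):
        self.table = table
        self.joins = []
        self.conditions = []
        self.params = []

    def join(self, other, on):
        # The method name and its argument are SQL concepts in disguise.
        self.joins.append(f"JOIN {other} ON {on}")
        return self

    def where(self, condition, *params):
        self.conditions.append(condition)
        self.params.extend(params)
        return self

    def to_sql(self):
        # Assemble the SQL text the caller had in mind all along.
        sql = f"SELECT {self.table}.* FROM {self.table}"
        if self.joins:
            sql += " " + " ".join(self.joins)
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql, self.params


query = (
    Query("posts")
    .join("authors", on="authors.id = posts.author_id")
    .where("authors.name = ?", "alice")
)
sql, params = query.to_sql()
print(sql)
```

The builder saves no thinking: the caller still has to design the SQL first, and then also learn the builder's way of spelling it.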
Defenders of ORM will argue that this is not true of every project, that not everyone needs complex joins, and that ORM is an "80/20" solution: 80% of users need only 20% of the features of SQL, and ORM can handle those. All I can say is that in my 15 years of writing database-backed web applications, this has not been the case. Only at the very beginning of a project can you do without joins, or get by with naive ones. After that, you need to tune and consolidate your queries. Even if 80% of users need only 20% of the features of SQL, 100% of users have to break the ORM abstraction to get the job done.
2. Incorrect abstraction
If your project really does not need any relational data features, then ORM will work perfectly for you. But then you have a different problem: you are using the wrong data store. The overhead of relational storage is considerable, which is one of the main reasons NoSQL data stores are so much faster. If your data is relational, however, that overhead is worth it: your database not only stores your data, it also represents your data, and it can answer questions about it in terms of its relationships much faster than you could in procedural code.
But if your data is not relational, you are using SQL where it does not belong, which adds a huge and unnecessary overhead, and then you make matters worse by adding an extra layer of abstraction on top of it.
On the other hand, if your data is relational, your object mapping will eventually break down. SQL is about relational algebra: the output of SQL is not an object but an answer to a question. If your object "is an" instance of X, "has" a number of Ys, and each Y "belongs to" a Z, what is the correct representation of that object in memory? Should it hold only the properties of X, or should it include all the Ys, and/or all the Zs? If you fetch only the properties of X, when do you run the query to fetch the Ys? And do you want one of them, or all of them? In reality, the answer is that it depends, and that is what I mean when I say SQL is an answer to a question. The representation of an object in memory depends on what you intend to do with it, and object-oriented design has no facility for context-sensitive representations. Relations are not objects, and objects are not relations.
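A small sketch of "SQL is an answer to a question", using Python's built-in sqlite3 module. The schema is an assumption made for illustration only: authors play the role of X, posts of Y and categories of Z. The same three tables answer two different questions, and the two answers have different shapes, neither of which is "an author object".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        category_id INTEGER REFERENCES categories(id),
        title TEXT
    );
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO categories VALUES (1, 'sql'), (2, 'orm');
    INSERT INTO posts VALUES
        (1, 1, 1, 'Joins'), (2, 1, 2, 'Mapping'), (3, 2, 2, 'Caching');
""")

# Question 1: which posts did alice write? The answer is a list of titles.
alice_posts = conn.execute(
    "SELECT p.title FROM posts p "
    "JOIN authors a ON a.id = p.author_id "
    "WHERE a.name = ? ORDER BY p.id",
    ("alice",),
).fetchall()

# Question 2: how many posts does each category hold? The answer is a
# table of (category, count) pairs that maps onto no single object.
posts_per_category = conn.execute(
    "SELECT c.title, COUNT(*) FROM posts p "
    "JOIN categories c ON c.id = p.category_id "
    "GROUP BY c.id, c.title ORDER BY c.id"
).fetchall()

print(alice_posts)
print(posts_per_category)
```

Which in-memory representation is "correct" depends entirely on which of these questions the calling code is asking.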
3. Multiple queries cause failure
This leads naturally to another problem with ORM: inefficiency. When you fetch an object, which of its properties (columns in the table) do you need? The ORM cannot know, so it fetches all of them (or it asks you to say, which breaks the abstraction again). This is not a problem at first, but when you are fetching thousands of records at a time, pulling out all 30 columns when you only need 3 becomes a serious performance problem. Many ORM layers are also poor at inferring joins, and fall back on separate queries to fetch associated data. As mentioned earlier, many ORM layers openly state that efficiency is being sacrificed, and some provide a mechanism for tuning troublesome queries. The trouble, as I have found from experience, is that there is rarely a single query whose tuning acts as a silver bullet: database-backed applications are brought down not by any one query but by the sheer number of them. Because ORM is not context-sensitive, it cannot consolidate queries, and has to fall back on caching and other mechanisms to compensate to some degree.
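The difference can be sketched with Python's built-in sqlite3 module. Everything here is an assumption for illustration: the authors and posts tables, and the run helper that records each statement so the queries can be counted. The first loop imitates the lazy-loading behaviour described above by hand; it is not the output of any particular ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, bio TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        title TEXT,
        body TEXT
    );
    INSERT INTO authors VALUES (1, 'alice', 'bio text'), (2, 'bob', 'bio text');
    INSERT INTO posts VALUES
        (1, 1, 'Joins', 'body text'),
        (2, 1, 'Mapping', 'body text'),
        (3, 2, 'Caching', 'body text');
""")

executed = []  # every SQL statement sent through run()


def run(sql, params=()):
    """Execute a statement and record it so the queries can be counted."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# Lazy-loading style: fetch every column of every post, then issue one
# more query per post to load its author.
lazy = []
for post_id, author_id, title, body in run("SELECT * FROM posts ORDER BY id"):
    (author,) = run("SELECT * FROM authors WHERE id = ?", (author_id,))
    lazy.append((title, author[1]))
lazy_queries = len(executed)

# Hand-written style: one join that selects only the two columns needed.
executed.clear()
joined = run(
    "SELECT p.title, a.name FROM posts p "
    "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
)
joined_queries = len(executed)

print(lazy_queries, joined_queries)
```

Both approaches produce the same list of (title, author name) pairs, but with three posts the first issues four statements and the second issues one. The gap grows with the number of rows, and it is repeated at every place in the application that follows an association.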
So what's the alternative?
Hopefully I have made clear some of the design flaws in ORM. But for ORM to qualify as an anti-pattern, there must also be an alternative. In fact there are two:
1. Working with objects
If your data consists of objects, stop using a relational database. The programming world is currently awash with key-value stores that let you hold elegant, self-contained data structures in huge quantities and access them at lightning speed. There is no law that says the first step in writing a web application is to install MySQL. Using a relational database for every representation of an object is overkill, and it is one reason SQL has had a bad name in recent years. In reality, the problem is lazy design.
2. Using SQL in the model
It is dangerous to claim that there is only one right way to do anything in programming. In my experience, however, the best way to represent relational data in object-oriented code is still a model layer: encapsulating all of your data representation in a single area of the code is a good idea. But remember that the job of the model layer is not to represent objects but to answer questions. Provide an API that answers the questions your application asks, as simply and efficiently as possible. Sometimes these answers will be so specific that they look "wrong", even to an experienced OO developer. With experience, however, you will get better at spotting what they have in common, which lets you refactor several query methods into one.
Similarly, sometimes the output is a single object X, which is easy to represent. But sometimes the output is a table of aggregated values, or a single integer. Resist the temptation to wrap these results in too many layers of abstraction, or to describe them in terms of objects. Above all, do not believe that OO can represent anything and everything. OO is an elegant and flexible abstraction, but relational data lies outside its scope, and pretending that objects can express what they cannot is the root problem at the core of ORM.
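As a minimal sketch of such a model layer, again using Python's built-in sqlite3 module: the BlogModel class, its method names and the authors/posts schema are all hypothetical, chosen only to show three differently shaped answers (a single object, a table of aggregates and a plain integer) coming from handwritten SQL.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Author:
    id: int
    name: str


class BlogModel:
    """Hypothetical model layer: each method answers one question the app asks."""

    def __init__(self, conn):
        self.conn = conn

    def author_by_name(self, name):
        # The answer is a single object, or None if there is no such author.
        row = self.conn.execute(
            "SELECT id, name FROM authors WHERE name = ?", (name,)
        ).fetchone()
        return Author(*row) if row else None

    def post_counts_by_author(self):
        # The answer is a table of (name, count) rows; it is left as rows.
        return self.conn.execute(
            "SELECT a.name, COUNT(p.id) FROM authors a "
            "LEFT JOIN posts p ON p.author_id = a.id "
            "GROUP BY a.id, a.name ORDER BY a.name"
        ).fetchall()

    def total_posts(self):
        # The answer is a plain integer.
        return self.conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        title TEXT
    );
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO posts VALUES (1, 1, 'Joins'), (2, 1, 'Mapping'), (3, 2, 'Caching');
""")

model = BlogModel(conn)
print(model.author_by_name("alice"))
print(model.post_counts_by_author())
print(model.total_posts())
```

The SQL stays in one place, callers never see it, and no method pretends that an aggregate or a count is an object.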
Summary
- At first, ORM is easier to understand and faster to write than SQL-based model code.
- It is efficient enough in the early stages of any project.
- Unfortunately, these benefits disappear as project complexity grows: the abstraction breaks down, and developers are forced to use and understand SQL.
- Purely anecdotally, I believe the ORM abstraction breaks down not in just 20% of projects but in almost 100% of them.
- Objects are not sufficient to fully express the results of a relational query.
- The inadequacy of mapping relational queries to objects makes ORM-backed applications inefficient in a way that is spread throughout the application, with no simple fix short of abandoning the ORM entirely.
- Instead of using relational storage and ORM for every problem, think more carefully about your design.
- If your data is inherently made up of objects, use an object store ("NoSQL"). These are much faster than relational databases.
- If your data is inherently relational, the overhead of a relational database is worth it.
- Encapsulate your relational queries in the model layer, design your API around the data access your application actually needs, and resist the temptation to over-generalize.
- Object-oriented design cannot express relational data efficiently; this is a fundamental limitation of object-oriented design, and ORM cannot fix it.