This is the first time I have published my views on ORM. Plenty of articles on the topic already exist, so there might seem to be little reason to discuss it again. I am writing this post because I have implemented an ORM framework of my own and built many of my own ideas into it. This article therefore discusses my understanding and positioning of ORM from the perspective of object-oriented database development, a topic that is closely related to ORM but clearly distinct from it. It does not discuss what object orientation is, what its benefits are, or why object-oriented methods should be used to develop database applications. I simply assume that we must develop in an object-oriented way, and discuss ORM on that basis.
Let's start with what an ORM is. According to Wikipedia, ORM is a technique for converting data between incompatible type systems using object-oriented programming languages. The ORMs I have seen do indeed follow this definition. However, this is not how I understand ORM, as I will explain later. First, let's go through some of the most common criticisms of ORM. I have wanted to say these things for a long time, and today I finally can.
1. Poor performance
This is probably the most frequently raised complaint in articles about the disadvantages of ORM. The performance bottleneck of an ORM generally lies in the process of instantiating data as objects, although there is other overhead as well. The problem with the comparison is that an ORM returns objects, while the traditional approach returns a dataset. The two results are different, so they are not really comparable. It depends on whether the developer needs data or objects. If you need objects and do not use an ORM, you have to build them yourself, which incurs the same kind of overhead. So when is an ORM actually slow? Leaving aside poorly implemented ORMs, as long as the amount of data involved stays within a practical range, ORM performance is not much lower; in real development it is entirely acceptable and meets requirements. The essence of this problem is therefore not performance. It is that an ORM needs to provide two ways of operating on data, one object-oriented and one data-oriented, so that developers have maximum flexibility. But why should the ORM support both interfaces? Why not let programmers decide for themselves, using the ORM when objects are required and conventional methods when only data is required? This touches on my understanding of ORM, which is explained below.
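To make the two styles concrete, here is a minimal sketch in Python using the standard sqlite3 module. The employee table, the Employee class, and the function names are made up for illustration and are not part of emlib. It only shows that the object-oriented path is the data-oriented path plus one instantiation per row.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Employee:
    """Hypothetical entity used only for this illustration."""
    id: int
    name: str
    salary: float


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, made-up employee table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
    )
    conn.executemany(
        "INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)",
        [(1, "Ann", 5000.0), (2, "Bob", 4200.0)],
    )
    return conn


def load_rows(conn: sqlite3.Connection) -> list:
    """Data-oriented access: return plain tuples, no object instantiation."""
    return conn.execute(
        "SELECT id, name, salary FROM employee ORDER BY id"
    ).fetchall()


def load_objects(conn: sqlite3.Connection) -> list:
    """Object-oriented access: the same query plus one Employee per row."""
    return [Employee(*row) for row in load_rows(conn)]
```

A caller that only needs to display a list can use load_rows, while a caller that needs behaviour attached to the data can use load_objects. Both go through the same library.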
2. Data access flexibility
If the four basic operations an ORM supports (insert, delete, update, and query) are driven by SQL statement text, the development experience suffers badly: it feels like writing SQL statements directly against the database. If instead the ORM provides dedicated methods for the four operations, expressing complex queries and complex conditional expressions becomes a challenge. I admit that this is a real problem with ORM, and the vast majority of ORM frameworks share it (including Hibernate as I saw it at the time; this may no longer be true). For this reason I implemented eql alongside the emlib ORM framework. This technology, which is similar to LINQ, solves the problem: eql can express SQL statements and conditions of any complexity without string concatenation.
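The general idea of building conditions as objects instead of concatenating strings can be sketched as follows. This is not eql and does not reproduce its API. Column, Condition, and select are hypothetical names, and the sketch assumes a driver that uses "?" placeholders, as sqlite3 does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """A SQL fragment together with the parameter values it binds."""
    sql: str
    params: tuple = ()

    def __and__(self, other: "Condition") -> "Condition":
        return Condition(f"({self.sql} AND {other.sql})", self.params + other.params)

    def __or__(self, other: "Condition") -> "Condition":
        return Condition(f"({self.sql} OR {other.sql})", self.params + other.params)


class Column:
    """A column reference whose comparison methods return Condition objects."""

    def __init__(self, name: str) -> None:
        self.name = name

    def eq(self, value) -> Condition:
        return Condition(f"{self.name} = ?", (value,))

    def gt(self, value) -> Condition:
        return Condition(f"{self.name} > ?", (value,))


def select(table: str, where: Condition) -> tuple:
    """Render a SELECT statement and its parameters.

    Table and column names come from the program, never from user input;
    user-supplied values travel only through the parameter tuple.
    """
    return f"SELECT * FROM {table} WHERE {where.sql}", where.params


dept, salary, name = Column("dept"), Column("salary"), Column("name")
cond = dept.eq("R&D") & (salary.gt(4000) | name.eq("Ann"))
sql, params = select("employee", cond)
```

Conditions nest to any depth through & and |, and the values never become part of the SQL text, so the caller writes no concatenation.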
3. Differences in the amount of data accessed
This refers to the fact that, when updating or reading data, the traditional approach can touch only the fields it needs, whereas an ORM reads or updates all fields. This is a genuine problem, especially when one field holds a lot of data, for example an image. In my opinion this is essentially a clash of philosophies: does the developer want to be object-oriented or data-oriented? For the former, these costs have to be borne. No solution is perfect, and paying some extra cost is acceptable in exchange for the benefits of object orientation. But what if the extra data is large enough to cause performance problems or is otherwise unacceptable? In that case it is still a problem, and the developers of the ORM framework need to offer a solution. I see two:
A. When only part of the data is involved, operate on it through the data-oriented interface of the ORM framework.
When you operate on only part of the data, especially in a query, the result cannot really be regarded as an object. It is therefore better to use the data-oriented interface directly. This is one of the reasons why I think an ORM framework must support data-oriented access.
B. Solve the problem through lazy loading.
First, model the field that holds a large amount of data as an entity of its own, and declare the corresponding object member with that entity type. The ORM then loads this member lazily, so the large field is not read until it is actually used. This effectively avoids reading too much data.
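Option B can be sketched as follows, again with sqlite3. The Photo and Employee classes and the table layout are assumptions made for this example, not how emlib implements it. The large column is wrapped in its own entity, and its bytes are fetched only on first access.

```python
import sqlite3


class Photo:
    """Entity standing in for the large field; loaded on first access only."""

    def __init__(self, conn: sqlite3.Connection, employee_id: int) -> None:
        self._conn = conn
        self._employee_id = employee_id
        self._data = None
        self.loaded = False  # exposed so the deferred read can be observed

    @property
    def data(self) -> bytes:
        if not self.loaded:
            row = self._conn.execute(
                "SELECT photo FROM employee WHERE id = ?", (self._employee_id,)
            ).fetchone()
            self._data = row[0]
            self.loaded = True
        return self._data


class Employee:
    def __init__(self, conn: sqlite3.Connection, employee_id: int, name: str) -> None:
        self.id = employee_id
        self.name = name
        self.photo = Photo(conn, employee_id)  # no photo bytes are read here


def load_employee(conn: sqlite3.Connection, employee_id: int) -> Employee:
    """Read only the small columns; the photo column is left to Photo."""
    row = conn.execute(
        "SELECT id, name FROM employee WHERE id = ?", (employee_id,)
    ).fetchone()
    return Employee(conn, row[0], row[1])
```

Code that only lists employee names never pays for the image; code that reads emp.photo.data triggers exactly one extra query.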
For update operations, I have not seen any other ORM framework offer a good solution. My solution in emlib is to use an eql statement object to execute the update directly. In addition, eql supports global transactions that keep the data in memory consistent with the database, which solves the problem of touching too much data during an update.
4. Data synchronization problems
As far as I know, I am the only one to have raised this issue; I have not seen it discussed anywhere else. It made a deep impression on me, and it was the original motivation for developing eql. The problem is this: if you execute an SQL statement directly to modify data in the database, the member values of the corresponding entities in memory do not change, so the data in memory becomes inconsistent with the data in the database. From the perspective of mapping, this is indeed a flaw. However, executing SQL directly is outside what the ORM can control, so it is excusable for the ORM to be powerless here. How can the problem be solved? My solution combines eql with global transaction management. The eql technology lets the ORM execute all kinds of SQL statement objects, with the same power as executing SQL strings directly. Global transaction management makes the ORM transaction and the database transaction a single unit, which ensures that the data in memory stays consistent with the data in the database. In this way the problem is solved in my ORM framework.
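One way to picture a transaction that covers both memory and database is the sketch below. It is my own simplified illustration, not emlib's mechanism: global_transaction and raise_salaries are hypothetical names, the cache is a plain dict, and the in-memory change is applied by hand rather than derived from a statement object.

```python
import sqlite3
from contextlib import contextmanager


class Employee:
    def __init__(self, employee_id: int, salary: float) -> None:
        self.id = employee_id
        self.salary = salary


@contextmanager
def global_transaction(conn: sqlite3.Connection, entities: list):
    """Commit or roll back the database and the in-memory entities together."""
    # Remember every attribute of every tracked entity before the work starts.
    snapshot = {id(e): dict(vars(e)) for e in entities}
    try:
        yield
        conn.commit()
    except Exception:
        conn.rollback()
        # Restore the entities so memory matches the rolled-back database.
        for e in entities:
            vars(e).clear()
            vars(e).update(snapshot[id(e)])
        raise


def raise_salaries(conn: sqlite3.Connection, cache: dict, amount: float) -> None:
    """Run a set-based UPDATE and apply the same change to cached entities."""
    conn.execute("UPDATE employee SET salary = salary + ?", (amount,))
    for e in cache.values():
        e.salary += amount
```

If anything inside the with block fails, both the UPDATE and the in-memory salary changes are undone, so the two never drift apart.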
Next, let's talk about what an ORM should look like. I think the positioning of ORM given by Wikipedia is too narrow. It deserves a different name, for example "object-oriented database development library". That is why my ORM framework is called emlib, short for entity model Lib. This positioning makes three demands of an ORM framework (I will keep using the name). First, it must provide a class library with complete object-oriented database operation capabilities, rather than just a handful of practical methods. Second, the class library must also provide other capabilities that help programmers work with data in the database; these conveniences are not strictly necessary, but they significantly improve development efficiency and the coding experience. Third, it must be able to take on the responsibilities of the data layer independently. Let's look at these three points in turn.
1. Complete object-oriented database development capabilities
Relational databases are currently the most widely used. All data operations in such a database can be reduced to a few basic set operations, such as union, difference, and product. In terms of basic operations, insert, delete, update, and query are enough. However, the following two enhancements are required:
A. You can execute SQL statements of any complexity and construct conditional expressions of any complexity.
B. You can execute database functions.
The premise, of course, is that string concatenation is not used (why it must not be used is explained in the third part). With these two enhancements in place, we can carry out essentially any operation on the database and achieve the completeness described above.
2. Provide sufficient capabilities to improve development efficiency
This is a key point, and an important extension and enhancement of completeness. Several features are discussed here.
A. Cascade operations
In my opinion this function is almost mandatory, because it is more than a matter of convenience. When designing an entity model, inheritance relationships between entities are inevitable, and these lead to polymorphism. If the ORM does not support cascade operations directly and the programmer has to implement them, writing the SQL statements and the cascading code is tedious and repetitive. Moreover, because the polymorphism is determined by the model itself, it is hard to write cascade code by hand that stays unchanged when the inheritance relationships change. An ORM framework should therefore support cascade operations. Developers may choose not to use cascading for performance reasons, but the ORM must have the capability. Emlib goes further here: for the sake of performance and flexibility, it supports not only cascade operations but also deep cascade operations.
B. Entity type members
This feature reflects complete support for object-oriented design. Without it, the object orientation is incomplete.
C. Link operations
Cascading is itself one use of relationship operations. However, the two basic link operations must be supported in their own right: establishing a link and terminating it. What if they are not? The developer must write code to modify the foreign key and then update the data in memory. When the entities joined by a link involve polymorphism, this becomes cumbersome, and when the model changes, the code changes are relatively complex. An ORM should therefore support these two operations as well.
Emlib also supports a third operation, link transfer. Transfer means terminating the relationship with one entity and establishing a relationship with another. Emlib supports these operations both on objects and through eql.
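The three link operations can be sketched like this. The function names establish, terminate, and transfer, the Employee and Department classes, and the dept_id foreign key are all assumptions for illustration and do not reflect emlib's actual interface. The point is that each operation changes the foreign key and both in-memory ends in one place.

```python
import sqlite3


class Department:
    def __init__(self, dept_id: int) -> None:
        self.id = dept_id
        self.employees = []


class Employee:
    def __init__(self, emp_id: int) -> None:
        self.id = emp_id
        self.department = None


def _set_fk(conn: sqlite3.Connection, emp: Employee, dept_id) -> None:
    """Write the foreign key column for one employee row."""
    conn.execute("UPDATE employee SET dept_id = ? WHERE id = ?", (dept_id, emp.id))


def establish(conn: sqlite3.Connection, emp: Employee, dept: Department) -> None:
    """Create the link: write the foreign key, then update both in-memory ends."""
    _set_fk(conn, emp, dept.id)
    emp.department = dept
    dept.employees.append(emp)


def terminate(conn: sqlite3.Connection, emp: Employee) -> None:
    """Remove the link: null the foreign key and detach both in-memory ends."""
    if emp.department is None:
        return
    _set_fk(conn, emp, None)
    emp.department.employees.remove(emp)
    emp.department = None


def transfer(conn: sqlite3.Connection, emp: Employee, new_dept: Department) -> None:
    """Terminate the current link and establish one with another entity."""
    terminate(conn, emp)
    establish(conn, emp, new_dept)
```

Without such operations, every caller would have to repeat the UPDATE and the two in-memory assignments, and keep them in step whenever the model changes.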
D. Lazy loading
This feature is not urgent, but it is worth implementing if possible.
E. Uniqueness of an object
This means that each object has a unique reference. In other words, a cache is needed to hold all mapped entities; if the same record is mapped again, the object is taken from the cache instead of being created anew. The articles I have read treat this purely as a performance matter, because a cache reduces the time spent instantiating objects. As I understand it, however, this is a requirement of object orientation and has nothing to do with performance. It is a required feature, because object-oriented programming languages handle objects through references, which means there is only ever one instance of a given object. From the perspective of mapping, too, the same record must always map to the same object.
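This requirement is commonly implemented with what is often called an identity map. Here is a minimal sketch, assuming a single employee table keyed by id; the Session class and its get method are hypothetical names, not emlib's.

```python
import sqlite3


class Employee:
    def __init__(self, emp_id: int, name: str) -> None:
        self.id = emp_id
        self.name = name


class Session:
    """Maps each primary key to at most one in-memory object."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._identity_map = {}

    def get(self, emp_id: int):
        # A record that was mapped before is returned from the cache,
        # so every caller holds a reference to the same object.
        if emp_id in self._identity_map:
            return self._identity_map[emp_id]
        row = self._conn.execute(
            "SELECT id, name FROM employee WHERE id = ?", (emp_id,)
        ).fetchone()
        if row is None:
            return None
        emp = Employee(*row)
        self._identity_map[emp_id] = emp
        return emp
```

Because two lookups of the same ID return the same object, a change made through one reference is visible through the other, which is the object-oriented behaviour described above, independent of any speed gain.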
3. Independent responsibility for the data layer
All the ORM frameworks I have seen position themselves as support for the data layer, rather than taking responsibility for the data layer independently. Why should an ORM take on that responsibility independently? The reason is simple: if I still have to build the data layer myself, I do not need a separate ORM framework. This requirement goes beyond the Wikipedia-based concept of ORM. So what conditions must be met for an ORM to take on the responsibilities of the data layer independently?
First, the ORM framework must be powerful and flexible enough to encapsulate the details of data operations. Power needs no further explanation. Flexibility here means that the ORM framework must support both object-oriented and data-oriented operations. It makes no sense for developers to need two class libraries to work with a database: the ORM, and, whenever the ORM is not flexible enough, the class library provided by the development tool for data-oriented access. Second, the methods of the class library must be designed appropriately, so that the interface provided by the ORM framework lets developers think at the level of business functions rather than data operations. I personally think a key step here is a technology like eql in the emlib framework. Let's look at a concrete example.
Suppose we enter an employee's ID and then view that employee's information. Call this the employee ID query: the only input is the employee ID. Consider how a common ORM framework handles it:
1. Obtain the entered employee ID.
2. Construct the text of the SQL statement.
3. Take the text as a parameter and execute relevant methods to obtain the corresponding employee object.
How the emlib framework handles it:
1. Obtain the entered employee ID.
2. Construct an eql query statement object.
3. Pass the constructed statement object as a parameter to the relevant method to obtain the corresponding employee object.
The difference lies in step 2. Specifically:
Concatenating SQL text belongs to the data layer, so the data layer surfaces directly in the implementation of a business function. This shows that the ORM in use does not keep developers out of the data layer; in other words, the ORM does not independently assume responsibility for the data layer. Constructing an eql statement object, by contrast, is not data-layer work, for the following reasons:
1. In terms of content, eql is part of the emlib framework. The developer uses emlib and never touches the data layer directly.
2. An eql statement object is constructed by calling methods. The developer builds a statement object rather than handling data-layer content directly.
3. In terms of structure, eql sits between the developer and the SQL statement text, so the developer never uses SQL text directly and is kept out of the data layer.
The most typical evidence is that the same eql statement object can be executed on different databases without modification; emlib handles the rest automatically. The issues are not very visible in such a simple example, and you can try it yourself. But if this difference exists in every operation, and grows larger in more complex ones, then the ORM framework can take on the responsibility of the data layer independently.
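The employee ID query can illustrate why a statement object is portable while SQL text is not. The sketch below is not eql: SelectById, the two render functions, and find_employee are hypothetical, and the "?" placeholder style is an assumption about the drivers in use. It relies only on the well-known difference that SQLite limits rows with LIMIT while SQL Server uses TOP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectById:
    """A query described as data: which table, which key column, which value."""
    table: str
    key: str
    value: object
    limit: int = 1


def render_sqlite(stmt: SelectById) -> tuple:
    """Render for SQLite; identifiers come from code, values are bound."""
    sql = f"SELECT * FROM {stmt.table} WHERE {stmt.key} = ? LIMIT {stmt.limit}"
    return sql, (stmt.value,)


def render_sqlserver(stmt: SelectById) -> tuple:
    """Render for SQL Server, which expresses the row limit with TOP."""
    sql = f"SELECT TOP {stmt.limit} * FROM {stmt.table} WHERE {stmt.key} = ?"
    return sql, (stmt.value,)


def find_employee(execute, render, employee_id: int):
    """Business-level code: it builds a statement object and never sees SQL text.

    `execute` is any callable taking (sql, params) and returning one row;
    `render` is the dialect-specific renderer chosen by the framework.
    """
    stmt = SelectById("employee", "id", employee_id)
    sql, params = render(stmt)
    return execute(sql, params)
```

Swapping the database changes only which renderer the framework picks; find_employee stays the same.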
These are some of my thoughts on ORM frameworks. Some of them are surely one-sided, incomplete, or even wrong, and I hope readers will point out the errors. I am also very happy to discuss them with readers.