Developing Object-Relational Database Applications (Part 1)
Paul Brown, IBM Informix, June 2002

Contents
- Introduction
- Database Design Goals
- Database Analysis and Design
- Data Flow, Workload, and Queries
- Conclusion
Introduction
This article was previously published in the Informix Tech Notes quarterly. The Tech Notes archive collects articles from this quarterly magazine from 1991 to 2001. To obtain a reprint of an article from 1998 or earlier, contact technical support (tsmail@us.ibm.com).
Editor's note: This is the first in a series of articles about developing object-relational database applications. The series is based on a free book published by Informix Press, Developing Object-Relational Database Applications. This article (Part 1) describes database analysis and design methods; Part 2 focuses on application implementation.
Developers using an object-relational database management system (ORDBMS) benefit most from the technology when they take a more "holistic" approach to database analysis and design than is typical with relational DBMS technology.
A relational DBMS is generally regarded as an efficient, reliable, and relatively static repository for business data. SQL-92 provides a flexible framework for storing facts about real-world situations of interest and a mechanism for manipulating the data that records those facts. However, the very simplicity of SQL-92 makes many things difficult for developers to implement. For example, try to answer the following simple question with SQL-92:
"Which of my employees have their birthdays this week?"
Is this really so difficult, you ask? It would be easy if the calendar we actually use were as simple as the SQL-92 DATE data type. The problem becomes complicated because people born on February 29 celebrate their birthdays on February 28 in every year that is not a leap year. As a result, you need to write a SQL-92 query like the following:
Figure 1. An incomplete SQL-92 solution to the "Whose birthday is it today?" query
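The leap-day rule described above can be made concrete outside SQL. The following Python sketch is an illustration only, not the query from Figure 1; the function names are assumptions. It contrasts a plain month/day comparison with a version that applies the February 29 rule.

```python
import calendar
from datetime import date


def naive_is_birthday(born: date, today: date) -> bool:
    """Month/day comparison only: misses 29 February birthdays in common years."""
    return (born.month, born.day) == (today.month, today.day)


def is_birthday(born: date, today: date) -> bool:
    """Apply the rule from the text: a 29 February birthday is celebrated
    on 28 February in every year that is not a leap year."""
    if born.month == 2 and born.day == 29 and not calendar.isleap(today.year):
        return (today.month, today.day) == (2, 28)
    return naive_is_birthday(born, today)


leapling = date(1980, 2, 29)
print(naive_is_birthday(leapling, date(2002, 2, 28)))  # False
print(is_birthday(leapling, date(2002, 2, 28)))        # True
print(is_birthday(leapling, date(2004, 2, 28)))        # False, 2004 is a leap year
print(is_birthday(leapling, date(2004, 2, 29)))        # True
```

Expressing the same branching in a single SQL-92 predicate is what makes the query awkward.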
The main goal of object-relational DBMS technology is to let developers avoid this kind of complexity and so build more intelligent databases. Later in this article, we look at an ORDBMS solution to the same problem. First, though, we review a high-level description of a development method for ORDBMS developers. The method combines relational and object-oriented analysis and design. Along the way, we offer practical advice about what works best and what does not work at all.
Database Design Goals
A good way to think about an ORDBMS is as a software "back-plane": a framework into which you can embed software modules (object classes). What distinguishes an object-relational DBMS from more traditional software frameworks (such as an object-oriented DBMS, an application server, or TP monitor middleware) is that the embedded object classes are deployed within an abstract, or logical, data model. As a result, an object-relational database development team needs to work at two levels. One group of developers uses a language such as C or Java to implement objects in the database, and another group combines these objects to meet the high-level requirements of the application's problem domain.
The starting point for any information systems development project is the community of users that the system is intended to support, and the most important goal is that the project meets their needs. We use the term problem domain for the complete set of user requirements. To serve user needs fully, an information system should strive to be:
- Complete. The system must contain all information relevant to the problem domain. In other words, the database must be shareable. Sometimes this means that the database should contain "reference" data: facts that users have no authority over but rely on when making decisions (such as documentation and metadata).
- Correct. The system should represent the problem domain accurately. As with all human endeavors, the facts change over time, so the system should be able to adapt to such changes.
- Consistent. Ambiguous information in an information system (for example, two contradictory facts) causes confusion and errors among its users. The design of the system should minimize the risk of such problems. An important way to achieve this is to design the database schema so that there is only one place in the database where any given fact is stored.
- Flexible. The most important lesson learned from the success of RDBMS technology is the great value of being able to answer, at run time, questions that were not anticipated at design time. Humans are naturally more flexible than computer systems, and users constantly come up with new questions and demands. The degree to which a database supports this behavior is a measure of its strategic value.
- Efficient. Time is money. System efficiency must be measured against the following criteria: operational performance (end-user response time), time to market (ease of development), and administrative overhead.
These are the goals of any information system, no matter what technology is used to build it. Projects differ in the emphasis they place on each goal. Embedded applications that are accessed by other software systems emphasize efficiency and consistency but need not be especially flexible. Systems that support management decision makers, on the other hand, must above all be correct and flexible. When choosing between alternative designs, a clear understanding of the goals of the new system's user community is very helpful.
Database Analysis and Design
The ANSI three-tier database model has been around for 20 years, but it is as widely used as ever (although vendors keep repackaging its concepts). The three-tier model describes three "levels of abstraction", corresponding to:
- The conceptual framework of the end users and the way they reason about the problem domain. This is called the conceptual or external level.
- The software services that store information and provide facilities that other programs can use to retrieve and manipulate it. This corresponds to the level of abstraction provided by the DBMS software. We call it the logical or internal level.
- The physical structures used to store data and the low-level algorithms used to retrieve and manipulate it. Hiding the details of this physical level is the main value added by a relational or object-relational DBMS.
Every DBMS analysis and design method describes techniques for representing the problem domain at these different levels of abstraction, together with algorithms for converting models from one level to another. Figure 2 gives an idealized picture of how the three models are arranged and how they interact.
Figure 2. The three levels of the ANSI database model
The method described in this article is quite conventional in that it proceeds from left to right. It begins by capturing a complete description of how the system's users think about the information they work with. This conceptual view consists of a list of the objects users deal with and the facts that describe those objects. Note that on many websites the "users" are idealized. That is, the website's overall architect "introduces" them to you and decides what they should see, where they should see it, and how they behave.
The conceptual model drives the design of the logical database, which is implemented as an object-relational database. In an object-relational DBMS, the concept of a schema extends beyond tables, views, and common business procedures to include data types and their behavior. Finally, an ORDBMS is flexible enough that developers and administrators can fine-tune the logical schema to optimize system performance. This involves work at the physical level: creating indexes, partitioning data, and re-implementing embedded logic to achieve adequate performance.
Conceptual Data Modeling
Different users have different views of the problem domain. The development team's first task is to analyze and record all of them. As mentioned above, it is important to begin this process with a firm understanding that the conceptual model changes. In the past, user interfaces were difficult to modify, and client software was upgraded in large, staged releases. Today, HTML and scripting languages make it easy to change a website's end-user experience, and "Internet time" is an accelerated frame of reference for development. User interface (UI) developers therefore put a great deal of pressure on the back-end database, which is expected to evolve as quickly as the user interface.
One thing has not changed, however: diagrams are the best way to record a conceptual model. Graphical user interfaces make it easy to draw complex pictures, and the existence of numerous semantic data models standardizes the meaning of the different diagrams, so that a development team can communicate clearly about the users' ideas. These semantic models include:
- Extended entity-relationship (EER) modeling. EER extends traditional entity-relationship modeling with concepts such as inheritance and abstract data types (object domains or classes). Excellent guides to EER analysis include Toby Teorey's Database Modeling and Design (Morgan Kaufmann) and Fleming and von Halle's Handbook of Relational Database Design (Addison Wesley).
- More advanced relationship modeling techniques that have been developed recently. The best known of these is Object-Role Modeling, or ORM. Terry Halpin's Conceptual Schema and Relational Database Design (Prentice Hall) provides an excellent introduction to this technique and to conceptual analysis in general.
- The diagramming languages produced by object-oriented analysis and design, several of which are very useful. The best known is the Unified Modeling Language, or UML. Although intended primarily for use with object-oriented programming languages, UML can readily be adapted to work with an object-relational DBMS as well. Many books and tools describe and use UML.
Many computer-aided software engineering (CASE) programs help developers automate conceptual modeling. Unfortunately, most of these tools do not yet reflect the full capabilities of an ORDBMS. For example, they do not capture information about the palette of data types that are combined into database tables, or about the functions that implement the behavior of these objects. In time, these tools will improve.
Extended Entity-Relationship Modeling
For clarity and simplicity, we use EER diagrams to draw the models in our example. EER does not try to cover every possible modeling situation, but it is simple and intuitive, and it works well in most cases.
The way EER is used in the ORDBMS method differs from the traditional approach in that the content of the analysis is slightly different. After completing a high-level overview of the users' conceptual model with EER, we use object-oriented techniques to decompose the objects into a minimal set of object classes. Sometimes we treat an entire entity as one object class. More typically, though, the elements that make up an entity (each element's type and the role that type plays in the entity) can usefully be modeled, using object-oriented analysis techniques, as units of meaning that cannot be divided further within the database.
For example, consider the classic "Employee/Department/Product/Customer" database model beloved of university professors. Naturally, our example is updated a little and includes some of the typical real-world details that complicate matters and are hard to handle in a traditional RDBMS. Figure 3 shows an extended ER diagram that describes the high-level conceptual model for this example.
Figure 3. Extended ER model of Boxes-R-Us Inc.
A brief description of our application might read as follows:
"Boxes-R-Us Inc. manufactures a range of cardboard boxes (Products) in various colors and sizes and sells them to Customers. Its manufacturing is distributed across a number of Divisions. Every Employee works in a Division but may move to another Division over time. Employees fall into different subtypes: Contractors and Full_Time employees (Production workers and Sales personnel). This is because Employees are compensated according to different procedures, depending on the type of employee. Sales personnel are rewarded according to the quantity of Products they sell to Customers."
All of the modeling concepts that EER provides remain useful when you work with an ORDBMS. As with a traditional EER model, it is a good idea to record the following in your diagrams:
- Estimates of the cardinality of the "has_a" relationships between entities. In our example, "crow's feet" notation indicates that each Division makes multiple Products, but no Product is made by more than one Division. Other concepts, such as mandatory relationships (where the existence of an instance of a weak entity depends on the existence of a related strong entity) and business rules, are also useful.
- A description of any keys. The concept of a key is important for two reasons. First, the key-driven normalization process lets database designers ensure that each fact in their schema is represented exactly once. Second, keys often point to important data rules that should be enforced to keep the stored data correct.
- Carefully modeled relationships. Relationships can be considerably more complex than a simple "has_a" relationship between entities. One kind is reflected in the employee hierarchy; representing this sort of inheritance hierarchy is much more direct in an ORDBMS than in an RDBMS. A second kind is the qualified relationship. For example, Employees work in Divisions for a period of time.
Let's look at the internal structure of two of these entities in more detail. Note that although we use a fairly standard entity diagram here, you may be better off using a UML-style class diagram for this more detailed description. An important difference between using EER to model for an RDBMS and for an ORDBMS is that the ORDBMS model supports strong typing. That is, similar kinds of data (what relational theory calls domains) are identified as types in their own right. Cataloguing all of these domains as part of the EER model requires you to pay particular attention to the data types that actually make up each entity's elements. For example:
Figure 4. Detailed structure of two entities from the EER model
There are several things to point out about these entities.
First, notice the number of different data types, and how much more semantic information a strongly typed conceptual model captures. With RDBMS technology, it is common to decompose each attribute into a set of atomic components and to assign a SQL-92 type to each component. For example, we might use INTEGER for both employee_num and customer_num. The unfortunate result is that the kind of information an attribute holds can be told apart only by its name. With an ORDBMS, it is easy to create a separate type for each kind of data object and to reserve the attribute name for the role the attribute plays in the entity.
Strong typing is good programming practice. In a strongly typed schema, you can use the system catalogs to find information that ties different parts of the schema together. For example, suppose you want to know everything the database knows about employees. In our schema, employees are identified by an employee number (the primary key of the Employees entity is an attribute named ID, declared as type employee_num). Any table in the schema with a column of this type can therefore be said to have some relationship with employees, even if no foreign key constraint has been defined.
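To illustrate how a strongly typed schema answers "which tables know about employees?", here is a minimal Python sketch. It models a system catalog as a plain dictionary. The catalog layout, the `orders` table, and every name other than `id`, `employee_num`, and `customer_num` are assumptions made for the example; a real ORDBMS exposes this information through its own catalog tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str        # the role the attribute plays in the entity
    type_name: str   # the domain (data type) of the attribute


# Hypothetical catalog: table name -> columns.
CATALOG = {
    "employees": [Column("id", "employee_num"), Column("name", "person_name")],
    "customers": [Column("id", "customer_num"), Column("name", "person_name")],
    "orders": [Column("taken_by", "employee_num"), Column("placed_by", "customer_num")],
    "products": [Column("id", "part_code")],
}


def tables_using(type_name: str, catalog=CATALOG) -> list[str]:
    """Return, in sorted order, every table that has a column of the given type."""
    return sorted(
        table
        for table, columns in catalog.items()
        if any(col.type_name == type_name for col in columns)
    )


print(tables_using("employee_num"))  # ['employees', 'orders']
```

If both identifiers were declared as INTEGER, the same lookup could not tell employee numbers from customer numbers without relying on column naming conventions.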
Strong data types are particularly effective in decision-support databases. For a data warehouse, developers may create tables that are extracted from several operational systems and loaded into a single database. Data objects such as employee_num are commonly used to cross-reference records between the different systems. Once all the data is loaded into one database, strong typing lets developers who may not know exactly what data exists answer important metadata questions such as "What information do we have about employees?"
At this level of analysis, it is a good idea not to go into too much detail about the data types you are using. With an RDBMS, each of the two address attributes might be broken down into first_line, second_line, city, state, and zip. Making the most of an ORDBMS calls for a slightly different approach. Once you have identified every place in the problem domain where a particular kind of data is used, it is better to work out what those uses have in common. You can then reuse the structure and behavior of the objects you develop throughout the schema.
Attributes that hold an employee's resume or an address will be unfamiliar to RDBMS developers. However, these are kinds of data that many object-relational databases can manage. In such cases, developers can buy extension packages, known in IBM Informix as DataBlade modules, to manage this type of data.
Figure 5. Internal structure of the Products entity
Two things are worth pointing out about the elements that make up the Products object.
The ORDBMS data model supports non-first-normal-form attributes. Here we see an example: an attribute that stores the range of colors in which a particular box is available. At this stage, you need not worry about how to represent this structure physically. Although an ORDBMS table can store a set of data values in a single column, as we shall see, that may or may not be the best approach.
Every box our company makes has a particular physical specification, the most important parts of which are the size of the box (its length, width, and height) and its strength (how much it can hold without breaking). For Boxes-R-Us, business efficiency depends on how easy it is for customers to place an order. It would therefore be ideal if the database let them ask directly whether an object with a given physical weight, measuring "12 inches x 10 inches x 8 inches", fits into a "31 centimeters x 31 centimeters x 31 centimeters" box. The extensibility of an ORDBMS, which lets objects like these be embedded directly in the query language, is the greatest benefit it brings to building business applications.
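The "does it fit?" question is the kind of behavior a dimensions type could carry. The Python sketch below is an illustration under stated assumptions: the class name `Dimensions`, its methods, and the choice to store centimeters internally are all invented for the example, and only axis-aligned orientations are considered.

```python
from dataclasses import dataclass

CM_PER_INCH = 2.54


@dataclass(frozen=True)
class Dimensions:
    """Length, width, and height, stored internally in centimeters."""
    length_cm: float
    width_cm: float
    height_cm: float

    @classmethod
    def from_inches(cls, length: float, width: float, height: float) -> "Dimensions":
        return cls(length * CM_PER_INCH, width * CM_PER_INCH, height * CM_PER_INCH)

    def fits_inside(self, box: "Dimensions") -> bool:
        """True if this object fits in the box in some axis-aligned orientation."""
        mine = sorted((self.length_cm, self.width_cm, self.height_cm))
        theirs = sorted((box.length_cm, box.width_cm, box.height_cm))
        return all(m <= t for m, t in zip(mine, theirs))


item = Dimensions.from_inches(12, 10, 8)
box = Dimensions(31, 31, 31)
print(item.fits_inside(box))  # True: 12 inches is 30.48 cm
```

Keeping one internal unit means the comparison logic is written once, and the unit conversion lives in the constructors.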
So why is all of this activity useful? Because it helps you create a minimal catalog of the kinds of data in the application. As we shall see, part of the development process involves using the ORDBMS user-defined type and user-defined function features to implement these different data types.
Object-Oriented Analysis and Design
The central idea of object-oriented analysis and design is that an object's interface (the methods used to manipulate it) should be separated from the details of its implementation (its data structures and logic). The role of OO analysis in ORDBMS database development is to provide a conceptual framework for working with the extensible type system. New objects are implemented with the user-defined type mechanism.
The behavior of these types is implemented in user-defined functions.
The first task is to list every kind of data identified in the EER model. This includes the entities themselves and the various kinds of data that make up their attributes. The following table lists, in no particular order, some of the kinds of data identified in our example.
Figure 6. Domain list from the Boxes-R-Us EER model
An important advantage of this approach is that even though a shared data object may be used in many places throughout the schema, you need to develop it only once. If you discover an error or omission from the early stages of analysis, it is then relatively easy to correct. If the definitions of these objects were scattered throughout the schema, correcting the problem would be a very large task indeed.
Users often use different terms for the same thing (synonyms) or the same term for different things (homonyms). Object-oriented techniques help you make sense of such ambiguous terminology. Central to object-oriented software engineering is the concept of an object as an atomic unit that encapsulates state and behavior. Whatever they are called, two objects that are manipulated in similar ways are probably the same thing. Likewise, two kinds of data that share a name but are defined differently are probably different objects.
For example, the mail_address and delivery_address domains may turn out to be synonyms. We represent the interface of each object using the UML standard. A UML class diagram shows the static structure of an object, including whether the elements of that structure can be manipulated directly by developers who use the type. The two data types are closely related, but they differ in that a delivery_address instance can contain a document that gives detailed instructions.
Figure 7. UML class diagrams of two similar objects
Developers have two options for turning this analysis into an implementation plan. First, they can develop the two objects as completely unrelated types. Alternatively, they can use type inheritance to reuse the mail_address type in delivery_address. Taking the second approach reduces the total amount of code the developers must write. The code that formats an address label for mail_address can be reused for delivery_address. Of course, they can always overload this behavior.
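The inheritance option can be sketched in a few lines of Python. This is an illustration, not the types from Figure 7: the field names (`street`, `city`, `postal_code`, `instructions`) and the method name `format_label` are assumptions, since the figure's attributes are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class MailAddress:
    # Field names are hypothetical.
    street: str
    city: str
    postal_code: str

    def format_label(self) -> str:
        """Format the address for printing on a label."""
        return f"{self.street}\n{self.city} {self.postal_code}"


@dataclass
class DeliveryAddress(MailAddress):
    """A mail address plus a document with detailed delivery instructions."""
    instructions: str = ""

    def format_label(self) -> str:
        # Reuse the inherited formatting, then extend it when instructions exist.
        label = super().format_label()
        return f"{label}\n{self.instructions}" if self.instructions else label


dock = DeliveryAddress("1 Main St", "Oakland", "94601", "Use the rear loading dock")
print(dock.format_label())
```

The subtype inherits the label formatting unchanged and replaces it only where its behavior differs, which is the code reuse the paragraph above describes.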
Some data types are patterns. A pattern is a recurring structure used by many types. Patterns are useful because they let you flesh out a complete design from just a few characteristics.
An enumeration object is a simple example of a pattern. In our database, boxes might be produced in only a limited range of colors: black, white, and brown. To keep the data as consistent as possible, developers can use an enumeration implementation to enforce this rule. Internally, an instance of the color object might simply be stored as a 2-byte integer. However, in the code that returns values to client programs or inserts values into tables, developers can write logic that ensures invalid instances cannot occur.
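A minimal Python sketch of the enumeration pattern follows. The integer codes, the big-endian byte order, and the function names are assumptions for illustration; the text specifies only the three colors and the 2-byte internal storage.

```python
import struct
from enum import IntEnum


class Color(IntEnum):
    # The numeric codes are arbitrary choices for this sketch.
    BLACK = 1
    WHITE = 2
    BROWN = 3


def to_storage(color: Color) -> bytes:
    """Pack the color as a 2-byte big-endian signed integer."""
    return struct.pack(">h", int(color))


def from_storage(raw: bytes) -> Color:
    """Unpack 2 bytes; Color(...) raises ValueError for codes outside the set."""
    (code,) = struct.unpack(">h", raw)
    return Color(code)


def parse_color(text: str) -> Color:
    """Convert client input such as 'brown'; reject anything not enumerated."""
    try:
        return Color[text.strip().upper()]
    except KeyError:
        raise ValueError(f"not a valid box color: {text!r}") from None


print(to_storage(parse_color("brown")))  # b'\x00\x03'
```

Validation happens at the two boundaries the text mentions: where text from a client becomes a value, and where stored bytes are turned back into a value.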
Another way to describe this situation is to say that a new object must implement particular behavior for particular methods. For example, when customers or salespeople browse the product catalog, they may search by specific criteria. They may want only boxes that can hold a 3-pound object. In other words, they want boxes with a capacity of more than 3 pounds. To browse effectively, they want the query to return the matching boxes in ascending order of capacity.
The query might look like this:
"Show me all boxes with a capacity greater than 3 pounds."
Figure 8. Example of a workload query
From the importance of queries like this, you can infer that users want to compare one instance of this object with another, sort instances, and index them. To support this functionality, the new type needs to implement the ordering operators (LessThan, LessThanOrEqual, Equal, GreaterThanOrEqual, GreaterThan, and NotEqual) and the B-tree support function (Compare()).
Figure 9 shows a UML diagram of the mass data type. Note how the diagram gives a complete definition of each of the object's behaviors. This helps us distinguish between the behaviors of different objects and identify the common behavior shared by multiple types.
Figure 9. UML class diagram example with ordering support
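As a rough illustration of ordering support, the Python sketch below defines a mass type whose six comparison operators are all derived from a single `compare()` function, mirroring the operator list and the Compare() support function named above. The class layout, the gram-based internal unit, and the constructor names are assumptions, not the definition shown in Figure 9.

```python
GRAMS_PER_POUND = 453.59237


class Mass:
    """A weight held in grams so metric and imperial values compare directly."""

    def __init__(self, grams: float):
        self.grams = grams

    @classmethod
    def from_pounds(cls, pounds: float) -> "Mass":
        return cls(pounds * GRAMS_PER_POUND)

    @classmethod
    def from_kilograms(cls, kilograms: float) -> "Mass":
        return cls(kilograms * 1000.0)

    def compare(self, other: "Mass") -> int:
        """B-tree style support function: returns -1, 0, or 1."""
        return (self.grams > other.grams) - (self.grams < other.grams)

    # The six ordering operators, each defined in terms of compare().
    def __lt__(self, other): return self.compare(other) < 0
    def __le__(self, other): return self.compare(other) <= 0
    def __eq__(self, other): return self.compare(other) == 0
    def __ne__(self, other): return self.compare(other) != 0
    def __ge__(self, other): return self.compare(other) >= 0
    def __gt__(self, other): return self.compare(other) > 0
    def __hash__(self): return hash(self.grams)

    def __repr__(self): return f"Mass({self.grams:.1f} g)"


capacities = [Mass.from_kilograms(5), Mass.from_pounds(2), Mass.from_kilograms(1.5)]
limit = Mass.from_pounds(3)
# "All boxes with a capacity greater than 3 pounds", in ascending order.
print(sorted(c for c in capacities if c > limit))
```

Because sorting and filtering both go through `compare()`, a change to how masses are ordered needs to be made in one place only.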
Other kinds of index require different support functions. For example, consider the period data type, which is used to store the length of time that a particular employee is associated with a division. A period is an object with a start and an end date and time. Efficient support for concepts such as overlapping time periods requires that the type include behavior that the R-tree access method can use.
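The behavior a period type needs can be sketched as interval predicates. The Python below is an assumption-laden illustration: the method names `overlaps` and `contains` and the closed-interval semantics are choices made for the example, and the exact set of support functions an R-tree access method requires depends on the DBMS.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Period:
    start: datetime
    end: datetime

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("period ends before it starts")

    def overlaps(self, other: "Period") -> bool:
        """True if the two periods share at least one instant (closed intervals)."""
        return self.start <= other.end and other.start <= self.end

    def contains(self, other: "Period") -> bool:
        """True if the other period lies entirely within this one."""
        return self.start <= other.start and other.end <= self.end


year_2001 = Period(datetime(2001, 1, 1), datetime(2001, 12, 31, 23, 59, 59))
posting = Period(datetime(2001, 11, 1), datetime(2002, 3, 1))
print(posting.overlaps(year_2001))   # True
print(year_2001.contains(posting))   # False
```

A query such as "who worked in this division at any time during 2001?" reduces to an `overlaps` test between each employee's period and the year.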
Let's analyze the employee hierarchy in more detail. Recall that the main reason for dividing employees into subtypes is that different kinds of employees do different jobs at Boxes-R-Us and are therefore compensated differently (their pay is calculated in different ways). Figure 10 shows the differences between production and sales personnel.
Figure 10. UML class diagram of the employee object types
Note how much these definitions overlap. The two entities are identical except for the attributes that relate to how each kind of employee is paid. Note also how the two entities share common behavior. When we analyze the other kinds of employees, we find a similar pattern. The algorithm used to calculate compensation, however, varies with the category of employee.
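The shared structure and varying pay algorithm can be sketched as a small class hierarchy in Python. Everything specific here is an assumption: the attribute names, the method name `monthly_pay`, and both pay formulas are invented for illustration. The source states only that pay is calculated differently per employee type and that sales personnel are rewarded by quantity sold.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Employee(ABC):
    id: int
    name: str

    @abstractmethod
    def monthly_pay(self) -> float:
        """Each subtype supplies its own compensation algorithm."""


@dataclass
class ProductionWorker(Employee):
    annual_salary: float

    def monthly_pay(self) -> float:
        # Hypothetical rule: salary spread evenly over twelve months.
        return self.annual_salary / 12


@dataclass
class SalesPerson(Employee):
    base_monthly: float
    units_sold: int
    commission_per_unit: float

    def monthly_pay(self) -> float:
        # Hypothetical rule: base pay plus a commission on each unit sold.
        return self.base_monthly + self.units_sold * self.commission_per_unit


staff = [ProductionWorker(1, "Ann", 60000.0), SalesPerson(2, "Bob", 2000.0, 100, 1.5)]
for person in staff:
    print(person.name, person.monthly_pay())
```

Callers treat every employee the same way and the subtype decides how pay is computed, so a change to the sales formula touches only `SalesPerson`.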
At this stage, it becomes clear that standardizing behavior is an important task. Every class introduced so far has behavior of some kind. In some cases, it is not obvious how that behavior should be expressed. For example, the mass class may need to cater for both metric and imperial units. If so, it also needs to handle conversion between the two. Sometimes you can expect behavior to change. The precise algorithm used to calculate a sales representative's annual pay may change every year as management tries to steer the sales force toward different goals.
The approach as presented here is top-down: large structures are identified and analyzed before the problem is broken down into its smallest parts. Another, equally useful approach is to start by analyzing the smallest components of the problem and then work out how they fit together. The advantage of this bottom-up approach is that higher-level structures (such as tables and the relationships between them) can be changed more quickly and easily. By focusing first on the palette of object types from which the database is composed, you will often find it easier to answer questions about higher-level structure. Which approach you use on your project depends on your preferences and experience.
Data Flow, Workload, and Queries
Another aspect of the overall application to consider at this stage is the workload: the set of common queries and business processes that the information system supports. A description of the workload is useful in several ways.
First, developers can embed the logic that implements business operations in the ORDBMS and then call it directly from client programs. This is nothing new, of course. Relational DBMSs have supported such database procedures for some time. Many developers are wary of the technique, however, because although it improves performance, it requires them to write code in a proprietary procedural language. With an ORDBMS, you can implement these database procedures using open languages and standard APIs (such as Java and JDBC).
Second, one of the lessons learned by the pioneers of ORDBMS technology is that moving logic traditionally associated with external programs into the ORDBMS can improve the performance and flexibility of the overall system. For example, suppose Boxes-R-Us wants to send a greeting card or message to each employee on his or her birthday. As we saw earlier, supporting the seemingly simple operation "given a date of birth, determine whether today is the birthday" is impractical in SQL-92, because matching birthdays must take into account people born on February 29 of a leap year. In a non-leap year, these people celebrate their birthdays on February 28. Ideally, you want to run a query like the following and have it handle the complications.
Figure 11. Birthday query
In an external program, it is feasible to create a "birthday" class in a language such as C++ or Java. To use it, you write a query that retrieves the name and date of birth of every person born in the current month (about 1/12 of all the data). Then, on the client side, the logic programmed into the birthday class determines which rows match. Such an object class might look like this:
Figure 12. Birthday UML class diagram
An important principle of the ORDBMS data model is that you do not have to use such a type to define a table structure or column. You can use the object's behavior to perform complex query operations on other, ordinary data, as Figure 11 shows. Similarly, most object-oriented methods acknowledge that what you need is not necessarily an object but may simply be a function or procedure. For readers unfamiliar with object orientation, it is entirely possible to think about ORDBMS database design in functional or modular terms.
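The idea of calling embedded behavior from a query over ordinary columns can be tried out with SQLite, which lets a Python function be registered as a SQL function. This is a stand-in chosen for convenience and not an Informix user-defined function; the table layout, the function name `is_birthday`, and the sample rows are assumptions.

```python
import calendar
import sqlite3
from datetime import date


def is_birthday(born_iso: str, today_iso: str) -> int:
    """Return 1 if today is the person's birthday, else 0.
    29 February birthdays fall on 28 February in non-leap years."""
    born, today = date.fromisoformat(born_iso), date.fromisoformat(today_iso)
    if born.month == 2 and born.day == 29 and not calendar.isleap(today.year):
        return int((today.month, today.day) == (2, 28))
    return int((born.month, born.day) == (today.month, today.day))


conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL statements can call it.
conn.create_function("is_birthday", 2, is_birthday)
conn.execute("CREATE TABLE employees (name TEXT, date_of_birth TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", "1980-02-29"), ("Bob", "1975-02-28"), ("Cy", "1990-07-04")],
)


def birthdays_on(today: date) -> list[str]:
    """Names of employees whose birthday falls on the given day."""
    rows = conn.execute(
        "SELECT name FROM employees WHERE is_birthday(date_of_birth, ?) ORDER BY name",
        (today.isoformat(),),
    )
    return [name for (name,) in rows]


print(birthdays_on(date(2002, 2, 28)))  # ['Ann', 'Bob']
print(birthdays_on(date(2004, 2, 28)))  # ['Bob']
```

The date_of_birth column is stored as plain text, yet the query applies the leap-day rule, and the filtering happens inside the database engine instead of in the client after fetching a month of rows.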
Third, the workload catalog gives you the basis for writing performance and scalability test environments. One of the most common causes of technical failure in new software systems is that developers never tested their code in an environment resembling production.
Conclusion
In this article, we have looked at the initial stages of a development method for object-relational database applications. We have seen that, to get the most out of an ORDBMS, developers need to gather more information about the problem domain than is usual in relational DBMS development. Because an ORDBMS lets them embed software modules that represent the state and behavior of objects in the problem domain, developers must attend to aspects of their applications that traditionally belonged to external programs, middleware, or client interfaces.
By using a method that combines extended entity-relationship modeling with the Unified Modeling Language to represent class definitions visually, we can take an application and reduce it to:
- The smallest complete set of objects.
- An arrangement of these objects into the facts we want to record that is correct and consistent.
In the final article of this series, we describe a process for representing the semantic model described here using the features of the ORDBMS data model. As part of that detailed design process, we review the performance and flexibility trade-offs between the different mechanisms.
About the author
Paul Brown is the "Chief Plumber" in IBM's Informix Chief Technology Office. Paul co-authored Object-Relational DBMSs: Tracking the Next Great Wave with Dr. Michael Stonebraker, Informix's Chief Technology Officer. He is a member of Informix's Architecture Review Board. He speaks regularly at Informix user group meetings and partner forums, and he has published many papers on database topics. You can contact him at pbrown1@us.ibm.com.
IBM and DB2 are trademarks or registered trademarks of IBM in the United States and/or other countries or regions. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and/or other countries or regions. The names of other companies, products, and services may be trademarks or service marks of other companies.