The application of UML in the design of relational database

Source: Internet
Author: User
Tags abstract define extend implement include inheritance object model sql primary key
Design | data | database | Database design 1. Introduced

Many people think that object-oriented concepts and relational databases are inconsistent and cannot be combined. In fact, quite the opposite! With flexibility, a relational database can provide an excellent set of implementations for object-oriented (OO) models. The same model can be used to develop programming code and relational database structures.

Relational database technology is far-reaching and powerful, but it is much more difficult than many developers have made you believe. A single table is easy to understand and intuitive. But the application of hundreds of tables (which are common) is quite difficult to understand thoroughly. This is where the OO model works. OO models allow you to think deeply and coherently.

The OO model provides a superstructure way of thinking about the problem, which can then be implemented using a lower-level block of relational databases.

This article discusses the relational database technology in a comprehensive way, rather than focusing on specific products. We will not discuss physical design details, such as storage allocation and physical aggregation, because they are dependent on the product.

There are two aspects to implement UML model with relational database: Mapping structure (2nd section) and Mapping Function (section 3rd). The 4th section notes the extension of object-oriented to relational databases. The 5th section summarizes this article.

2. Structure map to Table

The UML object model is essentially an extended entity-relationship (ER) model. Using the ER model of the design database is universally accepted, and we use the UML object model in an approximate but more powerful way. The main advantage of the OO model is the same model work for programming and databases. Also, as a way to consider functionality (section 3rd), we emphasize the navigation of the OO model. This section shows how to implement the main constructs of the UML object model.

2.1 Identity (Identity)

The first step in implementing the object model is to process identities. Let's start by defining a few terms.

1 A candidate key (candidate key) is a combination of one or more attributes that uniquely determines the record of a table or table. The attribute set in a candidate key must be minimized, and the attribute cannot be removed from the candidate key unless it is corrupted. The attribute in the candidate key cannot be null.

2 The primary key (primary key) is a specially selected candidate key, which is used to prioritize references to records.

3) foreign key (foreign key) is a candidate key reference. The foreign key must include a value for each element attribute, or it must be all empty. Foreign keys are used to implement associations and generalizations.

Normally you should define a primary key for each table, although there are occasional exceptions. We strongly recommend that all foreign keys point to only the primary key and not the other candidate keys.

There are two basic ways to define a primary key:

1) based on the existence of the identity. You should add an object identifier attribute to each class table and set it as the primary key. The primary key for each associated table includes one or more identifiers for related classes. Identifiers based on existence have the advantage of being a separate property, with a small placeholder and the same size. As long as your relational database management system (RDBMS) is supported, there is no performance disadvantage based on existing identifiers. (most RDBMS provide a valid allocation order number based on an existing identifier.) The only disadvantage is that identifiers based on existence have no intrinsic meaning in the maintenance.

2) based on the identity of the value. The combination of some real-world attributes determines each object. The value based identity has different advantages. Primary keys are inherently meaningful for users and are easy to debug and database maintenance. On the other side, a value based primary key is difficult to change. A primary key change needs to be propagated to many foreign keys. Some objects have no natural identifier in the real world.

We recommend that you use an identity based on presence in more than 30 types of RDBMS applications. The existence and value based identities are all viable options for all RDBMS applications.

2.2 Fields (property type)

An attribute type is a uml term that corresponds to the terminology of a field in a database book. The domain is upgraded to a more consistent design and facilitates the positioning of the application rather than directly using the data type.

Simple fields are easy to implement. You just have to define the data type and size. And each domain-scoped attribute, you must add a SQL query clause for each domain constraint. Some examples of simple fields are: First Name, long character (longstring), and phone number (phone-number).

An enumeration field restricts an attribute to a series of values. Enumeration fields are more complex to implement than simple fields, and Figure 1 shows four methods.



Figure 1: How enumerations are implemented

2.3 Class

Normally, we map each class to a table, and each attribute is mapped to a column. You may need additional columns for a generated identifier (based on an existing identifier), a hidden Association (sect. 2.4), and a generic discriminator (sect. 2.5).

2.4 Association

Now we discuss the implementation of the association. We have divided our statements into suggested mappings (maps we normally use), optional mappings (which we use occasionally), and discouraged mappings (errors we encounter that should be avoided). All of our examples are based on existing identities.

2.4.1 Suggested mappings

Many-to-many associations. Use a special table (Figure 2) to implement a Many-to-many association. The associated primary key is a merge of the primary keys for each class. Those ellipses (.....) Represents a property that is not displayed in the model. Primary keys are displayed in bold font.

A One-to-many association. Hide a foreign key in the "many" table (Figure 3). The role name becomes part of the foreign key attribute name.

0 or one-to-one association. Hide the foreign key in the "0 or one" table (Chart 4).

Other one-to-one associations. Hide the foreign key in any list.



Figure 2: Recommended implementations: A Special Many-to-many association table



Figure 3: Recommended implementations: Hidden One-to-many associations



Figure 4: Recommended implementations: Hidden 0 or one-to-one associations

Optional mappings Normally we use the suggested mappings. But there are occasional cases where an optional mapping is more appropriate.

A special table. You can also use a special table (Figure 5) to implement One-to-many and one-to-one associations. Special tables give you a more unified design and greater scalability. In any case, the special correlation table breaks the database and increases the number of tables. In addition, a special association table cannot force a lower multiplicity limit to "one".



Figure 5: Optional implementation: A special pair of X association tables

Discouraged mappings we have noticed that some developers choose flawed mappings. We need to be aware of these mappings so that we can avoid them. Merge. Do not merge multiple classes, and do not force the association to be a separate table (Figure 6). This reduces the number of tables, but interferes with the third paradigm.

Two times to hide a one-to-one association. Do not hide a one-to-one association two times, each time hidden in a class (chart 7). This is superfluous and does not contribute to performance.

The same property. Do not use the same attributes to implement multiple association roles (Figure 8). The same attributes make programming complex, and reduce extensibility. Generalization now we're talking about generalization. We only discuss a single inheritance here. The simplest way to suggest a mapping is to map only the superclass and each subclass to a single table. All tables share a common primary key. The application must perform the partitioning of subclasses because of the RDBMS support. (For a detailed description of the latter, see section 4th.) )

A special table. The mapping superclass and each subclass is a table (Chart 9). All tables share a common primary key. The discriminator indicates the appropriate superclass table for each child class record.



Figure 9: Recommended implementations: Separate superclass and subclass tables

There are several optional mappings for the optional mapping generalization. Eliminate. You can optimize the removal of classes that have no attributes other than primary keys (Figure 10). This reduces the number of tables, but provides fewer formal implementations.

Reduce superclass properties. You can remove the superclass table and copy the superclass attributes to each subclass (Figure 11). This can have the advantage of describing each object as a table. However, it will cause redundancy in the database structure, and you may need to search for more than one subclass table when you look up an object.

Increase the subclass properties. As a third option, you can drop the subclass table and store all of the subclass attributes into the superclass tables (Figure 12). This describes each object in a single table, but interferes with the second normal form.



Figure 10: Optional implementation: Eliminating unnecessary subclass tables



Figure 11: Optional implementation: reducing superclass properties



Figure 12: Optional implementation: increasing subclass properties

Referential integrity once you have established a table, you should define referential integrity actions to clarify the meaning of the object model. (Do not use SQL triggers to achieve referential integrity!) If you use an identity based on presence, you will not need to propagate the results of the update. We recommend the following guidelines for referential integrity for deletion: generalization. The deletion of a foreign key that is generated by cascading from a generalization implementation.

Hidden associations, minimizing diversity to zero. Normally set the foreign key to null, but sometimes you may want to disable the deletion.

Hidden associations, minimizing diversity as null. You can cascade a delete result or prohibit the deletion.

Association tables. It is normal for us to delete the records in the related table. However, sometimes we prohibit a deletion.

We have briefly discussed the referential integrity, because it is a high-level topic. References have more explanations for z and examples. The final step in indexing the database structure is to add indexes to adjust database performance. Normally, you should define a unique index for each primary key and candidate key. (most RDBMS are the side effects of SQL primary key and candidate key constraints to establish a unique index.) You should also create an index for each foreign key that is constrained by a primary key or a candidate key.

We emphasize the importance of indexing. The index of foreign keys and primary keys makes it no doubt that the object model can be traversed quickly. You must include these indexes otherwise you will discourage the user. You should index your database at the beginning of the design phase because they are easy to join and there is no good reason to postpone joining.

The database administrator (DBA) may define additional indexes for frequently requested queries. DBAs may also adopt product-related performance tuning mechanisms. Paradigm paradigm is an effective method to improve data consistency in relational database design. Our book 3 discusses paradigms, but we have little to say about the problem. We will use the opportunity of this article to clarify our point of view. You can skip this section if you are unfamiliar with the paradigm. Our instructions are for relational designers who are trying to adapt their original skills with object-oriented objects.

Paradigm is the precise principle of correctly designing a relational database. In the same way, they have nothing to do with what technology is being used-based on the design of the attributes, the entity based design, the object-oriented design, or anything else. In the past, developers had to pay attention to paradigm by using a method based on attribute design, which provided the basis for grouping data. Conversely, paradigms are not important for object-oriented (or entity-based) development. If you take the OO approach and your model is well conceived, you are making the data organization a meaningful unit, and in essence satisfying the rules of the paradigm. If you want, you can still check the paradigm, but such checks are unnecessary. Summary Chart 13 summarizes the mapping rules that we have already stated. A complete example of these mapping rules, including a UML object model, can be found in this complete extended version (Adobe Acrobat PDF file).



Figure 13: Summary of recommended mapping rules

Mapping functionality to the SQL command object model provides three main uses for database applications. Structure. The object model indicates the structure of the database. We have explored this aspect in section Ii.

Constraints. The object model also indicates important constraints on the data that can be stored. Matching implementations must work to cater to these constraints. Our approach to mapping rules and the referential integrity in the second section point to a number of constraints. (Additional UML constructs are not discussed in this article to get more constraints.) )

Potential estimates. An object model identifies potential estimates; it's a blueprint for what queries and how to formulate them. Section III will briefly clarify the third purpose.

Object models are not just passive data structures, but they can help you think about an application's functionality. You can say many of its functions by traversing an object model. For example, we think about the traversal of a model when we examine the use case. This emphasizes that the estimation capabilities of the object model are particularly important for RDBMS applications, because traversal expressions can be mapped directly to SQL code.

The UML Object constraint language (object Constraint LANGUAGE,OCL) helps to express traversal. Point symbol navigation from objects to objects and objects to properties. A square bracket represents a filter for the collection of objects. We add a colon (:) operator to represent the traversal of a generalization, because it is useful to have a clear traversal because we normally implement a generalization inheritance with multiple tables.

The traversal expression examples in Figure 14 are based on the UML object model we created (see the Extended version (Adobe Acrobat PDF file) of this article), which we mapped to SQL code. We start the SQL programming variable with a colon.



Figure 14: Example of object model traversal and SQL code

The OO extended database community to the RDBMS is interested in the OO extensions of the RDBMS. Product and SQL standards are trying to add to OO extensions. We will briefly state the direction of this technique. Abstract data type (ADT). This is a good idea to extend the capabilities of an RDBMS. Developers have used many names for this technology, such as Oracle cartridge and Informix data blades. The disadvantage of ADT is that they tie you tightly to a particular developer; The ADT category goes beyond the SQL standard. Therefore, you should only use it when the benefits of ADT are obvious.

There is some confusion about how ADT fits into database development. If you use the OMT development process, you can use attributes to implement a simple domain, using ADT to implement complex domains. You should still use a table to implement the class. SQL3 pointer. The latest SQL Standard version, SQL3, has been added as a pointer symbol for a data type. Clearly, the intent is to support navigation and object-oriented. Our most friendly comment on the SQL3 pointers is that they are grafted and can be ignored. A further criticism is that pointers are theoretically absurd, add complexity, and do not extend the ability to express SQL. CJ date eloquently discussed this at the last September object/relationship meeting.

Well, then we're coldly accused of abstract data types and SQL pointers. But we quite like object-oriented technology and relational databases. There are two extensions to RDBMS that make them easier to use for OO technology. We will be happy to see RDBMS developers adding these capabilities to their products. Extended referential integrity actions to support generalization. The current referential integrity mechanism is one-way. In order to fully support generalization, we need a two-way mechanism. This way, a superclass record can rely on a subclass record. And, a sliver class record can rely on a superclass record. We can best explain this by example.

Chart 15 is an excerpt from our financial case study in 3. We use asset superclass to unify some of the common data and functionality not shown in the excerpt. An asset can be a stock or stock privilege. A stock can have many of its stock privileges. For example, an IBM stock can have many write-down or privileged privileges that reach the price and expiration dates.



Figure 15: Examples of referential integrity and generalization

The generalization implementation we recommend is a special table-mapping the superclass and each subclass to a table. Then we can use referential integrity to make stock privileges and stock records dependent on assets. The deletion of an asset record is cascaded to the corresponding subclass record, stock privilege, or deletion of the stock. We can also define a referential integrity action, such that a stock deletion is cascade to the associated stock privilege record deletion.

Now the question is as follows. If one of the assets we delete is a stock, the deletion of the asset record is cascading to the deletion of the stock record. Subsequently, the deletion cascade of stock records causes deletion of all stock privilege records. But now the referential integrity has failed us: the deletion of a stock privilege record does not cause an asset record to be deleted. Deleting a cascade can only go from a superclass to a subclass. For the sake of complete behavior, the cascade should go in two directions.

The current useful referential integrity work is to do more programming (i.e. do more work and risk more failures). In our implementation of use-case learning, users are ready to delete an asset that is a stock, and we have to write extra code to check the existence of the associated stock privileges first and then delete them. Support for cross table record partitioning. The implication of individual inheritance (the most common way of generalization) is that each instance of a superclass is illustrated with a majority of only one subclass. Today's RDBMS cannot easily enforce this constraint. For example, there is nothing to prevent the following situation. A stock can be added to the asset table with ID18, added to the stock table with ID18, and can also be added to the Stock privilege table with ID18. Again, we have to do extra programming to make sure that the behavior is complete, rather than writing a simple declarative constraint

Conclusion

This paper presents a quick overview of UML models using relational databases. We hope that the technology presented to you in this article is appropriate enough. A well-trained developer can use a relational database to prepare a set of outstanding OO model implementations. If you want more details about the implementation mechanism, reference 3 has additional information, and it also covers some of the advanced model modeling structures that we have not discussed here.


Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.