Agile Methods in Database Design

Source: Internet
Author: User
[This article is transferred from http://hanbosun.blog.sohu.com/1285004.html]

Source http://www.martinfowler.com/articles/evodb.html
Original Name: Evolutionary Database Design

Author: Martin Fowler, Pramod Sadalage
Translation: winboy20

Keywords:
Database Design agile method Evolutionary Design

0 Introduction
In the past few years, we have applied agile methods to database design. We have come up with some tips to make databases evolve as applications develop. This is an important attribute of agile methods. Our approach is to work closely with application developers through continuous integration and automatic restructuring through database administrators (DBAs. These skills are effective in all stages of application development.

1. Agile Methodology

In recent years, a new software development methodology-Agile Methodology has emerged. This puts forward some new and huge demands for database design. One center of these needs is evolutionary design. In an agile project, we need to assume that we cannot determine the system requirements in advance. Therefore, it is unrealistic to have a detailed design stage at the initial stage of the project. System design must evolve with software changes. Agile methods, especially extreme programming (XP), make this evolutionary design possible through some practices. Agile methods are used in Database Design for repeated iterations.

Many people doubt whether agile methods can be used in systems with large database components. But we did use a lot of agile and XP skills to solve the evolution and iteration problems in large database-based projects.

This article will introduce some practices for adopting Agile Methods in database design. Of course, this does not mean that we have completely solved the problem of database evolution, but we want to provide some effective methods.

2. actively respond to changes

A notable feature of Agile programming is its attitude towards change. The general interpretation of the software process is to understand the requirements as soon as possible, stop the changes in requirements, use these requirements as the basis of the design, stop the design changes, and then start to build a system. This is the waterfall method-a planned life cycle.

This method reduces changes through a large amount of preliminary work. Once the preliminary work is completed, the demand changes will cause great problems. Therefore, when demand changes, such a method will have a big problem, so demand changes are a big problem in this process.

Agile programming faces changes in another way. Embrace changes, and even allow changes later in project development. Although changes will be controlled, this attitude will allow as many changes as possible. Changes are partly due to unstable project requirements, and partly to support the changing business environment to face the competitive pressure.

In order to achieve this, we must adopt different design attitudes. Design is not just a phase-a phase that is largely completed before the building starts; design is a continuous process related to coding, testing, and even publishing. This is the difference between planning and evolutionary design. An important contribution of agile methods is the proposed evolutionary design in a controllable manner. Therefore, it is not because the design is not pre-planned, which leads to confusion. Agile Methods provide techniques for controlling evolutionary design and making it feasible.

An important feature of agile methods is iterative development, that is, running multiple complete software lifecycles throughout the project lifecycle. The agile process has a complete life cycle in each iteration. Iteration can complete the final product requirement, including subcentralized coding, testing, and integration code. Agile method has a short iteration time, usually between one week and two months, and we prefer a shorter iteration cycle.

When using agile methods, the biggest problem is how databases are designed for evolution. Many people think that database design is the work of the preliminary plan, and changing the database design plan in the future will cause the application software to crash; changing the database design plan after configuration will lead to data migration problems.

Over the past three years, we have participated in a large project that uses practical evolutionary design methods. This project includes a project team of 100 members and more than 200 tables. The database has been evolving for a year and a half during initial development, and is even evolving for multi-user distribution. At the beginning, we iterated once a month, and after a few months it became an iteration once every two weeks.

As we promote these experiences to more and more parts of the project, we will gain experience from more and more cases. At the same time, we have also absorbed some experience from other agile projects.

2.1 Restrictions
Before describing practical methods, we must point out that we have not solved all database evolutionary design problems, especially:

* ** We designed an application database for a separate application, rather than trying to integrate multiple databases;

* ** We did not update the database 24*7.

Although many believe that we cannot solve this problem, these problems can be solved. Of course, this requires further work, but it cannot solve the problem.

3. Practice
The methods for database evolutionary design depend on some important practices.

3.1 close cooperation between database administrators and developers
An important principle of agile methods is that people with different skills and backgrounds can work closely together. Formal meetings and documents cannot achieve communication results, so they need to work together and cooperate closely. All project team members must work closely together: system analysts, project managers, industry experts, developers, and database administrators (DBAs)

Every task of a developer may require DBA help. Developers and DBAs need to consider whether they need to make great changes to the database plan. Developers ask DBA about how to cope with changes: developers know what new features are needed, and DBAs have a global view of the data in the application.

To achieve the effect of close cooperation, DBAs must make themselves easy to access. DBA needs to set aside several minutes for developers to ask questions. Make sure that DBAs and developers are sitting together so that they can communicate easily. At the same time, make sure that the application design meeting is open so that the DBA can join the meeting at any time. In many cases, we find that barriers between DBAs and application developers must be removed so that evolutionary database design is possible.

3.2 each project team member has its own database instance
Evolutionary Design believes that people are trying to learn. During programming, developers perform some experiments before applying a preferred solution to a feature. The same is true for database design. Therefore, it is important that every developer has an instance for testing without affecting others. In this way, everyone can perform experiments based on their own needs.

Many DBA experts think that multiple databases are troublesome and not easy to use. However, we find that it is easy to operate around one hundred databases. Of course, it is very important to have a convenient tool that allows you to operate databases like operating files.

3.3 developers often integrate databases into the Shared primary database
Although developers can perform frequent experiments in their own spaces, it is also important to regularly integrate different jobs. Application development requires a shared primary database, where all the work is collected. When developers start to work, they obtain and copy data from the primary database to their workspace, perform operations and modifications, and then feed the changes to the primary database. Our rule is that each developer submits a consortium once a day.

Assume that a developer starts a development task at ten o'clock A.M., part of the task is to change the database plan. If this change is simple, such as adding a field, you can decide on your own. With the help of Data Dictionary, developers must also ensure that the field database they want to add does not exist. However, it would be much easier for him to discuss this possible change with DBA.

When preparing to start, he first obtains a copy from the master database, so that he can freely change the database plan and code. Because he uses his own database instance, it will not affect others. At some time, for example, three o'clock P.M., he knew exactly what kind of database changes he needed, even though he had not completed his coding work yet. Then he finds the DBA and tells him the change he wants. At this time, DBA can raise issues that developers did not consider. Of course, most of the time it works well, DBA agrees to this change (through reconstruction of one or more databases ). DBAs make changes immediately (unless they are destructive changes) so that developers can continue their work and submit code at any time because DBAs have sent these changes to the primary database.

This principle can be viewed as similar to continuous integration, which is often used for source code management. In fact, this is to regard the database as another source code. Because the configuration management system controls the master database like the source code. As long as the building is successful, the database and source code are sent to the configuration management system, so that we have a complete and synchronized version history.

For source code, problems in integration are handled by the source code control system. For databases, you need to do a little more work. All Database changes must be properly handled, such as automated database reconstruction. In addition, DBAs need to review any database changes to ensure that they comply with the overall database plan. In order to make this work more stable, there should be no major changes in the integration process-therefore, DBA should work closely with developers.

We emphasize regular small integration because it is much easier than non-regular large integration. The complexity of integration increases with the scale of integration. Therefore, it is easier to implement many small changes in practice. Of course, this seems to be in conflict with intuition.

3.4 The database contains plan and test data
When talking about databases, we not only refer to database plans, but also include a considerable amount of data. The data includes the standard data required for the application, such as the names of all provinces in China and sample data of some sample customers.

Role of data:

1. Easy to test

Using a large number of automated tests can help stabilize the development of applications. Such tests are commonly used in agile methods. To make these tests effective, the rational way is to work on the basis of a sample test data so that all tests can be completed before the program is officially started.

2. Test Database migration

In addition to the test code, the sample test data allows us to test the database migration. When the database plan is changed, we must ensure that all planned changes can also process the sample data.

In most projects, the sample data is fictitious. However, in some projects, people use actual data as an example. In these cases, the data is extracted from the system previously used by the automated data migration code. Obviously, it is difficult to migrate all the data immediately, because in the early iteration, only a small part of the database is established. But we hope to change the Migration Code when the application and database develop. This not only solves the migration problem as soon as possible, but also makes it easy for industry experts to handle the system being developed. Because of the data they are familiar with, they will point out areas that may cause problems in database and application design. Therefore, we recommend that you introduce actual data in the early iteration of the project.

3.5 All changes should be refactored into the database
Refactoring technology is to apply all controllable technologies to change the existing code base. Similarly, we have defined database refactoring and provided similar control for database changes.

The difference between database reconstruction is that it must complete three different changes at the same time:

* ** Change the database plan

* ** Data migration

* ** Change the database access code

Therefore, when describing database refactoring, We must describe three aspects of changes and ensure that these three changes are completed before another refactoring is applied.

We must document different database refactoring, so we cannot describe them in detail. However, there are several points that need to be pointed out: Like code refactoring, database refactoring is very small. A series of minor changes in the concept chain, similar to databases and code. The changed three attributes make small changes more important.

Many database restructures, such as adding a field, do not need to update the code of all access systems. However, if you do not know the new plan before using it, this field will be useless, because the new plan does not know where it changes. Many changes, without considering the entire system plan, are called destructive changes. For example, you can set an existing null value column to a non-empty one. Disruptive changes require greater attention, depending on the extent to which they are destructive. A small destructive example is to set an existing null value column as a non-empty column. In this case, you can do this with your header masked.

Reconstruction will consider the database's empty value data. Developers will update the database ing code, so the update will not damage the code of others; if it happens by accident, developers will find problems during the establishment and use of tests.

Dividing a frequently used table into two is a more complex type of damage. In this case, it is important to let everyone know that the change is coming, so that they can be prepared. In addition, changes should be implemented at a safer time.

The most important thing here is that the choice is applicable to the process of your changes.

3.6 automatic refactoring
In the Code world, many languages can implement automatic refactoring. During planned changes and data migration, this automation is also important for databases. Therefore, you can write SQL DDL (for scheduled changes) and DML (for data migration) to reconstruct each database. These changes are not implemented manually, but automatically implemented using some SQL statements.

Once the code is complete, we save these code files to generate a complete change record of database changes as the result of database restructuring. Therefore, we can update any instance to the latest primary database, and generate an earlier change record for the database instance by copying the primary database.

Automatic serialization of changes is a basic function for continuous integration and migration of product databases.

For the end of the product database, we do not implement changes in the regular iteration cycle. We create a complete log of database reconstruction changes between each release. There is no doubt that this is a huge change and we must implement it offline. It is wise to test the migration plan before practical application. So far, this technology has been quite useful. By breaking down big changes into small and simple changes, we can make big changes to product data without causing too much trouble. To use a sentence in the military strategy, it is to "convert to zero ".

In addition to automatic forward changes, we also need to consider backward changes during refactoring. If this can be done, you can return to the previous database status. We didn't do this in our project because we didn't have this requirement, but this is also an important basic principle.

3.7 automatically update the databases of all developers
People change and update the primary database, but how can we find that the primary database has changed? In the traditional continuous integration environment with source code, developers update the primary database before submitting changes. In this way, they can solve problems on their own machines before submitting changes to the shared primary database.

Every time the primary database changes, We need to update the developer's database. When the primary database changes, we automatically update the databases of all project members. When the same reconstruction code is used to update the master database, the member database is automatically updated. Some people may think that updating the developer database will cause many problems without the developer's knowledge, but we have not found any problems in practice. Of course, this is only useful when people are connected to the Internet. Therefore, when developers are offline, they must be synchronized with the primary database as soon as possible.

3.8 clearly separate all databases to Obtain Code
To understand the results of database restructuring, it is also important to understand how applications use the database. If SQL statements are distributed around the code foundation, it is difficult to do so. Therefore, a clear database acquisition layer is very important to show how the database is used and where it is used.

Clear database layers have many advantages. It reduces the amount of SQL knowledge required for developers to manipulate databases, making it easier for developers who are not familiar with SQL statements to develop. DBA provides clear code to clearly understand how the database will be used. This also helps prepare indexes, optimize databases, and optimize SQL statements so that DBAs can better understand how databases are used.

4. Change rules
Like any practice, these principles must follow your special environment changes. There are no static projects, and we must respond to changes.

4.1 keep multiple databases in one system
A simple project may require only one primary database. However, complex projects require multiple databases, that is, database systems. If the database must be split before it is put into production, we can create a new database system. Databases are similar to code branches. Different test datasets are required for testing.

When developers obtain a copy from the primary database, they must register the database system they are modifying. When DBA updates a database system of the primary database, it also updates the databases of all developers who register the database system.

4.2 full-time DBA not required
All of these seem to require a lot of work, but it does not need a lot of human resources. Among the largest projects, we have 30 developers and a project team of 100 members (including quality evaluation, analysis and management personnel ), we have about 100 different series of products distributed across workstations. However, all these jobs require only one full-time DBA, and only two programmers can help them in their spare time.

In small projects, you do not even need a full-time DBA. When we use these skills for a smaller project-a small project of about 12 people, we find that this project does not require a full-time DBA. In contrast, we rely on two developers interested in databases to handle DBA tasks in their spare time.

This is the benefit of automation. If you automate each task, you can use fewer people to do more work.

5 Auxiliary Tools
Database Evolution requires a lot of repetitive work. We can develop some simple tools to help us solve a lot of repetitive work.

The most valuable part of automation is a simple code set for common database tasks. Automated tasks include:

* ** The user information is consistent with that of the current Administrator.

* ** Create a new user

* ** Copy the database plan and modify it in collaboration

* ** Move and merge Databases

* ** Delete a user

* ** Export users so that project team members can distribute offline Database backups.

* ** Import users so that project team members can have Database backups, import databases, and create new plans.

* ** Export the baseline and back up the primary database. This is a special case for exporting users.

* ** Create reports of different plans for comparison.

* ** Compare the plan with the master plan so that developers can compare their local copy with the master database

*** List all users

Analysts and quality reviewers often look at test data and need to change them. Therefore, we use VBA statements to develop an Excel application that extracts data from the database to an Excel file, allowing users to modify the file and then return it to the database. Of course, you can also use other tools to browse and edit the database content, but we use excel because many people are familiar with it.

All members of the project team should easily obtain detailed information about the database design to find out what tables are available and how to use them. We have created an HTML-based tool that uses servlets to query database metadata. Therefore, before adding a field, developers can search for the table and field metadata to see if the field exists in the database. We use Erwin modeling to extract data from Erwin to our metadata table.

6 conclusion
Of course, this is not all the application of Agile Methods in database design, nor the whole of database evolution design. There are also integrated databases, 24x7 implementation, and other unsolved problems, further research is required for database evolution design.

In addition, the Agile Database information provided by the redsaga site is provided, that is, Chapter 14th of Scott W. Ambler's Agile Database Techniques: <DownLoad It>

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.