Design method and principle of database table structure

Source: Internet
Author: User
Tags benchmark dba

Http://www.cnblogs.com/RunForLove/p/5693986.html

The three paradigms of database design: In order to build a database with small redundancy and reasonable structure, we must follow certain rules when designing database. In a relational database, this rule is called a paradigm. A paradigm is a summary of a design requirement. In order to design a relational database with reasonable structure, a certain paradigm must be met.

In the actual development of the most common design paradigm has three: the first paradigm is the most basic paradigm. If all the field values in a database table are non-biodegradable atomic values, the database table satisfies the first paradigm, and the second paradigm is more advanced on the basis of the first paradigm. The second paradigm needs to ensure that each column in a database table is related to the primary key, not just one part of the primary key (primarily for the Federated primary key). In other words, in a database table, only one data can be saved in a table, and a variety of data cannot be stored in the same database table; the third paradigm needs to ensure that each column of data in the data table is directly related to the primary key, not indirectly. To summarize, that is: the first paradigm (to ensure that each column remains atomic); the second paradigm (to ensure that each column in the table is related to the primary key); the third paradigm (ensure that each column is directly related to the primary key column, not indirectly).

In the current enterprise information system, the database is still the best way to store data, although there are a lot of books in the guidance of our database design, but that way is the best way to design the table structure of the database, design should follow what principles, How the four paradigms can be used in a way to achieve smooth application, etc. is I have been thinking and summary of the problem, the following is my design for these issues based on the preparation of a summary of the outline of an article, we welcome a piece of discussion, brainstorming. The concept of domain modeling is mentioned, but it is not explained in detail, I hope we can have time to discuss this proposition in depth.

1. The database design should not be designed for the whole system, but should be based on the components of the system architecture, and the database design of the component unit for the business handled by each component; The association between the database tables corresponding to the different components should be minimized, If a table between different components requires a foreign key association, try not to create a foreign key association, but simply record a primary key of the associated table, ensuring that the components correspond to the independence of the table, providing the possibility of refactoring the system or table structure.

Notice what he said here is "do not create foreign key association", the statement that creates the foreign Key association is://foreign key (member_id) references member (ID);//We hardly use this statement, because that's what we do, when we use the foreign key, Simply record the primary key of the associated table instead of creating a foreign key at the database level. I do not know whether it is fluke, or the senior DBA is too strong, has been considered well.

2. Using the domain model-driven approach and the top-down approach to database design, first analyze the system business, define the object according to the responsibility. Objects conform to the encapsulated nature, ensuring that the data items associated with the duties are defined within an object that fully describes the responsibility and does not present a lack of responsibility description. And an object has and has only one responsibility, if an object is responsible for two or more than two responsibilities, it should be split.

Domain model-driven approach, currently used is not very familiar, not enough to consider. Because the table in the regular database is only used for storage,//especially small demand, what fields to add, find the related table plus go on the line, not much consider the domain model. This is very common in the Chinese station old business table

3. To map the database tables based on the established domain model, refer to the second paradigm of database design: All non-keyword attributes in a table depend on the entire keyword. A keyword can be a property or a collection of multiple properties, and in either case, ensure that the keyword is guaranteed to be unique. When determining a keyword, you should ensure that the keyword is not involved in the business and that there are no update exceptions, and the optimal solution is to use a self-increment numeric attribute or a random string as the table's keyword.

4. Because the domain model-driven approach of the 1th describes the design of a database table structure, each object in the domain model has only one responsibility, so the data item in the object has no transitive dependency, so the database table structure design of this idea satisfies the third paradigm from the beginning: a table should satisfy the second normal form, And there is no transitive dependency between attributes.

The three normal forms of the database do not remember the students to check the information. Personally, the third paradigm is designed to minimize data redundancy and ensure that only one copy of the same data exists. The third paradigm in fact we are not very strict, especially in the old database table will have redundant fields. It depends on the circumstances.

5. Similarly, because of the uniqueness of the object's responsibilities and the relationship between the objects reflects the relationship between the business logic, the objects in the domain model exist in the main object and from the object, from the object is from the angle of 1-n or n-n to further refine the business logic of the main object, Therefore, there is no delete and insert exception for table and table association relationships mapped from object and object relationships.

The last sentence does not understand, it may be "so the table and table associations should not be deleted and inserted exceptions." "?

6. In the database table structure derived from the mapping, further modifications should be made to the fourth paradigm to ensure that there is no multi-valued dependency. At this time, we should feedback to the domain model according to the idea of reverse engineering. If there is a multivalued dependency in the table structure, it proves that the objects in the domain model have at least two or more responsibilities, and that the design corrections should be made according to the first article. The Forth paradigm: if a table satisfies bcnf, there should be no multivalued dependency. 

We don't obey much of the four paradigm. For example: The value of the//vas_wp_config.config_name field includes: ADV (advertising theme)/glare (colorful scrolling theme)/theme_simple (General theme)/theme_cartoon (animated theme)/Theme_ None (does not show background theme)//cate_background (background)/video (company video)/board_cartoon (animation signboard)/board_simple (ordinary signboard) and so on. If you follow the IV paradigm, you need to add a new Vas_wp_config_name table, store the configuration name enumeration value, and the Vas_wp_config.config_name field to vas_wp_config.config_name_id. This is better for scaling, not to set a messy value in the Vas_wp_config.config_name field because everyone's understanding is inconsistent, but this requires maintaining more small tables, causing the number of data tables to swell, and DBAs may find it more difficult to manage. We use the unspoken rules convention, the Java enumeration class and other ways to guarantee. But sometimes the effect is not very good, often found in the old database table in the enumeration of the value of a variety of fields, not all agreed.

7. When the analysis confirms that all the tables satisfy the two-three or four paradigm, the association between the table and the table is as weak as possible in order to adjust and reconstruct the table fields and tables structure. And, I think the table in the database is used to persist the state of an object instance at a specific time and under certain conditions, but only a storage medium, so there is no strong association between tables and tables to express the business (consistency between data), this responsibility should be ensured by the logical layer of the system, This approach also ensures that the system is compatible with incorrect data (dirty data). Of course, from the point of view of the whole system, we still have to do our best to ensure that the system does not produce dirty data, from another point of view, the generation of dirty data is also inevitable to some extent, we also want to ensure that the system fault tolerance of this situation. This is a compromise solution.

8. Index the primary key and foreign key for all tables, and target the index of the combined attribute (for some large data volumes and common retrieval methods) to improve the retrieval efficiency. While indexing consumes some of the system resources, it is still worth advocating to compare the performance impact of searching for data in the entire table, especially when the data in the table is large, and the performance impact of sorting operations without indexes.

Indexes are currently created by DBAs based on specific SQL, but when you develop write SQL, you should also consider the index of the fields appropriately.

9. Minimize the use of stored procedures, there are many technologies can replace the functions of stored procedures such as "object/relational mapping", and so on, the guarantee of data consistency in the database, regardless of version control, development and deployment, as well as the migration of the database will have a great impact. However, there is no denying that the stored procedure has a performance advantage, so when the hardware that the system can use is not improved and the performance is a very important quality attribute, the stored procedure can be considered balanced.

Now is to eliminate the use of stored procedures, I think it is more convenient to use, for us, the main reason is to give the DBA management trouble,//Because time is long, the logic of stored procedures and usage scenarios, often no one can understand, easy to produce more problems

10. When the cost of dealing with association constraints between tables (often a cost of use) exceeds the cost of ensuring that no modifications, deletions, changes to exceptions are guaranteed, and that data redundancy is not the primary issue, the table design can be inconsistent with four paradigms. The four paradigms ensure that no anomalies occur, but may also lead to overly pure design, making the table structure difficult to use, so it is necessary to make a comprehensive judgment at design time, but first ensure that it conforms to four paradigms, and then refinement correction is the best way to enter the field of database design.

11. The design of the table to have a good usability, mainly reflected in the query if you need to correlate multiple tables and also need to use complex SQL skills. I feel that the more I follow the paradigm, the more I make SQL complex, specific analysis of the situation. Designed tables to minimize data redundancy, ensure data accuracy, and effectively control redundancy to help improve database performance

Therefore, after considering the above conditions, the table design Convention rules are as follows:

Rule 1: The table must have a primary key. Rule 2: A field represents only one meaning. Rule 3: Always contain two date fields: Gmt_create (Date Created), gmt_modified (date Modified), and the two fields should not contain additional business logic. In rule 4:mysql, Gmt_create, Gmt_modified uses the datetime type. Rule 5: Prohibit the use of complex data types (arrays, custom types, etc.). In rule 6:mysql, the satellite table ID is consistent with the primary table ID after the satellite table has been split. Adding a primary key field to the secondary table is not allowed. In rule 7:mysql, a table with outdated concepts must have an expiration mechanism at the beginning of its design with a definite expiration time. Expired data must be migrated to the history table. In rule 8:mysql, tables that are no longer used must be notified to the DBA to rename the archive. In the rule 9:mysql, if there are any fields that are no longer used in the online table, it is forbidden to keep the data intact. In rule 10:mysql, the use of OCI drivers is prohibited and all thi drivers are used.

A summary of some of the learning notes on MySQL:

One, transaction and storage engine

1. Four transaction isolation levels: Read uncommited, read commited (most db default), repeatable read (mysql default), seriazable.

2.mysql is the default auto commited, which means that each query is automatically committed by default (show variables like ' autocommited '). MySQL can set the isolation level by using the SET transaction isolatioin levels command, for example: Set session transaction Isolation degree read commited.

The 3.mysql image InnoDB uses MVCC (multi-version concurrency control) to handle concurrency. MVCC only works on the two transaction isolation levels of Read commited,repeatable read. The read uncommited isolation level is incompatible with MVCC because the query at that level does not read data rows that conform to the current transaction version, but is the most recent version of the data row. The Seriazable isolation level is incompatible with MVCC because read operations under this level lock each returned row.

4. Select the storage engine, concurrent selection MyISAM, transaction selection Innodb,myisam more error than InnoDB, error recovery time is also longer. Only MyISAM supports full-text indexing.

5. Move the table from one storage engine to another:

  1.    ALTER TABLE mytable Engine=falcon;  The operation is time consuming and may consume all of the server's I/O processing power.  2.    CREATE table innodb_table like myisam_table;//        ALTER TABLE innodb_table engine=innodb;//        INSERT INTO innodb_ Table select * from Myisam_table;

Ii. Types of data

1. As far as possible to define field as not NULL, MySQL is more difficult to optimize the use of nullable columns of the query, it makes the index, index statistics more complex. Empty columns require more storage space and special processing is required within MySQL, and each record requires an extra byte when the nullable column is indexed. Even if you want to store "no value" fields in the table, consider using 0, special fields, or empty strings instead.

2.datetime and timestamp can save the same data: the accuracy is seconds, but the timestamp uses only half of the datetime, but also can save time zone, have special Automatic update ability. However, the time range saved by timestamp is much smaller than DateTime. MySQL can store the finest time granularity in seconds

The 3.mysql supports many aliases, such as Bool,integer,nummeric.

The 4.float and double types support approximate calculations using standard floating-point operations. The decimal type holds the exact decimal number, which is performed on the >=mysql5.0,mysql server itself, because the CPU does not support direct operations on it, so it is slower.

5.mysql will save text and blob-type columns as objects with entities. They have their own data type family (tinytext,smalltext,text,mediumtext,longtext; blob-like); MySQL is different from sorting blobs to text columns and other types, It is not sorted by the full length of the string. Instead, they are sorted by only a few bytes specified by Max_sort_length.

6. Use an enum instead of a string type. MySQL saves each enumeration value internally as an integer. Enums are internally sorted by number, not by string. The worst thing about enum is that the list of strings is fixed, and the add and remove must use ALTER TABLE.

7.ip addresses are typically saved with varchar (15) columns. In fact, the IP address is an unsigned 32-bit integer, not a string. MySQL provides the Inet_aton () and Inet_nota () functions to convert between a certificate and an IP address.

Third, index

1. A clustered index is not just a separate index type, but also a way to store data. The clustered index of the InnoDB engine actually holds the B-tree index and data rows in the same structure. When a table has a clustered index, its data rows are actually stored on the leaf of the index. Note that the storage engine implements the index.

2.myisam and InnoDB data layout: The MyISAM index tree (either primary key index or non-primary key index) the leaf node is the data row that points to, and the InnoDB in the clustered index, the primary key index tree leaf node takes the content of the data, instead of the primary key index tree, the leaf node points to the primary key value, Rather than the location of the data.

3.mysql has two ways of producing sort results: Using file sorting, or scanning an ordered index. Currently only MyISAM supports full-text indexing.

Table-level lock for 4.myisam table; MyISAM table does not support transactions, in fact, MyISAM does not guarantee the completion of a single command, MyISAM only cache the internal index of the MySQL process, and stored in the key buffer. The OS caches the table's data, the rows are tightly stored together, the data on the disk is very small disk occupied and a fast full table scan.

5.innodb support transactions and four transaction isolation levels In mysql5.0, only InnoDB supports the external key, supports row-level locks and MVCC, all InnoDB tables are clustered by primary keys, all indexes (open primary keys) are referenced by primary key, the index is not compressed using prefix, so the index may be much larger than MyISAM, the data is reproduced slowly; _increment, which is the use of table-level locks to produce each auto_increment.

Four, MySQL performance analysis

The 1.mysql provides a benchmark (int loop number, char* expression), which can parse the time spent executing an expression. For example:

Select BENCHMARK (10000,SHA1 (' aaaaaaaaaaaaaaaa '))

2.mysql has two query logs: normal log and slow log.

V. Advanced features of MySQL

1. In MySQL, only the MyISAM storage engine supports full-text indexing. MyISAM Full-Text indexing is a special B-tree with a two-layer structure.

2. Storage engine transactions are given an acid attribute inside the storage engine, and distributed (XA) is a high-level transaction that extends the ACID properties outside of the storage engine, even outside the database, in a two-part submission. Phase 1: Notifies all contributors to prepare for submission Phase 2: Notifies all participants to make a real submission.

The 3.mysql character set and proofing rules have 4 levels of default settings: Server-level, database-level, table-level, and field-level. Mysql4.1 started supporting SQL subqueries.

/******************************************//*   Database Full name = [email protected]:3318 "MySQL"    *//*    table name = task_new   *//******************************************/create table ' Task_ New ' (   ' id ' bigint) unsigned not NULL auto_increment COMMENT ' primary key ',   ' task_name ' varchar (+) not Null COMMENT ' Task name ',   ' image ' varchar ($) DEFAULT NULL COMMENT ' task icon ',   ' description ' varchar ( 1024x768) NOT NULL COMMENT ' task description ',   ' content ' varchar (1024x768) NOT null COMMENT ' task contents ',   ' finished_message ' varchar ' DEFAULT NULL COMMENT ' task completion prompt ',   ' task_scope ' int (one) not NULL COMMENT ' task scope, 0-platform task, 1-game task ', &N bsp;  ' series_task ' int (one) not null DEFAULT ' 0 ' COMMENT ' Task type: series task, separate task ',   ' task_type ' int (one) not null DE FAULT ' 0 ' COMMENT ' task type: fixed task, promotion task, Daily Task ',   ' pre_task ' varchar (+) DEFAULT NULL COMMENT ' predecessor ',   ' P Ost_task ' varchar ' DEFAULT NULL COMMENT ' Post task ',   ' task_status ' int (one) not NULL COMMENT ' task status, pending, not started, in effect, paused, completed, audited failed ',   ' auto_task ' tinyint ( 4) NOT null DEFAULT ' 1 ' COMMENT ' Whether the manual task, 0-No, 1-is ',   ' is_required ' tinyint (4) NOT NULL COMMENT ' must task ',  &N BSP; ' event_type ' varchar (+) default NULL COMMENT ' Care event type ',   ' task_target ' bigint (0) Default ' COMMENT ' Task target ',   ' reset_num ' int (one) NOT null COMMENT ' reset number ',   ' reset_cycle ' int (one) NOT null COMMENT ' reset period ',    ' task_interval ' int (one) NOT null COMMENT ' task interval ',   ' xiaoer ' bigint (all) unsigned NOT null COMMENT ' created Person ',   ' review_id ' bigint (a) unsigned not NULL COMMENT ' approver id ',   ' last_start_time ' datetime DEFAULT Null COMMENT ' Last effective time ',   ' gmt_create ' datetime not NULL COMMENT ' creation time ',   ' gmt_modified ' datetime not Null COMMENT ' modified time ',   ' start_time ' datetime NOT NULL COMMENT ' start time ',   ' end_time ' datetime NOT NULL CO Mment ' End time ',   ' start_condition ' varchar (1024x768) NOT NULL COMMENT ' task trigger condition ',   ' end_condition ' varchar (1024x768) NOT NULL COMMENT ' task completion condition ',    ' Enable ' tinyint (4) NOT null DEFAULT ' 1 ' COMMENT ' is available ',   ' rule ' varchar (4096) NOT null COMMENT ' task rule ', &NB sp;  ' priority ' int (one) not null DEFAULT ' 1 ' COMMENT ' task precedence ',   ' progress_rule ' varchar (2048) NOT NULL Defau LT ' COMMENT ' progress calculation rule ',   ' order_no ' int (one) DEFAULT ' 1 ' COMMENT ' sort number ',   ' classification ' int (11) Default ' 0 ' COMMENT ' 0: \n1: Play the game \n2: Draw ',   ' level ' int (one) default ' 0 ' COMMENT ' for the same category, different levels ',   ' Ext1 ' longtext COMMENT ' extension field 1 (Use this field in UU to indicate a button jump) ',   ' ext2 ' longtext COMMENT ' extension field 2, temporarily reserved ',   ' Channel ' int (one) default ' 0 ' COMMENT ' Task channel: 0-uu or 1-game_box ',   ' consecutive_day ' int (one) default ' 1 ' COMMENT ' The number of days to complete the task ',   ' activity ' varchar ' "Default ' Default ' COMMENT ' task belongs to the active name ',   ' device ' text COMMENT ' model ',   ' packages ' text COMMENT ' application ',  PRIMARY KEY (' id '),   key ' Name_channel ' (' task_name ', ' Channel '),   key ' activity ' (' Activity ' (255 )) Engine=innodb auto_increment=1194 DEFAULT charset=utf8 comment= ' Task table ';

Database table structure design methods and principles

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.