Performance Optimization Series, Part 6: Database Design

Source: Internet
Author: User
Tags: dba, lowercase

Part One: Designing for optimization

1. Database design

Database design is a cornerstone of a successful software project, and a discipline in its own right.
The database is usually designed by developers early in the project, with a DBA brought in later for tuning. A developer who is proficient in OOP and ORM often produces a schema that is more reasonable and adapts better to changing requirements, because database normalization shares some ideas with object-oriented design (cohesion, for example). A DBA's advantage is knowing how to get the most out of the DBMS: much of the program's logic can be implemented with SQL and DBMS features, and a DBA-tuned database is typically more efficient and stable than one designed by a developer alone.

2. Differences between database design and program design

The two disciplines emphasize different things:

Object-oriented design ideas:

Encapsulation
Polymorphism
IoC
AOP ...

Database design ideas:

Focus on storage
Focus on efficiency
Two-dimensional table relationships
Focus on integrity

3. Early optimization of database design

Don't think of the database as mere storage.
1. Clear relationships: the relationships between tables must be clear from the start.
2. Space saving: choose the right data type and do not waste storage space.
3. Efficiency: for example, current operating systems are 64-bit, so we choose bigint as the primary key type. A bigint is also 64 bits wide and matches the CPU register size, so a value can be handled in a single operation, which is more efficient.

Part Two: Design principles

1. Type of database

Note: the databases discussed here are relational databases.

2. Database features

Differences between a file system and a database system:
(1) A file system keeps data on external storage for long-term preservation; a database system stores data uniformly in a database.
(2) In a file system, programs and data are coupled to some degree; in a database system, programs and data are separated.
(3) A file system manages data through the operating system's access methods; a database system manages and controls data uniformly through the DBMS.
(4) A file system shares data at the granularity of whole files; a database system shares data at the granularity of records and fields.

What the file system and the database system have in common:
(1) Both are techniques for organizing and managing data.
(2) Both manage data through data management software, with access methods translating between programs and data.
(3) The database system was developed on the basis of the file system.

3. The first step in optimization design

Be proficient with data types.

4. The second step in optimization design

Understand the normal forms: 1NF, 2NF, 3NF ...

4.1 First normal form

1NF: columns are atomic. Each column is an indivisible basic data item.

Counterexample: a single column that holds several values, e.g. name = (Xiao Li, Xiao Zhang)
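A minimal sketch of repairing this counterexample, using Python's built-in sqlite3 module as a stand-in for any relational database. The table and column names (class_bad, class_member) are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Violates 1NF: one column holds a list of names.
conn.execute("CREATE TABLE class_bad (class_id INTEGER PRIMARY KEY, names TEXT)")
conn.execute("INSERT INTO class_bad VALUES (1, 'Xiao Li,Xiao Zhang')")

# 1NF version: one atomic name per row.
conn.execute(
    "CREATE TABLE class_member ("
    " id INTEGER PRIMARY KEY,"
    " class_id INTEGER NOT NULL,"
    " name TEXT NOT NULL)"
)
for class_id, names in conn.execute("SELECT class_id, names FROM class_bad").fetchall():
    for name in names.split(","):
        conn.execute(
            "INSERT INTO class_member (class_id, name) VALUES (?, ?)",
            (class_id, name.strip()),
        )

# Each name can now be filtered, sorted, and indexed without string parsing.
members = [row[0] for row in conn.execute(
    "SELECT name FROM class_member WHERE class_id = 1 ORDER BY name")]
print(members)
```

With one value per row, an exact-match query such as WHERE name = 'Xiao Li' can use an ordinary index instead of a substring search.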

4.2 Second normal form

2NF: on top of 1NF, every non-key attribute is fully dependent on the primary key.

Example columns: ID, student number, name, account number, grade

Non-key attributes such as name and age depend on the key (student number, account number).

4.3 Third normal form

3NF: no non-key attribute depends on another non-key attribute; transitive dependencies are eliminated.

Course table: student ID, student number, grade

Here the student number is a transitive dependency on the student number in the table from 4.2: when the student number changes there, the student number here has to change with it.

4.4 BCNF: satisfies 3NF, and in addition every determinant (the left-hand side of every non-trivial functional dependency) is a candidate key
4.5 4NF: no non-trivial multi-valued dependencies other than those determined by a candidate key

Note: when designing tables, satisfy the first, second, and third normal forms.
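As a sketch of what the normal forms above buy in practice, the student attributes can be kept in one table and the grades in another, so that the student number is stored exactly once. This uses Python's sqlite3 module; the student and grade tables and their columns are assumptions made for the example, not the author's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Student attributes depend only on the student, so they live in one place.
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    student_no TEXT NOT NULL UNIQUE,
    name       TEXT NOT NULL
);
-- A grade depends on the (student, course) pair and on nothing else.
CREATE TABLE grade (
    grade_id    INTEGER PRIMARY KEY,
    student_id  INTEGER NOT NULL REFERENCES student(student_id),
    course_name TEXT NOT NULL,
    score       INTEGER NOT NULL,
    UNIQUE (student_id, course_name)
);
INSERT INTO student VALUES (1, 'S001', 'Xiao Li');
INSERT INTO grade (student_id, course_name, score)
VALUES (1, 'math', 90), (1, 'physics', 85);
""")

# Changing the student number touches exactly one row; grades are unaffected.
rows_changed = conn.execute(
    "UPDATE student SET student_no = 'S100' WHERE student_id = 1").rowcount

report = conn.execute("""
    SELECT s.student_no, g.course_name, g.score
    FROM grade g JOIN student s ON s.student_id = g.student_id
    ORDER BY g.course_name
""").fetchall()
print(rows_changed, report)
```

Had the student number been copied into every grade row, the same change would have required updating every one of those rows consistently.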

5. The third step in optimization design

Have some ideas in mind before you design:

1. Choose small data types to save space.
2. Design the primary key on its own, and take distributed scaling into account.
3. Foreign key design.
4. Index design to improve performance.
5. Association table design: many-to-one and many-to-many.
6. Separate frequently read and written data from infrequently accessed data.
7. Configuration tables, log tables, scheduled-task tables, and so on.
8. Summary table design: to address performance on large data volumes, summarize data according to the business scenario.
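The summary table idea in item 8 can be sketched as follows, again with Python's sqlite3 module. The order_item and daily_sales_summary tables and the refresh_summary helper are hypothetical; a real system would pick the grouping that matches its own reports and would probably refresh incrementally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_item (
    id         INTEGER PRIMARY KEY,
    order_date TEXT NOT NULL,
    amount     INTEGER NOT NULL   -- stored in cents
);
CREATE TABLE daily_sales_summary (
    order_date   TEXT PRIMARY KEY,
    order_count  INTEGER NOT NULL,
    total_amount INTEGER NOT NULL
);
INSERT INTO order_item (order_date, amount) VALUES
    ('2024-01-01', 1000), ('2024-01-01', 2500), ('2024-01-02', 700);
""")

def refresh_summary(conn):
    """Rebuild the summary from the detail rows, e.g. from a scheduled job."""
    with conn:  # one transaction: readers never see a half-built summary
        conn.execute("DELETE FROM daily_sales_summary")
        conn.execute("""
            INSERT INTO daily_sales_summary (order_date, order_count, total_amount)
            SELECT order_date, COUNT(*), SUM(amount)
            FROM order_item
            GROUP BY order_date
        """)

refresh_summary(conn)
summary = conn.execute(
    "SELECT order_date, order_count, total_amount "
    "FROM daily_sales_summary ORDER BY order_date").fetchall()
print(summary)
```

Reports then read the small summary table instead of aggregating the detail table on every request.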

6. The fourth step in optimization design

There are some established patterns.

1. General-purpose design
Example: personnel, department, role

"General" here means broadly applicable: a database design first built for a small company's OA system should still be usable after moving to a large international company.
2. Special-purpose design
Attachments, logs, configuration, monitoring, etc.
3. Storage design
Classifying data by type makes partitioning easier.
4. Some additional fields
Creation date, modification date, sort order
5. Journal (transaction record) table
Similar to a log, but it records business processing results, account changes, or intermediate values of business processing.
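A journal table of the kind described in item 5 might look like the sketch below, written with Python's sqlite3 module. The account and account_journal tables and the apply_change helper are assumptions for illustration; the point is only that every balance change appends one journal row in the same transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    balance    INTEGER NOT NULL
);
CREATE TABLE account_journal (
    journal_id    INTEGER PRIMARY KEY,
    account_id    INTEGER NOT NULL REFERENCES account(account_id),
    change_amount INTEGER NOT NULL,
    balance_after INTEGER NOT NULL,
    created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO account VALUES (1, 100);
""")

def apply_change(conn, account_id, amount):
    """Update the balance and append a journal row in one transaction."""
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE account_id = ?",
            (amount, account_id))
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE account_id = ?",
            (account_id,)).fetchone()
        conn.execute(
            "INSERT INTO account_journal (account_id, change_amount, balance_after) "
            "VALUES (?, ?, ?)",
            (account_id, amount, balance))
    return balance

apply_change(conn, 1, 50)
apply_change(conn, 1, -30)
journal = conn.execute(
    "SELECT change_amount, balance_after FROM account_journal "
    "ORDER BY journal_id").fetchall()
print(journal)
```

Because journal rows are only ever appended, the current balance can always be audited against the sum of the recorded changes.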

7. Codd's 12 rules for an RDBMS

Edgar Frank Codd is known as the "father of relational databases" and received the Turing Award in 1981 for his outstanding contributions to the theory and practice of database management systems. In 1985, Dr. Codd published 12 rules that concisely define what a relational database is and serve as design guidelines for all relational database systems.

1. Information rule: all information in a relational database is represented in exactly one way, as values in tables.

2. Guaranteed access rule: every data item is guaranteed to be accessible through a combination of table name, primary key value, and column name.

3. Systematic treatment of null values: the system supports null values (NULL) and handles them in a systematic way, independent of data type.

4. Dynamic online catalog based on the relational model: the database description is self-describing and is represented at the logical level in the same way as ordinary data. That is, the database must contain system tables that describe its own structure, and the description must be available through tables that users can access.

5. Comprehensive data sublanguage rule: a relational system may support several languages and various modes of terminal use, but there must be at least one language whose statements can be expressed as character strings with a well-defined syntax and which fully supports all of the following: data definition, view definition, data manipulation, integrity constraints, authorization, and transactions. (In practice this language is SQL.)

6. View updating rule: all views that are theoretically updatable are also updatable by the system.

7. High-level insert, update, and delete: the ability to handle a base relation or a derived relation as a single operand applies not only to data retrieval but also to inserting, updating, and deleting data. That is, rows are treated as sets in insert, update, and delete operations.

8. Physical data independence: applications and terminal activities remain logically unchanged no matter how the storage representation or access methods of the data change.

9. Logical data independence: applications and terminal activities remain logically unchanged when information-preserving changes are made to the tables.

10. Integrity independence: integrity constraints specific to a relational database must be definable in the relational data sublanguage and stored in the catalog, not in application programs.

11. Distribution independence: whether data is stored centrally or distributed, and whenever the distribution strategy changes, the RDBMS's data manipulation sublanguage must allow applications and terminal activities to remain logically unchanged.

12. Nonsubversion rule: if a relational system supports a low-level (record-at-a-time) language, that language cannot be used to violate or bypass the integrity rules and constraints expressed in the higher-level (multiple-records-at-a-time) language. In other words, users cannot violate the database's constraints by any route.
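Rule 4 (the self-describing catalog) is easy to observe in practice. The sketch below uses SQLite through Python's sqlite3 module, where the catalog is the sqlite_master table; the dept and staff tables are hypothetical. Other systems expose the same idea through different names (for example information_schema views), so treat the specific catalog names here as SQLite details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)")
conn.execute("CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, dept_id INTEGER)")

# The catalog is itself a table and is queried with an ordinary SELECT.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Column metadata for one table is also returned as rows
# (cid, name, type, notnull, dflt_value, pk); index 1 is the column name.
dept_columns = [row[1] for row in conn.execute("PRAGMA table_info(dept)")]
print(tables, dept_columns)
```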

Putting these principles into practice:

(i) Reduce reliance on database-specific functionality

(ii) Principles for defining entity relationships
Entities involved: identify all the entities that take part in the relationship.
Ownership: consider whether one entity "owns" another.
Cardinality: consider how many instances of one entity are associated with an instance of another.
(iii) A column holds a single value
To represent the coordinates (0,0), use two columns rather than storing "0,0" in one column.
(iv) Column order and readability
(v) Define primary and foreign keys
Every table must define a primary key, and foreign keys where they apply.
(vi) Choose the key
(vii) Whether NULL is allowed
In standard SQL, concatenating any value with NULL yields NULL.
Any arithmetic operation involving NULL returns NULL.
Once NULL is introduced, logic becomes harder to handle.

(viii) Normalization (normal forms)

Warning signs that a table violates a given normal form:

1NF
String data that contains delimiter characters.
Attribute names that end in a number.
Tables with no key defined, or with a poorly defined key.
2NF
Multiple attributes that share the same prefix.
Repeating groups of data.
Summary data whose referenced data lives in a completely different entity.
BCNF
Every key must uniquely identify the entity, and every non-key attribute must describe the entity.
4NF
Ternary relationships (entity : entity : entity).
Latent multi-valued attributes (such as multiple phone numbers).
Temporal data or historical values (historical data should be factored out into its own entity, otherwise there will be a lot of redundancy).
(ix) Choose data types
(x) Optimize for concurrency
When designing a database, consider optimizations for concurrent access, such as the timestamp type.
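The NULL behaviour listed under "whether NULL is allowed" can be checked directly. This is a small sketch using SQLite through Python's sqlite3 module; PostgreSQL behaves the same way for these expressions, but some systems treat string concatenation with NULL differently, so verify on the target database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("""
    SELECT 'abc' || NULL,         -- concatenation with NULL
           1 + NULL,              -- arithmetic with NULL
           NULL = NULL,           -- comparison is unknown, not true
           NULL IS NULL,          -- the correct way to test for NULL
           COALESCE(NULL, 'n/a')  -- substitute a default value
""").fetchone()
print(row)
```

The first three expressions all come back as NULL (None in Python), which is why a nullable column forces every query that touches it to decide what NULL should mean.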

8. Naming rules

Table name rules
1. Use a prefix, but not a meaningless one; the prefix is generally derived from the business module.
2. Separate words with underscores.
3. Use all lowercase.
Column name rules
1. Generally do not use a prefix.
2. Separate words with underscores.
3. Use all lowercase.

Part Three: Design cases

1. Key design

Use a physical (surrogate) primary key: it indexes well and eliminates transitive dependencies.
Primary key type: ordinary systems use int or bigint. For efficiency, since current operating systems are basically 64-bit, bigint is recommended.
UUID: consider the storage size and collision avoidance.
Eliminate all composite primary keys (for example, grade + course name in the course table).
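One common way to drop a composite primary key without losing its guarantee is to add a surrogate key and keep the old key columns under a unique constraint. The sketch below uses Python's sqlite3 module; the course_grade table and the choice of (student_id, course_name) as the former composite key are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE course_grade (
        id          INTEGER PRIMARY KEY,   -- surrogate key (bigint on other systems)
        student_id  INTEGER NOT NULL,
        course_name TEXT    NOT NULL,
        score       INTEGER NOT NULL,
        UNIQUE (student_id, course_name)   -- the former composite key, kept as a constraint
    )
""")
conn.execute(
    "INSERT INTO course_grade (student_id, course_name, score) VALUES (1, 'math', 90)")

try:
    conn.execute(
        "INSERT INTO course_grade (student_id, course_name, score) VALUES (1, 'math', 60)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM course_grade").fetchone()[0]
print(duplicate_rejected, row_count)
```

Other tables can then reference the single id column, while the unique constraint still prevents duplicate business rows.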

2. Index design

2.1. B-tree index

The B-tree index is the most frequently used index type in MySQL, and every storage engine except Archive supports it. This is true not only of MySQL: in many other database management systems the B-tree index is also the most important index type, mainly because its storage structure performs very well for data retrieval.

2.2. Hash index

Because of its structure, a hash index is very efficient to search: a lookup can locate the entry in one step, whereas a B-tree index has to walk from the root node through the branch nodes to the leaf page, which costs several I/O accesses. For equality lookups a hash index is therefore much faster than a B-tree index. It cannot, however, serve range queries or sorting.

Example:

Hash value 1111 corresponds to record 1, record 2, record 3
Hash value 2222 corresponds to record 4, record 5, record 6

Explanation:

A hash value is computed from the indexed field and each record is placed in the bucket for its hash value. To find the records, the index only needs to look up the hash value.
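The bucket idea above can be sketched in a few lines of plain Python. This is an in-memory illustration of the principle, not how any particular storage engine lays out its hash index; build_hash_index and lookup are hypothetical helper names.

```python
from collections import defaultdict

def build_hash_index(rows, key):
    """Map the hash of each value of `key` to the row positions that hold it."""
    buckets = defaultdict(list)
    for position, row in enumerate(rows):
        buckets[hash(row[key])].append(position)
    return buckets

def lookup(rows, index, key, value):
    """Equality lookup: probe one bucket, then filter out hash collisions."""
    return [rows[p] for p in index.get(hash(value), []) if rows[p][key] == value]

rows = [
    {"id": 1, "city": "Beijing"},
    {"id": 2, "city": "Shanghai"},
    {"id": 3, "city": "Beijing"},
]
city_index = build_hash_index(rows, "city")
beijing_ids = [r["id"] for r in lookup(rows, city_index, "city", "Beijing")]
print(beijing_ids)
```

The lookup never compares against rows in other buckets, which is the source of the speed. It also shows the limitation: the buckets carry no ordering, so a query such as "city greater than X" gets no help from this structure.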

2.3. Types of indexes

Normal index: the most basic index, with no restrictions.

Unique index: like a normal index, except that the values in the indexed column must be unique; NULL values are allowed.

Primary key index: a special unique index that does not allow NULL values.

Full-text index: originally available only for MyISAM tables (InnoDB has supported it since MySQL 5.6). Building a full-text index on a large data set costs considerable time and space.

Composite index: to get more efficiency out of MySQL you can create a composite index, which follows the "leftmost prefix" principle.

Covering index: when all the columns a query needs are contained in the index, the index is called a covering index, e.g. SELECT id FROM table_a

Clustered index

A clustered index keeps rows with adjacent key values physically close together (so a string column, especially a random string, makes a poor clustered index because it forces the system to move large amounts of data), and a table can have only one clustered index. Because indexes are implemented by the storage engine, not all engines support clustered indexes. Currently only solidDB and InnoDB do.

Non-clustered index

In a secondary index, the leaf node stores not a pointer to the row's physical location but the row's primary key value. This means that finding a row through a secondary index takes two lookups: first the secondary index to get the primary key value, then the clustered index to get the row.
InnoDB clusters data by the primary key. If you do not define a primary key, InnoDB uses a unique, non-null index instead. If no such index exists, InnoDB defines a hidden primary key and builds the clustered index on it. In general, the DBMS stores the actual data in the form of the clustered index, which is the basis for all secondary indexes.
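The "leftmost prefix" and covering-index behaviour described above can be observed by asking the query planner what it intends to do. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for MySQL; the table t, the index idx_ab, and the plan helper are hypothetical, and the plan text is SQLite-specific (MySQL's EXPLAIN reports the same decisions in a different format).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_ab ON t (a, b)")

def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Leading column present: the composite index can be used to seek.
plan_prefix = plan("SELECT * FROM t WHERE a = 1")
# Leading column missing: the index cannot be used to seek on b alone.
plan_no_prefix = plan("SELECT * FROM t WHERE b = 2")
# Every needed column is in the index: a covering index, no table access.
plan_covering = plan("SELECT a, b FROM t WHERE a = 1")

print(plan_prefix, plan_no_prefix, plan_covering, sep="\n")
```

The first plan searches idx_ab, the second falls back to scanning the table, and the third is reported as using a covering index.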

3. Additional table fields

Common:
Creation time: create_at
Modification time: update_at
Sort order: sn
Sometimes used:
Description: desc, remark (note that desc is a reserved word in SQL and must be quoted if used as a column name)
Spare fields: opts_1, opts_2 ...

4. Dictionary table design

The difference between a dictionary table and a system configuration table:
What a dictionary table holds
Commonly used data turned into named constants: consumption types, payment types, item types (including hierarchies)
What a system configuration table holds
Function parameters of each module turned into named constants, stored as key-value pairs

Dictionary table:

    -- The character varying lengths were unreadable in the source and are left unspecified.
    CREATE TABLE dict_content (
        dcc_id     bigint NOT NULL,
        dct_id     bigint,
        dcc_no     character varying,
        dcc_name   character varying,
        extend_no  character varying,
        remarks    character varying,
        sort       bigint,
        parent_id  bigint,
        is_leaf    smallint,
        path       character varying,
        CONSTRAINT dict_content_pkey PRIMARY KEY (dcc_id)
    );

System configuration table:

    CREATE TABLE sys_config_data (
        id           bigint NOT NULL,
        create_date  timestamp without time zone NOT NULL,
        modify_date  timestamp without time zone NOT NULL,
        catalog      character varying(255),
        content      character varying(255),
        description  character varying(255),
        sn           character varying(255),
        CONSTRAINT sys_config_data_pkey PRIMARY KEY (id)
    );
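A sketch of how a dictionary table is typically read, using Python's sqlite3 module and a cut-down version of dict_content. The meanings assigned to dct_id (which dictionary an entry belongs to) and dcc_no (the code stored in business tables) are assumptions, as are the sample rows and the dict_options helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dict_content (
    dcc_id    INTEGER PRIMARY KEY,
    dct_id    INTEGER,   -- which dictionary the entry belongs to (assumed meaning)
    dcc_no    TEXT,      -- code stored in business tables (assumed meaning)
    dcc_name  TEXT,      -- label shown to users
    sort      INTEGER,
    parent_id INTEGER
);
-- Sample data: dct_id 1 = payment type, dct_id 2 = item type
INSERT INTO dict_content VALUES
    (1, 1, 'CASH', 'Cash',      2, NULL),
    (2, 1, 'CARD', 'Bank card', 1, NULL),
    (3, 2, 'FOOD', 'Food',      1, NULL);
""")

def dict_options(conn, dct_id):
    """Return the (code, label) pairs of one dictionary in display order."""
    return conn.execute(
        "SELECT dcc_no, dcc_name FROM dict_content WHERE dct_id = ? ORDER BY sort",
        (dct_id,)).fetchall()

payment_options = dict_options(conn, 1)
print(payment_options)
```

Business tables store only the short code, and the label and ordering can be changed in one place.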

5. Attachment table design

Storage location
Type
Purpose
Number of uses
Number of downloads

6. Hierarchical table design

Parent-child hierarchy via parent_id
Hierarchy stored inside the table
Hierarchy stored outside the table
Design for the fastest lookup

    -- Lengths and precisions that were unreadable in the source are left unspecified.
    CREATE TABLE public.ihp_boq_tpl (
        boq_id         bigint NOT NULL,       -- List ID
        boq_no         character varying,     -- List number
        ext_no         character varying,     -- Extension number
        boq_name       character varying,     -- List name
        unit_id        bigint,                -- Unit of measure
        boq_kind       smallint,              -- List type
        boq_mode       smallint,              -- List mode
        boq_rate       numeric,               -- Metering rate (two decimal places in the source)
        formula        character varying,     -- Calculation formula
        use_formula    smallint,              -- Whether to enable the formula
        parent_id      bigint,                -- Parent node
        path           character varying,     -- Path
        level          smallint,              -- Level
        end_node       smallint,              -- Final node
        remarks        character varying,     -- Notes
        status         smallint,              -- Status
        feature        character varying,
        frequency      character varying,
        quota_unit_id  bigint,
        quota_tbl_no   character varying(8),
        project_id     bigint,
        CONSTRAINT pk_ihp_boq_tpl PRIMARY KEY (boq_id)
    );
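The table above stores both parent_id and path, which suggests two ways of reading a subtree. The sketch below compares them using Python's sqlite3 module on a cut-down table. The boq table name, the sample rows, and the path format (ids from the root down, each followed by a slash) are assumptions; the source does not show what the path column actually contains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE boq (
    boq_id    INTEGER PRIMARY KEY,
    boq_name  TEXT NOT NULL,
    parent_id INTEGER,
    path      TEXT NOT NULL   -- ids from the root down, e.g. '1/2/4/' (assumed format)
);
INSERT INTO boq VALUES
    (1, 'Earthworks',   NULL, '1/'),
    (2, 'Excavation',   1,    '1/2/'),
    (3, 'Backfill',     1,    '1/3/'),
    (4, 'Rock cutting', 2,    '1/2/4/'),
    (5, 'Concrete',     NULL, '5/');
""")

# Option 1: walk parent_id level by level with a recursive query.
via_parent = [r[0] for r in conn.execute("""
    WITH RECURSIVE sub(boq_id) AS (
        SELECT boq_id FROM boq WHERE boq_id = 1
        UNION ALL
        SELECT b.boq_id FROM boq b JOIN sub s ON b.parent_id = s.boq_id
    )
    SELECT boq_id FROM sub ORDER BY boq_id
""")]

# Option 2: a single prefix match on the stored path.
via_path = [r[0] for r in conn.execute(
    "SELECT boq_id FROM boq WHERE path LIKE '1/%' ORDER BY boq_id")]
print(via_parent, via_path)
```

Both queries return the same subtree. The path column trades extra storage and maintenance on every move for a lookup that needs no recursion, which is one reading of "design for the fastest lookup".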

7. Process table design

Process master table
Task sub-table
Association with business tables

    -- The character varying lengths were unreadable in the source and are left unspecified.
    CREATE TABLE process_run (
        runid       bigint NOT NULL,
        subject     character varying NOT NULL,
        creator     character varying,
        userid      bigint NOT NULL,
        defid       bigint NOT NULL,
        piid        character varying,
        createtime  timestamp without time zone NOT NULL,
        runstatus   bigint NOT NULL,
        busdesc     character varying,
        entityname  character varying,
        entityid    bigint,
        formdefid   bigint,
        CONSTRAINT pk_process_run PRIMARY KEY (runid)
    );

    CREATE TABLE process_form (
        formid        bigint NOT NULL,
        runid         bigint NOT NULL,
        activityname  character varying NOT NULL,
        createtime    timestamp without time zone NOT NULL,
        endtime       timestamp without time zone,
        durtimes      bigint,
        creatorid     bigint,
        creatorname   character varying,
        taskid        character varying,
        status        bigint DEFAULT 0,
        preformid     bigint,
        comments      character varying,
        entity_id     bigint,
        entity_name   character varying,
        CONSTRAINT pk_process_form PRIMARY KEY (formid)
    );

Part Four: Redundant design (denormalization)

Appropriate redundancy
1. Borrow the idea of a covering index: put frequently used fields directly in the table.
2. Extract related data into a summary table.
3. Deliberately violate 3NF in places.
4. Add redundancy or caching at the program level.
5. Do not write triggers; do not write stored procedures.

Splitting tables and databases (sharding)

Performance: a distributed database copes readily with massive data volumes and highly concurrent requests; a good one can achieve more than 90% linear scaling.
Flexibility and elasticity: the business and usage scenarios of modern systems change rapidly, and user growth is highly uncertain, so elastic expansion is very important. A distributed database is cloud-ready by nature: capacity can be added by adding machines to meet demand, without affecting development.
Multi-center, active-active deployment: this is common in large-scale applications and is easier to implement with a distributed database. It does depend on the database's synchronization and consistency capabilities, which are an important measure of the quality of a distributed database.
Read/write separation: master and replica nodes can each play a role. For example, the SequoiaDB database can use a replication group of three replicas to serve OLTP, NoSQL, and OLAP application scenarios at the same time.
Low cost: x86 servers, SATA storage (SSD in part), plus good network bandwidth.

1. Horizontal partitioning: essentially a horizontal hash on one or more keys, which increases parallel processing of the data and so improves performance.
2. Vertical partitioning: much like traditional partitioning; the data is split according to its meaning.
3. Mixed partitioning: horizontal and vertical partitioning used together.
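Horizontal partitioning by hash can be sketched as a routing function that maps a key to a shard number. The shard_for and route helpers below are hypothetical, and CRC32 is used only because it gives the same result on every machine and every run; a production system might choose a different hash.

```python
import zlib

def shard_for(key, shard_count):
    """Pick a shard by hashing the key; crc32 is stable across runs and machines."""
    return zlib.crc32(str(key).encode("utf-8")) % shard_count

def route(keys, shard_count):
    """Group keys by the shard that would store them."""
    shards = {n: [] for n in range(shard_count)}
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards

placement = route(range(1, 1001), shard_count=4)
sizes = {n: len(keys) for n, keys in placement.items()}
print(sizes)
```

A simple modulo scheme like this relocates most keys when shard_count changes, so the number of shards is usually fixed up front or a scheme such as consistent hashing is used instead.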
