Database Design Notes

Source: Internet
Author: User

Database Design Learning Notes!

First of all, thanks to imooc (慕课网).

What is database design?
Database design means building the best data storage model for a business system, based on the system's specific requirements and the DBMS (Database Management System) chosen for it. It covers establishing the table structures in the database and the relationships between tables, so that the system's data can be stored effectively and the stored data can be accessed efficiently.

NoSQL systems: MongoDB / Memcached / Redis

Why design the database?

A good design:
Reduces data redundancy
Avoids data maintenance anomalies
Saves storage space
Allows efficient access
A bad design:
Contains a lot of redundant data
Suffers from insert, update, and delete anomalies
Wastes a lot of storage space
Accesses data inefficiently

Database design steps
Requirements analysis ----> Logical design ----> Physical design ----> Maintenance and optimization
Requirements analysis:
What data is there?
What attributes does the data have?
What are the characteristics of the data and its attributes?
Logical design:
Model the database logically, mainly with ER diagrams
Physical design:
Take the characteristics of the chosen database management system fully into account;
Turn the logical design into a physical design according to the characteristics of that database
Maintenance and optimization:
Create tables for new requirements
Optimize indexes
Split large tables


Requirements Analysis:

Why carry out requirements analysis?
1. To understand the data the system needs to store
2. To understand the storage characteristics of the data
3. To understand the life cycle of the data

What are the relationships between entities (one-to-many, many-to-many)?
What attributes does each entity contain?
Which attribute or combination of attributes uniquely identifies an entity?

Example of requirements analysis (a small e-commerce website)

User module, product module, order module, shopping cart module, supplier module
User module:
Attributes it includes;
Attributes that could serve as a unique identifier
Storage characteristics: the data grows for as long as the system is online and must be kept permanently
Product module:
Storage characteristics: products that are no longer on sale can be archived
Order module:
Storage characteristics: permanent storage
Shopping cart module:
Storage characteristics: no permanent storage needed (define archiving and cleanup rules)
Supplier module:
Storage characteristics: the data grows for as long as the system is online and must be kept permanently (the number of suppliers is probably limited)


Logical design:

What does logical design do?
1. Translates the requirements into a logical model of the database
2. Presents the logical model as ER diagrams
3. Stays independent of the specific database management system that is chosen

Terminology:
Relation: a relation corresponds to what is usually called a table
Tuple: a single row in a table is a tuple
Attribute: a column in a table is an attribute; each attribute has a name, called the attribute name
Candidate key: an attribute or group of attributes in a table that uniquely identifies a tuple
Primary key: when a relation has several candidate keys, one of them is chosen as the primary key
Domain: the range of values an attribute can take
Component: the value of one attribute in a tuple

Overview of normal forms:
Common database normal forms include:
First normal form (1NF), second normal form (2NF), third normal form (3NF), and Boyce-Codd normal form (BCNF)
The emphasis here is on the first three;

Data manipulation anomalies and data redundancy

Anomalies: insert anomalies, update anomalies, delete anomalies;
Data redundancy: the same data is stored in more than one place, or a column can be computed from other columns
First normal form (1NF): every field in a table holds a single attribute that cannot be divided further. In other words, every table is a plain two-dimensional table

Second normal form (2NF): no non-key field in a table has a partial functional dependency on any candidate key.
A partial functional dependency exists when one part of a composite key is enough on its own to determine a non-key field.
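
A minimal sketch of a partial dependency and one way to remove it, using Python's built-in sqlite3 module as a stand-in for the DBMS. The tables and columns (order_item_flat, product, order_item) are made up for illustration. In the flat table the key is (order_id, product_id), but product_name depends on product_id alone, so renaming a product touches every order row that mentions it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table that violates 2NF:
-- product_name depends only on product_id, which is just part of the key.
CREATE TABLE order_item_flat (
    order_id     INTEGER,
    product_id   INTEGER,
    product_name TEXT,
    quantity     INTEGER,
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO order_item_flat VALUES (1, 10, 'keyboard', 1), (2, 10, 'keyboard', 3);

-- Decomposition: the product name now lives in exactly one row.
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_item (
    order_id   INTEGER,
    product_id INTEGER,
    quantity   INTEGER,
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO product SELECT DISTINCT product_id, product_name FROM order_item_flat;
INSERT INTO order_item SELECT order_id, product_id, quantity FROM order_item_flat;
""")

# Renaming the product: count how many rows each design has to update.
flat_rows_touched = conn.execute(
    "UPDATE order_item_flat SET product_name = 'mechanical keyboard' WHERE product_id = 10"
).rowcount
normalized_rows_touched = conn.execute(
    "UPDATE product SET product_name = 'mechanical keyboard' WHERE product_id = 10"
).rowcount
```

With the sample data, the flat table needs two row updates for one rename and the decomposed design needs one. If one of the flat rows were missed, the two copies of the name would disagree, which is the update anomaly.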

Third normal form (3NF): defined on top of the second normal form. A table is in 3NF if no non-key field has a transitive functional dependency on any candidate key.
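
A small sketch of a transitive dependency, again with sqlite3 as a stand-in and with hypothetical names (product_flat, category, category_manager). Here product_id determines category, and category determines category_manager, so the manager depends on the key only through the category. The sketch shows the delete anomaly this causes and how a separate category table avoids it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flat table: product_id -> category -> category_manager.
CREATE TABLE product_flat (
    product_id       INTEGER PRIMARY KEY,
    product_name     TEXT,
    category         TEXT,
    category_manager TEXT
);
INSERT INTO product_flat VALUES (1, 'keyboard', 'peripherals', 'Li');

-- Decomposition: category facts get their own table.
CREATE TABLE category (category TEXT PRIMARY KEY, category_manager TEXT);
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
INSERT INTO category VALUES ('peripherals', 'Li');
INSERT INTO product VALUES (1, 'keyboard', 'peripherals');

-- Remove the only product of the category in both designs.
DELETE FROM product_flat WHERE product_id = 1;
DELETE FROM product WHERE product_id = 1;
""")

# In the flat design, who manages 'peripherals' disappeared with the product.
manager_in_flat = conn.execute(
    "SELECT category_manager FROM product_flat WHERE category = 'peripherals'"
).fetchone()
# In the decomposed design the fact is still stored.
manager_in_3nf = conn.execute(
    "SELECT category_manager FROM category WHERE category = 'peripherals'"
).fetchone()
```

After the deletes, the flat table returns no row for the category, while the category table still knows its manager.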

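Boyce-Codd normal form (BCNF): defined on top of the third normal form. A table is in BCNF if no field at all, key fields included, has a transitive functional dependency on any candidate key. Equivalently, in every non-trivial functional dependency the determining side must contain a candidate key.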


Physical Design:

What does physical design do?
1. Select a suitable database management system;
2. Define naming conventions for databases, tables, and fields;
3. Select suitable field types for the chosen DBMS;
4. Apply denormalization where it is justified.


Which database should you choose?
Oracle, SQL Server, MySQL, PostgreSQL

Cost:
Oracle and SQL Server are commercial databases, so cost has to be considered;
MySQL and PostgreSQL are open-source databases

Features:
Oracle has strong functionality and good transaction processing

Operating system:
SQL Server was historically available only on Windows (Linux has been supported since SQL Server 2017)

Development language:
PHP is commonly paired with MySQL

Application scenarios:
Oracle and SQL Server are better suited to enterprise-class projects
MySQL and PostgreSQL are suited to Internet projects


MySQL's common storage engines
MyISAM: usable when writes are rare and reads are many; do not use it when both reads and writes are frequent
MRG_MYISAM: combines several MyISAM tables with the same structure into one logical table; it supports neither row-level locks nor transactions, and it is not suited to scenarios with many global lookups.
InnoDB: the default storage engine since MySQL 5.5; supports transactions and row-level locks, fits most scenarios, and is efficient for both reads and writes;
Archive: also uses row-level locks; suited to logging-style scenarios; supports only INSERT and SELECT, so it is not suitable where random reads, updates, or deletes are required
NDB Cluster: for MySQL Cluster scenarios
For general use InnoDB is recommended; in comparison it is the better choice

Naming conventions for tables and fields:
1. Readability principle
2. Expressiveness principle (the name should convey the meaning)
3. Long-name principle (try not to use abbreviations)
Field type selection:
A column's data type affects both the storage space it costs and the performance of queries on it. When more than one data type would work for a column, prefer numeric types first, then date or binary types, and string types last. Among data types of the same class, prefer the one that takes up less space.

TINYINT-->SMALLINT-->MEDIUMINT-->INT-->BIGINT-->DATE-->DATETIME-->TIMESTAMP-->CHAR(M)-->VARCHAR(M)
For the same data, processing it as characters is often slower than processing it as numbers.
The database processes data in pages; in MySQL, an InnoDB page is 16 KB by default;
The biggest bottleneck of a database is disk I/O.
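
A back-of-the-envelope sketch of why narrower rows help, assuming a 16 KB page and ignoring page headers, per-row overhead, and free space left in pages. The row sizes used below (64 and 1024 bytes) are made-up examples:

```python
PAGE_SIZE_BYTES = 16 * 1024  # 16 KB page, as for InnoDB by default


def rows_per_page(row_bytes, page_size=PAGE_SIZE_BYTES):
    """Upper bound on whole rows per page (overhead deliberately ignored)."""
    return page_size // row_bytes


def pages_to_scan(row_count, row_bytes, page_size=PAGE_SIZE_BYTES):
    """Pages a full scan has to read, rounding the last partial page up."""
    per_page = rows_per_page(row_bytes, page_size)
    return -(-row_count // per_page)  # ceiling division


narrow_pages = pages_to_scan(1_000_000, 64)    # 256 rows per page
wide_pages = pages_to_scan(1_000_000, 1024)    # 16 rows per page
```

Under these assumptions a scan of one million 64-byte rows reads 3,907 pages, while the same number of 1,024-byte rows needs 62,500 pages, which is 16 times the I/O.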

How to choose between CHAR and VARCHAR:
Principles:
1. If the values stored in the column are all about the same length, consider CHAR, for example for phone numbers
2. If the longest value in the column is under 50 bytes, CHAR is generally worth considering
3. Defining a CHAR column longer than 50 bytes is generally not advisable.
How to choose between DECIMAL and FLOAT:
Principles:
1. DECIMAL is suited to storing exact data, while FLOAT can only store approximate data;
2. Because FLOAT usually has a smaller storage overhead than DECIMAL, prefer FLOAT for data that does not need to be exact.
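
The same exact-versus-approximate distinction can be seen outside the database. This sketch uses Python's float (binary floating point) and decimal.Decimal as analogies for FLOAT and DECIMAL columns; it is not a statement about any DBMS's internal storage:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
float_is_exact = float_sum == 0.3

# Decimal arithmetic keeps the digits that were written.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = decimal_sum == Decimal("0.3")
```

Here float_is_exact is False and decimal_is_exact is True, which is why amounts of money are usually kept in an exact type.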

How to store time values:
1. Storing time as an INT is inconvenient to work with and limited in range
2. Consider the time granularity that has to be stored
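
A short sketch of both points, assuming the integer column is a signed 32-bit INT holding Unix seconds (an assumption; the notes do not say which encoding is meant):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer

# Range limit: the last instant such a column can represent.
latest_storable = EPOCH + timedelta(seconds=INT32_MAX)

# Granularity: whole seconds silently drop the fractional part.
event = datetime(2024, 5, 1, 12, 0, 0, 750000, tzinfo=timezone.utc)  # sample value
stored_seconds = int((event - EPOCH).total_seconds())
restored = EPOCH + timedelta(seconds=stored_seconds)
lost_microseconds = event.microsecond - restored.microsecond
```

Under this assumption the column cannot go past 2038-01-19 03:14:07 UTC, and the sample event loses its 750,000 microseconds on the round trip. The raw integer is also hard to read without converting it back.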


Additional considerations for database design:
How to choose a primary key:
1. Distinguish between the business primary key and the database primary key
The business primary key identifies business data and is used to relate tables to one another;
The database primary key exists to optimize data storage
2. Depending on the type of database, consider whether the primary key should grow sequentially
3. Choose a field type for the key that takes up as little space as possible


Avoid using FOREIGN KEY constraints
1. They reduce the efficiency of data import
2. They increase maintenance costs
3. Although foreign key constraints are not recommended, the columns used to relate tables must still be indexed;
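
A minimal sketch of the last point, using sqlite3 as a stand-in (the notes target MySQL, where EXPLAIN output looks different). The tables orders and order_item and the index name are hypothetical. No foreign key is declared, but the relating column still gets an index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
-- order_item.order_id relates to orders.order_id, without a FOREIGN KEY constraint.
CREATE TABLE order_item (
    item_id      INTEGER PRIMARY KEY,
    order_id     INTEGER NOT NULL,
    product_name TEXT NOT NULL
);
""")


def query_plan(sql):
    """Return SQLite's plan description; the last column holds the detail text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


LOOKUP = "SELECT product_name FROM order_item WHERE order_id = 1"

plan_before = query_plan(LOOKUP)  # full table scan
conn.execute("CREATE INDEX idx_order_item_order_id ON order_item (order_id)")
plan_after = query_plan(LOOKUP)   # lookup through the new index
```

Before the index exists the plan reports a scan of order_item; afterwards it names idx_order_item_order_id. The exact wording of the plan text depends on the SQLite version.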

Avoid using triggers
1. They reduce the efficiency of data import
2. They may cause unexpected data anomalies
3. They make the business logic more complicated

Fields reserved for possible future use are strictly prohibited

Denormalization:
Denormalization is the counterpart of normalization. For the sake of performance and read efficiency, it deliberately violates the requirements of the third normal form to an appropriate degree and accepts a small amount of data redundancy, trading space for time;
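
One possible illustration of trading space for time, with sqlite3 as a stand-in and hypothetical tables (orders, order_item). The order total could always be computed from the items, so storing it in orders.total_cents is redundant, but reading it then needs no aggregation. The cost is that every write must keep the copy in step:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    total_cents INTEGER NOT NULL DEFAULT 0   -- redundant: derivable from order_item
);
CREATE TABLE order_item (
    order_id     INTEGER NOT NULL,
    product_name TEXT NOT NULL,
    price_cents  INTEGER NOT NULL
);
""")


def add_item(order_id, product_name, price_cents):
    """Insert the item and update the redundant total in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO order_item VALUES (?, ?, ?)",
            (order_id, product_name, price_cents),
        )
        conn.execute(
            "UPDATE orders SET total_cents = total_cents + ? WHERE order_id = ?",
            (price_cents, order_id),
        )


conn.execute("INSERT INTO orders (order_id) VALUES (1)")
add_item(1, "keyboard", 4500)
add_item(1, "mouse", 1500)

# Cheap read of the stored copy versus recomputing from the detail rows.
stored_total = conn.execute(
    "SELECT total_cents FROM orders WHERE order_id = 1"
).fetchone()[0]
computed_total = conn.execute(
    "SELECT SUM(price_cents) FROM order_item WHERE order_id = 1"
).fetchone()[0]
```

Both values are 6000 here. They stay equal only as long as all writes go through add_item, which is the maintenance burden that denormalization accepts.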


Maintenance Optimization:

What maintenance and optimization involve:
1. Maintaining the data dictionary
2. Maintaining indexes
3. Maintaining table structures
4. Splitting tables horizontally or vertically at the appropriate time


How to maintain a data dictionary
Use the TABLES and COLUMNS tables in the INFORMATION_SCHEMA database
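
A hedged sketch of turning that metadata into a readable dictionary. The query reads MySQL's INFORMATION_SCHEMA.COLUMNS; the %s placeholder assumes a driver that uses that parameter style, and no connection is opened here. The formatting helper and the sample rows are hypothetical and only show the shape of the result:

```python
# Query to run against MySQL; pass the schema name as the single parameter.
DATA_DICTIONARY_SQL = """
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, COLUMN_COMMENT
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = %s
ORDER BY TABLE_NAME, ORDINAL_POSITION
"""


def format_data_dictionary(rows):
    """Render (table, column, type, comment) rows as plain text, grouped by table."""
    lines = []
    current_table = None
    for table, column, column_type, comment in rows:
        if table != current_table:
            lines.append(f"[{table}]")
            current_table = table
        lines.append(f"  {column}  {column_type}  {comment or '(no comment)'}")
    return "\n".join(lines)


# Made-up rows standing in for a real query result.
sample_rows = [
    ("customer", "customer_id", "int unsigned", "database primary key"),
    ("customer", "login_name", "varchar(20)", ""),
]
report = format_data_dictionary(sample_rows)
```

Columns with an empty comment show up as "(no comment)", which makes gaps in the dictionary easy to spot.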

How to maintain indexes
How do you choose suitable columns to index?
1. Columns that appear in WHERE, GROUP BY, and ORDER BY clauses
2. Put highly selective columns first in a composite index
3. Do not include columns with very long data types in an index
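
Selectivity can be estimated as the number of distinct values divided by the number of rows. A rough sketch with sqlite3 as a stand-in and an invented customer table (the columns gender and city and the generated data are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, gender TEXT, city TEXT)"
)
# 1000 sample rows: 2 distinct genders, 50 distinct cities.
rows = [(i, "F" if i % 2 else "M", f"city_{i % 50}") for i in range(1, 1001)]
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)

INDEXABLE_COLUMNS = {"gender", "city"}


def selectivity(column):
    """Distinct values / total rows; closer to 1 means more selective."""
    if column not in INDEXABLE_COLUMNS:  # identifiers cannot be bound as parameters
        raise ValueError(f"unexpected column: {column}")
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM customer"
    ).fetchone()
    return distinct / total


gender_selectivity = selectivity("gender")
city_selectivity = selectivity("city")

# The more selective column goes first in the composite index.
leading, trailing = sorted(INDEXABLE_COLUMNS, key=selectivity, reverse=True)
conn.execute(f"CREATE INDEX idx_customer_{leading}_{trailing} ON customer ({leading}, {trailing})")
```

With this data city (0.05) is more selective than gender (0.002), so the index is created on (city, gender). Real workloads also depend on which columns the queries actually filter on.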

Precautions:
1. More indexes are not always better; too many indexes reduce write efficiency and can also reduce read efficiency;
2. Clean up index fragmentation periodically;
3. Do not use index-forcing keywords (index hints) in SQL.

How to maintain table structure
1. Use online schema change tools;
2. Update the data dictionary at the same time;
3. Control the width and size of tables.

Appropriate operations in the database:

1. Batch operations versus row-by-row operations: batch operations are better
2. Prohibit queries such as SELECT *
3. Control the use of user-defined functions
4. Do not use full-text indexes in the database
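
A small sketch of the first two points with sqlite3 as a stand-in. The access_log table and the sample rows are invented. The rows are written as one batch inside one transaction instead of one statement and commit per row, and the read names its columns instead of using SELECT *:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE access_log (
    log_id     INTEGER PRIMARY KEY,
    page       TEXT NOT NULL,
    visited_at TEXT NOT NULL
)
""")

# Sample data: 500 page visits.
events = [(f"/product/{i}", "2024-01-01 00:00:00") for i in range(500)]

# Batch write: one executemany call, committed once when the block exits.
with conn:
    conn.executemany(
        "INSERT INTO access_log (page, visited_at) VALUES (?, ?)", events
    )

row_count = conn.execute("SELECT COUNT(*) FROM access_log").fetchone()[0]

# Explicit column list: only the data that is needed is read.
first_pages = [
    page
    for (page,) in conn.execute(
        "SELECT page FROM access_log ORDER BY log_id LIMIT 2"
    )
]
```

All 500 rows land in a single transaction, and the query returns only the page column for the first two rows.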

Vertical and horizontal splitting of tables
Vertical split: splits a table by its columns
As requirements grow, more and more columns may be added to a table
When a table is very wide, each page holds few rows, so a query has to read more pages.
1. Put columns that are frequently queried together in the same table;
2. Split TEXT, BLOB, and other large fields into separate tables;
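
A minimal sketch of a vertical split, using sqlite3 as a stand-in and hypothetical tables (product, product_description). The large text column moves to its own table that shares the primary key, so list queries read only the narrow table and the detail view joins the two:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Narrow table: columns that are usually queried together.
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    price_cents INTEGER NOT NULL
);
-- Large field split out, keyed by the same product_id.
CREATE TABLE product_description (
    product_id  INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
""")
conn.execute("INSERT INTO product VALUES (1, 'keyboard', 4500)")
conn.execute(
    "INSERT INTO product_description VALUES (1, ?)", ("long text " * 200,)
)

# Listing page: never touches the large column.
listing = conn.execute("SELECT name, price_cents FROM product").fetchall()

# Detail page: joins the two halves back together when needed.
detail = conn.execute(
    """
    SELECT p.name, d.description
    FROM product AS p
    JOIN product_description AS d ON d.product_id = p.product_id
    WHERE p.product_id = 1
    """
).fetchone()
```

The price of the split is the extra join on the detail path, so it pays off when the narrow columns are read far more often than the large one.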

Horizontal split: splits a table by its rows, for example by hashing the primary key and taking the result modulo the number of tables.
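
A minimal routing sketch for the modulo approach, assuming an integer primary key and four target tables named customer_0 to customer_3 (the names and the count are assumptions for illustration):

```python
from collections import Counter

SHARD_COUNT = 4  # hypothetical number of split tables


def shard_table(customer_id, shard_count=SHARD_COUNT):
    """Pick the table for a row: primary key modulo the number of tables."""
    return f"customer_{customer_id % shard_count}"


# The same key always routes to the same table.
table_for_10 = shard_table(10)
table_for_7 = shard_table(7)

# Sequential keys spread evenly across the tables.
distribution = Counter(shard_table(i) for i in range(1, 1001))
```

Key 10 routes to customer_2 and key 7 to customer_3, and keys 1 to 1000 put 250 rows in each table. Changing SHARD_COUNT later changes the routing of most keys, so the count is best chosen with growth in mind.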

