Database paradigm: getting your head off

Last Update:2018-06-03 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

[Introduction] Any term that has been designed for databases, as long as it is not a newbie and has heard the word paradigm, the terms 1NF, 2NF, 3NF, and BCNF will pop up immediately in my mind, but if you want to describe the professional definitions of these things in detail ...... Well, at least most people will be dizzy about functional dependencies and function dependencies ,~! @ # ￥ % ...... *

[Introduction]
Any term that has been designed for databases, as long as it is not a newbie and heard the word paradigm, 1NF, 2NF, 3NF, and BCNF will pop up immediately in my mind, but if you want to describe the professional definitions of these things in detail ...... Well, at least most people will be dizzy about functional dependencies and function dependencies ,~! @ # ￥ % ...... & *, My god! I can't remember it anyway :)

However, since we are on the IT thief ship, we can't avoid dealing with database design. The ugly daughter-in-law always wants to see her in the out-of-office. :) since we can't avoid IT, let us face IT bravely, unveil the mystery of the database paradigm and see if it is an ugly daughter-in-law or a handsome daughter-in-law !!

[Sentence paradigm]
The definition of the database paradigm itself is not too long here. You can search online or find this textbook to ensure that it can be seen everywhere. Here I will describe it in popular languages.

Before discussing it in detail, I would like to say: you must note that the paradigm is only used in table design, other database designs, such as indexes, views, triggers, and stored procedures, are irrelevant to the paradigm.

[1NF of the first paradigm]
1NF is actually very simple in Database Design: The column cannot be further divided..
What is a column that cannot be further divided? This means that each column contains only one attribute, and all attributes are of the same type.

Some people may see the following: dizzy. Which of the following statements can be used to separate the columns of database tables? In fact, this sentence is only for part. Next we will analyze in detail why it is called only for part.

What we are talking about now and the used databases are relational databases, and it is impossible for a relational database to design a column that can contain two different types of attributes. However, in the vast world, there are no surprises except relational databases, there are other databases, such as object-oriented databases and XML databases. This database can design an attribute that contains multiple sub-families. Therefore, 1NF is naturally satisfied in relational databases, other types of databases are not necessarily supported.

After reading the above passage, do you have to breathe a sigh of relief and feel at ease, so that you can leave 1NF alone? No, the database only guarantees the first layer, but you can still design a table that does not conform to 1NF. Below are a few simple examples to see how we do not comply with 1NF.

Example 1: The Student table has a property name of the string type.

This is the simplest example. It is estimated that many people have designed such a table, and the database will not reject the creation of such a table. But this design does not comply with 1NF. Why? The name is actually "famous with a surname", which is actually two attributes. For example, we want to find the number of students with the surname "Li" or the number of students with the same name (for example, "dog left. If such a query has only one name column, you can only use like to query it. In this way, the like needs to be modified for different names (for example, the Chinese surname is in the front and the American surname is in the back, in addition, like cannot use indexes to speed up query.

Example 2: The Custom table has an attribute Address.

You can analyze the problem by yourself according to the example above to see if there is any problem ........

================ I am a split line ======================

[2NF of the second paradigm]
2NF can be described as follows: The property of some primary keys does not exist..
Under what circumstances will some primary key attributes appear? There is no general principle. We can only give an example to illustrate this.

For example, the primary key of Score in the orders table is StudentID and CourseID. There are two other attributes: Grade and StudentName. The Grade here is completely determined by two primary keys (the standard term is "full dependency"), while StudentName is just the attribute of StudentID (the standard term is "partial dependency "), therefore, this design does not conform to 2NF.

Someone may ask: Can I use StudentID as the primary key? Yes, of course, but the CourseID does not conform to 2NF, because CourseID is not a property of StudentID.

After talking for a long time, we all say we want to follow the paradigm, What is the problem with non-compliance with the paradigm?What about it?
In short, there are two problems: Redundancy(Storage, operations ), Exception(Cannot be inserted or deleted)

(1) Redundancy
A student chooses many courses, so StudentName will be repeated many times, because it is repeated many times, so when the student changes his name, all StudentName records will be updated.

(2) Exceptions
Failed to insert: a new student has not selected a course (assuming all courses are required), so his StudentName has no place to record
Delete more: if the student graduates and all the courses are completed, all the records will be deleted, so that the student's name will not be recorded at the end.

[3NF]
3NF can be described as follows: Attributes do not recursively depend on primary keys..
What is recursive dependency? The simple one is "A → B → C", so C is the recursive dependency (the so-called dependency, in short, is the relationship) on.

Note: In the database, such A relationship requires that A is the primary key, B and C are not the primary key.

The following is a simple example:

Assume that the Student relationship table is Student (Student ID, name, age, school, school location, school phone number), and the keyword is single keyword "Student ID", because the following decision relationship exists:
(Student ID) → (name, age, school, school location, school phone number)
This database complies with 2NF, because each student's information is related to the student himself, that is, it is determined by the student ID, but does not comply with 3NF because of the following relationship:
(Student ID) → (school location, school phone number)
That is, there is recursive dependence on the non-Keyword section "school location" and "college phone" on the keyword section "student ID.
It also has data redundancy, update exceptions, insertion exceptions, and deletion exceptions. You can analyze them based on the analysis method in the 2NF example.

[Other paradigm]
There are a bunch of other paradigms: BCNF, and even abnormal 4NF and 5NF.

However, almost all heroes, prawns, and experts have said that 3NF is enough and there is no need to do so much. So I don't want to pursue it. If you are interested, you can study it on your own.

================ I am a split line ======================

[Two Sides of the paradigm: Angels and demons]
Someone said: you are the master of database design when you master the database paradigm !!

Well, it seems easy to be a master. But that's the right half of the story. Mastering the database paradigm can be a General Master, because after all, there are not many people who can master and apply the paradigm; but I am not a master. Why? Because the paradigm is not omnipotent, it also has its own adaptation scope and limitations. If principles and occasions are not added everywhere, it may be a devil that brings you a nightmare!

First, let's look at the purpose of the database paradigm? The paradigm has three main purposes:

Reduce data redundancy
Optimize table structure
Avoid operation exceptions

From the above three purposes, we can see that the main function of the paradigm is related to data storage, and the problem to be solved is the data storage problem. To solve these problems, the final action of the paradigm is to split the table into more tables.

However, in the actual application process, it is impossible for us to store the data and ignore it. Instead, we need to store the data. Otherwise, what is the significance of storing the data? :) since it involves data reading, therefore, the performance of Data Reading should also be focused. In addition, the program is written by people. It is also important for people to understand and use tables easily. If the number of tables is large and the relationship is too complex, errors may occur when used. Let's take a look at the limitations of the paradigm on these two points.

The limitations of the paradigm are as follows:

The increase in the number of tables will lead to more connection queries, and the efficiency of connection queries is certainly less than that of single table queries;
Increasing the number of tables will lead to complex table relationships, making SQL statements more difficult to write, and more difficult to understand table relationships;

Based on the above two limitations, there is another concept in database design called "Anti-paradigm". The anti-paradigm is very simple and can be understood literally, that is, the anti-operation of the paradigm. The following is a simple comparison:

------------ Paradigm ------------------------------ anti-paradigm ----------------------
Objective To solve storage problems solve performance problems and usage Problems
Action split table
Performance Optimization
Bytes -------------------------------------------------------------------------------------

Some people may ask: is there the best performance without a paradigm? No, because in addition to connection queries, a key factor affecting performance is the "Lock". If all the information is in a table, the conflicts between all operations on this table will be very severe, but will lead to a sharp decline in performance.

Therefore, the database design should eventually be weighed between the "paradigm" and "anti-paradigm" to find the best balance point. This is the only way to be a real database design expert !!

By now, the database paradigm has all been lifted by us. Of course, we finally found that this is neither a handsome daughter-in-law nor an ugly daughter-in-law, but an ordinary daughter-in-law that can be used in our design days :)

Source: http://blog.csdn.net/yunhua_lee/article/details/4030864

--------------------------------------------------------------------------

The database paradigm is a required knowledge for DBAs. However, many DBAs who leave their homes often ignore this knowledge.
During the eugene interview, I asked questions about int (n) and varchar (n) here about the meaning of N. If I interviewed a candidate, I would certainly ask questions about the paradigm.
In order to make it easy for everyone, I will share with you my understanding of the paradigm from the perspective of a game dba.

Paradigm introduction:
The English name of the paradigm is normal from, which was summarized after E. F. Codd proposed the relational database model in the 1970s S. The paradigm is the theoretical basis of relational databases.

To learn about the paradigm, we should first learn several terms about the paradigm. Of course, to facilitate your understanding, we only list some commonly used terms.
For more information, see related database books.

Entity: Everything is an entity. For a Game Database, a player is an entity, and a game item is an entity.
Attribute: a feature of an object, such as the player's role name, level, and Gender. attributes can be viewed as a column in a table.
Tuples: a row in a table is a tuple.
Code: A table can uniquely identify an attribute or attribute group of a row (tuples. The code here is generally the unique index in the database table. A table can have multiple unique indexes. These unique indexes are collectively referred to as the candidate code. If one of the multiple unique indexes is selected as the primary key, this primary key is called the primary key.

With some basic terms, we can understand the paradigm.
Currently, relational databases have six paradigms: first, second, third, fourth, fifth, and sixth.
The requirement to meet the paradigm is increasing, that is, to meet the second paradigm, we must first meet the first paradigm, to meet the third paradigm, we must first meet the second paradigm, and so on...
Because the fourth paradigm, fifth paradigm, and sixth paradigm are used to eliminate redundant data, the storage space is abundant now, and proper redundant storage can improve the query efficiency, the current database design, it is generally enough to satisfy the first three paradigms.

First paradigm, non-repeated Columns
This paradigm is the basic requirement of relational databases. If it is not met, your tables are not relational databases.
Requirement: each column in the database is atomic and cannot be split.

If a column becomes a binary tree, it is not a relational database.

Second paradigm, attributes
Requirements: All attributes are completely dependent on the master code, eliminating the need for non-candidate codes to adapt to some functions of the master code.
Easy to understand: All attributes are completely dependent on the primary key, eliminating the resistance of non-primary attributes to some primary key functions.
It can be illustrated through a table on the relationship between players and the family.

CREATE TABLE `family` (  `userid` varchar(34) NOT NULL DEFAULT '',  `familyid` int(11) NOT NULL DEFAULT '0',  `familyname` varchar(32) DEFAULT NULL,  PRIMARY KEY (`userid`,`familyid`)) ENGINE=InnoDB DEFAULT CHARSET=utf8

Userid: Player id
Familyid family id
Familyname Family Name
Currently, familyname must be based on the combination of familyid and the primary key of the pair (userid, familyid.
What are the problems with such a table?

1. The system creates a family. The userid value is null. Can it be inserted? No. The insert operation is abnormal.
Some people say: let userid default null. Try it. Another person said: Set the userid used by the system. Of course, this is not what we discuss.
2. When a family has many members, the name of the family stores data redundancy many times.
3. When many members of a family join, do you need to modify the family name? How many records will be modified? Update exception
4. When everyone leaves the family, do they want to keep the family? Yes? No. The deletion is abnormal.

How can this problem be solved?
In general, you can split the table and change the table above to the two tables below. This solves the problem.

CREATE TABLE `family3` (  `userid` varchar(34) NOT NULL,  `familyid` int(11) DEFAULT NULL,  PRIMARY KEY (`userid`)) ENGINE=InnoDB DEFAULT CHARSET=utf8CREATE TABLE `family4` (  `familyid` int(11) NOT NULL DEFAULT '0',  `familyname` varchar(32) DEFAULT NULL,  PRIMARY KEY (`familyid`)) ENGINE=InnoDB DEFAULT CHARSET=utf8

Third paradigm, attributes
Requirement: it is not dependent on other non-group attributes, and the transfer is also eliminated.
We also use the table we just used to change it a bit and change the primary key of the combination to userid as the primary key.

CREATE TABLE `family` (  `userid` varchar(34) NOT NULL DEFAULT '',  `familyid` int(11) DEFAULT NULL,  `familyname` varchar(32) DEFAULT NULL,  PRIMARY KEY (`userid`)) ENGINE=InnoDB DEFAULT CHARSET=utf8

At present, familyname should only be based on familyid, and the user ID primary key must be passed in an affinity relationship.
The problems that may occur are the same as those in paradigm 2. The solution is the same as that of paradigm 2.

Source: http://isadba.com /? P = 261

2nf means what is the father's surname and what is the son's surname.
3nf means that Sun Tzu's surname is the same as his father's. Although Sun Tzu's surname is also the same as that of his grandfather, however, this is because the father and grandfather have the same surname (2nf), but they do not need to say that the grandson's surname should be the same as that of grandpa. If someone says that the grandson's surname should be the same as that of Grandpa, this is equivalent to a 3nf violation, because this has been identified in 2nf.

Source: http://www.itpub.net/thread-1493555-3-1.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Database paradigm: getting your head off

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Database paradigm: getting your head off

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support