by Joe Celko,2013/09/18 (first published: 2010/05/25)
The series
This article is part of the stair series: stair-to-database design
New tasks for designing and creating databases? Joe Celko is one of the most widely read writers on SQL, explaining these basics. as usual, he occasionally surprises the most seasoned database professionals. Joe became the recipient of the DBMS Magazine Reader's Choice Award for four consecutive years. He has taught SQL in the United States, the United Kingdom, the Nordic countries, South America and Africa. He has served in the Ansi/iso SQL Standards Committee for 10 years and has contributed to the SQL-89 and SQL-92 standards.
There are several types of tables, each with special requirements for rules and integrity constraints. regardless of demand, table-level constraints ensure that rules are enforced and data integrity is maintained.
There are several types of tables, each with special requirements for rules and integrity constraints. at the first level, we name the data elements and classify them. in the second level, we modeled the data elements with data types and constraints in SQL to provide the rows. at level 3rd, we'll put these lines in the table. A table is more than a set of rows that are grouped together by one name.
Columns can only appear once in a table. only make sense; If you record someone else's shoe size two times, the best case is superfluous, and in the worst case, when the column is inconsistent, it becomes ambiguous. Now we can have a table-level check constraint in each row of the column. They are not much different from our previous list of check constraints. they can be named and appear in the list of column declarations of the CREATE TABLE statement, instead of being attached to any row. For example:
CONSTRAINT valid_employee_age--don ' t hire people before they is born
CHECK (Emp_birth_date < emp_hire_date)
It is usually a good idea not to combine constraints into a large check () clause. The error message will contain the constraint name, so a separate constraint will give you a better understanding of the error than a single monster named "Bad_things_" constraint.
To continue our aversion to redundancy, at the table level, we want each row to be unique, for the same reason. This can be done through table constraints. Two table-level constraints are unique and primary keys that appear in both single and multiple column versions.
A unique constraint means that the combination of columns or columns is unique in the table. But if there is a null value in one or more columns, we will allow it to be a unique value. the primary key declaration has the same effect as a non-null value, and is unique for all of its columns. But for historical reasons, a table can have only one primary key declaration. These columns are used as default values for other constraints between tables, but don't worry about them now.
How the uniqueness constraint is used depends on the type of table involved. Generally speaking, we can divide a table into three kinds:
1. Entities
2. Relationship
3. Auxiliary
An entity table is a collection of the same classes of things defined by the properties that are modeled by the columns. Each line is an example of this sort of thing. each row has the same column. If you can see how it feels, see or touch it, then it is an entity. the name of the entity table should not be single (unless there is only one member in the collection) because it models a collection. This name needs to be a complex number, or, if possible, a collection. For example, "employee" is bad, "employee" is better, "employee" is the best. " Tree" is not good, "tree" is Better, "forest" is the best. You can add your own examples.
Entities are also divided into weak or strong. a strong entity exists at its own value, and a weak entity exists because of one or more powerful entities. you need to buy one first and then discount.
A relational table refers to one or more entities and establishes relationships between them. a relationship can have its own properties, in addition to references to entities. The marriage certificate number belongs to the marital, not the husband, wife or priest.
The degree of the relationship is the number of entities in the relationship. The two-dollar relationship has two entities, and we like them in the real world because they are simple. a recursive two-tuple relationship links entities to itself. The general n-ary relationship involves n entities, such as mortgages with buyers, sellers, and lenders. It is not always possible to decompose n-ary relationships into two-dollar relationships. membership in a relationship can be optional, or it can be mandatory. Optional membership means we can have a 0 entity-a purchase that doesn't always get a discount.
The cardinality of a relationship is the actual number of actual related events for two entities. the basic types of relationships are: a pair of one or one-to-many and many-to-many. These terms are usually eligible for optional (0 or more) or mandatory (1 or more) memberships.
Single-to-one (1:1) relationship refers to an instance of entity a associated with an instance of entity B, for example, in relation to a traditional couple. every husband has only one wife; every wife has only one husband. in this case, both are mandatory.
A one-to-many (1:n) relationship is an instance of entity a with one or more instances of entity B, but for an instance of entity B, there is only one instance of entity a. An example might be a department with many employees; Each employee is assigned to a department. depending on your business rules, you may allow unassigned employees or empty departments.
A many-to-many (m:n) relationship, sometimes referred to as a neutral relationship, refers to an instance of entity A, which is one or more instances of entity B, and an instance of entity B is an instance of 零、一个 or multiple entities. One example might be pizza and customers.
The auxiliary table is neither an entity nor a relationship; it provides information. they are similar to calendars or other lookup tables, with SQL instead of calculations. They are often misunderstood and considered entities or relational tables.
Let's be more specific. a sales order is a relationship between a customer (entity) and our Inventory (entity). Order details are present in a weak entity because we have an order. relationship has an order number, which is not part of the inventory or customer. freight is obtained from a secondary table. Here are some of the framework tables for this example. I am using gtin (Global trade Item number) to order projects and DUNS (Data Universal numbering system) for customers. when designing a database, be sure to pay attention to industry standards.
CREATE TABLE Sales_orders
(Order_nbr INTEGER not NULL PRIMARY KEY
CHECK (Order_nbr > 0),
Customer_duns CHAR (9) Not NULL,
Order_shipping_amt DECIMAL (5,2) not NULL
CHECK (Shipping_amt >= 0.00),
etc);
CREATE TABLE Sales_order_details
(Order_nbr INTEGER not NULL,
Gtin CHAR (a) is not NULL,
PRIMARY KEY (ORDER_NBR, Gtin),
Item_qty INTEGER not NULL
CHECK (Item_qty > 0),
Item_unit_price DECIMAL (8,2) not NULL
CHECK (Item_unit_price >=0.00));
CREATE TABLE Customers
(Customer_duns CHAR (9) not NULL PRIMARY KEY
CHECK (Customer_duns like ' [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] '),
etc);
CREATE TABLE Inventory
(Gtin CHAR () not NULL PRIMARY KEY
CHECK (Gtin like ' [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] '),
Onhand_qty INTEGER not NULL
CHECK (onhand_qty >= 0),
We can see that the sales order is the relationship between the customer and the inventory. orders have their own key (ORDER_NBR), but nothing can force us to use only valid customer DUNS numbers or product Gtin code to do what we actually have in stock. In fact, I can insert the apparently invalid DUNS and Gtin code into the Orders table.
This is the origin of the reference clause. It enables us to enforce all cardinality and degree data from the data model. references are not links or pointers. These are physical concepts, references are a logical concept, and we don't know how it is implemented. Its role is a rule that references a table column to match a single row in the referencing table. This means that the rows in the referencing table must be unique; by default, the primary key in the referencing table is the target, but not necessarily. the values in the reference table are called foreign keys--they are not keys in the table, but are other locations in the pattern.
This is a skeleton schema with more meat on it:
CREATE TABLE Sales_orders
(Order_nbr INTEGER not NULL PRIMARY KEY
CHECK (Order_nbr > 0),
Customer_duns CHAR (9) Not NULL
REFERENCES Customers (customer_duns),
Order_shipping_amt DECIMAL (5,2) DEFAULT 0.00 Not NULL
CHECK (Shipping_amt >= 0.00),
etc);
CREATE TABLE Sales_order_details
(Order_nbr INTEGER not NULL
REFERENCES Orders (ORDER_NBR),
Gtin CHAR (not NULL)
REFERENCES Inventory (Gtin),
PRIMARY key (ORDER_NBR, Gtin),--both column key
Item_qty INTEGER not NULL
CHECK (Item_qty > 0),
Item_unit_price DECIMAL (8,2) not NULL
CHECK (Item_unit_price >= 0.00));
CREATE TABLE Customers
(Customer_duns CHAR (9) not NULL PRIMARY KEY
CHECK (Customer_duns like ' [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] '),
etc);
CREATE TABLE Inventory
(Gtin CHAR () not NULL PRIMARY KEY
CHECK (Gtin like ' [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] '),
Onhand_qty INTEGER not NULL
CHECK (onhand_qty >= 0),
etc);
Note that we only need to set the check () constraint where DUNS and gtin are keys, not where it appears in the reference table. referencing entity tables, customers, and inventory; relational tables, orders, references to other tables. This is a general pattern, but not specific.
Multiple columns of this clause form the following:
FOREIGN KEY (ORDER_NBR, Gtin)
REFERENCES sales_order_details (ORDER_NBR, Gtin)
The columns in the FOREIGN KEY clause are in the reference table and must match the reference key for the column, but can have a different name. I can get to 1:1, 1, N, and n:m relationships by placing uniqueness constraints in the right place. as an example of an armpit table, we can calculate the shipping cost based on the total value of the order. The table can be like this:
CREATE TABLE shipping_costs
(Start_order_amt_tot DECIMAL (10,2) not NULL,
End_order_amt_tot DECIMAL (10,2) is not NULL,
CONSTRAINT Valid_shipping_range
CHECK (Start_order_amt_tot < End_order_amt_tot),
PRIMARY KEY (Start_order_amt_tot, End_order_amt_tot),
Shipping_amt DECIMAL (5,2) not NULL
CHECK (Shipping_amt > 0.00));
Although we have declared the primary key for the secondary transport cost table, it does not resemble the entity's key-there is no validation or validation, it is not an identifier. to use this table, we will query for similar problems:
SELECT Shipping_amt
From Shipping_costs
WHERE <order amount total> between Start_order_amt_tot and End_order_amt_tot;
As an exercise, try writing a constraint to prevent the start and end ranges from overlapping and having gaps. you can redesign the table if you want.
In the modified skeleton mode, when you try to execute an order for a product that is not in stock, you get an error that actually says, "It's out of stock!" you can try another way. But if you try to delete a product inventory, you'll also get an error actually saying, "Hey, someone ordered this crap," so you have to go to each order and replace the item with something else or null (if allowed) before you can delete it from inventory.
This is where you use declarative referential integrity (DRI) operations. The syntax is:
On DELETE [NO ACTION | SET DEFAULT | SET NULL | CASCADE]
On UPDATE [NO ACTION | SET DEFAULT | SET NULL | CASCADE]
Delete and update are referred to as "data base events"; DRI operations occur when they occur on a table.
1. Do not take action = ROLLBACK TRANSACTION, you will get a message. This is the default value when you have only one simple reference clause.
2.SET default = The column referenced is changed by the event, but the reference column is changed to the default value. It is clear that reference columns need to declare default values on them. These default values must be in the reference table.
3.SET NULL = The referenced column was changed by the event, but the reference column was changed to nulls. Obviously, the reference column needs to be empty. This is the reason for the "dubious benefits".
4. Cascade = The referenced column is changed by the event, and the same value is cascaded to the reference column. This is the most important choice in practice. For example, if we want to stop a product, we can remove it from inventory, and in delete cascade, the SQL engine automatically deletes the matching row in Sales_order_details. Similarly, if you update an item in inventory, update cascading replaces the old value with the new value anywhere in the reference.
Referential integrity constraints are still valid after you perform these operations. This is the final skeleton:
CREATE TABLE Sales_orders
(Order_nbr INTEGER not NULL PRIMARY KEY
CHECK (Order_nbr > 0),
Customer_duns CHAR (9) Not NULL
REFERENCES Customers (customer_duns)
On UPDATE CASCADE
On DELETE CASCADE,
Order_shipping_amt DECIMAL (5,2) DEFAULT 0.00 Not NULL
CHECK (Shipping_amt >= 0.00),
etc);
CREATE TABLE Sales_order_details
(Order_nbr INTEGER not NULL
REFERENCES Orders (ORDER_NBR)
On UPDATE CASCADE
On DELETE CASCADE,
Gtin CHAR (not NULL)
REFERENCES Inventory (Gtin)
On UPDATE CASCADE
On DELETE CASCADE,
PRIMARY key (ORDER_NBR, Gtin),--both column key
Item_qty INTEGER not NULL
CHECK (Item_qty > 0),
Item_unit_price DECIMAL (8,2) not NULL
CHECK (Item_unit_price >= 0.00));
See if you can figure out what's going on:
1. A client died, and we removed him.
2. We replaced the dwarf statue on the lawn with a more tasteful flamingo.
3. We stopped the pink flamingo.
4.1 to 3 steps, someone wants a lawn dwarf.
Obviously, I don't consider restocking problems and other things, but we will get these.
Database design Ladder at level 3rd: building Tables