I have been involved in the maintenance of a public opinion monitoring system, the system every day continuously monitor the news on the Internet, continue to download news from the Internet to save into the database.
Ask questions
For the sake of simplicity, I have deliberately modeled a similar table:
CREATE TABLE NEWS ( Id int PK, Title nvarchar) --News headlines Content text --News contents
Datetime
)
As time went on, the news in the database became more and more, and the system began to run more and more slowly. Then, the technology manager took into account that the public opinion monitoring needed only recent data, outdated data, not very important. As a result, a new table is created, which is named as a Time field:
CREATE TABLE NEWS2009 ( Id int PK, Title nvarchar) --News headlines Content text --News contents
Createtime DateTime
)
1. Continuously generated new tables
According to my estimation, this database will have
news2010,news2011,news2012 .....
You need to select a table when adding:
Because the data to be split into different tables, if the system is not like my public opinion monitoring system, there are different times of data entry, then according to different data, choose different data to add it becomes your responsibility.
INSERT into NEWS2010 (data for 2010)
Special point of Time:
If New Year's day comes, you are still enjoying a happy holiday, suddenly the customer calls to ask why the system is not monitoring, shit,2013.1.1 you forgot to build a new table. Cause a lot of data to be missed, you are inevitably more than the boss beat.
2. Manage Data integrity
To join a day, you run:
SELECT * from NEWS2009 WHERE not between ' 2009-01-01 ' and ' 2009-12-31 '
It turns out there's a return, so you're in trouble again. to avoid this, you have to add constraints to each news sheet, not data within a specified time, and do not allow the addition of a prescribed table.
3. Synchronizing Data
Suddenly, the product manager asked to change the createtime of several data from the time of entering the library to the release time of the news website. For example there was a point when it was 2010-01-02, but the actual release time was slightly early 2009-12-31.
You are confidently typing with confidence:
UPDATE NEWS2010 SET = ' 2009-12-31 ' WHERE = 12345
At run time, suddenly think, this data becomes an invalid data, it should be deposited in NEWS2009 this table. Update is useless, you must add it to the NEWS2010 and then delete this piece of data.
4. To ensure uniqueness
The split data, whose primary key is unique in the table of all years, if you want to move a piece of data from the NEWS2010 table to the NEWS2009 table, you have to make sure that the primary key does not conflict. You have to implement the primary key policy yourself.
5. Cross-Table Query
If the boss wants a data report for February 14 every year. You have to be desperate to
SELECT * from NEWS2009 UNION SELECT * from NEWS2010 UNION SELECT * from NEWS2011 ...
6. Synchronizing meta-data
If the product manager suddenly asks for all the news to add a field, then you have to alter multiple tables.
7. Manage referential integrity
If the product manager asked for comments to get back together, the news table NewSID not be comments Comment Table reference, then this foreign key has to be a key number of tables.
Solutions-Partitioned Tables
Based on the above analysis, it has been obvious that such a design is not a good design.
For this time effect is relatively strong, the data is more than the table, we should be the group demon, partition table.
The implementation of the partitioned table has been written before http://www.cnblogs.com/kissdodog/p/3156758.html
Logical database Design-metadata splitting (partition table)