When our application reaches a certain scale, the volume of data skyrockets, and the original database may no longer meet our read and write requirements. Data growth is good news for the business, but it brings a bigger headache: how do we scale storage effectively, and without errors?
The simplest approach is sharding: splitting the data across multiple databases and tables.
But then how does the application know which machine to read a given piece of data from?
Say I previously had 100,000 records in a single database. Now I add 10 database servers and spread the data across them; theoretically, capacity grows to 1,000,000 records.
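As a minimal sketch (all names here are hypothetical, not from any particular framework), the routing question can be answered with simple modulo hashing on the record's key:

```python
# Minimal sketch of modulo-based shard routing (hypothetical names).
NUM_SHARDS = 10  # the 10 database servers from the example above

def shard_for(record_id: int) -> int:
    """Return the index of the database server that holds this record."""
    return record_id % NUM_SHARDS

# The same id always maps to the same shard, so reads find
# the data where writes put it:
print(shard_for(12345))  # some shard index in 0..9
print(shard_for(12345) == shard_for(12345))
```

Note the catch, which is exactly the theme of this post: if you later change `NUM_SHARDS`, almost every key remaps to a different server, so naive modulo routing makes expansion painful.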
So how do we hide these database changes and keep them transparent to the upper application layer? That is what we need to consider.
Suppose we do make it transparent to the upper system. Are we done?
No. Each database now holds different data, so if one machine goes down, its data is lost. You might think of redundancy: give each node a slave and use heartbeat detection for failover. But how long does it take to rebuild a failed node? With today's data volumes, suppose each storage node serves 1 TB of data and internal transfer bandwidth is limited to 20 MB/s. Copying the data to a new replica takes 1 TB / 20 MB/s = 50,000 s, nearly 14 hours. Over such a long copy there is a real chance that another storage node fails mid-replication, so an architecture like this is hard to automate and is not suitable for large-scale distributed storage systems.
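The back-of-the-envelope arithmetic above can be checked in a few lines (using decimal units, 1 TB = 1,000,000 MB, which is what the 50,000 s figure assumes):

```python
# Recovery-time estimate for re-replicating one failed storage node.
DATA_PER_NODE_MB = 1_000_000   # 1 TB in decimal megabytes
BANDWIDTH_MB_PER_S = 20        # internal transfer bandwidth cap

seconds = DATA_PER_NODE_MB / BANDWIDTH_MB_PER_S
hours = seconds / 3600
print(f"{seconds:.0f} s ≈ {hours:.1f} hours")  # 50000 s ≈ 13.9 hours
```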
So how is a large system implemented?
Before we build our own, let's take a look at the solution MongoDB provides: sharding.
On database capacity expansion and shrinking.