This article explores how to shard data across multiple MySQL servers. We completed this sharding scheme back in 2012, and we still use this system to store our core data today.
Before we discuss how to split the data up, let's first get acquainted with the data itself. Dim the lights, dip the strawberries in chocolate, and cue the Star Trek theme.
Pinterest is a discovery engine for everything that interests you. From a data standpoint, Pinterest is the largest human-curated interest graph in the world. There are more than 50 billion Pins, saved by Pinners onto more than 1 billion boards. Users repin, like other people's Pins (roughly a shallow copy), follow other Pinners, boards, and interests, and then view a home feed of everything from the Pinners, boards, and interests they subscribe to. Great — now let's make it scale!
Growing pains
In 2011, we hit our stride. By some estimates, we were growing faster than any previous startup. Around September 2011, every piece of our infrastructure was over capacity. Several NoSQL technologies had failed disastrously, while our large fleet of MySQL read slaves generated lots of annoying bugs, especially around caching. So we re-architected our entire storage model. Happily, the new architecture has proven effective and largely meets our requirements.
Business requirements
The overall system must be very stable, easy to operate, and easy to scale. We want the database to start small and grow along with the business.
Content generated by Pinners must be accessible forever.
Support requesting the first N Pins from a board in a deterministic order (such as reverse creation time, or a user-specified ordering). The same applies to a Pinner's liked Pins, the list of a Pinner's own Pins, and so on: each must come back in a specific, repeatable order.
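As a small illustration of the requirement above (all names and data here are hypothetical, not from the actual system), a deterministic ordering can be guaranteed by sorting on an immutable composite key such as `(created_at, pin_id)`, so that repeated requests for the first N Pins always return the same slice even when creation times collide:

```python
# Hypothetical sketch: deterministic paging over a board's pins.
# Sorting on (-created_at, pin_id) makes the order total and stable,
# so two identical requests always return the same N pins.
pins = [
    {"pin_id": 3, "created_at": 1700000300},
    {"pin_id": 1, "created_at": 1700000100},
    {"pin_id": 2, "created_at": 1700000100},  # same timestamp as pin 1
]

def first_n_pins(pins, n):
    # Reverse creation time; pin_id breaks timestamp ties deterministically.
    ordered = sorted(pins, key=lambda p: (-p["created_at"], p["pin_id"]))
    return [p["pin_id"] for p in ordered[:n]]

print(first_n_pins(pins, 2))  # -> [3, 1]
```

Without the `pin_id` tiebreak, the two pins sharing a timestamp could come back in either order, which would violate the "deterministic order" requirement.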
For simplicity, updates are generally best-effort; achieving eventual consistency requires additional machinery, such as a distributed transaction log. It's a fun (and not too hard) problem!
Design principles and notes
Since our data is spread across multiple databases, we cannot use database joins, foreign keys, or indexes across all of the data; they can still be used within queries that don't span databases.
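In practice, a join that would have spanned databases is replaced by two application-level lookups: fetch the foreign key from one shard, then fetch the referenced row from the other. A minimal sketch, with in-memory dicts standing in for per-shard MySQL instances (all names and data are hypothetical):

```python
# Hypothetical sketch: two shards that a single SQL join could not span.
# shard_a holds pins, shard_b holds users; the "join" happens in the app.
shard_a = {101: {"pin_id": 101, "owner_id": 7}}
shard_b = {7: {"user_id": 7, "name": "alice"}}

def pin_with_owner(pin_id):
    pin = shard_a[pin_id]              # query 1: hits the pin's shard
    owner = shard_b[pin["owner_id"]]   # query 2: hits the owner's shard
    return {**pin, "owner_name": owner["name"]}

print(pin_with_owner(101))
```

The cost is an extra round trip per hop, which is why sharded designs tend to denormalize or batch these secondary lookups.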
We also need to support load balancing our data. We hate moving data around, especially item by item, because it's error-prone and needlessly complicates the system. If we have to move data, it's best to move an entire virtual node to a different physical node.
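The virtual-node idea can be sketched as a two-level map: keys map to a fixed number of virtual shards, and each virtual shard is assigned to a physical host. Rebalancing then only changes the virtual-to-physical assignment, never the key-to-virtual mapping (everything below is a hypothetical illustration, not the actual Pinterest configuration):

```python
# Hypothetical sketch: virtual shards decouple keys from physical hosts.
NUM_VIRTUAL_SHARDS = 8  # fixed for the life of the system

# virtual shard -> physical MySQL host; this map is all that changes on rebalance
shard_map = {v: ("db1" if v < 4 else "db2") for v in range(NUM_VIRTUAL_SHARDS)}

def host_for_key(key):
    virtual = key % NUM_VIRTUAL_SHARDS  # never changes for a given key
    return shard_map[virtual]

assert host_for_key(11) == "db1"   # 11 % 8 == 3, and virtual shard 3 lives on db1
shard_map[3] = "db3"               # rebalance: move virtual shard 3 wholesale
assert host_for_key(11) == "db3"   # the key's virtual shard is unchanged
```

Because the key-to-virtual mapping is immutable, moving a virtual shard means copying one self-contained chunk of data and flipping one map entry, rather than re-hashing individual items.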
To prototype quickly, we needed a simple solution, running on very stable nodes in our distributed data platform.
All data needs to be replicated to a slave for backup and high availability, and dumped to S3 for MapReduce. In production, we interact only with the master; you should never read or write from a slave in production. Slaves lag, which causes strange bugs. Once the data is sharded, there's generally no advantage to interacting with slaves in production anyway.
Finally, we needed a good way to generate universally unique IDs (UUIDs) for all of our objects.
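The text above only states the requirement; one common way to meet it (which published descriptions of Pinterest's design resemble, though the bit widths here are illustrative assumptions, not the exact production layout) is to pack a shard ID, an object type, and a shard-local ID into a single 64-bit integer, so any object's ID also tells you which shard holds it:

```python
# Illustrative sketch (bit widths are assumptions): a 64-bit composite ID
# of the form [shard_id | type_id | local_id], decodable without any lookup.
SHARD_BITS, TYPE_BITS, LOCAL_BITS = 16, 10, 36

def make_id(shard_id, type_id, local_id):
    assert shard_id < (1 << SHARD_BITS)
    assert type_id < (1 << TYPE_BITS)
    assert local_id < (1 << LOCAL_BITS)
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id

def split_id(obj_id):
    local_id = obj_id & ((1 << LOCAL_BITS) - 1)
    type_id = (obj_id >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    shard_id = obj_id >> (TYPE_BITS + LOCAL_BITS)
    return shard_id, type_id, local_id

oid = make_id(shard_id=241, type_id=1, local_id=7075733)
print(split_id(oid))  # -> (241, 1, 7075733)
```

The shard-local portion can come from a plain MySQL auto-increment column, which keeps ID generation simple and avoids a central ticket server.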
However we built the system, it had to meet our business requirements and be stable, highly performant, and easy to maintain. In other words, it needed to not suck, so we chose a mature technology, MySQL, as our foundation. We deliberately avoided newer technologies with auto-scaling capabilities, such as MongoDB, Cassandra, and Membase, because they weren't mature enough (and they crash in unpredictable ways).
Aside: I still recommend that startups avoid the fancy new stuff — try to just use plain MySQL. Trust me, I have the scars to prove it.