Yupoo (Youpai) is currently the largest image-hosting service provider in China. The entire website is built on a large amount of open-source software. The following is the open-source software used by Yupoo:
Operating systems: CentOS, Mac OS X, and Ubuntu
Servers: Apache, Nginx, Squid
Databases: MySQL, mochiweb, and MySQLdb
Server monitoring: Cacti and Nagios
Development languages: PHP, Python, Erlang, Java, and Lua
Distributed computing: Hadoop and MogileFS
Log Analysis: AWStats
Task Management: Redmine
Message System: RabbitMQ, php-amqp
Front-end framework: Mootools
Cache system: Memcached, php-memcached, libmemcached, pylibmc, XCache, Redis, Riak, Predis
Image Processing: GraphicsMagick and gmagick
FTP tool: vsftpd
Development tools: Vim and Readline
Debugging tools: Firebug and Xdebug
Version control: Mercurial
Search Service: Solr
Email service: Postfix
Network programming: Twisted, cURL, libevent, Net-SNMP, NTP
Availability test: ibrowse
Cluster System: Heartbeat
Concurrent programming: gevent
Server load balancing (SLB): IPVS
Python framework: bottle
Virtual Channel: OpenVPN
(Source: http://www.yupoo.com/info/about ).
I. Overall Yupoo architecture
II. Programming language selection
Yupoo's server-side development languages are mainly PHP and Python: PHP is used to write the web logic (dealing directly with users over HTTP), while Python is mainly used to develop internal services and background tasks. On the client side, a large amount of JavaScript based on the MooTools framework is used. In addition, Yupoo separates image processing from the PHP process into a standalone service; this service is built on nginx and, as an nginx module, exposes REST APIs.
III. Server selection
The reason Squid was selected: "No cache system more efficient than Squid has been found yet. The hit rate was originally very poor; then a Lighttpd layer was installed in front of Squid to hash by URL, so the same image always goes to the same Squid instance, and the hit rate increased greatly."
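The URL-hash idea described in the quote can be sketched as follows; the backend list and the choice of MD5 are illustrative assumptions, not Yupoo's actual configuration:

```python
import hashlib

# Hypothetical pool of Squid instances behind the Lighttpd layer.
SQUID_BACKENDS = ["squid-01:3128", "squid-02:3128", "squid-03:3128"]

def pick_backend(url: str) -> str:
    """Map an image URL to a fixed Squid backend via a hash of the URL."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    index = int(digest, 16) % len(SQUID_BACKENDS)
    return SQUID_BACKENDS[index]
```

Because the backend is a pure function of the URL, every request for the same image lands on the same Squid instance, which is what lifts the hit rate.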
Yupoo also developed YPWS/YPFS using Python:
YPWS (Yupoo Web Server) is a small web server developed in Python. It provides basic web services and adds logical checks for user, image, and external-link (hotlinking) display. It can be installed on any server with idle resources, making horizontal scaling easy when performance bottlenecks appear.
YPFS (Yupoo File System) is similar to YPWS; it is an image upload server also built on this web server.
Some netizens questioned Python's efficiency. Yupoo founder Liu Pingyang responded: "YPWS is written in Python. Each machine can process 294 requests per second, and the load is almost always below 10%."
IV. Yupoo message system
Due to PHP's single-threaded model, Yupoo separates time-consuming and I/O-heavy operations from the HTTP request cycle and hands them to task processes implemented in Python, so that request response speed is guaranteed. These tasks mainly include email sending, data indexing, data aggregation, and pushing friends' activity updates. PHP triggers task execution through a message queue (Yupoo uses RabbitMQ). The main features of these tasks are:
Triggered by the user or periodically
Time-consuming
The entire task system consists of message distribution, process management, and working processes.
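The shape of such a task system (message distribution plus worker processes) can be sketched with a stdlib queue standing in for RabbitMQ; the task names and handlers here are hypothetical, not Yupoo's actual code:

```python
import queue
import threading

# The queue stands in for RabbitMQ; "PHP" publishes, a Python worker consumes.
task_queue = queue.Queue()
results = []

# Hypothetical task handlers, dispatched by task name.
HANDLERS = {
    "send_email": lambda payload: results.append(("email", payload)),
    "index_data": lambda payload: results.append(("index", payload)),
}

def worker():
    while True:
        message = task_queue.get()
        if message is None:          # sentinel: shut the worker down
            break
        task, payload = message
        HANDLERS[task](payload)      # dispatch to the registered handler
        task_queue.task_done()

# "PHP side": trigger tasks by publishing messages to the queue.
task_queue.put(("send_email", {"to": "user@example.com"}))
task_queue.put(("index_data", {"photo_id": 10001}))
task_queue.put(None)

t = threading.Thread(target=worker)
t.start()
t.join()
```

In the real system a process manager would supervise many such worker processes, and the queue would be a durable RabbitMQ exchange rather than an in-process object.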
V. Database design
The database has always been the most challenging part of a website's architecture, and the bottleneck usually appears here. Yupoo hosts a huge volume of photos, and its databases have come under severe pressure. Like many Web 2.0 sites, Yupoo's MySQL deployment went through a progression from one master with one slave, to one master with multiple slaves, and then to multiple masters with multiple slaves.
It originally consisted of one master and one slave. At that time, the slave was used only for backup and disaster recovery; when the master failed, the slave was manually promoted to master. Normally the slave performed no reads or writes (apart from replication). As pressure grew, memcached was added to cache single rows of data. However, caching single rows does not relieve the pressure well, because single-row queries are usually very fast anyway. So queries with low real-time requirements were moved to the slave, and the query pressure was spread further by adding more slaves. However, as the data volume grew, the write pressure on the master also grew. After studying related products and other websites' practices, we split the database: data is stored on different database servers.
How to split databases?
Vertical split: splitting by functional module. For example, group-related tables and photo-related tables can be stored in different databases. With this approach, the table structures of the databases differ.
Horizontal split: data from the same table is stored, in parts, in different databases. The table structures in these databases are identical.
Generally, vertical splitting is done first, because it is easy to implement: you can route to different databases by table name. However, vertical splitting does not solve all pressure problems, and it depends on whether the application suits this kind of split; when it does, it spreads the database pressure well. Douban, for example, suits vertical splitting, because its core businesses/modules (books, movies, and music) are relatively independent and their data grows at a fairly steady pace. Yupoo is different: its core business object is user-uploaded photos, and photo data grows ever faster as the number of users grows. The pressure falls almost entirely on the photo table, so vertical splitting clearly would not solve the fundamental problem. Therefore, Yupoo adopted horizontal splitting.
Horizontal splitting is relatively complex to implement. First, a splitting rule must be chosen, that is, the condition by which data is split. Web 2.0 sites are generally user-centered, and most data follows the user: photos, friends, comments, and so on. A natural choice, then, is to split by user: each user is assigned to a database. To access a user's data, you first determine which database the user corresponds to, then connect to it for the actual reads and writes. So how are users mapped to databases? Yupoo considered these options:
1. Matching by algorithm
The simplest algorithm is based on the parity of the user ID: users with odd IDs map to database A, users with even IDs to database B. The biggest problem with this method is that it only allows two databases. Another algorithm maps ID ranges to databases: for example, users with IDs between 0 and 10000 map to database A, users with IDs between 10000 and 20000 to database B, and so on. Algorithmic mapping is convenient and efficient to implement, but it cannot meet later scalability needs: adding a database node means adjusting the algorithm or moving large amounts of data, and it is hard to expand the nodes without stopping the service.
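The two algorithmic mappings can be sketched as follows; the database names and range size are illustrative:

```python
def db_by_parity(user_id: int) -> str:
    # Odd IDs go to database A, even IDs to database B -- only two shards possible.
    return "db_a" if user_id % 2 == 1 else "db_b"

RANGE_SIZE = 10000
RANGE_DBS = ["db_a", "db_b", "db_c"]

def db_by_range(user_id: int) -> str:
    # IDs 0-9999 -> db_a, 10000-19999 -> db_b, and so on. Adding a node later
    # means changing this rule and moving data, which is the drawback.
    return RANGE_DBS[user_id // RANGE_SIZE]
```

Both functions are fast and need no extra storage, but neither can absorb a new database node without rewriting the rule and migrating existing rows.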
2. Index/mapping table
This method creates an index table storing the correspondence between each user ID and a database ID. On every read or write of user data, the corresponding database is looked up in this table. When a new user registers, one of the available databases is chosen at random and an index entry is created for the user. This method is flexible and scalable; its one disadvantage is the extra database access, so its performance is not as good as algorithmic mapping.
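A minimal sketch of the index-table lookup, with in-memory dicts standing in for the persistent mapping table and memcached (names are illustrative):

```python
import random

AVAILABLE_DBS = ["shard_001", "shard_002", "shard_003"]
index_table = {}     # stands in for the persistent index/mapping table
cache = {}           # stands in for memcached

def register_user(user_id: int) -> str:
    # A new user is assigned a random available database.
    index_table[user_id] = random.choice(AVAILABLE_DBS)
    return index_table[user_id]

def db_for_user(user_id: int) -> str:
    # The mapping never changes once assigned, so the cache hit rate is
    # very high and the extra lookup is cheap in practice.
    if user_id not in cache:
        cache[user_id] = index_table[user_id]
    return cache[user_id]
```

Adding a node is just appending to AVAILABLE_DBS; no existing mapping has to change.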
After comparison, Yupoo adopted the index-table method, willing to trade some performance for its flexibility; moreover, with memcached, the index data never changes, so the cache hit rate is very high and the performance loss is greatly reduced.
The index table makes it easy to add database nodes: a new node only needs to be appended to the list of available databases. Of course, if the pressure on the nodes is to be rebalanced, data still has to be migrated, but only a small amount at a time, and it can proceed step by step. To migrate user A's data, first set the user's status to "migrating"; users in this status cannot perform write operations, and the page shows a notice. Then copy all of user A's data to the newly added node, update the mapping table, set user A's status back to normal, and delete the data from the original database. This process is usually run in the early morning, so few users ever encounter it. Of course, some data does not belong to any particular user, such as system messages and configuration; such data is stored in a global database.
How can we solve the problems caused by Database Sharding?
Database Sharding can cause a lot of trouble in application development and deployment.
1. Cross-database Association queries cannot be executed
If the data to be queried is distributed across different databases, it cannot be obtained with a JOIN. For example, to get the latest photos of a user's friends, there is no guarantee that all the friends' data is in the same database. One solution is to query multiple times and aggregate the results, so requirements like this are best avoided where possible. Some requirements can be met by saving multiple copies of the data. For example, suppose User-A's and User-B's databases are DB-1 and DB-2 respectively. When User-A comments on a photo of User-B, the comment is saved in both DB-1 and DB-2: a new record is inserted into the photo_comments table in DB-2, and a new record into the user_comments table in DB-1. The structures of the two tables are shown in the figure below. This way, querying photo_comments yields all comments on a User-B photo, and querying user_comments yields all comments made by User-A. In addition, full-text retrieval tools can cover some needs; Yupoo uses Solr to provide site-wide tag retrieval and photo search services.
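The save-multiple-copies approach can be sketched like this; the dict-of-lists stands in for the two shards, and the field set is abbreviated from the example (a sketch, not Yupoo's actual code):

```python
# Two shards: User-A lives in DB-1, User-B (the photo owner) in DB-2.
shards = {
    "DB-1": {"user_comments": []},
    "DB-2": {"photo_comments": []},
}

def add_comment(author_id, author_db, photo_id, owner_db, text):
    # One copy under the photo (owner's shard) so the photo's comments can
    # be listed, and one copy under the author so the author's comments can
    # be listed -- no cross-database JOIN needed for either query.
    row = {"photo_id": photo_id, "author_id": author_id, "text": text}
    shards[owner_db]["photo_comments"].append(row)
    shards[author_db]["user_comments"].append(dict(row))

add_comment(author_id=1, author_db="DB-1",
            photo_id=10001, owner_db="DB-2", text="nice shot")
```

The cost, as the next point explains, is that the two inserts are no longer covered by a single transaction.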
2. Data consistency/integrity cannot be guaranteed
Cross-database data has no foreign-key constraints and no transaction guarantees. In the comment example above, it may well happen that the insert into photo_comments succeeds while the insert into user_comments fails. One approach is to open transactions on both databases, insert into photo_comments, then into user_comments, and then commit both transactions; but this still cannot fully guarantee the atomicity of the operation.
3. All queries must provide database clues
For example, to view a photo, the photo ID alone is not enough; the ID of the user who uploaded it (that is, the database clue) is also required to locate its actual storage. Therefore many URLs had to be redesigned, while the old addresses had to remain valid. Yupoo changed photo addresses to /photos/{username}/{photo_id}/ and added a mapping table for photos uploaded before the system upgrade, storing the relationship between photo_id and user_id. When an old photo address is accessed, the table is queried to obtain the user information and the request is redirected to the new address.
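The old-address compatibility scheme can be sketched as follows; the sample data and username lookup are hypothetical:

```python
# Stands in for the mapping table of pre-upgrade photos: photo_id -> user_id.
legacy_photo_owner = {10001: 1}

# Stands in for the user table; "alice" is an illustrative username.
usernames = {1: "alice"}

def resolve_old_url(photo_id: int) -> str:
    """Look up the owner of a pre-upgrade photo and build the new address
    to redirect to."""
    user_id = legacy_photo_owner[photo_id]
    return "/photos/%s/%d/" % (usernames[user_id], photo_id)
```

A request to an old URL like /photos/10001/ would call this and issue a redirect to the returned path.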
4. Duplicate auto-increment IDs
If auto-increment fields are used in the node databases, their values cannot be guaranteed globally unique. That is not a serious problem by itself, but when data on different nodes is related, it becomes troublesome. Consider the comment example again. Suppose comment_id in the photo_comments table is an auto-increment field. When a new comment is inserted into photo_comments, a new comment_id is obtained, say 101; the ID of User-A is 1, so we also insert (1, 101, ...) into the DB-1.user_comments table. User-A is a very active user, and he also comments on a photo of User-C, whose database is DB-3. Coincidentally, the ID of that new comment is also 101, which is quite likely to happen. We then insert another row of the form (1, 101, ...) into the DB-1.user_comments table. So how should the primary key of user_comments be set (to identify a row of data)? We could leave it unset; unfortunately, sometimes (because of frameworks, caching, and other reasons) it must be set. We could combine user_id, comment_id, and photo_id as the primary key, but photo_id could also coincide (again, by coincidence). It seems only photo_owner_id is left to add, but that result is hard to accept: a complex composite key has a certain performance impact on writes, and such a "natural" key does not look natural at all. Therefore, Yupoo abandoned auto-increment fields on the nodes and made these IDs globally unique instead, adding a database dedicated to generating IDs. The table structure in this ID library is very simple: a single auto-increment field, ID. To insert a new comment, we first insert an empty record into the photo_comments table of the ID library to obtain a unique comment ID. Of course, this logic is encapsulated in the framework and is transparent to developers. Why not another solution, such as a key-value database that supports incr operations? Yupoo simply feels more assured keeping the data in MySQL. In addition, the ID library is cleared periodically to keep the acquisition of new IDs efficient.
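The ID library can be sketched with SQLite standing in for the MySQL ID database; the table name follows the example above, but the schema is an illustrative assumption:

```python
import sqlite3

# The ID library: one table per data type, each with a single
# auto-increment column. SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photo_comments_ids (id INTEGER PRIMARY KEY AUTOINCREMENT)")

def next_comment_id() -> int:
    """Insert an empty record and return the generated globally unique ID."""
    cur = conn.execute("INSERT INTO photo_comments_ids DEFAULT VALUES")
    conn.commit()
    return cur.lastrowid
```

Every caller, on any shard, draws its comment IDs from this one sequence, so the IDs are unique across all node databases.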
Database optimization implementation
One database node is a Shard, and one Shard consists of two physical servers; call them Node-A and Node-B, configured as master-master with mutual replication. Although deployed master-master, only one of the two is used at any given moment, because of replication delay. In the web application, of course, A or B can be pinned in the user's session so that the same user always accesses the same database, which avoids latency issues; but the Python tasks carry no such state and cannot be guaranteed to read and write the same database as the PHP application.

So why not configure master-slave instead? Yupoo considered using only one logical database per server a waste, so multiple logical databases are created on each server. As shown in the figure below, two logical databases, shard_001 and shard_002, are created on both Node-A and Node-B; shard_001 on Node-A and shard_001 on Node-B form one Shard, and only one of the pair is Active at a time. To access the data of Shard-001 we connect to shard_001 on Node-A, while to access Shard-002 we connect to shard_002 on Node-B. In this way the load is spread over both physical servers.

Another advantage of the master-master deployment is that the table structure can be upgraded without stopping the service. Before the upgrade, stop replication and upgrade the Inactive databases; then upgrade the application and switch the upgraded databases to Active (the originally Active ones become Inactive); then upgrade their table structures as well, and finally resume replication. Of course, this procedure does not suit every upgrade: if a table-structure change would break replication, the service still has to be stopped for the upgrade.
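The routing from shard to physical server can be sketched like this, following the Active/Inactive layout just described (a sketch, not the framework's actual code):

```python
# Each logical database exists on both nodes of its Shard, but only one
# copy is active at a time; connections always go to the active copy.
SHARDS = {
    "shard_001": {"Node-A": "active", "Node-B": "inactive"},
    "shard_002": {"Node-A": "inactive", "Node-B": "active"},
}

def connection_target(shard: str) -> str:
    """Return the node currently holding the active copy of a shard."""
    for node, state in SHARDS[shard].items():
        if state == "active":
            return node
    raise RuntimeError("no active node for " + shard)
```

Flipping the "active"/"inactive" states in this table is exactly the switch step of the no-downtime upgrade procedure described above.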
When adding servers, some data has to be migrated to the new machines to keep the load balanced. To avoid migrations in the short term, eight logical databases are deployed on each machine from the start; after a server is added, whole logical databases are simply moved to it. It is best to double the number of servers each time and move half of each server's logical databases to the new machines, which balances the load nicely. Of course, once only one logical database remains on each server, migration can no longer be avoided this way, but that day should be a long way off.
Yupoo encapsulates the database sharding logic in its PHP framework, so developers are largely spared these cumbersome details. Here are some examples of reading and writing photo data with the framework:
<?php
$Photos = new ShardedDBTable('Photos', 'yp_photos', 'user_id', array(
    'photo_id'    => array('type' => 'long', 'primary' => true, 'global_auto_increment' => true),
    'user_id'     => array('type' => 'long'),
    'title'       => array('type' => 'string'),
    'posted_date' => array('type' => 'date'),
));

$photo = $Photos->new_object(array('user_id' => 1, 'title' => 'workform'));
$photo->insert();

// Load the photo with ID 10001. Note that the first parameter is the user ID.
$photo = $Photos->load(1, 10001);

// Modify photo attributes
$photo->title = 'database sharding';
$photo->update();

// Delete the photo
$photo->delete();

// Obtain the photos uploaded by the user with ID 1 after 2017-06-01
$photos = $Photos->fetch(array('user_id' => 1, 'posted_date_gt' => '2017-06-01'));
?>
First, a ShardedDBTable object is defined; all APIs are exposed through this object. The first parameter is the object type name; if the name already exists, the previously defined object is returned. You can also obtain a previously defined table object with get_table('Photos'). The second parameter is the corresponding database table name, and the third is the database clue field; you will find that every subsequent API call requires a value for this field. The fourth parameter is the field definitions. The global_auto_increment attribute of the photo_id field is set to true; this is the global auto-increment ID discussed above, and when this attribute is specified the framework handles the ID automatically.
To access data in the global database, we need to define a DBTable object.
<?php
$Users = new DBTable('Users', 'yp_users', array(
    'user_id'  => array('type' => 'long', 'primary' => true, 'auto_increment' => true),
    'username' => array('type' => 'string'),
));
?>
DBTable is the parent class of ShardedDBTable. Apart from taking different constructor parameters (DBTable needs no database clue field), it provides the same APIs.
VI. Cache solution selection
The framework used by Yupoo comes with the cache function, which is transparent to developers.
<?php
$photo = $Photos->load(1, 10001);
?>
In the method call above, the framework first tries the cache with Photos-1-10001 as the key; if the object is not found there, it runs the database query and puts the result into the cache. When a photo's attributes are changed or the photo is deleted, the framework removes it from the cache. Caching a single object is relatively simple; a little more troublesome is caching the results of list queries like the one below.
<?php
$photos = $Photos->fetch(array('user_id' => 1, 'posted_date_gt' => '2017-06-01'));
?>
Yupoo splits the query into two steps: first find the photo IDs that meet the condition, then fetch the photo details by ID. This makes much better use of the cache. The cache key for the first step is Photos-list-{shard_key}-{md5(query condition SQL)}, and the value is the list of photo IDs (comma-separated); shard_key here is the user_id value, 1. So far the list cache looks fine, but if you modify, say, a photo's upload time, the cached list may no longer match the condition, so a mechanism is needed to guarantee that no stale list data is ever read from the cache. Yupoo keeps a revision for each table: whenever the table's data changes (insert/update/delete is called), the revision is updated, and the list cache key becomes Photos-list-{shard_key}-{md5(query condition SQL)}-{revision}. With the revision in the key, stale lists are simply never requested again.
The revision itself is stored in the cache under the key Photos-revision. This looks good, but the utilization of the list cache is not high: because the revision covers the whole data type, it is updated very frequently; any user modifying or uploading a photo updates it, even a user in a different shard from the one being queried. To isolate the effect of one user's actions on others, the scope of the revision can be narrowed, so its cache key is changed to Photos-{shard_key}-revision. Now, when the user with ID 1 modifies his photos, only the revision under the key Photos-1-revision is updated.
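The per-shard revision scheme can be sketched as follows, with a dict standing in for memcached; the key formats follow the text:

```python
import hashlib

cache = {}   # stands in for memcached

def revision_key(shard_key):
    # Per-shard revision key: Photos-{shard_key}-revision
    return "Photos-%s-revision" % shard_key

def bump_revision(shard_key):
    # Called on every insert/update/delete touching this shard's photos.
    cache[revision_key(shard_key)] = cache.get(revision_key(shard_key), 0) + 1

def list_cache_key(shard_key, condition_sql):
    # Photos-list-{shard_key}-{md5(condition)}-{revision}: once the revision
    # moves, old list keys are simply never requested again.
    rev = cache.get(revision_key(shard_key), 0)
    return "Photos-list-%s-%s-%d" % (
        shard_key, hashlib.md5(condition_sql.encode()).hexdigest(), rev)
```

Bumping shard 1's revision changes only shard 1's list keys; cached lists for every other shard stay valid.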
Because the global database has no shard_key, modifying one row of a global table still invalidates the cached lists of the whole table. In most cases, though, data is regional. Take forum posts, which are grouped by topic: modifying a post in one topic should not invalidate the cached post lists of every other topic. Therefore an attribute named isolate_key was added to DBTable.
<?php
$Posts = new DBTable('Posts', 'yp_posts', array(
    'topic_id'    => array('type' => 'long', 'primary' => true),
    'post_id'     => array('type' => 'long', 'primary' => true, 'auto_increment' => true),
    'author_id'   => array('type' => 'long'),
    'content'     => array('type' => 'string'),
    'posted_at'   => array('type' => 'datetime'),
    'modified_at' => array('type' => 'datetime'),
    'modified_by' => array('type' => 'long'),
), 'topic_id');
?>
Note that the last constructor parameter, topic_id, designates the topic_id field as the isolate_key; it narrows the scope of the revision, just as shard_key does.
ShardedDBTable inherits from DBTable, so an isolate_key can be specified on it as well, narrowing the revision's scope even further. For example, in yp_album_photos, the association table between albums and photos, when a user adds a new photo to one of his albums, the cached photo lists of his other albums would also be invalidated; with the table's isolate_key set to album_id, the impact is confined to that one album.
The cache has two levels: the first is just a PHP array whose effective scope is a single request; the second is memcached. The reason is that much data is loaded several times within one request cycle, and the first level saves those memcached network requests. In addition, Yupoo's framework batches memcached get requests for multiple keys where possible, further reducing network round trips.