How FriendFeed Improves MySQL Performance with Schema-less Storage


Background

We use MySQL to store all of the data for FriendFeed. The database has grown a great deal along with our user base: we now store over 250 million records, plus a pile of other data ranging from comments and "likes" to friend lists.

As the data has grown, we have also iteratively dealt with the scalability problems that such rapid growth brings. Our approaches were fairly standard, for example using read-only MySQL slaves and memcache to increase read throughput, and sharding the database to improve write throughput. However, as the business grew, adding new features became harder than scaling the existing ones for more traffic.


In particular, making schema changes or adding indexes to a table with more than tens of millions of rows locks the database for hours at a time. Removing old indexes takes just as long, yet not removing them hurts performance, because the database keeps reading and writing those unused blocks on every INSERT and pushes important blocks out of memory. Avoiding these problems requires complex workarounds, such as building the new index on a slave and then swapping the slave with the master, but these are error-prone and hard to carry out, and they block new features that require schema or index changes. Because our database is heavily sharded, MySQL's relational features (such as JOIN) are of little use to us, so we decided to look beyond the RDBMS.


There are many projects designed to solve the problem of storing data with a flexible schema and building indexes on the fly (for example, CouchDB), but none of them is used widely enough at large sites to convince us to rely on them. In the tests we ran and read about, these projects were either unstable or insufficiently battle-tested (see this somewhat outdated article about CouchDB). MySQL works: it does not corrupt data, replication works, and we already understand its limitations. We like MySQL for storage, just not for relational-style storage.

After thinking it over, we decided to build a schema-less storage system on top of MySQL rather than adopt a completely new storage system. This article attempts to describe the high-level details of that system. We are curious how other large sites handle these problems, and we hope some of the design work we have done proves useful to other developers.

Overview

What we store in the database is a schema-less bag of properties (such as a JSON object or a Python dictionary). The only required property is id, a 16-byte UUID. Everything else about the entity is opaque to the database, and we can change the "schema" simply by storing new properties (you can think of the table as having just two columns, id and data, where data holds an entity's property bag).

We retrieve data by storing indexes in separate tables. If we want to index three properties of each entity, we need three index tables, one per property. If we no longer want an index, we stop writing to its table in code and, optionally, drop the table. If we want a new index, we simply create a new MySQL table for it and start a process that asynchronously populates the table with index data (without affecting the running service).

As a result, we end up with more tables, but adding and removing indexes has become simple. We have worked hard on the process that populates new indexes (which we call "The Cleaner") so that it fills indexes quickly without disrupting the site. We can store new properties and index them within a day, and we do not need to swap MySQL masters and slaves or perform any other scary operations to do it.

Details

Our entities are stored in a MySQL table that looks like this:

CREATE TABLE entities (
  added_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  id BINARY(16) NOT NULL,
  updated TIMESTAMP NOT NULL,
  body MEDIUMBLOB,
  UNIQUE KEY (id),
  KEY (updated)
) ENGINE=InnoDB;

The added_id column exists because InnoDB stores data physically in primary-key order; an auto-increment primary key ensures new entities are written to disk sequentially, after older entities, which helps read and write locality (new entities tend to be read far more often than old ones, because FriendFeed pages are ordered in reverse chronological order). The entity body itself is a serialized Python dictionary, stored zlib-compressed.
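
The post does not show the serialization code itself; a minimal sketch of what packing and unpacking an entity body might look like, assuming Python's pickle module is the serializer, is:

import pickle
import uuid
import zlib

def serialize_body(entity):
    # Pickle the property dictionary, then zlib-compress it before writing
    # it into the MEDIUMBLOB `body` column of the entities table.
    return zlib.compress(pickle.dumps(entity))

def deserialize_body(blob):
    # Reverse the process when reading an entity back out of MySQL.
    return pickle.loads(zlib.decompress(blob))

entity = {"id": uuid.uuid4().bytes, "title": u"We just launched a new backend system!"}
assert deserialize_body(serialize_body(entity)) == entity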

Indexes live in separate tables. To create a new index, we create a table that stores, across all of our database shards, the attribute we want to index. For example, a typical FriendFeed entity looks like this:

{
  "id": "71f0c4d2291844cca2df6f486e96e37c",
  "user_id": "f48b0440ca0c4f66991c4d5f6a078eaf",
  "feed_id": "f48b0440ca0c4f66991c4d5f6a078eaf",
  "title": "We just launched a new backend system for friendfeed!",
  "link": "http://friendfeed.com/e/71f0c4d2-2918-44cc-a2df-6f486e96e37c",
  "published": 1235697046,
  "updated": 1235697046
}

We index the user_id property of the entity so we can render a page of all the entities a given user has posted. Our index table looks like this:

CREATE TABLE index_user_id (
  user_id BINARY(16) NOT NULL,
  entity_id BINARY(16) NOT NULL UNIQUE,
  PRIMARY KEY (user_id, entity_id)
) ENGINE=InnoDB;


Our datastore automatically maintains the index on your behalf, so to open a datastore instance that stores entities of the structure above with this index, you would write (in Python):
 

user_id_index = friendfeed.datastore.Index(
    table="index_user_id", properties=["user_id"], shard_on="user_id")
datastore = friendfeed.datastore.DataStore(
    mysql_shards=["127.0.0.1:3306", "127.0.0.1:3307"],
    indexes=[user_id_index])

new_entity = {
    "id": binascii.a2b_hex("71f0c4d2291844cca2df6f486e96e37c"),
    "user_id": binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf"),
    "feed_id": binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf"),
    "title": u"We just launched a new backend system for friendfeed!",
    "link": u"http://friendfeed.com/e/71f0c4d2-2918-44cc-a2df-6f486e96e37c",
    "published": 1235697046,
    "updated": 1235697046,
}
datastore.put(new_entity)
entity = datastore.get(binascii.a2b_hex("71f0c4d2291844cca2df6f486e96e37c"))
entity = user_id_index.get_all(datastore,
    user_id=binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf"))

The Index class above looks for the user_id property in every entity and automatically maintains the index in the index_user_id table. Since our database is sharded, the shard_on argument determines which shard each index entry is stored on (in this case, entity["user_id"] % num_shards).
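
The post does not define exactly how the shard_on value maps to a shard number. One plausible reading of entity["user_id"] % num_shards, treating the 16-byte value as a big integer, is sketched below (shard_for_value is an illustrative name, not FriendFeed's actual API):

import binascii

def shard_for_value(value, num_shards):
    # Hypothetical helper: interpret the 16-byte shard_on value (e.g. user_id)
    # as an integer and take it modulo the number of MySQL shards to decide
    # which shard holds the index row.
    return int(binascii.hexlify(value), 16) % num_shards

user_id = binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf")
print(shard_for_value(user_id, 2))  # 0 or 1: the shard for this index entry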

You can query an index using the index instance (see user_id_index.get_all above). The datastore code, written in Python, performs the "join" between the index_user_id tables and the entities table: it first queries the index_user_id tables on all of the database shards to get a list of entity IDs, then fetches those entities from the entities table.

To add a new index, for example on the link property, we create a new table:

CREATE TABLE index_link (
  link VARCHAR(735) NOT NULL,
  entity_id BINARY(16) NOT NULL UNIQUE,
  PRIMARY KEY (link, entity_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

and modify the datastore initialization code to include our new index:

user_id_index = friendfeed.datastore.Index(
    table="index_user_id", properties=["user_id"], shard_on="user_id")
link_index = friendfeed.datastore.Index(
    table="index_link", properties=["link"], shard_on="link")
datastore = friendfeed.datastore.DataStore(
    mysql_shards=["127.0.0.1:3306", "127.0.0.1:3307"],
    indexes=[user_id_index, link_index])

And we can populate the index asynchronously (even while serving live traffic):

./rundatastorecleaner.py --index=index_link

Consistency and atomicity

Because the database is sharded, an entity's indexes may be stored on different shards than the entity itself, which creates a consistency problem: what happens if a process crashes before it has finished writing all of the index tables?

Building a transactional protocol appealed to the most ambitious FriendFeed engineers, but we wanted to keep the system as simple as possible. We decided to relax the constraints:

    • The property bag stored in the main entities table is canonical
    • Indexes may not accurately reflect the actual entity values

Consequently, when we write a new entity to the database, we take the following steps (a minimal sketch follows the list):

    • Write the entity to the entities table, using InnoDB's ACID properties.
    • Write the index rows to the index tables on all of the relevant shards.
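
A rough sketch of that write path, using hypothetical shard_for_value and execute helpers and a DB-API-style cursor rather than FriendFeed's actual internals, might look like:

import pickle
import time
import zlib

def put(datastore, entity, indexes):
    # Step 1: write the canonical entity row, relying on InnoDB's ACID
    # guarantees on the shard that owns this entity.
    entity_shard = datastore.shard_for_value(entity["id"])
    entity_shard.execute(
        "REPLACE INTO entities (id, updated, body) VALUES (%s, %s, %s)",
        (entity["id"], int(time.time()), zlib.compress(pickle.dumps(entity))))

    # Step 2: write one row per index; each row lives on the shard chosen by
    # that index's shard_on property. A crash between step 1 and step 2 leaves
    # a missing index row, which the cleaner later repairs.
    for index in indexes:
        sql = "REPLACE INTO %s (%s, entity_id) VALUES (%%s, %%s)" % (
            index.table, index.shard_on)
        index_shard = datastore.shard_for_value(entity[index.shard_on])
        index_shard.execute(sql, (entity[index.shard_on], entity["id"]))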


We have to remember that data read from the index tables may be inaccurate (for example, an index row may reflect old property values if a write did not complete step 2). To make sure we return correct entities under the constraints above, we use the index tables to decide which entities to read, but we do not trust the index's integrity; instead we re-apply the query criteria to the entities themselves (a sketch follows these steps):

1. Look up entity_id values in the index table according to the query conditions

2. Read those entities from the entities table by entity_id

3. Filter out (in Python) any entities whose actual property values do not satisfy the query conditions
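
Pulled together, a hedged sketch of the three steps above, again with illustrative helpers (datastore.shards, shard.query, datastore.get) rather than FriendFeed's real code:

def get_all(datastore, index, **query):
    # Step 1: collect candidate entity ids from the index tables. The text
    # above describes querying the index tables on all shards (shard_on could
    # be used to narrow this to a single shard).
    prop = index.properties[0]
    sql = "SELECT entity_id FROM %s WHERE %s = %%s" % (index.table, prop)
    entity_ids = []
    for shard in datastore.shards:
        entity_ids.extend(row[0] for row in shard.query(sql, (query[prop],)))

    # Step 2: read the canonical entities for those ids from the entities table.
    entities = [datastore.get(entity_id) for entity_id in entity_ids]

    # Step 3: the index rows may be stale, so re-check the query against the
    # real property values before returning anything.
    return [e for e in entities
            if e is not None and all(e.get(k) == v for k, v in query.items())]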

To keep indexes durable and consistent, the "Cleaner" process mentioned above runs continuously over the entities table, writing missing index rows and removing stale ones. It works on recently updated entities first, so in practice index inconsistencies are repaired very quickly (within a few seconds).
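
The cleaner itself is not shown in the post; a rough sketch of the repair loop it describes, with get_recently_updated and rewrite_index_row as hypothetical helpers, might be:

def clean_index(datastore, index, batch_size=1000):
    # Hypothetical repair loop: walk entities newest-first by `updated`, since
    # recently written entities are the most likely to have missing or stale
    # index rows, and rewrite each entity's index row from its canonical body.
    last_updated = None
    while True:
        batch = datastore.get_recently_updated(before=last_updated,
                                                limit=batch_size)
        if not batch:
            break
        for entity in batch:
            datastore.rewrite_index_row(index, entity)
        last_updated = batch[-1]["updated"]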

Performance

We have optimized the primary indexes of the new system and are happy with the results. Below is a chart of FriendFeed page-load latency over the past month (we launched the new backend a couple of days ago; you can spot the day by the marked drop in latency).

In particular, the system's latency is now stable, even during the mid-day peak. Below is FriendFeed page-load latency over the past 24 hours:

Compared to a day last week:

The system has been very easy to work with so far. We have changed the indexes a couple of times since deploying it, and we have begun converting our largest MySQL tables to this new scheme so we can change their structure more easily in the future.
