How Facebook implements automated backups of petabyte-scale databases

Source: Internet
Author: User
Tags: mysql backup

Facebook's MySQL deployment is one of the largest in the world, with thousands of database servers across multiple regions, so backup is a huge challenge. To meet it, the team built a highly automated and efficient backup system that moves multiple petabytes of data every week. Eric Barrett of the Facebook data team described their practices in an article.

Rather than relying on extensive up-front testing, they emphasize fast failure detection and fast, automated correction. Deploying hundreds of database servers requires little human intervention. With the following three measures, they have sustained steady growth and have the flexibility to support over one billion users.

Measure 1: binary log and mysqldump

The first line of defense is called "Measure 1", or "rack" backup (RBU for short). Every database rack, regardless of type, has two RBU storage servers. Placing the RBUs in the same rack as the database servers ensures maximum bandwidth and minimum latency. The RBUs also serve as a buffer for the next backup measure.

Collecting binary logs is one of the jobs of these servers. Binary logs are streamed continuously to the RBU host by a simulated slave process. In this way, the RBU receives the same updates as a replica without running mysqld.

Keeping up-to-date binary logs on the RBU is important: if a master database server goes offline, users on that server cannot update their status or upload photos. When a failure occurs, the repair time must be as short as possible. With the binary logs available, another database can be promoted to primary within seconds. Because the RBU holds binary logs that are current to the second, this works even if the old primary is completely unavailable. The binary logs also allow point-in-time recovery, by applying the recorded transactions to a previous backup.
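The idea of applying recorded transactions to a previous backup can be illustrated with a toy model. This is only a sketch under stated assumptions: real binary logs hold SQL statements or row events rather than key/value pairs, and `BinlogEvent` and `point_in_time_restore` are hypothetical names invented for illustration, not part of MySQL or of Facebook's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinlogEvent:
    """Hypothetical, simplified stand-in for one logged change."""
    timestamp: int  # when the change was committed (seconds)
    key: str
    value: str


def point_in_time_restore(backup, backup_time, events, target_time):
    """Replay events logged after the backup, up to and including target_time."""
    restored = dict(backup)  # never mutate the backup itself
    for event in sorted(events, key=lambda e: e.timestamp):
        if backup_time < event.timestamp <= target_time:
            restored[event.key] = event.value
    return restored


backup = {"status:alice": "hello"}  # snapshot taken at time 100
events = [
    BinlogEvent(90, "status:alice", "old"),        # already in the backup
    BinlogEvent(110, "status:alice", "at lunch"),
    BinlogEvent(120, "photo:bob", "cat.jpg"),
    BinlogEvent(130, "status:alice", "oops"),      # after the target time
]
print(point_in_time_restore(backup, 100, events, 120))
# {'status:alice': 'at lunch', 'photo:bob': 'cat.jpg'}
```

Events older than the backup are skipped because the backup already contains them, and events after the target time are skipped so that the unwanted change is never replayed.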

The second task of the RBU servers is to take traditional backups. There are two ways to back up MySQL: binary and logical (mysqldump). Facebook uses logical backups because they are version-independent, provide better data integrity, are more compact, and are easier to restore. However, binary copies are still used when building a new replica of a database.

One of the main advantages of mysqldump is that data corruption on disk does not propagate to the backup. If a disk sector goes bad or a write error occurs, the InnoDB page checksum will be wrong. When assembling the backup stream, MySQL either reads the correct content from memory, or reads the data from disk, encounters the bad checksum, and halts the backup (and the database process). One problem with mysqldump is that it pollutes the LRU cache used for InnoDB blocks. In newer versions of MySQL, however, LRU insertions caused by scans are moved to the end of the cache.

Each RBU takes a nightly backup of all the databases it is responsible for. Despite the daily volume of data, Facebook's team can back up everything within a few hours.

If an RBU fails, the automation software reassigns its responsibilities to other systems in the same cluster. When it comes back online, those responsibilities are automatically returned to the original RBU host.

The Facebook team does not worry much about data retention on any single system, because they have two more measures.

Measure 2: Hadoop DFS

As soon as each backup and binary log is collected, it is copied to one of their large, customized Hadoop clusters. These clusters hold very stable, replicated datasets with fixed retention times. Because disk sizes grow quickly, older RBUs may not be large enough to hold more than a day or two of backups. The Hadoop clusters, however, can be expanded as needed without regard to the underlying hardware. Hadoop's distributed nature also provides enough bandwidth for fast data recovery.

Soon, they will also run non-real-time data analysis on these Hadoop clusters. This reduces the number of non-critical reads against the databases and speeds up the response of Facebook's websites.

Measure 3: long-term storage

Every week, they move backups from Hadoop to distributed storage in another region. These are modern, secure storage systems that sit outside the flow of their routine data management tools.

Monitoring

In addition to common system monitoring, they also capture many backup-specific statistics, such as binlog collection latency and system capacity.

Scoring backup failures is their most valuable tool. Given the size of Facebook's database fleet and the number of maintenance tasks running at any time, it is not surprising that some backups are missed. What deserves attention is a wide range of backups failing at once, or a single backup failing for several days. The score for a missed backup therefore increases exponentially over time. Aggregating these scores gives the team a quick and effective view of overall backup health.

For example, one database that misses its backup for one day scores 1 point, so 50 databases that each miss one day score 50 points. However, one database that misses its backups for three days scores 27 points (3 to the third power), and 50 databases that each miss three days is a serious problem: the score is 1350 (50 multiplied by 3 to the third power). This produces a huge spike on their monitoring charts, and the team takes action immediately.
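The arithmetic in this example can be reproduced with a short sketch. It assumes that each database contributes the cube of the number of consecutive days it has gone without a backup, which is the only formula consistent with the figures quoted (1, 27, and 1350); the article does not state the exact formula, and the function name `backup_health_score` is hypothetical.

```python
def backup_health_score(days_missed_per_database):
    """Sum days_missed ** 3 over every database that has missed backups.

    Assumption: the cube is inferred from the article's example figures.
    """
    return sum(days ** 3 for days in days_missed_per_database)


print(backup_health_score([1] * 50))  # 50 databases, one day each  -> 50
print(backup_health_score([3]))       # one database, three days    -> 27
print(backup_health_score([3] * 50))  # 50 databases, three days    -> 1350
```

Because the penalty grows much faster than the number of days, a few long-running failures outweigh many one-off misses in the aggregate score, which is what makes persistent problems stand out on a chart.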

Restore

"If you haven't tested your backup, it means there is no backup ."

Therefore, the Facebook team built a testing system that continuously restores data from Measure 2 to test servers. After each restore completes, they run multiple data integrity checks. If a problem recurs, the system raises an alarm so that the relevant people can review it. The system helps catch everything from MySQL bugs to gaps in the backup process, and makes the team more flexible in coping with changes to the backup environment.
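The "alarm on repeated problems" behavior might be sketched as follows. This is an illustration only: the article does not say how repetition is measured, so the sketch assumes that an alarm fires once a database fails its restore checks a given number of times in a row. The class `RestoreMonitor` and its parameters are hypothetical.

```python
from collections import defaultdict


class RestoreMonitor:
    """Track consecutive failed restore tests per database (hypothetical)."""

    def __init__(self, alert_threshold=2):
        self.alert_threshold = alert_threshold
        self.consecutive_failures = defaultdict(int)

    def record(self, database, checks_passed):
        """Record one restore test; return True if an alarm should fire."""
        if checks_passed:
            # A successful restore clears the failure streak.
            self.consecutive_failures[database] = 0
            return False
        self.consecutive_failures[database] += 1
        return self.consecutive_failures[database] >= self.alert_threshold


monitor = RestoreMonitor(alert_threshold=2)
print(monitor.record("db1", checks_passed=False))  # False: first failure
print(monitor.record("db1", checks_passed=False))  # True: problem repeated
```

Alerting only on a streak keeps one-off failures, such as a test server being rebooted mid-restore, from paging anyone.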

They also built a system named ORC (a recursive acronym for ORC Recovery Coordinator). Engineers who need to restore a previous version of a database used by their tools can do so through ORC in self-service mode, which is very convenient for rapid development.

At the end, Eric Barrett said:

Backup is not the most glamorous engineering work. It is technical and repetitive, and when everything works, no one notices. It is also interdisciplinary and team-oriented, requiring expertise in systems, networks, and software. However, ensuring that your memories and connections are safe is extremely important and, in the end, rewarding.

Some readers asked:

How are binary logs transmitted to an RBU that does not run mysqld? What is a simulated slave process?

Facebook's MySQL performance engineer Harrison Fisk gave the answer:

We use the -never- option of mysqlbinlog, together with a small wrapper program written in Python that monitors mysqlbinlog and ensures it keeps running successfully.
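A wrapper of this kind could look roughly like the sketch below. It is not Facebook's code. It assumes the stock mysqlbinlog client from MySQL 5.6 or later, whose `--read-from-remote-server`, `--raw`, and `--stop-never` options stream binary log files from a live server; the option named in the answer above may differ from these. The host, user, and file names are placeholders, credentials are assumed to come from an option file, and `build_mysqlbinlog_command` and `supervise` are hypothetical helpers.

```python
import subprocess
import time


def build_mysqlbinlog_command(host, user, first_binlog, output_prefix):
    """Build a command that streams raw binlog files from a remote server."""
    return [
        "mysqlbinlog",
        "--read-from-remote-server",
        f"--host={host}",
        f"--user={user}",
        "--raw",                           # write binlog files unchanged
        "--stop-never",                    # keep waiting for new events
        f"--result-file={output_prefix}",  # with --raw, a file name prefix
        first_binlog,
    ]


def supervise(command, run=subprocess.call, sleep=time.sleep,
              max_runs=None, backoff_seconds=5):
    """Run the command, restarting it whenever it exits.

    With --stop-never the process should not exit on its own, so every
    exit is treated as a failure. Returns the exit codes observed.
    max_runs=None means restart forever.
    """
    exit_codes = []
    while max_runs is None or len(exit_codes) < max_runs:
        exit_codes.append(run(command))
        print(f"mysqlbinlog exited with code {exit_codes[-1]}; restarting")
        sleep(backoff_seconds)
    return exit_codes


command = build_mysqlbinlog_command(
    "db1.example.com", "repl", "binlog.000001", "/backups/db1-")
print(" ".join(command))

# Dry run with a fake runner, so the example works without MySQL installed.
supervise(command, run=lambda cmd: 1, sleep=lambda s: None, max_runs=2)
```

A production wrapper would also need to resume from the newest binlog file already on disk instead of always restarting at `first_binlog`, and would report failures to monitoring instead of printing them.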
