Facebook, the world-renowned social networking site, has more than 300 million active users, about 30 million of whom update their status at least once a day. Each month, users upload a total of more than 1 billion photos and 10 million videos, and each week they share 1 billion pieces of content, including journal posts, links, news stories, and microblog posts. The amount of data Facebook must store and process is therefore huge: every day it adds 4 TB of new compressed data, scans 135 TB of data, and runs more than 7,500 Hive jobs on the cluster, and it performs 80,000 computations per hour. A high-performance cloud platform is thus essential to Facebook, which uses the Hadoop platform mainly for log processing, recommendation systems, and its data warehouse.
Facebook stores its data in a data warehouse built on Hadoop/Hive, with 4,800 cores, 5.5 PB of storage, 12 TB of storage per node, and a two-tier network topology, as shown in Figure 3-5. Facebook's MapReduce clusters are dynamic: work moves between cluster nodes according to load conditions and configuration information.
Figure 3-5 Cluster network topology
Figure 3-6 shows the architecture of Facebook's data warehouse, in which web servers and internal services generate log data. Facebook uses an open-source logging system that stores hundreds of log data sets on NFS servers; however, most of the log data is copied into a single central HDFS instance, and the data stored in HDFS is loaded into a data warehouse built with Hive. Hive provides a SQL-like language that is integrated with MapReduce; it is used to create and publish many summaries and reports and to run historical analyses over them, and its browser-based interface lets users submit Hive queries. The summaries are published to Oracle and MySQL databases, because they are relatively small but are queried frequently and require real-time responses. Older data must be archived promptly to less expensive storage, as shown in Figure 3-7.
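The paragraph above says that older data has to be moved to cheaper storage, and choosing which data to archive reappears in the list of challenges at the end of this section. One simple policy is age-based selection. The sketch below assumes Hive tables partitioned by a date column named `ds` (a hypothetical naming choice for this example) and a retention window in days; `select_partitions_to_archive` is illustrative only and is not part of Hive.

```python
from datetime import date, timedelta


def select_partitions_to_archive(partitions, today, retention_days):
    """Return date partitions that fall before the retention window.

    `partitions` holds hypothetical Hive-style partition names such as
    "ds=2009-10-01"; anything not keyed by "ds" is left alone.
    """
    cutoff = today - timedelta(days=retention_days)
    selected = []
    for name in partitions:
        key, _, value = name.partition("=")
        if key != "ds":
            continue  # not a date partition under this assumed naming scheme
        if date.fromisoformat(value) < cutoff:
            selected.append(name)  # older than the window: candidate for cheap storage
    return sorted(selected)
```

For example, with `today=date(2009, 12, 1)` and a 60-day window the cutoff is 2009-10-02, so `"ds=2009-09-01"` would be selected while `"ds=2009-10-15"` stays in the warehouse. A real policy might also weigh access frequency, not just age.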
Figure 3-6 Facebook data warehouse architecture
The following describes some of Facebook's work on AvatarNode and on scheduling policies. AvatarNode is used mainly for recovering and restarting HDFS. With the original approach, if the NameNode crashed, recovery first took 10 to 15 minutes to read and write the 12 GB file system image, then 20 to 30 minutes to process block reports from 2,000 DataNodes, and finally 40 to 60 minutes to restore the crashed NameNode and deploy the software. Table 3-1 shows the differences between BackupNode and AvatarNode. AvatarNode starts as a normal NameNode and handles all messages from DataNodes. AvatarDataNode, like a DataNode, supports multiple threads and keeps a separate queue for each of several primary nodes, but it cannot distinguish between the primary and the standby. Failover is performed manually with the AvatarShell command-line tool: AvatarShell carries out the switchover and updates the relevant ZooKeeper zNodes, so the process is transparent to users. The distributed Avatar file system is implemented on top of the existing file system.
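The paragraph above says AvatarShell performs the switchover and updates a ZooKeeper zNode, which is what makes failover transparent to users: clients look up the active node instead of hard-coding its address. The sketch below models only that idea, assuming an in-memory dictionary as a stand-in for ZooKeeper. `Registry`, `AvatarPair`, `resolve_active`, the zNode path, and the host names are all hypothetical; they are not the AvatarNode, AvatarShell, or ZooKeeper APIs.

```python
class Registry:
    """In-memory stand-in for the ZooKeeper zNodes (hypothetical)."""

    def __init__(self):
        self._nodes = {}

    def set(self, path, value):
        self._nodes[path] = value

    def get(self, path):
        return self._nodes[path]


class AvatarPair:
    """Hypothetical primary/standby pair; the standby is assumed to be kept warm."""

    def __init__(self, registry, path, primary, standby):
        self.registry = registry
        self.path = path
        self.primary = primary
        self.standby = standby
        registry.set(path, primary)  # publish the initial primary's address

    def failover(self):
        """Manual switchover, loosely modeled on what the text says AvatarShell does."""
        self.primary, self.standby = self.standby, self.primary
        self.registry.set(self.path, self.primary)  # clients now resolve the new primary
        return self.primary


def resolve_active(registry, path):
    """Client side: find the current primary through the registry."""
    return registry.get(path)
```

Because clients only ever call `resolve_active`, swapping the primary and rewriting one registry entry is enough to redirect them, which is the property the text describes as transparent recovery.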
Figure 3-7 Data Archive
Table 3-1 Differences between BackupNode and AvatarNode
Slot-based scheduling strategies run into several problems in practice: tasks that need a lot of memory may be assigned to a TaskTracker with little memory, CPU resources are sometimes underused, and configuring TaskTrackers for different hardware can be difficult. Facebook therefore uses a resource-aware scheduling strategy based on fair scheduling, which monitors the system in real time and collects CPU and memory usage. The scheduler analyzes real-time memory consumption and divides memory fairly among tasks. To gather this information, it walks the process tree by reading the /proc directory, collects the CPU and memory usage of every process in the tree, and then sends it on the heartbeat via TaskCounters.
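The paragraph above describes collecting per-task CPU and memory usage by walking the process tree under /proc. The Python sketch below shows one way to do that aggregation on Linux, assuming the /proc/<pid>/stat field layout documented in proc(5) (ppid is field 4, utime and stime are fields 14 and 15 in clock ticks, rss is field 24 in pages) and a 4 KiB page size. The function names are hypothetical; this is not Hadoop's own code, and it stops at computing totals rather than sending a heartbeat.

```python
import os

# Assumption: 4 KiB pages. On Linux, os.sysconf("SC_PAGE_SIZE") gives the real value.
PAGE_SIZE = 4096


def parse_stat(line):
    """Parse a /proc/<pid>/stat line into (pid, ppid, cpu_ticks, rss_pages).

    The command name (field 2) is wrapped in parentheses and may contain
    spaces, so we split on the last ')' before reading the numeric fields.
    """
    head, _, tail = line.rpartition(")")
    pid = int(head.split("(", 1)[0])
    fields = tail.split()        # fields[0] is field 3 (state)
    ppid = int(fields[1])        # field 4
    utime = int(fields[11])      # field 14: user-mode clock ticks
    stime = int(fields[12])      # field 15: kernel-mode clock ticks
    rss_pages = int(fields[21])  # field 24: resident set size in pages
    return pid, ppid, utime + stime, rss_pages


def read_proc_stats(proc_root="/proc"):
    """Read every /proc/<pid>/stat line (Linux only)."""
    lines = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                lines.append(f.read())
        except OSError:
            continue  # the process exited while we were scanning
    return lines


def tree_usage(stat_lines, root_pid, page_size=PAGE_SIZE):
    """Sum CPU ticks and resident memory (bytes) over root_pid and its descendants."""
    usage = {}
    children = {}
    for line in stat_lines:
        pid, ppid, cpu, rss = parse_stat(line)
        usage[pid] = (cpu, rss)
        children.setdefault(ppid, []).append(pid)

    total_cpu = total_rss = 0
    stack = [root_pid]
    while stack:  # depth-first walk of the process tree
        pid = stack.pop()
        if pid not in usage:
            continue  # not present in the snapshot
        cpu, rss = usage[pid]
        total_cpu += cpu
        total_rss += rss
        stack.extend(children.get(pid, []))
    return total_cpu, total_rss * page_size
```

On a Linux TaskTracker host, `tree_usage(read_proc_stats(), task_pid)` would give the totals that a heartbeat could carry; how Hadoop actually packs them into TaskCounters is not modeled here.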
Facebook's data warehouse uses Hive, whose architecture is shown in Figure 3-8. For details of the Hive query language, see Chapter 11. Hive supports three file formats on HDFS: TextFile, which makes it easy for other applications to read and write the data; SequenceFile, which only Hadoop can read and which supports block compression; and RCFile, which uses block storage built on sequence files and organizes each block by column, giving better compression and query performance. Facebook plans to improve Hive to support new features such as indexes, views, and subqueries.
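RCFile's advantage comes from its layout: rows are first divided into row groups (the "blocks" mentioned above), and within each group the values are stored column by column. The in-memory Python model below illustrates only that layout; it leaves out everything about the real on-disk format, such as headers, metadata, and compression, and the function names are hypothetical.

```python
def to_row_groups(rows, columns, group_size):
    """Split rows into row groups and store each group column by column."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Within a group, all values of one column sit next to each other.
        groups.append({col: [row[i] for row in chunk] for i, col in enumerate(columns)})
    return groups


def read_column(groups, column):
    """A query needing one column touches only that column in each group."""
    values = []
    for group in groups:
        values.extend(group[column])
    return values
```

With rows `[(1, "a"), (2, "b"), (3, "c")]`, columns `["id", "name"]`, and a group size of 2, the first group holds `{"id": [1, 2], "name": ["a", "b"]}`. A query on `name` can skip the `id` values entirely, and keeping similar values together generally helps compression, which matches the benefits the text attributes to RCFile.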
Figure 3-8 Hive architecture
The challenges Facebook currently faces with Hadoop are:
Quality of service and isolation: large jobs can degrade cluster performance;
Security: what happens if a software bug corrupts the NameNode transaction log?
Data archiving: how to choose which data to archive, and how to archive it;
Performance improvements: for example, how to resolve bottlenecks effectively.