Inside Taobao's architecture for storing and processing 28.6 billion images: a solution for massive small-file storage

Source: Internet

On the afternoon of August 27, at the storage and system architecture sub-forum of the IT168 System Architect Conference, Zhang Wensong, chairman of the Taobao Technical Committee and a core Taobao engineer, gave a detailed account of the architecture of Taobao's image processing and storage system. Dr. Zhang's talk covered Taobao's overall system architecture, the architecture of its image storage system, TFS (the cluster file system Taobao developed in-house), the front-end CDN system, and Taobao's use and exploration of energy-saving servers.

This article focuses on the background and architecture of Taobao's image storage system, including the TFS cluster file system and the front-end processing server architecture. For the system's front-end CDN architecture and Taobao's exploration of energy-saving servers, please refer to the separate coverage of those topics.

A system nightmare: massive numbers of highly concurrent small files

For an e-commerce site with traffic as heavy as Taobao's, the demands on the image system are on a completely different level from ordinary photo sharing. Everyday shared photos are usually viewed by a small circle of friends, so their traffic is modest; product images in Taobao shops, especially for popular items, draw enormous traffic. For sellers, pictures are far more persuasive than text descriptions, so they care a great deal about display quality, upload time, and access speed. According to Taobao's traffic analysis, images account for more than 90% of Taobao's total traffic, while the main site's web pages account for less than 10%.

Screenshot of the Taobao mall home page. Taobao's back end stores more than 28.6 billion image files, and images account for more than 90% of Taobao's overall traffic. The average image size is 17.45 KB; images smaller than 8 KB make up 61% of all images by count but only 11% of total storage capacity.

Storing and reading these images also brings some headaches. For example, each image needs thumbnails of different sizes for the different places where it appears in the application. Given the many scenarios and the possibility of page redesigns, a single original image may end up with more than 20 thumbnails of different sizes.

The total capacity of Taobao's image storage system is 1800 TB (1.8 PB), of which 990 TB (about 1 PB) is already occupied. More than 28.6 billion image files are stored, including thumbnails generated from the originals. The average image size is 17.45 KB; images under 8 KB account for 61% of all images by count but only 11% of storage capacity.
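These figures can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 KB = 10^3 bytes, 1 TB = 10^12 bytes), which the article does not specify. Multiplying the file count by the average size gives about 499 TB of image data, roughly half of the 990 TB reported as occupied; a factor of about two would be consistent with each file being stored twice, but the replication factor is not stated, so treat that reading as an assumption. The last lines show how small the sub-8 KB files are on average.

```python
# Back-of-envelope check of the capacity figures quoted above.
# Assumption: decimal units (1 KB = 10**3 bytes, 1 TB = 10**12 bytes);
# the article does not say which convention it uses.

FILE_COUNT = 28.6e9   # image files, thumbnails included
AVG_SIZE_KB = 17.45   # average file size
USED_TB = 990         # occupied space
TOTAL_TB = 1800       # deployed capacity


def logical_data_tb(count: float, avg_kb: float) -> float:
    """Bytes implied by count * average size, expressed in TB."""
    return count * avg_kb * 1e3 / 1e12


data_tb = logical_data_tb(FILE_COUNT, AVG_SIZE_KB)
print(f"implied image data: {data_tb:.0f} TB")
print(f"used space / implied data: {USED_TB / data_tb:.2f}x")
print(f"utilisation: {USED_TB / TOTAL_TB:.0%}")

# Files under 8 KB are 61% of the count but 11% of the bytes,
# so their average size is far below the overall average.
avg_small_kb = AVG_SIZE_KB * 0.11 / 0.61
avg_rest_kb = AVG_SIZE_KB * 0.89 / 0.39
print(f"avg <8 KB file: {avg_small_kb:.2f} KB; avg other file: {avg_rest_kb:.1f} KB")
```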

This poses a huge challenge for Taobao. As is well known, the hardest problem for most systems is storing and reading very large numbers of small files: the disk head must seek and switch tracks frequently, so reads easily incur long latency. Under heavy, highly concurrent traffic, this becomes a system nightmare.
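A simple service-time model shows why head movement, not data transfer, dominates small reads. The drive parameters below (3.5 ms average seek, 150 MB/s media transfer rate) are illustrative assumptions for a 15K RPM disk, not Taobao measurements. Under them, a 17.45 KB read spends about 5.5 ms positioning the head and only about 0.1 ms transferring data, so one disk manages roughly 178 random reads, or about 3 MB/s.

```python
# Rough service-time model for one disk doing fully random small reads.
# The drive parameters are illustrative assumptions for a 15K RPM disk,
# not measurements from Taobao.


def random_read_iops(rpm: int, avg_seek_ms: float,
                     transfer_mb_s: float, io_kb: float) -> float:
    """Random reads per second = 1 / (seek + rotational delay + transfer)."""
    rotational_ms = 0.5 * 60_000 / rpm               # wait half a revolution on average
    transfer_ms = (io_kb / 1000) / transfer_mb_s * 1000
    return 1000 / (avg_seek_ms + rotational_ms + transfer_ms)


iops = random_read_iops(rpm=15_000, avg_seek_ms=3.5, transfer_mb_s=150, io_kb=17.45)
print(f"~{iops:.0f} reads/s, ~{iops * 17.45 / 1000:.1f} MB/s")
```

Shrinking the read size barely changes the result, which is the point: for small files the disk is bound by positioning time.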

Analyzing the economics of in-house development versus commercial systems

Taobao was founded in 2003, and it has made many attempts and explorations in building and planning its overall system.

The figure below shows Taobao's image storage system before 2007. Until then Taobao had used commercial storage, namely NetApp file storage systems. Because Taobao's image files grew each year by twice the existing volume (that is, to three times the original size), Taobao's back-end NetApp storage was migrated from low-end to high-end models, until in 2006 even NetApp's highest-end products could no longer meet Taobao's storage requirements.

Architecture of Taobao's image storage system before 2007. Because Taobao's image volume kept growing by about twice the existing volume per year, commercial systems could no longer meet its storage needs. Taobao now uses its in-house TFS cluster file system to handle reading and accessing massive numbers of small images.

Dr. Zhang summarized several limitations and shortcomings of commercial storage systems:

First, commercial storage systems are not specifically optimized for storing and reading small files. Second, the number of files is so large that network storage devices cannot support it. Third, as more and more servers connect to the system, network connectivity has reached the limit of the network storage devices. Finally, commercial storage is expensive to expand, with 10 TB of capacity costing millions of yuan, and it has single points of failure, so disaster tolerance and security cannot be adequately guaranteed.

Comparing the economics of commercial systems with in-house development, Dr. Zhang shared the following lessons:

1. Commercial software struggles to meet the requirements of very large systems, whether for storage, CDN, or load balancing, because vendors find it hard to test at such a large data scale in their labs.

2. Combining open source with in-house development gives better control: when a problem arises it can be fixed thoroughly from the bottom up, and the system is also more extensible.

Economic comparison of in-house development versus adopting commercial systems

3. Beyond a certain scale, investing in R&D pays off. The figure above compares the input-output ratio of in-house development with that of buying commercial systems. To the left of the intersection point, buying a commercial system is the more practical and economical choice; only when the scale goes beyond the intersection does in-house development yield better economic results. In practice, few companies reach that scale, but Taobao is already far past the intersection point.
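The intersection point can be illustrated with a simple linear cost model: buying costs a roughly constant price per TB, while building in-house costs a large fixed R&D investment plus a lower per-TB hardware cost. Every number below is a hypothetical assumption, not Taobao data; the commercial price of 100,000 yuan per TB is merely in line with the "millions of yuan per 10 TB" mentioned earlier. The crossover scale is the fixed cost divided by the difference in marginal cost per TB.

```python
# Hypothetical linear cost model for "build vs. buy".
# None of these numbers come from Taobao; they only illustrate why a
# crossover exists when in-house development has a large fixed cost
# but a lower marginal cost per TB.


def commercial_cost(tb: float, price_per_tb: float) -> float:
    return tb * price_per_tb


def in_house_cost(tb: float, rnd_fixed: float, hw_per_tb: float) -> float:
    return rnd_fixed + tb * hw_per_tb


def break_even_tb(price_per_tb: float, rnd_fixed: float, hw_per_tb: float) -> float:
    """Scale at which both options cost the same (the 'intersection')."""
    if price_per_tb <= hw_per_tb:
        raise ValueError("in-house never pays off unless its marginal cost is lower")
    return rnd_fixed / (price_per_tb - hw_per_tb)


# Assumed: 100,000 CNY/TB commercial, 10,000 CNY/TB commodity hardware,
# 20 million CNY of R&D.
x = break_even_tb(price_per_tb=100_000, rnd_fixed=20_000_000, hw_per_tb=10_000)
print(f"break-even at ~{x:.0f} TB")
```

Left of this scale the commercial line is cheaper; right of it the in-house line is, which matches the shape of the comparison described above.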

4. A system developed in-house can be optimized continuously at multiple levels of both software and hardware.

Version 1.0 of the TFS cluster file system

In 2006, Taobao decided to develop its own file system to tackle the challenge of storing huge numbers of small files and so solve its image storage problem. In June 2007, TFS (Taobao File System) officially went live. The production cluster reached 200 PC servers (each with 6 × 146 GB 15K RPM SAS disks in RAID5), and the number of files reached the billion level. Deployed storage capacity was 140 TB; actual storage capacity: TB. A single server supported 200+ random IOPS and 3 MB/s of throughput.
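The 140 TB deployment figure is consistent with the hardware description if each server's six 146 GB disks form a single RAID5 group, which gives up one disk's worth of capacity to parity. That grouping, and decimal GB/TB, are assumptions; the article only says "RAID5".

```python
# Check of the TFS 1.0 deployment figure, assuming each server's six
# 146 GB disks form one RAID5 group (one disk's capacity goes to parity).
# Decimal GB/TB are assumed.


def raid5_usable_gb(disks: int, disk_gb: float) -> float:
    """RAID5 loses one disk's worth of capacity to parity."""
    return (disks - 1) * disk_gb


servers = 200
raw_tb = servers * 6 * 146 / 1000
usable_tb = servers * raid5_usable_gb(6, 146) / 1000
print(f"raw: {raw_tb:.1f} TB, after RAID5: {usable_tb:.1f} TB")
```

The result, about 146 TB usable out of 175 TB raw, is close to the 140 TB quoted.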

Logical architecture of the first release, TFS 1.0. The defining feature of TFS is that part of the metadata is hidden in the name under which an image is saved. This greatly simplifies metadata and removes the management node as a constraint on overall system performance, an idea quite similar to the "object storage" now popular in the industry.

The figure shows the logical architecture of TFS 1.0. The cluster consists of one pair of name servers and multiple data servers. The name servers run as a two-machine pair and act as the management node of the cluster file system.

· Each data server runs on an ordinary Linux host

· Data files are stored as block files (typically 64 MB per block)

· Each block is kept in multiple copies to ensure data safety

· The ext3 file system is used to store the data files

· Disks use RAID5 for data redundancy

· Metadata is built into the file name, and users themselves keep the mapping between TFS file names and their actual files, which keeps the amount of metadata very small.
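The idea of building metadata into the file name can be sketched as follows: the location fields are packed into bytes and encoded as text, so anyone holding the name can recover them without a metadata lookup. The "T" prefix, the one-digit cluster ID, the field widths, and the base32 encoding are all hypothetical choices for illustration; this is not the actual TFS file-name format.

```python
# Hypothetical scheme: location metadata is packed into the file name,
# so a reader can find the file without asking a metadata server.
# The "T" prefix, field widths and base32 alphabet are illustrative
# assumptions; this is not the real TFS name format.
import base64
import struct


def make_name(cluster_id: int, block_id: int, file_id: int) -> str:
    """Encode a one-digit cluster ID, a 32-bit block ID and a 64-bit file ID."""
    if not 0 <= cluster_id <= 9:
        raise ValueError("cluster_id must be a single digit in this sketch")
    raw = struct.pack(">IQ", block_id, file_id)          # 12 bytes, big-endian
    body = base64.b32encode(raw).decode("ascii").rstrip("=")
    return f"T{cluster_id}{body}"


def parse_name(name: str) -> tuple[int, int, int]:
    """Recover (cluster_id, block_id, file_id) purely from the name."""
    if len(name) < 3 or name[0] != "T":
        raise ValueError("not a name produced by make_name")
    cluster_id = int(name[1])
    body = name[2:] + "=" * (-len(name[2:]) % 8)          # restore base32 padding
    block_id, file_id = struct.unpack(">IQ", base64.b32decode(body))
    return cluster_id, block_id, file_id


name = make_name(1, 4242, 7)
print(name, parse_name(name))
```

Because the name itself carries the block and file IDs, the system only has to remember where each block lives, not where each file lives.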

The trickiest part of TFS's core design addresses this: in a traditional cluster system there is only one copy of the metadata, usually managed by the management node, which therefore easily becomes a bottleneck. Taobao's users do not actually care what name an image file is saved under, so TFS was designed to hide some metadata in the image's saved file name, such as the image's size, time, and access frequency, including the logical block number. Very little information is actually stored as metadata, so the metadata structure is very simple: a single file ID is enough to pinpoint where a file is.
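Once the block ID and file ID have been recovered from the name, a read can be resolved in two steps: the name server maps the block to the data servers holding it, and the chosen data server maps the file ID to an offset and length inside its block file. The class and field names below are hypothetical and the storage is in-memory; the sketch only shows why the name server needs per-block rather than per-file entries.

```python
# Hypothetical two-step lookup once the block ID is known from the name.
# Class and field names are illustrative assumptions, not TFS's API.
from dataclasses import dataclass, field


@dataclass
class NameServer:
    # block_id -> addresses of data servers holding a replica (per-block only)
    block_locations: dict[int, list[str]] = field(default_factory=dict)

    def locate(self, block_id: int) -> list[str]:
        return self.block_locations[block_id]


@dataclass
class DataServer:
    # (block_id, file_id) -> (offset, size) inside that block's file
    index: dict[tuple[int, int], tuple[int, int]] = field(default_factory=dict)
    blocks: dict[int, bytes] = field(default_factory=dict)

    def read(self, block_id: int, file_id: int) -> bytes:
        offset, size = self.index[(block_id, file_id)]
        return self.blocks[block_id][offset:offset + size]


ns = NameServer({7: ["ds-a"]})
data_servers = {"ds-a": DataServer(index={(7, 1): (5, 3)},
                                   blocks={7: b"hello" + b"jpg" + b"..."})}
server = ns.locate(7)[0]
print(data_servers[server].read(7, 1))
```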

Because so much file information is hidden in the file name, the system drops the traditional directory tree structure, which is the most expensive structure to maintain. Removing it greatly improves the scalability of the whole cluster. This design concept is similar to today's "object storage". Taobao's TFS has since been updated to version 1.3, its performance has been validated in production and is continually being improved and optimized, and Taobao is now at the forefront of object storage research.

Version 1.3 of the TFS cluster file system

In June 2009, TFS 1.3 went live and was deployed in Taobao's image production system with a greatly expanded cluster: the system grew from the original 200 PC servers to 440 PC servers (12 × 300 GB 15K RPM SAS disks each) plus 30 PC servers (12 × 600 GB 15K RPM SAS disks each). The number of supported files grew to the tens of billions. Deployed storage capacity is 1800 TB (1.8 PB) and current actual storage is 995 TB. A single data server supports 900+ random IOPS and 15+ MB/s of throughput. The name server currently uses 217 MB of physical memory (all servers use gigabit NICs).
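The 1.8 PB deployment figure is exactly the raw disk capacity of the servers listed, which fits with TFS 1.3 managing disks directly instead of through RAID5. Decimal GB/TB are assumed.

```python
# Raw capacity of the TFS 1.3 cluster from the server counts above
# (decimal GB/TB assumed; no RAID overhead is subtracted).
fleets = [(440, 12, 300), (30, 12, 600)]   # (servers, disks per server, GB per disk)
raw_gb = sum(servers * disks * gb for servers, disks, gb in fleets)
print(f"{raw_gb / 1000:.0f} TB")
```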

Logical structure of TFS 1.3

The figure shows the logical structure of TFS 1.3. In this version, Taobao's software team focused on improving heartbeat and synchronization performance; in the latest version, heartbeat and synchronization can complete a failover within a few seconds. Other optimizations were also made, including keeping metadata in memory and reclaiming disk space, and overall performance was improved as well. The main changes include:

· Fully flat data organization, abandoning the traditional file system directory structure.

· Its own file system built directly on block devices, reducing the performance loss caused by data fragmentation in ext3 and similar file systems.

· One process manages one disk, and the RAID5 mechanism is dropped.

· A central control node with an HA mechanism, balancing security and stability against performance and complexity.

· Metadata kept as small as possible and loaded into memory to speed up access.

· Load balancing and redundancy policies that span racks and IDCs (data centers).

· Fully smooth capacity expansion.
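A cross-rack redundancy policy can be sketched as a placement rule: when a new block is created, choose data servers for its replicas so that no two copies share a rack. The server and rack names, and the "most free space first" tie-breaking rule, are hypothetical; the article does not describe TFS's actual placement algorithm.

```python
# Hypothetical rack-aware replica placement: pick data servers for a new
# block so that no two replicas share a rack, preferring emptier servers.
# Server/rack names and the "most free space first" rule are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    rack: str
    free_gb: int


def place_replicas(servers: list[Server], copies: int) -> list[Server]:
    if copies < 1:
        raise ValueError("copies must be at least 1")
    chosen: list[Server] = []
    used_racks: set[str] = set()
    for srv in sorted(servers, key=lambda s: s.free_gb, reverse=True):
        if srv.rack not in used_racks:          # at most one replica per rack
            chosen.append(srv)
            used_racks.add(srv.rack)
            if len(chosen) == copies:
                return chosen
    raise RuntimeError("not enough racks for the requested number of copies")


servers = [Server("a", "r1", 500), Server("b", "r1", 900),
           Server("c", "r2", 300), Server("d", "r3", 100)]
print([s.name for s in place_replicas(servers, 2)])
```

Here "a" is skipped despite having more free space than "c", because "b" already occupies rack r1; losing one rack therefore never loses both copies.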

The topology of Taobao's whole image processing system is described in detail on the "Picture Server Deployment and Caching" page. As it shows, TFS sits behind two layers of caching in Taobao's deployment, so the requests that reach TFS are highly scattered. TFS therefore has no internal memory cache for data, not even the memory buffer of a traditional file system.

The key performance metric for TFS is not I/O throughput but the random read/write IOPS a single PC server can deliver. Because hardware models differ, and partly for reasons of technical confidentiality, Taobao cannot give a reference figure for performance. In general, however, TFS reaches about 60% of a single disk's maximum random IOPS, and the output of a whole machine increases linearly with the number of disks.
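The scaling rule above reduces to a one-line formula: machine IOPS ≈ disks × per-disk maximum random IOPS × 0.6. The per-disk maximum of 125 IOPS used below is an assumed value, not a Taobao figure; with it, a 12-disk server lands at about 900 IOPS, in the same range as the 900+ quoted for TFS 1.3.

```python
# Linear scaling rule from the text: each disk delivers about 60% of its
# maximum random IOPS, and machine output grows with the disk count.
# The per-disk maximum passed in below is an assumed value, not Taobao data.


def machine_iops(disks: int, disk_max_iops: float, efficiency: float = 0.6) -> float:
    return disks * disk_max_iops * efficiency


print(machine_iops(12, 125))
```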

TFS 2.0 in development, and open-source TFS

TFS 2.0 is already under development, and the main problem it addresses is storing large files. TFS was originally designed for frequent concurrent reads of small files, with a block size of 64 MB, meaning each file must be smaller than 64 MB. That is sufficient for typical image storage but becomes a bottleneck for large files.
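Storing a large file across blocks means cutting it into segments that each fit within one 64 MB block and recording where each segment sits in the original file. The (index, offset, length) segment record below is a hypothetical layout for illustration; the article does not describe how TFS 2.0 actually indexes cross-block files.

```python
# Hypothetical splitting of a large file into segments that each fit in one
# 64 MB block. The (index, offset, length) record is an assumed layout.
BLOCK_SIZE = 64 * 1024 * 1024


def split_into_segments(file_size: int,
                        block_size: int = BLOCK_SIZE) -> list[tuple[int, int, int]]:
    """Return (segment_index, offset_in_file, length) for each segment."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("sizes must be non-negative and block_size positive")
    return [(i, offset, min(block_size, file_size - offset))
            for i, offset in enumerate(range(0, file_size, block_size))]


segs = split_into_segments(150 * 1024 * 1024)   # a 150 MB file
print(len(segs), segs[-1])
```

A 150 MB file becomes two full 64 MB segments and one 22 MB tail; each segment could then be written to a different block, with a small per-file index tying them together.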

TFS 2.0 will focus on optimizing the storage of large files across blocks. It will also include optimizations for the characteristics of SSDs and SAS disks. According to Taobao's data, SSD storage costs about ¥20 per GB, SAS disks about ¥5-6 per GB, and SATA disks less than ¥1 per GB. As application performance requirements rise, SSD adoption is the future trend, so it is necessary to optimize for the access characteristics of different kinds of disks.
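The per-GB prices above make the case for tiering easy to quantify. The sketch takes SAS at the midpoint of the quoted ¥5-6 range and SATA at the quoted upper bound of ¥1; the 10,000 GB dataset and the 10% "hot" share are arbitrary assumptions. It compares keeping everything on SAS with putting the hot share on SSD and the rest on SATA. Whether such a split is practical depends on how skewed access really is, which the article does not quantify.

```python
# Cost per GB from the figures above (CNY). SAS uses the midpoint of the
# quoted 5-6 range and SATA the quoted upper bound of 1; the dataset size
# and the 10% "hot" share below are arbitrary assumptions.
PRICE_PER_GB = {"SSD": 20.0, "SAS": 5.5, "SATA": 1.0}


def single_tier_cost(total_gb: float, medium: str) -> float:
    return total_gb * PRICE_PER_GB[medium]


def tiered_cost(total_gb: float, hot_fraction: float,
                hot: str = "SSD", cold: str = "SATA") -> float:
    """Put the hot share on the fast medium and the rest on the cheap one."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    hot_gb = total_gb * hot_fraction
    return hot_gb * PRICE_PER_GB[hot] + (total_gb - hot_gb) * PRICE_PER_GB[cold]


print(single_tier_cost(10_000, "SAS"))   # everything on SAS
print(tiered_cost(10_000, 0.10))         # 10% on SSD, 90% on SATA
```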

In addition, Zhang announced that TFS would be fully open-sourced in September. Fully open source means that Taobao will release all of the source code, and the open-source TFS will be exactly the same as the system running in Taobao's online applications.
