YouTube's Architecture Scaling
Author: Fenng | Reprinting is permitted. When reprinting, the original source, author information, and copyright statement of the article must be indicated with a hyperlink.
Web: http://www.dbanotes.net/opensource/youtube_web_arch.html
At a scalability technical seminar in Seattle, Cuong Do of YouTube gave a talk on YouTube's scalability. The video is available on Google Video. Unfortunately, users in China cannot view it.
Kyle Cordes wrote a summary of the video. It contains a lot of technical detail and is worth sharing. (Kyle Cordes's summary is the main source for this article.)
To put YouTube's data traffic simply: "One day of YouTube traffic is equivalent to sending 75 billion emails." In 2006 there were reports that daily page views exceeded 100 million. And now? Even more staggering: "1 billion downloads and uploads a day." That is an extraordinary volume by any measure. Among Internet applications in China, probably only 51.com reaches this scale in terms of data volume, but technically it is no match for YouTube.
Web Server
For the sake of development speed, most of the code is written in Python. Some of the web servers run Apache in FastCGI mode. Lighttpd is used to serve video content. As far as I know, some MySpace servers also use Lighttpd, but not many. YouTube is the most successful Lighttpd deployment. (Not many sites in China use Lighttpd; Douban is one that uses it quite comfortably. By Fenng)
Video
Video thumbnails pose a great challenge to the servers. Each video has an average of 4 thumbnails, and each web page shows many thumbnails, so the request load is very large. YouTube's engineers set up separate server groups to absorb this pressure and made some targeted optimizations to the cache and the OS. Even so, the pressure of thumbnail requests degraded Lighttpd's performance. That problem was addressed by hacking Lighttpd to add more worker threads. The latest solution is Google's BigTable, which is stronger in performance, fault tolerance, and caching. That is the acquisition paying off: good steel used on the blade's edge.
For redundancy, each video file is placed on a "mini cluster", which is a group of servers holding the same content. The most popular videos are put on a CDN, so YouTube's own servers only need to handle the remaining "miss" accesses. YouTube uses simple, inexpensive, commodity hardware, which is consistent with Google's style. Maintenance also relies on common tools such as rsync and SSH; the team is simply more practiced with them.
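The routing idea in the paragraph above can be sketched roughly as follows. This is only an illustration under stated assumptions: the video IDs, cluster names, server names, and the pick_server function are all hypothetical, and the source does not describe how YouTube actually maps a video to a mini cluster or chooses a server within it.

```python
import random

# Hypothetical data: which videos are on the CDN, which mini cluster
# holds each video, and which servers make up each mini cluster.
CDN_VIDEOS = {"v-popular"}
VIDEO_TO_CLUSTER = {"v-popular": "mc-1", "v-rare": "mc-2"}
CLUSTER_SERVERS = {
    "mc-1": ["s1a", "s1b"],
    "mc-2": ["s2a", "s2b", "s2c"],
}


def pick_server(video_id, down=frozenset(), rng=random):
    """Return "cdn" for popular videos, else a live server in the mini cluster."""
    if video_id in CDN_VIDEOS:
        return "cdn"
    cluster = VIDEO_TO_CLUSTER[video_id]
    # Every server in a mini cluster holds the same content, so any
    # server that is still up can answer the request.
    candidates = [s for s in CLUSTER_SERVERS[cluster] if s not in down]
    if not candidates:
        raise LookupError(f"no live server in {cluster} for {video_id}")
    return rng.choice(candidates)
```

Because all servers in a mini cluster are interchangeable, losing one server only shrinks the candidate list; the video stays available as long as one copy is up.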
Database
YouTube uses MySQL to store metadata: user information, video information, and so on. The database servers once suffered from swap thrashing. The solution was to delete the swap partition!
Initially, the database had only 10 hard disks in RAID 10, and later a RAID 1 group was added. Quite frugal. Companies in this wave of Web 2.0 rarely use Oracle (Bebo is the only one I know of). For scaling, the path was similar to that of other sites: first replication, then spreading out the I/O. The final solution was "partitioning". This is not table partitioning at the database level but partitioning at the business level (by username or ID, with the application controlling the lookup mechanism).
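As a minimal sketch of what business-level partitioning can look like, the application itself decides which database a user's data lives on. The shard names and both functions below are hypothetical, and the hash-modulo scheme is just one common choice; the source only says that partitioning is by username or ID and is controlled by the application.

```python
import zlib

# Hypothetical shard names; placeholders, not real hosts.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for_id(user_id, shards=SHARDS):
    """Route a numeric user ID to a shard by simple modulo."""
    return shards[user_id % len(shards)]


def shard_for_username(username, shards=SHARDS):
    """Route a username to a shard using a stable hash."""
    # zlib.crc32 gives the same value in every process, unlike the
    # built-in hash(), which is randomized per process for strings.
    key = zlib.crc32(username.encode("utf-8"))
    return shards[key % len(shards)]
```

The application would call one of these before every query and open a connection to the returned shard. A plain modulo scheme is easy to reason about, but changing the number of shards moves most users, so real deployments often add a lookup table or a different mapping.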
YouTube also uses Memcached.
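The source does not say what YouTube caches or how. A typical way to put Memcached in front of a database is the cache-aside pattern, sketched below. TinyCache is a hypothetical in-process stand-in for a Memcached client so the example runs without a server, and get_video_metadata and load_from_db are illustrative names, not anything from YouTube's code.

```python
import time


class TinyCache:
    """In-process stand-in for a Memcached client (get/set with expiry)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            # Entry is too old: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)


def get_video_metadata(video_id, cache, load_from_db, ttl=60):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"video:{video_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    row = load_from_db(video_id)
    if row is not None:
        # Store the row so later reads skip the database until it expires.
        cache.set(key, row, ttl)
    return row
```

With this pattern the database only sees the first read of each key per expiry window, which is the usual reason to pair Memcached with MySQL for metadata lookups.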