Note by zhj: this translation has been lightly edited. When the original article was written, Instagram had not yet been acquired by Facebook. Reading it, I was impressed that Instagram's three backend engineers could support 14 million registered users. On the other hand, the three of them relied on off-the-shelf technology; at least, nothing in the article suggests otherwise. Of course, it is hard for three people to innovate, and if existing technology solves the problem, there is no need for such a small team to develop new technology itself. Finally, my thanks to them for being willing to share their solution.
Original article: http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances-dozens-of
Chinese translation: http://www.cnblogs.com/xiekeli/archive/2012/05/23/2514108.html
One question we often get asked when meeting and talking with other engineers is: "What's your technology stack?" We thought it would be interesting to describe all the components that power Instagram at a high level; you can look forward to more in-depth descriptions of these systems in the future. This is our system as it stands after just over a year, and parts of it are still being revised; in that time, a startup with a small team has grown to more than 14 million users. Our core principles when choosing a technology are:
- Keep it as simple as possible
- Do not reinvent the wheel
- Use proven and reliable technologies whenever possible
We will walk through our stack from top to bottom:
Operating System/Host
We run Ubuntu Linux 11.04 ("Natty Narwhal") on Amazon EC2. We found that earlier Ubuntu versions had all sorts of unpredictable freezing episodes on EC2 under high traffic, but 11.04 has been stable. We have only three engineers, and our requirements are constantly changing, so self-managed hosting is not our choice; perhaps we will revisit that if user growth continues at this unprecedented pace.
Load Balancing
Every request to Instagram's servers goes through a load-balancing machine. We used to run two nginx machines with DNS round-robin between them; the downside of that approach is the time DNS takes to update when one of the machines needs to be retired. Recently, we moved to Amazon's Elastic Load Balancer, with three nginx instances behind it that can be swapped in and out (if an nginx instance fails its health check, it is automatically removed from rotation). We also terminate SSL at the ELB level, which lessens the CPU load on nginx. We use Amazon's Route53 for DNS; they recently added a good GUI tool for Route53 to the AWS console.
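As a rough illustration of the health-check behavior described above, here is a sketch of configuring an ELB health check with the boto library of that era; the load balancer name, check URL, and thresholds are assumptions, not Instagram's actual values.

```python
# A sketch of an ELB health check that drops failing nginx instances
# from rotation; names and thresholds here are hypothetical.
from boto.ec2.elb import connect_to_region, HealthCheck

conn = connect_to_region('us-east-1')
health_check = HealthCheck(
    target='HTTP:80/status',  # nginx must return 200 here to stay in rotation
    interval=20,              # seconds between checks
    healthy_threshold=3,      # consecutive passes before re-adding an instance
    unhealthy_threshold=5,    # consecutive failures before removal
)
lb = conn.get_all_load_balancers(load_balancer_names=['production-lb'])[0]
lb.configure_health_check(health_check)
```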
Application Server
Next in line are the application servers that handle our requests. We run Django on Amazon High-CPU Extra-Large machines, and as usage grows we are now running Django on 25 of them (fortunately, this tier is stateless, so it is very easy to scale horizontally). We found that our workload is CPU-bound rather than memory-bound, so the High-CPU Extra-Large instance type provides the right ratio of CPU to memory.
We use http://gunicorn.org/ as our WSGI server; we used to use mod_wsgi and Apache, but found that gunicorn is much easier to configure and less CPU-intensive. To run commands on many instances at once (such as deploying code), we use Fabric, which recently added a parallel mode, so a deployment takes only a matter of seconds.
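For flavor, here is a minimal Fabric 1.x sketch of such a deployment; the hostnames, paths, and commands are hypothetical stand-ins, not Instagram's actual fabfile. Running `fab deploy` then executes the task on all hosts concurrently rather than one at a time.

```python
# fabfile.py -- a minimal sketch of parallel deployment with Fabric 1.x.
# Hostnames and paths are hypothetical.
from fabric.api import cd, env, parallel, run

env.hosts = ['app1.example.com', 'app2.example.com', 'app3.example.com']

@parallel
def deploy():
    """Pull the latest code and reload gunicorn on every app host at once."""
    with cd('/srv/app'):
        run('git pull origin master')
        # HUP tells the gunicorn master to gracefully reload its workers
        run('kill -HUP `cat /tmp/gunicorn.pid`')
```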
Data Storage
Most of our data (user information, photo metadata, tags, etc.) lives in PostgreSQL; we have previously written about how we shard across different PostgreSQL instances. Our main sharding cluster consists of 12 Quadruple Extra-Large memory instances (with twelve replicas running in a different zone).
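The sharding scheme is not spelled out here, but the usual pattern it references looks roughly like the following sketch; the shard count, host names, and modulo mapping are illustrative assumptions.

```python
# A hypothetical sketch of routing a user to a PostgreSQL shard:
# many logical shards are mapped onto a few physical machines, so
# shards can later be moved without re-hashing the users.
NUM_LOGICAL_SHARDS = 4096
SHARD_HOSTS = ['pg1.internal', 'pg2.internal', 'pg3.internal']  # hypothetical

def shard_for_user(user_id):
    """Return the (logical shard, physical host) pair owning this user."""
    logical = user_id % NUM_LOGICAL_SHARDS
    # A real deployment would keep a logical->physical lookup table that
    # can be edited when shards move; a plain modulo keeps the sketch short.
    host = SHARD_HOSTS[logical % len(SHARD_HOSTS)]
    return logical, host
```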
We found that Amazon's network disk system (EBS) does not support enough disk seeks per second, so keeping all of our working set in memory is especially important. To get reasonable IO performance, we created a software RAID over several EBS volumes, using the mdadm tool to manage it.
By the way, we found vmtouch to be an excellent tool for managing what data is in memory, especially when failing over from one machine to another that has no active memory profile yet. We use a script that parses the output of vmtouch running on one machine and prints the corresponding vmtouch commands to run on the other machine, bringing its memory state into line with the first.
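The script itself is not shown in the post; a minimal sketch of the idea might look like this, assuming a known list of data files, and treating the 50% residency threshold as an arbitrary choice.

```python
#!/usr/bin/env python
# Hypothetical sketch: replicate one machine's page-cache state elsewhere.
# Run against files of interest on the "warm" machine; it prints vmtouch
# commands to execute on the cold standby.
import re
import subprocess
import sys

def resident_fraction(path):
    """Parse `vmtouch -v` output and return the fraction of pages resident."""
    out = subprocess.check_output(['vmtouch', '-v', path]).decode()
    # The summary line looks like: "  Resident Pages: 1234/5678  4M/22M  21.7%"
    resident, total = re.search(r'Resident Pages:\s+(\d+)/(\d+)', out).groups()
    return float(resident) / int(total) if int(total) else 0.0

for path in sys.argv[1:]:
    if resident_fraction(path) > 0.5:
        # -t ("touch") pulls the file's pages into the page cache
        print('vmtouch -t %s' % path)
```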
All of our PostgreSQL instances run in a master-replica setup based on streaming replication, and we use EBS snapshots frequently to back up our systems. To ensure that our snapshots are consistent (our original inspiration was ec2-consistent-snapshot), we use XFS as our file system; with XFS, we can freeze and unfreeze the RAID array while snapshotting. For streaming replication, our favorite tool is repmgr.
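In the spirit of ec2-consistent-snapshot, the freeze/snapshot/unfreeze cycle might look like the following sketch using the boto library; the mount point, region, and volume IDs are placeholders.

```python
#!/usr/bin/env python
# A sketch of a consistent EBS snapshot over XFS: quiesce the filesystem,
# snapshot every volume in the array, then resume writes. Volume IDs,
# region, and mount point are hypothetical.
import subprocess
import boto.ec2

MOUNT_POINT = '/mnt/db'
VOLUME_IDS = ['vol-11111111', 'vol-22222222']  # members of the RAID array

conn = boto.ec2.connect_to_region('us-east-1')
subprocess.check_call(['xfs_freeze', '-f', MOUNT_POINT])  # block new writes
try:
    for volume_id in VOLUME_IDS:
        conn.create_snapshot(volume_id, description='nightly backup')
finally:
    subprocess.check_call(['xfs_freeze', '-u', MOUNT_POINT])  # resume writes
```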
We have long used pgbouncer to pool the connections from our application servers to PostgreSQL, which made a huge difference to performance. We found Christophe Pettus's blog to be a great resource on Django, PostgreSQL, and pgbouncer tips.
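Pointing Django at pgbouncer is essentially a settings change; a sketch, assuming pgbouncer runs next to the app server on its default port 6432:

```python
# settings.py -- connect Django to a local pgbouncer rather than directly
# to PostgreSQL. Database name and credentials are hypothetical.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'appdb',
        'USER': 'app',
        'PASSWORD': 'secret',
        'HOST': '127.0.0.1',  # pgbouncer listening alongside the app server
        'PORT': '6432',       # pgbouncer's default port, not Postgres's 5432
    }
}
```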
The photos themselves go straight to Amazon S3, which currently stores several terabytes of photo data for us. We use Amazon CloudFront as our CDN, which helps with image load times for users around the world (for example in Japan, our second most popular country).
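Storing an upload in S3 with the boto library of that era takes only a few lines; the bucket name and key layout below are invented for illustration.

```python
# A sketch of pushing an uploaded photo to S3 with boto; the bucket name
# and key layout are hypothetical.
from boto.s3.connection import S3Connection
from boto.s3.key import Key

conn = S3Connection()                      # reads AWS credentials from the environment
bucket = conn.get_bucket('photos-bucket')  # hypothetical bucket
key = Key(bucket, 'media/2011/11/photo123.jpg')
key.set_contents_from_filename('/tmp/photo123.jpg')
```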
We also make extensive use of Redis: it powers our main feed, our activity feed, and our sessions system (we use a Redis-backed Django session backend), among other things. Because all of Redis's data has to fit in memory, we run several Quadruple Extra-Large memory instances for it. Our Redis instances also run in a master-replica setup; the replica saves the DB to disk often, and we back that up with EBS snapshots (we found that dumping the DB on the master was too taxing). Since Redis allows writes to its replica, online failover is very convenient: we can move to a new Redis machine without downtime.
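As an illustration of what a Redis-backed feed might look like (the key names, cap, and data layout here are guesses, not Instagram's actual schema):

```python
# A minimal sketch of a Redis-backed home feed; key names are hypothetical.
import redis

r = redis.StrictRedis(host='redis1.internal', port=6379)

def push_to_feed(user_id, media_id, max_items=1000):
    """Prepend a new photo to a follower's feed and cap the feed's length."""
    key = 'feed:%d' % user_id
    r.lpush(key, media_id)           # newest items at the head of the list
    r.ltrim(key, 0, max_items - 1)   # keep only the newest max_items entries
```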
We ran our geo-search API on PostgreSQL for many months, but have since migrated it to Apache Solr. Solr has a simple JSON interface, so as far as our application is concerned, it is just another API to consume.
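A geo query against Solr's JSON interface might look like the sketch below; the host, core layout, and field names are assumptions, though {!geofilt} is standard Solr spatial syntax.

```python
# A sketch of querying Solr's JSON interface for photos near a point.
# Host, field names, and coordinates are hypothetical.
import json
import urllib
import urllib2

params = urllib.urlencode({
    'q': '*:*',
    'fq': '{!geofilt}',    # filter results by distance from a point
    'sfield': 'location',  # the indexed lat/lon field (hypothetical name)
    'pt': '40.7,-74.0',    # center point as lat,lon
    'd': '5',              # radius in kilometers
    'wt': 'json',          # ask Solr for JSON instead of XML
})
response = urllib2.urlopen('http://solr.internal:8983/solr/select?' + params)
results = json.load(response)
print(results['response']['numFound'])
```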
Finally, like any modern web service, we use Memcached for caching; we currently run six Memcached instances, which we connect to with pylibmc & libmemcached. Amazon recently launched its ElastiCache service, but it is not cheaper than running our own instances, so we have not switched over yet.
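A typical cache-aside pattern with pylibmc looks roughly like this; the server addresses, key scheme, and loader function are hypothetical.

```python
# A sketch of caching with pylibmc; server addresses and keys are hypothetical.
import pylibmc

mc = pylibmc.Client(
    ['memcached1.internal:11211', 'memcached2.internal:11211'],
    binary=True,
    behaviors={'tcp_nodelay': True, 'ketama': True},  # consistent hashing
)

def get_profile(user_id):
    """Cache-aside read: try memcached first, fall back to the database."""
    key = 'user:%d:profile' % user_id
    profile = mc.get(key)
    if profile is None:
        profile = load_profile_from_db(user_id)  # hypothetical DB loader
        mc.set(key, profile, time=300)           # cache for five minutes
    return profile
```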
Task Queue & Push Notifications
When a user decides to share an Instagram photo to Twitter or Facebook, or when we need to notify a real-time subscriber that a new photo has been posted, we push the task onto Gearman, a task queue system. Doing this work asynchronously through the task queue means that media uploads can finish quickly (and posting an Instagram photo feels fast), while the "heavy lifting" runs in the background. We have about 200 workers (all written in Python) consuming tasks off the queue. Our feed fan-out also goes through Gearman, so posting is as responsive for a user with many followers as it is for a brand-new user.
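With the python-gearman library, the client and worker sides of such a task might look like the following sketch; the task name, payload format, and hostnames are invented for illustration.

```python
# A sketch of queueing and consuming a cross-posting task with python-gearman.
# Task name, payload format, and hostnames are hypothetical.
import json
import gearman

client = gearman.GearmanClient(['gearmand.internal:4730'])

def share_photo_async(media_id, networks):
    """Queue the cross-post so the upload request can return immediately."""
    payload = json.dumps({'media_id': media_id, 'networks': networks})
    client.submit_job('share_photo', payload, background=True)

# --- worker process, one of many consuming the queue ---
def on_share_photo(worker, job):
    data = json.loads(job.data)
    # ... post data['media_id'] to Twitter/Facebook here ...
    return 'ok'

worker = gearman.GearmanWorker(['gearmand.internal:4730'])
worker.register_task('share_photo', on_share_photo)
worker.work()  # blocks, pulling jobs off the queue
```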
For push notifications, the most cost-effective solution we found is https://github.com/samuraisam/pyapns, an open-source Twisted service that has handled over a billion push notifications for us and has been rock solid.
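For flavor, sending a notification through pyapns's bundled Python client looks roughly like this; the app name, certificate path, and device token are placeholders.

```python
# A sketch using pyapns's client library; app name, certificate, and
# device token are hypothetical.
from pyapns import configure, provision, notify

configure({'HOST': 'http://localhost:7077/'})  # the pyapns Twisted daemon
provision('instagram', open('apns_cert.pem').read(), 'sandbox')
notify('instagram', 'hexlified_device_token',
       {'aps': {'alert': 'Your friend just posted a photo!'}})
```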
Monitoring
With more than 100 instances, monitoring becomes very important. We use Munin to graph metrics and to alert us when anything falls outside its normal range. We also write a lot of custom Munin plugins, building on Python-Munin, to graph metrics that are not system-level (for example, signups per minute or photos posted per second). We use Pingdom for external monitoring of the service, and PagerDuty for handling notifications and incidents.
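A custom Munin plugin is just an executable speaking Munin's simple text protocol; a bare-bones Python sketch (the metric and its source are hypothetical, and the sketch skips the Python-Munin helper for brevity):

```python
#!/usr/bin/env python
# A minimal Munin plugin sketch for an application-level metric.
# Munin invokes plugins with "config" to describe the graph, and with
# no argument to read current values. The metric here is hypothetical.
import sys

def signups_per_minute():
    return 42  # placeholder: query the application's own stats here

if len(sys.argv) > 1 and sys.argv[1] == 'config':
    print('graph_title Signups per minute')
    print('graph_vlabel signups')
    print('signups.label signups')
else:
    print('signups.value %d' % signups_per_minute())
```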
For Python error reporting, we use Sentry, an awesome open-source Django app written by the engineers at Disqus. At any given time, we can see what errors the system is producing, in real time.
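In the django-sentry package of that era, hooking an application up was mostly a settings change; a sketch, with the caveat that the exact app names are an assumption on our part:

```python
# settings.py -- a sketch of enabling the django-sentry app of that era
# (app names are an assumption; the separate raven client came later).
INSTALLED_APPS = (
    # ... the rest of the project's apps ...
    'sentry',
    'sentry.client',
)
```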
What about you?
If you find this description of our systems interesting, or are itching to tell us everything you would do differently, we would love to hear from you. We're looking for a devops person to join us and help us tame our EC2 instance herd.