The Instagram team hired its 7th employee last month. Instagram is the most popular photo app on the iPhone, with more than 14 million users and more than 150 million images. I have to say that this is a real miracle in the industry.
A few days ago, Instagram's engineering team of just three people published an article, "What Powers Instagram: Hundreds of Instances, Dozens of Technologies", which discloses some details of the Instagram architecture and is enough to arouse most people's curiosity. I took notes while reading it, and the various leads in it are worth following up. If you can access the original, I recommend reading it directly.
The Instagram development team follows three core principles:
- Keep it very simple
- Don't re-invent the wheel
- Go with proven and solid technologies when you can
OS/Host
Operating system: Ubuntu Linux 11.04 (Natty Narwhal) running on Amazon EC2. This version has proven stable on EC2. With only three engineers, running their own machines in a data center (IDC) is not realistic. Fortunately, there is Amazon.
Load Balancing
Previously, two nginx servers handled front-end requests, with DNS round robin between them. This approach has some drawbacks. Now three nginx instances sit behind Amazon Elastic Load Balancer, and SSL is terminated at the ELB layer to relieve CPU pressure on nginx. DNS is handled by Amazon Route 53.
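
One drawback of plain round robin is that a dead machine keeps receiving its share of traffic, whereas a load balancer can take unhealthy backends out of rotation. The following is a minimal sketch of that idea, not a description of how ELB works internally. The backend names and the `healthy` set (which a real health checker would maintain) are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Rotate over backends, skipping any that are marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)  # would be updated by a health checker
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Look at each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["nginx-1", "nginx-2", "nginx-3"])  # hypothetical names
lb.mark_down("nginx-2")
picks = [lb.next_backend() for _ in range(4)]
print(picks)
```

With `nginx-2` marked down, requests alternate between the two remaining instances instead of failing one time in three.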
Application Server
The 25 Django instances run on High-CPU Extra-Large server instances. This instance type was chosen because application requests are CPU-intensive rather than I/O-intensive.
Gunicorn is the WSGI server. The team previously used Apache's mod_wsgi module, but found Gunicorn easier to configure and lighter on CPU. Fabric is used to speed up deployment.
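
Gunicorn can serve any WSGI callable, and Django exposes one. To show what that interface looks like, here is a minimal sketch using only the standard library. The `app` function and the `call_app` helper are illustrative names, not anything from Instagram's code.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A minimal WSGI callable of the kind a WSGI server can run."""
    body = b"ok\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path="/"):
    """Invoke the app in-process, roughly the way a WSGI server would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call_app())
```

Saved as a hypothetical `hello.py`, this could be served with `gunicorn --workers 4 hello:app`; the worker count is only an example.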
Data Storage
User information, image metadata, tags, and other data are stored in PostgreSQL. The main sharded database cluster has 12 nodes.
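
The notes do not say how rows are assigned to shards. One common approach, shown here purely as an assumption, is to map each user ID to one of many logical shards and then map logical shards onto the physical nodes, so that shards can later be moved between nodes without rehashing users. `LOGICAL_SHARDS` and the modulo scheme are hypothetical; only the node count comes from the text.

```python
LOGICAL_SHARDS = 1024  # hypothetical; much larger than the node count
PHYSICAL_NODES = 12    # node count given in the text


def logical_shard(user_id: int) -> int:
    """Pick a logical shard from the user ID (assumed scheme)."""
    return user_id % LOGICAL_SHARDS


def physical_node(shard: int) -> int:
    """Map contiguous ranges of logical shards onto the same node."""
    return shard * PHYSICAL_NODES // LOGICAL_SHARDS


def locate(user_id: int):
    shard = logical_shard(user_id)
    return shard, physical_node(shard)


print(locate(31341))
```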
In practice, Amazon's network disk system (EBS) does not deliver enough disk seeks per second, so it is necessary to keep as much of the data in memory as possible. The drives are combined in a software RAID to improve I/O performance, and mdadm is used for RAID management.
vmtouch is recommended for managing data in memory.
PostgreSQL runs in master-replica mode using streaming replication. The database is backed up with EBS snapshots, and XFS is used as the file system because it works well with the snapshot process. repmgr serves as the PostgreSQL replication manager.
PgBouncer is used for connection pooling. Christophe Pettus's blog has a lot of useful information about PostgreSQL.
Several terabytes of images are stored on Amazon S3, and Amazon CloudFront serves as the CDN.
Instagram is also a heavy user of Redis. Feed and session data are handled by Redis, which is also deployed in master-replica mode, with backups taken on the replica node.
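
The notes do not describe how the feed is laid out in Redis. One common pattern, assumed here for illustration, is a capped list per user: push the newest item to the head with LPUSH and cut the tail with LTRIM. The sketch below uses a small in-memory stand-in so that it runs without a server. The key layout `feed:<user_id>` and the cap are hypothetical. The method names `lpush`, `ltrim`, and `lrange` match those on a redis-py client, which could be passed in instead.

```python
class FakeRedis:
    """In-memory stand-in implementing only the three list commands used below."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, *values):
        lst = self.lists.setdefault(key, [])
        for value in values:
            lst.insert(0, value)
        return len(lst)

    @staticmethod
    def _stop(end):
        # Redis ranges are end-inclusive; only -1 and non-negative ends are handled.
        return None if end == -1 else end + 1

    def ltrim(self, key, start, end):
        self.lists[key] = self.lists.get(key, [])[start:self._stop(end)]

    def lrange(self, key, start, end):
        return self.lists.get(key, [])[start:self._stop(end)]


FEED_LENGTH = 3  # hypothetical cap, kept small for the demonstration


def push_to_feed(client, user_id, media_id):
    key = f"feed:{user_id}"  # hypothetical key layout
    client.lpush(key, media_id)
    client.ltrim(key, 0, FEED_LENGTH - 1)  # keep only the newest entries


def read_feed(client, user_id, count=FEED_LENGTH):
    return client.lrange(f"feed:{user_id}", 0, count - 1)


client = FakeRedis()
for media_id in [1, 2, 3, 4, 5]:
    push_to_feed(client, user_id=7, media_id=media_id)
print(read_feed(client, 7))
```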
Apache Solr powers the geo-search API, and the team also likes Solr's simple JSON interface.
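
As a rough illustration of what a geo query against Solr's HTTP interface can look like, the sketch below builds a request URL using Solr's `geofilt` filter with JSON output. The base URL, the `location` field name, and the function name are assumptions; the notes do not show Instagram's actual schema or queries.

```python
from urllib.parse import urlencode


def geo_search_url(base_url, lat, lon, radius_km, field="location", rows=20):
    """Build a Solr select URL for documents within radius_km of a point."""
    params = {
        "q": "*:*",
        "fq": "{!geofilt}",    # spatial filter: keep documents within distance d
        "sfield": field,       # hypothetical spatial field name
        "pt": f"{lat},{lon}",  # center point as "lat,lon"
        "d": radius_km,        # distance in kilometers
        "wt": "json",          # ask for a JSON response
        "rows": rows,
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name.
url = geo_search_url("http://solr.example.com:8983/solr/media", 37.7749, -122.4194, 5)
print(url)
```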
Caching uses six Memcached instances, accessed through pylibmc and libmemcached. Amazon also offers a caching service, ElastiCache. Instagram has looked at it, but it is not cheap.
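
A typical way to use Memcached in front of a database is the cache-aside pattern: look in the cache first, and on a miss compute the value and store it with an expiry. The notes do not say which pattern Instagram uses, so this is only a sketch. It works with any client exposing `get(key)` and `set(key, value, time)`, which is the shape of a pylibmc client; `DictCache` is a hypothetical stand-in that ignores the expiry so the example runs without a server.

```python
class DictCache:
    """In-memory stand-in for a memcached client (expiry is ignored)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, time=0):
        self.data[key] = value
        return True


def get_or_compute(cache, key, compute, ttl=300):
    """Cache-aside lookup: return the cached value, or compute and store it."""
    value = cache.get(key)
    if value is None:  # miss (a cached None is also treated as a miss)
        value = compute()
        cache.set(key, value, ttl)
    return value


calls = []


def load_profile():
    # Stands in for a database query; records each invocation.
    calls.append(1)
    return {"user_id": 7, "name": "example"}


cache = DictCache()
first = get_or_compute(cache, "profile:7", load_profile)
second = get_or_compute(cache, "profile:7", load_profile)
print(first == second, len(calls))
```

The second lookup is served from the cache, so the expensive function runs only once.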
Task Queue/Push Notifications
The queue service uses Gearman, and the push notification system uses pyapns.
Monitoring
The server instances mentioned above add up to more than 100, so effective monitoring is essential. Munin is the main monitoring tool, and the team has written many custom plug-ins for it. Pingdom is used for external monitoring, and PagerDuty handles notifications.
For Python error reporting, they use Sentry, which was open-sourced by the Disqus team.
Thoughts
0) Building on the cloud is easy to talk about and hard to do well. This is also the most impressive thing about the Instagram team.
1) The Python community is mature enough, with good solutions in every area.
2) If you ask what struck me most, I would say: Amazon is a great company, even greater than Google.
From http://www.eit.name/blog/read.php?504