Instagram Architecture Analysis (Repost)

Source: Internet
Author: User
Tags: apache solr, connection pooling, solr, elastic load balancer

Reposted from: http://www.eit.name/blog/read.php?504

The Instagram team welcomed its seventh employee last month; yes, it is a seven-person team. As the most popular photo app on the iPhone, Instagram has passed 14 million users and more than 150 million photos. This is a genuine miracle in the industry.

A few days ago, Instagram's engineering team, which has only three people, published an article, "What Powers Instagram: Hundreds of Instances, Dozens of Technologies", disclosing some details of the Instagram architecture, enough to arouse most people's curiosity. I read it and took some notes; the details still have real reference value. If you can open the original article, I recommend reading it directly.



The Instagram development team pursues three core principles:

    • Keep it very simple
    • Don't re-invent the wheel
    • Go with proven and solid technologies when you can

OS/Host

For the operating system, they run Ubuntu Linux 11.04 (Natty Narwhal) on Amazon EC2; this version has proven stable enough on EC2. With only three engineers, running their own machines in a data center (IDC) would not be practical. Fortunately there is Amazon.

Load Balancing

Previously, two Nginx machines behind DNS round-robin handled front-end requests, which had drawbacks. They have since migrated to Amazon's ELB (Elastic Load Balancer), with three Nginx instances behind it. SSL is terminated at the ELB layer to relieve CPU pressure on Nginx. DNS is served by Amazon Route53.

Application Server

There are 25 Django instances running on High-CPU Extra-Large machines. This instance type was chosen because the application workload is CPU-bound rather than I/O-bound.

Gunicorn serves as the WSGI server. Apache's mod_wsgi module was used in the past, but Gunicorn proved easier to configure and less CPU-intensive. Deployment is sped up with Fabric.
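As a rough illustration of what a WSGI server such as Gunicorn actually runs, here is a minimal WSGI callable. The module name app.py and the callable name application are assumptions made for this sketch, not details from the Instagram article; a Django project exposes an equivalent callable of its own.

```python
# app.py (hypothetical module name): a minimal WSGI callable.
# A WSGI server calls this function once per HTTP request.

def application(environ, start_response):
    """Return a fixed plain-text response for any request."""
    body = b"hello from a WSGI worker\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    # Send the status line and headers, then return the body as an iterable of bytes.
    start_response("200 OK", headers)
    return [body]
```

Assuming the file is saved as app.py, it could be served with something like `gunicorn --workers 4 app:application`; the worker count here is arbitrary.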

Data Storage

Most data, such as user information, photo metadata, and tags, is stored in PostgreSQL. The main sharded database cluster has 12 nodes.
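The notes do not describe how rows are assigned to shards. One common approach, sketched below purely as an assumption, is to hash each user into one of many small logical shards and keep a table that maps logical shards onto the 12 physical nodes. The names NUM_LOGICAL_SHARDS, SHARD_TO_NODE, and node_for_user are hypothetical, and the logical shard count is arbitrary.

```python
# Hypothetical shard routing sketch; not Instagram's documented scheme.

NUM_PHYSICAL_NODES = 12    # from the article: 12 nodes in the main shard cluster
NUM_LOGICAL_SHARDS = 1200  # assumption: arbitrary number of small logical shards

# Explicit mapping table, so a logical shard can later be moved to another
# node by editing one entry instead of rehashing every user.
SHARD_TO_NODE = {
    shard: shard % NUM_PHYSICAL_NODES for shard in range(NUM_LOGICAL_SHARDS)
}

def logical_shard(user_id: int) -> int:
    """Map a user ID to a logical shard number."""
    return user_id % NUM_LOGICAL_SHARDS

def node_for_user(user_id: int) -> int:
    """Return the index of the physical node holding this user's data."""
    return SHARD_TO_NODE[logical_shard(user_id)]

if __name__ == "__main__":
    print(node_for_user(31341))  # 9
```

Keeping the mapping in a table rather than computing the node directly from the user ID is what makes rebalancing possible without changing which logical shard a user belongs to.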

In practice, Amazon's network disk system (EBS) cannot sustain enough disk seeks per second, so the working set has to be kept in memory as much as possible. They set up software RAID to improve I/O throughput, using the mdadm tool for RAID management.

For managing in-memory data, the small tool vmtouch is highly recommended.

PostgreSQL runs in a master-replica setup using streaming replication. EBS snapshots are used for database backups. The XFS file system is used so that it works well with the snapshot service. The repmgr tool serves as the PostgreSQL replication manager.

Connection pooling is handled by PgBouncer. Christophe Pettus's articles contain a wealth of information about PostgreSQL.

Several terabytes of photos are stored on Amazon S3, and the CDN is Amazon's own service, CloudFront.

Instagram is also a heavy Redis user: feed and session data are handled with Redis, which is deployed in a master-replica configuration. Backups are taken from the replica node.

Apache Solr powers the geo-search API; Solr's simple JSON interface is also a plus.

The cache layer uses 6 Memcached instances, accessed through the pylibmc and libmemcached libraries. Amazon also offers a caching service, ElastiCache; Instagram considered it, but it is not cheap.
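To give a feel for that JSON interface, the sketch below builds a Solr spatial query URL using the standard geofilt filter with the sfield, pt, and d parameters. The base URL, the field name location, and the helper build_geo_query are assumptions for illustration; the article does not show Instagram's actual schema or queries.

```python
from urllib.parse import urlencode

def build_geo_query(base_url, lat, lon, radius_km, field="location", rows=10):
    """Build a Solr select URL that returns documents within radius_km of a point.

    `field` is a hypothetical spatial field name in the Solr schema.
    """
    params = {
        "q": "*:*",            # match everything, then filter by distance
        "fq": "{!geofilt}",    # Solr's circular distance filter
        "sfield": field,       # spatial field to filter on
        "pt": f"{lat},{lon}",  # center point as "lat,lon"
        "d": radius_km,        # radius in kilometers
        "wt": "json",          # ask for a JSON response
        "rows": rows,
    }
    return f"{base_url}/select?{urlencode(params)}"

if __name__ == "__main__":
    # Assumed local Solr address; adjust to the real core or collection path.
    print(build_geo_query("http://localhost:8983/solr", 40.7128, -74.006, 5))
```

The resulting URL can be fetched with any HTTP client, and the response body parsed with the standard json module.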
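A typical way to use such a cache is the cache-aside pattern: check Memcached first and fall back to the database on a miss. The sketch below is an assumption about usage, not code from Instagram. The key format, the TTL, and the names get_user_profile and InMemoryCache are hypothetical; the stand-in class only mimics the get/set calls a pylibmc client provides, so the example runs without a Memcached server.

```python
class InMemoryCache:
    """Local stand-in with memcached-style get/set, for running the sketch only."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Like a memcached client, return None on a miss.
        return self._data.get(key)

    def set(self, key, value, time=0):
        # The expiry time is accepted but ignored by this stand-in.
        self._data[key] = value
        return True

def get_user_profile(cache, user_id, load_from_db, ttl_seconds=300):
    """Cache-aside read: try the cache, then the database, then fill the cache."""
    key = f"user_profile:{user_id}"  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        return cached
    profile = load_from_db(user_id)  # slow path, e.g. a PostgreSQL query
    cache.set(key, profile, time=ttl_seconds)
    return profile

if __name__ == "__main__":
    cache = InMemoryCache()
    loader = lambda uid: {"id": uid, "name": "example"}
    print(get_user_profile(cache, 42, loader))  # loaded, then cached
    print(get_user_profile(cache, 42, loader))  # served from the cache
```

With a real deployment, a pylibmc.Client instance pointed at the Memcached servers would be passed in place of InMemoryCache.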

Task Queue/Push Notifications

The queue service uses Gearman, and push notifications are implemented with pyapns.

Monitoring

The server instances mentioned above add up to more than 100, so effective monitoring is essential. Munin is the main monitoring tool, with many custom plugins written for it; external monitoring uses the Pingdom service. Alerting is handled by PagerDuty.

For Python error reporting, they use Sentry, open-sourced by the Disqus team.

A few impressions

0) Staying lightweight is easy to say and very hard to do. This is what is most fascinating about the Instagram team right now.

1) The Python community is mature enough to offer a good solution for every part of the stack.

2) If you ask me for my single biggest impression, I would say: Amazon is really a great company, even greater than Google.

Reference: http://www.cnblogs.com/ggjucheng/archive/2013/01/20/2868887.html
