The engineering and technical secrets behind Instagram's success

Source: Internet
Author: User
Keywords: Instagram
Instagram is a model for many startups: a company of little more than ten people that went from obscurity to a hefty acquisition by Facebook. How does such a company make its choices, from operating system and servers to database and push notifications? This article, compiled from the Instagram engineering blog, walks through Instagram's technology stack.

When choosing a system, our core principles are: keep it as simple as possible, don't reinvent the wheel, and use proven technology wherever we can.

Operating system / hosting

We had only three engineers at the time and our needs were still modest, so building our own servers was not worth it and we chose hosting instead. We run Ubuntu Linux 11.04 (Natty Narwhal) on Amazon EC2. Previous versions of Ubuntu froze unpredictably on EC2 under high traffic, but Natty (the code name of that Ubuntu release) has not had the problem. As the application grows and usage increases, we may build our own servers.

Load balancing

Every request sent to Instagram goes through a load balancer. We used to run 2 Nginx servers and use DNS round robin between them to receive front-end requests. The downside of this approach is that when a machine goes down, DNS takes time to update, and clients keep being sent to the dead machine in the meantime. So we recently moved to Amazon's Elastic Load Balancer (ELB). We also terminate SSL at the ELB, which reduces the CPU load on Nginx. For DNS we use Amazon's Route53, for which AWS recently added a very good GUI tool.

Application servers

Next come the application servers that handle the requests. We run the Django framework on Amazon High-CPU Extra-Large instances, and as usage has grown, the number of Django machines has increased from a handful to more than 25.
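The load-balancing trade-off described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how DNS or Amazon's load balancer is implemented: the class names, the backend names `nginx-1` and `nginx-2`, and the `healthy` set are all hypothetical, and the health check is reduced to a set lookup.

```python
from itertools import cycle


class DnsRoundRobin:
    """Hands out backends in rotation with no knowledge of their health."""

    def __init__(self, backends):
        self._rotation = cycle(backends)

    def pick(self, healthy):
        # 'healthy' is ignored on purpose: until DNS is updated, clients
        # are still handed the address of a machine that is down.
        return next(self._rotation)


class HealthCheckedBalancer:
    """Rotates over backends but skips any that fail the health check."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._next = 0

    def pick(self, healthy):
        # Try each backend at most once, starting from the rotation point.
        for _ in range(len(self._backends)):
            candidate = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if candidate in healthy:
                return candidate
        raise RuntimeError("no healthy backends")


def failed_requests(balancer, healthy, total):
    """Count requests routed to a backend that is currently down."""
    return sum(1 for _ in range(total) if balancer.pick(healthy) not in healthy)


if __name__ == "__main__":
    backends = ["nginx-1", "nginx-2"]  # hypothetical names
    healthy = {"nginx-1"}              # assume nginx-2 has just gone down
    print(failed_requests(DnsRoundRobin(backends), healthy, 10))
    print(failed_requests(HealthCheckedBalancer(backends), healthy, 10))
```

With one of two machines down, the plain rotation sends half of the ten requests to the dead machine, while the health-checked rotation sends none, which is the motivation given above for moving away from DNS round robin.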
We use Gunicorn (http://gunicorn.org/) as our WSGI server. We used to use mod_wsgi with Apache, but found Gunicorn much easier to configure and less CPU-intensive. To run commands on many instances at once, we use Fabric, which recently added a useful parallel mode.

Data storage

Most of our data (user information, photo metadata, tags, etc.) lives in PostgreSQL. We have written previously about how we shard across different Postgres instances; our main shard cluster consists of 12 Quadruple Extra-Large Memory instances.

We found that Amazon's network disk system (EBS) does not support enough disk seeks per second, so we try to keep our working data in memory. To get reasonable I/O performance, we set up our EBS drives in a software RAID using mdadm.

We later found that vmtouch is a very good tool for managing what data is in memory, especially when failing over from one machine to another. We have a script that parses the output of a vmtouch run on one machine and prints the corresponding vmtouch command to run on another machine so that it matches the current memory state.

To connect to the databases from our application servers, we use Pgbouncer. The photos themselves go straight to Amazon S3, and we use Amazon CloudFront as our CDN to reduce image load times.

We also use Redis. Because all of Redis's data needs to fit in memory, we run several Quadruple Extra-Large Memory instances for it as well, and shard across a few Redis instances only for certain subsystems.

For our geo-search API, we used PostgreSQL for many months, but once our media entries were sharded we moved to Apache Solr. It has a very simple JSON interface, so as far as our application is concerned it is just another API to consume.

Finally, like any modern web service, we use Memcached for caching. We currently have 6 Memcached instances, which we connect to using pylibmc and libmemcached. (Amazon recently launched an Elastic Cache service.)
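As a rough illustration of how a cache such as Memcached typically sits in front of a database, here is a minimal cache-aside sketch. It is an assumption about the general pattern, not Instagram's code: `InMemoryCache` is a hypothetical stand-in that mimics only the `get` and `set` calls of a Memcached client such as pylibmc, and `get_user`, the `user:<id>` key format, and the 300-second lifetime are invented for the example.

```python
import time


class InMemoryCache:
    """Hypothetical stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            # The entry has outlived its lifetime; treat it as a miss.
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl=0):
        # A ttl of 0 means "never expire", as in memcached.
        expires_at = time.monotonic() + ttl if ttl else None
        self._items[key] = (value, expires_at)
        return True


def get_user(cache, db_lookup, user_id, ttl=300):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = "user:%d" % user_id  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = db_lookup(user_id)
    if user is not None:
        # Populate the cache so the next read skips the database.
        cache.set(key, user, ttl)
    return user
```

Passing the lookup function in as `db_lookup` keeps the sketch independent of any particular database driver; in a real deployment it would run a query against PostgreSQL, and `cache` would be a real Memcached client.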
Task queue / push notifications

When a user shares an Instagram photo to Twitter or Facebook, or when we need to send out a notification that a user has uploaded a new photo, we hand the task to Gearman, a task queue system originally developed at Danga. Because this work is done asynchronously through the task queue, media uploads finish quickly without waiting for it.

For push notifications, the most cost-effective solution we found is https://github.com/samuraisam/pyapns, an open-source service that has handled over a billion push notifications for us and has been very reliable.

Monitoring

Our server instances now add up to more than 100, so monitoring them effectively is very important. We use Munin to monitor our entire system and alert us promptly when anything is abnormal. We also write a number of custom plugins based on Python-Munin to track application-level metrics (such as signups per second, photos uploaded per second, etc.). We use Pingdom for external monitoring of the service, and PagerDuty for notifications and incident handling.

For Python error reporting, we use Sentry, an open-source Django application. At any time we can log in and see in real time what errors are occurring in the system.
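A custom Munin plugin of the kind mentioned above can be sketched in plain Python. This example does not use Python-Munin; it only follows the basic Munin plugin convention, where the plugin prints graph metadata when called with the argument `config` and prints the current value otherwise. The metric name `signups` and the `read_signup_rate` function are hypothetical placeholders.

```python
import sys


def munin_output(argv, signups_per_second):
    """Return the lines a Munin plugin should print for this invocation."""
    if len(argv) > 1 and argv[1] == "config":
        # Munin asks for graph metadata with the "config" argument.
        return [
            "graph_title Signups per second",
            "graph_vlabel signups/s",
            "graph_category app",
            "signups.label signups",
        ]
    # Without arguments, a plugin reports the current value of each field.
    return ["signups.value %.2f" % signups_per_second]


def read_signup_rate():
    """Hypothetical metric source; a real plugin would query the application."""
    return 0.0


if __name__ == "__main__":
    print("\n".join(munin_output(sys.argv, read_signup_rate())))
```

Keeping the formatting in `munin_output` and the measurement in `read_signup_rate` separate makes the plugin easy to test without a running Munin node.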