Stack Overflow Architecture


Both expectedly and unexpectedly, Stack Overflow still uses Microsoft products heavily. Their reasoning: as long as Microsoft's infrastructure meets demand and is cheap enough, there is no reason to make fundamental changes. Where needed, they use Linux as well. At root, everything is about performance.

Another notable point is that Stack Overflow still scales vertically rather than using the cloud. Their SQL Servers run with 384 GB of memory and 2 TB of SSD; imagine what that would cost on AWS. Another reason for avoiding the cloud is that Stack Overflow believes it would reduce performance somewhat while making the system harder to optimize and troubleshoot. Besides, their architecture does not need to scale out: peak loads are the classic scenario that forces scaling out, but they have a wealth of system-tuning experience to handle them instead. The company still stands by Jeff Atwood's famous line that hardware is always cheaper than programmers.

As Marco Cecconi once said, the first thing to understand when talking about a system is the kind of problem it solves. Start from the simple question of what Stack Exchange actually does: first come topics, then communities built around those topics, and finally these admirable Q&A sites.

The second thing to understand is scale. Stack Exchange is growing fast and moves a huge amount of data, all on only 25 servers. How is that done? The numbers below set the stage.
Status

    • Stack Exchange runs 110 sites and is growing at a rate of 3 to 4 new sites per month.
    • 4 million users
    • 8 million Questions
    • 40 million answers
    • Ranked 54th in the world for traffic
    • 100% growth per year
    • Monthly PV 560 million
    • Most business days see spikes of 2,600 to 3,000 requests per second; as a programming-related site, weekday traffic runs higher than weekends
    • 25 servers
    • 2TB of SQL data stored in SSD
    • Each web server has two 320 GB SSDs in RAID 1
    • Each Elasticsearch host has 300 GB of storage, also on SSDs
    • Stack Overflow's read-write ratio is 40:60.
    • Average CPU utilization for DB server is 10%
    • 11 web servers running IIS
    • 2 load balancers (1 active) running HAProxy
    • 4 active database nodes running MS SQL
    • 3 application servers running the tag engine; all tag-based searches go through it
    • 3 servers running Elasticsearch for search
    • 2 servers using Redis to support distributed cache and messages
    • 2 Networks (Nexus 5596 + Fabric extenders)
    • 2 Cisco 5525-x ASAs
    • 2 Cisco 3945 Routers
    • 2 read-only SQL Servers, mainly serving the Stack Exchange API
    • VMs for deployment, domain controllers, monitoring, operations databases, and more
Platform
    • ElasticSearch
    • Redis
    • HAProxy
    • MS SQL
    • Opserver
    • TeamCity
    • Jil: a fast .NET JSON serializer, built on top of Sigil (see the sketch below)
    • Dapper: a micro-ORM
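
For flavor, here is a minimal sketch of what Jil usage looks like. JSON.Serialize and JSON.Deserialize are Jil's real entry points, but the Question type is made up for the example; none of this is Stack Overflow's own code.

```csharp
using System;
using Jil;

// Illustrative type; not from the Stack Overflow codebase.
public class Question
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public static class JilDemo
{
    public static void Main()
    {
        // Jil emits serialization code at runtime via Sigil,
        // which is where its speed comes from.
        string json = JSON.Serialize(new Question { Id = 1, Title = "Why so fast?" });

        // Round-trip back into a strongly typed object.
        Question q = JSON.Deserialize<Question>(json);
        Console.WriteLine($"{q.Id}: {q.Title}");
    }
}
```
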
UI
    • The UI has a message inbox that notifies users of new badges, messages addressed to them, and activity around major events; it is implemented with WebSockets and backed by Redis (see the sketch after this list).
    • The search box is implemented with Elasticsearch, accessed over a REST interface.
    • Because users ask questions at such a high rate, showing an unfiltered stream of the newest questions is impractical: new ones arrive every second. So an algorithm was developed around user behavior patterns, showing each user only the questions likely to interest them. It relies on complex tag-based queries, which is why the independent tag engine was developed.
    • Server-side templates are used to generate pages.
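
The inbox item above mentions WebSockets backed by Redis. Below is a minimal sketch of how the server side of such a fan-out might look using StackExchange.Redis (the Stack Exchange team's own open-source client); the channel name and message format are assumptions for illustration, not SO's published design.

```csharp
using System;
using StackExchange.Redis;

public static class InboxRelay
{
    public static void Main()
    {
        // In the real topology this would point at the Redis cache tier.
        var redis = ConnectionMultiplexer.Connect("localhost:6379");
        var sub = redis.GetSubscriber();

        // Each web server subscribes to inbox events and would forward each
        // one to the right user's open WebSocket (forwarding omitted here).
        sub.Subscribe("inbox-events", (channel, message) =>
        {
            Console.WriteLine($"push to websocket: {message}");
        });

        // Publishing side: fired when a user earns a badge, gets a message, etc.
        sub.Publish("inbox-events", "user:42 badge:Epic");

        Console.ReadLine(); // keep the process alive so the handler can run
    }
}
```
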
Server
    • The 25 servers are nowhere near fully loaded and CPU usage is low; by their own calculation, Stack Overflow (SO) alone could run on just 5 servers.
    • Database Server resource utilization is around 10%, except when performing a backup.
    • Why so low? Because the database servers have a full 384 GB of memory and the web servers run at only 10%-15% CPU.
    • Vertical scaling has not yet hit a bottleneck. Traffic at this level would typically be scaled out across roughly 100 to 300 servers.
    • A simple system: built on .NET with only 9 projects, where other systems might need 100. Keeping the project count this low is about chasing the fastest possible compile, which has to be planned from the system's start; each compile takes about 10 seconds.
    • 110,000 lines of code, which is very small compared to traffic.
    • This minimalist approach rests on a couple of reasons. First, there is no need for extensive testing, because meta.stackoverflow is a community for discussing problems and bugs. Second, meta.stackoverflow also acts as the software's test site: when users find problems, they raise them and often offer solutions.
    • The New York data center runs Windows Server 2012, now upgraded to R2 (Oregon had already been upgraded); the Linux systems run CentOS 6.4.
SSDs
    • Intel SSDs are the default (web tier, etc.)
    • Intel 520s handle mid-tier writes, such as Elasticsearch
    • The data tier uses the Intel 710 and S3700
    • RAID 1 and RAID 10 are both used (any array of 4+ disks uses RAID 10). Failure isn't feared: even with thousands of 2.5-inch SSDs in production, they have never had one fail. They keep one or more spares of each model and don't plan for multiple simultaneous disk failures.
    • Elasticsearch performs exceptionally well on SSDs, since SO writes and re-indexes very frequently.
    • SSDs changed how search is used. Because of locking problems, Lucene.net couldn't handle SO's concurrent load, so they moved to Elasticsearch. In an all-SSD environment, building locks around the binary readers is unnecessary.
High Availability
    • Offsite backups: the primary data center is in New York, the backup data center in Oregon.
    • Redis has two replicas, SQL has 2 replicas, the tag engine has 3 nodes, and Elasticsearch has 3 nodes; everything is redundant and present in both data centers.
    • Nginx was used for SSL, but terminating SSL is being converted over to HAProxy.
    • Not everything is replicated master/slave; some temporary data lives only in the cache.
    • HTTP accounts for only 77% of total traffic; the remainder is backups to the Oregon data center and other VPN traffic, generated mostly by SQL and Redis replication.
Database
  • MS SQL Server
  • Stack Exchange gives each site its own database: one for Stack Overflow, one for Server Fault, and so on.
  • In the New York primary data center, each cluster typically runs one primary plus one read-only replica, with another replica in the Oregon data center. When running against the Oregon cluster, both New York replicas are read-only and synchronized.
  • There is also a database for everything else: a "network-wide" database holding login credentials and aggregated data (mostly stackexchange.com user profiles and API data).
  • Careers Stack Overflow, stackexchange.com, and Area 51 each have their own independent database schemas.
  • Schema changes roll out to every site's database at once and must be backwards-compatible. Renaming a column, for example, is cumbersome and takes several steps: add the new column, deploy code that writes to both columns, backfill data into the new column, switch the code to read only the new column, then drop the old one.
  • Sharding isn't needed: indexes solve everything, and the data volume just isn't that big. Where filtered indexes help, they are used for efficiency: common patterns are indexed only where DeletionDate = Null, and others by specifying the enum type. There is one table per kind of vote, e.g. one for post votes and one for comment votes. Most pages render in real time and are cached only for anonymous users, so there are no cache updates, only re-queries.
  • Scores are denormalized, so they are queried frequently. The tables hold only IDs and dates; the post votes table now has about 56,454,478 rows, and with indexes most queries finish in a few milliseconds.
  • The tag engine is completely independent, meaning core functionality does not depend on any external application. It is a huge in-memory array structure, optimized for SO's use cases and precomputed for heavy query combinations. It runs as a simple Windows service, redundantly on multiple hosts. CPU utilization stays around 2-5%; 3 hosts exist purely for redundancy, not load. If every host failed at once, the web servers would load the tag engine into memory and keep going.
  • On Dapper: unlike a traditional ORM, its queries are not compiler-checked. Compiler checking has its benefits, but a fundamental runtime disconnect remains either way. More importantly, traditional ORMs generate nasty SQL that sends you hunting for the originating code, and their lack of query hints and parameterization control complicates query optimization. A hedged sketch of Dapper usage follows this list.
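
To make the Dapper point concrete, here is a hedged sketch of a typical Dapper query. The Query<T> extension method is Dapper's real API, but the PostVotes table, its columns, and the connection string are illustrative stand-ins, not SO's actual schema. Notice that the SQL is exactly what runs, so parameterization and any query hints stay under the programmer's control.

```csharp
using System;
using System.Data.SqlClient;
using Dapper;

// Illustrative row type; SO's real schema is not public here.
public class PostVote
{
    public int PostId { get; set; }
    public DateTime CreationDate { get; set; }
}

public static class VoteQueries
{
    public static void Main()
    {
        using (var conn = new SqlConnection(
            "Server=.;Database=StackOverflow;Trusted_Connection=True;"))
        {
            // Dapper runs exactly the SQL you wrote and maps rows to PostVote.
            // The DeletionDate IS NULL predicate mirrors the filtered-index
            // pattern described above; @postId is a proper parameter.
            var votes = conn.Query<PostVote>(
                @"SELECT PostId, CreationDate
                  FROM PostVotes
                  WHERE DeletionDate IS NULL AND PostId = @postId",
                new { postId = 42 });

            foreach (var v in votes)
                Console.WriteLine($"{v.PostId} voted at {v.CreationDate}");
        }
    }
}
```
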
Coding
    • Process
    • Most programmers work remotely, coding from wherever they choose
    • Compile very fast
    • Then run a small number of tests
    • Once the compile succeeds, the code moves to the development staging server
    • Hide new features with feature switches
    • Test run as another site on the same hardware
    • Then it moves to meta.stackoverflow for testing, where thousands of programmers use it every day: a great test environment
    • Once live, it gets exercised by the much wider community
    • Static classes and methods are used extensively, for simpler code and better performance
    • The coding process is very simple because the complex parts are packaged into libraries that are open-sourced and maintained. The number of .NET projects stays low because community-shared code is reused.
    • Developers use 2 to 3 monitors simultaneously, and multiple screens can significantly improve productivity.
Cache
    • Cache everything
    • 5 Levels of Cache
    • Level 1 is a network-level cache that exists in browsers, CDNs, and proxy servers.
    • Level 2 is the .NET framework's HttpRuntime.Cache, held in memory on each server.
    • Level 3 is Redis, a distributed in-memory key-value store; servers supporting the same site share its cache entries.
    • Level 4 is SQL Server's cache: the entire database, all of its data, held in memory.
    • Level 5 is the SSD tier, hit mostly while SQL Server's memory is still warming up.
    • For example, every help page is cached, and the code to read a page through the cache is simple (see the sketch after this list).
    • Static methods and static classes are used. That's ugly from an OOP point of view, but it is fast and suits simple code.
    • The cache is backed by Redis and Dapper, a micro-ORM.
    • To tame garbage collection, only one copy of each class used in the templates is created, built once and kept in the cache. Everything is monitored, including GC activity. Statistics show that layers of indirection increase GC pressure and measurably reduce performance.
    • CDN hits: query strings are hashed from the file's contents, so a file is fetched again only when it changes. That's 30 to 50 million hits per day at roughly 300 GB to 600 GB of bandwidth.
    • CDN is not designed to handle CPU or I/O load, but to help users get answers faster
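
As promised above, here is a minimal sketch of a read-through across cache levels 2 and 3, using HttpRuntime.Cache for per-server memory and StackExchange.Redis for the shared tier. The key scheme, expiry, and the GetHelpPage/RenderFromDatabase names are assumptions for illustration; this shows the general pattern, not Stack Overflow's actual code.

```csharp
using System;
using System.Web;            // HttpRuntime.Cache: level 2, per-server memory
using StackExchange.Redis;   // Redis: level 3, shared across servers

public static class HelpPageCache
{
    static readonly ConnectionMultiplexer Redis =
        ConnectionMultiplexer.Connect("localhost:6379");

    public static string GetHelpPage(string slug)
    {
        string key = "helppage:" + slug;

        // Level 2: this server's own memory.
        if (HttpRuntime.Cache[key] is string local)
            return local;

        // Level 3: Redis, shared by every server behind the same site.
        IDatabase db = Redis.GetDatabase();
        string shared = db.StringGet(key);
        if (shared != null)
        {
            HttpRuntime.Cache.Insert(key, shared);
            return shared;
        }

        // Levels 4/5: fall through to SQL Server (query omitted in this sketch).
        string rendered = RenderFromDatabase(slug);
        db.StringSet(key, rendered, TimeSpan.FromMinutes(10));
        HttpRuntime.Cache.Insert(key, rendered);
        return rendered;
    }

    // Placeholder for the real database render.
    static string RenderFromDatabase(string slug) => "<html>help</html>";
}
```
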
Deployment
    • They deploy 5 times a day rather than building up big releases, mainly because:
    • performance can be monitored directly
    • builds stay as small as possible, keeping the focus on the actual work
    • Once a build is done, a script copies it to every server in the web tier; the steps for each server are:
    • take the server out of HAProxy rotation with a POST notification (see the sketch after this list)
    • let IIS drain in-flight requests (about 5 seconds)
    • stop the website (this and all the following steps run through the same PSSession)
    • copy the files with Robocopy
    • start the website
    • re-enable the server in HAProxy with another POST
    • Almost everything is deployed via Puppet or DSC, so an upgrade mostly amounts to blowing away the RAID array and reinstalling via PXE boot, which is very fast.
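
The two HAProxy POST steps above amount to a drain-and-restore cycle. One common way to implement the server side of that, sketched here purely as an assumption (SO has not published this code), is a health endpoint that HAProxy polls and that a POST can flip into a failing state:

```csharp
using System;
using System.Net;
using System.Text;

// Minimal health endpoint a load balancer can poll. A POST to /drain makes
// the health check return 503 so HAProxy marks the server down; a POST to
// /enable brings it back. Paths and port are illustrative.
public static class HealthEndpoint
{
    static volatile bool draining;

    public static void Main()
    {
        var listener = new HttpListener();
        listener.Prefixes.Add("http://*:8080/");
        listener.Start();

        while (true)
        {
            HttpListenerContext ctx = listener.GetContext();
            switch (ctx.Request.Url.AbsolutePath)
            {
                case "/drain" when ctx.Request.HttpMethod == "POST":
                    draining = true;    // next health check starts failing
                    ctx.Response.StatusCode = 200;
                    break;
                case "/enable" when ctx.Request.HttpMethod == "POST":
                    draining = false;   // health checks pass again
                    ctx.Response.StatusCode = 200;
                    break;
                default:                // HAProxy's periodic health check
                    ctx.Response.StatusCode = draining ? 503 : 200;
                    break;
            }
            byte[] body = Encoding.UTF8.GetBytes(draining ? "draining" : "ok");
            ctx.Response.OutputStream.Write(body, 0, body.Length);
            ctx.Response.Close();
        }
    }
}
```

With something like this in place, the deploy script POSTs to /drain, waits for in-flight requests to finish, copies files, restarts the site, and POSTs to /enable.
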
Collaboration
    • Team
    • SRE (Site Reliability Engineering): 5 people
    • Core dev (Q&A site): 6-7 people
    • Core dev (mobile): 6 people
    • Careers team, dedicated to the SO Careers product: 7 people
    • DevOps and developers are very close together
    • The team has changed a lot.
    • Most employees work remotely
    • Offices are mainly for sales, Denver and London excepted
    • Everyone is treated equally, with a slight tilt toward New York workers since face-to-face contact helps communication, but working online is barely a handicap
    • Rather than insisting that everyone share an office, they prefer talented engineers who love the product, and they judge the trade-off to be well worth it
    • Many people choose remote work because of their families; New York is nice, but living there is not easy
    • The office sits in Manhattan, where the talent is. The data center cannot be too far out, since it regularly needs hands-on upgrades
    • Build a strong team that welcomes geeks. Early Microsoft gathered a lot of geeks, and it conquered the whole world
    • The Stack Overflow community doubles as a recruiting channel; they look for people who love coding, are helpful, and love to communicate
Budget preparation
    • Budgeting is project-based. Money is spent building infrastructure for new projects, which is why the lightly loaded web servers were bought 3 years ago when the data center was set up.
Test
    • Iterate fast, abandon fast
    • Much of the testing happens through the release process. Development has an identical SQL Server and runs on the same web tier, so performance testing is reasonably realistic.
    • Very few tests. Stack Overflow does little unit testing, because of its heavy use of static code and its very active community.
    • Infrastructure changes: since everything exists in duplicate, every old configuration has a backup and a quick recovery mechanism. For example, keepalived lets the load balancers roll back quickly.
    • They would rather rely on redundant systems than on routine maintenance. SQL backups are tested on a dedicated server purely to confirm they can be restored. They plan a full data-center failure-recovery drill every two months, or else run against the fully read-only secondary data center.
    • Each new feature release gets unit tests, integration tests, and UI tests; product function tests with predictable inputs are pushed to the incubator site, meta.stackexchange (formerly meta.stackoverflow).
Monitoring/Logging
    • Logstash (http://logstash.net/) is being considered for log management; for now a dedicated service relays syslog over UDP into a SQL database. Web pages add timing headers, which HAProxy captures and folds into the syslog stream (see the sketch after this list).
    • Opserver and Realog are used to display the measurements. Realog is a log display system built by Kyle Brandt and Matt Jibson in Go.
    • Logging runs through syslog at the HAProxy load balancers rather than through IIS, because it is richer in functionality than IIS logging.
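
As a rough illustration of the syslog-over-UDP relay described above, here is a minimal sketch of emitting one RFC 3164-style log line. The facility, message format, and collector hostname are assumptions for the example, not SO's published configuration.

```csharp
using System;
using System.Net.Sockets;
using System.Text;

public static class SyslogSender
{
    public static void Main()
    {
        using (var udp = new UdpClient())
        {
            // RFC 3164-style line: <priority>timestamp host tag: message.
            // Priority 134 = facility local0 (16 * 8) + severity info (6).
            string line =
                $"<134>{DateTime.UtcNow:MMM dd HH:mm:ss} web-01 app: request timed at 28ms";
            byte[] bytes = Encoding.ASCII.GetBytes(line);

            // Fire-and-forget datagram to the collector that writes into SQL.
            udp.Send(bytes, bytes.Length, "logs.example.internal", 514);
        }
    }
}
```
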
About Cloud
    • The old saying still holds: hardware is always cheaper than developers and efficient code. By the bucket principle, speed is limited by the shortest stave, and existing cloud services are largely constrained in capacity and performance.
    • Building SO in the cloud from the start might also have reached today's level. But without a doubt, achieving the same performance in the cloud would cost far more than the self-built data center.
Performance first
    • Stack Overflow is obsessive about performance: home page load time is always held under 50 milliseconds, and the current response time is 28 milliseconds.
    • Programmers are keen to reduce page load times and improve the user experience.
    • Every single web request is timed and recorded, and these measurements make it clear where performance work is needed (see the sketch after this list).
    • The main reason resource utilization is so low is efficient code. Web servers average 5% to 15% CPU with 15.5 GB of memory in use; SQL Server runs at 5% to 10% CPU with 365 GB of memory in use. This brings 3 benefits: plenty of headroom for upgrades, the service stays available when something critical fails, and backups run quickly when needed.
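
A minimal sketch of per-request timing in the spirit described above. Stack Exchange's real tooling in this area is the open-sourced MiniProfiler; this simplified stand-in just wraps a handler with a Stopwatch, and the X-Page-Timing header name is invented for the example.

```csharp
using System;
using System.Diagnostics;

public static class RequestTimer
{
    // Wrap a request handler, record elapsed time, and expose it for the
    // timing header that gets folded into the HAProxy syslog stream.
    public static string TimeRequest(Func<string> handler, out double elapsedMs)
    {
        var sw = Stopwatch.StartNew();
        string response = handler();
        sw.Stop();
        elapsedMs = sw.Elapsed.TotalMilliseconds;

        // In a real pipeline this would become a response header
        // (the "X-Page-Timing" name here is illustrative).
        Console.WriteLine($"X-Page-Timing: {elapsedMs:F1} ms");
        return response;
    }

    public static void Main()
    {
        string page = TimeRequest(() => "<html>home</html>", out double ms);
        Console.WriteLine($"rendered {page.Length} bytes in {ms:F1} ms");
    }
}
```
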
Lessons Learned

1. Why use MS products while also using Redis? Use whatever performs, and don't fight your platform: C# runs best on Windows, so they use IIS; Redis runs best on *nix, so it runs on *nix.

2. Overkill as a strategy. Everyday utilization numbers don't mean much on their own; when something specific happens, such as a backup or a rebuild, the spare capacity gets used to the full.

3. Solid SSDs. All databases sit on SSDs, which takes latency down to essentially zero.

4. Understand your read/write load.

5. Efficient code means fewer hosts. Hardware is added only when a new project comes online with special requirements, typically more memory; beyond that, efficient code means zero hardware additions. So only two purchases come up regularly: new SSDs for storage, and hardware for new projects.

6. Don't be afraid of customization. SO runs complex queries over tags, so it developed its own tag engine for the job.

7. Do only what must be done. Heavy testing isn't necessary because an active community provides support; developers don't have to fear the "square wheel" effect, because if a developer can build a lighter-weight component, it simply replaces the old one.

8. Focus on low-level knowledge, such as IL. Some code is written in IL rather than C#. Study SQL query plans. Take memory dumps of the web servers and find out what is actually in them. Dig into surprises, such as why a split produces 2 GB of garbage.

9. Never be bureaucratic. There are always new tools the team needs, such as an editor or the latest version of Visual Studio; remove all the friction from getting upgrades done.

10. Garbage-collection-driven programming. SO goes to great lengths to reduce garbage collection costs, skipping practices like TDD, avoiding abstraction layers, and using static methods. Although extreme, this does produce very efficient code.

The value of efficient code goes far beyond what you might imagine: it makes the hardware go further and reduces resource use. Just remember to also keep the code easy for programmers to understand.

Original link: StackOverflow Update: 560M Pageviews a Month, 25 Servers, and It's All About Performance (translation: Zhonghao; review: Wei Wei)
