(Repost) Technical Architectures of Various Large-Scale Websites

Source: Internet
Author: User

Introduction: Recently, while working with massive data processing and search engine technologies, I have often come across many elegant architecture diagrams. Beyond admiring how polished each diagram looks, I am even more impressed by the design ideas hidden behind the structures. Over the past two days I have been collecting the architecture designs of large websites. First, it is a feast that shows the ingenuity of different kinds of large-scale Web site architectures; second, it gives me something to ponder in my spare time. Why not? So here I summarize and organize, for readers, the technical architectures of large websites such as Wikipedia, Facebook, Yahoo!, YouTube, MySpace, Twitter, and Youku (this article focuses on Youku's technical architecture).
This article points out the highlights of each diagram and the ideas behind it, while detailed explanations of each figure are omitted. OK, enjoy the architecture feast. Of course, if you have any suggestions or questions, please do not hesitate to point them out. Thank you.

    • 1. Wikipedia Technical Architecture

Wikipedia technical architecture (diagram © Mark Bergsma)
    1. Wikipedia statistics: peaks of 30,000 HTTP requests per second and 3 Gbit/s of traffic (about 375 MB/s), served by roughly 350 PC servers.
    2. GeoDNS: a 40-line patch to BIND that adds geographical filters to BIND's existing views, directing each user to the nearest server. GeoDNS plays this role in the Wikipedia architecture because of the nature of Wikipedia's content, which serves every country and every region.
    3. Load balancer: LVS, see:
.
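To make this request path concrete, here is a minimal Python sketch of the two steps: a GeoDNS-style lookup that maps a client address to a region, followed by round-robin selection among that region's servers, similar in spirit to LVS's round-robin scheduler. The prefix table, region names, and server addresses are hypothetical; a real deployment uses a GeoIP database inside BIND and schedules connections in the kernel, neither of which is modeled here.

```python
import ipaddress
from itertools import cycle
from typing import Dict, Iterator, List

# Hypothetical GeoIP table: client prefix -> region (documentation address ranges).
PREFIX_TO_REGION = {
    ipaddress.ip_network("203.0.113.0/24"): "asia",
    ipaddress.ip_network("198.51.100.0/24"): "europe",
}
DEFAULT_REGION = "us"

# Hypothetical real-server pools behind each region's load balancer (private addresses).
REGION_POOLS: Dict[str, List[str]] = {
    "asia": ["10.1.0.1", "10.1.0.2"],
    "europe": ["10.2.0.1", "10.2.0.2", "10.2.0.3"],
    "us": ["10.3.0.1"],
}


def resolve_region(client_ip: str) -> str:
    """GeoDNS step: answer with the region whose prefix contains the client address."""
    addr = ipaddress.ip_address(client_ip)
    for network, region in PREFIX_TO_REGION.items():
        if addr in network:
            return region
    return DEFAULT_REGION


# Load-balancer step: one round-robin iterator per region, like a plain "rr" scheduler.
_schedulers: Dict[str, Iterator[str]] = {
    region: cycle(pool) for region, pool in REGION_POOLS.items()
}


def pick_server(client_ip: str) -> str:
    """Route a request: GeoDNS chooses the region, round-robin chooses the server."""
    return next(_schedulers[resolve_region(client_ip)])
```

For example, repeated calls to `pick_server("203.0.113.7")` alternate between `10.1.0.1` and `10.1.0.2`.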
    • 2. Facebook Architecture

The architecture of the Facebook search feature

Careful readers will notice that the diagram above also appeared in my earlier article, "Stealing a Bit of Massive Data Processing Experience from a Few Architecture Diagrams". The biggest difference is that the earlier article had only a few diagrams, while this series will have hundreds of architecture diagrams for you to enjoy.

    • 3. Yahoo! Mail Architecture

Yahoo! Mail Architecture

The Yahoo! Mail architecture deploys Oracle RAC to store metadata related to the mail service.

    • 4. Twitter Technical Architecture

Twitter's overall architecture design diagram

The Twitter platform broadly consists of twitter.com, mobile phones, and third-party applications, as shown in the figure (most of the traffic comes from mobile phones and third-party applications):

Caching plays an important role in large Web projects; after all, the closer the data is to the CPU, the faster it can be accessed. The following is Twitter's cache architecture diagram:

For the cache system, you can also look at the following image:
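Here is a minimal Python sketch of the cache-aside read pattern that a cache layer like this typically serves: check the cache first, fall back to the database on a miss, then populate the cache. The in-process `LRUCache` stands in for a memcached-like tier and `load_from_db` for the database; both are assumptions for illustration, not Twitter's code.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class LRUCache:
    """Small in-process LRU cache standing in for a memcached-like tier (assumption)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def cache_aside_get(cache: LRUCache, key: Hashable,
                    load_from_db: Callable[[Hashable], Any]) -> Any:
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value)
    return value
```

The sketch treats a stored `None` as a miss; a production cache would also set expiry times and invalidate entries when the underlying data is written.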

    • 5. Google App Engine Technical Architecture

The architecture diagram of GAE

In simple terms, the GAE architecture above is divided into three parts: the front end, the Datastore, and the service group.

    1. The front end consists of four modules: Front End, Static Files, App Server, and App Master.
    2. The Datastore is a distributed database based on BigTable technology. Although it can also be regarded as a service, it is the only place in App Engine where persistent data is stored, so it is a central module of App Engine. Its details will be discussed in the next article.
    3. The service group includes many services that the App Server can call, such as Memcache, Images, Users, URL Fetch, and Task Queues.
    • 6. Amazon Technical Architecture

Amazon's Dynamo key-value storage architecture diagram

Some readers may not be familiar with Amazon, which is now the world's largest online retailer and the world's second-largest Internet company, yet it started out as a small online bookstore. OK, let's look at how it is structured.
Dynamo is Amazon's key-value storage platform, offering good availability, scalability, and performance: 99.9% of read and write requests complete within 300 ms. Like many distributed systems, it partitions data with a hash algorithm and places the pieces on different nodes; a read likewise finds the corresponding node from the hash of the key. Dynamo uses consistent hashing: a node no longer corresponds to a single hash value but to a range of hash values. To locate a key, you take its hash value and walk clockwise along the ring; the first node you encounter is the one that holds the key.
Dynamo's improvement on consistent hashing is that each node placed on the ring is a group of machines (rather than a single machine, as memcached treats a node), and the machines in a group keep their data consistent through a synchronization mechanism.
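Here is a minimal Python sketch of the ring lookup described above, using the text's model that each node on the ring is a group of machines. Group names and keys are hashed with MD5 onto a 2^32 ring, a key belongs to the first group found clockwise from its hash, and a write is copied synchronously to every machine in that group. The class and method names are hypothetical, and this is a simplification: the Dynamo paper additionally describes virtual nodes, replication across several successor nodes, and object versioning, none of which appear here.

```python
import bisect
import hashlib
from typing import Dict, List, Optional

RING_SIZE = 2 ** 32


def ring_hash(value: str) -> int:
    """Place a string on the ring: MD5 digest reduced modulo RING_SIZE."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % RING_SIZE


class ReplicaGroup:
    """One ring node = a group of machines; here every write is copied to all members."""

    def __init__(self, name: str, machines: List[str]) -> None:
        self.name = name
        self.stores: Dict[str, Dict[str, str]] = {m: {} for m in machines}

    def put(self, key: str, value: str) -> None:
        for store in self.stores.values():  # synchronous copy keeps members consistent
            store[key] = value

    def get(self, key: str) -> Optional[str]:
        any_member = next(iter(self.stores.values()))  # members hold identical data
        return any_member.get(key)


class ConsistentHashRing:
    def __init__(self, groups: List[ReplicaGroup]) -> None:
        entries = sorted(((ring_hash(g.name), g) for g in groups), key=lambda e: e[0])
        self._positions = [pos for pos, _ in entries]
        self._groups = [g for _, g in entries]

    def locate(self, key: str) -> ReplicaGroup:
        """Walk clockwise from the key's hash; the first group at or after it owns the key."""
        idx = bisect.bisect_left(self._positions, ring_hash(key))
        return self._groups[idx % len(self._groups)]  # past the last position, wrap around

    def put(self, key: str, value: str) -> None:
        self.locate(key).put(key, value)

    def get(self, key: str) -> Optional[str]:
        return self.locate(key).get(key)
```

Because each group owns only the arc between its predecessor and itself, adding or removing a group moves only the keys in one arc, which is the main advantage of consistent hashing over `hash(key) % N`.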
Below is a diagram of the distributed storage system for readers to examine:

Amazon's cloud architecture diagram looks like this:


Amazon's cloud architecture diagram
    • 7. Youku Technical Architecture

From the beginning, Youku built its own CMS to handle front-page presentation. The modules are cleanly separated, front-end extensibility is very good, and the UI is decoupled, which makes development and maintenance simple and flexible. The following shows the call relationships among Youku's front-end modules:

In this way, relatively independent modules are invoked by specifying a module, a method, and params, which is very concise. The following is a diagram of part of Youku's front-end architecture:
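The module/method/params convention can be sketched as a small dispatcher in Python. The registry, the `register` decorator, and the `video` and `comment` modules are hypothetical stand-ins; the text only says that modules are addressed by these three values, not how Youku's CMS implements the call.

```python
from typing import Any, Callable, Dict

# Hypothetical module registry: module name -> {method name -> handler}.
MODULES: Dict[str, Dict[str, Callable[..., Any]]] = {}


def register(module: str, method: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a handler under (module, method)."""
    def wrap(func: Callable[..., Any]) -> Callable[..., Any]:
        MODULES.setdefault(module, {})[method] = func
        return func
    return wrap


def call(module: str, method: str, params: Dict[str, Any]) -> Any:
    """Dispatch a front-end request to an independent module by name."""
    try:
        handler = MODULES[module][method]
    except KeyError:
        raise LookupError(f"unknown module/method: {module}.{method}") from None
    return handler(**params)


@register("video", "info")
def video_info(video_id: int) -> Dict[str, Any]:
    # Placeholder data; a real module would query its own backend.
    return {"id": video_id, "title": f"video {video_id}"}


@register("comment", "count")
def comment_count(video_id: int) -> int:
    # Placeholder: always reports zero comments.
    return 0
```

A hypothetical request such as `module=video&method=info&video_id=7` would then map to `call("video", "info", {"video_id": 7})`, and adding a module means registering new handlers without touching the dispatcher.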

Youku's database architecture has also gone through many twists and turns: from a single MySQL server (just running) to simple MySQL master-slave replication, SSD optimization, vertical partitioning into separate databases, and horizontal sharding.

  1. Simple MySQL master-slave replication.
    MySQL master-slave replication separates database reads from writes and substantially improves read performance. The original diagram is as follows:
    The master-slave replication process is as follows:
    However, master-slave replication also introduces other performance bottlenecks:
    1. Writes cannot scale out
    2. Writes cannot be cached
    3. Replication lag
    4. Table-lock rate rises
    5. Tables grow larger and the cache hit rate drops
    These problems had to be solved, which led to the following optimization schemes.
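The read/write split from item 1 can be sketched as a small router in Python: statements that only read go to a slave chosen round-robin, and everything else goes to the master. The connection names are placeholders, and the SQL check is deliberately crude (an assumption for illustration, not Youku's rule).

```python
import itertools
import re
from typing import List


class ReadWriteRouter:
    """Send writes to the master and spread reads across slaves (names are hypothetical)."""

    # Crude classifier: statements starting with SELECT or SHOW are treated as reads.
    _READ = re.compile(r"^\s*(SELECT|SHOW)\b", re.IGNORECASE)

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str, in_transaction: bool = False) -> str:
        # Inside a transaction, stay on the master so reads see the transaction's own writes.
        if in_transaction or self._slaves is None or not self._READ.match(sql):
            return self.master
        return next(self._slaves)  # round-robin over the slaves
```

Reads inside a transaction stay on the master because replication lag (bottleneck 3 above) could otherwise return stale data; a real router would also send `SELECT ... FOR UPDATE` to the master.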
  2. MySQL vertical partitioning
    If the businesses are cleanly separated, it is a good idea to put different business data on different database servers: if one business's database crashes, the other businesses keep running normally, and the load is spread out, greatly improving the overall throughput of the databases. The database architecture after vertical partitioning is shown below:
    However, even when businesses are largely independent, some of them are still more or less connected; user data, for example, is associated with almost every business. Moreover, this partitioning method does not solve the problem of a single table's data growing explosively, so why not try horizontal sharding?
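Vertical partitioning amounts to routing by business rather than by row. The Python sketch below uses a hypothetical table-to-business map and placeholder connection strings; `can_join_locally` shows why data such as users, which every business references, becomes awkward once the businesses live on different servers.

```python
from typing import Dict

# Hypothetical mapping from business domain to its own database server (placeholder DSNs).
BUSINESS_DATABASES: Dict[str, str] = {
    "user": "mysql://db-user.internal/user",
    "video": "mysql://db-video.internal/video",
    "comment": "mysql://db-comment.internal/comment",
}

# Hypothetical table ownership: each table belongs to exactly one business domain.
TABLE_OWNER: Dict[str, str] = {
    "users": "user",
    "videos": "video",
    "video_tags": "video",
    "comments": "comment",
}


def database_for_table(table: str) -> str:
    """Vertical partitioning: route by which business owns the table, not by row."""
    return BUSINESS_DATABASES[TABLE_OWNER[table]]


def can_join_locally(*tables: str) -> bool:
    """A single-database JOIN is only possible when all tables live on the same server."""
    return len({database_for_table(t) for t in tables}) == 1
```

Joining `videos` with `video_tags` still works inside one database, but joining `users` with `comments` now has to be done in application code.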
  3. MySQL horizontal sharding
    This is a very good idea: users are grouped according to a rule (by hashing the user ID), and each group's data is stored in one database shard. As the number of users grows, you only need to configure one more server. The schematic is as follows:
    To determine a user's shard, you can build a table mapping users to shards; each request first looks up the user's shard ID in this table and then queries the relevant data from that shard, as shown in the figure. But how does Youku handle cross-shard queries? This is a difficult point. According to Youku, the approach is to avoid cross-shard queries as much as possible; when that is impossible, they use multi-dimensional shard indexes or a distributed search engine, and as a last resort a distributed database query (which is very cumbersome and costly in performance).
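Here is a minimal Python sketch combining the two ideas above: a shard is chosen by hashing the user ID when the user is first assigned, the choice is recorded in a user-to-shard table, and each request looks the shard up in that table. The in-memory dict stands in for the mapping table, the shard names are placeholders, and CRC32 is an assumed choice of hash; `query_all_shards` illustrates the costly last-resort cross-shard query the text mentions.

```python
import zlib
from typing import Any, Callable, Dict, List


class ShardDirectory:
    """Hash-based shard assignment recorded in a user -> shard lookup table (in-memory stand-in)."""

    def __init__(self, shards: List[str]) -> None:
        self.shards = shards
        self._user_to_shard: Dict[int, int] = {}  # stands in for the mapping table

    def assign(self, user_id: int) -> int:
        """On first assignment: choose a shard by hashing the ID, then record it."""
        shard_id = zlib.crc32(str(user_id).encode("ascii")) % len(self.shards)
        self._user_to_shard[user_id] = shard_id
        return shard_id

    def shard_for(self, user_id: int) -> str:
        """On each request: look up the shard ID, then return that shard's name."""
        return self.shards[self._user_to_shard[user_id]]

    def move(self, user_id: int, new_shard_id: int) -> None:
        """Placement lives in a table, so migrating a user means updating one entry."""
        self._user_to_shard[user_id] = new_shard_id

    def query_all_shards(self, run_on_shard: Callable[[str], List[Any]]) -> List[Any]:
        """Last-resort cross-shard query: fan out to every shard and merge the rows."""
        rows: List[Any] = []
        for shard in self.shards:
            rows.extend(run_on_shard(shard))
        return rows
```

CRC32 is used instead of Python's built-in `hash()` because string hashing in Python is randomized per process, while shard placement must stay stable across restarts.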
  4. Caching policy
    It seems that large systems all rely on caching, from HTTP caches to memcached in-memory data caches, but Youku says it uses no memory cache, for the following reasons:
    1. To avoid memory copying and memory locks
    2. If "Big Brother" orders a video to be taken down, removing it is troublesome when it sits in a cache
    3. In addition, Squid's write() incurs overhead in user-process space, and Lighttpd 1.5's AIO (asynchronous I/O) reads files into user memory, which is also inefficient.
    So why does browsing Youku feel so smooth, with videos loading a notch faster than on Tudou? Because Youku has built a fairly complete content delivery network (CDN), distributed across the country in various ways, so that users are served from nearby: when a user clicks a video, Youku returns the address of the video server that is closest to the user's location and in the best service state, giving the user a fast video experience. This is the advantage a CDN brings: nearby access.
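The "closest, best-state server" choice can be sketched in Python as two steps: drop servers that are unhealthy or above a load threshold, then pick the one with the smallest great-circle distance to the user. The server records, coordinates, the `max_load=0.8` threshold, and the scoring rule are all assumptions; the text does not say how Youku weighs distance against server status.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EdgeServer:
    host: str
    lat: float
    lon: float
    load: float           # current utilization, 0.0 to 1.0
    healthy: bool = True


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    r = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def pick_edge(user_lat: float, user_lon: float, servers: List[EdgeServer],
              max_load: float = 0.8) -> Optional[EdgeServer]:
    """Return the closest healthy server that still has capacity, or None if none qualify."""
    candidates = [s for s in servers if s.healthy and s.load < max_load]
    if not candidates:
        return None
    return min(candidates, key=lambda s: haversine_km(user_lat, user_lon, s.lat, s.lon))
```

For a user near Hangzhou with servers in Beijing, Shanghai, and Guangzhou, the Shanghai server is chosen unless it is unhealthy or overloaded.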

Note: 1. The Youku section above is compiled from: /system-analysis/20110918/264936.html; 2. A highly recommended site: http://www.dbanotes.net/. This concludes "Learning Some Large-Scale Website Construction Experience from Hundreds of Architecture Diagrams (Part 1)".
PostScript: This article is finally finished. I started organizing it yesterday, found this morning that my computer could not get online, and have now finished it in an Internet café. I really appreciate what it feels like to be a tech fanatic. Large website architecture is a very practical, hands-on subject, and for now you and I may just be laymen watching the show from the outside. But it does not matter: small fish and shrimp can still swim in the ocean, and they may well grow into big fish and sharks in the future.
