Introduction
Recently, while working with massive data processing and search engine technologies, I have come across many exquisite architecture diagrams. Beyond admiring how finely each one is drawn, I am even more impressed by the design ideas hidden behind the architecture. Over the past two days I have been collecting architecture designs of large websites: first as a feast for the eyes, to appreciate the variety of large-scale website architectures, and second as material to ponder in my spare time. Why not? This article therefore collects and summarizes the technical architectures of large websites abroad such as Wikipedia, Facebook, Yahoo!, YouTube, MySpace, and Twitter, and of Youku in China (this article focuses on Youku's technical architecture), for readers to enjoy.
This article focuses on the highlights of each diagram and the ideas behind it; detailed captions for the figures are omitted. OK, enjoy the architecture feast. Of course, if you have any suggestions or spot any problems, please do not hesitate to correct me. Thank you.
1. Wikipedia Technical Architecture
Wikipedia technical architecture diagram (copyright Mark Bergsma)
Data from Wikipedia: peaks of 30,000 HTTP requests per second; 3 Gbit/s of traffic, or roughly 375 MB/s.
350 PC servers.
GeoDNS: "a 40-line patch for BIND to add geographical filters support to the existent views in BIND", which directs each user to the nearest server. GeoDNS's role in the Wikipedia architecture follows from the nature of Wikipedia's content: it serves every country and every region.
Load balancer: LVS.
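To make the GeoDNS idea above a little more concrete, here is a minimal sketch in Python. It is not the BIND patch itself: the network ranges, region names, and server addresses are hypothetical placeholders, and a real deployment would use a full IP-to-location database. It only illustrates the principle of answering the same query differently depending on where the client comes from.

```python
import ipaddress

# Hypothetical mapping from client networks to a region label
# (playing the role of a geographical "view"). These are
# documentation-only address ranges, not real Wikipedia data.
REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "europe",
    ipaddress.ip_network("198.51.100.0/24"): "north-america",
    ipaddress.ip_network("203.0.113.0/24"): "asia",
}

# Hypothetical server address handed out for each region.
SERVER_BY_REGION = {
    "europe": "10.0.1.10",
    "north-america": "10.0.2.10",
    "asia": "10.0.3.10",
}

# Region used when the client address matches no known network.
DEFAULT_REGION = "north-america"


def resolve(client_ip: str) -> str:
    """Return the server address for the region the client belongs to."""
    address = ipaddress.ip_address(client_ip)
    for network, region in REGION_BY_NETWORK.items():
        if address in network:
            return SERVER_BY_REGION[region]
    return SERVER_BY_REGION[DEFAULT_REGION]


if __name__ == "__main__":
    print(resolve("203.0.113.7"))
    print(resolve("8.8.8.8"))
```

Clients whose address falls in a known range are sent to their regional server; everyone else falls back to the default region.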
2. Facebook Architecture
The architecture of the Facebook search feature
Careful readers will notice that the diagram above already appeared in an earlier article, "Stealing a Bit of Massive Data Processing Experience from a Few Architecture Diagrams". The biggest difference is that the earlier article covered only a few diagrams, whereas this series will contain hundreds of them for you to enjoy.
3. Yahoo! Mail Architecture
Yahoo! Mail Architecture
The Yahoo! Mail architecture deploys Oracle RAC to store metadata related to mail services.
4. Twitter Technical Architecture
Twitter's overall architecture design diagram
The Twitter platform broadly consists of twitter.com, mobile clients, and third-party applications, as shown in the following diagram (most traffic comes from mobile phones and third-party applications):
Caching plays an important role in large web projects; after all, the closer data is to the CPU, the faster it can be accessed. The following is Twitter's cache architecture diagram:
For the cache system, you can also refer to the following diagram:
5. Google App Engine Technical Architecture
GAE architecture diagram
In simple terms, the GAE architecture above is divided into three parts: the front end, the datastore, and the service group.
The front end consists of four modules: Front End, Static Files, App Server, and App Master.
Datastore is a distributed database based on BigTable technology. Although it can also be regarded as a service, it is the only place where the whole of App Engine stores persistent data, so it is a core module of App Engine. The specifics will be discussed in the next article.
The service group includes many services for the app server to call, such as Memcache, Images, Users, URL Fetch, and Task Queue.
6. Amazon Technical Architecture
Amazon's Dynamo key-value storage architecture diagram
Some readers may not be familiar with Amazon, which is now the world's largest online retailer and the world's second-largest Internet company. It started out as just a small online bookstore. Below, let's see how its architecture is organized.
Dynamo is Amazon's key-value storage platform, with good availability, scalability, and performance: 99.9% of read and write requests are answered within 300 ms. Data is partitioned with a hashing algorithm, as is common in distributed systems, and placed on different nodes. A read likewise uses the hash of the key to find the corresponding node. Dynamo uses consistent hashing: a node no longer corresponds to a single hash value but is responsible for a range of hash values on a ring. To locate a key, start from the key's hash value and walk clockwise along the ring; the first node encountered is the one responsible for that key.
Dynamo's improvement on consistent hashing is that each node placed on the ring is a group of machines (rather than a single machine per node, as with memcached), and the machines in a group keep their data consistent through a synchronization mechanism.
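The ring lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Dynamo's actual implementation: the `HashRing` class, the group names, and the use of MD5 to place names on the ring are all choices made for this example, and the synchronization between machines in a group is left out entirely.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a position on the ring (0 .. 2**32 - 1)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Consistent-hash ring whose nodes are groups of machines."""

    def __init__(self, groups):
        # groups: dict mapping a group name to its list of machines.
        self._groups = dict(groups)
        # Each group sits at one position on the ring, sorted clockwise.
        self._ring = sorted((ring_hash(name), name) for name in self._groups)
        self._positions = [position for position, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the first node."""
        index = bisect.bisect_left(self._positions, ring_hash(key))
        if index == len(self._ring):
            index = 0  # Past the last node: wrap around the ring.
        return self._ring[index][1]

    def machines_for(self, key: str):
        """Return every machine in the group responsible for the key."""
        return self._groups[self.node_for(key)]


if __name__ == "__main__":
    ring = HashRing({
        "group-a": ["a1", "a2", "a3"],
        "group-b": ["b1", "b2", "b3"],
        "group-c": ["c1", "c2", "c3"],
    })
    print(ring.node_for("user:42"), ring.machines_for("user:42"))
```

The property that makes this attractive is that adding a group to the ring only takes over keys from its clockwise neighbour; all other keys stay where they were.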
The following diagram shows a distributed storage system for readers to examine:
Amazon's cloud architecture diagram looks like this:
Amazon's cloud architecture diagram
7. Youku Technical Architecture
From the beginning, Youku built its own CMS to handle front-end page display. The modules are well separated, the front end is highly extensible, and the UI is decoupled, so development and maintenance become simple and flexible. The following diagram shows the call relationships among Youku's front-end modules:
In this way, a call to a relatively independent module is fully determined by module, method, and params, which is very concise. The following is a partial architecture diagram of Youku's front end:
Youku's database architecture has also gone through many twists and turns, from a single MySQL server ("just running") to simple MySQL master-slave replication, SSD optimization, vertical partitioning, and finally horizontal sharding.
1. Simple MySQL master-slave replication.
MySQL master-slave replication separates database reads from writes and improves read performance considerably. The original diagram is as follows:
The master-slave replication process is as follows:
However, master-slave replication also brings a number of other performance bottlenecks:
Writes cannot be scaled out
Writes cannot be cached
Replication lag
Lock rate rises
As tables grow, the cache hit rate drops
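As a rough illustration of read/write separation and of two of the bottlenecks listed above (a single write path and replication lag), here is a toy in-memory model in Python. It is only a sketch: the `MasterSlaveCluster` and `Replica` classes are hypothetical names invented for this example, and the log replay is a stand-in for, not a description of, MySQL's actual replication protocol.

```python
import itertools


class Replica:
    """A read-only copy that applies the master's log when asked to."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # Number of log entries applied so far.


class MasterSlaveCluster:
    def __init__(self, replica_count: int):
        self.master = {}
        self.log = []  # Ordered list of (key, value) writes.
        self.replicas = [Replica() for _ in range(replica_count)]
        self._next_replica = itertools.cycle(self.replicas)

    def write(self, key, value):
        # Every write goes through the single master, so adding
        # replicas does not add write capacity.
        self.master[key] = value
        self.log.append((key, value))

    def read(self, key):
        # Reads are spread round-robin over the replicas.
        replica = next(self._next_replica)
        return replica.data.get(key)

    def replicate(self):
        # Replicas catch up asynchronously; until this runs, reads
        # may return stale data (replication lag).
        for replica in self.replicas:
            for key, value in self.log[replica.applied:]:
                replica.data[key] = value
            replica.applied = len(self.log)


if __name__ == "__main__":
    cluster = MasterSlaveCluster(replica_count=2)
    cluster.write("video:1", "first title")
    print(cluster.read("video:1"))  # Stale read before replication.
    cluster.replicate()
    print(cluster.read("video:1"))
```

Adding replicas raises read capacity, but a read issued right after a write can still miss the new value until the replicas have caught up.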
These problems have to be solved, which leads to the following optimization schemes.
2. MySQL vertical partitioning
If the business can be split into sufficiently independent parts, it is a good strategy to put the data of different businesses on different database servers. If one business crashes, the others keep running normally, and the load is also spread out, which greatly improves database throughput. The database architecture diagram after vertical partitioning is as follows:
However, even when the businesses are independent enough, some are still more or less connected to each other. User data, for example, is associated with almost every business. Vertical partitioning also does not solve the explosive growth of data in a single table, so why not try horizontal sharding?
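Vertical partitioning as described above amounts to a lookup from business area to database server. The short Python sketch below assumes three hypothetical business areas and host names (none of them are taken from Youku) and shows why related businesses remain a problem: a request that touches both user and comment data has to visit two servers.

```python
# Hypothetical business areas and database hosts, for illustration only.
DATABASE_BY_BUSINESS = {
    "user": "db-user.internal",
    "video": "db-video.internal",
    "comment": "db-comment.internal",
}


def database_for(business: str) -> str:
    """Return the database host that owns a business area's tables."""
    try:
        return DATABASE_BY_BUSINESS[business]
    except KeyError:
        raise ValueError(f"no database configured for {business!r}") from None


def plan_request(businesses):
    """Group the business areas a request touches by database host."""
    plan = {}
    for business in businesses:
        plan.setdefault(database_for(business), []).append(business)
    return plan


if __name__ == "__main__":
    # Showing a comment together with its author's profile needs two
    # databases, so the join has to happen in the application.
    print(plan_request(["user", "comment"]))
```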
3. MySQL horizontal sharding
This is a very good idea: users are grouped according to some rule (for example, by a hash of the ID), and the data of each group is stored in one database shard. As the number of users grows, it is enough to simply add and configure another server. The schematic is as follows:
To determine which shard a user belongs to, you can build a table that maps users to shards. Each request first looks up the user's shard ID in this table and then queries the relevant data from the corresponding shard, as shown in the following diagram:
The remaining question is how to handle cross-shard queries, which is a difficult point. According to Youku's presentation, the approach is to avoid cross-shard queries as far as possible; when they cannot be avoided, to use multi-dimensional shard indexes or a distributed search engine; and, as a last resort, to run a distributed database query (which is cumbersome and expensive).
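The sharding scheme described above (hash-based grouping, a user-to-shard lookup table, and a fan-out for cross-shard queries) can be sketched as follows. This is a toy in-memory model, not Youku's implementation: `ShardedUserStore` is a hypothetical class, each shard is a plain dictionary standing in for a database, and CRC32 is just one convenient stable hash chosen for the example.

```python
import zlib


class ShardedUserStore:
    def __init__(self, shard_count: int):
        # Each dict stands in for one database shard.
        self.shards = [{} for _ in range(shard_count)]
        # The user-to-shard lookup table described in the text.
        self.directory = {}

    def _assign(self, user_id) -> int:
        # crc32 is stable across runs, unlike Python's built-in hash().
        return zlib.crc32(str(user_id).encode("utf-8")) % len(self.shards)

    def put(self, user_id, record):
        # A user keeps the shard recorded in the directory; only new
        # users are assigned by hash.
        shard_id = self.directory.setdefault(user_id, self._assign(user_id))
        self.shards[shard_id][user_id] = record

    def get(self, user_id):
        # Step 1: look up the shard ID. Step 2: query that shard only.
        shard_id = self.directory.get(user_id)
        if shard_id is None:
            return None
        return self.shards[shard_id].get(user_id)

    def query_all(self, predicate):
        # Cross-shard query: every shard is visited and the partial
        # results are merged, which is why it is the costly last resort.
        results = []
        for shard in self.shards:
            results.extend(r for r in shard.values() if predicate(r))
        return results


if __name__ == "__main__":
    store = ShardedUserStore(shard_count=4)
    for user_id in range(20):
        store.put(user_id, {"id": user_id, "vip": user_id % 2 == 0})
    print(store.get(7))
    print(len(store.query_all(lambda record: record["vip"])))
```

A single-user lookup touches one shard, while `query_all` touches all of them, so its cost grows with the number of shards.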
Caching policies
It seems that every large system leans heavily on caching, from HTTP caches to memcached in-memory data caches, but Youku says it uses no in-memory cache, for the following reasons:
Avoid memory copies and avoid memory locks
If "Big Brother" gives notice to take a video down, it is more troublesome when the video is sitting in a cache
Squid's write() consumes user process space, and Lighttpd 1.5's AIO (asynchronous I/O) reads files into user memory, which is inefficient.
But why, then, is browsing Youku so smooth, and why does its video loading speed beat Tudou's? The reason is that Youku has built a fairly complete content delivery network (CDN), distributed across the country in various ways so that users are served from nearby. When a user clicks on a video, Youku uses the user's location to return the address of the video server that is closest to the user and in the best service condition, so that the user gets a fast video experience. This is the advantage a CDN brings: access from the nearest location.
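The selection rule described above (closest server that is also in good service condition) can be sketched in Python. Everything concrete here is an assumption made for illustration: the `VideoServer` fields, the load threshold, the host names, and the use of great-circle distance as a proxy for network proximity. Youku's real scheduling logic is not described in the source.

```python
import math
from dataclasses import dataclass


@dataclass
class VideoServer:
    address: str
    latitude: float
    longitude: float
    healthy: bool
    load: float  # 0.0 means idle, 1.0 means saturated.


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def pick_server(servers, user_lat, user_lon, max_load=0.9):
    """Pick the nearest server that is healthy and not overloaded."""
    candidates = [s for s in servers if s.healthy and s.load < max_load]
    if not candidates:
        raise RuntimeError("no video server available")
    return min(
        candidates,
        key=lambda s: (
            distance_km(user_lat, user_lon, s.latitude, s.longitude),
            s.load,
        ),
    )


if __name__ == "__main__":
    servers = [
        VideoServer("cdn-beijing.example", 39.9, 116.4, True, 0.4),
        VideoServer("cdn-shanghai.example", 31.2, 121.5, True, 0.5),
        VideoServer("cdn-guangzhou.example", 23.1, 113.3, True, 0.2),
    ]
    # A user located in Nanjing.
    print(pick_server(servers, 32.06, 118.8).address)
```

If the nearest server is unhealthy or overloaded, it is filtered out first and the next closest one is returned instead.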
Well-known Internet company website structure diagram