Wikipedia is a global, multilingual, collaboratively written encyclopedia built on wiki technology. It is published on the Internet, and its purpose is to provide a free encyclopedia to everyone: a dynamic, free, global body of knowledge written in the language of each reader's choice.
Wikipedia's IT architecture is a valuable reference for anyone building a website, because the information Wikipedia publishes about it is detailed and well documented. The following is a summary of that architecture.
1. Wikipedia-related data
Peak load: 30,000 HTTP requests per second
Traffic: 3 Gbit per second, about 375 MB per second
Servers: 350 PC servers
The figures above come from Wikimedia ubunture.pdf
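The relationship between the two traffic figures is a plain unit conversion. A minimal sketch, assuming decimal (SI) prefixes; the function name is made up for illustration:

```python
BITS_PER_BYTE = 8


def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert gigabits per second to megabytes per second (SI prefixes)."""
    megabits = gbit_per_s * 1000      # 1 Gbit = 1000 Mbit
    return megabits / BITS_PER_BYTE   # 8 bits per byte


print(gbit_to_mbyte_per_s(3))  # 375.0
```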
2. System Architecture
3. GeoDNS
GeoDNS may sound novel, but the principle is simple. It is a 40-line program written for BIND that takes the user's region into account during DNS resolution, so that each user is directed to the closest web server.
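The idea can be illustrated with a short lookup. This is only a sketch of the principle, not the BIND program itself: the region table, the server addresses (taken from the reserved documentation ranges), and the function name are all assumptions.

```python
import ipaddress

# Hypothetical mapping from client networks to regions.
CLIENT_REGIONS = {
    ipaddress.ip_network("192.0.2.0/24"): "europe",
    ipaddress.ip_network("198.51.100.0/24"): "asia",
}

# Hypothetical address of the nearest server cluster for each region.
REGION_SERVERS = {
    "europe": "203.0.113.10",
    "asia": "203.0.113.20",
    "default": "203.0.113.30",
}


def resolve(client_ip: str) -> str:
    """Return the server address for the region the client belongs to."""
    addr = ipaddress.ip_address(client_ip)
    for network, region in CLIENT_REGIONS.items():
        if addr in network:
            return REGION_SERVERS[region]
    # Clients from unknown networks fall back to the default cluster.
    return REGION_SERVERS["default"]


print(resolve("192.0.2.55"))    # 203.0.113.10
print(resolve("198.51.100.7"))  # 203.0.113.20
```

A real deployment would decide the region inside the DNS server at resolution time; the table lookup here only shows the decision being made.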
4. Use LVS to achieve Load Balancing
Wikipedia uses LVS for load balancing. LVS is a project started by Dr. Zhang Wensong, and it is one of the few Chinese-led projects to earn wide recognition in the open source world. A long-standing operational problem with LVS is monitoring; Wikipedia's engineers use PyBal for this.
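The monitoring problem can be pictured as a loop that probes each real server and takes failed ones out of rotation. The sketch below is not PyBal's API; the class, the method names, and the server names are hypothetical, and the health check is passed in as a plain function.

```python
from typing import Callable, Dict, List


class ServerPool:
    """Tracks which real servers behind a load balancer are healthy."""

    def __init__(self, servers: List[str], check: Callable[[str], bool]) -> None:
        self._check = check
        self._enabled: Dict[str, bool] = {server: True for server in servers}

    def run_checks(self) -> None:
        """Probe every server and record whether it passed the check."""
        for server in self._enabled:
            self._enabled[server] = self._check(server)

    def active(self) -> List[str]:
        """Servers that passed their most recent check."""
        return [server for server, up in self._enabled.items() if up]


# Simulated outage: "web2" fails its health check.
down = {"web2"}
pool = ServerPool(["web1", "web2", "web3"], check=lambda s: s not in down)
pool.run_checks()
print(pool.active())  # ['web1', 'web3']
```

In practice the check would be a real probe (for example an HTTP request with a timeout), and the result would be pushed into the LVS server table.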
5. Use Lighttpd as an image server
Lighttpd is open-source software whose development is led from Germany. Its goal is to provide a secure, fast, compatible, and flexible web server environment for high-performance websites. It features low memory overhead, low CPU usage, good performance, and a rich set of modules. Lighttpd is a lightweight web server that supports FastCGI, CGI, auth, output compression, URL rewriting, alias, and other important functions. Apache is popular largely because of its rich feature set, and Lighttpd implements many of the same features. This matters to Apache users, because anyone migrating to Lighttpd has to deal with the differences.
6. Use MediaWiki software
The MediaWiki application layer has been heavily optimized. The developers use relatively low-overhead methods to locate code hotspots: see the real-time performance reports for details, and the tree diagram to see where the bottlenecks are. Another important lesson is to give up complex algorithms, expensive queries, and MediaWiki features that may add excessive overhead.
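A low-overhead way to find hotspots is to accumulate wall-clock time per named section and sort the totals. The following is a minimal sketch of that idea in Python, not MediaWiki's own profiler; the section names and the sleeps standing in for real work are illustrative assumptions.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)  # section name -> accumulated seconds
calls = defaultdict(int)     # section name -> number of invocations


@contextmanager
def profiled(section: str):
    """Add the wall-clock time spent in the with-block to `section`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[section] += time.perf_counter() - start
        calls[section] += 1


def hotspots():
    """Section names sorted from most to least total time."""
    return sorted(totals, key=totals.get, reverse=True)


# Stand-ins for real work; the sleeps only simulate cost.
with profiled("parse"):
    time.sleep(0.001)
with profiled("render"):
    time.sleep(0.05)

print(hotspots()[0])  # the most expensive section
```

Because each measurement is two timer reads and a dictionary update, this kind of instrumentation is cheap enough to leave on for a sample of production requests.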
7. Heavy use of caching
The first key to Wikipedia's success as a website is caching. The CDN (in practice, a cache layer) distributes content across continents and uses Squid as a reverse proxy. Database caching uses memcached on 30 servers with 2 GB each. As much data as possible is cached, but the team also cautions that caching is not always the cheapest option: use it wherever it helps, but do not overuse it.
Squid: about 17 machines, P4, 3 to 4 GB memory, 1U rack servers, Fedora Core 3. Squid mainly serves users who are not logged in, with a cache hit rate of 75%, which substantially reduces the load on Apache. Load balancing across these servers is done with round-robin DNS.
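The usual pattern in front of a database is cache-aside: look in the cache first, fall back to the database on a miss, and store the result for the next reader. A minimal sketch, assuming an in-process dictionary as a stand-in for memcached; the class name and the loader function are hypothetical.

```python
from typing import Callable, Dict


class CacheAside:
    """Read-through helper: check the cache, load on a miss, then store."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self._load = load                  # expensive lookup, e.g. a DB query
        self._store: Dict[str, str] = {}   # stand-in for a memcached client
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = self._load(key)
        self._store[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying data changes."""
        self._store.pop(key, None)


db_reads = []


def load_article(title: str) -> str:
    db_reads.append(title)  # record each simulated database read
    return f"text of {title}"


cache = CacheAside(load_article)
cache.get("Main_Page")
cache.get("Main_Page")
print(cache.hits, cache.misses, len(db_reads))  # 1 1 1
```

The invalidate step is where the overhead warning applies: every cached item has to be dropped or refreshed when its source changes, so caching data that changes often can cost more than it saves.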
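Round-robin DNS spreads clients across servers by rotating the order of the address records in each answer. A minimal sketch of the rotation, assuming three hypothetical Squid addresses from a reserved documentation range; a real DNS server does this internally.

```python
from collections import deque
from typing import List


class RoundRobinRecord:
    """Returns the same address set, rotated by one position per query."""

    def __init__(self, addresses: List[str]) -> None:
        self._addresses = deque(addresses)

    def answer(self) -> List[str]:
        result = list(self._addresses)
        self._addresses.rotate(-1)  # the next query starts one address later
        return result


record = RoundRobinRecord(["203.0.113.1", "203.0.113.2", "203.0.113.3"])
print(record.answer()[0])  # 203.0.113.1
print(record.answer()[0])  # 203.0.113.2
print(record.answer()[0])  # 203.0.113.3
```

Since most clients connect to the first address in the answer, the rotation spreads new connections roughly evenly, although it does not account for server load or failures.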
8. Use a MySQL database cluster
MediaWiki's database tier uses the common MySQL scale-out approach, the same one many Web 2.0 sites use: replication, read/write splitting... For load balancing at the application level, LoadBalancer.php is a good reference.
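Read/write splitting means that statements which modify data go to the master, while reads are spread across the replicas. The sketch below illustrates the routing decision only; it is not how LoadBalancer.php is written, and the class name and host names are hypothetical.

```python
import random
from typing import List, Optional


class QueryRouter:
    """Sends SELECTs to a replica and everything else to the master."""

    def __init__(self, master: str, replicas: List[str],
                 rng: Optional[random.Random] = None) -> None:
        self.master = master
        self.replicas = replicas
        self._rng = rng or random.Random()

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        # Only plain reads may go to a replica; with no replicas
        # configured, the master serves everything.
        if verb != "SELECT" or not self.replicas:
            return self.master
        return self._rng.choice(self.replicas)


router = QueryRouter("db-master", ["db-replica1", "db-replica2"])
print(router.route("UPDATE page SET page_touched = 1"))  # db-master
print(router.route("SELECT * FROM page"))                # one of the replicas
```

A production router also has to consider replication lag, for example by sending a user's reads to the master for a short time after that user writes.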
9. Web Server
Apache: 49 machines, P4, 1 to 4 GB memory, 1U rack servers, Fedora Core 2. They run PHP and use the Turck PHP cache system to improve efficiency. These servers share a working directory over NFS to stay in sync.
References
Wikipedia Meta
Source Wikipedia server, Chinese