Wikipedia Technical Architecture Study Notes

Source: Internet
Author: User
Tags: mediawiki

Wikipedia is a global, multilingual, collaborative encyclopedia project based on wiki technology. It is also an online encyclopedia whose purpose is to provide a free encyclopedia to all of humanity: a dynamic, free, and global body of knowledge written in the language of each reader's choice.

Wikipedia's IT architecture is a valuable reference for anyone building websites, because the information Wikipedia publishes about it is detailed and well documented. The following is a summary of that architecture.

1. Wikipedia related data

  • Peak load: 30,000 HTTP requests per second

  • Traffic: 3 Gbit/s, or roughly 375 MB/s

  • 350 PC servers
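
The unit conversion behind the traffic figure, plus a rough per-server average, can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the per-server average assumes (unrealistically) that all 350 machines handle HTTP requests equally.

```python
# Figures quoted in the list above.
PEAK_REQUESTS_PER_SECOND = 30_000
TRAFFIC_GBIT_PER_SECOND = 3
SERVER_COUNT = 350


def gbit_to_megabytes(gbit: float) -> float:
    """Convert gigabits to megabytes (decimal units, 8 bits per byte)."""
    return gbit * 1000 / 8


def average_requests_per_server(requests: int, servers: int) -> float:
    """Naive average; assumes every server shares the HTTP load equally."""
    return requests / servers


print(gbit_to_megabytes(TRAFFIC_GBIT_PER_SECOND))  # 375.0
print(round(average_requests_per_server(PEAK_REQUESTS_PER_SECOND, SERVER_COUNT), 1))  # 85.7
```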

These figures come from the Wikimedia document ubunture.pdf.

2. System Architecture


3. GeoDNS

GeoDNS may be unfamiliar, but the principle is simple. It is a 40-line patch for BIND that takes the client's geographic region into account during DNS resolution, so that users are directed to the web servers closest to them.
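
The idea can be illustrated with a small Python sketch. This is not the BIND patch itself; the cluster names, the addresses (taken from the documentation-only IP ranges), and the country table are all hypothetical, and a real setup would derive the client's region from a GeoIP database.

```python
from typing import Dict

# Hypothetical cluster addresses (documentation-only IP ranges).
CLUSTERS: Dict[str, str] = {
    "north-america": "192.0.2.10",
    "europe": "198.51.100.10",
    "asia": "203.0.113.10",
}

# Hypothetical country-to-cluster table.
COUNTRY_TO_CLUSTER: Dict[str, str] = {
    "US": "north-america", "CA": "north-america",
    "DE": "europe", "FR": "europe",
    "JP": "asia", "KR": "asia",
}

# Fallback used when the client's country is not in the table.
DEFAULT_CLUSTER = "north-america"


def resolve_for_country(client_country: str) -> str:
    """Return the address of the cluster assigned to the client's country."""
    cluster = COUNTRY_TO_CLUSTER.get(client_country.upper(), DEFAULT_CLUSTER)
    return CLUSTERS[cluster]


print(resolve_for_country("de"))  # 198.51.100.10
print(resolve_for_country("ZZ"))  # 192.0.2.10 (unknown country, default cluster)
```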

4. Use LVS to achieve Load Balancing

Wikipedia uses LVS (Linux Virtual Server) for load balancing. LVS is a project started by Dr. Wensong Zhang, and it is one of the better-known Chinese contributions to open source. Monitoring has long been a pain point in maintaining LVS; Wikipedia's engineers use PyBal for it.
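
To make the monitoring problem concrete, here is a minimal Python sketch of a server pool that drops unhealthy machines from rotation. It is not PyBal's or LVS's actual code: the class names are hypothetical, the health check is an injected function, and the scheduler is a naive weighted round robin that simply repeats each server according to its weight.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class RealServer:
    name: str
    weight: int = 1
    healthy: bool = True


class MonitoredPool:
    """Pool of backends with health checks and naive weighted round robin."""

    def __init__(self, servers: Iterable[RealServer],
                 check: Callable[[RealServer], bool]) -> None:
        self.servers: List[RealServer] = list(servers)
        self.check = check  # returns True if the server is reachable
        self._cursor = 0

    def run_health_checks(self) -> None:
        """Mark each server healthy or unhealthy based on the check."""
        for server in self.servers:
            server.healthy = self.check(server)

    def next_server(self) -> RealServer:
        """Pick the next healthy server, repeating servers by weight."""
        schedule = [s for s in self.servers if s.healthy
                    for _ in range(s.weight)]
        if not schedule:
            raise RuntimeError("no healthy servers in the pool")
        server = schedule[self._cursor % len(schedule)]
        self._cursor += 1
        return server


# Hypothetical pool; the check pretends that "web-b" is down.
pool = MonitoredPool(
    [RealServer("web-a", weight=2), RealServer("web-b", weight=1)],
    check=lambda server: server.name != "web-b",
)
pool.run_health_checks()
print([pool.next_server().name for _ in range(3)])  # ['web-a', 'web-a', 'web-a']
```

In a real deployment the check would probe the server over the network, and the monitor would update the LVS table instead of an in-memory list.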

5. Use Lighttpd as an image server

Lighttpd is open-source software from a German-led project. Its goal is to provide a secure, fast, compatible, and flexible web server environment for high-performance websites. It has low memory overhead, low CPU usage, good performance, and a rich set of modules. Lighttpd is a lightweight web server that supports FastCGI, CGI, auth, output compression, URL rewriting, aliases, and other important features. Apache is popular largely because of its rich feature set, and Lighttpd implements many of the same features. This matters to Apache users, because anyone migrating to Lighttpd has to deal with feature differences.

6. Use MediaWiki software

The MediaWiki application layer is heavily optimized. A relatively low-overhead profiling method is used to locate code hotspots; the real-time performance reports give the details, and the tree view shows where the bottlenecks are. Another important lesson is to drop complex algorithms, expensive queries, and MediaWiki features that may bring excessive overhead.
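
As an illustration of low-overhead hotspot detection, the following Python sketch accumulates wall-clock time per named code section. It is not MediaWiki's profiler (MediaWiki is written in PHP); the class and section names are hypothetical, and the clock is injectable so the behaviour can be checked deterministically.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List, Tuple


class SectionProfiler:
    """Accumulates elapsed time and call counts per named section."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.total_seconds: Dict[str, float] = defaultdict(float)
        self.calls: Dict[str, int] = defaultdict(int)

    @contextmanager
    def section(self, name: str) -> Iterator[None]:
        """Time the enclosed block and add it to the section's total."""
        start = self._clock()
        try:
            yield
        finally:
            self.total_seconds[name] += self._clock() - start
            self.calls[name] += 1

    def hotspots(self, top: int = 3) -> List[Tuple[str, float]]:
        """Return the sections with the largest accumulated time."""
        ranked = sorted(self.total_seconds.items(),
                        key=lambda item: item[1], reverse=True)
        return ranked[:top]


profiler = SectionProfiler()
with profiler.section("parse"):
    sum(range(100_000))  # stand-in for real work
with profiler.section("render"):
    sum(range(1_000))
print(profiler.hotspots())
```

To keep the overhead small in production, such a profiler could be enabled for only a sample of requests rather than for every one.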

7. Extensive caching

The most important factor in Wikipedia's success as a website is caching. A CDN (in practice, a set of caches) distributes content across continents, with Squid as the reverse proxy. The database cache uses memcached on 30 servers with 2 GB each. Everything that can reasonably be cached is cached, but the engineers also warn that caching is not free and is not always the cheapest option: use it as much as possible, but do not overuse it.
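
The usual way an application uses such an object cache is the cache-aside pattern, sketched below in Python. This is an assumption about the general technique, not MediaWiki's code: `DictCache` is an in-memory stand-in for a memcached client, and the key format and loader function are hypothetical.

```python
from typing import Callable, Dict, Optional


class DictCache:
    """In-memory stand-in for a memcached client (get/set only)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def get_with_cache(cache: DictCache, key: str,
                   load_from_db: Callable[[str], str]) -> str:
    """Cache-aside read: try the cache first, fall back to the database."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # the expensive path
        cache.set(key, value)      # later reads are served from the cache
    return value


db_reads = []


def load_page(key: str) -> str:
    """Hypothetical database loader that records each call."""
    db_reads.append(key)
    return f"row for {key}"


cache = DictCache()
get_with_cache(cache, "page:1", load_page)
get_with_cache(cache, "page:1", load_page)
print(len(db_reads))  # 1, because the second read was a cache hit
```

The warning about overuse applies here too: every cached value has to be invalidated or expired when the underlying data changes, which this sketch does not handle.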

Squid: about 17 machines (P4, 3 to 4 GB of memory, 1U rack servers, Fedora Core 3). Squid mostly serves users who are not logged in, with a cache hit rate of 75%, which effectively reduces the load on Apache. Load balancing is done with round-robin DNS.
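
A quick calculation shows what a 75% hit rate means for the backend. This is a rough sketch that assumes, purely for illustration, that the full peak of 30,000 requests per second quoted in section 1 passes through Squid; the function name is hypothetical.

```python
def backend_requests(total_requests: float, cache_hit_rate: float) -> float:
    """Requests that miss the cache and must be served by the backend."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return total_requests * (1.0 - cache_hit_rate)


# Assumption: all 30,000 peak requests per second go through Squid.
print(backend_requests(30_000, 0.75))  # 7500.0
```

Under that assumption, only a quarter of the peak traffic would ever reach Apache.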

8. Use a MySQL database cluster

MediaWiki's database layer uses MySQL with the scaling approach common among Web 2.0 sites, which Wikipedia also follows: replication, read/write splitting, and so on. For load balancing at the database level inside the application, LoadBalancer.php provides a good reference.
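
Read/write splitting can be sketched in a few lines of Python. This is not a port of LoadBalancer.php; the class, the server names, and the weights are hypothetical, and the rule "anything that is not a SELECT goes to the master" is a deliberate simplification.

```python
import random
from typing import Dict, Optional


class ReplicatedDatabase:
    """Routes writes to the master and reads to a weighted-random replica."""

    def __init__(self, master: str, replica_weights: Dict[str, int],
                 rng: Optional[random.Random] = None) -> None:
        self.master = master
        self.replica_weights = dict(replica_weights)
        self._rng = rng or random.Random()

    def server_for(self, sql: str) -> str:
        """Return the name of the server that should run this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read or not self.replica_weights:
            return self.master  # writes, or no replicas configured
        names = list(self.replica_weights)
        weights = [self.replica_weights[name] for name in names]
        # Replicas with a higher weight receive proportionally more reads.
        return self._rng.choices(names, weights=weights, k=1)[0]


# Hypothetical topology: one master and two replicas.
db = ReplicatedDatabase("db-master", {"db-replica-1": 1, "db-replica-2": 3})
print(db.server_for("UPDATE page SET page_touched = 1"))   # db-master
print(db.server_for("SELECT * FROM page WHERE page_id = 1"))  # one of the replicas
```

A production balancer would also have to account for replication lag, for example by sending a user's reads to the master right after that user's own write.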

9. Web Server

Apache: 49 machines (P4, 1 to 4 GB of memory, 1U rack servers, Fedora Core 2). They run PHP and use the Turck MMCache PHP cache to improve efficiency. These servers share a working directory over NFS so that they stay in sync.

References
Wikipedia Meta

Source Wikipedia server, Chinese
