Exploring the architecture of medium and large websites

Source: Internet
Author: User

Many IT people have probably built their own home page at some point. More than 10 years ago a personal home page was very simple: most were built with FrontPage and consisted of static HTML pages with, at most, a few effects added. Technology has progressed remarkably in the 10 years since, and a website today is rarely just a handful of HTML pages. Take one example, the Chinese photo site yupoo.com, which ranks about 1,000 on Chinarank and about 5,000 on Alexa. It is not a big site, only a medium-sized one, yet it runs on more than 60 servers, and its architecture involves the web servers Lighttpd, Apache, and Nginx. If Yupoo, with its modest traffic, already needs 60 servers, the top-ranked sites run thousands. How to coordinate the workload among these servers, how to direct and schedule them centrally, and how to maintain the hardware are all tricky challenges.

Load Balancing:

Load balancing is essential for every medium to large site. A large website receives visits from tens of millions of unique IP addresses per day, which no single web server can handle, so the site must have multiple servers working together behind the scenes. Various load balancing techniques arose to meet this need.

The earliest form is DNS load balancing. The principle is simple: in the domain's DNS configuration, several IP addresses are mapped to the same domain name. When different users request the domain, each is given one of those addresses, so different users end up on different web servers and the burden on each server falls. In general the address is chosen at random. DNS load balancing was widely used early on because it is easy to set up, but it has problems. If one server fails, its address keeps being handed out until the next DNS refresh, so some users cannot reach the site in the meantime. It is also too random: for a period of time many visits may be directed to the same address while the others sit idle, leaving part of the pool overloaded. And if a server is busy because it is also running other applications, DNS has no knowledge of that and continues to hand out addresses evenly.
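The random selection described above, and its blindness to a failed server, can be illustrated with a small simulation. This is only a sketch: the addresses are hypothetical documentation-range values, and `resolve`, `simulate`, and `lost_requests` are names invented for the example, not part of any DNS software.

```python
import random
from collections import Counter

# Hypothetical addresses (documentation range) configured for one domain name.
ADDRESSES = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


def resolve(addresses, rng):
    """Return one address chosen at random, as a simple DNS balancer might."""
    return rng.choice(addresses)


def simulate(addresses, requests, seed=0):
    """Count how many of `requests` lookups land on each address."""
    rng = random.Random(seed)
    return Counter(resolve(addresses, rng) for _ in range(requests))


def lost_requests(counts, failed):
    """Requests sent to failed servers; DNS keeps returning their addresses."""
    return sum(counts[address] for address in failed)


counts = simulate(ADDRESSES, requests=30)
for address in ADDRESSES:
    print(address, counts[address])
print("lost while 192.0.2.11 is down:", lost_requests(counts, {"192.0.2.11"}))
```

With a small number of requests the per-address counts are usually uneven, and every request that resolves to the failed address is lost until the records are refreshed.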

A slightly more sophisticated approach is the reverse proxy: external requests arrive at a proxy server, which forwards them evenly to servers on the internal network. This method is widely used; yupoo.com, mentioned above, uses Nginx as its reverse proxy. Dedicated hardware is also available. For example, plentyoffish.com (the world's largest dating website) uses the ServerIron web switch from Foundry Networks as a hardware load balancer. ServerIron can handle 16,000,000 concurrent connections and provides server load balancing and cache switching. Foundry is not the only vendor of such hardware, and since large websites have ample budgets, they can choose other hardware load balancers as well. Finally, do not overlook basic software load balancing: Windows Server has this capability built in.
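The "forward evenly" step of a reverse proxy is often a simple round-robin rotation over the internal servers. The sketch below shows only that selection logic, not a real HTTP proxy and not how Nginx or ServerIron are implemented; the class name and backend URLs are assumptions made for illustration.

```python
import itertools


class RoundRobinProxy:
    """Chooses an internal server for each request in strict rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(list(backends))

    def pick_backend(self):
        """Return the next backend in the rotation."""
        return next(self._cycle)

    def forward(self, request_path):
        """Return the internal URL the request would be forwarded to."""
        return f"{self.pick_backend()}{request_path}"


# Hypothetical intranet servers sitting behind the proxy.
proxy = RoundRobinProxy(["http://10.0.0.1", "http://10.0.0.2", "http://10.0.0.3"])
for path in ["/", "/photos", "/login", "/about"]:
    print(proxy.forward(path))
```

Unlike random DNS selection, strict rotation gives every backend the same number of requests over time, and because the proxy sits in the request path it could also skip a backend it knows to be down.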

A very simple way to balance load is to set up mirror sites. Download sites such as Huajun Software Park and Sky Software use mirrors directly. The approach is straightforward and saves a lot of trouble. On Huajun Software Park, for example, visitors are offered a choice of networks such as China Telecom and China Netcom, and when downloading software, both Sky and Huajun have servers across China so that users can pick the closest one and get faster speeds. The drawback is that the choice has to be made manually every time. All in all, these load balancing methods let a large website spread its load evenly, so that no single server is under too much pressure.

CDN:

A CDN (Content Delivery Network) is another essential deployment for large websites. The principle is not hard to understand: page content is stored on cache servers closer to the user, which shortens the route and speeds up long-distance access. For example, a website hosted abroad is often very slow to load, because the path from the foreign site to a domestic client is long. A site that deploys a CDN, such as PlentyofFish.com, loads quickly, and the difference from a domestic site is not perceptible. CDNs fall into several categories depending on where the cache is located, and different websites choose among them according to their needs. CDNs are usually provided by independent CDN vendors. Take NetEase as an example. When I queried it on February 28, 2008, the same domain name resolved to many IP addresses, which indicates a CDN deployment for the home page.

C:\>nslookup www.163.com
Server:  ns.lnpta.net.cn
Address:  202.96.64.68

Non-authoritative answer:
Name:    www.cache.split.netease.com
Addresses:  202.108.9.37, 202.108.9.38, 202.108.9.39, 202.108.9.51
          202.108.9.52, 202.108.9.31, 202.108.9.32, 202.108.9.33, 202.108.9.34
          202.108.9.36
Aliases:  www.163.com

A simple personal site queried the same way is very unlikely to show a CDN. If you are interested, you can also look more closely at how a site's various second-level domains are served by a CDN.
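The same kind of lookup can be made from a script. The sketch below uses Python's standard `socket.gethostbyname_ex`, which returns the canonical name, the alias list, and the IPv4 address list for a host name; the helper names `resolve_all` and `has_multiple_addresses` are invented here. Note that several addresses only suggest that requests are spread over multiple servers; they do not prove that a CDN is in use.

```python
import socket


def resolve_all(hostname, resolver=socket.gethostbyname_ex):
    """Return (canonical name, aliases, sorted IPv4 addresses) for hostname.

    `resolver` is injectable so the function can be exercised without a
    network connection; by default the system resolver is used.
    """
    canonical, aliases, addresses = resolver(hostname)
    return canonical, list(aliases), sorted(addresses)


def has_multiple_addresses(hostname, resolver=socket.gethostbyname_ex):
    """True when the name resolves to more than one IPv4 address."""
    _, _, addresses = resolve_all(hostname, resolver)
    return len(addresses) > 1
```

Calling `resolve_all("www.163.com")` performs a live DNS query, so the addresses returned today will differ from the 2008 output shown above.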

Platform Design:

Large websites generally have very complex interactions between users and content and make a large number of database calls, so a sound database design is very important. Take plentyoffish.com again: it is actually a personal site, yet its traffic is enormous. The site has one main database and two search databases. Early on, its database design caused frequent problems and the database often blocked, so database optimization took up most of the webmaster's time. There is no shortcut to database optimization. In practice a database is rarely perfect from the start; it can only be designed for the specific needs at hand and improved where it falls short. Still, large sites share some practices, such as storing images in a separate image database and using static pages wherever possible to reduce database calls.
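The idea of using static pages to reduce database calls can be sketched as follows: a page is rendered from the database once, and the stored HTML is served for later requests until the underlying data changes. `StaticPageCache` and the `render` callback are hypothetical names for this illustration; a real site would more likely write the HTML to disk or to a cache server.

```python
class StaticPageCache:
    """Serves pre-rendered HTML so repeat requests skip the database."""

    def __init__(self, render):
        # `render` is any function that queries the database for a page
        # and returns its HTML (an assumption of this sketch).
        self._render = render
        self._pages = {}

    def get(self, page_id):
        """Return the page's HTML, rendering it only on the first request."""
        if page_id not in self._pages:
            self._pages[page_id] = self._render(page_id)
        return self._pages[page_id]

    def invalidate(self, page_id):
        """Drop the stored HTML after the underlying data has changed."""
        self._pages.pop(page_id, None)


# Usage: count how often the (simulated) database is actually queried.
db_calls = []

def render_from_db(page_id):
    db_calls.append(page_id)
    return f"<html><body>Profile {page_id}</body></html>"

cache = StaticPageCache(render_from_db)
for _ in range(1000):
    cache.get(42)
print("database calls for 1000 requests:", len(db_calls))
```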

Many large sites have deep technical strength and can develop their own platforms. Google.com, for example, has its own platform, built mainly on GFS, MapReduce, and BigTable. Because of the massive amount of data stored, ordinary database queries would be prohibitively expensive: every query touches an enormous number of pages, and a few thousand concurrent searches would be enough to bring the system down. The Google File System therefore stores pages compressed in its own format and serves retrieval from them. The whole system consists of more than 200 clusters, which are coordinated through MapReduce. Google is not alone; Baidu and other search sites have also developed their own platforms.
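The MapReduce programming model mentioned above can be illustrated with the classic word-count example. This is a single-process sketch of the map, shuffle, and reduce steps only; it is not Google's implementation, which distributes these phases across many machines, and all function names here are invented for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a collection of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["the web is big", "the web grows"]))
```

Because each document is mapped independently and each key is reduced independently, both phases can in principle run in parallel on different servers, which is what makes the model suitable for clusters.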

Hardware Configuration:

Does a large website necessarily have high-end hardware? The answer is no. Google, the world's largest website, runs its whole architecture on hundreds of thousands of ordinary PC-class servers. The details of Google's servers are a trade secret, but according to information Google has disclosed, it had 450,000 servers as of 2006, all very ordinary PC-class machines; even the hard disk interface was the outdated IDE. That is Google's own architectural decision. Wikipedia, by contrast, has very powerful servers, all with SCSI hard disks, and the main hosts have up to 6 drives and more than 16 GB of memory. The reason is easy to understand: Google has many data centers around the world and a large staff, and is fully capable of managing that many servers, whereas Wikipedia is a nonprofit that survives mainly on donations and has very few staff, so it must rely on powerful servers. Each site should configure its hardware according to its own situation. 1 TB SATA hard drives have now entered production, whereas two years ago 1 TB could only be reached through RAID. Hardware is updated at a startling pace, so even with an ample budget, servers should be configured for actual use, not necessarily with the best available components.

Summary:

The above is only an overview of large website architecture. Every site has its own peculiarities, so none of the points above is a hard rule. Twitter.com, for example, focuses on communication and is in essence an asynchronous chat room, so static pages are of no use to it. In short, website architecture has no fixed rules: whatever suits the site is a good architecture.
