Analysis of Domestic Large-Scale Portal Site Architecture: Static Website Architecture

Source: Internet
Author: User

"Analysis of Domestic Large-Scale Portal Site Architecture" is divided into two parts. The original author wrote it in 2004, but it remains a useful reference for the structure of today's large websites.

Sina and Sohu are household names in China, and each receives well over ten million hits a day. With traffic on that scale, the first question is how to use limited resources to give visitors the fastest possible access. Internet companies have left the money-burning stage and moved on to sustainable growth, so every sum spent has to pay off. At the same time, the technical staff have to rack their brains to make sure the site is never unreachable or painfully slow. If it were, even the best editors and salespeople would struggle to sell advertising, and the company would be headed for closure. None of this has happened, of course, because their engineers make full use of the available resources and push them to the limit.

In the final analysis, Squid is used as the web cache server, while Apache provides the real web service behind Squid. Such an architecture only works if most pages are static, which requires cooperation from the programmers: pages must be converted to static pages before they are returned to the client. That is the basic architecture. Below is how I worked it out, followed by the specific architecture:

Tool 1: nslookup

In practice:

nslookup www.sina.com.cn
Server:  ns-px.online.sh.cn
Address:  202.96.209.5

Non-authoritative answer:
Name:    taurus.sina.com.cn
Addresses:  61.172.201.230, 61.172.201.231, 61.172.201.232, 61.172.201.233
          61.172.201.221, 61.172.201.222, 61.172.201.223, 61.172.201.224, 61.172.201.225
          61.172.201.226, 61.172.201.227, 61.172.201.228, 61.172.201.229
Aliases:  www.sina.com.cn, jupiter.sina.com.cn

Here you can see that Sina's home page uses a large number of IPs. At first glance one might think Sina simply has deep pockets. But keep reading:

nslookup news.sina.com.cn
Server:  ns-px.online.sh.cn
Address:  202.96.209.5

Non-authoritative answer:
Name:    taurus.sina.com.cn
Addresses:  61.172.201.228, 61.172.201.229, 61.172.201.230, 61.172.201.231
          61.172.201.232, 61.172.201.233, 61.172.201.221, 61.172.201.222, 61.172.201.223
          61.172.201.224, 61.172.201.225, 61.172.201.226, 61.172.201.227
Aliases:  news.sina.com.cn, jupiter.sina.com.cn

A careful reader will notice that the news channel has the same number of IPs as the home page, and that the addresses themselves are exactly the same. In other words, in Sina's DNS these IPs are the A records of the name taurus.sina.com.cn, while news, sports, jczs.news ... are all CNAME records. DNS is used to do automatic round-robin polling. If you are not convinced, here is one more, the sports channel:

nslookup sports.sina.com.cn
Server:  ns-px.online.sh.cn
Address:  202.96.209.5

Non-authoritative answer:
Name:    taurus.sina.com.cn
Addresses:  61.172.201.222, 61.172.201.223, 61.172.201.224, 61.172.201.225
          61.172.201.226, 61.172.201.227, 61.172.201.228, 61.172.201.229, 61.172.201.230
          61.172.201.231, 61.172.201.232, 61.172.201.233, 61.172.201.221
Aliases:  sports.sina.com.cn, jupiter.sina.com.cn
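
The three Sina answers can be compared mechanically. The following is a minimal Python sketch, not part of the original analysis: the helper names (expand, is_rotation) are made up for illustration, and the address order is copied from the nslookup runs above. It checks that the three channels return the same address set and that each answer is just a rotation of the others, which is what DNS round-robin would produce.

```python
def expand(prefix, last_octets):
    """Build dotted-quad addresses from a shared prefix and a list of final octets."""
    return [f"{prefix}.{octet}" for octet in last_octets]


def is_rotation(a, b):
    """Return True if list b equals list a rotated by some offset."""
    if len(a) != len(b):
        return False
    if not a:
        return True
    return any(a[i:] + a[:i] == b for i in range(len(a)))


PREFIX = "61.172.201"
# Answer order as printed by nslookup for each channel.
WWW = expand(PREFIX, [230, 231, 232, 233, 221, 222, 223, 224, 225, 226, 227, 228, 229])
NEWS = expand(PREFIX, [228, 229, 230, 231, 232, 233, 221, 222, 223, 224, 225, 226, 227])
SPORTS = expand(PREFIX, [222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 221])

if __name__ == "__main__":
    print("same address set:", set(WWW) == set(NEWS) == set(SPORTS))
    print("news is a rotation of www:", is_rotation(WWW, NEWS))
    print("sports is a rotation of www:", is_rotation(WWW, SPORTS))
```

All three checks come out true for the data above: 13 addresses, always in the same cyclic order, with only the starting point changing from one query to the next.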

You can try the other channels yourself. Now let's look at Sohu:

nslookup www.sohu.com
Server:  ns-px.online.sh.cn
Address:  202.96.209.5

Non-authoritative answer:
Name:    pagegrp1.sohu.com
Addresses:  61.135.132.172, 61.135.132.173, 61.135.132.176, 61.135.133.109
          61.135.145.47, 61.135.150.65, 61.135.150.67, 61.135.150.69, 61.135.150.74
          61.135.150.75, 61.135.150.145, 61.135.131.73, 61.135.131.91, 61.135.131.180
          61.135.131.182, 61.135.131.183, 61.135.132.65, 61.135.132.80
Aliases:  www.sohu.com

--------------------------------------------

nslookup news.sohu.com
Server:  ns-px.online.sh.cn
Address:  202.96.209.5

Non-authoritative answer:
Name:    pagegrp1.sohu.com
Addresses:  61.135.150.145, 61.135.131.73, 61.135.131.91, 61.135.131.180
          61.135.131.182, 61.135.131.183, 61.135.132.65, 61.135.132.80, 61.135.132.172
          61.135.132.173, 61.135.132.176, 61.135.133.109, 61.135.145.47, 61.135.150.65
          61.135.150.67, 61.135.150.69, 61.135.150.74, 61.135.150.75
Aliases:  news.sohu.com

The picture is the same as Sina's. On the surface Sohu has more IPs than Sina, so does Sohu put more servers behind each channel than Sina does? Not necessarily: a single server can bind multiple IPs, so the number of IPs does not tell you how many servers are in use.
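
The same comparison can be scripted instead of read off nslookup output by eye. The sketch below uses Python's standard socket.gethostbyname_ex, which returns the canonical name, the alias list and the IPv4 address list for a host name. The functions lookup and share_address_pool, and the stub resolver, are hypothetical names introduced here. The stub replays an abbreviated version of the Sohu answers from 2004, since a live query today would return different records.

```python
import socket


def lookup(hostname, resolver=socket.gethostbyname_ex):
    """Resolve hostname; return its canonical name, aliases and IPv4 addresses."""
    canonical, aliases, addresses = resolver(hostname)
    return {"name": canonical, "aliases": aliases, "addresses": addresses}


def share_address_pool(hostnames, resolver=socket.gethostbyname_ex):
    """True if all hostnames resolve to one canonical name and one address set."""
    results = [lookup(host, resolver) for host in hostnames]
    names = {result["name"] for result in results}
    pools = {frozenset(result["addresses"]) for result in results}
    return len(names) == 1 and len(pools) == 1


def stub_resolver(hostname):
    """Offline stand-in that replays (abbreviated) answers from the article."""
    pool = ["61.135.132.172", "61.135.132.173", "61.135.132.176", "61.135.133.109"]
    rotated = pool[2:] + pool[:2]  # same addresses, different starting point
    answers = {
        "www.sohu.com": ("pagegrp1.sohu.com", ["www.sohu.com"], pool),
        "news.sohu.com": ("pagegrp1.sohu.com", ["news.sohu.com"], rotated),
    }
    return answers[hostname]


if __name__ == "__main__":
    hosts = ["www.sohu.com", "news.sohu.com"]
    print(share_address_pool(hosts, resolver=stub_resolver))
```

Passing no resolver argument makes the functions query real DNS. Either way, the result says only whether the names share one address pool, and nothing about how many physical servers sit behind it.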

From these experiments we can basically conclude that Sina and Sohu use the same technique for their channels: Squid listens on port 80 of these IPs, and the real web server listens on another port. Users notice no difference, but compared with connecting clients directly to the web server, this approach saves considerable bandwidth and server capacity, and access also feels faster to the user.
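
To make the saving concrete, here is a toy Python model of a cache sitting in front of an origin server. It is only a sketch of the idea, not how Squid is implemented: the class name CachingFrontEnd, the 60-second lifetime and the origin function are all assumptions made for illustration.

```python
import time


class CachingFrontEnd:
    """Toy stand-in for a reverse-proxy cache in front of an origin web server."""

    def __init__(self, fetch_from_origin, ttl_seconds=60, clock=time.monotonic):
        self._fetch = fetch_from_origin  # callable: url -> page body
        self._ttl = ttl_seconds          # how long a cached page stays fresh
        self._clock = clock              # injectable so tests can control time
        self._store = {}                 # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url):
        """Serve from cache while fresh; otherwise ask the origin and cache it."""
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = (now + self._ttl, body)
        return body


if __name__ == "__main__":
    origin_calls = []

    def origin(url):
        # Hypothetical origin: records each request it actually has to serve.
        origin_calls.append(url)
        return f"<html>static page for {url}</html>"

    front = CachingFrontEnd(origin, ttl_seconds=60)
    for _ in range(1000):
        front.get("/index.html")
    print("hits:", front.hits, "misses:", front.misses, "origin requests:", len(origin_calls))
```

With a static page and a one-minute lifetime, a thousand back-to-back requests cost the origin a single fetch. This is why the scheme depends on most pages being static.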

Of course, a handful of DNS records is not enough to assert that they use Squid as a front-end cache. You can access one of the IPs directly to check. The result was shown in a screenshot (an error page generated by Squid):

This shows that Sina's DNS points many IPs at the domain name sqsh-19.sina.com.cn, while all the other channels of the same kind are just aliases of sqsh-19.sina.com.cn, specified with CNAME records. The DNS must be set up this way, and the servers then listen on port 80 through Squid 2.5.STABLE5 (the latest stable version is STABLE6). All of this is based on analysis of several pieces of information and should be basically correct. What follows is my personal guesswork:

Its real web server also listens on port 80, because one line of the Squid configuration file is:

httpd_accel_port 80

If it were set to another port number (for example, 88), the error message in the screenshot would instead read:

While trying to retrieve the URL: http://61.172.201.19:88

Tool 2: nmap, a port scanner that can be used to check which ports a server has open.

I used nmap to scan one of Sina's IPs, 61.172.201.19, for analysis.

bash-2.05$ nmap 61.172.201.19
Starting nmap 3.50 ( http://www.insecure.org/nmap/ ) at 2004-07-30 13:31 GMT
Interesting ports on 61.172.201.19:
(The 1657 ports scanned but not shown below are in state: filtered)
PORT   STATE SERVICE
22/tcp open  ssh
80/tcp open  http

Nmap run completed -- 1 IP address (1 host up) scanned in 73.191 seconds

As you can see, it has only two ports open. Port 80 is the one opened by Squid, as just verified. Port 22 is for SSH remote connections, used mainly by the system administrators to operate the server remotely in a highly secure way.
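
For readers without nmap at hand, a plain TCP connect check gives a rough equivalent for a short list of ports. This is a minimal Python sketch under stated assumptions: open_tcp_ports is a made-up helper, it only tries full TCP connections, and unlike nmap it cannot tell a filtered port from a closed one. The demo scans a listener on the local machine; only point it at hosts you administer.

```python
import socket


def open_tcp_ports(host, ports, timeout=1.0):
    """Return the ports on host that accept a TCP connection within timeout."""
    found = []
    for port in ports:
        try:
            # A completed three-way handshake means something is listening.
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            # Refused, timed out or unreachable: treat as not open.
            pass
    return found


if __name__ == "__main__":
    # Demo against a throwaway listener on the loopback interface.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    print("open ports:", open_tcp_ports("127.0.0.1", [port]))
    listener.close()
```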

Tool 3: lynx, or any other tool or program that can read HTTP headers. An example makes this easier to understand:

HTTP/1.0 200 OK
Date: Fri, June 05:49:47 GMT
Server: Apache/2.0.49 (Unix)
Last-Modified: Fri, June 05:48:16 GMT
Accept-Ranges: bytes
Vary: Accept-Encoding
Cache-Control: max-age=60
Expires: Fri, June 05:50:47 GMT
Content-Length: 180747
Content-Type: text/html
Age: 37
X-Cache: HIT from sqsh-230.sina.com.cn
Connection: close

The above is the HTTP header information returned by Sina. It contains a lot of valuable details. For example, the Apache behind Squid is version 2.0.49, an expiration time of one minute is set (Cache-Control: max-age=60, with Expires one minute after Date), and a last-modified time is sent. These depend on modules loaded when Apache is compiled. Last-Modified in particular needed a small change to the source code, at least in my setup.
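
The cache-related headers can also be read programmatically. The sketch below is illustrative only: parse_headers and cache_summary are hypothetical helper names, and the sample contains just the cache-relevant lines from the response above. It reports whether the page came from the cache, which cache host served it, and how many seconds remain before the cached copy goes stale (max-age minus Age).

```python
SAMPLE = """\
Server: Apache/2.0.49 (Unix)
Cache-Control: max-age=60
Content-Length: 180747
Age: 37
X-Cache: HIT from sqsh-230.sina.com.cn
"""


def parse_headers(raw):
    """Parse 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def cache_summary(headers):
    """Extract what the response reveals about the cache and the origin server."""
    max_age = None
    for directive in headers.get("cache-control", "").split(","):
        key, _, value = directive.strip().partition("=")
        if key.lower() == "max-age" and value.isdigit():
            max_age = int(value)
    age = int(headers["age"]) if headers.get("age", "").isdigit() else None
    x_cache = headers.get("x-cache", "")
    remaining = max_age - age if max_age is not None and age is not None else None
    return {
        "served_from_cache": x_cache.upper().startswith("HIT"),
        "cache_host": x_cache.split()[-1] if x_cache else None,
        "origin_server": headers.get("server"),
        "max_age": max_age,
        "age": age,
        "seconds_until_stale": remaining,
    }


if __name__ == "__main__":
    for key, value in cache_summary(parse_headers(SAMPLE)).items():
        print(f"{key}: {value}")
```

For the sample, the copy had been in the cache on sqsh-230.sina.com.cn for 37 seconds and had 23 seconds of freshness left. The Server header still names the Apache behind the cache, because Squid passes the origin's headers through.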

Summary

Sina's architecture should have Squid at the front. Given current servers (2U, 2 GB of memory), each server can generally run at least four instances of Squid 2.5.STABLE5, so four servers would cover 16 IPs. The layer behind is Apache 2.0.49, probably on two machines. Both probably use private IPs, specified in the hosts file of the Squid servers in front. I will write up the specific implementation in my experiment notes next time. Apache's htdocs may live on one or two disk arrays exported over NFS. The Apache servers should mount the NFS server read-only, and there will be another server used as the editing server, where the editors update articles. That server should have write permission on the NFS server.

That is the complete setup Sina uses. Of course, much of it is guesswork. I have had no contact with Sina's technical staff (I don't know any of them), otherwise I would not have written this up. Sohu, 163 and others should have similar architectures.

Final note: this is only the architecture of the channels made up of static pages. Sina has many other servers, for downloads, blogs, search engines, pictures, forums and so on, that are not part of this architecture.
