Solutions for high-volume, high-concurrency websites

Source: Internet
Author: User

With the rapid informatization of large IT enterprises in China, the data volume and traffic of most applications have increased sharply. Large enterprise websites are under pressure to deliver performance and high-volume data access, and face higher requirements for storage, security, and information retrieval.

In this article, I draw on the success stories of several large foreign IT companies and websites to explore, from a web technician's perspective, how large domestic websites can actively respond to growth. The focus is on technical aspects; management, sales, and similar topics are touched on less.

I. Success stories of large foreign IT websites

(1) MySpace

Today, MySpace has become the most-visited community site in the world. First-class marketing and management experience is naturally a primary factor in any IT enterprise's success, but in this section we set that aside and focus on how MySpace responded, on the technical side, to the crises raised by repeated system expansion.

First-generation architecture: add more web servers

MySpace's initial system was small, with only two web servers (sharing the work of processing user requests) and one database server (all data was stored in this one place). At the time, the machines were Dell systems with dual CPUs and 4 GB of memory. In the early stages, MySpace dealt with rapid user growth mainly by adding web servers. But by early 2004, after the number of MySpace users grew to 500,000, the database server was already struggling to keep up.

Second-generation architecture: add database servers

Unlike adding web servers, adding databases is not easy. If a site is backed by multiple databases, the designer must work out how to spread the load across them while preserving data consistency. MySpace ran on three SQL Server database servers: one primary, to which all new data was written and from which it was then copied to the other two, and two others dedicated to serving data to users for blog and profile display. This approach worked well for a time: as long as database servers were added, growth in users and traffic could be absorbed.
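
The one-primary, two-replica layout described above can be sketched roughly as follows. This is a minimal illustration rather than MySpace's implementation: plain dictionaries stand in for the database servers, replication is done synchronously for simplicity, and the class and key names are hypothetical.

```python
import itertools


class ReplicatedDatabase:
    """Routes writes to one primary and spreads reads over read-only replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next_replica = itertools.cycle(self.replicas)

    def write(self, key, value):
        # All new data goes to the primary first...
        self.primary[key] = value
        # ...and is then copied to every replica (synchronously here, for simplicity).
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads rotate across the replicas, so the primary is spared read traffic.
        return next(self._next_replica).get(key)


db = ReplicatedDatabase(primary={}, replicas=[{}, {}])
db.write("profile:42", "Tom")
print(db.read("profile:42"))  # Tom
```

In a real deployment replication is usually asynchronous, so a replica may briefly return stale data; that is the consistency concern the paragraph above refers to.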

This time, the database architecture was designed as a vertical partition: different databases served different functions of the site, such as login, user profiles, and blogs. Vertical partitioning made it easy to spread access load across multiple databases, and when users demanded new features, MySpace only needed to bring a new database online to support them. After reaching 2 million accounts, MySpace also switched from storage devices attached directly to the database servers to a SAN (storage area network), in which a high-bandwidth, purpose-built network connects a large number of disk storage devices and the databases connect to the SAN. This greatly improved system performance, uptime, and reliability. However, when the user count climbed to 3 million, the vertical partitioning strategy became difficult to maintain.
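
As a rough sketch of vertical partitioning, the routing can be as simple as a lookup from site feature to the database that owns it. The database names and the `add_feature` helper below are hypothetical and only illustrate the idea.

```python
# Hypothetical mapping from site feature to the database that owns it.
FEATURE_DATABASES = {
    "login": "db-login",
    "profile": "db-profile",
    "blog": "db-blog",
}


def database_for(feature):
    """Return the name of the database that serves the given site feature."""
    try:
        return FEATURE_DATABASES[feature]
    except KeyError:
        raise ValueError(f"no database configured for feature {feature!r}") from None


def add_feature(feature, database):
    """Supporting a new feature only requires registering one more database."""
    FEATURE_DATABASES[feature] = database


add_feature("music", "db-music")
print(database_for("blog"))   # db-blog
print(database_for("music"))  # db-music
```

The weakness is also visible here: one feature cannot grow beyond what its single database can handle, which is why this scheme eventually became hard to maintain.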

Third-generation architecture: move to a distributed computing architecture

After several rounds of this, MySpace eventually turned to a distributed computing architecture, in which many physically distributed servers must logically behave as a single machine. On the database side, the application could no longer be split across different databases as before; the entire site had to be treated as one application. The data model now contained only one user table, and the data supporting blogs, profiles, and other core features was stored in the same logical database.

Once all the core data was logically organized into one database, MySpace had to find a new way to share the load; obviously, a single database server running on commodity hardware could not cope. This time, instead of dividing databases by site function and application, MySpace began dividing its users into groups of one million and storing each group in a separate SQL Server instance. Currently, each MySpace database server actually runs two SQL Server instances, which means each server serves approximately 2 million users. According to MySpace's technical staff, this model can later be partitioned at a finer granularity to further optimize load sharing.
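
A minimal sketch of this kind of user-range partitioning, assuming sequential, 0-based user IDs and groups of one million users per instance (the group size that two instances serving about 2 million users per server implies). The function name is hypothetical.

```python
USERS_PER_INSTANCE = 1_000_000   # assumed group size
INSTANCES_PER_SERVER = 2         # from the text: two SQL Server instances per server


def locate_user(user_id):
    """Map a user ID (0-based, assumed sequential) to (server index, instance slot)."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    instance = user_id // USERS_PER_INSTANCE
    server, slot = divmod(instance, INSTANCES_PER_SERVER)
    return server, slot


print(locate_user(0))          # (0, 0)
print(locate_user(1_500_000))  # (0, 1)
print(locate_user(2_000_000))  # (1, 0)
```

Shrinking `USERS_PER_INSTANCE` is one way to express the finer-grained partitioning mentioned above, at the cost of moving existing users between instances.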

Fourth-generation architecture: Microsoft solutions

In early 2005, with 9 million accounts, MySpace began writing ASP.NET programs in Microsoft's C#. After some initial success, MySpace began a large-scale migration to ASP.NET.

When accounts reached 10 million, MySpace again hit a storage bottleneck. The introduction of the SAN had solved some early performance problems, but the site's demands had begun to periodically exceed the SAN's I/O capacity, that is, the maximum speed at which it could read and write data on the disk storage system.

Fifth-generation architecture: add a data cache layer and move to SQL Server 2005 with 64-bit processor support

In the spring of 2005, MySpace reached 17 million accounts and adopted a new strategy to relieve pressure on the storage system: adding a data cache layer between the web servers and the database servers. Its sole function was to keep in-memory copies of frequently requested data objects, so that web applications could be served data without a database access.
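
The cache layer described above follows what is commonly called the cache-aside pattern. The sketch below is a single-process illustration under that assumption; `CachedStore` and the dictionary standing in for the database are hypothetical, and a production cache tier would run as separate servers.

```python
class CachedStore:
    """Cache-aside layer: serve from memory when possible, else load from the database."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db   # callable: key -> value
        self._cache = {}
        self.db_hits = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]          # no database access needed
        self.db_hits += 1
        value = self._load_from_db(key)
        self._cache[key] = value             # keep a copy in memory for later requests
        return value

    def invalidate(self, key):
        """Drop a cached copy, for example after the underlying row changes."""
        self._cache.pop(key, None)


database = {"user:1": {"name": "Tom"}}       # stand-in for the real database
store = CachedStore(database.__getitem__)
store.get("user:1")
store.get("user:1")
print(store.db_hits)  # 1
```

The second request is answered from memory, which is exactly the reduction in storage load the cache layer is meant to provide.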

In mid-2005, when the number of accounts reached 26 million, MySpace's hunger for memory led it to switch to SQL Server 2005, which supported 64-bit processors and was still in beta testing. After upgrading to SQL Server 2005 and 64-bit Windows Server 2003, MySpace equipped each server with 32 GB of memory, and in 2006 raised the standard configuration to 64 GB.

In fact, MySpace's web servers and databases were still frequently overloaded, and its users often ran into messages such as "Unexpected error" and "Site offline for maintenance", which led to constant complaints on the forums.

MySpace reached where it is today step by step, by continually rebuilding its site software, databases, and storage systems in this way.

In fact, MySpace has successfully solved many system scalability problems, and there is considerable experience here worth borrowing. The MySpace system architecture has remained relatively stable so far, but its technicians are still working on issues such as the limit on the number of simultaneous connections SQL Server supports, trying to get as much right as possible.

(2) Amazon

Amazon's bookstore is undoubtedly a milestone in the development of e-commerce. From 2000 to the present, the internet industry worldwide has been through dramatic change, and Amazon was once the number one symbol of the dot-com bubble. Today, this "biggest bubble" has used a few easy-to-understand numbers to turn itself into a solid IT giant.

The lesson of Amazon's development is that it creatively explored every aspect of e-commerce, including system platform construction, software development, website construction, and the distribution system. In the words of Amazon's head, Bezos: "In the physical world, a store's most powerful weapons are location, location, location; for me, the three most important things are technology, technology, technology."

(3) eBay

eBay is a world-renowned auction site. Kevin Pursglove, head of communications at eBay, says: "The most important reason for eBay's success is the company's management and service."

The reasons for its success can be listed as follows:

① Daring to be first: at a time when the internet was not yet widespread, eBay was the first to enter the field of online auctions.

② The "zero inventory" unique to a virtual marketplace is another important reason for eBay's success. The company's core business carries no inventory risk: all goods are provided by customers, and eBay is responsible only for providing the virtual auction platform, namely the network and software. As a result, items such as "inventory costs" and "storage costs" do not appear in eBay's financial statements.

③ Since its founding, eBay has adhered to two "golden principles": building a virtual community that makes users feel at home, and ensuring the stable and safe operation of the website.

II. Suggestions for developing large domestic websites

Starting from this section, we combine the hard lessons and successful experience of large IT websites at home and abroad in scaling their technology to explore how domestic websites should respond to the growth in data and traffic they will face in the Web 2.0 era that is just beginning, and we put forward a number of strategies and recommendations for reference.

(4) Building a sound system architecture

Building a large commercial website is never as simple as building a small, ordinary one. It requires careful planning from a rigorous software engineering management perspective and development in logical, well-defined steps. The technologies involved in a large website are extremely broad, with demanding requirements that span hardware, software, programming languages, databases, web servers, firewalls, and more; it is not comparable to the simple static HTML sites of the past. Take the well-known Yahoo! as an example: each of its large website projects requires a large number of specialists in the relevant fields.

(5) Static pages

Do not underestimate pure static HTML pages! In many cases static HTML is the most efficient option with the lowest resource consumption, so we should make as many pages on a site static as possible. However, for sites with large amounts of frequently updated content, we cannot produce every page by hand, so automated update tools are developed for the purpose, such as the common content management system (CMS). The news channels of the portal sites we visit often, and even their other channels, are managed and published through such a system. A CMS can turn the simplest information entry into automatically generated static pages, and can also provide channel management, permission management, automatic content capture, and other functions. For a large website, an efficient, manageable CMS is essential.
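
The core of such a publishing tool, turning an entered article into a static HTML file, might look like the sketch below. The page template, the article fields (`slug`, `title`, `body`), and the `publish` function are assumptions made for illustration; a real CMS adds channels, permissions, and scheduling on top.

```python
import html
import tempfile
from pathlib import Path
from string import Template

# Minimal page template; $title and $body are filled in at publish time.
PAGE = Template("<html><head><title>$title</title></head>"
                "<body><h1>$title</h1><p>$body</p></body></html>")


def publish(article, out_dir):
    """Render one article (a dict with slug, title, body) to a static HTML file."""
    page = PAGE.substitute(title=html.escape(article["title"]),
                           body=html.escape(article["body"]))
    path = Path(out_dir) / f"{article['slug']}.html"
    path.write_text(page, encoding="utf-8")
    return path


out_dir = tempfile.mkdtemp()
path = publish({"slug": "hello", "title": "Hello", "body": "Traffic & growth"}, out_dir)
print(path.read_text(encoding="utf-8"))
```

Once written, the file can be served directly by the web server with no application code or database involved in the request.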

(6) Storage issues

Storage is also a major problem. One aspect is the storage of small files, such as images; the other is the storage of large files, such as search engine indexes.

As is well known, for web servers, whether Apache, IIS, or another container, images consume the most resources, so it is necessary to separate images from pages. Virtually all large websites adopt this strategy: they have dedicated image servers, often many of them. This architecture reduces the load on the servers that handle page requests and ensures that the system does not crash because of image problems. The application servers and image servers can also be configured and tuned differently to achieve lower resource consumption and higher execution efficiency.
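
On the page-generation side, separating images can be as simple as rewriting image paths to point at a pool of image hosts. The sketch below assumes three hypothetical hosts under `example.com` and picks one by a stable checksum of the path, so a given image always maps to the same host.

```python
import zlib

# Hypothetical image hosts; a real deployment would use its own domains.
IMAGE_HOSTS = ["img1.example.com", "img2.example.com", "img3.example.com"]


def image_url(path):
    """Build the URL for an image, always choosing the same host for the same path."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    index = zlib.crc32(path.encode("utf-8")) % len(IMAGE_HOSTS)
    return f"http://{IMAGE_HOSTS[index]}/{path.lstrip('/')}"


print(image_url("/avatars/42.jpg"))
```

Keeping the mapping stable matters because browsers and intermediate caches key on the full URL.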

(7) Database technology: clustering and database/table hashing

A large website inevitably needs large database servers. Even so, under heavy traffic the database still becomes a bottleneck, and a single database soon cannot meet the application's needs, so we need database clustering or database/table hashing.

For database clustering, many database vendors have their own solutions: Oracle, Sybase, SQL Server, and others all have good offerings, and the master/slave replication provided by the commonly used MySQL is a similar scheme. Whichever database you use, refer to the corresponding solution to implement clustering.

Because database clustering is constrained by the database product in architecture, cost, and scalability, we also need to consider improving the system architecture from the application side, where database/table hashing is a common and highly effective approach. We separate databases by business, application, or functional module, so that different modules map to different databases or tables; then, following some strategy, the data behind a given page or function is hashed into smaller databases or tables. The user table, for example, can be hashed by user ID. This improves system performance at low cost and scales well. A ready example is Sohu's forum, which uses this kind of architecture: it separates users, settings, posts, and other information into different databases, then hashes posts and users into databases and tables by board and ID. In the end, a simple change to a configuration file lets the system add a low-cost database at any time to boost performance.
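
A minimal sketch of hashing a user table by user ID, driven by configuration. The database names, table naming scheme, and `locate_user_row` function are hypothetical, and simple modulo hashing is assumed; the source does not say which hashing scheme Sohu uses.

```python
# Hypothetical configuration; in practice this would live in a config file.
CONFIG = {
    "user_databases": ["user_db_0", "user_db_1"],
    "tables_per_database": 4,
}


def locate_user_row(user_id, config=CONFIG):
    """Return (database, table) holding the row for user_id, using modulo hashing."""
    databases = config["user_databases"]
    tables = config["tables_per_database"]
    database = databases[user_id % len(databases)]
    table = f"users_{(user_id // len(databases)) % tables}"
    return database, table


print(locate_user_row(7))   # ('user_db_1', 'users_3')
print(locate_user_row(10))  # ('user_db_0', 'users_1')
```

One caveat with plain modulo hashing: adding a database to the list changes where existing IDs map, so existing rows must be migrated or a lookup table kept for older users.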

(8) Caching strategy

Caching is not just a matter of low-level cache programming. It should be approached from the overall architecture first, with an in-depth study of the web servers and database servers, and only then the low-level cache programming. Different web servers, database servers, and web programming languages each have their own caching strategies. On the database side, for example, there is the proactive caching mechanism in SQL Server 2005, Oracle's cache group technology, and Hibernate's caches, which include the Session cache and the SessionFactory cache. For web servers, Apache provides its own cache modules and can also be paired with Squid for caching, both of which can effectively improve Apache's response capability; IIS has its own caching technology. As for web development languages, the caching techniques differ: ASP.NET 2.0, for example, offers two strategies, application data caching and page output caching, which are independent but not mutually exclusive; PHP has the PEAR Cache module; and so on.
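
To make the idea of page output caching concrete, here is a small time-based cache sketch. It is not the API of any of the products named above; `TTLCache`, `render_home`, and the injectable clock are hypothetical and exist only so the example is self-contained and testable.

```python
import time


class TTLCache:
    """Tiny time-based cache: entries expire ttl seconds after they are stored."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._entries = {}   # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                       # still fresh
        value = compute()
        self._entries[key] = (now + self.ttl, value)
        return value


renders = []


def render_home():
    renders.append(1)                             # count how often we really render
    return "<html>home</html>"


fake_now = [0.0]                                  # controllable clock for the demo
cache = TTLCache(ttl=60, clock=lambda: fake_now[0])
cache.get_or_compute("/", render_home)
cache.get_or_compute("/", render_home)            # served from cache
fake_now[0] = 61.0
cache.get_or_compute("/", render_home)            # expired, rendered again
print(len(renders))  # 2
```

The time-to-live trades freshness for load: a longer value means fewer renders but staler pages.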

(9) Mirroring

Mirroring is often used by large websites to improve performance and data security. It can address the differences in access speed caused by users' different network providers and geographic locations. We will not go into the technical details of mirroring here; there are many professional, ready-made architectures and products to choose from, as well as inexpensive software-based approaches, such as rsync and similar tools on Linux.

(10) Load balancing

Load balancing is the ultimate solution for large websites facing heavy load and large numbers of concurrent requests. Load balancing technology has developed over many years, and there are many professional vendors and products to choose from. Among LAMP-based solutions, lighttpd + Squid is a good way to handle load balancing and acceleration.
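
The simplest load balancing policy is round robin, sketched below with a basic notion of skipping unhealthy backends. This is an illustrative sketch only: the backend names are hypothetical, and real balancers also offer policies such as weighted or least-connections selection.

```python
class RoundRobinBalancer:
    """Hands out backends in rotation, skipping any marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._position = 0

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def next_backend(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            backend = self.backends[self._position]
            self._position = (self._position + 1) % len(self.backends)
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["web1", "web2", "web3"])
lb.mark_down("web2")
print([lb.next_backend() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```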

(11) Hardware layer-4 switching

Layer-4 switching uses the header information of layer-3 and layer-4 packets to identify traffic flows by application session, and assigns all the traffic of a session to an appropriate application server for processing. A layer-4 switch works like a virtual IP that points to physical servers. The traffic it carries may follow a variety of protocols, such as HTTP, FTP, NFS, Telnet, or others. Distributing this traffic across the physical servers requires complex load-balancing algorithms. In the IP world, the service type is determined by the destination TCP or UDP port, and an application session in layer-4 switching is determined jointly by the source and destination IP addresses and the TCP and UDP ports.
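
One simple way to keep a whole session on one server is to hash the fields that identify it. The sketch below assumes that approach; the server addresses, the virtual IP, and `pick_server` are hypothetical, and commercial switches typically combine connection tables with more elaborate algorithms.

```python
import hashlib

SERVERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]   # hypothetical physical servers


def pick_server(src_ip, src_port, dst_ip, dst_port, protocol):
    """Choose a server from the flow's addresses, ports, and protocol.

    Every packet of one flow carries the same values, so the whole flow
    lands on the same server.
    """
    flow = f"{protocol}|{src_ip}:{src_port}|{dst_ip}:{dst_port}"
    digest = hashlib.sha256(flow.encode("ascii")).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]


# Two packets of the same TCP connection to the virtual IP 203.0.113.10:80.
first = pick_server("198.51.100.7", 40001, "203.0.113.10", 80, "tcp")
second = pick_server("198.51.100.7", 40001, "203.0.113.10", 80, "tcp")
print(first == second)  # True
```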

In hardware layer-4 switching, there are well-known products to choose from, such as Alteon and F5. These products are expensive but worth the price, offering excellent performance and very flexible management. Yahoo China handled nearly 2,000 servers with just three or four Alteon units.

(12) Software layer-4 switching

Once you understand the principle of hardware layer-4 switching, software layer-4 switching based on the OSI model follows naturally. It works on the same principle, with slightly lower performance, but it handles a moderate load with ease.

A typical load-balancing strategy is to build a Squid cluster on top of software or hardware layer-4 switching. This approach is used by many large websites, including search engines; it is low-cost, high-performance, and highly scalable, and nodes can easily be added to or removed from the architecture at any time.

(13) Software investment issues

It is reported that, apart from some listed companies and especially well-known large companies, few enterprises include the purchase of licensed software in their cost budgets. This way of thinking is very likely to bring nightmares to China's internet industry. Companies that face difficulties funding software can consider the open-source LAMP solution (Linux + Apache + MySQL + Perl, PHP, or Python as the web programming language); otherwise, as the scope of China's WTO commitments continues to expand, the crackdown on piracy will inevitably become stricter. "Ignoble" shortcuts will therefore inevitably backfire.

In addition, as network bandwidth increases, Web 2.0 technology will inevitably reach almost every corner of the online world. How to build up technical staff to carry out technical research and further strengthen security has therefore become an increasingly serious issue, and companies are advised to put it on their agenda as soon as possible.

IV. Summary

One sign of the truly rational development of e-commerce in China is that large numbers of traditional enterprises have actually begun to use the internet to handle and conduct business, and this wave has now begun. The online virtual bookstore, the New Bookstore, launched jointly by Beijing Distribution Group together with Sina, 6688.com, and other organizations, is one such sign.

As network bandwidth increases and network concepts and Web 2.0 technology spread, e-commerce models such as B2B, B2C, and C2C are likely to be integrated into all kinds of large commercial websites. For anyone who must act as the rescuer, the "white knight" in the face of such a crisis, how to deal with mass storage, massive traffic, retrieval of massive amounts of information, and increasingly serious security problems has therefore become an urgent question.
