How to achieve scalability in a large website architecture

Source: Internet
Author: User
Tags: failover, message queue, web services

Designing a website for scalability means being able to keep expanding and enhancing the system's capabilities with minimal impact on the existing system.

Two concepts that are easy to confuse, "extensibility" and "scalability", are described below:

Extensibility

Extensibility shows up as follows: the infrastructure does not need to change frequently, applications have few dependencies on one another and are loosely coupled, and the system can respond quickly to changing requirements. It is open for extension and closed for modification. The architecture takes future functional extension into account, so adding new functionality does not require modifying the structure and code of the existing system.

Scalability

Scalability refers to the ability of a system to increase (or reduce) its business processing capacity by increasing (or reducing) its own resources. If capacity changes in proportion to the resources, this is called linear scalability. In practice, this usually means using a cluster and adding servers to raise the overall throughput of the system.

1 Building an extensible website architecture

An important yardstick for judging a development framework, design pattern, or programming language is whether it makes the development process and the resulting software less coupled.

Loosely coupled systems are easier to extend and easier to reuse, and they also make development and maintenance easier. But how to decompose the system into modules, how to define each module's interface, how to reuse modules, and how to combine them into a complete system is the most challenging part of software design.

The greatest value of a software architect lies in the ability to decompose a large system into N loosely coupled submodules, including horizontal business modules and vertical foundational technology modules. This ability comes from professional skill and experience, an understanding of the business scenario, and insight into human nature and the world.

The core idea of building an extensible website architecture is modularization: on that basis, reduce the coupling between modules and improve their reusability.

The software is divided into several loosely coupled, independent component modules, which are then aggregated into a complete system through message passing or dependency calls between them.

These modules can be deployed on separate servers in a distributed manner. Physically separating the modules further reduces the coupling between them.

Modules deployed in a distributed manner can be aggregated in the following ways:
* Distributed message queues.
* Distributed services.

1 Reducing coupling using distributed message queues

If there is no direct call relationship between modules, adding or modifying a module has the least impact on the other parts, and the system is naturally more extensible.

1.1 Event-driven architecture

An event-driven architecture passes event messages between loosely coupled modules: the modules stay loosely coupled and cooperate by exchanging event messages. The most common way to implement an event-driven architecture is with a distributed message queue.

A message queue works on the publish-subscribe pattern: a message sender publishes messages, and one or more message receivers subscribe to them. The sender's handling of a message ends once the message has been sent to the distributed message queue; subscribers then fetch the message from the queue and process it. A new business that is interested in a message can simply subscribe to it, with no effect on the existing systems or businesses, which keeps the site's functionality extensible.

After processing a message, a receiver can also construct a new message of a new type and send it on to the receivers that subscribe to that type. An event-driven architecture built on messages can therefore form a chain of processing steps.
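The publish-subscribe flow and the chaining of message types can be sketched in a few lines. This is only an in-process toy, not a real distributed queue: `MessageBroker`, the topic names, and the handlers are all hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class MessageBroker:
    """Toy in-process stand-in for a distributed message queue (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._pending: Deque[Tuple[str, dict]] = deque()

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # The sender is done as soon as the message is enqueued.
        self._pending.append((topic, message))

    def run_pending(self) -> int:
        """Deliver queued messages in FIFO order; return how many were delivered."""
        delivered = 0
        while self._pending:
            topic, message = self._pending.popleft()
            for handler in self._subscribers.get(topic, []):
                handler(message)
            delivered += 1
        return delivered


broker = MessageBroker()
log: List[str] = []


def on_order_created(msg: dict) -> None:
    log.append(f"inventory reserved for order {msg['order_id']}")
    # The receiver builds a message of a new type and publishes it in turn.
    broker.publish("inventory.reserved", {"order_id": msg["order_id"]})


broker.subscribe("order.created", on_order_created)
broker.subscribe(
    "inventory.reserved",
    lambda m: log.append(f"shipping scheduled for order {m['order_id']}"),
)
# A new business subscribes to an existing message type; the publisher is untouched.
broker.subscribe(
    "order.created",
    lambda m: log.append(f"points awarded for order {m['order_id']}"),
)

broker.publish("order.created", {"order_id": 42})
delivered = broker.run_pending()
```

Adding the "points" subscriber required no change to the code that publishes `order.created`, which is the extensibility property the text describes.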

Because the message sender can return without waiting for the message to be processed, the system responds faster. At peak access, messages can also be buffered in the message queue, which eases the load on the database.

1.2 Distributed Message Queuing

A queue is a first-in, first-out data structure. A distributed message queue deploys that structure on separate servers, and applications access it through a remote interface to store and fetch messages, which gives them distributed asynchronous calls.
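As a rough single-machine illustration of an asynchronous call through a FIFO queue, the sketch below uses Python's standard `queue.Queue` and a worker thread in place of a remote queue server. The names `send_async`, `consumer`, and `_STOP` are hypothetical.

```python
import queue
import threading
from typing import List

message_queue = queue.Queue()  # thread-safe FIFO queue
processed: List[str] = []
_STOP = object()  # sentinel telling the consumer to exit


def consumer() -> None:
    # Pull messages in arrival order until the sentinel shows up.
    while True:
        message = message_queue.get()
        if message is _STOP:
            break
        processed.append(f"handled {message}")


def send_async(message: str) -> None:
    """Enqueue and return immediately; the caller does not wait for processing."""
    message_queue.put(message)


worker = threading.Thread(target=consumer)
worker.start()
for name in ["m1", "m2", "m3"]:
    send_async(name)
message_queue.put(_STOP)
worker.join()
```

In a real deployment the queue would live on its own servers and be reached over the network; only the calling pattern is the same.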

One of the more popular distributed message queue products is Apache ActiveMQ.

On scalability: because the data on a message queue server can be regarded as being processed almost immediately, the servers behave much like stateless servers. After adding a new server to the distributed message queue cluster, we only need to notify the producer servers to update their list of message queue servers.
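A minimal sketch of that idea, assuming the producer simply rotates through its server list. The `Producer` class, its methods, and the server names are hypothetical, and round-robin is only one possible selection policy.

```python
import itertools
from typing import List


class Producer:
    """Hypothetical producer that spreads messages over a list of queue servers."""

    def __init__(self, servers: List[str]) -> None:
        self.update_servers(servers)

    def update_servers(self, servers: List[str]) -> None:
        # Called when the producer is notified that cluster membership changed.
        self._servers = list(servers)
        self._cycle = itertools.cycle(self._servers)

    def pick_server(self) -> str:
        # Round-robin over the current server list.
        return next(self._cycle)


producer = Producer(["mq-1", "mq-2"])
before = [producer.pick_server() for _ in range(4)]

# A third server joins the cluster; only the producer's list changes.
producer.update_servers(["mq-1", "mq-2", "mq-3"])
after = [producer.pick_server() for _ in range(3)]
```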

On availability: if the in-memory queue is full, new messages are written to disk. Once the message push module has finished processing the messages in the in-memory queue, the messages on disk are loaded into the queue and processing continues.
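The spill-to-disk behaviour can be sketched as a bounded in-memory queue backed by an append-only file. This is an illustration under simplifying assumptions (single process, no crash recovery, the file is never truncated); `SpillQueue` and its fields are hypothetical names.

```python
import json
import os
import tempfile
from collections import deque
from typing import Deque, List, Optional


class SpillQueue:
    """Bounded in-memory FIFO that overflows to a file (illustrative only)."""

    def __init__(self, capacity: int, spill_path: str) -> None:
        self._capacity = capacity
        self._memory: Deque[dict] = deque()
        self._spill_path = spill_path
        self._read_offset = 0  # byte offset of the next unread line on disk
        self._spilled = 0      # number of messages still waiting on disk

    def put(self, message: dict) -> None:
        # Once anything is on disk, keep appending to disk to preserve FIFO order.
        if self._spilled == 0 and len(self._memory) < self._capacity:
            self._memory.append(message)
            return
        with open(self._spill_path, "ab") as f:
            f.write(json.dumps(message).encode("utf-8") + b"\n")
        self._spilled += 1

    def get(self) -> Optional[dict]:
        if not self._memory and self._spilled:
            self._reload()
        return self._memory.popleft() if self._memory else None

    def _reload(self) -> None:
        # Memory is drained: load the next batch of spilled messages.
        with open(self._spill_path, "rb") as f:
            f.seek(self._read_offset)
            while self._spilled and len(self._memory) < self._capacity:
                self._memory.append(json.loads(f.readline()))
                self._spilled -= 1
            self._read_offset = f.tell()


spill_file = os.path.join(tempfile.mkdtemp(), "spill.log")
q = SpillQueue(capacity=2, spill_path=spill_file)
for i in range(5):
    q.put({"id": i})

drained: List[int] = []
item = q.get()
while item is not None:
    drained.append(item["id"])
    item = q.get()
```

With a capacity of 2, messages 0 and 1 stay in memory and 2 to 4 go to disk, yet the consumer still sees them in their original order.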

To avoid losing messages when a message queue server goes down, the producer server keeps each message it has sent and deletes it only after the message has actually been processed by a consumer server. If a message queue server is down, the producer server picks another server in the distributed message queue cluster and publishes the message there.
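A minimal sketch of these two safeguards, assuming an explicit acknowledgement step. `QueueServer`, `ReliableProducer`, `send`, and `ack` are hypothetical names, and a real system would persist the pending copies rather than hold them in a dictionary.

```python
from typing import Dict, List


class QueueServer:
    """Stand-in for one message queue server; `up` simulates availability."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.messages: List[dict] = []

    def accept(self, message: dict) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.messages.append(message)


class ReliableProducer:
    def __init__(self, servers: List[QueueServer]) -> None:
        self._servers = servers
        self.pending: Dict[int, dict] = {}  # local copies, kept until acknowledged

    def send(self, msg_id: int, body: str) -> str:
        message = {"id": msg_id, "body": body}
        self.pending[msg_id] = message  # keep a copy before sending
        for server in self._servers:    # fail over to the next server if one is down
            try:
                server.accept(message)
                return server.name
            except ConnectionError:
                continue
        raise RuntimeError("no message queue server available")

    def ack(self, msg_id: int) -> None:
        # Called once a consumer reports that the message has been processed.
        self.pending.pop(msg_id, None)


mq_a, mq_b = QueueServer("mq-a"), QueueServer("mq-b")
mq_a.up = False  # simulate downtime of the first server
producer = ReliableProducer([mq_a, mq_b])

used = producer.send(1, "hello")
kept_before_ack = 1 in producer.pending
producer.ack(1)
kept_after_ack = 1 in producer.pending
```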

A distributed message queue can be complex, for example one that supports an ESB (Enterprise Service Bus) or SOA (service-oriented architecture). It can also be simple, for example using MySQL as a distributed message queue: the message producer writes each message as a record in a database table, and the consumer queries the table, ordered by the timestamp at which each record was written. With mature MySQL operations practices, this approach can also reach good availability and performance.

2 Use distributed services to build reusable business platforms

Distributed services reduce system coupling by separating modules along service interfaces: different subsystems invoke a service through the same interface description.

As a site's functions grow more complex, the system gradually turns into a behemoth that aggregates a large number of applications and service components. Such a system causes great trouble for development, maintenance, and deployment:
* Compilation and deployment are difficult.
* Code branch management is difficult: reusable code modules are maintained and changed by several teams at once, so code merges always produce conflicts.
* Database connections run out: if each application instance holds 10 database connections, an application deployed on a cluster of hundreds of servers opens thousands of connections to the database.
* Adding new business is difficult: new business has to be added to a system that is already tangled and messy.

So the modules have to be split apart and deployed independently to reduce the coupling of the system:
* Vertical split: split one large application into several small applications. If a new business is relatively independent, design and deploy it as a standalone web application.
* Horizontal split: extract the reusable business logic and deploy it independently as distributed services. New business then only needs to call these distributed services to quickly build an application system. Even if the business logic inside a module changes, other modules are unaffected as long as the interface stays the same.

Vertical splitting is relatively simple: sort out the business, strip off the loosely related parts, and turn each into an independent web application. Horizontal splitting requires identifying the reusable business, designing service interfaces, and standardizing the dependencies between services, and it also needs a complete distributed service management framework.

2.1 Web Service distributed services

Web Service used to be one of the most fashionable terms in enterprise application development. It is used to integrate heterogeneous systems and to build distributed systems.

A service provider describes the service interfaces it offers using WSDL (Web Services Description Language) and publishes them to a registry using UDDI (Universal Description, Discovery, and Integration). A service requester looks up the service in the registry and then communicates with the service provider over SOAP (Simple Object Access Protocol) to use the service.

Although Web Service has mature technical specifications and implementations, it has the following shortcomings:
1. Bloated registration and discovery mechanisms.
2. Inefficient XML serialization.
3. Relatively expensive HTTP remote communication.
4. Complex deployment and maintenance tools.
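To give a feel for the second point, the sketch below encodes the same small payload once inside a SOAP 1.1 style envelope and once as compact JSON, then compares byte counts. The payload and the `GetUserResponse` element are made up for illustration, and message size is only one aspect of serialization cost.

```python
import json
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
payload = {"userId": 42, "name": "alice"}  # hypothetical response data

# Build Envelope > Body > GetUserResponse > one child element per field.
envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
response = ET.SubElement(body, "GetUserResponse")
for key, value in payload.items():
    ET.SubElement(response, key).text = str(value)

soap_bytes = ET.tostring(envelope, encoding="utf-8")
json_bytes = json.dumps(payload, separators=(",", ":")).encode("utf-8")

print(len(soap_bytes), "bytes as SOAP/XML vs", len(json_bytes), "bytes as JSON")
```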

These problems make it difficult for Web Service to meet the requirements of large websites for high performance, high availability, easy deployment, and easy maintenance.

2.2 Requirements for distributed services for large websites

The distributed service framework needs to be able to support the following features:
* Load balancing: for hot services (such as login or product services, which are deployed on a cluster), service requesters use a configurable load-balancing algorithm to access them.
* Failover: reusable services are invoked by many applications, so once a service becomes unavailable, the availability of many applications can be affected. Even rarely accessed services therefore need a clustered deployment. When a service instance is unavailable, the distributed service framework switches to another instance to keep the service highly available overall.
* Efficient remote communication.
* Integration of heterogeneous systems.
* Minimal intrusion on applications: applications evolve and iterate incrementally, so the distributed service framework should let a service module be deployed either centrally or in a distributed manner with as little change to the application as possible.
* Versioning: so that services are not interrupted, the distributed service framework must support multiple versions of a service. When a service provider releases a new version of an interface, it keeps serving the old version until the requesters have upgraded the interface they call, and only then shuts the old version down.
* Real-time monitoring: monitor the metrics of service providers and callers to support operations and management.

2.3 Distributed Service Framework Design

Large websites need a simpler, more efficient distributed service framework to build their SOA (Service-Oriented Architecture). Among open-source distributed service frameworks, one with many successful deployments is Alibaba's Dubbo.

A service consumer uses a service through its interface. The interface loads the concrete service through a proxy, which may be backed by local code or by a remote service, so the intrusion on the application is small.
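The sketch below shows the general shape of this idea, not Dubbo's actual API: the consumer code depends only on an interface, and either a local implementation or a remote proxy can stand behind it. `UserService`, `RemoteUserServiceProxy`, and `fake_transport` are hypothetical names.

```python
from typing import Callable, Protocol


class UserService(Protocol):
    """The interface the consumer programs against."""

    def get_name(self, user_id: int) -> str: ...


class LocalUserService:
    """Implementation running in the same process."""

    def get_name(self, user_id: int) -> str:
        return f"user-{user_id}"


class RemoteUserServiceProxy:
    """Forwards calls over a transport; `transport` stands in for an RPC client."""

    def __init__(self, transport: Callable[[str, dict], dict]) -> None:
        self._transport = transport

    def get_name(self, user_id: int) -> str:
        reply = self._transport("UserService.get_name", {"user_id": user_id})
        return reply["result"]


def greet(service: UserService, user_id: int) -> str:
    # The consumer depends only on the interface, not on where the service runs.
    return f"Hello, {service.get_name(user_id)}"


def fake_transport(method: str, params: dict) -> dict:
    # Pretend network hop: the "remote side" is just the local implementation.
    if method != "UserService.get_name":
        raise ValueError(f"unknown method {method}")
    return {"result": LocalUserService().get_name(params["user_id"])}


local_greeting = greet(LocalUserService(), 7)
remote_greeting = greet(RemoteUserServiceProxy(fake_transport), 7)
```

Because `greet` never changes, a service can move from local code to a remote deployment (or back) by swapping the object passed in.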

The client module loads the list of service providers through the service registry (the service provider automatically registers its own list of service interfaces to the service registry after startup), and then sends the service call request to a service provider's server according to the configured load balancing policy. If a service invocation fails, the client module automatically selects a server that can provide the same service from the list of service providers, i.e., automatic failover to ensure high availability of the service.
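The registry lookup, load balancing, and failover steps can be sketched as follows. This is a simplified illustration, not Dubbo's implementation: `ServiceRegistry`, `ServiceClient`, the provider names, and `fake_call` are hypothetical, and round-robin stands in for whatever load-balancing policy is configured.

```python
import itertools
from typing import Callable, Dict, List, Optional


class ServiceRegistry:
    """Maps a service name to the addresses of the providers that registered it."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[str]] = {}

    def register(self, service: str, address: str) -> None:
        self._providers.setdefault(service, []).append(address)

    def lookup(self, service: str) -> List[str]:
        return list(self._providers.get(service, []))


class ServiceClient:
    def __init__(self, registry: ServiceRegistry, service: str,
                 call: Callable[[str, str], str]) -> None:
        self._providers = registry.lookup(service)  # load the provider list once
        self._call = call
        self._counter = itertools.count()           # round-robin position

    def invoke(self, request: str) -> str:
        start = next(self._counter)
        n = len(self._providers)
        last_error: Optional[Exception] = None
        for attempt in range(n):
            # Start at the round-robin position; on failure try the next provider.
            address = self._providers[(start + attempt) % n]
            try:
                return self._call(address, request)
            except ConnectionError as exc:
                last_error = exc
        raise RuntimeError("all providers failed") from last_error


registry = ServiceRegistry()
for addr in ["provider-1", "provider-2", "provider-3"]:
    registry.register("LoginService", addr)

down = {"provider-2"}  # simulate one failed provider


def fake_call(address: str, request: str) -> str:
    if address in down:
        raise ConnectionError(address)
    return f"{address} handled {request}"


client = ServiceClient(registry, "LoginService", fake_call)
results = [client.invoke(f"req{i}") for i in range(3)]
```

The second request is first routed to `provider-2`, fails, and is transparently retried on `provider-3`.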

Dubbo uses an NIO communication framework, so it has high network communication performance.

4 Extensible data structures

The ColumnFamily (column family) design used by NoSQL databases (such as Cassandra) makes an extensible data structure design possible. A column family is stored as a sparse matrix.
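One way to picture the sparse layout is a map from row key to only the columns that row actually has. The sketch below is a conceptual model in plain Python, not Cassandra's API; `ColumnFamily` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


class ColumnFamily:
    """Sparse row store: each row keeps only the columns that were written."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._rows: Dict[str, Dict[str, Any]] = defaultdict(dict)

    def put(self, row_key: str, column: str, value: Any) -> None:
        # Columns are not declared up front; they appear when first written.
        self._rows[row_key][column] = value

    def get(self, row_key: str, column: str, default: Any = None) -> Any:
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key: str) -> List[str]:
        return sorted(self._rows.get(row_key, {}))


users = ColumnFamily("users")
users.put("u1", "name", "Alice")
users.put("u1", "email", "alice@example.com")
users.put("u2", "name", "Bob")
users.put("u2", "nickname", "bobby")  # a new field, added without a schema change

u1_columns = users.columns("u1")
u2_columns = users.columns("u2")
missing = users.get("u2", "email")
```

Each row stores nothing for columns it does not use, which is why the number of distinct columns can grow very large.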

You can create a table by specifying only the name of the column family. Fields can be specified when data is written, so a table can contain millions of fields. This lets an application's data structure be extended arbitrarily. A query only needs to specify the field names and values it is interested in.

5 Using an open platform to build an ecosystem

Only when users get the value they want will they be willing to use a site's services, and only then does the site have a reason to exist. But a single website cannot meet the needs of every user.

Users will not pay just for the value the site itself offers, so the site must provide more value-added services to make money. According to the long-tail effect, the more value-added services there are and the more varied they are, the greater the profit. But the number of value-added services a website can develop by itself is limited.

To serve users better and offer them more value-added services, large websites package their internal services as interfaces and open them to external third-party developers. Such a platform is called an open platform. Third-party developers can use these open interfaces to build applications (such as apps) or websites that give users more value. The website, its users, and third-party developers depend on one another and form an ecosystem.

An open platform is the interface through which a website's internals interact with the outside. Externally it faces a large number of third-party developers; internally it faces the site's many business services. An open platform architecture consists of the following parts:
* API interface: the set of APIs exposed to developers, which can take the form of RESTful APIs, Web Service, RPC, and so on.
* Protocol conversion: converts the input of the various APIs into a form the internal services can recognize, and wraps the internal services' return values in the API format.
* Security: besides identity authentication and access control, access is also rate-limited by tier, so that platform resources are shared fairly and reasonably among third parties and the site's own internal services are not dragged down by external applications.
* Audit: monitors third-party application access and handles billing.
* Routing: maps the various requests to the open platform onto specific internal services.
* Process: organizes a set of loosely related services into a new, context-aware service and exposes it as an interface for developers to use.
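How these parts fit together for one request can be sketched as a single gateway function. This is a toy model under strong simplifications (a fixed call quota instead of a real rate limiter, in-memory state); `OpenPlatformGateway`, the API key, and the `/items/get` route are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict


class OpenPlatformGateway:
    def __init__(self, api_keys: Dict[str, str], quota_per_app: int) -> None:
        self._api_keys = api_keys    # API key -> application name
        self._quota = quota_per_app  # maximum calls allowed per application
        self._routes: Dict[str, Callable[[dict], dict]] = {}
        self.audit_log: Counter = Counter()  # application name -> calls served

    def route(self, path: str, service: Callable[[dict], dict]) -> None:
        self._routes[path] = service

    def handle(self, api_key: str, path: str, params: dict) -> dict:
        app = self._api_keys.get(api_key)
        if app is None:  # security: authentication
            return {"status": 401, "error": "unknown api key"}
        if self.audit_log[app] >= self._quota:  # security: access limit
            return {"status": 429, "error": "quota exceeded"}
        service = self._routes.get(path)  # routing to an internal service
        if service is None:
            return {"status": 404, "error": "no such api"}
        self.audit_log[app] += 1  # audit: count calls for monitoring and billing
        # Protocol conversion: wrap the internal result in the API response format.
        return {"status": 200, "data": service(params)}


gateway = OpenPlatformGateway({"key-123": "demo-app"}, quota_per_app=2)
gateway.route("/items/get", lambda p: {"item_id": p["id"], "title": "sample"})

statuses = [
    gateway.handle("key-123", "/items/get", {"id": 1})["status"] for _ in range(3)
]
unknown_key_status = gateway.handle("bad-key", "/items/get", {"id": 1})["status"]
```

The third call from the same application is rejected once its quota is used up, which keeps one external application from exhausting the internal service.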
