Large Website Technical Architecture: An Introductory Overview (Reposted)

Source: Internet

This post lists the concepts involved in large-scale website architecture, each with a brief explanation.

Preface
    • This article is essentially a text version of a "mind map" of the book "Large Web Site Architecture Design" (HAE)
    • The whole article revolves around five elements: performance, availability, scalability, extensibility, and security
    • Performance, availability, and scalability mostly concern the application servers, cache servers, and storage servers
Overview
    • Three dimensions: evolution, patterns, elements
    • Five elements: performance, availability, scalability, extensibility, security
Evolutionary history

The architecture of a large website typically evolves through the following stages:

    1. Initial stage: a single server holds all resources (application, database, files, and so on), e.g. a LAMP stack
    2. Separating application and data services: three servers with different hardware profiles act as the application server, file server, and database server respectively
    3. Using caching to improve website performance: there are two kinds, local caches on the application servers and remote caches on dedicated distributed cache servers
    4. Using application server clusters to improve concurrency: a load balancer distributes requests to any machine in the application server cluster
    5. Database read/write separation: the database runs in master-slave hot standby. Application servers write to the master, the master propagates updates to the slave through replication, and reads are served from the slave. A dedicated data access module makes this split transparent to the application
    6. Accelerating responses with a reverse proxy and CDN: both are fundamentally caches. The reverse proxy is deployed in the website's central data center, while the CDN is deployed in the network providers' data centers
    7. Using distributed file systems and distributed database systems: a distributed database is the last resort for splitting a database; splitting databases by business line is more common
    8. Using NoSQL and search engines: these support scalable, distributed deployment better
    9. Business split: the website is split into separate applications, each deployed and maintained independently; applications relate to one another through hyperlinks, distribute data through message queues, or access the same data storage system
    10. Distributed services: common services are extracted and deployed independently
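Stage 5 relies on a data access module that sends writes to the master and reads to the slaves. The sketch below shows one way such a module might route statements. `ReadWriteRouter` and its keyword-based classification are hypothetical, and the sqlite3 in-memory connections merely stand in for a real master and its slaves.

```python
import itertools
import sqlite3

# Statements routed to the master; everything else is treated as a read.
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}


class ReadWriteRouter:
    """Hypothetical data-access module that hides read/write splitting from callers."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over the slaves so read load is spread across them.
        self._replicas = itertools.cycle(replicas) if replicas else None

    @staticmethod
    def is_write(sql):
        # Naive check on the leading keyword; a real router would parse more carefully.
        return sql.lstrip().split(None, 1)[0].upper() in WRITE_KEYWORDS

    def connection_for(self, sql):
        if self.is_write(sql) or self._replicas is None:
            return self.master  # writes (or reads with no slave available) go to the master
        return next(self._replicas)

    def execute(self, sql, params=()):
        conn = self.connection_for(sql)
        cursor = conn.execute(sql, params)
        if self.is_write(sql):
            conn.commit()
        return cursor


master = sqlite3.connect(":memory:")
router = ReadWriteRouter(master, [sqlite3.connect(":memory:"), sqlite3.connect(":memory:")])
```

Nothing replicates between these stand-in connections, so a row written through the router is not visible to reads until real master-slave replication copies it. That replication lag is the main constraint a module like this has to live with.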

Values behind the evolution

    • The core value of a large website architecture is the flexibility to respond to the website's needs
    • The main force driving the development of large-website technology is the growth of the website's business

Common misconceptions

    • Blindly following big companies' solutions
    • Pursuing technology for its own sake
    • Trying to solve every problem with technology
Architecture patterns

What defines a pattern is that it can be applied repeatedly.

    • Layering: horizontal slicing
    • Splitting: vertical slicing
    • Distribution: the main purpose of layering and splitting is to make it easy to deploy the resulting modules in a distributed manner. Common scenarios:
      • Distributed applications and services
      • Distributed static resources
      • Distributed data and storage
      • Distributed computing
      • Distributed configuration, distributed locks, distributed files, etc.
    • Cluster: multiple servers deploy the same application to form a cluster that provides service through a load balancer
    • Caching: data is placed at the location closest to where it is used to speed up processing. Caching is the first means of improving performance: it speeds up access and reduces the load on the back end. Using a cache has two prerequisites: 1. data access hotspots are unbalanced; 2. the data stays valid for some time and does not expire quickly
      • CDN
      • Reverse proxy
      • Local cache
      • Distributed cache
    • Asynchrony: intended to decouple the system. An asynchronous architecture is a typical producer-consumer pattern with the following benefits:
      • Improves system availability
      • Speeds up website access
      • Smooths out concurrent access spikes
    • Redundancy: achieves high availability, e.g. cold and hot backup of the database
    • Automation: includes automated release processes, code management, testing, security testing, deployment, monitoring, alarms, failover, failure recovery, degradation, and resource allocation
    • Security: passwords, mobile verification codes, encryption, CAPTCHAs, filtering, risk control
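To illustrate the caching pattern and its second prerequisite (data that stays valid for a while), here is a minimal local cache with a time-to-live, used in the common cache-aside style. `TTLCache` and `get_with_cache` are illustrative names, not from the book; in production a distributed cache such as Memcached would replace the in-process dict.

```python
import time


class TTLCache:
    """Minimal local cache: each entry expires `ttl` seconds after it is stored."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # stale entry: evict it and report a miss
            return None
        return value

    def put(self, key, value):
        self._data[key] = (self._clock() + self.ttl, value)


def get_with_cache(cache, key, load_from_backend):
    """Cache-aside read: try the cache first, fall back to the back end on a miss.

    Note: a cached value of None is indistinguishable from a miss in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = load_from_backend(key)
        cache.put(key, value)
    return value
```

Only the first read of a hot key reaches the back end until the TTL lapses, which is why caching pays off when access is skewed toward a few hotspots.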
Core elements

Architecture is "the highest-level plan, a decision that is difficult to change." It focuses mainly on five elements:

    • Performance
    • Availability
    • Scalability
    • Extensibility
    • Security

Each of the five elements is summarized in turn below.

Performance

The main performance test metrics are:

    • Response time: the time an application takes to perform an operation
    • Concurrency: the number of requests the system can handle simultaneously
    • Throughput: the number of requests the system processes per unit of time
    • Performance counters: data metrics that describe the performance of a server or operating system
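These metrics are related. In steady state, Little's law (a standard queueing result that the article does not mention) says average concurrency is roughly throughput multiplied by average response time. The sketch below computes all three from hypothetical request timings; `summarize` and its input format are assumptions for illustration.

```python
def summarize(requests, window_seconds):
    """Summarize (start, end) timestamps, in seconds, of requests completed in a window."""
    n = len(requests)
    throughput = n / window_seconds  # requests completed per second
    avg_response_time = sum(end - start for start, end in requests) / n
    # Little's law (steady state): requests in flight = arrival rate x time in system.
    avg_concurrency = throughput * avg_response_time
    return throughput, avg_response_time, avg_concurrency
```

For example, 1,000 requests completed in a 2-second window, each taking 0.2 s, gives a throughput of 500 requests per second and about 100 requests in flight at any moment.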

Performance test methods:

    • Performance testing
    • Load testing
    • Stress testing
    • Stability testing

Following the website's layered structure, performance optimization falls into three categories:

    • Web front-end performance optimization
      • Browser access optimization
        • Reduce HTTP requests
        • Use browser caching
        • Enable compression
        • Put CSS at the top of the page and JavaScript at the bottom
        • Reduce cookie transmission
      • CDN acceleration: essentially a cache, generally used for static resources
      • Reverse proxy
        • Protects the website
        • Accelerates web requests through its caching feature
        • Provides load balancing
    • Application server performance optimization: the main means are caching, clustering, and asynchrony
      • Distributed caching (the first law of website performance optimization: consider caching first)
      • Asynchronous operations (message queues, peak shaving)
      • Clustering
      • Code optimization
        • Multithreading (design objects to be stateless, use local objects, and use locks for concurrent access to shared resources)
        • Resource reuse (singletons, object pools)
        • Data structures
        • Garbage collection
    • Storage server performance optimization
      • Mechanical HDD vs. solid-state drive
      • B+ tree vs. LSM tree
      • RAID vs. HDFS
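Asynchronous operations shave peaks by letting a queue absorb a burst of requests while workers drain it at their own pace. The sketch uses Python's in-process `queue.Queue` as a stand-in for a distributed message queue; `run_peak_shaving` and the doubling "work" are hypothetical placeholders.

```python
import queue
import threading


def run_peak_shaving(burst, worker_count=2, maxsize=100):
    """Enqueue a burst of requests and let a fixed pool of workers drain them.

    `burst` holds hypothetical request payloads; returns the processed results.
    """
    q = queue.Queue(maxsize=maxsize)  # bounded: the producer blocks when the queue is full
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work for this worker
                q.task_done()
                return
            with lock:
                results.append(item * 2)  # stand-in for the slow back-end write
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for item in burst:  # the request spike is absorbed by the queue
        q.put(item)
    for _ in threads:  # one sentinel per worker, queued after all real items
        q.put(None)
    for t in threads:
        t.join()
    return results
```

The back end sees at most `worker_count` concurrent operations no matter how large the burst is; the cost is that callers get a deferred result rather than an immediate one.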
High availability
  • Highly available website architecture: the goal is that when server hardware fails, the service remains available and the data is still preserved and accessible. The main means are redundant backup and failover of data and services
  • Highly available applications: the defining feature is that applications are stateless
    • Failover of stateless services through load balancing
    • Session management in an application server cluster
      • Session replication
      • Session binding
      • Recording the session in cookies
      • Session server
  • Highly available services: stateless services can use load-balanced failover, plus the following strategies
    • Tiered management
    • Timeout settings
    • Asynchronous invocation
    • Service degradation
    • Idempotent design
  • Highly available data: the main means are data backup and failover
    • CAP principle
      • Consistency
      • Availability
      • Partition tolerance
    • Data backup
      • Cold backup: the downside is that it guarantees neither eventual consistency nor data availability
      • Hot backup: divided into asynchronous and synchronous hot backup
    • Failover: consists of three parts
      • Failure confirmation
      • Access transfer
      • Data recovery
  • Software quality assurance for highly available websites
    • Website release
    • Automated testing
    • Pre-release validation
    • Code control
      • Trunk development, branch release
      • Branch development, trunk release
    • Automated release
    • Grayscale (canary) release
  • Website operations monitoring
    • Monitoring data collection
      • User behavior logs (server side and client side)
      • Server performance monitoring
      • Runtime data reports
    • Monitoring management
      • Alarm system
      • Failover
      • Automatic graceful degradation
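Idempotent design, one of the high-availability service strategies listed above, means that calling an operation several times has the same effect as calling it once. It matters because timeouts and failover make clients retry. A minimal sketch, assuming the client attaches a unique request ID; `IdempotentPaymentService` and its in-memory result store are hypothetical (a real service would persist the IDs).

```python
import threading


class IdempotentPaymentService:
    """Hypothetical service: repeated calls with the same request ID charge only once."""

    def __init__(self):
        self._results = {}  # request_id -> result of the first execution
        self._lock = threading.Lock()
        self.balance = 0

    def charge(self, request_id, amount):
        with self._lock:
            if request_id in self._results:
                # A retry after a timeout or failover: return the stored result
                # instead of charging a second time.
                return self._results[request_id]
            self.balance += amount
            result = {"request_id": request_id, "charged": amount, "balance": self.balance}
            self._results[request_id] = result
            return result
```

With this in place, a caller that timed out can safely resend the same request to another server in the cluster, provided the servers share the result store.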
Scalability

The "large" in a large website refers to:

    • User level: a large number of users and wide-ranging access
    • Function level: many features and products
    • Technical level: the website needs to deploy a large number of servers

Scalability covers the following aspects:

    • Scalable design of the website architecture
      • Scaling by physically separating different functions
        • Vertical separation (separation after layering)
        • Horizontal separation (separation after splitting by business)
      • Scaling a single function by growing its cluster
    • Scalability design of application server clusters
      • HTTP redirect load balancing
      • DNS resolution load balancing
      • Reverse proxy load balancing (application-layer load balancing at the HTTP protocol level)
      • IP load balancing (request distribution completed inside the kernel)
      • Data link layer load balancing (rewrites MAC addresses at the data link layer; triangle transmission mode; e.g. LVS)
      • Load balancing algorithms
        • Round robin (RR)
        • Weighted round robin (WRR)
        • Random
        • Least connections
        • Source hashing
    • Scalability design of distributed cache clusters
      • Access model of a Memcached distributed cache cluster
        • Memcached client (API, routing algorithm, server list, communication module)
        • Memcached server cluster
      • Scalability challenges of Memcached distributed cache clusters
      • Consistent hashing for distributed caches (consistent hash ring, virtual node layer)
    • Scalability design of data storage service clusters
      • Scalability design of relational database clusters
      • Scalability design of NoSQL databases
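The consistent hash ring with virtual nodes mentioned above can be sketched as follows. This is an illustrative implementation, assuming MD5 as the hash function and 100 virtual nodes per server; it is not the routing code of any particular Memcached client.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Consistent hash ring where each server owns many virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key):
        # First 8 bytes of MD5 as an integer position on the ring.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def lookup(self, key):
        # Walk clockwise to the first virtual node at or after the key's position.
        h = self._hash(key)
        idx = bisect.bisect_left(self._ring, (h, ""))
        if idx == len(self._ring):
            idx = 0  # wrap around to the start of the ring
        return self._ring[idx][1]
```

Adding a fourth server moves only the keys whose clockwise successor becomes one of the new virtual nodes (about a quarter of them in expectation), and every moved key lands on the new server. With plain `hash(key) % N` routing, most keys would be remapped and the cache hit rate would collapse. The virtual nodes keep each server's share of the ring roughly even.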
Extensibility

Applying the "open/closed principle" at the level of system architecture design

    • Building an extensible website architecture
    • Reducing coupling with distributed message queues
      • Event-driven architecture
      • Distributed message queues
    • Building a reusable business platform with distributed services
      • Web services and enterprise-class distributed services
      • Features of distributed services for large websites
      • Distributed service framework design (Thrift, Dubbo)
    • Extensible data structures (e.g. ColumnFamily design)
    • Building a website ecosystem with an open platform
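The event-driven idea above can be illustrated in miniature with an in-process publish/subscribe bus. It is a sketch only: `EventBus`, `place_order`, and the `order_created` topic are hypothetical, and a real site would use a distributed message queue rather than direct function calls.

```python
from collections import defaultdict


class EventBus:
    """In-process publish/subscribe: publishers never reference their consumers."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)


def place_order(bus, order_id):
    # The order service only announces the event; it does not call inventory or email code.
    bus.publish("order_created", {"order_id": order_id})
```

Adding a new consumer, say a points service, is one more `subscribe` call; `place_order` stays closed to modification, which is the open/closed principle applied at the architecture level.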
Website security architecture

XSS and SQL injection are the two most important ways of attacking web applications; other means include CSRF and session hijacking.

  • Attacks and defenses
    • XSS attack: cross-site scripting
      • Reflected
      • Persistent (stored)
    • XSS defenses
      • Sanitization (escaping certain dangerous HTML characters)
      • HttpOnly cookies
    • Injection attacks
      • SQL injection
      • OS command injection
    • Injection defenses
      • Avoid leaking database table structure information
      • Sanitization
      • Parameter binding
    • CSRF attack: cross-site request forgery
    • CSRF defense: the main means is verifying the requester's identity
      • Form token
      • CAPTCHA
      • Referer check
    • Other attacks and vulnerabilities
      • Error codes
      • HTML comments
      • File uploads
      • Path traversal
    • Web application firewall (ModSecurity)
    • Website security vulnerability scanning
  • Information encryption and key security management
    • One-way hash encryption: inputs of any length are hashed to a fixed-length output
      • Irreversible; the plaintext cannot be recovered
      • Adding a salt increases security
      • A small change in the input produces a completely different output
    • Symmetric encryption: the same key is used for encryption and decryption
    • Asymmetric encryption
      • Information transfer: encrypt with the public key, decrypt with the private key
      • Digital signature: sign (encrypt) with the private key, verify (decrypt) with the public key
    • Key security management: the security of transmitted information depends on the key; ways to protect the key include:
      • Put the keys and algorithms on a separate server
      • Put the decryption algorithm in the application system and the key on a separate server
  • Information filtering and anti-spam
    • Text matching
    • Classification algorithms
    • Blacklists
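Three of the defenses listed above (HTML escaping against XSS, parameter binding against SQL injection, and a salted one-way hash for stored passwords) can be sketched with the Python standard library. The function names are hypothetical, the `users` table is an assumed schema, and PBKDF2-HMAC-SHA256 is a choice made here since the article names no hash algorithm; the iteration count is illustrative.

```python
import hashlib
import hmac
import html
import os
import sqlite3


def sanitize_for_html(text):
    # XSS defense: escape &, <, >, " and ' before inserting user text into HTML.
    return html.escape(text, quote=True)


def find_user(conn, username):
    # Injection defense: parameter binding, so user input is never spliced into the SQL text.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (username,)).fetchone()


def hash_password(password, salt=None, iterations=200_000):
    # Salted one-way hash: a random per-user salt defeats precomputed lookup tables.
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected, iterations=200_000):
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)  # constant-time comparison
```

With parameter binding, an input such as `' OR '1'='1` is treated as a literal user name and simply matches no row, instead of rewriting the query's logic.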
