Large-scale Web Service Development Technology

Source: Internet
Author: User

Some time ago I read the book "Large-scale Web Service Development Technology". This afternoon I went through it again and wrote down the key points so that I can review them when I forget. Since I am already fairly familiar with data compression and full-text retrieval, the notes mainly cover the first five chapters, followed by scattered notes on the rest. This article may be helpful to readers who: 1. are interested in the book but are not sure what it covers; 2. have some experience with large-scale web services.

Hatena Scale (April 2010)

  • 1.5 million registered users, 19 million unique users (UU) per month
  • Requests: billions per month
  • Traffic during busy hours: 850 Mbps (excluding images)
  • About 600 physical servers, and over 1,300 hosts through virtualization
  • Log volume per day: from gigabytes up to terabytes

System Growth Strategy

  • Start minimal, and manage and design with foreseeable changes in mind

Balance efficiency and quality

  • Meeting, standardization, documentation, agility, etc.

With a GB-scale text database (about 10 million records) and no index, a single SELECT query fails to complete within S.

Speed Difference between memory and Hard Disk

  • Seek (addressing) speed: memory is 10^5 to 10^6 times faster than disk.
  • Transfer speed (bus): memory is about 7.5 GB/s, disk is about 58 MB/s.

Find the bottleneck of a single machine (get the most out of one machine first; measure, do not guess)

  • Use sar or vmstat to check whether the problem is CPU or I/O
  • CPU problems

    • top or sar: check whether the time is spent in system (kernel) processing or in user programs.
    • ps: view process status and CPU usage, and identify the problematic process
    • Use strace or oprofile to locate the specific problem in the program or process
  • I/O problems

    • Frequent swapping (paging) ---> insufficient memory

      • ps: view the memory used by each program
      • Can the program be improved to use less memory?
      • If not, add hardware (memory) or distribute the load
    • If there is no swapping, the memory available for caching may be insufficient.

      • Increase memory
      • Add machines and distribute
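The decision tree above can be sketched as a small function. This is only an illustration: the function name, its parameters, and both thresholds are assumptions made for the example, not values from the book, and the inputs are meant to be one sample of the kind of numbers that vmstat or sar report.

```python
def classify_bottleneck(user_pct, system_pct, iowait_pct, swap_in, swap_out,
                        busy_threshold=70.0, iowait_threshold=20.0):
    """Give a rough diagnosis from one vmstat/sar-style sample.

    The thresholds are illustrative assumptions; tune them from measurements.
    """
    # Any swap activity points at insufficient memory first.
    if swap_in > 0 or swap_out > 0:
        return "I/O: swapping, memory is insufficient"
    # High iowait without swapping: the page cache is probably too small.
    if iowait_pct >= iowait_threshold:
        return "I/O: no swapping, cache memory may be insufficient"
    # Otherwise look at where the CPU time goes.
    if user_pct + system_pct >= busy_threshold:
        if system_pct > user_pct:
            return "CPU: time spent in the kernel (system)"
        return "CPU: time spent in user programs"
    return "no obvious bottleneck"


if __name__ == "__main__":
    print(classify_bottleneck(10, 5, 40, swap_in=120, swap_out=80))
    print(classify_bottleneck(85, 10, 1, swap_in=0, swap_out=0))
```

A single sample is rarely enough in practice; the point is only the order of the checks: swapping first, then iowait, then CPU.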

Scaling CPU load is relatively easy; scaling I/O load is much harder

  • View the actual load: the load average in the top output (1 minute, 5 minutes, 15 minutes)
  • Check whether the I/O load or the CPU load is too high: sar -P ALL (per core, for multi-core machines)
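As a rough illustration of reading the load average, the sketch below divides the three values by the number of cores. The helper names are made up for this example. On Linux the load average counts both tasks waiting for a CPU and tasks blocked on I/O, so a high value by itself does not say which of the two is the problem; that is what sar or vmstat are for.

```python
import os


def load_per_core(load_average, cpu_count):
    """Normalize a load average by the number of CPU cores."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def describe_load(load_averages, cpu_count):
    """Map the 1/5/15-minute load averages to per-core values."""
    labels = ("1 min", "5 min", "15 min")
    return {label: round(load_per_core(value, cpu_count), 2)
            for label, value in zip(labels, load_averages)}


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # os.getloadavg() exists on Unix-like systems only.
    print(describe_load(os.getloadavg(), os.cpu_count() or 1))
```

A per-core value that stays well above 1 suggests that tasks are queuing, either for CPU or for I/O.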

 

Key points for handling large-scale data

  • Do as much as possible in memory; when distributing, take locality into account
  • Algorithmic complexity: going from O(n) to O(log n) is a qualitative leap
  • Data compression and retrieval techniques
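To make the O(n) versus O(log n) point concrete, here is a minimal sketch that counts the steps taken by a linear scan and by a binary search over the same sorted data. The function names are chosen for this example only.

```python
def linear_search_steps(sorted_items, target):
    """Count the elements inspected by a front-to-back scan."""
    steps = 0
    for item in sorted_items:
        steps += 1
        if item == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count the probes made by a binary search on sorted data."""
    lo, hi, steps = 0, len(sorted_items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


if __name__ == "__main__":
    data = range(1_000_000)  # a range is already sorted and supports indexing
    print("linear:", linear_search_steps(data, 999_999))
    print("binary:", binary_search_steps(data, 999_999))
```

For one million items the scan needs up to one million steps, while the binary search needs at most about twenty, and the gap widens as the data grows.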

Cache Mechanism

  • Page cache

    • All modern operating systems use virtual memory.
    • Once the kernel has read data from disk into memory, it keeps it there as long as possible so that the next access needs no disk I/O; this is the page cache.
    • The operating system caches in units of pages, the smallest unit of virtual memory.
    • Adding memory raises the cache hit rate and reduces the I/O load.
  • sar command

    • sar -r: view memory usage (kbbuffers and kbcached show the physical memory used for buffers and cache)
    • sar 1 3: sample once per second, 3 times in total
    • sar -u: view CPU usage
    • sar -q: view load average
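The same buffer and cache figures that sar -r reports can be read on Linux from /proc/meminfo. The sketch below parses that format and computes how much of the physical memory is currently used for buffers and page cache. The function names are assumptions for this example, and the sample text in the usage is made up.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (kB for sizes)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def cache_share(meminfo):
    """Fraction of physical memory currently used for buffers and page cache."""
    cached_kb = meminfo["Buffers"] + meminfo["Cached"]
    return cached_kb / meminfo["MemTotal"]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"buffers + cache: {cache_share(info):.0%} of physical memory")
```

A large share here is normal and healthy: memory that programs do not need is put to work as cache.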

Policies for reducing I/O load

  1. Increase the cache, that is, add memory
  2. Scale out to multiple servers

    • With option 2 alone, the cache hit rate does not necessarily improve (each machine still holds the same data); the data also needs to be split (partitioned).

Partitioning: distribute with locality in mind

  • Split by table in the RDBMS
  • Split within the data itself

    • A-C on server 1, D-F on server 2, ......
    • Consistent hashing
  • Divide the system into different "islands" by purpose

    • Crawlers
    • Image API
    • General access
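The notes only name consistent hashing, so the following is a generic minimal sketch of the technique, not the implementation described in the book. The class name, the use of MD5, and the number of virtual points per server are all assumptions made for the example.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual points per node."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual points per node (assumed value)
        self._positions = []       # sorted hash positions on the ring
        self._owner = {}           # position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            del self._owner[pos]
            self._positions.remove(pos)

    def get_node(self, key):
        """Return the node owning the first point clockwise from the key."""
        if not self._positions:
            raise ValueError("ring is empty")
        idx = bisect.bisect(self._positions, self._hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["server1", "server2", "server3"])
    print(ring.get_node("user:alice"))
```

The property that matters for caching is that removing or adding one server only moves the keys that belonged to that server; every other key stays where it was, so the other machines keep their warm caches.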

Basic operations rules that follow from the page cache

  • Right after the operating system boots, do not put the server straight into production: the cache is still empty. Warm it up first by reading through all the needed files.
  • Run performance tests only after the cache has been warmed up
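A warm-up pass can be as simple as reading every file once so that the kernel pulls it into the page cache. The sketch below does this for a directory tree; the function names are made up for the example, and the directory to warm is whatever holds the data files on the server in question.

```python
import os


def warm_file(path, chunk_size=1024 * 1024):
    """Read a file sequentially and discard the data; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def warm_directory(root):
    """Read every regular file under root once; return total bytes read."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += warm_file(os.path.join(dirpath, name))
    return total
```

Calling warm_directory on the data directory before the server joins the pool has the same effect as reading the files with cat and throwing the output away. It only helps if the files fit in memory, which ties back to keeping the data size below physical memory.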

 

Strategies for scaling out the database

Make good use of the operating system cache

  • Keep the total database size within physical memory
  • Consider the impact of table structure design on database size

Create an index

  • B+ tree
  • Reduces the search cost to O(log n) and reduces the number of disk seeks
  • Use the MySQL EXPLAIN command to check whether an index is actually used
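The effect of an index on the query plan can be tried without a MySQL server. The sketch below uses SQLite from the Python standard library as a stand-in, so the command is EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN and the output format differs; the table and index names are invented for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO entries (title, body) VALUES (?, ?)",
    ((f"title-{i}", "text") for i in range(1000)),
)

sql = "SELECT * FROM entries WHERE title = ?"

# Without an index the plan is a full table scan.
plan_without_index = query_plan(conn, sql, ("title-500",))

# With an index on the searched column the plan becomes an index search.
conn.execute("CREATE INDEX idx_entries_title ON entries (title)")
plan_with_index = query_plan(conn, sql, ("title-500",))

print(plan_without_index)
print(plan_with_index)
```

The habit carries over to MySQL: after adding an index, run EXPLAIN on the real query and confirm that the index shows up in the plan.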

Distributing MySQL

  • Master/slave design (updates go to the master, reads go to the slaves)

    • Queries (reads) scale out by adding slaves
    • The master cannot be scaled out this way (data consistency)

      • However, in most web applications about 90% of queries are reads.
      • Master load can be addressed by splitting tables across servers or by replacing the implementation.
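A minimal sketch of the read/write split: writes are routed to the master, reads are spread over the slaves in turn. The class name and the server labels are hypothetical, and deciding by the first SQL keyword is a simplification chosen for this example.

```python
import itertools


class ReadWriteRouter:
    """Route writes to the master and reads round-robin to the slaves."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        """Return the server that should execute this statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._slaves)


if __name__ == "__main__":
    router = ReadWriteRouter("master", ["slave1", "slave2", "slave3"])
    print(router.route("UPDATE entry SET title = 'x' WHERE id = 1"))
    print(router.route("SELECT * FROM entry WHERE id = 1"))
```

Because MySQL replication is asynchronous by default, a read sent to a slave right after a write may not see that write yet; reads that must be fresh are usually sent to the master.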

MySQL partitioning

  • Place tables that are not closely related to each other on different machines
  • Avoid JOIN operations across tables on different machines

    • Replace INNER JOIN with separate queries using WHERE ... IN ...
    • Use a custom ORM
  • Cost of partitioning

    • Operations become more complex, the failure rate increases, and costs rise
  • Minimum number of machines required for redundancy

    • Four: one master and three slaves
    • With three slaves, when one fails, one keeps serving requests while the other can be taken out of service and used as the source for rebuilding the failed slave.
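Replacing a JOIN with WHERE ... IN ... can be sketched as two queries against two separate databases, with the application joining the results. Here two in-memory SQLite databases stand in for two MySQL servers; the table names, column names, and rows are invented for the example.

```python
import sqlite3

# Two separate in-memory databases stand in for two MySQL servers.
entry_db = sqlite3.connect(":memory:")
tag_db = sqlite3.connect(":memory:")

entry_db.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT)")
entry_db.executemany("INSERT INTO entry VALUES (?, ?)",
                     [(1, "first"), (2, "second"), (3, "third")])

tag_db.execute("CREATE TABLE entry_tag (entry_id INTEGER, tag TEXT)")
tag_db.executemany("INSERT INTO entry_tag VALUES (?, ?)",
                   [(1, "perl"), (3, "perl"), (2, "mysql")])


def titles_for_tag(tag):
    """Two queries instead of one JOIN across servers."""
    # Query 1: collect the ids from the server holding the tag table.
    ids = [row[0] for row in tag_db.execute(
        "SELECT entry_id FROM entry_tag WHERE tag = ?", (tag,))]
    if not ids:
        return []
    # Query 2: fetch the entries from the other server with WHERE ... IN.
    placeholders = ",".join("?" for _ in ids)
    rows = entry_db.execute(
        f"SELECT title FROM entry WHERE id IN ({placeholders}) ORDER BY id", ids)
    return [row[0] for row in rows]


print(titles_for_tag("perl"))  # prints ['first', 'third']
```

The price is an extra round trip and some logic in the application, which is one reason a custom ORM that hides this pattern is listed next to it.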

 

Three important aspects of web service infrastructure

  1. Low cost and high efficiency

    • 100% reliability should not be pursued
  2. Design is important

    • Scalability and response time
  3. Development speed is very important

    • Web services are frequently extended or changed, so the infrastructure must supply resources flexibly.

 

Traffic limit that one server can handle

  • Hatena standard server: 4-core CPU, 8 GB memory
  • Performance: thousands of requests per minute during busy hours
  • With 2 x 4-core CPUs and 32 GB memory

    • About 2 million PV per month

Optimization

  • Load Control

    • Server monitoring tools

 

Redundancy and System Stability

Master Redundancy

  • Multi-master

    • Usually two servers in an active/standby configuration.
    • One is active and the other is standby.
    • The two servers are each other's slave: a write to one is replicated to the other (bidirectional replication).
    • When the standby detects through VRRP that the active server is down, it automatically becomes active and takes over as the new master.
    • The active master holds a virtual IP address; whichever machine the address is assigned to is the active master.
    • Disadvantages

      • There is still a risk of data inconsistency

System Stability

  • Keep some headroom: run resources at only about 70% of capacity
  • Remove sources of instability (automate as much as possible)

    • Limit SQL load
    • Reduce memory leaks
    • Autonomous control of abnormal behavior

      • Automatic DoS detection
      • Automatic restart
      • Automatic termination of long-running queries
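Two of the rules above lend themselves to a tiny sketch: the 70% headroom check and the selection of long-running queries to terminate. Everything here is illustrative: the data class, the function names, and the 30-second limit are assumptions, and limiting the kill list to SELECT statements is a design choice of this example. With MySQL the input would typically come from SHOW PROCESSLIST and the termination would be done with KILL.

```python
from dataclasses import dataclass


@dataclass
class RunningQuery:
    query_id: int
    seconds: float  # how long the query has been running
    sql: str


def has_headroom(used, capacity, target=0.70):
    """True if usage is at or below the target share of capacity."""
    return used / capacity <= target


def queries_to_kill(running, max_seconds=30.0):
    """Return ids of SELECT queries that have run longer than the limit.

    Only reads are selected, so an in-progress write is never interrupted.
    """
    return [q.query_id for q in running
            if q.seconds > max_seconds
            and q.sql.lstrip().upper().startswith("SELECT")]


if __name__ == "__main__":
    running = [
        RunningQuery(1, 2.5, "SELECT * FROM entry WHERE id = 1"),
        RunningQuery(2, 95.0, "SELECT * FROM entry ORDER BY body"),
        RunningQuery(3, 120.0, "UPDATE entry SET title = 'x'"),
    ]
    print(queries_to_kill(running))
    print(has_headroom(used=5.6, capacity=8.0))
```

Running such a check from a scheduler is one way to automate it, so that a single runaway query cannot take the database down with it.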

 

Virtualization Technology

  • Benefits

    • Scalability

      • Minimize additional overhead
      • Live migration
    • Cost effectiveness

      • Improve resource utilization
      • Improve operational flexibility

        • Software-level host control
    • High availability

      • Environment isolation
  • Hatena's use of virtualization

    • Xen (CentOS 5.2, Xen 3.0.3) with LVM on local disks
    • Use the hypervisor in place of IPMI
    • Use paravirtualization
    • Control resource consumption

      • High-load warnings
      • Load adjustment
    • Monitoring tool: monit
    • Improve resource utilization

      • CPU idle --> add a web server
      • I/O idle --> add a database server
      • Memory idle --> add a cache server
      • Avoid putting guests with the same resource consumption pattern on one host
    • Additional overhead of virtualization

      • CPU: 2% to 3%
      • Memory: 10%
      • Network performance: 50%
      • I/O performance: 5%

SSD lifetime

  • Wear indicator: the Media Wearout Indicator value in S.M.A.R.T., read with the smartctl command
  • At Hatena, the most heavily written SSD took about nine months to wear out.

 

Network tipping points

  • 1 Gbps or 300,000 pps (packets per second): the limit of a PC router (1 Gbps is the limit of Gigabit Ethernet, and 300,000 pps is the limit of the Linux kernel)

    • Countermeasure: run multiple PC routers in parallel, or buy an expensive commercial router
  • 500 hosts: the limit of a single subnet and of the ARP table

    • Countermeasure: make the network hierarchical

 

RDBMS or K-V Storage

  • Criteria for choosing

    • Average data size
    • Maximum data size
    • Rate at which new data is added
    • Update frequency
    • Deletion frequency
    • Access frequency
  • MyISAM vs. InnoDB

    • MyISAM

      • Advantages

        • Inserts are fast on tables that are never updated or deleted from.
        • Starts and stops very quickly
        • Tables can be moved or renamed directly on the file system.
      • Disadvantages

        • An abnormal stop may corrupt tables
        • Transactions are not supported.
        • UPDATE, DELETE, and INSERT (other than appends) lock the whole table, so performance is poor in update-heavy applications.
      • Applicable scenarios

        • Append-only data
        • Heavy use of SELECT COUNT(*)
    • InnoDB

      • Advantages

        • Supports transactions
        • Recovers after an abnormal stop
        • Uses row-level locks when data is updated
      • Disadvantages

        • Slow to start and stop
        • Table operations must go through the database.
      • Applicable scenarios

        • High update frequency
        • Transactions required
  • Distributed K-V

    • memcached
    • Tokyo Tyrant

 

Cache System

  • Squid

    • (Reverse) proxy for multiple protocols such as HTTP, HTTPS, and FTP
    • Access control and authentication
  • Varnish

    • High-performance HTTP accelerator
    • Flexible configuration language
    • Runs almost entirely in memory
    • Faster than Squid
  • Nginx, Pound ......
  • Points to note when putting cache servers into service
    • With two servers sharing the load, if one fails the remaining one cannot withstand the load.

      • Keep a spare server
    • Even with enough servers, take care when adding one.

      • New (or freshly restarted) servers need to be warmed up: increase their traffic gradually from small to large
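Gradually increasing traffic to a new cache server can be expressed as a schedule of load-balancer weights. The sketch below is only an illustration: the function names, the linear ramp, and the weight scale of 100 are assumptions, and how a weight is actually applied depends on the load balancer in use.

```python
def ramp_weights(steps, full_weight=100):
    """Linear ramp of weights ending at full_weight after the given steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [round(full_weight * i / steps) for i in range(1, steps + 1)]


def traffic_share(new_weight, existing_weights):
    """Share of requests the new server receives under weighted balancing."""
    return new_weight / (new_weight + sum(existing_weights))


if __name__ == "__main__":
    for weight in ramp_weights(4):  # weights 25, 50, 75, 100
        share = traffic_share(weight, [100, 100])
        print(f"weight {weight:3d} -> {share:.0%} of traffic")
```

Each step gives the new server's cache time to fill before it receives more requests, so its hit rate is already reasonable by the time it carries a full share.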
