Large-scale Web Service Development Technology

Last Update:2018-12-07 Source: Internet

Author: User


Some time ago I read the book "Large-scale Web Service Development Technology". This afternoon I went through it again and wrote down the key points, so that I can review them later and not forget them. Because I am not yet familiar with data compression and full-text retrieval, the notes mainly cover the first five chapters, followed by scattered notes on the rest. This article may be helpful to readers who: 1. are interested in this book but not sure about its content; 2. have some experience with large-scale web services.

Hatena Scale (April 2010)

150 million registered users, 19 million unique users (UU) per month

Requests: Billions/month

Traffic during busy hours: 850 Mbps (excluding images)

About 600 physical servers, and over 1,300 hosts with virtualization.

Log volume per day: GB to TB scale

System Growth Strategy

Start small, and manage and design for foreseeable change

Balance efficiency and quality

Meeting, standardization, documentation, agility, etc.

For a GB-scale text database (about 10 million records) with no index, a single SELECT query fails to complete within ... seconds.

Speed Difference between memory and Hard Disk

Addressing: memory is roughly 10^5 to 10^6 times faster than disk.
Transfer speed (bus): memory is 7.5 GB/s, disk is 58 MB/s.
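
To get a feel for the gap in transfer speed, the sketch below uses the two rates quoted above to estimate how long a sequential read would take. The 1 GB data size is an arbitrary assumption, and seek time is ignored, which flatters the disk.

```python
# Transfer rates quoted in the notes above.
MEMORY_BYTES_PER_SEC = 7.5e9   # 7.5 GB/s
DISK_BYTES_PER_SEC = 58e6      # 58 MB/s


def transfer_seconds(size_bytes: float, bytes_per_sec: float) -> float:
    """Time to move size_bytes sequentially at the given rate (seek time ignored)."""
    return size_bytes / bytes_per_sec


if __name__ == "__main__":
    size = 1e9  # assumed data set size: 1 GB
    mem = transfer_seconds(size, MEMORY_BYTES_PER_SEC)
    disk = transfer_seconds(size, DISK_BYTES_PER_SEC)
    print(f"memory: {mem:.3f} s, disk: {disk:.1f} s, ratio: {disk / mem:.0f}x")
```

Even for purely sequential access the disk is more than a hundred times slower. Random access, where addressing dominates, is far worse.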

Finding the bottleneck on a single machine (get the most out of one machine first; measure, do not guess)

Use sar or vmstat to check whether the problem is CPU or I/O

CPU Problems
- top or sar: check whether the time is spent in system (kernel) processes or in user programs.
- ps: view process status and CPU usage to identify the problematic process.
- Use strace or oprofile to locate the specific problem within a program or process.

I/O Problems
- Frequent swapping (paging) ---> insufficient memory
  - ps: view the memory used by each program
  - Can the program be improved to reduce memory usage?
  - If not, add memory or distribute the load across machines
- If there is no swapping, the memory available for cache may be insufficient.
  - Increase memory
  - Add machines and distribute the data

Scaling out for CPU load is relatively easy; scaling out for I/O load is harder.

View the actual load: Load average in the top result (1 minute 5 minutes 15 minutes)

To check whether it is the I/O load or the CPU load that is too high: sar -P (per-core statistics on multi-core machines)
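
As a small illustration of reading the load average, the sketch below fetches the same three numbers that top shows and divides them by the core count. The helper name `load_per_core` is made up for this example, and `os.getloadavg()` is available on Unix-like systems only. On Linux the load average also counts tasks waiting on I/O, so a high value alone does not say whether CPU or I/O is the bottleneck; that is what sar is for.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Normalize a load average by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()  # 1, 5 and 15 minute averages (Unix only)
    cores = os.cpu_count() or 1
    for label, value in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
        print(f"{label}: {value:.2f} total, {load_per_core(value, cores):.2f} per core")
```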

Key points for handling large-scale data

Process data in memory as much as possible; distribute the data with locality in mind

Algorithmic complexity: going from O(n) to O(log n) is a qualitative leap
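
The difference can be made concrete by counting steps. The following is a minimal sketch comparing a linear scan with a binary search over sorted data; the function names and the data size are chosen only for illustration.

```python
def linear_search_steps(sorted_items, target):
    """Return the number of comparisons a linear scan makes."""
    steps = 0
    for item in sorted_items:
        steps += 1
        if item == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Return the number of loop iterations a binary search makes."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


if __name__ == "__main__":
    data = range(1_000_000)      # sorted and indexable, without building a list
    target = 999_999             # worst case for the linear scan
    print("linear:", linear_search_steps(data, target))
    print("binary:", binary_search_steps(data, target))
```

For a million items the linear scan needs a million comparisons in the worst case, while the binary search needs at most about twenty.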

Data Compression and Retrieval Technology

Cache Mechanism

Page Cache
- All modern operating systems use virtual memory.
- The kernel keeps data it has read in memory for as long as possible, so the next access needs no disk read. This is the page cache.
- The operating system caches in units of pages, the smallest unit of virtual memory.
- Adding memory raises the cache hit rate and reduces the I/O load.
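
The effect of adding memory on the hit rate can be sketched with a toy simulation. The code below replays a skewed access pattern through an LRU cache of different sizes. This is a simplified model and an assumption of this example: the real kernel page cache does not use plain LRU, and the access pattern is synthetic.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Replay page accesses through an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / len(accesses)


def skewed_accesses(n_accesses, n_pages, seed=0):
    """Generate accesses where low-numbered pages are requested more often."""
    rng = random.Random(seed)
    return [int(n_pages * rng.random() ** 3) for _ in range(n_accesses)]


if __name__ == "__main__":
    accesses = skewed_accesses(100_000, 10_000)
    for capacity in (100, 1_000, 10_000):
        print(f"cache of {capacity} pages: hit rate {lru_hit_rate(accesses, capacity):.2%}")
```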

sar command
- sar -r: view the current memory status (kbbuffers and kbcached show the physical memory used for buffers and cache)
- sar 1 3: sample once per second, 3 times in total
- sar -u: view CPU usage
- sar -q: view load average

Policies for I/O Load Reduction

Increase the cache, that is, add memory

Scale out to multiple servers

Simply adding servers may not raise the actual cache hit rate (each machine still holds the same data), so the data needs to be split (partitioned).

Partitioning: distribution that takes locality into account

Split by RDBMS table

Split within a table's data
- a-c on server 1, d-f on server 2, ...
- Consistent hash
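
Consistent hashing can be sketched as follows. This is a minimal illustration, not the scheme used in the book: the class name, the server names, the use of SHA-256, and the 100 virtual points per server are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (the hash is used only to spread keys)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per server
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> server name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Place the server's virtual points on the ring."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def lookup(self, key: str) -> str:
        """Return the server owning key: the first point clockwise from its hash."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["db1", "db2", "db3"])
    keys = [f"user{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add("db4")
    moved = sum(1 for k in keys if ring.lookup(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after adding db4")
```

Compared with a fixed a-c / d-f split, adding a server relocates only the keys that the new server takes over, rather than reshuffling everything.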

Divide the system into different "islands" by purpose
- Crawler
- Image API
- General access

Basic operations rules based on the page cache

Right after the operating system starts, do not put the server into production immediately. First warm the cache by reading through all the data files.
Run performance tests only after the cache has been warmed up.
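
One way to warm the cache is simply to read every file once and throw the data away. The following is a minimal sketch under that assumption; the function name is made up for this example, and the directory passed to it (for instance a database data directory) depends on the installation.

```python
import os


def warm_page_cache(root: str, chunk_size: int = 1 << 20) -> int:
    """Read every regular file under root and discard the data.

    Returns the number of bytes read. Reading is enough: the kernel keeps
    the pages it touched in the page cache while memory allows.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, broken symlinks, etc.
            with open(path, "rb") as f:
                while chunk := f.read(chunk_size):
                    total += len(chunk)
    return total
```

The process needs read permission on the files, and the warm-up only helps if the data actually fits in memory.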

Database scale-out strategy

Make good use of the operating system cache

Keep the database size within physical memory as far as possible

Consider the impact of Table Structure Design on database size

Create an index

B + tree

Improves search efficiency (O(log n)) and reduces the number of disk seeks

Use the MySQL EXPLAIN command to check whether an index is actually being used
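
The idea of checking a query plan before and after adding an index can be tried without a MySQL server. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` from the Python standard library as a stand-in; the output format differs from MySQL's EXPLAIN, and the table and index names are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for sql as a list of human-readable detail strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column is the textual detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, url TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO entry (url, title) VALUES (?, ?)",
    [(f"http://example.com/{i}", f"title {i}") for i in range(1000)],
)

sql = "SELECT title FROM entry WHERE url = ?"
before = query_plan(conn, sql, ("http://example.com/42",))  # full table scan
conn.execute("CREATE INDEX idx_entry_url ON entry (url)")
after = query_plan(conn, sql, ("http://example.com/42",))   # search via the index
print("before:", before)
print("after: ", after)
```

In MySQL the equivalent check is to prefix the query with EXPLAIN and look at which key, if any, the optimizer chose.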

Distributing MySQL

Master/slave design (updates go to the master, reads go to the slaves)
- Queries scale out (add slaves)
- But the master cannot be scaled out (data consistency)
  - However, in most web applications about 90% of queries are reads.
  - Master load can be handled by splitting tables across servers or by switching to a different storage implementation.
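
The master/slave split is usually enforced in the application or in an ORM layer. Here is a minimal sketch of such routing, assuming a hypothetical `ReplicatedRouter` class that only returns server names; a real implementation would hand out connections and would also have to deal with replication lag.

```python
import itertools


class ReplicatedRouter:
    """Pick a server name for each SQL statement: writes -> master, reads -> slaves."""

    READ_PREFIXES = ("select", "show", "explain")

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin

    def route(self, sql: str) -> str:
        statement = sql.lstrip().lower()
        if statement.startswith(self.READ_PREFIXES):
            return next(self._slaves)
        return self.master


if __name__ == "__main__":
    router = ReplicatedRouter("master", ["slave1", "slave2", "slave3"])
    for sql in ("SELECT * FROM entry", "UPDATE entry SET title = 'x'", "SELECT 1"):
        print(f"{sql!r} -> {router.route(sql)}")
```

Reads that must see a write made just before may need to go to the master, since replication to the slaves is typically asynchronous.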

MySQL Partition

Place tables that are not closely related to each other on different machines

Avoid join operations on tables on different machines
- Replace INNER JOIN with separate queries that use WHERE ... IN ...
- Use custom ORM
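
When two tables live on different machines, the join has to happen in the application. The sketch below shows the pattern with two in-memory SQLite databases standing in for two MySQL servers; the table names, columns, and data are invented for the example.

```python
import sqlite3

# Two in-memory databases stand in for tables living on different machines.
user_db = sqlite3.connect(":memory:")
bookmark_db = sqlite3.connect(":memory:")

user_db.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
user_db.executemany(
    "INSERT INTO user VALUES (?, ?)", [(1, "alice"), (2, "bob"), (3, "carol")]
)

bookmark_db.execute("CREATE TABLE bookmark (user_id INTEGER, url TEXT)")
bookmark_db.executemany(
    "INSERT INTO bookmark VALUES (?, ?)",
    [(1, "http://example.com/a"), (3, "http://example.com/b"), (3, "http://example.com/c")],
)


def bookmarks_with_names(url_prefix):
    """Application-side join: query one server, then the other with WHERE ... IN."""
    rows = bookmark_db.execute(
        "SELECT user_id, url FROM bookmark WHERE url LIKE ? ORDER BY url",
        (url_prefix + "%",),
    ).fetchall()
    user_ids = sorted({user_id for user_id, _ in rows})
    if not user_ids:
        return []
    placeholders = ", ".join("?" for _ in user_ids)
    names = dict(
        user_db.execute(
            f"SELECT id, name FROM user WHERE id IN ({placeholders})", user_ids
        )
    )
    return [(names[user_id], url) for user_id, url in rows]


if __name__ == "__main__":
    for name, url in bookmarks_with_names("http://example.com/"):
        print(name, url)
```

A custom ORM can hide this two-step lookup so that calling code does not need to know which server holds which table.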

Cost of Partition
- Operations become more complex, the failure rate rises, and costs increase

Minimum number of machines required for redundancy
- Four: one master and three slaves
- Of the three slaves, one keeps serving requests, one may have failed, and one is taken out of service so its data can be copied to a replacement after the failure.

Three important aspects of Web Service Infrastructure

Low cost and high efficiency
- 100% reliability should not be pursued

Design is important
- Scalability and response time

Development speed is very important
- Web services add and change features frequently, so resources must be provided flexibly.

Traffic limit that one server can handle

Hatena standard server: 4-core CPU, 8 GB memory

Performance: thousands of requests per minute during busy hours

With 4-core CPU x 2 and 32 GB memory
- About 2 million PV/month

Optimization

Load Control
- Server monitoring tools

Redundancy and System Stability

Master Redundancy

Multi-Master
- Typically two servers in an active/standby structure.
- One is active and the other is standby.
- Each server is a slave of the other: a write to one is replicated to the other (two-way replication).
- When the standby detects through VRRP that the active server is down, it automatically becomes active and takes over as the new master.
- The active server holds a virtual IP address. Whichever machine the virtual IP is assigned to is the active master.
- Disadvantages
  - There is still a risk of inconsistency
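
The failover logic can be pictured with a toy model. The sketch below is not VRRP itself (in practice a VRRP implementation such as keepalived handles detection and address takeover); the class, server names, and IP address are hypothetical and only illustrate the role swap.

```python
class MultiMasterPair:
    """Toy model of an active/standby pair sharing one virtual IP."""

    def __init__(self, active: str, standby: str, virtual_ip: str):
        self.virtual_ip = virtual_ip
        self.active = active
        self.standby = standby
        self.alive = {active: True, standby: True}

    def mark_down(self, server: str) -> None:
        """Record that a server has stopped responding."""
        self.alive[server] = False

    def health_check(self) -> str:
        """Promote the standby if the active server is down; return the VIP holder."""
        if not self.alive[self.active] and self.alive[self.standby]:
            self.active, self.standby = self.standby, self.active
        return self.active


if __name__ == "__main__":
    pair = MultiMasterPair("db1", "db2", "192.0.2.10")
    print(pair.virtual_ip, "->", pair.health_check())
    pair.mark_down("db1")
    print(pair.virtual_ip, "->", pair.health_check())
```

Clients always connect to the virtual IP, so they do not need to know which machine is currently the master.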

System Stability

Keep a margin of resources: run at only about 70% of capacity

Remove unstable factors (automate as much as possible)
- Limit SQL load
- Mitigate memory leaks by restarting automatically
- Autonomous control of abnormal behavior
  - Automatic DoS detection
  - Automatic restart
  - Automatic termination of long-running queries
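
Automatic termination of long-running queries could look roughly like this. The sketch only builds the KILL statements from rows shaped like MySQL's SHOW PROCESSLIST output (Id, Command, Time); the function name and the 60 second threshold are assumptions, and fetching the rows and executing the statements are left out.

```python
def kill_statements(processlist, max_seconds=60):
    """Build KILL statements for queries that have run longer than max_seconds.

    processlist is a list of dicts with the Id, Command, and Time columns
    of MySQL's SHOW PROCESSLIST output.
    """
    statements = []
    for row in processlist:
        if row["Command"] == "Query" and row["Time"] > max_seconds:
            statements.append(f"KILL {int(row['Id'])}")
    return statements


if __name__ == "__main__":
    sample = [
        {"Id": 11, "Command": "Sleep", "Time": 900},   # idle connection, left alone
        {"Id": 12, "Command": "Query", "Time": 3},     # short query, left alone
        {"Id": 13, "Command": "Query", "Time": 240},   # long-running query
    ]
    print(kill_statements(sample))
```

Such a script would normally run from a scheduler against the slaves, with the threshold tuned per service.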

Virtualization Technology

Benefits
- Scalability
  - Minimize additional overhead
  - Dynamic migration
- Cost effectiveness
  - Improve resource utilization
  - Improve operational flexibility
    - Software-level Host Control
- High Availability
  - Environment isolation

Hatena's virtualization application
- Xen (CentOS 5.2, Xen 3.0.3) with LVM on local disks
- Using the hypervisor in place of IPMI
- Use paravirtualization
- Control Resource Consumption
  - High Load warning
  - Adjust Load
- Detection tool: monit
- Improve resource utilization
  - CPU idle --> Web Server
  - I/O idle --> Database Server
  - Memory idle --> Cache Server
  - Avoid co-locating guests with the same resource consumption profile
- Additional virtualization overhead
  - CPU: 2% ~ 3%
  - Memory: 10%
  - Network Performance: 50%
  - I/O performance: 5%

SSD lifetime

Wear indicator: the Media_Wearout_Indicator value in the S.M.A.R.T. data ---> check it with the smartctl command

At Hatena, the most heavily written SSD was worn out by writes in about nine months.

Network limits

1 Gbps, that is, 300,000 pps, is the limit of a PC router (1 Gbps is the limit of Gigabit Ethernet, and 300,000 pps is the limit of the Linux kernel)
- Countermeasure: use multiple PC routers, or purchase an expensive commercial router
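
The two limits interact through the average packet size. The short calculation below, a sketch that ignores Ethernet framing overhead, finds the packet size at which a router hits both limits at the same time; the constant and function names are chosen for this example.

```python
LINK_BITS_PER_SEC = 1_000_000_000   # 1 Gbps
KERNEL_PACKETS_PER_SEC = 300_000    # packet rate limit quoted above


def saturating_packet_size(bits_per_sec: int, packets_per_sec: int) -> float:
    """Average packet size (bytes) at which both limits are reached at once."""
    return bits_per_sec / 8 / packets_per_sec


if __name__ == "__main__":
    size = saturating_packet_size(LINK_BITS_PER_SEC, KERNEL_PACKETS_PER_SEC)
    print(f"break-even average packet size: {size:.0f} bytes")
```

With smaller average packets the packet rate runs out first; with larger packets the bandwidth does.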

500 hosts: the limit of a single subnet and of the ARP table
- Countermeasure: hierarchical network

RDBMS or K-V Storage

Judgment basis
- Average data size
- Maximum data size
- Increase frequency of new data
- Update frequency
- Deletion frequency
- Access Frequency

MyISAM vs. InnoDB
- MyISAM
  - Advantages
    - Fast inserts (appends) on tables that have no UPDATE or DELETE.
    - Starts and stops very quickly
    - Tables can be moved or renamed directly in the file system.
  - Disadvantages
    - An abnormal stop may damage tables
    - Transactions are not supported.
    - UPDATE, DELETE, and INSERT lock the whole table (except when appending data), so performance is poor in update-heavy applications.
  - Applicable scenarios
    - Append-only data
    - Frequent use of SELECT COUNT(*)
- InnoDB
  - Advantages
    - Support transactions
    - Recovers after an abnormal stop
    - Row-level locking when data is updated
  - Disadvantages
    - Slow to start and stop
    - Table operations must go through the database.
  - Applicable scenarios
    - High update frequency
    - Transactions required

Distributed K-V
- Memcached
- Tokyo Tyrant

Cache System

Squid
- Works as a (reverse) proxy for multiple protocols such as HTTP, HTTPS, and FTP
- Access Control and authentication

Varnish
- High-performance HTTP accelerator
- Flexible configuration language
- Basically all executed in memory
- Faster than squid

nginx, Pound, ...

Notes on putting cache servers into service
- With two servers, if one fails the other may be unable to withstand the load.
  - Prepare a spare server
- Even with enough servers, take care.
  - A new (or freshly started) server needs to be warmed up: increase its traffic gradually from small to large
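
Ramping traffic up gradually can be expressed as a load-balancer weight that grows with uptime. The function below is a minimal sketch under assumed values: a linear ramp over 600 seconds to a weight of 100. How the weight is applied depends on the load balancer in use.

```python
def ramp_weight(seconds_since_start: float, ramp_seconds: float = 600.0,
                full_weight: int = 100, min_weight: int = 1) -> int:
    """Load-balancer weight for a freshly started server, growing linearly to full_weight."""
    if ramp_seconds <= 0 or seconds_since_start >= ramp_seconds:
        return full_weight
    fraction = max(seconds_since_start, 0.0) / ramp_seconds
    return max(min_weight, round(full_weight * fraction))


if __name__ == "__main__":
    for t in (0, 60, 300, 600):
        print(f"{t:>4} s after start: weight {ramp_weight(t)}")
```

The minimum weight keeps a trickle of requests flowing so that the cache starts filling as soon as the server comes up.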

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
