Notes on large-scale Web Service Development Technology

Source: Internet
Author: User


1. Differences from small-scale services with only a few servers

Scalability and load distribution

Ensuring redundancy

Low-effort operations: reduce manual intervention (with this many machines, nobody can keep track of each one by hand)

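2. Difficulties in large-scale data processing: memory vs. disk

Memory is on the order of 100,000 to 1,000,000 times faster than disk for random access, so whether the data fits in memory dominates performance.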

3. Techniques for large-scale data

Key points when writing programs:

Do the work in memory whenever possible

Use algorithms that cope with data growth (for example, a binary search tree with O(log n) lookups)

Use data compression and search techniques
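To make the O(log n) point concrete, here is a minimal sketch that counts comparisons for a linear scan and for a binary search over the same sorted list. It uses a sorted array rather than an actual binary tree, and the function names and the sample data are illustrative assumptions, not something taken from the notes.

```python
def linear_search(sorted_ids, target):
    """Scan from the front; returns (index or -1, comparisons made)."""
    for steps, value in enumerate(sorted_ids, start=1):
        if value == target:
            return steps - 1, steps
    return -1, len(sorted_ids)


def binary_search(sorted_ids, target):
    """Halve the search range each step; returns (index or -1, comparisons made)."""
    lo, hi, steps = 0, len(sorted_ids), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_ids[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_ids) and sorted_ids[lo] == target
    return (lo if found else -1), steps


# Illustrative data: 1,000,000 sorted even numbers.
ids = list(range(0, 2_000_000, 2))
```

Looking up the last element costs a million comparisons with the linear scan and about twenty with the binary search; doubling the data adds only one more step to the latter. Python's standard `bisect` module offers the same kind of search ready-made.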

Prerequisites (the underlying foundation):

The operating system cache

Running an RDBMS on the premise of a distributed architecture

How to use algorithms and data structures in a large-scale environment


4. Operating system cache

Virtual memory: a process does not use physical memory addresses directly. The kernel translates addresses transparently, so every process sees an address space that starts at 0. The operating system allocates memory in units of pages.

Page cache: data that a process reads from disk into memory is not released right away; it stays cached for later use.

VFS: the disk cache is implemented by the page cache. The VFS layer hides the differences between the underlying file systems, so caching works the same way on all of them.

File cache: eviction is LRU. The smallest cached unit is one page.

Reducing I/O load: if memory is larger than the data files, all of the data can be cached and data compression need not be considered.

Locality-aware distribution: partition data across servers according to the access pattern
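The page cache behaviour described above can be modelled in a few lines. This is a simplified sketch, not how a real kernel is implemented: the 4096-byte page size, the `PageCache` class, and the `fake_disk` callback are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes; the real value depends on the platform


class PageCache:
    """Toy LRU cache keyed by (file name, page number)."""

    def __init__(self, capacity_pages, read_page_from_disk):
        self.capacity = capacity_pages
        self.read_page_from_disk = read_page_from_disk  # hypothetical slow path
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, filename, offset):
        """Return the page containing `offset`, going to disk only on a miss."""
        key = (filename, offset // PAGE_SIZE)
        if key in self.pages:
            self.pages.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.pages[key]
        self.misses += 1
        data = self.read_page_from_disk(filename, key[1])
        self.pages[key] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data


def fake_disk(filename, page_no):
    """Stand-in for a real disk read; returns a label instead of file data."""
    return f"{filename}:{page_no}"
```

When `capacity_pages` covers the whole data file, every read after the first pass is a hit, which is the "memory larger than the data file" case in the notes.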


5. Database horizontal scaling strategy

Key points for distributing MySQL

Make good use of the operating system cache

Set indexes correctly

Design the application on the premise that it will scale horizontally


Distributing MySQL

Replication: master/slave

Scaling updates/writes: table partitioning, key-value stores
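The two ideas above, replication for reads and partitioning for writes, can be sketched as simple routing functions. Everything here is an illustrative assumption: the host names are hypothetical, the SQL check is deliberately naive, and CRC32 is just one possible stable hash.

```python
import itertools
import zlib

# Hypothetical host names, used only for illustration.
MASTER = "db-master"
SLAVES = ["db-slave-1", "db-slave-2"]
PARTITIONS = ["db-part-0", "db-part-1", "db-part-2"]

_slave_cycle = itertools.cycle(SLAVES)


def route_query(sql):
    """Send writes to the master and spread reads across slaves round-robin."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb == "SELECT":
        return next(_slave_cycle)
    return MASTER


def partition_for(key):
    """Pick a partition from a stable hash of the key."""
    return PARTITIONS[zlib.crc32(key.encode("utf-8")) % len(PARTITIONS)]
```

Replication only scales reads, because every write still goes through the master; partitioning by key is what spreads the write load. Reads that must see a just-written row may still need to go to the master, since slaves can lag behind.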


6. Special-purpose indexes

Inverted index: used for full-text search. It can be hosted on a separate index server.

7. Full-text index implementation

Steps: crawl, store, build the index, search, score, display

Inverted index structure: dictionary + postings

Dictionary creation: a word dictionary matched with Aho-Corasick, or morphological analysis
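A minimal sketch of the dictionary-plus-postings structure follows. The tokenizer is a plain lowercase word split standing in for Aho-Corasick matching or morphological analysis, and the scoring (total term count) is an assumption chosen for brevity, not the method the notes describe.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Stand-in for dictionary matching or morphological analysis."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to {doc_id: [positions]}, i.e. the postings for that term."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def search(index, query):
    """Return ids of documents containing every query term, best score first."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    hits = set(postings[0]).intersection(*(set(p) for p in postings[1:]))

    def score(doc_id):
        # Assumed scoring: how many times the query terms occur in the document.
        return sum(len(p[doc_id]) for p in postings)

    return sorted(hits, key=lambda doc_id: (-score(doc_id), doc_id))


docs = {
    1: "memory is faster than disk",
    2: "disk cache and page cache",
    3: "page cache uses memory, cache cache",
}
index = build_index(docs)
```

The keys of `index` play the role of the dictionary, and each value is the postings list for that term. Because the index is built once and queried many times, it is a natural candidate for its own server.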

8. Approaches to scalability

Load tuning: make the load visible (graph it)

Assign machines by purpose, for example dedicated servers for crawlers


9. Ensuring redundancy

Application servers: add more servers. The load balancer handles failover when a server fails and failback when it recovers.

Database servers: add more servers. Multi-master setups carry the risk of data falling out of sync when replication is switched over. This risk is accepted; when it happens, recovery is done manually.

Storage Server:

System stability

There is a trade-off with resource utilization: keep an appropriate margin

Growing resource usage and memory leaks can destabilize the system. Countermeasures: automatic DoS detection, automatic restart on abnormal state, automatic termination of long-running queries
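As one example of such an automatic countermeasure, the sketch below picks out long-running queries to terminate. It assumes the rows come from MySQL's `SHOW PROCESSLIST` (columns `Id`, `Command`, `Time`, `Info`) already fetched as dictionaries; the function name and the 60-second threshold are hypothetical, and the function only builds `KILL` statements rather than executing them.

```python
def queries_to_kill(process_rows, max_seconds=60):
    """Return KILL statements for queries running longer than max_seconds.

    process_rows: dicts with the Id, Command and Time columns that MySQL's
    SHOW PROCESSLIST returns. max_seconds is an assumed threshold.
    """
    statements = []
    for row in process_rows:
        if row["Command"] != "Query":
            continue  # skip idle connections, replication threads, and so on
        if row["Time"] > max_seconds:
            statements.append(f"KILL {row['Id']}")
    return statements


# Illustrative rows, not real output.
sample_rows = [
    {"Id": 11, "Command": "Sleep", "Time": 900, "Info": None},
    {"Id": 12, "Command": "Query", "Time": 3, "Info": "SELECT 1"},
    {"Id": 13, "Command": "Query", "Time": 480, "Info": "SELECT * FROM big_table"},
]
```

Run periodically, a check like this keeps a single runaway query from consuming the margin that the rest of the system depends on.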


10. Improving efficiency

Virtualization: scalable, cost-effective, and highly available. Disadvantage: a performance overhead of roughly 2% CPU, 10% memory, 50% network, and 5% I/O

Effective use of cheap hardware: multi-core CPUs, SSDs


11. Network

Demarcation points: 1 Gbps of traffic, 500 hosts, global reach (CDN)



Knowledge that stays useful: to understand server load, focus on the operating system, caching, multithreading/multiprocessing, virtual memory, and file systems

Checking the load on a single server: look at the load average, then determine whether the bottleneck is CPU or I/O

Load average: shown by top and uptime. It is the number of tasks waiting for the CPU plus the number of tasks waiting for I/O, averaged per unit of time

CPU bottleneck: check with sar or vmstat

Use top or sar to check whether the load comes from user programs or from the system

Use ps to view process status and CPU time, and identify the problematic process

Once the process is identified, use strace or oprofile to locate the problem
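As a first check, the load average is usually read relative to the number of CPU cores. The sketch below normalizes it that way; the threshold of 1.0 waiting task per core is a common rule of thumb assumed here, not a figure from the notes, and the function names are illustrative.

```python
import os


def load_per_core(load_average, cpu_count):
    """Normalize a load average by the number of CPU cores."""
    return load_average / cpu_count


def is_overloaded(load_average, cpu_count, threshold=1.0):
    """threshold is an assumed rule of thumb: more than one waiting task per core."""
    return load_per_core(load_average, cpu_count) > threshold


def current_load_per_core():
    """Read the 1-minute load average of this machine (Unix-like systems only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    return load_per_core(one_minute, os.cpu_count() or 1)
```

A high value only says that tasks are waiting. Because the load average counts tasks waiting for I/O as well as for the CPU, the next step is still to tell the two apart with sar or vmstat.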

I/O bottleneck: the disk is accessed too often, either because there are too many I/O requests or because of swapping. Check swap activity with sar or vmstat.

If swapping is occurring:

Use ps to check whether a process is consuming a large amount of memory.

If the program is the cause, improve the program.

If memory really is insufficient, add memory; if it cannot be added, consider distributing the load.

If there is no swapping, the frequent I/O is caused by insufficient cache:

Add memory.

If memory cannot be added or is still not enough, consider distributing the data or adding cache servers.

Operating system tuning is about finding the bottleneck and resolving it.
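The I/O checklist above can be sketched as a small decision function fed by the `si` and `so` (swap-in and swap-out) columns of `vmstat` output. The parsing assumes the usual two-line `vmstat` header, and the function names, the boolean inputs, and the returned strings are illustrative assumptions.

```python
def swap_activity(vmstat_output):
    """Return (si, so) from the last data row of `vmstat` output."""
    rows = [line.split() for line in vmstat_output.strip().splitlines()]
    # The column header is the row that names the si and so columns.
    header = next(cols for cols in rows if "si" in cols and "so" in cols)
    last = rows[-1]
    return int(last[header.index("si")]), int(last[header.index("so")])


def diagnose_io(si, so, big_process_found, can_add_memory):
    """Walk the checklist for an I/O bottleneck and return the suggested action."""
    if si > 0 or so > 0:  # swapping is occurring
        if big_process_found:
            return "improve the program that consumes too much memory"
        if can_add_memory:
            return "add memory"
        return "distribute the load across servers"
    # No swapping: frequent I/O means the cache is too small.
    if can_add_memory:
        return "add memory to enlarge the cache"
    return "distribute the data or add cache servers"
```

In practice the two boolean inputs come from the manual steps in the notes: `ps` answers whether one process is hogging memory, and the hardware budget answers whether memory can be added.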


This article is from the "Ying: Good memory as bad pen" blog. Please keep this source attribution: http://yingtju.blog.51cto.com/3760152/1299911
