"Core principles and case studies of large Web site technology Architecture" Reading notes

1. Overview

  1. Website architecture patterns: layering, segmentation, distribution, clustering, caching, asynchrony, redundancy, automation, security.

  2. Core architectural elements: performance, availability, scalability, extensibility, security.

4. High Performance
  1. Response time is usually measured by repeating a request 10,000 times, summing the total response time, and dividing by 10,000 to get the average response time of a single request.

  2. In concurrency tests, the test program does not just start multiple threads that send requests nonstop; it adds a random wait time between two consecutive requests to simulate real users.

  3. Highway analogy: throughput is the number of vehicles passing the toll stations per day, concurrency is the number of vehicles currently on the road, and response time is the vehicles' speed. Throughput metrics: TPS (transactions per second), HPS (HTTP requests per second), QPS (queries per second).

  4. Performance counters: metrics such as system load (ideally equal to the number of CPU cores), number of objects and threads, memory usage, CPU usage, disk and network I/O, and so on.

  5. Test stages as concurrency increases: performance test, then load test, then stress test; stability testing is done in addition.

  6. As requests increase, the system's processing capacity rises to a maximum, which is the system's maximum load point. Beyond it, processing capacity keeps decreasing until the system finally collapses; that point is the system's crash point.

  7. Browser access optimizations: 1. reduce HTTP requests (merge CSS and JS files, combine images with CSS sprites); 2. use the browser cache (set Cache-Control and Expires in HTTP headers; publish updated static files under new file names and update them gradually rather than all at once); 3. enable gzip compression for HTML, CSS and JS; 4. put CSS at the top (the browser renders the page only after all CSS has downloaded) and JS at the bottom (the browser executes JS immediately after downloading it, which can block the page); 5. reduce cookie transfer (serve static resources from a separate domain so that no cookies are sent).

  8. CDN acceleration: placing static resources on a CDN greatly improves page load speed.

  9. Reverse proxy: deployed in the website's data center in front of the web servers, it receives HTTP requests on their behalf. It protects site security, can cache content, and can provide load balancing.

  10. Application server performance optimizations: caching, clustering, asynchrony.

  11. The first law of website performance optimization: prefer caching (a cache uses faster storage media to speed up access and avoids repeated computation). A cache is usually a key-value hash table: the key's hashCode is computed to get the table index, which gives fast access.

  12. Using caches sensibly: frequently modified data (caching generally pays off only when the read/write ratio is above 2:1); access without hotspots; data inconsistency and dirty reads (accepted by setting an expiration time); cache availability (ideally a cache outage should not significantly affect the database); cache warm-up (preload hot data, since an LRU (least recently used) cache otherwise takes a long time to fill); cache penetration (requests for nonexistent data always miss the cache and go straight to the database; a simple fix is to cache a null value for such keys).

  13. Distributed caches come in two kinds: update-synchronized (e.g., JBoss Cache, where every machine holds the same cached content) and non-communicating (e.g., Memcached: memory is grouped into slabs of same-size chunks, each item is stored in the smallest chunk larger than the data, the LRU algorithm frees the least recently accessed space, and consistent hashing lets the cluster scale out almost without limit).

  14. Asynchronous operation: instead of user requests accessing the database directly, a message queue is added between them. The request is sent to the queue and returns immediately, and a consumer takes messages from the queue and operates on the database, which shaves traffic peaks.

  15. Use a cluster: use load balancing to build a cluster of servers for an application and spread requests across multiple servers for processing.

  16. Code optimization: multithreading (number of threads to start = [task execution time / (task execution time - I/O wait time)] * CPU cores); resource reuse (database connections, network connections, threads, complex objects, etc.); data structures (e.g., string -> MD5 fingerprint -> hash calculation -> hashCode, for a better-distributed hashCode); garbage collection.

  17. Storage performance optimizations: 1. mechanical hard disks (fast sequential reads/writes, slow random access) vs. solid-state drives; 2. B+ trees vs. LSM trees; 3. RAID vs. HDFS.

  18. RAID: RAID0 (data is striped so multiple disks are read and written in parallel; 100% space utilization), RAID1 (every write goes to 2 disks; 50%), RAID10 (combines 1 and 0; 50%), RAID5 ((n-1)/n), RAID6 (more reliable than RAID5; (n-2)/n).

  19. HDFS stores data in blocks: a file is split into blocks, and each written block is automatically copied to 2 other machines so that 3 copies exist. MapReduce processes multiple blocks concurrently, so reads get parallelism comparable to RAID0.
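
Items 11 to 13 above describe caches as key-value hash tables with LRU eviction, and item 12 suggests caching null values to stop cache penetration. The following is a minimal single-process sketch of both ideas using Python's `OrderedDict`; the names `LRUCache` and `cached_lookup` are hypothetical, and unlike real caches such as Memcached it does not expire entries.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Optional


class LRUCache:
    """Fixed-capacity key-value cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, object]" = OrderedDict()

    def get(self, key: Hashable, default: object = None) -> object:
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: object) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __contains__(self, key: Hashable) -> bool:
        return key in self._data  # membership test does not change LRU order


# Sentinel meaning "not in the cache", distinct from a cached None.
_NOT_CACHED = object()


def cached_lookup(
    cache: LRUCache, key: Hashable, load_from_db: Callable[[Hashable], Optional[object]]
) -> Optional[object]:
    """Cache-aside read. A None result (row not found) is cached as well, so repeated
    lookups of a nonexistent key stop reaching the database (cache penetration)."""
    value = cache.get(key, _NOT_CACHED)
    if value is _NOT_CACHED:
        value = load_from_db(key)
        cache.put(key, value)
    return value


if __name__ == "__main__":
    db = {"user:1": "alice"}  # hypothetical stand-in for the database
    db_hits = []

    def load(key):
        db_hits.append(key)
        return db.get(key)

    cache = LRUCache(capacity=2)
    cached_lookup(cache, "user:404", load)
    cached_lookup(cache, "user:404", load)
    print(db_hits)  # ['user:404']: the second lookup was served by the cached None
```

In a real deployment the cached null would also get a short expiration time, so that a key created later becomes visible.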

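Item 16 gives a rule of thumb for sizing a thread pool. A small calculation sketch follows, assuming times are measured in milliseconds; the function name `suggested_thread_count` is hypothetical, and the result is only a starting point to be tuned under load tests. For example, a task that takes 100 ms in total and spends 80 ms waiting on I/O, on 4 cores, gives 100 / 20 * 4 = 20 threads.

```python
import os


def suggested_thread_count(task_time_ms: float, io_wait_ms: float, cpu_cores: int) -> int:
    """Threads to start = [task time / (task time - I/O wait time)] * CPU cores.

    (task time - I/O wait) is the CPU time per task; the ratio estimates how many
    tasks one core can interleave while the others are blocked on I/O.
    """
    if not 0 <= io_wait_ms < task_time_ms:
        raise ValueError("I/O wait must be non-negative and less than the task time")
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    cpu_time_ms = task_time_ms - io_wait_ms
    return round(task_time_ms / cpu_time_ms * cpu_cores)


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    # Hypothetical task: 100 ms in total, 80 ms of it waiting on the database.
    print(suggested_thread_count(100, 80, cores))
```

With no I/O wait the formula reduces to one thread per core, which matches the usual advice for CPU-bound work.
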
5. High Availability
  1. Website availability is usually expressed in nines; large sites aim for four nines (99.99%). Fault score = failure time * fault weight. Stateless services achieve failover through load balancing.

  2. Session management in application server clusters: 1. session replication; 2. session binding (a hash of the source IP binds each IP to one server); 3. recording the session in cookies; 4. a dedicated session server built on a distributed cache and a database (recommended).

  3. Highly available services: tiered management, timeout settings, asynchronous invocation, service degradation, idempotent design.

  4. CAP principle: a data service cannot simultaneously guarantee data consistency (strong consistency, user-level consistency, or eventual consistency), data availability, and partition tolerance (needed for scalability). Large sites usually give priority to availability and scalability.

  5. Data backup: cold backup (regular copying; cannot guarantee data consistency or availability) and hot backup, which is either asynchronous (data is written to the master and a storage agent copies it to the slaves) or synchronous (the client writes to master and slaves at the same time).

  6. During a release, a subset of the servers behind the load balancer is taken out of rotation and updated; automated tests are then run with a tool such as Selenium, followed by pre-release verification on a pre-release server, which is identical to production except that it is not in the load balancer list.

  7. Automate releases with a release-train model and use grayscale releases (which can also serve A/B testing), making rollback easy.

  8. Monitoring data collection: 1. user behavior logs (server-side logs and client browsing logs, with statistical analysis based on the Storm real-time computing framework); 2. performance monitoring (Ganglia); 3. runtime data reporting (cache hit rate, average response time, pending tasks).

  9. Monitoring management: system alarms, failover, automatic graceful degradation.
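
Item 1 of this chapter measures availability in nines and scores incidents as failure time times a weight. A short calculation sketch follows; it assumes a 365-day year, and the function names and the example weight are hypothetical, since each site sets its own weights. Four nines leaves (1 - 0.9999) * 525,600 = 52.56 minutes of downtime per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def annual_downtime_minutes(availability: float) -> float:
    """Maximum downtime per year allowed by an availability target such as 0.9999."""
    if not 0 <= availability <= 1:
        raise ValueError("availability must be between 0 and 1")
    return (1 - availability) * MINUTES_PER_YEAR


def fault_score(failure_minutes: float, fault_weight: float) -> float:
    """Fault score = failure time * fault weight."""
    return failure_minutes * fault_weight


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%} -> {annual_downtime_minutes(target):.1f} min/year")
    # Hypothetical incident: 30 minutes of failure with an assumed weight of 100.
    print(fault_score(30, 100))
```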

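Item 3 pairs timeout settings with idempotent design: once a call can time out, the caller may retry it, so applying the same request twice must have the same effect as applying it once. A minimal in-memory sketch that deduplicates by a caller-supplied request ID follows; the class `IdempotentAccountService` and its `request_id` parameter are hypothetical, and a real service would keep the processed IDs in shared, durable storage rather than a dict.

```python
import threading
from typing import Dict


class IdempotentAccountService:
    """Deposits are keyed by request_id, so a retried call is applied at most once."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self._results: Dict[str, int] = {}  # request_id -> balance returned the first time
        self._lock = threading.Lock()

    def deposit(self, request_id: str, account: str, amount: int) -> int:
        with self._lock:
            if request_id in self._results:
                # Duplicate, e.g. a retry after a timeout: return the original result.
                return self._results[request_id]
            new_balance = self.balances.get(account, 0) + amount
            self.balances[account] = new_balance
            self._results[request_id] = new_balance
            return new_balance


if __name__ == "__main__":
    svc = IdempotentAccountService()
    svc.deposit("req-1", "acct-A", 100)
    svc.deposit("req-1", "acct-A", 100)  # retry of the same request
    print(svc.balances["acct-A"])  # 100, not 200
```
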
6. Scalability
  1. Scalability: the ability to expand or shrink the website's processing capacity just by changing the number of deployed servers, without changing the site's hardware or software design.

  2. Scaling design: one approach scales by physically separating different functions (possible at any layer: horizontal separation by business, vertical separation of basic services); the other scales a single function through clustering.

  3. Implementing load balancing: 1. HTTP redirect (each request needs 2 round trips, the redirect server can become a bottleneck, and search engines may treat it as SEO cheating); 2. DNS round robin (multiple IPs configured for one A record; the drawbacks are slow propagation of changes and little control for the site, so large sites usually resolve DNS to their load balancer servers as a first tier); 3. reverse proxy (receives requests from the public network and forwards them to internal servers; application-layer load balancing); 4. IP load balancing (rewrites the destination IP at the network layer; better performance than a reverse proxy, but the balancer can still become a bottleneck); 5. data link layer load balancing (rewrites the MAC address to distribute requests, and servers respond directly to the client; also called direct routing (DR); the best-known product is LVS).

  4. Load balancing algorithms: 1. round robin; 2. weighted round robin; 3. random; 4. least connections (send to the server with the fewest current connections); 5. source address hash (hash the client IP so that one IP always reaches the same server).

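  5. Routing algorithms for cache clusters: 1. remainder (modulo) hashing (works while the number of servers is fixed, but adding a server remaps most keys, so it cannot scale easily); 2. consistent hashing (each cache server is mapped to many virtual nodes (e.g., 150) on a hash ring, and a key goes to the nearest node found moving clockwise from the key's hash).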

  6. Relational databases: read/write separation, then splitting into multiple databases; for MySQL, the Amoeba and Cobar products can partition data (splitting one table across multiple databases).

  7. NoSQL (Not only SQL): a complement to relational databases that gives up SQL and transactional consistency guarantees in exchange for better availability and scalability. HBase is used for scaling.

  8. HBase lookup flow: the application asks ZooKeeper for the HMaster address -> sends the key to HMaster to get the HRegionServer address -> sends the key to that HRegionServer to query the data -> the HRegionServer reads the data from the HRegion.
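
Item 4 above lists the common load balancing algorithms. The sketch below shows three of them; for weighted round robin it uses the "smooth" variant, one well-known way to interleave servers by weight instead of sending them consecutive bursts, which may differ from what any particular product does. All class and function names are hypothetical, and `source_hash` uses MD5 because Python's built-in `hash()` of strings is randomized per process.

```python
import hashlib
from typing import Dict, List


class SmoothWeightedRoundRobin:
    """Each pick adds every server's weight to its running score, chooses the
    highest score, then subtracts the total weight from the chosen server."""

    def __init__(self, weights: Dict[str, int]) -> None:
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("need at least one server, all weights positive")
        self.weights = dict(weights)
        self.current = {server: 0 for server in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        for server, weight in self.weights.items():
            self.current[server] += weight
        best = max(self.current, key=self.current.__getitem__)  # ties: first server wins
        self.current[best] -= self.total
        return best


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(active, key=active.__getitem__)


def source_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a fixed server (stable across processes)."""
    digest = hashlib.md5(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


if __name__ == "__main__":
    wrr = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
    print([wrr.pick() for _ in range(7)])  # ['a', 'a', 'b', 'a', 'c', 'a', 'a']
    print(least_connections({"a": 12, "b": 3, "c": 7}))  # b
    print(source_hash("10.0.0.8", ["a", "b", "c"]))
```

Note that `source_hash` is itself remainder hashing over the server list, so adding or removing a server rebinds many clients, which is the session-binding weakness mentioned in chapter 5.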

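Item 5 above (and item 13 of chapter 4) relies on consistent hashing with virtual nodes. A compact sketch follows, using 150 virtual nodes per server as in the notes; the class name `ConsistentHashRing`, the `server#i` virtual-node naming, and the MD5-based hash are assumptions for illustration, not the scheme of any specific Memcached client.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    """Stable 64-bit hash from MD5 (Python's hash() is randomized per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, servers: List[str], vnodes: int = 150) -> None:
        self.vnodes = vnodes
        self._ring: List[int] = []        # sorted virtual-node positions on the ring
        self._owner: Dict[int, str] = {}  # position -> physical server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{server}#{i}")
            if pos in self._owner:
                continue  # extremely unlikely collision: keep the existing owner
            bisect.insort(self._ring, pos)
            self._owner[pos] = server

    def remove(self, server: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{server}#{i}")
            if self._owner.get(pos) == server:
                del self._owner[pos]
                del self._ring[bisect.bisect_left(self._ring, pos)]

    def route(self, key: str) -> str:
        """Walk clockwise from the key's hash to the first virtual node."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]


if __name__ == "__main__":
    keys = [f"item:{n}" for n in range(10_000)]
    ring = ConsistentHashRing(["cache1", "cache2", "cache3"])
    before = {k: ring.route(k) for k in keys}
    ring.add("cache4")
    moved = sum(before[k] != ring.route(k) for k in keys)
    print(f"consistent hashing: {moved / len(keys):.0%} of keys moved")
    modulo_moved = sum(_hash(k) % 3 != _hash(k) % 4 for k in keys)
    print(f"remainder hashing: {modulo_moved / len(keys):.0%} of keys moved")
```

When going from 3 to 4 servers, consistent hashing should move only the keys the new server takes over (roughly a quarter), while with remainder hashing a key stays put only when its hash mod 12 is 0, 1 or 2, so about three quarters of the keys move.
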
7. Extensibility
  1. Extensibility: the ability to keep adding or improving system functions with minimal impact on the existing system. Besides splitting the system into modules, the two main approaches are distributed message queues and distributed services.

  2. Message queues: 1. event-driven architecture (EDA: a producer-consumer model in which modules collaborate by passing event messages); 2. distributed message queues (FIFO; related concepts: ESB, SOA).

  3. Requirements and features of distributed services: load balancing, failover, efficient remote communication, integration of heterogeneous systems, minimal intrusion into applications, versioning, real-time monitoring. Examples: Thrift, Dubbo.

  4. Use NoSQL to implement extensible data structures.
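
The producer-consumer model behind event-driven architecture (item 2 above) is also what gives the peak shaving described in chapter 4, item 14: producers enqueue and return immediately, and a consumer drains the queue at its own pace. A single-process sketch with the standard-library `queue.Queue` follows; it stands in for a distributed message queue, and the function `run_pipeline` and the list used as a database are hypothetical.

```python
import queue
import threading
from typing import List


def run_pipeline(orders: List[str]) -> List[str]:
    """Enqueue order events and let one consumer thread write them to a stand-in database."""
    events: "queue.Queue[object]" = queue.Queue(maxsize=100)  # bounded buffer absorbs bursts
    database: List[str] = []  # hypothetical stand-in for the real database
    stop = object()  # sentinel telling the consumer to exit

    def consumer() -> None:
        while True:
            event = events.get()
            if event is stop:
                break
            database.append(f"saved {event}")

    worker = threading.Thread(target=consumer)
    worker.start()
    for order in orders:  # producer side: enqueue and move on
        events.put(order)
    events.put(stop)
    worker.join()
    return database


if __name__ == "__main__":
    print(run_pipeline(["order-1", "order-2", "order-3"]))
    # ['saved order-1', 'saved order-2', 'saved order-3']
```

With a single consumer the FIFO order of the queue is preserved; adding consumers raises throughput but gives up ordering across them.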

8. Security
  1. XSS (cross-site scripting) attacks, either reflected or persistent. Defenses: 1. filter dangerous characters; 2. HttpOnly cookies (JavaScript cannot read cookies marked with this attribute).

  2. Injection attacks; CSRF (cross-site request forgery). CSRF defenses: 1. form tokens; 2. CAPTCHAs; 3. Referer checks.

  3. Other attacks: error message echo, HTML comments, file upload, path traversal.

  4. Web application firewalls: ModSecurity, Siteshell.

  5. Information encryption: 1. one-way hash encryption (MD5, SHA, plus salt); 2. symmetric encryption (the same key encrypts and decrypts, e.g., DES, RC); 3. asymmetric encryption (public and private keys, e.g., RSA; digital certificates are built on it).

  6. Information filtering: filter text with regular expressions plus trie or hash table algorithms; classification algorithms (naive Bayes, which assumes features are independent); blacklists (hash tables, or filtering on an extracted 8-bit fingerprint).

  7. Risk types: account risk, buyer risk, seller risk, transaction risk. Risk control: rule engines, statistical models.
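
Item 5 lists one-way hashing with salt. A minimal sketch follows, using SHA-256 with a random 16-byte per-user salt; the function names are hypothetical. The salt makes precomputed (rainbow table) attacks useless and gives identical passwords different digests. For real password storage a deliberately slow key derivation function such as `hashlib.pbkdf2_hmac` is generally preferred over a single SHA-256 pass.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """One-way salted hash; store both the salt and the digest for each user."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)  # constant-time comparison


if __name__ == "__main__":
    salt, digest = hash_password("s3cret")
    print(verify_password("s3cret", salt, digest))  # True
    print(verify_password("guess", salt, digest))   # False
```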

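Item 6 mentions trie-based text filtering. A trie (prefix tree) lets a single pass over the text check every banned word that starts at each position. The sketch below masks the longest match; it is a simple version rather than a multi-pattern automaton such as Aho-Corasick, and `build_trie`, `mask_banned` and the `_END` marker are hypothetical names.

```python
from typing import Dict, Iterable

_END = "__end__"  # marks that a banned word ends at this node; never equals a single character


def build_trie(words: Iterable[str]) -> Dict:
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root


def mask_banned(text: str, trie: Dict, mask: str = "*") -> str:
    """Replace each banned word found in text with mask characters (longest match wins)."""
    out = list(text)
    i = 0
    while i < len(text):
        node, j, match_end = trie, i, -1
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if _END in node:
                match_end = j  # remember the longest match so far
        if match_end != -1:
            for k in range(i, match_end):
                out[k] = mask
            i = match_end
        else:
            i += 1
    return "".join(out)


if __name__ == "__main__":
    trie = build_trie(["bad", "badword", "ugly"])
    print(mask_banned("a badword and ugly text", trie))  # a ******* and **** text
```

Plain substring matching like this also masks words that merely contain a banned word, which is one reason item 6 pairs it with classification algorithms.
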
9. Other
  1. Flash sales (seckill): deploy the system independently; make pages static; rent extra bandwidth for the flash-sale event; generate the purchase URL dynamically with a random component; serve the timing JS file from separate servers; limit the total number of requests each application server accepts and discard the rest.

  2. Failure cases: failures caused by writing logs, high-concurrency database access, high-concurrency locking, caching, application startup being out of sync, large file reads/writes monopolizing the disk, misuse of the production environment, non-standard processes (involving caching), and poor programming habits.
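
Item 1 of this chapter limits the total number of flash-sale requests each application server accepts and discards the rest. A thread-safe counter is enough to sketch that per-server cap; the class name `RequestLimiter` and the quota of 100 are hypothetical, and what happens to rejected requests (for example, showing a "sale ended" page) is left to the caller.

```python
import threading


class RequestLimiter:
    """Accept at most max_requests requests on this server; reject everything after."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests
        self._accepted = 0
        self._lock = threading.Lock()  # makes check-and-increment atomic across threads

    def try_accept(self) -> bool:
        with self._lock:
            if self._accepted >= self.max_requests:
                return False
            self._accepted += 1
            return True


if __name__ == "__main__":
    limiter = RequestLimiter(max_requests=100)  # hypothetical per-server quota
    results = []
    threads = [threading.Thread(target=lambda: results.append(limiter.try_accept()))
               for _ in range(500)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results.count(True))  # 100
```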