Content Introduction
"Large-Scale Distributed Web Site Architecture Design and Practice" introduces the technical details involved in the architecture of large distributed Web sites, including SOA implementation, Internet security architecture, distributed infrastructure, system stability assurance, and mass data analysis. It describes the core principles of large-scale distributed Web site architecture design in depth and uses typical architecture design cases to help readers understand the common scenarios and problems encountered when designing large distributed Web sites. Throughout, the author draws on his hands-on experience at Alibaba and Taobao. The book is suitable for beginners, helping them understand the structure of large-scale distributed Web sites along with the ideas and methods for solving their problems, and it can also serve as a reference for industry peers and bring inspiration to their day-to-day work.
Author Introduction
Chen Kangxian [1], Taobao nickname Ronglong, is a senior R&D engineer at Taobao Technology. He has worked for a long time in Taobao's distributed environment and has accumulated a wealth of practical experience in distributed system architecture design, high-concurrency system design, and system stability assurance. He has published two articles in Programmer magazine, "Ramblings on SOA Architecture Based on the HTTP Protocol" and "Ramblings on the Security and Stability Architecture of HTTP-Based Platforms". He has studied HTTP-based SOA architecture in depth, has extensive practical experience troubleshooting online problems and failures, and is adept at using data analysis to solve practical problems. He has a strong interest in new technologies.
Recommendations
In 2008, as Taobao's traffic and data volume grew and its development team expanded, the original architecture could no longer support the business, so that year Taobao transformed its system into a large distributed Web site. The author, who currently works at Alibaba Group, has a clear view of Taobao's present large-scale distributed Web site architecture. This architecture is in fact an integration of many different technologies; to master it, the most important first step is to see the whole picture, which is also the hardest part. This book shows you the full picture of the technologies a large distributed Web site needs.
--Lin Hao (Bi Xuan), senior technical expert, Alibaba Group
Drawing on extensive practice, the author analyzes the common problems that arise when designing distributed Web sites and works through their solutions step by step. This book makes a systematic study of distributed Web site design possible and is well worth reading.
--Guo Hua (Sony), senior technical expert, Juhuasuan Technology
Web sites of any significant scale today are built on distributed architectures. How a Web site is distributed and what the underlying distributed systems are is important for architects and developers to understand, and so are the related security issues and the issues of stability, performance, and locating and analyzing online application problems. This book gives readers a complete picture of this knowledge so that practitioners can understand it more comprehensively. Kangxian is also a front-line engineer, so this summary of his personal experience is all the more practical and valuable.
--Zeng Xianjie (Huali), Director of Technology, Taobao
Table of Contents
Chapter 1 Service-Oriented Architecture (SOA) 1
This chapter introduces and addresses the following questions, which also form the foundation of the rest of the book:
How the HTTP protocol works and how the HTTP network protocol stack is structured.
How to implement RPC calls over the HTTP and TCP protocols, how the two approaches differ, and which scenarios each suits.
How to implement dynamic service registration and routing, and how to achieve software load balancing.
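The last question above, software load balancing, can be made concrete with a short sketch of two common selection strategies, round robin and weighted random. This is not the book's implementation: the class `SimpleLoadBalancer`, its method names, and the provider addresses used with it are hypothetical, and in a real system the provider list would come from a service registry rather than being passed in directly.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that picks a service provider for each call. */
final class SimpleLoadBalancer {
    private final AtomicInteger position = new AtomicInteger(0);
    private final Random random;

    SimpleLoadBalancer(Random random) {
        this.random = random;
    }

    /** Round robin: cycle through the providers in order. */
    String roundRobin(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no providers registered");
        }
        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(position.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    /** Weighted random: a provider with weight w is chosen with probability w / totalWeight. */
    String weightedRandom(Map<String, Integer> weights) {
        int total = 0;
        for (int w : weights.values()) {
            total += w; // weights are assumed to be non-negative
        }
        if (total <= 0) {
            throw new IllegalArgumentException("total weight must be positive");
        }
        int point = random.nextInt(total); // uniform in [0, total)
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            point -= e.getValue();
            if (point < 0) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("unreachable when weights are non-negative");
    }
}
```

Round robin spreads calls evenly when providers are alike; weighted random lets a more powerful machine take a proportionally larger share of the traffic.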
1.1 TCP-Based RPC 3
1.1.1 RPC Terminology 3
1.1.2 Object Serialization 4
1.1.3 Implementing RPC over TCP 6
1.2 HTTP-Based RPC 9
1.2.1 The HTTP Protocol Stack 9
1.2.2 HTTP Requests and Responses 15
1.2.3 Sending HTTP Requests with HttpClient 16
1.2.4 Advantages of Using the HTTP Protocol 17
1.2.5 JSON and XML 18
1.2.6 RESTful and RPC 20
1.2.7 Implementing RPC over HTTP 22
1.3 Service Routing and Load Balancing 30
1.3.1 Evolution of Services 30
1.3.2 Load Balancing Algorithms 33
1.3.3 Dynamic Configuration Rules 39
1.3.4 Introduction to ZooKeeper and Environment Setup 40
1.3.5 Overview of the ZooKeeper API 43
1.3.6 Using ZkClient 47
1.3.7 Implementing Routing and Load Balancing 50
1.4 HTTP Service Gateway 54
Chapter 2 Distributed System Infrastructure 58
This chapter introduces and addresses the following questions:
How to use the distributed cache Memcache and its distribution strategies, including the choice of hash algorithm.
Common storage solutions for distributed systems, including distributed scaling of MySQL, the HBase API and its usage scenarios, and the use of Redis.
How to use the distributed messaging system ActiveMQ to reduce coupling between systems and to communicate between applications.
The use of vertical search engines in distributed systems, including search engine fundamentals, detailed use of Lucene, and Solr, an open-source search tool built on Lucene.
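For the question of which hash algorithm to use when distributing keys across a Memcache cluster, one widely used option is consistent hashing. The sketch below illustrates the idea under stated assumptions and is not the book's code: `ConsistentHashRing`, the virtual-node naming scheme `server#i`, and the use of the first four bytes of an MD5 digest as the ring position are all hypothetical choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical consistent-hash ring mapping cache keys to cache servers. */
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(List<String> servers, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String server : servers) {
            addServer(server);
        }
    }

    /** Place each physical server on the ring several times to even out the load. */
    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    /** Walk clockwise from the key's position to the first virtual node, wrapping at the end. */
    String serverFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no cache servers");
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        Long node = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(node);
    }

    /** First four bytes of the MD5 digest as an unsigned 32-bit ring position. */
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16) | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Compared with `hash(key) % serverCount`, removing one server only remaps the keys that were stored on that server; every other key keeps hitting the same cache node, so the cache hit rate drops far less.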
2.1 Distributed Cache 60
2.1.1 Introduction to and Installation of Memcache 60
2.1.2 The Memcache API and Distribution 64
2.1.3 Distributed Sessions 69
2.2 Persistent Storage 71
2.2.1 Scaling MySQL 72
2.2.2 HBase 80
2.2.3 Redis 91
2.3 Messaging System 95
2.3.1 ActiveMQ & JMS 96
2.4 Vertical Search Engine 104
2.4.1 Introduction to Lucene 105
2.4.2 Using Lucene 108
2.4.3 Solr 119
2.5 Other Infrastructure 125
Chapter 3 Internet Security Architecture 126
This chapter introduces and addresses the following questions:
Common Web attack methods and their defenses, such as XSS, CSRF, and SQL injection.
Common security algorithms, such as digital digests, symmetric encryption, asymmetric encryption, digital signatures, and digital certificates.
How to use digest authentication to prevent message tampering, how to verify the legitimacy of communicating parties with digital signatures, and how to use the HTTPS protocol to ensure that data in transit is not eavesdropped on or intercepted by third parties.
How, on an open platform, the OAuth protocol ensures that an ISV's access to data is authorized and legitimate.
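The digest-based tamper protection mentioned above can be sketched as follows: both sides share a secret, the client computes a keyed digest over the sorted request parameters, and the server recomputes it and rejects the request on a mismatch. The class `RequestDigest`, the `name=value&` canonical form, and the choice of HMAC-SHA256 are assumptions for illustration; the book's own digest scheme may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical request-digest helper shared by client and server. */
final class RequestDigest {
    private RequestDigest() {}

    /** Sort parameters by name, join them as name=value&, and HMAC the result with the shared secret. */
    static String sign(Map<String, String> params, String secret) {
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            canonical.append(e.getKey()).append('=').append(e.getValue()).append('&');
        }
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(canonical.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Server side: recompute the digest and compare in constant time. */
    static boolean verify(Map<String, String> params, String secret, String receivedDigest) {
        byte[] expected = sign(params, secret).getBytes(StandardCharsets.UTF_8);
        byte[] actual = receivedDigest.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual);
    }
}
```

A real scheme would also URL-encode values so that `=` and `&` stay unambiguous, and would include a timestamp or nonce among the signed parameters to resist replay.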
3.1 Common Web Attack Methods 128
3.1.1 XSS Attacks 128
3.1.2 CSRF Attacks 130
3.1.3 SQL Injection Attacks 133
3.1.4 File Upload Vulnerabilities 139
3.1.5 DDoS Attacks 146
3.1.6 Other Attack Methods 149
3.2 Common Security Algorithms 149
3.2.1 Digital Digests 149
3.2.2 Symmetric Encryption Algorithms 155
3.2.3 Asymmetric Encryption Algorithms 158
3.2.4 Digital Signatures 162
3.2.5 Digital Certificates 166
3.3 Digest Authentication 185
3.3.1 Why Authentication Is Needed 185
3.3.2 Principles of Digest Authentication 187
3.3.3 Implementing Digest Authentication 188
3.4 Signature Authentication 192
3.4.1 Principles of Signature Authentication 192
3.4.2 Implementing Signature Authentication 193
3.5 The HTTPS Protocol 200
3.5.1 Principles of the HTTPS Protocol 200
3.5.2 SSL/TLS 201
3.5.3 Deploying HTTPS on the Web 208
3.6 The OAuth Protocol 215
3.6.1 Introduction to OAuth 215
3.6.2 The OAuth Authorization Process 216
Chapter 4 System Stability 218
This chapter introduces and addresses the following questions:
The commonly used online log analysis commands, such as cat, grep, wc, and less, and how to write log analysis scripts with awk and shell.
How to monitor a cluster, including defining monitoring metrics, heartbeat detection, and capacity assessment.
How to keep high-concurrency systems running stably with strategies such as flow control, dependency management, service grading, and switches, and how to design a high-concurrency system.
How to optimize application performance, including front-end optimization, Java program optimization, and database query optimization.
How to troubleshoot Java applications online, including a range of troubleshooting tools and several practical case studies.
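Flow control for a high-concurrency system can be implemented in several ways; the token bucket below is one common technique, shown as a hypothetical sketch rather than the book's implementation. `TokenBucket` and its parameters are illustrative names, and the current time is passed in as an argument so that the behavior is deterministic.

```java
/** Hypothetical token-bucket limiter; callers pass the current time in nanoseconds. */
final class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, long tokensPerSecond, long startNanos) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity; // start full so an initial burst up to capacity is allowed
        this.lastRefillNanos = startNanos;
    }

    /** Returns true if the request may proceed, false if it should be rejected or degraded. */
    synchronized boolean tryAcquire(long nowNanos) {
        // Refill in proportion to elapsed time, never beyond capacity; ignore clocks that go backwards.
        long elapsed = Math.max(0, nowNanos - lastRefillNanos);
        tokens = Math.min(capacity, tokens + elapsed * refillPerNano);
        lastRefillNanos = Math.max(lastRefillNanos, nowNanos);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In production `nowNanos` would typically be `System.nanoTime()`; a rejected call can be answered with an error or routed to a degraded path, which ties flow control to the service-grading and switch strategies listed above.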
4.1 Online Log Analysis 220
4.1.1 Common Log Analysis Commands 220
4.1.2 Log Analysis Scripts 230
4.2 Cluster Monitoring 239
4.2.1 Monitoring Metrics 239
4.2.2 Heartbeat Detection 247
4.2.3 Capacity Assessment and Application Grading 252
4.3 Flow Control 255
4.3.1 Implementing Flow Control 255
4.3.2 Service Stability 260
4.3.3 Designing High-Concurrency Systems 265
4.4 Performance Tuning 277
4.4.1 How to Find Performance Bottlenecks 277
4.4.2 Performance Testing Tools 285
4.4.3 Performance Optimization Measures 292
4.5 Troubleshooting Java Application Failures 314
4.5.1 Common Tools 314
4.5.2 Typical Case Analysis 331
Chapter 5 Data Analysis 337
This chapter introduces and addresses the following questions:
The architecture of a log collection system in a distributed environment.
How to analyze streaming data in real time with Storm.
How to analyze offline data with Hadoop and build a data warehouse with Hive.
How to import data stored in a relational database into HDFS, and export data from HDFS back into a relational database.
How to present analyzed data to users graphically.
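The offline analysis question above relies on the MapReduce model that Hadoop implements. The following plain-Java sketch imitates its three phases (map, shuffle, reduce) in memory to count page views per URL. It uses no Hadoop API; the class `PageViewCount` and the `ip url status` log-line format are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory imitation of the MapReduce model; no Hadoop classes are used. */
final class PageViewCount {
    private PageViewCount() {}

    /** Map phase: emit (url, 1) for each well-formed log line "ip url status". */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String[] fields = line.trim().split("\\s+");
        if (fields.length >= 3) {
            out.add(new SimpleEntry<>(fields[1], 1));
        }
        return out; // malformed or empty lines emit nothing
    }

    /** Reduce phase: sum all values emitted for one key. */
    static int reduce(String url, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle phase: group mapper output by key; TreeMap also sorts the keys.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

In Hadoop the same map and reduce logic would live in `Mapper` and `Reducer` subclasses, with the framework performing the shuffle across machines.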
5.1 Log Collection 339
5.1.1 The inotify Mechanism 339
5.1.2 ActiveMQ-CPP 343
5.1.3 Architecture and Storage 359
5.1.4 Chukwa 362
5.2 Offline Data Analysis 369
5.2.1 Introduction to the Hadoop Project 370
5.2.2 Setting Up a Hadoop Environment 374
5.2.3 Writing MapReduce Programs 384
5.2.4 Using Hive 389
5.3 Streaming Data Analysis 403
5.3.1 Introduction to Storm 404
5.3.2 Installing and Deploying Storm 407
5.3.3 Using Storm 418
5.4 Data Synchronization 422
5.4.1 Offline Data Synchronization 423
5.4.2 Real-Time Data Synchronization 429
5.5 Data Reports 431
5.5.1 What Data Reports Can Provide 431
5.5.2 The Reporting Tool Highcharts 432
References 445