Computer Architecture: A Quantitative Approach, 5th Edition


Basic Information
Original Title: Computer Architecture: A Quantitative Approach, Fifth Edition
Authors: (US) John L. Hennessy, David A. Patterson
Translator: Jia Hongfeng
Series: Turing Programming Series
Publisher: Posts & Telecom Press
ISBN: 9787115297655
Publication date: January 2013
Format: 16mo
Edition/printing: 1-1
Category: Computers > Computer Organization and Architecture > General
 

Introduction
Computer Architecture: A Quantitative Approach, 5th Edition is one of the most authoritative books on computer architecture and a well-known classic. It systematically introduces the fundamentals of computer system design, instruction set architecture, pipelining and instruction-level parallelism, hierarchical memory systems and storage devices, interconnection networks, and multiprocessor systems. In this latest edition, the authors add coverage of today's hottest topics, cloud computing and mobile clients, discussing how cloud computing serves mobile phones, tablets, laptops, and other mobile computing devices.
The book can be used as a textbook for undergraduate or graduate students of computer science, and as a reference for engineers working on computer architecture or computer system design.
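As a taste of the book's quantitative approach, its first chapter (Section 1.9.4) builds on Amdahl's law, which bounds the overall speedup when only a fraction of execution time is improved. A minimal illustrative sketch (not taken from the book's own text):

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Amdahl's law: overall speedup = 1 / ((1 - f) + f / s),
    where f is the fraction of time that benefits and s is the
    speedup of the enhanced portion."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

if __name__ == "__main__":
    # A 10x speedup on 80% of execution time yields ~3.57x overall.
    print(amdahl_speedup(0.8, 10.0))
    # Even an enormous speedup on that 80% approaches only 1/(1-0.8) = 5x.
    print(amdahl_speedup(0.8, 1e9))
```

The second call illustrates the law's main lesson: the unimproved fraction ultimately limits the achievable speedup.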
Table of Contents
Chapter 1 Fundamentals of Quantitative Design and Analysis 1
1.1 Introduction 2
1.2 Classes of computers 4
1.2.1 Personal mobile devices 5
1.2.2 Desktop computing 5
1.2.3 Servers 6
1.2.4 Clusters/warehouse-scale computers 6
1.2.5 Embedded computers 7
1.2.6 Classes of parallelism and parallel architectures 7
1.3 Defining computer architecture 8
1.3.1 Instruction set architecture: the myopic view of computer architecture 9
1.3.2 Genuine computer architecture: designing the organization and hardware to meet goals and functional requirements 12
1.4 Trends in technology 13
1.4.1 Performance trends: bandwidth over latency 15
1.4.2 Scaling of transistor performance and wires 17
1.5 Trends in power and energy in integrated circuits 17
1.5.1 Power and energy: a systems perspective 17
1.5.2 Energy and power within a microprocessor 18
1.6 Trends in cost 21
1.6.1 The impact of time, volume, and commoditization 21
1.6.2 Cost of an integrated circuit 22
1.6.3 Cost versus price 26
1.6.4 Cost of manufacturing versus cost of operation 26
1.7 Dependability 26
1.8 Measuring, reporting, and summarizing performance 28
1.8.1 Benchmarks 29
1.8.2 Reporting performance results 32
1.8.3 Summarizing performance results 33
1.9 Quantitative principles of computer design 34
1.9.1 Take advantage of parallelism 35
1.9.2 Principle of locality 35
1.9.3 Focus on the common case 35
1.9.4 Amdahl's law 36
1.9.5 The processor performance equation 38
1.10 Putting it all together: performance, price, and power 40
1.11 Fallacies and pitfalls 42
1.12 Concluding remarks 46
1.13 Historical perspectives and references 47
Chapter 2 Memory Hierarchy Design 53
2.1 Introduction 54
2.2 Ten advanced optimizations of cache performance 59
2.2.1 First optimization: small and simple first-level caches to reduce hit time and power 59
2.2.2 Second optimization: way prediction to reduce hit time 61
2.2.3 Third optimization: pipelined cache access to increase cache bandwidth 61
2.2.4 Fourth optimization: nonblocking caches to increase cache bandwidth 62
2.2.5 Fifth optimization: multibanked caches to increase cache bandwidth 64
2.2.6 Sixth optimization: critical word first and early restart to reduce miss penalty 64
2.2.7 Seventh optimization: merging write buffer to reduce miss penalty 65
2.2.8 Eighth optimization: compiler optimizations to reduce miss rate 66
2.2.9 Ninth optimization: hardware prefetching of instructions and data to reduce miss penalty or miss rate 68
2.2.10 Tenth optimization: compiler-controlled prefetching to reduce miss penalty or miss rate 69
2.2.11 Cache optimization summary 72
2.3 Memory technology and optimizations 72
2.3.1 SRAM technology 73
2.3.2 DRAM technology 73
2.3.3 Improving memory performance inside a DRAM chip 75
2.3.4 Reducing power consumption in SDRAMs 77
2.3.5 Flash memory 77
2.3.6 Improving dependability in memory systems 78
2.4 Protection: virtual memory and virtual machines 79
2.4.1 Protection via virtual memory 79
2.4.2 Protection via virtual machines 81
2.4.3 Requirements of a virtual machine monitor 82
2.4.4 Instruction set architecture support for virtual machines 82
2.4.5 Impact of virtual machines on virtual memory and I/O 83
2.4.6 A VMM example: the Xen virtual machine 84
2.5 Crosscutting issues: the design of memory hierarchies 84
2.5.1 Protection and instruction set architecture 84
2.5.2 Coherency of cached data 85
2.6 Putting it all together: memory hierarchies in the ARM Cortex-A8 and Intel Core i7 85
2.6.1 ARM Cortex-A8 86
2.6.2 Intel Core i7 89
2.7 Fallacies and pitfalls 95
2.8 Concluding remarks: looking ahead 98
2.9 Historical perspectives and references 99
Chapter 3 Instruction-Level Parallelism and Its Exploitation 109
3.1 Instruction-level parallelism: concepts and challenges 110
3.1.1 What is instruction-level parallelism 111
3.1.2 Data dependences and hazards 111
3.1.3 Control dependences 114
3.2 Basic compiler techniques for exposing ILP 116
3.2.1 Basic pipeline scheduling and loop unrolling 116
3.2.2 Summary of loop unrolling and scheduling 119
3.3 Reducing branch costs with advanced branch prediction 120
3.3.1 Tournament predictors: adaptively combining local and global predictors 122
3.3.2 The Intel Core i7 branch predictor 123
3.4 Overcoming data hazards with dynamic scheduling 124
3.4.1 Dynamic scheduling: the idea 124
3.4.2 Dynamic scheduling using Tomasulo's algorithm 126
3.5 Dynamic scheduling: examples and the algorithm 130
3.5.1 Tomasulo's algorithm: the details 132
3.5.2 Tomasulo's algorithm: a loop-based example 133
3.6 Hardware-based speculation 135
3.7 Exploiting ILP using multiple issue and static scheduling 143
3.8 Exploiting ILP using dynamic scheduling, multiple issue, and speculation 146
3.9 Advanced techniques for instruction delivery and speculation 150
3.9.1 Increasing instruction fetch bandwidth 150
3.9.2 Speculation: implementation issues and extensions 155
3.10 Studies of the limitations of ILP 158
3.10.1 The hardware model 158
3.10.2 Limitations on ILP for realizable processors 160
3.10.3 Beyond the limits of this study 163
3.11 Crosscutting issues: ILP approaches and the memory system 164
3.11.1 Hardware versus software speculation 164
3.11.2 Speculative execution and the memory system 165
3.12 Multithreading: exploiting thread-level parallelism to improve uniprocessor throughput 165
3.12.1 Effectiveness of fine-grained multithreading on the Sun T1 168
3.12.2 Effectiveness of simultaneous multithreading on superscalar processors 170
3.13 Putting it all together: the Intel Core i7 and ARM Cortex-A8 173
3.13.1 ARM Cortex-A8 173
3.13.2 Intel Core i7 176
3.14 Fallacies and pitfalls 179
3.15 Concluding remarks: what's ahead 182
3.16 Historical perspectives and references 183
Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures 193
4.1 Introduction 194
4.2 Vector architecture 195
4.2.1 VMIPS 196
4.2.2 How vector processors work: an example 198
4.2.3 Vector execution time 199
4.2.4 Multiple lanes: beyond one element per clock cycle 201
4.2.5 Vector-length registers: handling loops not equal to 64 203
4.2.6 Vector mask registers: handling IF statements in vector loops 204
4.2.7 Memory banks: supplying bandwidth for vector load/store units 205
4.2.8 Stride: handling multidimensional arrays in vector architectures 206
4.2.9 Gather-scatter: handling sparse matrices in vector architectures 207
4.2.10 Programming vector architectures 208
4.3 SIMD instruction set extensions for multimedia 209
4.3.1 Programming multimedia SIMD architectures 212
4.3.2 The Roofline visual performance model 212
4.4 Graphics processing units 214
4.4.1 Programming the GPU 214
4.4.2 NVIDIA GPU computational structures 216
4.4.3 NVIDIA GPU instruction set architecture 222
4.4.4 Conditional branching in GPUs 224
4.4.5 NVIDIA GPU memory structures 226
4.4.6 Innovations in the Fermi GPU architecture 228
4.4.7 Similarities and differences between vector architectures and GPUs 230
4.4.8 Similarities and differences between multimedia SIMD computers and GPUs 233
4.4.9 Summary 233
4.5 Detecting and enhancing loop-level parallelism 235
4.5.1 Finding dependences 238
4.5.2 Eliminating dependent computations 240
4.6 Crosscutting issues 240
4.6.1 Energy and DLP: slow and wide versus fast and narrow 240
4.6.2 Banked memory and graphics memory 241
4.6.3 Strided accesses and TLB misses 241
4.7 Putting it all together: mobile versus server GPUs, and Tesla versus Core i7 241
4.8 Fallacies and pitfalls 247
4.9 Concluding remarks 248
4.10 Historical perspectives and references 250
Chapter 5 Thread-Level Parallelism 256
5.1 Introduction 257
5.1.1 Multiprocessor architecture: issues and approach 258
5.1.2 Challenges of parallel processing 260
5.2 Centralized shared-memory architectures 262
5.2.1 What is multiprocessor cache coherence 263
5.2.2 Basic schemes for enforcing coherence 264
5.2.3 Snooping coherence protocols 265
5.2.4 Basic implementation techniques 265
5.2.5 An example protocol 267
5.2.6 Extensions to the basic coherence protocol 270
5.2.7 Limitations in symmetric shared-memory multiprocessors and snooping protocols 271
5.2.8 Implementing snooping cache coherence 272
5.3 Performance of symmetric shared-memory multiprocessors 273
5.3.1 A commercial workload 274
5.3.2 Performance measurements of the commercial workload 275
5.3.3 A multiprogramming and OS workload 279
5.3.4 Performance of the multiprogramming and OS workload 280
5.4 Distributed shared memory and directory-based coherence 282
5.4.1 Directory-based cache coherence protocols: the basics 283
5.4.2 An example directory protocol 285
5.5 Synchronization: the basics 288
5.5.1 Basic hardware primitives 288
5.5.2 Implementing locks using coherence 289
5.6 Models of memory consistency: an introduction 291
5.6.1 The programmer's view 292
5.6.2 Relaxed consistency models: the basics 293
5.6.3 Final remarks on consistency models 293
5.7 Crosscutting issues 294
5.7.1 Compiler optimization and the consistency model 294
5.7.2 Using speculation to hide latency in strict consistency models 294
5.7.3 Inclusion and its implementation 295
5.7.4 Performance gains from using multiprocessing and multithreading 295
5.8 Putting it all together: multicore processors and their performance 297
5.9 Fallacies and pitfalls 301
5.10 Concluding remarks 304
5.11 Historical perspectives and references 306
Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism 319
6.1 Introduction 320
6.2 Programming models and workloads for warehouse-scale computers 323
6.3 Computer architecture of warehouse-scale computers 327
6.3.1 Storage 328
6.3.2 Array switch 328
6.3.3 WSC memory hierarchy 329
6.4 Physical infrastructure and costs of warehouse-scale computers 331
6.4.1 Measuring efficiency of a WSC 334
6.4.2 Cost of a WSC 335
6.5 Cloud computing: the return of utility computing 338
6.6 Crosscutting issues 342
6.6.1 The WSC network 342
6.6.2 Using energy efficiently inside the server 343
6.7 Putting it all together: a Google warehouse-scale computer 344
6.7.1 Containers 344
6.7.2 Cooling and power in the Google WSC 346
6.7.3 Servers in a Google WSC 348
6.7.4 Networking in a Google WSC 348
6.7.5 Monitoring and repair in a Google WSC 349
6.7.6 Summary 349
6.8 Fallacies and pitfalls 350
6.9 Concluding remarks 353
6.10 Historical perspectives and references 354
Appendix A Instruction Set Principles 365
A.1 Introduction 366
A.2 Classifying instruction set architectures 366
A.3 Memory addressing 369
A.4 Type and size of operands 374
A.5 Operations in the instruction set 375
A.6 Instructions for control flow 376
A.7 Encoding an instruction set 380
A.8 Crosscutting issues: the role of compilers 382
A.9 Putting it all together: the MIPS architecture 388
A.10 Fallacies and pitfalls 396
A.11 Concluding remarks 399
A.12 Historical perspectives and references 400
Appendix B Review of Memory Hierarchy 405
B.1 Introduction 406
B.2 Cache performance 416
B.3 Six basic cache optimizations 421
B.4 Virtual memory 435
B.5 Protection and examples of virtual memory 441
B.6 Fallacies and pitfalls 447
B.7 Concluding remarks 448
B.8 Historical perspectives and references 449
Appendix C Pipelining: Basic and Intermediate Concepts 454
C.1 Introduction 455
C.2 The major hurdle of pipelining: pipeline hazards 461
C.3 How is pipelining implemented 476
C.4 What makes pipelining hard to implement 485
C.5 Extending the MIPS pipeline to handle multicycle operations 490
C.6 Putting it all together: the MIPS R4000 pipeline 498
C.7 Crosscutting issues 504
C.8 Fallacies and pitfalls 511
C.9 Concluding remarks 512
C.10 Historical perspectives and references 512
References 518
Index 543

Source of this book: China Interactive Publishing Network
