Attribute |
NVIDIA GPU |
Intel MIC |
Single core |
Stream processor / CUDA core. Each core runs one thread. |
x86 core. Each core supports up to four hardware threads. |
Clock speed |
Close to 1 GHz |
1.0-1.1 GHz |
Number of cores |
Dozens to thousands |
57-61 |
Degree of Parallelism |
Multi-level parallelism across grid, block, and thread. Fine-grained parallelism (number of threads > number of cores). Thread-switching overhead is effectively zero. |
Threads plus vectorization. Number of threads <= (cores - 1) * 4. Vector width: 512 bits (16 single-precision or 8 double-precision elements). |
Memory size (GB) |
Up to 12 GB |
6/8/16 GB |
Memory bandwidth |
288 GB/s |
240-352 GB/s |
Data Access Requirements |
Access is optimal when the threads in a warp read or write contiguous data. |
Data accessed within a thread should be contiguous. With vectorization, contiguous access is optimal. |
Peak Performance |
Single precision: up to 4.29 TFLOPS. Double precision: up to 1.43 TFLOPS. Calculation: instruction throughput * number of compute units * frequency. |
Single precision: 2.0-2.2 TFLOPS. Double precision: 1.0-1.1 TFLOPS. Sample DP calculation: 16 DP FLOPs/clock/core * 61 cores * 1.1 GHz = 1073.6 GFLOP/s. |
Programming Language |
CUDA, OpenCL, and OpenACC |
OpenMP, OpenCL, Cilk, and OpenACC |
Programming Mode |
Offload |
Offload, native, and symmetric |
Power Consumption |
235 W |
225-300 W |
PCI-E Bandwidth |
PCI-E 2.0 (8 GB/s in each direction). PCI-E 3.0 (16 GB/s in each direction). |
PCI-E 2.0 (8 GB/s in each direction). PCI-E 3.0 is not currently supported. |
Operating Platform |
PC, server, workstation. A GeForce card in a PC can run CUDA at low cost with high performance. |
Server. Relatively specialized and expensive; rarely configured by individual users. |
Product |
GeForce: several hundred to several thousand yuan, used in PCs (current mainstream: GTX710-780). Tesla: 10,000-30,000 yuan, used in servers (current mainstream: K20, K40). Quadro: tens of thousands of yuan, used in workstations (current mainstream: Quadro K4100M, Quadro K3100M, Quadro K2100M, Quadro K610M). |
KNC: about 10,000-20,000 yuan. Current mainstream: 7110P, 5110P, 3110P. |
Supported Operating Systems |
Windows: XP, Windows 7, Windows 8. Linux x86: Fedora, openSUSE, RHEL/CentOS, SLES, SteamOS, Ubuntu, etc. Linux ARM: Ubuntu. Mac OS X. |
Windows: Windows 8 Server, Windows 7, Windows 8. Linux: Red Hat 6.0 and above, SUSE SLES 11 and above. |
Built-in OS on the card |
None |
Built-in uOS with its own IP address |
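
The two MIC formulas in the table (peak throughput and the thread-count bound) can be checked with a few lines of arithmetic. The sketch below is only an illustration in Python. The function names `mic_peak_gflops` and `mic_max_threads` are hypothetical, and the inputs are the figures quoted in the table rather than measured values.

```python
def mic_peak_gflops(flops_per_clock_per_core: int, cores: int, clock_ghz: float) -> float:
    """Theoretical peak in GFLOP/s: FLOPs/clock/core * cores * clock (GHz)."""
    return flops_per_clock_per_core * cores * clock_ghz


def mic_max_threads(cores: int, threads_per_core: int = 4) -> int:
    """Thread-count bound quoted in the table: (cores - 1) * 4."""
    return (cores - 1) * threads_per_core


if __name__ == "__main__":
    # Sample DP calculation from the table: 16 DP FLOPs/clock/core, 61 cores, 1.1 GHz.
    print(f"{mic_peak_gflops(16, 61, 1.1):.1f} GFLOP/s")  # 1073.6 GFLOP/s
    # Thread bound for a 61-core card.
    print(mic_max_threads(61), "threads")  # 240 threads
```

The same product form applies to the GPU column (instruction throughput * number of compute units * frequency), but the table does not list the per-unit throughput for a specific GPU, so no GPU figure is computed here.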