Our project needed its deep learning algorithm accelerated, so the group gave me two GPUs: a GTX 750 Ti and a GRID K2.
I installed the GTX 750 Ti in my local machine; the GRID K2 is installed on a server that I have to log in to over SSH. What followed was a long series of pitfalls...
First, the GRID K2 and the server-side installation:
1. First, if this is the only card you have, sorry, you cannot; see the list of CUDA-supported GPUs to find the information
Compiler and language improvements for CUDA 9
With CUDA 9, the NVCC compiler adds support for C++14, including new features:
Generic lambda expressions, which use the auto keyword in place of the parameter type;
auto lambda = [](auto a, auto b) { return a * b; };
Function return type deduction (using the auto keyword as the return type, as shown in the previous example)
constexpr functions are subject to fewer restrictions, including var
installation methods are completed. Next, set the environment variables, validate the installation, and compile the test samples.
Set the environment variables first
Open the profile:
$ sudo gedit /etc/profile
Add the following two lines at the end and save:
export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
Then make it take effect:
$ source /etc/profile
Next verify the
, UltraISO, Dabaicai (大白菜), and so on.
Download installation packages and drivers
To download the image file:
(1) Download the corresponding CUDA version from the official website. I chose version 7.0 here; pick the run file. Official address: [CUDA official download] http://developer.nvidia.com/cuda-downloads
(2) Download the co
destination address.
After you have done this, restart the computer before continuing with the following steps.
B. Verify the driver version
$ cat /proc/driver/nvidia/version
Then perform the following actions in turn:
To verify the CUDA version:
$ nvcc -V
The result is shown in the following figure:
$ nvidia-smi
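The same checks can be scripted. The following is a minimal sketch rather than part of the original post: the helper name `check_cuda_install` is hypothetical, and it only assumes that the tools are called `nvcc` and `nvidia-smi` and that the driver exposes /proc/driver/nvidia/version, as in the commands above. It reports what is visible from the current shell and does not run the tools.

```python
import os
import shutil


def check_cuda_install(driver_file="/proc/driver/nvidia/version"):
    """Report which pieces of a CUDA installation are visible from this shell.

    Returns a dict with three entries:
      "driver"     -> True if the driver version file exists
      "nvcc"       -> full path of nvcc found on PATH, or None
      "nvidia-smi" -> full path of nvidia-smi found on PATH, or None
    """
    return {
        "driver": os.path.isfile(driver_file),
        "nvcc": shutil.which("nvcc"),
        "nvidia-smi": shutil.which("nvidia-smi"),
    }


if __name__ == "__main__":
    for name, found in check_cuda_install().items():
        print(f"{name}: {found if found else 'not found'}")
```

If `nvcc` comes back as not found right after installation, the usual cause is that the CUDA bin directory has not been added to PATH in the current shell yet.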
C. Run the examples
Enter the directory where the routine is lo
CUDA and CUDA Programming
CUDA SHARED MEMORY
Shared memory was introduced briefly in previous blog posts; this section covers it in detail. In the global memory section, data alignment and contiguity were important topics. When L1 cache is used, alignment can be ignored, but non-contiguous memory accesses can still reduce performance. Depending on the nature of the algorithm, in some cases, non-contiguous access
1. Install build-essential
Install some basic packages needed for development:
Install build-essential
2. Install the NVIDIA driver (3.4.0)
2.1 Preparation (2014-12-03 update)
With the LightDM desktop manager shut down, installing the driver appears to let the Intel HD graphics drive the display while the NVIDIA graphics card does the computing. The steps are as follows:
1. First select the Intel graphics ca
) Implicit invocation
The library at the CUDA runtime software layer makes the call implicitly.
Starting with CUDA 4.0, the CUDA runtime creates one context per device, and that context is shared by all host threads.
The CUDA runtime does not provide an API to create the CUDA context directly, but instead creates the context by delaying initializat
CUDA 5: GPU Architecture
The SM (Streaming Multiprocessor) is a very important part of the GPU architecture. The concurrency of the GPU hardware is determined by its SMs.
Taking the Fermi architecture as an example, it includes the following main components:
CUDA cores
Shared Memory/L1 Cache
Register File
Load/Store Units
Special Function Units
Warp Scheduler
Each SM in the GPU is designed to support hundred
Use Python to write CUDA programs
There are two ways to write a CUDA program using Python:
* Numba
* PyCUDA
NumbaPro is no longer recommended. Its features were split up and integrated into Accelerate and Numba.
Example
Numba
Numba optimizes Python code through JIT compilation. Numba can optimize for the hardware environment of the Loca
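To make the JIT mechanism concrete, here is a minimal CPU-only sketch. It is not from the original post and it does not launch a CUDA kernel; it only shows Numba's `jit` decorator compiling a plain Python loop. The function `sum_of_squares` is a made-up example, and the fallback decorator is an assumption added so the snippet still runs where Numba is not installed.

```python
try:
    from numba import jit  # the real Numba decorator
except ImportError:
    # Assumption for this sketch only: without Numba, fall back to a no-op
    # decorator so the example still runs as ordinary Python.
    def jit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@jit(nopython=True)
def sum_of_squares(n):
    """Sum i*i for i in [0, n) with an explicit loop that the JIT can compile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    # The first call includes compilation time; later calls reuse the
    # compiled machine code.
    print(sum_of_squares(10))
```

With Numba installed, the loop is compiled to machine code on the first call, which is where the speedup over the interpreter comes from.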
Preface
This post has two parts. The first is a translation of the "Timing Your Kernel" section of Chapter 2, "CUDA Programming Model", in "Professional CUDA C Programming"; the second is my own experience. My experience is limited, so additions are very welcome.
CUDA is all about speedup; to get accurate times, the timing function is
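As an illustration of host-side timing, the following sketch wraps a call in a wall-clock timer and keeps the best of several runs. It is not the timing function the post goes on to describe (that text is cut off); the helper name `time_call` and the CPU workload are assumptions for demonstration. One caveat carries over to real CUDA code: kernel launches are asynchronous, so a host timer around a kernel must wait for the device to finish (for example with `cudaDeviceSynchronize()` in CUDA C) before the clock is stopped.

```python
import time


def time_call(func, *args, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)  # stand-in for "launch kernel, then synchronize"
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


if __name__ == "__main__":
    seconds = time_call(sum, range(1_000_000))
    print(f"best of 5 runs: {seconds * 1e3:.3f} ms")
```

Taking the minimum over a few repeats reduces noise from other processes; the first run of a GPU program is often slower because of one-time initialization.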
1. Remote sensing image display based on the combination of VC++ Win32 + CUDA + OpenGL
In this combination, OpenGL can be initialized in either of the following two ways, with the same effect:
// Setting mode 1
glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);
// Setting mode 2
glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);
After extracting the pixel data from the remote sensing image data, the R, G, and B channels can be assigned to the pixel buffer objects (pb
in a stream affects how the CUDA driver schedules these operations and streams and how they are executed.
Tips
1. Optimal performance is achieved when the number of thread blocks is twice the number of multiprocessors in the GPU.
2. The first calculation performed by the kernel function is the offset into the input data. The starting offset for each thread is a value from 0 to the number of th
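The offset calculation in tip 2 can be simulated on the host. This is a sketch under assumptions, since the original text is cut off: `thread_offsets` mirrors the usual CUDA expression `threadIdx.x + blockIdx.x * blockDim.x` for a 1-D launch, and `elements_for_thread` assumes the common pattern in which each thread then advances by the total number of threads. Both function names are hypothetical.

```python
def thread_offsets(grid_dim, block_dim):
    """Starting offset of every thread in a 1-D launch.

    Mirrors the CUDA expression threadIdx.x + blockIdx.x * blockDim.x.
    """
    return [
        thread_idx + block_idx * block_dim
        for block_idx in range(grid_dim)
        for thread_idx in range(block_dim)
    ]


def elements_for_thread(offset, total_threads, n):
    """Indices one thread visits if it strides by the total thread count.

    The stride is an assumption of this sketch, not stated in the text.
    """
    return list(range(offset, n, total_threads))


if __name__ == "__main__":
    offsets = thread_offsets(grid_dim=2, block_dim=4)
    print(offsets)
    print(elements_for_thread(offsets[1], len(offsets), 20))
```

With this indexing every input element is visited by exactly one thread, regardless of how the input size compares to the launch size.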
Heterogeneous computing system, in terms of the GPU. It is very different from the CPU: ordinary C code targets a single processor, the CPU, while CUDA targets the GPU. During compilation, the CPU code is separated from the GPU code; the GPU code is compiled into device code, and the CPU code still needs to be compiled by another C compiler. This may be the biggest difference. CUDA always involves the CPU as well, which is why it is called heterogeneous computing. The
CUDA Memory Model:
GPU chip: registers, shared memory;
Onboard memory: local memory, constant memory, texture memory, global memory;
Host memory: host memory, pinned memory.
Register: extremely low access latency;
Basic unit: register file (32 bits each)
Compute capability 1.0/1.1 hardware: 8192 per SM;
Compute capability 1.2/1.3 hardware: 16384 per SM;
The registers available to each thread are limited. Do not assign too many private variables to it dur
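A rough calculation shows why per-thread register use matters. The sketch below uses only the register file sizes listed above; the function name `max_threads_by_registers` is hypothetical, and the model is deliberately simplified: it ignores register allocation granularity and the other limits on resident threads (the per-SM thread cap and shared memory).

```python
# Register file size per SM, taken from the figures listed above.
REGISTERS_PER_SM = {
    "1.0": 8192,
    "1.1": 8192,
    "1.2": 16384,
    "1.3": 16384,
}


def max_threads_by_registers(compute_capability, registers_per_thread):
    """Upper bound on resident threads per SM imposed by registers alone."""
    if registers_per_thread <= 0:
        raise ValueError("registers_per_thread must be positive")
    return REGISTERS_PER_SM[compute_capability] // registers_per_thread


if __name__ == "__main__":
    for regs in (8, 16, 32):
        print(regs, max_threads_by_registers("1.0", regs))
```

Doubling the registers a thread uses halves this bound, which is the reason for keeping the number of private variables small.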
CUDA 6: Warp
Logically, all threads run in parallel. However, from the hardware point of view, not all threads can execute at the same time. Next we explain some of the essentials of warps.
Warps and Thread Blocks
A warp is the basic execution unit of an SM. A warp contains 32 parallel threads, which execute in SIMT mode. That is to say, all threads execute the same instruction, and each thread applies that instruction to its own data.
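The two ideas in this paragraph can be sketched on the host. This is an illustration, not GPU code: `warps_per_block` rounds a block's thread count up to whole warps, because the hardware always allocates complete warps, and `simt_step` models one SIMT step by applying the same instruction to each lane's own data. Both function names are hypothetical.

```python
WARP_SIZE = 32  # threads per warp, as stated above


def warps_per_block(threads_per_block):
    """Number of warps needed for a block, rounded up to whole warps."""
    if threads_per_block < 0:
        raise ValueError("threads_per_block must be non-negative")
    return (threads_per_block + WARP_SIZE - 1) // WARP_SIZE


def simt_step(instruction, lane_data):
    """Apply one instruction to every lane's own data, as one warp would."""
    if len(lane_data) > WARP_SIZE:
        raise ValueError("a warp holds at most 32 lanes")
    return [instruction(value) for value in lane_data]


if __name__ == "__main__":
    print(warps_per_block(128))  # a block that fills its warps exactly
    print(warps_per_block(80))   # the last warp is only partly used
    print(simt_step(lambda x: x * 2, list(range(WARP_SIZE))))
```

A block of 80 threads still occupies three warps, so 16 lanes of the last warp are inactive; choosing block sizes that are multiples of 32 avoids that waste.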
A block can be
package, so the Linux here is 64-bit)
* Download the corresponding version from the CUDA website (https://developer.nvidia.com/cuda-downloads#linux).
* After the download is complete, you can install it using the following command; note that the file name has been changed to cuda-repo-ubuntu1404_6.5-14_amd64.deb
$ sudo dpkg -i cuda
First verify that you have an NVIDIA graphics card (check http://developer.nvidia.com/cuda-gpus to see whether your graphics card supports CUDA):
$ lspci | grep -i nvidia
Check your Linux distribution (mainly whether it is 64-bit or 32-bit):
$ uname -m && cat /etc/*release
Check the GCC version:
$ gcc --version
First download the NVIDIA CUDA repository installation package (my system is Ubuntu 14.04 64-bit, so the down
Document directory
Function qualifiers
Variable type qualifiers
Execution configuration
Built-in variables
Time functions
Synchronization functions
1. Parallel Computing
1) Single-core instruction-level parallelism (ILP): lets the execution units of a single processor execute multiple instructions simultaneously
2) Multi-core parallelism (TLP): integrates multiple processor cores on one chip to achieve thread-level parallelism
3) Multi-processor parallelism: installs multiple processors on a single circuit board and i
Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
After checking, go to the NVIDIA website (see link 3) to download the driver, which is the deb package for Ubuntu 14.04.
2. Installation
Installing the deb package is relatively simple. The installer warns about instability during the process, but nothing actually went wrong.
Follow