Building a TensorFlow deep learning environment with nvidia-docker on Ubuntu 14.04


* These notes record the configuration process: for each step, the problems I ran into and the fixes I found online, so the format is somewhat rough. I am keeping them as a reference for junior labmates who may need to set up a new server (say, if the advisor buys one), and I hope they help others in need as well.


System configuration: CPU: Xeon E5-2620 v3, GPU: NVIDIA TITAN X, OS: Ubuntu 14.04


The lab got a TITAN X, so the server finally has a proper graphics card. Over the weekend I set up a GPU-enabled TensorFlow development environment (running TF on the CPU is slow).

1. First of all, the TITAN X has a rated power of 300 W (that is, the average power under sustained high load is around 300 W, so peak power may exceed it), so make sure the host's power supply is sufficient.

The TITAN X has no VGA port; you need an adapter to connect the monitor.


2. After connecting the display to the card's output, booting into Ubuntu gave a black screen, the kind where you cannot even reach a command line.

After some searching on Baidu, I tried entering recovery mode and editing the GRUB config:
# vi /etc/default/grub
Modify it as follows:


GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvidia.modeset=0"

# Or, alternatively:
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
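One detail my original notes skipped: edits to /etc/default/grub do not take effect until the GRUB configuration is regenerated. On Ubuntu that is one extra command followed by a reboot:

```shell
# Rebuild /boot/grub/grub.cfg from /etc/default/grub, then reboot
sudo update-grub
sudo reboot
```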


This resolves the black screen. In recovery mode, vi cannot save changes; you first need to remount the root filesystem read-write:


# mount -n -o remount,rw /


The workaround is referenced from:

http://www.2cto.com/os/201307/225026.html

http://blog.csdn.net/sqzhao/article/details/9812527


3. With the black screen solved, the system boots normally. Now install the NVIDIA driver. I had previously installed the driver for a much older card from the command line, which was very troublesome: disable the display, shut down the graphical interface, and so on, two or three days of fiddling before it was done. None of that was necessary for this TITAN X; a few clicks in a graphical package manager finished the job in minutes.

First, install Synaptic Package Manager from the Ubuntu Software Center. Open Synaptic, type "nvidia" in the search box, select nvidia-352 (choose the version matching your card model), and click Apply. Synaptic installs nvidia-352 together with all its dependencies; after installation you will find it actually pulled in quite a few packages, so this route is much safer than installing those packages by hand. Reboot when the installation finishes. Click the system menu in the upper right corner: the graphics entry now shows the TITAN, and we are done.
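As a quick sanity check from a terminal after the reboot (assuming the nvidia-352 package installed correctly; the exact output depends on your card and driver version):

```shell
# Verify that the NVIDIA kernel module is loaded and query the driver version
lsmod | grep nvidia
cat /proc/driver/nvidia/version
```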


4. Next, install Docker. Ubuntu ships with Docker by default, but the bundled version is usually too old and needs to be upgraded manually to the latest release.

Docker installation/upgrade reference blogs:

http://www.tuicool.com/articles/JBnQja

http://www.linuxidc.com/Linux/2015-02/113784.htm
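For reference, a common way to get the latest Docker on Ubuntu 14.04 at the time was the official convenience script; this is only a sketch of that route (the linked posts cover the repository-based alternative):

```shell
# Install the latest Docker via the official convenience script
curl -sSL https://get.docker.com/ | sh
# Optional: allow the current user to run docker without sudo (re-login required)
sudo usermod -aG docker "$USER"
```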


5. Install the nvidia-docker plugin. You may run into docker-engine version problems; follow the first link in step 4 and try again.

Reference Link: https://github.com/NVIDIA/nvidia-docker

6. Pull the image. Search Docker Hub for TensorFlow images; the official TensorFlow image is used here.

Link: https://hub.docker.com/r/tensorflow/tensorflow/

Both CPU-only and GPU (CUDA) versions of the image are available.

Start the container with the following command (sudo is sometimes needed):

$ nvidia-docker run -it -p 8888:8888 tensorflow/tensorflow:latest-gpu

Here -p is the port mapping. You can append bash to the command to get a shell inside the container. When you need to start Jupyter Notebook, run run_jupyter.sh in the root directory.

7. Mount a local disk into the container with the option: -v /host/directory:/container/directory
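Putting steps 6 and 7 together, a typical invocation might look like this (the host and container paths are just examples):

```shell
# Start the GPU image with port 8888 mapped and the host's /home mounted
nvidia-docker run -it -p 8888:8888 -v /home/:/mnt/home tensorflow/tensorflow:latest-gpu
```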

8. Sometimes starting nvidia-docker itself is a problem.

The advice given in the TensorFlow GitHub repo is: "Note: If you have a problem running nvidia-docker you may try the old way we have below. But it is not recommended. If you find a bug in nvidia-docker report it there and try using nvidia-docker as described above."

Link: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker

Use these commands:

$ export CUDA_SO=$(\ls /usr/lib/x86_64-linux-gnu/libcuda.* | xargs -I{} echo '-v {}:{}')

$ export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')

$ docker run -it -p 8888:8888 $CUDA_SO $DEVICES gcr.io/tensorflow/tensorflow:latest-gpu
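The two export lines work by turning each matching file or device node into a repeated flag via xargs. A minimal, self-contained illustration with hard-coded example paths:

```shell
# Build repeated '-v src:dst' flags from a list of paths, the same way the
# CUDA_SO export does for the real libcuda files (paths here are illustrative)
CUDA_SO=$(printf '%s\n' /usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/x86_64-linux-gnu/libcuda.so.1 \
    | xargs -I{} echo '-v {}:{}')
echo "$CUDA_SO"
```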

This method requires mounting the GPU devices manually.

Inside the Docker shell, list the video devices with ls -la /dev | grep nvidia, then mount each one in turn.

# Command to mount the GPU; every device must be mounted

docker run -it --name name -v /home/:/mnt/home --privileged=true --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl myconda:cuda bash

# Example:

docker run -it -p 8888:8888 -v /home/:/mnt/home --privileged=true --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-modeset:/dev/nvidia-modeset $CUDA_SO $DEVICES gcr.io/tensorflow/tensorflow:latest-gpu bash

9. After starting Jupyter Notebook, test the GPU with the following code; if no error is raised, the call succeeded:

import tensorflow as tf

# '/gpu:N' assigns the op to the Nth GPU on a multi-GPU machine
with tf.device('/gpu:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Create a new session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Run the op.
print(sess.run(c))

I ran a small network earlier; after a successful run, the command line shows output like the following:

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:02:00.0
Total memory: 11.90GiB
Free memory: 7.96GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:02:00.0)


10. Monitor the card's status with the command nvidia-smi, and enjoy the speedup the GPU brings.
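To keep an eye on utilization while training, nvidia-smi can be polled continuously, for example:

```shell
# Refresh the GPU status display every second (Ctrl+C to exit)
watch -n 1 nvidia-smi
```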


This log was written in a hurry and is fragmented and messy. Many articles online helped me through the configuration, so I am writing one myself in the hope of helping others.


Reference Links:

[1]. nvidia-docker quick start: https://github.com/NVIDIA/nvidia-docker/wiki#quick-start

[2]. Manually assigning GPU/CPU devices: http://www.tensorfly.cn/tfdoc/how_tos/using_gpu.html

[3]. TITAN X driver installation: http://blog.csdn.net/u010167269/article/details/50703948

[4]. Running TF in a container (with nvidia-docker or docker): https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker

[5]. Official TF image on Docker Hub: https://hub.docker.com/r/tensorflow/tensorflow/

[6]. Docker installation: http://www.tuicool.com/articles/JBnQja

[7]. nvidia-docker installation: https://github.com/NVIDIA/nvidia-docker

[8]. Manually mounting the GPU: http://blog.csdn.net/bychahaha/article/details/48493233

[9]. Graphics card black-screen fix: http://www.2cto.com/os/201307/225026.html

[10]. Editing with vi in recovery mode: http://blog.csdn.net/sqzhao/article/details/9812527


