Building a TensorFlow deep learning environment with nvidia-docker on Ubuntu 14.04


* These notes record the configuration process: for each step, the problems I ran into and the fixes I found online, so the format is somewhat rough. I am keeping them as a reference for junior labmates who may need to set up a new server (say, if the advisor buys one), and I hope they help others in need as well.


System configuration: CPU: Xeon E5-2620 v3, GPU: NVIDIA TITAN X, OS: Ubuntu 14.04


The lab got a TITAN X, so the server finally has a proper graphics card. Over the weekend I set up a GPU-enabled TensorFlow development environment (running TF on the CPU is slow).

1. First of all, the TITAN X has a rated power of 300 W (that is, the average power under sustained high load is around 300 W, so peak power may exceed it), so make sure the host's power supply is sufficient.

The TITAN X has no VGA port; you need an adapter to connect the monitor.


2. After connecting the display to the card's output, booting into Ubuntu gave a black screen, the kind where you cannot even reach a command line.

After some searching on Baidu, I tried entering recovery mode and editing the GRUB config:
# vi /etc/default/grub
Modify it as follows:


GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvidia.modeset=0"

# Or, alternatively:
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
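One detail my original notes skipped: edits to /etc/default/grub do not take effect until the GRUB configuration is regenerated. On Ubuntu that is one extra command followed by a reboot:

```shell
# Rebuild /boot/grub/grub.cfg from /etc/default/grub, then reboot
sudo update-grub
sudo reboot
```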


This resolves the black screen. In recovery mode, vi cannot save changes; you first need to remount the root filesystem read-write:


# mount -n -o remount,rw /


The workaround is referenced from:

http://www.2cto.com/os/201307/225026.html

http://blog.csdn.net/sqzhao/article/details/9812527


3. With the black screen solved, the system boots normally. Now install the NVIDIA driver. I had previously installed the driver for a much older card from the command line, which was very troublesome: disable the display, shut down the graphical interface, and so on, two or three days of fiddling before it was done. None of that was necessary for this TITAN X; a few clicks in a graphical package manager finished the job in minutes.

First, install Synaptic Package Manager from the Ubuntu Software Center. Open Synaptic, type "nvidia" in the search box, select nvidia-352 (choose the version matching your card model), and click Apply. Synaptic installs nvidia-352 together with all its dependencies; after installation you will find it actually pulled in quite a few packages, so this route is much safer than installing those packages by hand. Reboot when the installation finishes. Click the system menu in the upper right corner: the graphics entry now shows the TITAN, and we are done.
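As a quick sanity check from a terminal after the reboot (assuming the nvidia-352 package installed correctly; the exact output depends on your card and driver version):

```shell
# Verify that the NVIDIA kernel module is loaded and query the driver version
lsmod | grep nvidia
cat /proc/driver/nvidia/version
```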


4. Next, install Docker. Ubuntu ships with Docker by default, but the bundled version is usually too old and needs to be upgraded manually to the latest release.

Docker installation/upgrade reference blogs:

http://www.tuicool.com/articles/JBnQja

http://www.linuxidc.com/Linux/2015-02/113784.htm
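For reference, a common way to get the latest Docker on Ubuntu 14.04 at the time was the official convenience script; this is only a sketch of that route (the linked posts cover the repository-based alternative):

```shell
# Install the latest Docker via the official convenience script
curl -sSL https://get.docker.com/ | sh
# Optional: allow the current user to run docker without sudo (re-login required)
sudo usermod -aG docker "$USER"
```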


5. Install the nvidia-docker plugin. You may run into docker-engine version problems; follow the first link in step 4 and try again.

Reference Link: https://github.com/NVIDIA/nvidia-docker

6. Pull the image. Search Docker Hub for TensorFlow images; the official TensorFlow image is used here.

Link: https://hub.docker.com/r/tensorflow/tensorflow/

Both CPU-only and GPU (CUDA) versions of the image are available.

Start the container with the following command (sudo is sometimes needed):

$ nvidia-docker run -it -p 8888:8888 tensorflow/tensorflow:latest-gpu

Here -p is the port mapping. You can append bash to the command to get a shell inside the container. When you need to start Jupyter Notebook, run run_jupyter.sh in the root directory.

7. Mount a local disk into the container with the option: -v /host/directory:/container/directory
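Putting steps 6 and 7 together, a typical invocation might look like this (the host and container paths are just examples):

```shell
# Start the GPU image with port 8888 mapped and the host's /home mounted
nvidia-docker run -it -p 8888:8888 -v /home/:/mnt/home tensorflow/tensorflow:latest-gpu
```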

8. Sometimes starting nvidia-docker itself is a problem.

The advice given in the TensorFlow GitHub repo is: "Note: If you have a problem running nvidia-docker you may try the old way we have below. But it is not recommended. If you find a bug in nvidia-docker report it there and try using nvidia-docker as described above."

Link: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker

Use these commands:

$ export CUDA_SO=$(\ls /usr/lib/x86_64-linux-gnu/libcuda.* | xargs -I{} echo '-v {}:{}')

$ export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')

$ docker run -it -p 8888:8888 $CUDA_SO $DEVICES gcr.io/tensorflow/tensorflow:latest-gpu
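The two export lines work by turning each matching file or device node into a repeated flag via xargs. A minimal, self-contained illustration with hard-coded example paths:

```shell
# Build repeated '-v src:dst' flags from a list of paths, the same way the
# CUDA_SO export does for the real libcuda files (paths here are illustrative)
CUDA_SO=$(printf '%s\n' /usr/lib/x86_64-linux-gnu/libcuda.so /usr/lib/x86_64-linux-gnu/libcuda.so.1 \
    | xargs -I{} echo '-v {}:{}')
echo "$CUDA_SO"
```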

This method requires mounting the GPU devices manually.

Inside the Docker shell, list the video devices with ls -la /dev | grep nvidia, then mount each one in turn.

# Command to mount the GPU; every device must be mounted

docker run -it --name name -v /home/:/mnt/home --privileged=true --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl myconda:cuda bash

# Example:

docker run -it -p 8888:8888 -v /home/:/mnt/home --privileged=true --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-modeset:/dev/nvidia-modeset $CUDA_SO $DEVICES gcr.io/tensorflow/tensorflow:latest-gpu bash

9. After starting Jupyter Notebook, test the GPU with the following code; if no error is raised, the call succeeded:

import tensorflow as tf

# '/gpu:N' assigns the op to the Nth GPU on a multi-GPU machine
with tf.device('/gpu:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Create a new session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Run the op.
print(sess.run(c))

I ran a small network earlier; after a successful run, the command line shows output like the following:

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:02:00.0
Total memory: 11.90GiB
Free memory: 7.96GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:02:00.0)


10. Monitor the card's status with the command nvidia-smi, and enjoy the speedup the GPU brings.
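To keep an eye on utilization while training, nvidia-smi can be polled continuously, for example:

```shell
# Refresh the GPU status display every second (Ctrl+C to exit)
watch -n 1 nvidia-smi
```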


This log was written in a hurry and is fragmented and messy. Many articles online helped me through the configuration, so I am writing one myself in the hope of helping others.


Reference Links:

[1]. nvidia-docker quick start: https://github.com/NVIDIA/nvidia-docker/wiki#quick-start

[2]. Manually assigning GPU/CPU devices: http://www.tensorfly.cn/tfdoc/how_tos/using_gpu.html

[3]. TITAN X driver installation: http://blog.csdn.net/u010167269/article/details/50703948

[4]. Running TF in a container (with nvidia-docker or docker): https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker

[5]. Official TF image on Docker Hub: https://hub.docker.com/r/tensorflow/tensorflow/

[6]. Docker installation: http://www.tuicool.com/articles/JBnQja

[7]. nvidia-docker installation: https://github.com/NVIDIA/nvidia-docker

[8]. Manually mounting the GPU: http://blog.csdn.net/bychahaha/article/details/48493233

[9]. Graphics card black-screen fix: http://www.2cto.com/os/201307/225026.html

[10]. Editing with vi in recovery mode: http://blog.csdn.net/sqzhao/article/details/9812527


