Dell servers with VOLTA V100 GPUs

How to connect

We use Grid Engine to access the node.

Only authorized users are allowed to use this machine.

Objective: open a shell or run commands directly on the server.

  • login node: mesologin1.univ-fcomte.fr or mesoshared.univ-fcomte.fr
  • queue: volta.q
  • request GPUs: -l gpu=N where 0 < N <= 2; default value gpu=1
  • The default SGE setting for h_vmem is 4 GB per core. Use -l h_vmem to request more memory; we suggest requesting more than 20 GB. For example: -l h_vmem=20G for 20 GB of memory. Warning: the job will be killed if it exceeds the requested memory.
  • The HOME folder is mounted in read-only mode.
  • Use the WORK folder (cd $WORK) as your working space.

Examples

Shell session

  • Request a shell session with 1 GPU and 20 GB of memory for 2 hours:
[user@mesologin1 ~]$ qlogin -q volta.q  -l h_vmem=20G -l h_rt=2:00:00

Once connected, change directory to WORK:

[user@node2-69 ~]$ cd $WORK

Check the allocated GPU(s):

[user@node2-69 ~]$ nvidia-smi 
Thu Feb 27 15:29:50 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:3B:00.0 Off |                    0 |
| N/A   26C    P0    25W / 250W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
 
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
  • Request a shell session with 2 GPUs and 64 GB of memory:
[user@mesologin1 ~]$ qlogin -q volta.q -l gpu=2 -l h_vmem=64G

All software is installed in the default path; you can invoke nvcc directly, for example:

[user@node2-69 ~]$ nvcc program.cu -o program
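
If you want to target the V100 explicitly, here is a minimal sketch (assuming program.cu is your own source file; the V100 is compute capability 7.0, as the TensorFlow output further down confirms):

[user@node2-69 ~]$ nvcc -arch=sm_70 program.cu -o program   ## sm_70 targets Volta / V100
[user@node2-69 ~]$ ./program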

Objective: run programs in batch mode using SGE.

Example script to adapt to your needs

gpu.sge
#!/bin/bash -l
 
#$ -q volta.q
 
#$ -l gpu=1  ## adapt to your needs
 
#$ -l h_vmem=30G ## the job will be killed if this memory limit is exceeded
 
#$ -N Test_GPU
 
#################"
## Please adapt this script for your need
#################
 
## 1- vanilla python
python GPU_Program.py
 
## 2-anaconda
export PATH="/opt/anaconda3/bin:$PATH"
python GPU_Program.py
 
## 2-bis anaconda with env
conda activate $WORK/conda/meso && python GPU_Program.py
 
## 3- singularity
 
singularity exec --nv tensorflow-gpu.simg python mytensorflow.py

Submit SGE job:

[user@mesologin1 ~]$ qsub  gpu.sge
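
To check that the job is queued or running, the standard SGE commands can be used (generic SGE commands, not specific to this cluster; replace <job_id> with the id printed by qsub):

[user@mesologin1 ~]$ qstat -u $USER      ## list your pending and running jobs
[user@mesologin1 ~]$ qdel <job_id>       ## cancel a job if needed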

Installing and Running software

This host has its own software installed; it does not share any software with the rest of the cluster (module system). Only Python-based deep-learning software is installed.

Please use anaconda for installing packages.

Vanilla Python

Several packages are installed by default for both python2.* and python3.*.

Use the pip list or pip3 list command to view installed packages:

$ pip list
gpustat             0.5.0    
grpcio              1.18.0   
h5py                2.9.0    
Keras               2.2.4    
Keras-Applications  1.0.6    
Keras-Preprocessing 1.0.5    
Markdown            3.0.1    
numpy               1.16.0   
nvidia-ml-py        7.352.0  
pbr                 5.1.1    
protobuf            3.6.1    
scipy               1.2.0    
setuptools          20.7.0   
six                 1.10.0   
tensorboard         1.12.2 
tensorflow-gpu      1.12.0

Run a Python program:

[user@node2-69~]$ python my_python_program.py     
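
While it runs, the gpustat package listed above also provides a compact command-line view of GPU utilization (purely optional):

[user@node2-69 ~]$ gpustat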

You can install new packages without root access. Packages need to be installed in $WORK because $HOME is read-only. The idea is to set WORK as your HOME directory (export HOME=$WORK) and to use the --user option of pip or pip3.

For example, let's install pandas:

$ export HOME=$WORK
$ pip install --user pandas
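
With HOME pointing at WORK, pip's --user install lands under $WORK/.local; a quick way to check that the package is importable:

$ python -c "import pandas; print(pandas.__version__)"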

Voilà! That's all!

For better performance, Anaconda Python 3 is installed in /opt/. Several optimized scientific packages are available.

Using anaconda with conda

Conda is a powerful package and environment management system.

To use conda with anaconda:

EDIT (10/03/2020): we added the module command on these nodes, for local software only. To load anaconda, simply use:

$ module load anaconda
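
To list the locally installed modules and confirm that conda now comes from the Anaconda installation in /opt, something like this should work:

$ module avail        ## list modules available on this node
$ which conda
$ conda --version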

Check for available packages:

$ conda list
mkl                       2019.1                      144  
mkl-service               1.1.2            py37he904b0f_5  
mkl_fft                   1.0.6            py37hd81dba3_0  
mkl_random                1.0.2            py37hd81dba3_0  
more-itertools            4.3.0                    py37_0  
mpc                       1.1.0                h10f8cd9_1  
mpfr                      4.0.1                hdf1c602_3  
mpmath                    1.1.0                    py37_0  
msgpack-python            0.5.6            py37h6bb024c_1  
multipledispatch          0.6.0                    py37_0  
navigator-updater         0.2.1                    py37_0  
nbconvert                 5.4.0                    py37_1  
nbformat                  4.4.0                    py37_0  
ncurses                   6.1                  he6710b0_1  
networkx                  2.2                      py37_1  
nltk                      3.4                      py37_1  
nose                      1.3.7                    py37_2  
notebook                  5.7.4                    py37_0

You may need to install additional packages without root access. We recommend building a virtual environment locally and installing packages there.
But first we need to tell conda to use WORK to store packages: export HOME=$WORK
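
If you prefer not to override HOME, an alternative we assume should also work is to point conda's package cache at WORK explicitly:

$ export CONDA_PKGS_DIRS=$WORK/conda/pkgs    ## assumption: downloaded packages are cached here instead of under $HOME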

  1. Create a conda environment
    $ conda create -y --prefix $WORK/conda-env

    replace conda-env with a name of your choice.

  2. Activate the environment you just created
    $ source activate $WORK/conda-env  
  3. Download and install the conda package in the environment
     $ conda install package

    replace package with the package you want to install, for example pandas

  4. Run your analysis from within the environment
  5. Deactivate the environment: once you are done with your analysis, you can deactivate the conda environment by running the command
    $ conda deactivate

Example: installing TensorFlow GPU in a local environment

We will use a local environment (in our WORK) and install all the needed packages in it.

$ conda create -y --prefix $WORK/conda/meso
$ conda activate $WORK/conda/meso
(meso) user@node2-69:~$ conda install tensorflow-gpu
(meso) user@node2-69:~$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf  # the first import may take a while
>>> tf.test.gpu_device_name()
...
2020-02-27 15:18:44.636689: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56308c48ef00 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-02-27 15:18:44.636746: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla V100-PCIE-16GB, Compute Capability 7.0
'/device:GPU:0'

  • To list your conda environments, use
    conda env list
  • To remove an environment
    conda remove --name myenv --all
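
Note that the environments above were created with --prefix rather than --name, so they are listed by their full path and should be removed by path as well, for example:

$ conda remove --prefix $WORK/conda-env --all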

Singularity

This section is for experts and advanced usage.

We can use Singularity's pull sub-command to import a container image directly from Docker Hub without having root or superuser privileges (or Docker) on your host system.

For example, to use Singularity to import the latest TensorFlow image (GPU version) into the current working directory:

$ singularity pull tensorflow-gpu.simg docker://tensorflow/tensorflow:latest-gpu

Notice that we prepend the original URL with the docker:// prefix.

This will download and build a Singularity image named tensorflow-gpu.simg.

We can spawn an interactive shell within a container with the shell sub-command.

$ singularity shell --nv tensorflow-gpu.simg

Or execute TensorFlow directly within the container:

$ singularity exec --nv tensorflow-gpu.simg python mytensorflow.py

The --nv parameter is mandatory for GPU applications.
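
To quickly confirm that the GPU is visible from inside the container, the same TensorFlow check used earlier can be run through Singularity:

$ singularity exec --nv tensorflow-gpu.simg python -c "import tensorflow as tf; print(tf.test.gpu_device_name())"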

Command line

From the login or mesoshared nodes you can access all GPU stats using the voltastat command.

For example:

user@mesologin1:~# voltastat
 
node2-69                 Wed Mar  4 23:47:01 2020  440.33.01
[0] Tesla V100-PCIE-16GB | 36'C,   80 % | 15553 / 16160 MB | plop(15541M)
[1] Tesla V100-PCIE-16GB | 28'C,   0 % |     0 / 16160 MB |
[2] Tesla V100-PCIE-16GB | 28'C,   0 % |     0 / 16160 MB |
 
 
node3-70                 Wed Mar  4 23:46:01 2020  440.33.01
[0] Tesla V100-SXM2-32GB | 34'C,   0 % |     0 / 32510 MB |
[1] Tesla V100-SXM2-32GB | 32'C,   0 % |     0 / 32510 MB |
[2] Tesla V100-SXM2-32GB | 30'C,   0 % |     0 / 32510 MB |
[3] Tesla V100-SXM2-32GB | 34'C,   0 % |     0 / 32510 MB |

voltastat has 10-second granularity, i.e. it requests GPU stats at 10-second intervals.
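
If you want a continuously refreshing view at the same granularity (assuming the standard watch utility is available on the login node):

$ watch -n 10 voltastat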

Ganglia Graph


FAQ

How do I request the DGX node?

Use this SGE option: -l dgx=1

How can I monitor GPU usage from the submit host?

On the submit host, use the voltastat command; it polls the GPUs every 10 seconds.

How do I install Caffe with GPU support?

Once connected to the node, create or enter your environment. For example:

$ conda activate $WORK/conda/meso 

and install caffe-gpu with conda:

$ conda install caffe-gpu

To test your installation, run the following in Python:

import caffe               # should import from the environment without errors
caffe.set_mode_gpu()       # switch Caffe to GPU mode
caffe.set_device(0)        # select the first allocated GPU