TensorFlow + AMD GPU + Docker + WSL2

I got stuck for the first time in a long while when setting up my ML environment. I hope this note helps someone.


Executed Commands

TensorFlow with DirectML

$ pip install "tensorflow-cpu>=2.9" tensorflow-directml-plugin

docs.microsoft.com (accessed on 2022/07/31)

github.com (accessed on 2022/07/31)

Docker

$ docker run -it --device /dev/dxg --mount type=bind,src=/usr/lib/wsl,dst=/usr/lib/wsl -e LD_LIBRARY_PATH=/usr/lib/wsl/lib python:3.10 /bin/bash

How to solve it

Here is how I troubleshot the problem and found the solution.

The Problem and What I want

The problem I faced was that TensorFlow wasn't using the GPU for computation.

So I googled how to build an ML environment and found some helpful information, but none of it was compatible with my setup.

I don't mind when Google gives me no answers, because it is a good chance to test my problem-solving ability. But this problem turned out to be harder than I expected and cost me half a day.

The following are the ML environment examples I found on Google.

  • TensorFlow + CUDA
  • TensorFlow + CUDA + Docker
  • TensorFlow + ROCm (Ubuntu)
  • TensorFlow + DirectML

TensorFlow with DirectML

TensorFlow-DirectML is a Python library that lets DirectX 12-compatible hardware run TensorFlow operations. It supports TensorFlow <= 1.15 for production use, while the TensorFlow-DirectML-Plugin, which is still in early development, can be used with TensorFlow 2.

github.com (accessed on 2022/07/31)

github.com (accessed on 2022/07/31)

Architecture

The setup I have in mind is GPU acceleration for TensorFlow inside Docker on the WSL2 backend.

Layer                     Component
Application Library       TensorFlow
Container Virtualization  Docker
Linux Environment         WSL2
OS                        Windows 10/11
Hardware                  AMD GPU

DirectX on WSL

First, you need to make DirectX available on WSL. If you are already on the latest GPU driver, there is probably nothing to do.

devblogs.microsoft.com (accessed on 2022/07/31)

www.amd.com (accessed on 2022/07/31)

/dev/dxg is the interface to DirectX. Make sure it exists in your WSL distribution.

$ ls -l /dev | grep dxg
crw-rw-rw- 1 root root  10,  63 Jul 31 02:55 dxg
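The leading c in the ls output marks a character device. The same check can be scripted; here is a small sketch (the helper name is mine, not part of any WSL tooling):

```shell
# is_char_device: report whether a path is a character device node,
# as /dev/dxg should be on a GPU-enabled WSL2 install.
is_char_device() {
  if [ -c "$1" ]; then
    echo "character device: $1"
  else
    echo "not a character device: $1"
  fi
}

is_char_device /dev/dxg
```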

TensorFlow on WSL

Next, check that TensorFlow recognizes your GPU.

Create a Python virtual environment as you like on WSL and install libraries.

$ pip install "tensorflow-cpu>=2.9" tensorflow-directml-plugin

(Quote the version specifier, or the shell interprets >= as a redirection.)

For your information, you will get an error if you install tensorflow instead of tensorflow-cpu.

stackoverflow.com (accessed on 2022/07/31)

Now TensorFlow recognizes your GPU:

>>> from tensorflow.python.client import device_lib
 I tensorflow/c/logging.cc:34] Successfully opened dynamic library libdirectml.0de2b4431c6572ee74152a7ee0cd3fb1534e4a95.so
 I tensorflow/c/logging.cc:34] Successfully opened dynamic library libdxcore.so
 I tensorflow/c/logging.cc:34] Successfully opened dynamic library libd3d12.so
 I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
>>> device_lib.list_local_devices()
 I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
 I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (AMD Radeon(TM) Graphics)
 I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
 W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.
 I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/device:GPU:0 with 10995 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 16873716271696621736
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 11529922944
locality {
  bus_id: 1
}
incarnation: 7266701298382521906
physical_device_desc: "device: 0, name: DML, pci bus id: <undefined>"
xla_global_id: -1
]

Make sure the GPU is really used during calculation.

>>> import tensorflow as tf
>>> tf.debugging.set_log_device_placement(True)
>>> tf.add([1.0, 2.0], [3.0, 4.0])
 I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
 I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10995 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
 I tensorflow/core/common_runtime/eager/execute.cc:1323] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
 I tensorflow/core/common_runtime/eager/execute.cc:1323] Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
 I tensorflow/core/common_runtime/eager/execute.cc:1323] Executing op AddV2 in device /job:localhost/replica:0/task:0/device:GPU:0
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([4., 6.], dtype=float32)>

TensorFlow with DirectML on Docker

Create an image.

FROM python:3.10
RUN pip install -U pip \
 && pip install "tensorflow-cpu>=2.9" tensorflow-directml-plugin

$ docker build -t tensorflow .

First, I tried creating a container with only /dev/dxg passed through.

$ docker run -it --rm --device /dev/dxg tensorflow
>>> import tensorflow as tf
 W tensorflow/c/logging.cc:37] Could not load dynamic library 'libdirectml.0de2b4431c6572ee74152a7ee0cd3fb1534e4a95.so'; dlerror: libd3d12.so: cannot open shared object file: No such file or directory
 W tensorflow/c/logging.cc:37] Could not load DirectML.
2022-07-30 18:39:38.308633: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 0 compatible adapters.

TensorFlow reports that the shared library libd3d12.so could not be loaded. Search WSL for it.

$ which libd3d12.so
/usr/lib/wsl/lib/libd3d12.so
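which finds the library here only because WSL puts /usr/lib/wsl/lib on PATH; it normally locates executables, not shared objects. A more general search can be sketched with a hypothetical helper:

```shell
# find_lib: locate a shared library by name under common prefixes.
# find does not depend on PATH, so it works in any shell environment.
find_lib() {
  find /usr/lib /lib /usr/lib/wsl -name "$1" 2>/dev/null | head -n 1
}

find_lib 'libd3d12.so'   # on WSL2 this prints /usr/lib/wsl/lib/libd3d12.so
```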

Start the container with /usr/lib/wsl bind-mounted and LD_LIBRARY_PATH set to /usr/lib/wsl/lib so the dynamic loader can find the libraries.

$ docker run -it --rm --device /dev/dxg --mount type=bind,src=/usr/lib/wsl,dst=/usr/lib/wsl -e LD_LIBRARY_PATH=/usr/lib/wsl/lib tensorflow
>>> import tensorflow as tf
 I tensorflow/c/logging.cc:34] Successfully opened dynamic library libdirectml.0de2b4431c6572ee74152a7ee0cd3fb1534e4a95.so
 I tensorflow/c/logging.cc:34] Successfully opened dynamic library libdxcore.so
 I tensorflow/c/logging.cc:34] Successfully opened dynamic library libd3d12.so
 I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.

Now you are finished building your ML environment.

Known Issues

OOM when training over hours

W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.

As the warning in the log indicates, the device claims as much memory as it can, which means a program running for hours may eventually die from OOM.
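I have no in-process fix for this; one workaround (my own sketch, not from the DirectML docs) is to checkpoint the model regularly and restart the process whenever it is killed, so training resumes from the last checkpoint:

```shell
# retry_until_success: rerun a command until it exits 0. With a training
# script that resumes from its latest checkpoint, an OOM kill only costs
# the progress since the last save. Purely illustrative.
retry_until_success() {
  until "$@"; do
    echo "command failed (exit $?); restarting" >&2
    sleep 1
  done
}

# e.g. retry_until_success python train.py   # train.py is hypothetical
```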

Reflections

The reason I got stuck this time was probably that I jumped straight to searching for how to make TensorFlow recognize the GPU. Once I laid out the architecture and investigated WSL + TensorFlow first, things went well. It reminded me of the basic problem-solving technique of splitting a complex problem into subproblems.

With CUDA, both TensorFlow + CUDA and Docker + CUDA are officially supported, so building the environment would be easier, but I don't like depending on one company's devices. DirectML, being built on DirectX, still depends on Windows (Microsoft), but on the device side it works equally well with Nvidia, Intel, or AMD.

Absorbing hardware differences is the job of software, so I hope CUDA (Nvidia) puts effort into standardization.