#1 2026-04-10 01:43:41

GhostOverFlow256
Member
Registered: 2023-07-16
Posts: 11

[python-tensorflow-opt-cuda] Shared object symbol not found with CUDA

I'm struggling to get GPU acceleration working with tensorflow on my system while using the [icode]extra/python-tensorflow-opt-cuda[/icode] package. Ultimately, tensorflow complains of a version mismatch in its CUDA libraries (it expects versions one release older than what is installed), and symlinking the newer libraries to the expected names as a workaround does not work (the kernel crashes).
System Info
uname:

Linux ghostdog 6.18.21-1-lts #1 SMP PREEMPT_DYNAMIC Thu, 02 Apr 2026 15:44:36 +0000 x86_64 GNU/Linux

Nvidia Driver Version: 595.58.03
Nvidia CUDA Version: 13.2
Device: NVIDIA GeForce RTX 3060
List of all my packages
My lspci data:

2b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3060] [10de:2487] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Gigabyte Technology Co., Ltd Device [1458:407b]
Flags: bus master, fast devsel, latency 0, IRQ 78, IOMMU group 16
Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at e0000000 (64-bit, prefetchable) [size=32M]
I/O ports at f000 [size=128]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: 
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
2b:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1) (prog-if 00 [HDA compatible])
Subsystem: Gigabyte Technology Co., Ltd Device [1458:407b]
Flags: bus master, fast devsel, latency 0, IRQ 79, IOMMU group 16
Memory at fc080000 (32-bit, non-prefetchable) [size=16K]
Capabilities: 
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

The (to my knowledge) relevant packages are installed:

core/linux-firmware-nvidia 20260309-1 [installed]
extra/cuda 13.2.0-1 [installed]
extra/cudnn 9.20.0.48-1 [installed]
extra/egl-gbm 1.1.3-1 [installed]
extra/egl-wayland 4:1.1.21-1 [installed]
extra/egl-wayland2 1.0.1-1 [installed]
extra/egl-x11 1.0.5-1 [installed]
extra/ffnvcodec-headers 13.0.19.0-1 [installed]
extra/libnvidia-container 1.19.0-1 [installed]
extra/libvdpau 1.5-4 [installed]
extra/libxnvctrl 595.58.03-1 [installed]
extra/nccl 2.29.7-1 [installed]
extra/nvidia-container-toolkit 1.19.0-1 [installed]
extra/nvidia-open-lts 1:595.58.03-3 [installed]
extra/nvidia-settings 595.58.03-1 [installed]
extra/nvidia-utils 595.58.03-1 [installed]
extra/nvtop 3.3.2-1 [installed]
extra/opencl-nvidia 595.58.03-1 [installed]
extra/python-pycuda 2026.1-2 [installed]
extra/python-tensorflow-opt-cuda 2.20.0-5 [installed]
multilib/lib32-icu 78.3-1 [installed]
multilib/lib32-nvidia-utils 595.58.03-1 [installed]

Since (afaik) python-tensorflow-opt-cuda is an official extra package, meant to allow developers to use tensorflow with CUDA within the rolling environment of Arch, is there a "neat" or "official" way to get this to work that I am missing?
Please see below for my full install process, rationale, and code samples.
First, I installed the python-tensorflow-opt-cuda package.
Then I created a new venv for my project that can share that system package:

python -m venv --system-site-packages training_venv
conda deactivate
source ./training_venv/bin/activate
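Before wiring the venv into Jupyter, it's worth sanity-checking that the venv actually sees the system-wide package. A minimal probe (this only checks importability and where the module resolves from, not GPU support):

```python
import importlib.util

def module_visible(name):
    """Return the module's on-disk location, or None if the import system can't see it."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# In the activated venv this should point into the system site-packages
# directory if --system-site-packages took effect; None means the venv
# cannot see the package at all.
print("tensorflow:", module_visible("tensorflow"))
```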

Then I added it as a kernel to run my code in:

ipython kernel install --user --name training_venv --display-name "Python (training)"

And made sure to select it.
Then I added "/opt/cuda/lib64" to "/etc/ld.so.conf.d/cuda.conf" so the dynamic loader (and therefore jupyter) can find the CUDA libraries, and ran "ldconfig" to update the cache.
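Whether the ldconfig step took effect can be checked from Python without touching tensorflow: ctypes.util.find_library consults the same loader cache. A small sketch (the bare "cudart" name is what find_library expects for libcudart; None means the loader still can't resolve it):

```python
import ctypes.util

def loader_can_find(libname):
    """Ask the dynamic loader's cache whether a library name resolves."""
    return ctypes.util.find_library(libname)

# Prints the resolved soname (e.g. a libcudart.so.* name) once the
# cache includes /opt/cuda/lib64, or None if it still can't be found.
print("cudart:", loader_can_find("cudart"))
```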
At this point I was getting multiple errors when trying to initialize tensorflow about it being unable to find libraries, which I diagnosed using this snippet:

import os
import ctypes

os.environ['TF_CUDA_PATHS'] = '/opt/cuda'
os.environ['LD_LIBRARY_PATH'] = '/opt/cuda/lib64:/usr/lib'
os.environ['TF_CPP_MAX_VLOG_LEVEL'] = "3"

## Manually try to load the runtime library to see the error
try:
    ctypes.CDLL("/opt/cuda/lib64/libcudart.so")
    print("CUDA Runtime library found and loaded!")
except Exception as e:
    print(f"Failed to load CUDA library: {e}")

import tensorflow as tf
print("Physical Devices:", tf.config.list_physical_devices('GPU'))

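The underlying problem is a built-against vs. installed version mismatch. TensorFlow reports the CUDA/cuDNN versions it was compiled against via tf.sysconfig.get_build_info(), and comparing those against the installed packages makes the mismatch explicit. A minimal sketch of the comparison, using hypothetical dicts in place of the real values (on an actual system, the built dict would come from tf.sysconfig.get_build_info()):

```python
def major_minor(version):
    """Reduce a 'MAJOR.MINOR[.PATCH...]' string to a (major, minor) tuple."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])

def check_versions(built, installed):
    """Compare what TF was built against with what is installed."""
    report = {}
    for key in built:
        same = major_minor(built[key]) == major_minor(installed[key])
        report[key] = "OK" if same else "MISMATCH"
    return report

# Hypothetical values mirroring the situation described above; on a real
# system use: built = tf.sysconfig.get_build_info()
built = {"cuda_version": "13.0", "cudnn_version": "9.16"}
installed = {"cuda_version": "13.2.0", "cudnn_version": "9.20.0.48"}
print(check_versions(built, installed))
```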
To try to fix this, I attempted to symlink the newer .so files to the names tensorflow expects, which is hacky, but I thought it might work... Unfortunately, while tensorflow then imports correctly, it cannot run even the most basic example, e.g.

import tensorflow as tf
## 1. Create two constants
a = tf.constant([[1.0, 2.0]])
b = tf.constant([[3.0, 4.0]])
## 2. Add them
c = a + b
## 3. Print the result and the device it lives on
print("Result:", c)
print("Device used:", c.device)

Spits out:

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1775783044.890444 150458 gpu_device.cc:2020] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9437 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:2b:00.0, compute capability: 8.6
Failed to initialize GPU device #0: shared object symbol not found

And more complex examples crash the entire python kernel.
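A note on debugging these hard crashes: when native code aborts, Jupyter loses the process's stderr along with the kernel. Running the failing snippet in a fresh subprocess keeps the exit code and native error output visible. A minimal sketch (the one-liner below is a placeholder for the failing TF code):

```python
import subprocess
import sys

def run_isolated(code):
    """Run a Python snippet in a fresh interpreter; capture exit code and stderr."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True)
    return proc.returncode, proc.stderr

# Placeholder snippet; substitute the TF example that kills the kernel.
rc, err = run_isolated("print('ok')")
print("exit code:", rc)
```

A non-zero exit code plus the captured stderr usually pins down which shared object or symbol the native crash came from.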

#2 2026-04-12 21:28:41

GhostOverFlow256
Member
Registered: 2023-07-16
Posts: 11

Re: [python-tensorflow-opt-cuda] Shared object symbol not found with CUDA

I made some progress by running:

sudo downgrade cuda cudnn

and downgrading to the versions specified in the PKGBUILD, which at the time of writing are cuda-13.0.2-3-x86_64 and cudnn-9.16.0.29-1-x86_64.

This does not require downgrading nvidia drivers (which is a massive pain and borderline breaks my system).
After that I ran `source /etc/profile` as per the hints provided via pacman.
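To stop a routine pacman -Syu from immediately re-upgrading the downgraded packages, they can be pinned in /etc/pacman.conf until a rebuilt python-tensorflow-opt-cuda lands (remember to unpin them afterwards, since ignored packages otherwise silently fall behind):

```ini
# /etc/pacman.conf -- hold the downgraded packages at their current versions
[options]
IgnorePkg = cuda cudnn
```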

Running a log-level-3 snippet (TF_CPP_MAX_VLOG_LEVEL=3) that simply loads tensorflow and lists the GPUs now generates the following output:

CUDA Runtime library found and loaded!
2026-04-13 07:17:45.056147: I external/local_xla/xla/tsl/platform/cloud/gcs_file_system.cc:861] GCS cache max size = 0 ; block size = 67108864 ; max staleness = 0
2026-04-13 07:17:45.056168: I external/local_xla/xla/tsl/platform/cloud/ram_file_block_cache.h:64] GCS file block cache is disabled
2026-04-13 07:17:45.056172: I external/local_xla/xla/tsl/platform/cloud/gcs_file_system.cc:901] GCS DNS cache is disabled, because GCS_RESOLVE_REFRESH_SECS = 0 (or is not set)
2026-04-13 07:17:45.056174: I external/local_xla/xla/tsl/platform/cloud/gcs_file_system.cc:931] GCS additional header DISABLED. No environment variable set.
2026-04-13 07:17:45.056178: I external/local_xla/xla/tsl/platform/cloud/gcs_file_system.cc:310] GCS RetryConfig: init_delay_time_us = 1000000 ; max_delay_time_us = 32000000 ; max_retries = 10
2026-04-13 07:17:45.056180: I external/local_xla/xla/tsl/platform/cloud/gcs_file_system.cc:310] GCS RetryConfig: init_delay_time_us = 1000000 ; max_delay_time_us = 32000000 ; max_retries = 10
2026-04-13 07:17:45.056418: I external/local_xla/xla/tsl/platform/default/dso_loader.cc:76] Successfully opened dynamic library libcudart.so.12
2026-04-13 07:17:46.655123: I external/local_xla/xla/tsl/platform/default/dso_loader.cc:76] Successfully opened dynamic library libcudart.so.12
2026-04-13 07:17:46.656659: I external/local_xla/xla/tsl/platform/cloud/gcs_file_system.cc:861] GCS cache max size = 0 ; block size = 67108864 ; max staleness = 0
2026-04-13 07:17:46.656664: I external/local_xla/xla/tsl/platform/cloud/ram_file_block_cache.h:64] GCS file block cache is disabled
2026-04-13 07:17:46.656667: I external/local_xla/xla/tsl/platform/cloud/gcs_file_system.cc:901] GCS DNS cache is disabled, because GCS_RESOLVE_REFRESH_SECS = 0 (or is not set)
2026-04-13 07:17:46.656669: I external/local_xla/xla/tsl/platform/cloud/gcs_file_system.cc:931] GCS additional header DISABLED. No environment variable set.
2026-04-13 07:17:46.656672: I external/local_xla/xla/tsl/platform/cloud/gcs_file_system.cc:310] GCS RetryConfig: init_delay_time_us = 1000000 ; max_delay_time_us = 32000000 ; max_retries = 10
2026-04-13 07:17:46.656674: I external/local_xla/xla/tsl/platform/cloud/gcs_file_system.cc:310] GCS RetryConfig: init_delay_time_us = 1000000 ; max_delay_time_us = 32000000 ; max_retries = 10
Physical Devices: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2026-04-13 07:17:47.646524: I external/local_xla/xla/parse_flags_from_env.cc:214] For env var TF_XLA_FLAGS found arguments:
2026-04-13 07:17:47.646555: I external/local_xla/xla/parse_flags_from_env.cc:216]   argv[0] = <argv[0]>
2026-04-13 07:17:47.646563: I external/local_xla/xla/parse_flags_from_env.cc:214] For env var TF_JITRT_FLAGS found arguments:
2026-04-13 07:17:47.646565: I external/local_xla/xla/parse_flags_from_env.cc:216]   argv[0] = <argv[0]>
2026-04-13 07:17:47.647426: I external/local_xla/xla/tsl/platform/default/dso_loader.cc:76] Successfully opened dynamic library libcuda.so.1
2026-04-13 07:17:47.765859: I external/local_xla/xla/tsl/platform/default/dso_loader.cc:76] Successfully opened dynamic library libcudart.so.12
2026-04-13 07:17:48.279746: I external/local_xla/xla/tsl/platform/default/dso_loader.cc:76] Successfully opened dynamic library libcublas.so.12
2026-04-13 07:17:48.279823: I external/local_xla/xla/tsl/platform/default/dso_loader.cc:76] Successfully opened dynamic library libcublasLt.so.12
2026-04-13 07:17:48.281415: I external/local_xla/xla/tsl/platform/default/dso_loader.cc:76] Successfully opened dynamic library libcufft.so.11
2026-04-13 07:17:48.287271: I external/local_xla/xla/tsl/platform/default/dso_loader.cc:76] Successfully opened dynamic library libcusolver.so.11
2026-04-13 07:17:48.287312: I external/local_xla/xla/tsl/platform/default/dso_loader.cc:76] Successfully opened dynamic library libcusparse.so.12
2026-04-13 07:17:48.287405: I external/local_xla/xla/tsl/platform/default/dso_loader.cc:76] Successfully opened dynamic library libcudnn.so.9

Note the lack of errors for loading .so files, which is new.
After that, if I attempt to do some GPU operations as per the testing script in the package repo:

import tensorflow as tf
import os

os.environ['TF_CPP_MAX_VLOG_LEVEL'] = "3" # Enable verbose logging to spot any issues...

with tf.device("/GPU:0"):
    a = tf.random.normal([1, 2])

def temp(x):
    return tf.shape(x)[0]

tf.autograph.to_graph(temp)

It still crashes the kernel, with:

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1776029064.971957  381157 gpu_device.cc:2020] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9813 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:2b:00.0, compute capability: 8.6
Failed to initialize GPU device #0: shared object symbol not found

I'll keep trying to figure out further issues, but it seems like this is the most promising path yet...

