A question I keep getting: how do you get cuDSS running on a shared compute cluster? The usual challenge is that the installers NVIDIA points you at are system-level packages, and on most HPC systems you don't have root access. This post documents the full process, including every concrete error I ran into and how to fix it. I'm writing it in a Q&A format because that matches how the problems actually surfaced.
Background: What Is cuDSS?
cuDSS (CUDA Direct Sparse Solver) is NVIDIA's GPU-accelerated library for solving large sparse linear systems. If you work on numerical methods, scientific computing, or anything that involves factorising a sparse matrix — LU, Cholesky, LDL — cuDSS is worth knowing about. It sits on top of CUDA and can be dramatically faster than CPU-based solvers like PARDISO or MUMPS for problems where the matrix fits on the GPU.
The installation story is a bit awkward. NVIDIA ships it as a system package (`.deb` or `.rpm`), which assumes you have root. On a shared cluster, you almost certainly don't. The good news is that you can fully unpack the package and install into your home directory with no privileges at all.
Step 0: Check Your Environment First
How do I know if my cluster can even run cuDSS?
Three things need to be true before you start: you need a GPU node, CUDA needs to be installed system-wide, and the CUDA version needs to be recent enough (cuDSS 0.7.x requires CUDA 12.x or 13.x).
On most HPC systems CUDA is provided through the module system. Before doing anything else, check what's available:
module avail cuda
Then load a suitable version:
module load cuda/12.8 # adjust to whatever your cluster has
Now confirm the compiler is reachable:
nvcc --version
nvidia-smi
If nvcc is not found even after loading the module, talk to your sysadmin — the module may be misconfigured. If nvidia-smi fails on the login node, don't worry; login nodes often don't have GPUs. The important thing is that nvcc --version works.
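If you want this as a single copy-paste check, here's a small preflight sketch (nothing cluster-specific assumed):

```shell
# Preflight: confirm the CUDA compiler is reachable after `module load`.
if command -v nvcc >/dev/null 2>&1; then
  echo "nvcc found: $(nvcc --version | grep release)"
else
  echo "nvcc not found: load a CUDA module first (module avail cuda)"
fi
```

Either branch prints something useful, so you can drop this into a login script without it aborting on GPU-less nodes.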
How do I check whether my system is Debian/Ubuntu or something else?
This matters because NVIDIA ships different packages for different Linux families, and using the wrong one will cause subtle problems. The simplest check is:
cat /etc/os-release
You'll get output like this:
PRETTY_NAME="Ubuntu 24.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
ID=ubuntu
ID_LIKE=debian
The key line is ID_LIKE=debian. Ubuntu is a Debian derivative, so the .deb packages NVIDIA provides will work. If you see CentOS, Rocky Linux, or RHEL, the Debian package still technically works for a manual unpack (since we're just extracting files), but the library paths inside it are Debian-style and you'll need to account for that when setting up environment variables.
| System | What to download | Notes |
|---|---|---|
| Ubuntu 22.04 / 24.04 | Ubuntu package | Best match — use this when available |
| Debian 12 | Debian 12 package | Direct match |
| CentOS / Rocky / RHEL | Debian package (manual unpack only) | Works, but .so paths are Debian-style — adjust accordingly |
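If you script your installs, the ID_LIKE check can be automated. A sketch, where `pkg_family` is my own helper name, not a standard tool; it takes a file path so it can be tested against a fake file, and defaults to /etc/os-release:

```shell
# Decide which NVIDIA package family applies by parsing an os-release file.
pkg_family() {
  # Source in a subshell so ID/ID_LIKE don't leak into the caller.
  ( . "${1:-/etc/os-release}"
    case "${ID_LIKE:-$ID}" in
      *debian*)        echo "deb: use NVIDIA's .deb packages directly" ;;
      *rhel*|*fedora*) echo "rpm family: .deb works for manual unpack only" ;;
      *)               echo "unknown: unpack manually and verify paths" ;;
    esac )
}

# Example with a fake Ubuntu os-release file:
f=$(mktemp)
printf 'ID=ubuntu\nID_LIKE=debian\n' > "$f"
pkg_family "$f"   # prints: deb: use NVIDIA's .deb packages directly
```

On a real system, call `pkg_family` with no argument to read /etc/os-release.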
Step 1: Download cuDSS
The NVIDIA page only shows a .deb installer that requires sudo. Can I still use it without root?
Yes. The .deb file is just an archive. You can unpack it entirely without root and manually place the files in your home directory. This is completely standard on HPC clusters — it's how most user-installed software works.
Go to the NVIDIA cuDSS download page, select your OS and architecture, and use wget to grab the local repository installer. For Ubuntu 24.04:
wget https://developer.download.nvidia.com/compute/cudss/0.7.1/local_installers/cudss-local-repo-ubuntu2404-0.7.1_0.7.1-1_amd64.deb
If you downloaded the file to your local machine first, copy it up to the cluster with scp:
scp cudss-local-repo-ubuntu2404-0.7.1_0.7.1-1_amd64.deb user@cluster.example.com:~/
Step 2: Unpack the Package (Two Layers)
What does "unpack a .deb without root" actually mean in practice?
A .deb file is just a specially structured archive (technically an ar archive containing a tar). The tool dpkg-deb -x extracts the file tree without invoking apt or touching the system package database. No root needed.
The cuDSS package is a local repository installer — that means it's a wrapper that contains the actual library packages inside it. So there are two layers to unpack.
Layer 1: unpack the repo container
dpkg-deb -x cudss-local-repo-ubuntu2404-0.7.1_0.7.1-1_amd64.deb cudss_stage1
Note that running dpkg-deb -x filename.deb without specifying a target directory will give you:
dpkg-deb: error: --extract needs a target directory.
The target directory (here cudss_stage1) is a required argument, not optional.
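Incidentally, if dpkg-deb itself isn't installed on your cluster (common on RHEL-family systems), you can fall back to plain ar and tar, since a .deb is just an ar archive. Here's a self-contained demonstration on a toy package built in a temp directory; all file names below are made up for the demo:

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Build a fake payload and package it roughly the way dpkg-deb would:
# an ar archive holding debian-binary, a control tarball, and a data tarball.
mkdir -p pkgroot/usr/lib
echo "fake library" > pkgroot/usr/lib/libdemo.so
echo "2.0" > debian-binary
tar -czf control.tar.gz -T /dev/null      # empty control archive
tar -C pkgroot -czf data.tar.gz .         # the actual file tree
ar rc demo.deb debian-binary control.tar.gz data.tar.gz

# Unpack without dpkg: pull out the data tarball, then extract it.
mkdir extracted
ar x demo.deb data.tar.gz
tar -xzf data.tar.gz -C extracted
ls extracted/usr/lib                      # -> libdemo.so
```

The real cuDSS .deb works the same way: `ar x` the package, then extract data.tar.* into your install prefix.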
Now navigate into the unpacked repo and look at what's inside:
cd cudss_stage1/var/cudss-local-repo-ubuntu2404-0.7.1
ls *.deb
You'll find a set of library packages targeting different CUDA versions:
cudss0_0.7.1-1_amd64.deb
cudss_0.7.1-1_amd64.deb
cudss-cuda-12_0.7.1.4-1_amd64.deb
cudss-cuda-13_0.7.1.4-1_amd64.deb
libcudss0-cuda-12_0.7.1.4-1_amd64.deb
libcudss0-cuda-13_0.7.1.4-1_amd64.deb
libcudss0-dev-cuda-12_0.7.1.4-1_amd64.deb
libcudss0-dev-cuda-13_0.7.1.4-1_amd64.deb
libcudss0-static-cuda-12_0.7.1.4-1_amd64.deb
libcudss0-static-cuda-13_0.7.1.4-1_amd64.deb
Layer 2: unpack all the library packages into your install directory
mkdir -p ~/software/cudss
for f in *.deb; do
dpkg-deb -x "$f" ~/software/cudss
done
This loop unpacks every package — both CUDA 12 and CUDA 13 variants — into ~/software/cudss. That's fine; they don't conflict, and you'll select the right version via environment variables later.
Confirm the structure is what you expect:
ls ~/software/cudss/usr
# include lib share src
Step 3: Understand the Directory Layout
The library paths look unusual — why is everything nested under libcudss/12/ instead of a flat lib directory?
cuDSS 0.7.x ships separate builds for CUDA 12 and CUDA 13, kept in separate subdirectories so they can coexist without conflict. The layout looks like this:
~/software/cudss/usr/
├── include/
│ └── libcudss/
│ ├── 12/
│ │ ├── cudss.h
│ │ ├── cudss_distributed_interface.h
│ │ └── cudss_threading_interface.h
│ └── 13/
│ └── ...
└── lib/
└── x86_64-linux-gnu/
└── libcudss/
├── 12/
│ ├── libcudss.so
│ ├── libcudss.so.0
│ ├── libcudss.so.0.7.1
│ ├── libcudss_static.a
│ ├── libcudss_commlayer_nccl.so
│ ├── libcudss_commlayer_openmpi.so
│ ├── libcudss_mtlayer_gomp.so
│ └── cmake/cudss/
│ ├── cudss-config.cmake
│ └── cudss-targets.cmake
└── 13/
└── ...
You can verify this with:
find ~/software/cudss/usr/lib -name "*cudss*"
find ~/software/cudss/usr/include -name "*cudss*"
If both commands return files, the unpack worked correctly.
Step 4: Check Your CUDA Version and Choose the Right Path
I unpacked both CUDA 12 and CUDA 13 variants. How do I know which one to use?
Match to your system's CUDA installation. Run:
nvcc --version
Output like this tells you everything you need:
nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 12.9, V12.9.41
The major version here is 12, so use the 12 subdirectory throughout. If you were on CUDA 13, you'd use 13 instead. The two builds are not interchangeable — linking the CUDA 13 build against a CUDA 12 runtime will produce errors at runtime or link time.
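If you'd rather not hardcode the major version, you can parse it out of nvcc's output. A sketch (`cuda_major` is a hypothetical helper name; the sed pattern matches the "release X.Y" line nvcc prints):

```shell
# Extract the CUDA major version from `nvcc --version` output.
cuda_major() { sed -n 's/.*release \([0-9][0-9]*\)\..*/\1/p'; }

# On a real system:  export CUDSS_CUDA_MAJOR=$(nvcc --version | cuda_major)
# Demonstrated here on the sample output quoted above:
echo "Cuda compilation tools, release 12.9, V12.9.41" | cuda_major   # prints 12
```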
Step 5: Set Up Environment Variables
Add the following to your ~/.bashrc. These lines let the compiler, linker, and runtime loader all find cuDSS automatically without you having to specify paths manually each time:
export CUDSS_ROOT=$HOME/software/cudss/usr
export CUDSS_CUDA_MAJOR=12 # change to 13 if your system runs CUDA 13
export CPATH=$CUDSS_ROOT/include/libcudss/$CUDSS_CUDA_MAJOR:$CPATH
export LIBRARY_PATH=$CUDSS_ROOT/lib/x86_64-linux-gnu/libcudss/$CUDSS_CUDA_MAJOR:$LIBRARY_PATH
export LD_LIBRARY_PATH=$CUDSS_ROOT/lib/x86_64-linux-gnu/libcudss/$CUDSS_CUDA_MAJOR:$LD_LIBRARY_PATH
export CMAKE_PREFIX_PATH=$CUDSS_ROOT/lib/x86_64-linux-gnu/libcudss/$CUDSS_CUDA_MAJOR/cmake/cudss:$CMAKE_PREFIX_PATH
Then reload it:
source ~/.bashrc
What each variable does:
| Variable | Purpose |
|---|---|
| `CPATH` | Tells the compiler where to find header files (`cudss.h`). Eliminates the need for `-I...` flags. |
| `LIBRARY_PATH` | Tells the linker where to find `libcudss.so` at compile time. Eliminates the need for `-L...` flags. |
| `LD_LIBRARY_PATH` | Tells the dynamic loader where to find `libcudss.so` at runtime. Without this, programs compile but crash when run. |
| `CMAKE_PREFIX_PATH` | Tells CMake where to look for `cudss-config.cmake`, enabling `find_package(cudss)` to work. |
Step 6: Compile a Minimal Test
Create a small test file to confirm everything is wired up correctly:
cat > ~/test_cudss.cu <<'EOF'
#include <cudss.h>
#include <cstdio>
int main() {
printf("cuDSS header found.\n");
return 0;
}
EOF
With your environment variables set, compiling is now just:
nvcc ~/test_cudss.cu -lcudss
You might see a warning like this during compilation:
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release
This is harmless. It doesn't mean your installation is broken. Skip ahead to the explanation section if you want to understand it in detail.
Now run the compiled program:
./a.out
# cuDSS header found.
If you see that output, header discovery, linking, and runtime loading all work. Installation is complete.
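If you want a slightly stronger smoke test that calls into the library rather than just including the header, the following variant uses cudssCreate and cudssDestroy, which are real cuDSS entry points (the file name here is arbitrary):

```shell
cat > ~/test_cudss_link.cu <<'EOF'
#include <cudss.h>
#include <cstdio>

int main() {
    // Creating and destroying a handle exercises the actual library,
    // so a successful run proves linking and runtime loading both work.
    cudssHandle_t handle;
    if (cudssCreate(&handle) != CUDSS_STATUS_SUCCESS) {
        printf("cudssCreate failed\n");
        return 1;
    }
    printf("cuDSS handle created.\n");
    cudssDestroy(handle);
    return 0;
}
EOF
```

Compile it with `nvcc ~/test_cudss_link.cu -lcudss -o ~/test_cudss_link` and run the binary on a GPU node; cudssCreate may fail on a GPU-less login node, which is expected.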
Step 7: Use cuDSS in Your Own Code
Now that the environment is set up, do I still need to write out the full include/library paths every time I compile?
No. Because CPATH, LIBRARY_PATH, and LD_LIBRARY_PATH are set in your .bashrc, nvcc picks them up automatically. Your standard compile command is now just:
nvcc your_code.cu -lcudss
If you want to suppress the deprecation warning and explicitly target your cluster's GPU architecture (which is generally good practice), add -arch:
# For A100 nodes
nvcc your_code.cu -lcudss -arch=sm_80
# For H100 nodes
nvcc your_code.cu -lcudss -arch=sm_90
# Suppress the warning without specifying architecture
nvcc your_code.cu -lcudss -Wno-deprecated-gpu-targets
If you're not sure what GPU architecture your cluster has, check with:
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
Using cuDSS with CMake
The cuDSS package ships with proper CMake config files, so find_package works cleanly once CMAKE_PREFIX_PATH is set. A minimal CMakeLists.txt:
cmake_minimum_required(VERSION 3.20)
project(my_solver LANGUAGES CXX CUDA)
find_package(cudss REQUIRED CONFIG)
add_executable(my_solver solver.cu)
target_link_libraries(my_solver PRIVATE cudss)
Configure and build:
cmake -S . -B build
cmake --build build
No extra flags needed — CMAKE_PREFIX_PATH already points CMake at the right config directory.
Step 8: Running on a Compute Node with SLURM
My compilation works on the login node, but will it work on a compute node too?
Compilation typically stays on the login node. The compute node is where you run your program. The important thing is that your $HOME directory is accessible from compute nodes — on virtually all HPC systems it's on a shared filesystem (NFS or Lustre), so the paths you've set up work everywhere.
The one thing to watch: if your SLURM batch script doesn't automatically source ~/.bashrc, the environment variables won't be set when your job runs. The safest approach is to explicitly export them in the script itself:
#!/bin/bash
#SBATCH --job-name=cudss_job
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --gres=gpu:1
#SBATCH --time=02:00:00
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
# Load system CUDA
module load cuda/12.8
# Set cuDSS paths explicitly (don't rely on .bashrc being sourced)
export CUDSS_ROOT=$HOME/software/cudss/usr
export CUDSS_CUDA_MAJOR=12
export CPATH=$CUDSS_ROOT/include/libcudss/$CUDSS_CUDA_MAJOR:$CPATH
export LIBRARY_PATH=$CUDSS_ROOT/lib/x86_64-linux-gnu/libcudss/$CUDSS_CUDA_MAJOR:$LIBRARY_PATH
export LD_LIBRARY_PATH=$CUDSS_ROOT/lib/x86_64-linux-gnu/libcudss/$CUDSS_CUDA_MAJOR:$LD_LIBRARY_PATH
# Run your program
./my_solver
Submit with:
sbatch run.sh
Testing interactively on a GPU node
If you want to run ./a.out interactively on a node that actually has a GPU (which the login node may not), request an interactive session:
srun --pty --ntasks=1 --cpus-per-task=4 --mem=16G --gres=gpu:1 --partition=gpu --time=00:30:00 bash
Once you're in the shell on the compute node, load your modules and run normally. The same environment variable setup applies.
Troubleshooting Common Errors
fatal error: cudss.h: No such file or directory
The compiler can't find the header. Either CPATH isn't set, or it's pointing at the wrong level. The header lives at:
~/software/cudss/usr/include/libcudss/12/cudss.h
So the correct include path is .../include/libcudss/12, not .../include. Verify:
echo $CPATH
# should contain: /home/yourname/software/cudss/usr/include/libcudss/12
If it's missing, re-source your bashrc: source ~/.bashrc
cannot find -lcudss
The linker can't find libcudss.so. Same idea — the library is nested one level deeper than you might expect. The correct lib path is:
~/software/cudss/usr/lib/x86_64-linux-gnu/libcudss/12/
Check it exists:
ls ~/software/cudss/usr/lib/x86_64-linux-gnu/libcudss/12/ | grep "^libcudss\.so"
error while loading shared libraries: libcudss.so: cannot open shared object file
The program compiled but the dynamic loader can't find the .so at runtime. This is almost always a missing or incorrect LD_LIBRARY_PATH. Fix it temporarily to confirm:
export LD_LIBRARY_PATH=$HOME/software/cudss/usr/lib/x86_64-linux-gnu/libcudss/12:$LD_LIBRARY_PATH
./a.out
If that works, make sure the same line is in your ~/.bashrc and your SLURM script.
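A quick way to see what the loader will actually resolve is ldd, which prints each shared library a binary needs and where it was found; "not found" next to libcudss.so means LD_LIBRARY_PATH is wrong in the current shell. Shown here on /bin/ls for illustration; point it at your own binary instead:

```shell
# Each line shows a needed library and the path the loader resolved it to.
ldd /bin/ls
# On your program:  ldd ./a.out | grep cudss
```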
nvcc fatal: Unknown option '-Wl,-rpath,...'
This trips people up because the syntax for passing flags to the linker is different in nvcc versus gcc/g++. In gcc you write -Wl,-rpath,/path; in nvcc you have to use -Xlinker:
# ❌ Does not work with nvcc
nvcc code.cu -Wl,-rpath,/some/path -lcudss
# ✅ Correct nvcc syntax
nvcc code.cu \
-Xlinker -rpath -Xlinker /some/path \
-lcudss
That said, if you have LD_LIBRARY_PATH set correctly, you usually don't need rpath at all.
What is that deprecation warning about GPU architectures?
The full warning is:
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release
It has nothing to do with cuDSS. It means: in a future CUDA release, nvcc will stop generating native PTX/SASS for GPU architectures older than sm_75 (the Turing generation — RTX 20xx, T4, Quadro RTX). This matters only if you're targeting GPUs older than that — Pascal (sm_60/61), Volta (sm_70), and below.
If your cluster runs A100, H100, L40, or any modern GPU, you're on sm_80 or newer and this warning is completely irrelevant. You can silence it with:
nvcc your_code.cu -lcudss -Wno-deprecated-gpu-targets
Or better, just tell nvcc exactly what you're targeting and it won't warn at all:
# sm_80 = A100, sm_90 = H100
nvcc your_code.cu -lcudss -arch=sm_80
Quick Reference
Once everything is installed and your ~/.bashrc is set up, here's the cheat sheet for daily use:
# Compile (after environment is set in .bashrc)
nvcc your_code.cu -lcudss
# Compile with explicit architecture (recommended)
nvcc your_code.cu -lcudss -arch=sm_80
# Compile with CMake
cmake -S . -B build && cmake --build build
# Check that libraries are visible
ls $CUDSS_ROOT/lib/x86_64-linux-gnu/libcudss/$CUDSS_CUDA_MAJOR | grep "\.so"
# Verify the installed version
dpkg-deb -I ~/cudss-local-repo-ubuntu2404-0.7.1_0.7.1-1_amd64.deb | grep Version
# Interactive GPU session for testing
srun --pty --ntasks=1 --cpus-per-task=4 --mem=16G --gres=gpu:1 --partition=gpu --time=00:30:00 bash
One last tip: before doing any of this, try module avail cudss on your cluster. Some HPC systems already have cuDSS installed as a module; if that's the case, module load cudss is all you need and you can skip everything in this post.