Nalin Angrish
GPUMan - GPU Memory Guard for Shared Systems

c++
cuda
gpus
cloud
linux
systems-programming
nvml
daemon

GPUMan is a reliability tool for shared GPU environments that prevents misbehaving processes from triggering a catastrophic, system-wide CUDA out-of-memory (OOM) failure.

Project Details

Timeline

December 2025 - January 2026