GPUMan is a reliability tool for shared GPU environments that prevents misbehaving processes from triggering a catastrophic, system-wide CUDA out-of-memory (OOM) failure.
Timeline
December 2025 - January 2026