Use nvtop to Monitor NVIDIA GPU Performance in Linux

What’s This Post About?

If you’ve ever run GPU workloads on a Linux server—whether it’s a slick VPS, a Dockerized deep learning rig, or a gnarly rackmount behemoth—you’ve probably wondered: “How busy are my GPUs? Who’s chewing up all that sweet VRAM? Why is everything so slow?”
This post is your fast lane to mastering nvtop, a real-time, terminal-based NVIDIA GPU monitor for Linux. We’ll dig into what makes nvtop tick, how to get it running in minutes, and why every coder, sysadmin, and ML enthusiast should have this tool in their arsenal. Get ready for GIFs-in-your-head, comic metaphors, and the kind of practical advice you won’t find in the man pages.

A Real-World GPU Headache

Picture this: It’s 3am. You’re on call. The production AI recommender has ground to a halt. Slack is blowing up. Your cloud bill ticks up by the minute. You SSH into your dedicated server…
nvidia-smi shows something, but it’s clunky. You want to see which process is eating GPU, how much VRAM you have left, and live stats—not just a snapshot. And you want it now.
That’s where nvtop comes in. It’s like htop, but for NVIDIA GPUs—colorful, interactive, and perfect for the terminal crowd.

Why GPU Monitoring on Linux Matters

  • GPUs are expensive. You want to squeeze every flop out of them—and not let idle VRAM go to waste.
  • Multi-user servers are chaos. Who is running what, and why is my training job crawling?
  • Docker, Kubernetes, and cloud? Dynamic, ephemeral workloads make GPU monitoring a moving target.
  • nvidia-smi is cool, but static. Sometimes you want streaming, interactive updates. Especially when debugging runaway jobs.

Whether you’re renting a VPS, deploying on a dedicated server, or spinning up containers, nvtop is your new best friend for real-time GPU insights.

How Does nvtop Actually Work?

Under the hood, nvtop is a C-based, ncurses-powered terminal app. It queries NVIDIA’s libnvidia-ml (the same library as nvidia-smi), but presents the info in a dynamic, interactive TUI (text user interface). What does that mean for you?

  • Live stats. Watch GPU utilization, memory usage, fan speed, and temperature update in real time.
  • Process list. See which PIDs are using the GPU, how much VRAM each process is eating, and their command lines.
  • Multiple GPUs? nvtop shows them all, side-by-side. No more guessing which card is melting.
  • Low overhead. It’s lightweight—perfect for SSH, tmux, or screen sessions.
  • Keyboard controls. Sort, filter, and zoom without leaving your terminal.

Algorithmically, nvtop is polling the NVIDIA driver via the Management Library every second or so, parsing process tables, and rendering pretty graphs using ASCII art and colors. It’s like htop and nvidia-smi had a beautiful, geeky baby.
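
Curious what that raw NVML stream looks like? You can poll the same counters yourself with nvidia-smi, which wraps the same library. A minimal sketch, assuming the NVIDIA driver and nvidia-smi are already installed:

    # Stream the stats nvtop graphs, refreshed every second, as CSV
    nvidia-smi \
      --query-gpu=timestamp,utilization.gpu,memory.used,memory.total,temperature.gpu,fan.speed \
      --format=csv,noheader \
      -l 1

nvtop does essentially this in a loop, adds process enumeration, and turns the numbers into live graphs.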

Use Cases: When and Why to Use nvtop

  • Debugging slow jobs. Instantly see if your code is bottlenecked by GPU or CPU.
  • Multi-user environments. Track who is using what (and when it’s time to send “please kill your job” messages).
  • Cloud cost optimization. Identify idle GPUs and right-size your fleet.
  • Docker/K8s visibility. See inside containers (as long as they have access to the NVIDIA device files).
  • Home lab bragging rights. Show off your RTX 4090’s utilization in glorious ASCII at your next meetup.
  • Automated alerts/scripts. Use nvtop’s output (or the underlying NVML API) to trigger alerts when thermals spike or VRAM runs out.
  • Remote monitoring. Combine with SSH, tmux, and even web-based terminal dashboards (one-liner below).
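
For that remote-monitoring trick, one handy pattern is to start (or re-attach to) a tmux session running nvtop straight over SSH. A sketch, assuming tmux is installed on the GPU host and using gpumon as an example session name:

    ssh -t user@gpu-host "tmux new-session -A -s gpumon nvtop"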

Fast and Easy Setup: Your Step-By-Step Guide

Let’s get you running in five minutes or less. This assumes the NVIDIA driver is already installed (the full CUDA toolkit is not required; nvtop only needs the driver and its NVML library).

Step 1: Prerequisites

  • Linux (Ubuntu, Debian, CentOS, Fedora, Arch, etc.)
  • NVIDIA GPU with driver installed
  • libnvidia-ml (part of NVIDIA drivers)
  • ncurses-dev / development tools (for source install)
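
Not sure whether the prerequisites are in place? A quick sanity check (a minimal sketch; exact package names vary by distro):

    nvidia-smi                        # should list your GPU(s) and the driver version
    ldconfig -p | grep libnvidia-ml   # confirms NVML is present on the system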

Step 2: Installing nvtop

  • Ubuntu (20.04 and later) / recent Debian:
    sudo apt update
    sudo apt install nvtop
  • Fedora:
    sudo dnf install nvtop
  • Arch Linux:
    sudo pacman -S nvtop
  • Other distros / Building from source:
    Clone the repo: https://github.com/Syllo/nvtop
    Build with cmake (an out-of-source build, as the project README recommends):

    sudo apt install cmake libncurses5-dev libncursesw5-dev git
    git clone https://github.com/Syllo/nvtop.git
    mkdir -p nvtop/build && cd nvtop/build
    cmake ..
    make
    sudo make install

    Detailed instructions: nvtop GitHub

Step 3: Run It!

nvtop
  • Use the up/down arrows to navigate, or h for help.
  • Sort by VRAM, utilization, PID, or command line.
  • Press q to quit.

Step 4: (Optional) Docker or Headless Use

  • Make sure your container has access to /dev/nvidia* devices and the right drivers. Use the NVIDIA Container Toolkit.
  • Install nvtop inside your container, as above (rough example below).
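
Here’s a rough sketch of that workflow with plain Docker, assuming the NVIDIA Container Toolkit is set up on the host and the image’s package repos carry nvtop (adjust the image to taste):

    # Start a throwaway container with GPU access
    docker run --rm -it --gpus all ubuntu:22.04 bash
    # Inside the container:
    apt update && apt install -y nvtop
    nvtop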

Mini Glossary: Real-Talk Definitions

  • VRAM: The GPU’s working memory, like RAM on your PC but faster and shinier.
  • Utilization: How busy is your GPU? 0% = sleeping, 100% = melting.
  • Process: A running program using the GPU, e.g., your PyTorch model or a random cryptominer (uh oh).
  • Fan Speed: Self-explanatory. If it’s at 100%, your server may soon become a jet engine.
  • ncurses: A library for making terminal apps look not-ugly.
  • NVML: NVIDIA’s Management Library, the API for querying GPU stats.

Case Studies, Comic Comparisons & Pro Tips

Comic Metaphor Table: “GPU Monitoring Tools Fight Club”

Tool       | Personality            | Strengths                        | Weaknesses                          | Best Use Case
nvidia-smi | The Stoic Librarian    | Always available, precise        | Static, no live updates, no colors  | Scripted logs, one-off checks
nvtop      | The Rave DJ            | Colorful, real-time, interactive | Terminal only, NVIDIA-only          | Debugging live workloads, daily ops
gpustat    | The Twitter Addict     | Compact summaries, JSON output   | No interactive UI                   | Quick terminal checks, pretty output
htop       | The Old Guard          | Process-centric, system-wide     | No GPU stats                        | CPU/memory troubleshooting
DCGM       | The Corporate Overlord | Enterprise features, telemetry   | Complex, overkill for most          | Fleet-wide GPU monitoring

Pro Tips:

  • Combine nvtop with htop in split tmux panes for a full picture of CPU and GPU chaos (one-liner below).
  • Use gpustat -i for a quick text-based alternative if you only need summary info (gpustat GitHub).
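
For that split-pane setup, one way to do it in a single command, assuming tmux, htop, and nvtop are all installed:

    # New tmux session: nvtop in the left pane, htop in the right
    tmux new-session nvtop \; split-window -h htop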

Positive Case:

“I used nvtop to spot a rogue Jupyter notebook that was leaking VRAM. Saved my team three hours and a reboot!”

Negative Case:

“Tried running nvtop on an AMD GPU. Got a blank screen. Oops—NVIDIA only, folks.”

Beginner Mistakes, Myths, and Alternatives

  • Myth: “I need to install CUDA to use nvtop.” Fact: You just need the driver and NVML.
  • Myth: “nvtop works for all GPUs.” Fact: It’s NVIDIA only (for now).
  • Mistake: Running as a user that lacks permission to read the GPU device files. Solution: Add your user to the video group or use sudo (commands below).
  • Mistake: Running inside Docker without --gpus all or missing device files. See NVIDIA Docker docs.
  • Alternative: For AMD cards, try rocm-smi (recent nvtop releases have also added AMD support).
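
For the group-membership fix above, something like this works on most distros (a sketch, not gospel: on some systems the GPU device nodes belong to a different group, such as render, so check first, and log out and back in for the change to apply):

    ls -l /dev/nvidia*            # which group owns the device files?
    sudo usermod -aG video $USER  # add yourself to that group (video here)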

Common Error Messages and Fixes

  • “No NVML device found” — Check NVIDIA driver install, run nvidia-smi first.
  • “No GPUs detected” — Is your server virtualized without GPU passthrough? Is the kernel module loaded? (Quick checks below.)
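
A few quick checks when you hit those errors (exact output varies by driver and distro):

    nvidia-smi                    # does the driver itself see the GPU?
    lsmod | grep nvidia           # is the kernel module loaded?
    sudo dmesg | grep -i nvidia   # any driver complaints in the kernel log?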

nvtop or Not? The Decision Flowchart

    🖥️
    |
    ├──> Do you have an NVIDIA GPU?
    |        |
    |      No --> Try rocm-smi (AMD) or intel_gpu_top (Intel)
    |
    └──> Need real-time, interactive stats?
             |
           Yes --> Use nvtop!
             |
           No
             |
      ┌───────────────┬─────────────┐
      |               |             |
  Need JSON?    Want summary?   Fleet metrics?
      |               |             |
  Use DCGM      Use gpustat     Use DCGM or Prometheus

Automation, Scripting & Unusual Tricks

Cool Things You Can Do With nvtop & Friends

  • Monitor GPU health over SSH in tmux—leave it running in the background for remote troubleshooting.
  • Scripted alerts: Use nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv in a cron job or with a small Python script to email/Slack/page you if GPU is >95% for too long.
  • Combine with Prometheus/Grafana: Pull metrics via nvidia-dcgm-exporter for pretty dashboards, but use nvtop for hands-on debugging.
  • Peek from Jupyter Notebooks: shell out with ! (for example, !nvidia-smi for a one-off snapshot); nvtop’s interactive TUI doesn’t render well in web UIs.

Sample Script: Alert When GPU VRAM >90%


#!/bin/bash
# Alert when aggregate GPU memory usage (summed across all GPUs) exceeds the threshold.
THRESHOLD=90
# Sum used and total memory over all GPUs, then compute the overall percentage.
CURRENT=$(nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits \
  | awk -F',' '{used+=$1; total+=$2} END {print int(used/total*100)}')
if [ "$CURRENT" -gt "$THRESHOLD" ]; then
  echo "Warning: GPU memory usage is at ${CURRENT}%!"
  # Hook in email, Slack, PagerDuty, etc. here.
fi
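
To get alerted automatically, you could run the script from cron every few minutes. The path below is just an example; point it at wherever you saved the script:

    # Example crontab entry: check GPU memory every 5 minutes
    */5 * * * * /usr/local/bin/gpu-vram-alert.sh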

Short Story: The Drowning Admin

Once upon a Tuesday, Alex, a sleep-deprived admin, gets an urgent ping: “The deep learning server is down!” SSH-ing in, Alex runs nvidia-smi and sees a wall of numbers. Confused, Alex tries nvtop—and instantly, a rainbow of activity, sorted by PID, reveals that Bob from accounting left a training job running over lunch.
Lesson: If Bob can use the GPU, you need nvtop. Save your sanity.

Wrap-up & Recommendations

  • nvtop is a must-have if you run NVIDIA GPUs on Linux, especially in shared, cloud, or containerized environments.
  • It’s fast, lightweight, and gives you the live feedback you need to debug, optimize, and show off.
  • Perfect for DevOps, ML engineers, researchers, and even curious hobbyists.
  • Not for AMD/Intel (yet). For those, check out rocm-smi or intel_gpu_top.
  • For full-stack ops, combine with htop, gpustat, and Prometheus/Grafana for all the monitoring you’ll ever need.
  • If you need a rock-solid VPS or dedicated box with GPUs, check out VPS or dedicated server options at mangohost.

Don’t let your GPUs run wild in the server farm—tame them with nvtop, and sleep better tonight.


