13 November 2025

Disk I/O Troubleshooting

by Sam Hadow

On my server I self-host quite a lot of services, but the data lives on 5900 rpm HDDs; only the OS and binaries are on an SSD.
Sometimes these HDDs struggle to keep up with the I/O load generated by all my services.
So in this short blog post I’ll show you the troubleshooting steps to find the culprit of high disk I/O, and how to limit its disk usage.

Check disk usage

To check disk usage we can use the tool iostat (provided by the package sysstat on Fedora, Debian and Arch Linux).

To see the extended stats every second:

iostat -x 1

You’ll then get an output like this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.37    0.00    6.94   21.85    0.00   66.84

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
dm-0             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
dm-1             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
dm-2             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
dm-3             0.00      0.00     0.00   0.00    0.00     0.00  524.00   8384.00     0.00   0.00   47.82    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   25.06 100.00
dm-4             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
dm-5             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sda              0.00      0.00     0.00   0.00    0.00     0.00  524.00   8384.00     0.00   0.00   46.02    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   24.11  99.30
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
zram0            1.00      4.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

Let’s explain each column:

Column Meaning
Device The block device name (e.g. sda, dm-0, etc.).
r/s Number of read requests per second issued to the device.
rkB/s Amount of data read per second, in kilobytes.
rrqm/s Number of merged read requests per second (the kernel merges adjacent reads into a single I/O).
%rrqm Percentage of read requests merged - calculated as 100 * rrqm/s / (r/s + rrqm/s).
r_await Average time (in milliseconds) for read requests to be served - includes both queue time and service time.
rareq-sz Average size (in kilobytes) of each read request.
w/s Number of write requests per second issued to the device.
wkB/s Amount of data written per second, in kilobytes.
wrqm/s Number of merged write requests per second.
%wrqm Percentage of write requests that were merged - calculated in a similar way to %rrqm.
w_await Average time (ms) for write requests to complete.
wareq-sz Average size (kB) of each write request.
d/s Number of discard requests per second (TRIM / UNMAP commands - mostly on SSDs).
dkB/s Amount of data discarded per second (in kB).
drqm/s Merged discard requests per second.
%drqm Percentage of discard requests merged.
d_await Average time (ms) for discard requests to complete.
dareq-sz Average discard request size (kB).
f/s Number of flush requests per second — these force buffered data to non-volatile storage.
f_await Average time (ms) for flush requests to complete.
aqu-sz Average queue size — the average number of I/O requests waiting in the queue or being serviced during the sample interval.
%util Percentage of time the device was busy processing I/O requests. Values near 100% indicate full utilization for devices serving requests serially; for devices serving requests in parallel (RAID arrays, modern SSDs), a high %util does not necessarily mean the device is saturated.

The most interesting columns for us are r/s, w/s, rkB/s, wkB/s, the r_await/w_await latencies, aqu-sz and %util: together they show how many requests the disk is serving, how fast it serves them, and how saturated it is.
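For example, to spot saturated devices you can filter on %util (the last column). A minimal sketch using awk on a captured sample, with the columns abbreviated here to device, w/s, aqu-sz and %util:

```shell
# Print devices whose %util (last field) is at or above 90.
# The sample lines are abbreviated from the iostat output above.
awk '$NF+0 >= 90 {print $1, $NF}' <<'EOF'
dm-0 0.00 0.00 0.00
dm-3 524.00 25.06 100.00
sda 524.00 24.11 99.30
sdb 0.00 0.00 0.00
EOF
```

This prints dm-3 and sda, the two busy devices from the example. On live output you would pipe iostat -x into a similar filter, skipping the header lines.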

Fun fact: although iostat displays units labelled kilobytes (kB), megabytes (MB)…, it actually uses kibibytes (KiB, 1024 bytes), mebibytes (MiB, 1024 KiB)…

note

In the previous example, dm-* devices are actually virtual block devices managed by the device mapper (here created by LVM).

To identify which physical volumes they correspond to, we can run this command:

ls -l /dev/mapper

Or this command as root:

dmsetup ls --tree
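lsblk gives a similar view without root, showing each physical disk with the dm-* devices stacked on top of it:

```shell
# TYPE distinguishes disks, partitions and device-mapper (lvm/crypt) entries
lsblk -o NAME,TYPE,SIZE,MOUNTPOINT
```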

Find the process causing a high I/O usage

For that we can use the iotop command (packaged as iotop on Fedora, Debian and Arch Linux). We usually run iotop as root, as it needs elevated privileges.
With the following options it’s easier to spot processes causing a high I/O usage:

sudo iotop -aoP

What these options do:
-a = show accumulated I/O since iotop started
-o = only show processes actually doing I/O
-P = show processes, not individual threads

We can also use pidstat (provided by the package sysstat on Fedora, Debian and Arch Linux). It’s better to run this command as root too, otherwise you’ll only see the processes of the user running the command and not all the processes.

To show per-process read/write statistics, updating every second:

pidstat -d 1

We can then write down the PID, or the command, corresponding to the line with a lot of disk writes, or disk reads.
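To get that line without watching the live display, here is a sketch that takes one 5-second pidstat sample and sorts it by write throughput. The column index is an assumption based on sysstat’s current layout; check the header on your version:

```shell
# Columns: Time UID PID kB_rd/s kB_wr/s kB_ccwr/s iodelay Command
# so $5 is kB_wr/s; keep only rows that actually wrote something.
pidstat -d 5 1 | grep -v '^Average' | awk '$5+0 > 0' | sort -k5 -rn | head
```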

Limit disk usage

podman

With podman we can use arguments in the run command to limit disk I/Os, as mentioned in the documentation

Argument effect
--device-read-bps=path:rate Limit read rate (in bytes per second) from a device (e.g. --device-read-bps=/dev/sda:1mb).
--device-read-iops=path:rate Limit read rate (in IO operations per second) from a device (e.g. --device-read-iops=/dev/sda:1000).
--device-write-bps=path:rate Limit write rate (in bytes per second) to a device (e.g. --device-write-bps=/dev/sda:1mb).
--device-write-iops=path:rate Limit write rate (in IO operations per second) to a device (e.g. --device-write-iops=/dev/sda:1000).

These may not work in rootless mode unless I/O delegation is enabled.
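Putting these flags together, a sketch of a throttled container (the image, container name and device path are placeholders for your own setup):

```shell
# Cap writes to /dev/sda at 10 MB/s and 200 IOPS for this container
podman run -d --name throttled-app \
  --device-write-bps=/dev/sda:10mb \
  --device-write-iops=/dev/sda:200 \
  docker.io/library/alpine sleep infinity
```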

You can verify which resource limit delegations are enabled with this command:

cat "/sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers"

In our case we need io in the output.

If it’s not present you can create the file /etc/systemd/system/user@.service.d/delegate.conf with the following content:

[Service]
Delegate=io

You can also delegate other resource limits to users, for example memory pids cpu cpuset; the file would then look like this:

[Service]
Delegate=io memory pids cpu cpuset

You then need to log out and log back in for the delegation to take effect.
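After logging back in, the earlier check can be turned into a quick pass/fail:

```shell
# Succeeds only if the io controller is delegated to your user session
ctrl="/sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers"
if grep -qw io "$ctrl"; then echo "io delegation enabled"; else echo "io delegation missing"; fi
```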

systemd

To limit disk I/O for a systemd service we can use slices.

The most useful options for the slice section are listed below; you can see all the available options in the documentation

Property Description
IOAccounting= Enables collection of I/O statistics (used by systemd-cgtop, systemd-analyze, etc.).
IOWeight=weight Sets relative I/O priority (1–10000, default 100). A higher value gives the unit a larger share of available bandwidth when multiple units compete.
IODeviceWeight=device weight Assigns a per-device weight, overriding IOWeight for that device.
IOReadBandwidthMax=device bytes Sets an absolute cap on read bandwidth, e.g. /dev/sda 10M. Units cannot exceed this, even if idle bandwidth exists. Possible units are: K, M, G, or T for Kilobytes, Megabytes, Gigabytes, or Terabytes, respectively. Otherwise the bandwidth is parsed in bytes/s.
IOWriteBandwidthMax=device bytes Same, but for write bandwidth.
IOReadIOPSMax=device limit Caps the number of read operations per second, e.g. /dev/nvme0n1 500.
IOWriteIOPSMax=device limit Caps the number of write operations per second.

Create a slice unit, for example /etc/systemd/system/io-limited.slice:

[Unit]
Description=Slice for IO-limited services

[Slice]
IOAccounting=yes
IOWriteBandwidthMax=/dev/sda 20M
IOReadBandwidthMax=/dev/sda 20M

We can then assign services to this slice; in their service section we would have:

[Service]
Slice=io-limited.slice
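To check that the cap is actually enforced, one option is a transient unit run inside the slice. The test file path is an assumption; dd with oflag=direct bypasses the page cache so the bandwidth limit applies immediately:

```shell
# Write 100 MiB inside the throttled slice; with the 20M cap above
# this should take roughly 5 seconds instead of finishing instantly.
sudo systemd-run --slice=io-limited.slice --wait --collect \
  dd if=/dev/zero of=/var/tmp/io-test bs=1M count=100 oflag=direct
sudo rm -f /var/tmp/io-test
```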