I have been using Linux for several years now and although I have looked at the load averages from time to time (either using
uptime), I never really understood what they meant. All I knew was that the three different numbers stood for averages over three different time spans (1, 5, and 15 minutes) and that under normal operation the numbers should stay under 1.00 (which I now know is only true for single-core CPUs).
Earlier this week at work I needed to figure out why a box was running slow. I was put in charge of determining the cause, whether it be excessive heat, low system resources, or something else. Here’s what I saw for load averages when I ran the
top command on the box:
load average: 2.86, 3.00, 2.89
I knew that looked high, but I had no idea how to explain what “normal” was and why. I quickly realized that I needed a better understanding of what I was looking at before I could confidently explain what was going on. A quick Google search turned up this very detailed article about Linux load averages, including a look at some of the C functions that actually do the calculations (this was particularly interesting to me because I’m currently learning C).
To keep this post shorter than the aforementioned article, I’ll simply quote the two sentences that gave me a clear-as-day explanation of how to read Linux load averages:
The point of perfect utilization, meaning that the CPUs are always busy and, yet, no process ever waits for one, is the average matching the number of CPUs. If there are four CPUs on a machine and the reported one-minute load average is 4.00, the machine has been utilizing its processors perfectly for the last 60 seconds.
The machine I was checking at work was a single-core Celeron machine. This meant with a continuous load of almost 3.00 the CPU was being stressed much higher than it should be. Theoretically, a dual-core machine would drop this load to around 1.50 and a quad-core would drop it to 0.75.
There is a lot more behind truly understanding the Linux load averages, but the most important thing to understand is that they do not represent CPU usage. Rather they represent the load on the CPU by processes waiting for their chance to use the CPU. If you still can’t get your brain away from thinking in terms of percentages, consider 1.00 to be 100% load for single-core CPU’s, 2.00 to be 100% load for dual-core CPUs, and so on.