Understanding Linux Load Averages

I have been using Linux for several years now and although I have looked at the load averages from time to time (either using top or uptime), I never really understood what they meant. All I knew was that the three different numbers stood for averages over three different time spans (1, 5, and 15 minutes) and that under normal operation the numbers should stay under 1.00 (which I now know is only true for single-core CPUs).

Earlier this week at work I needed to figure out why a box was running slow. I was put in charge of determining the cause, whether it be excessive heat, low system resources, or something else. Here's what I saw for load averages when I ran the top command on the box:

load average: 2.86, 3.00, 2.89

I knew that looked high, but I had no idea how to explain what "normal" was and why. I quickly realized that I needed a better understanding of what I was looking at before I could confidently explain what was going on. A quick Google search turned up this very detailed article about Linux load averages, including a look at some of the C functions that actually do the calculations (this was particularly interesting to me because I'm currently learning C).
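
Very roughly, each of the three figures is an exponentially damped moving average of the number of runnable processes, sampled every few seconds. Here is a small floating-point sketch of that idea in C; it is only an illustration (the kernel's real implementation uses fixed-point arithmetic and handles more cases), with the conventional 5-second sample interval and 1-, 5-, and 15-minute windows:

    /* A floating-point sketch of an exponentially damped moving average of
     * the runnable-process count. Illustration only: the kernel uses
     * fixed-point arithmetic, and the numbers here are simulated. */
    #include <math.h>
    #include <stdio.h>

    #define SAMPLE_INTERVAL 5.0   /* seconds between samples */

    /* One update step: blend the previous average with the current number
     * of runnable processes ("active"), for a given averaging window. */
    static double update(double load, double window, double active)
    {
        double e = exp(-SAMPLE_INTERVAL / window);
        return load * e + active * (1.0 - e);
    }

    int main(void)
    {
        double avg1 = 0.0, avg5 = 0.0, avg15 = 0.0;

        /* Pretend 3 processes are runnable at every sample for 10 minutes. */
        for (int t = 0; t < 120; t++) {
            avg1  = update(avg1,  60.0,  3.0);
            avg5  = update(avg5,  300.0, 3.0);
            avg15 = update(avg15, 900.0, 3.0);
        }

        printf("load average: %.2f, %.2f, %.2f\n", avg1, avg5, avg15);
        return 0;
    }

(Compile with -lm. After ten simulated minutes of a constant three-process load, the 1-minute figure has already settled at about 3.00 while the 15-minute figure is still climbing, which is why the three numbers diverge while the load is changing.)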

To keep this post shorter than the aforementioned article, I'll simply quote the two sentences that gave me a clear-as-day explanation of how to read Linux load averages:

The point of perfect utilization, meaning that the CPUs are always busy and, yet, no process ever waits for one, is the average matching the number of CPUs. If there are four CPUs on a machine and the reported one-minute load average is 4.00, the machine has been utilizing its processors perfectly for the last 60 seconds.

The machine I was checking at work was a single-core Celeron machine. This meant with a continuous load of almost 3.00 the CPU was being stressed much higher than it should be. Theoretically, the same workload on a dual-core machine would work out to a per-core load of around 1.50, and on a quad-core machine around 0.75.

There is a lot more behind truly understanding Linux load averages, but the most important thing to understand is that they do not represent CPU usage. Rather, they represent demand on the CPU: the average number of processes either running or waiting for their chance to run. If you still can't get your brain away from thinking in terms of percentages, consider 1.00 to be 100% load for single-core CPUs, 2.00 to be 100% load for dual-core CPUs, and so on.
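
If you want to apply that rule of thumb on your own machine, here's a minimal C sketch that reads the load averages with getloadavg(3), asks sysconf(3) for the number of online cores, and prints the 1-minute load per core. Treating the result as a percentage is a simplification, since the load also counts processes waiting on things other than the CPU:

    /* Minimal sketch: report the load averages and the 1-minute load per
     * online core as a rough "percentage" figure. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        double loads[3];
        long cores = sysconf(_SC_NPROCESSORS_ONLN);

        if (getloadavg(loads, 3) != 3 || cores < 1) {
            fprintf(stderr, "could not read load averages or core count\n");
            return 1;
        }

        printf("load average: %.2f, %.2f, %.2f on %ld core(s)\n",
               loads[0], loads[1], loads[2], cores);
        printf("1-minute load per core: %.0f%%\n", loads[0] / cores * 100.0);
        return 0;
    }

On the single-core Celeron above, this would have printed a 1-minute figure of roughly 286%.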

Update: John Gilmartin had some insightful feedback and shared a link to Understanding Load Averages where there's a nice graphical description for how load averages work.

20 Comments

  1. I don’t agree with your understanding of the load average; the man page doesn’t say that the load average stands only for CPU load.

    The load average described in the man page is the system load average.

    A high load average may be caused by bottlenecks in the CPU, memory, disk I/O, or the network. In some cases CPU utilization is low but the system load is high.

    Thanks, just a slightly different understanding of load average. Mike

  2. Raam, I’m afraid your statement, “with a continuous load of almost 3.00 the CPU was being stressed much higher than it should be,” is misleading, or at least mistaken. (For a single core system) a load average of 3.00 means that there were on average 3.00 times as many jobs on the ‘run-queue’ as there was CPU capacity to run them. The run-queue is simply the total number of jobs on the CPU plus those waiting to get on the CPU. In other words, there were some jobs queueing up before the CPU could process them. The point is, the CPU (itself) isn’t, “stressed much higher than it should be.” I think wording it that way could imply that it could overheat or wear out or something along those lines! Yes, the system may be slower to respond, but it’s not ‘bad’ for the system itself, although it may be viewed as less than desirable by any users of it.

    Another way of considering it is demand versus capacity. A load average of 3.00 means 3.00 times as much demand as there is capacity. (Capacity meaning the capacity of one CPU core – see below.)

    You do touch on an important point, which is that load average needs to be considered alongside the number of CPU cores in the system. When considering load averages, 4x single-core CPUs, 2x dual-core CPUs, and 1x quad-core CPU are all equivalent: four cores. A load average of 3.00 on a quad-core system would mean there was some CPU time going unused: 3.00/4 = 0.75, or 75%.

    However, another important point is that jobs may be slowed down for other reasons, such as I/O. For example, a job may be waiting for a read from or write to disk to complete, or similarly for I/O on a network interface.

    I find the following page to be a very good explanation with helpful analogies: http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages

    • John,

      Thanks so much for your explanation and for the link. I love the graphical description with cars/traffic. 🙂 I’ll go ahead and link to that blog post and this comment in the main body of the post.

      Cheers!
      Raam

  3. I had no idea at all; my top was showing nothing abnormal. I couldn’t even really log on to a TTY session, since I kept getting a timeout before the password prompt came up. I think something related to LXDE bugged out. The computer it was on has an Athlon X2 (somewhere between 800 MHz and 1 GHz per core, depending on the source) and normally has a load average of under 0.10, or 1.00 on the high end.

  4. I had a load of 12 on a server, a Xeon with 4 cores at 2.4 GHz. It was sending mails with a 100K attachment, 6 mails to maybe 30 recipients each, and amavis was checking them all for viruses. But I’ve since sent 2000 mails and the load stays at 5. That 12 was weird.
