Lazy Linux: 10 Essential tricks for admins

Lazy Linux: 10 Essential tricks for admins is an awesome list of cool things you can do with Linux. I learned about trick #3 (collaboration with screen) from an admin at the datacenter where one of my servers is hosted. I remember being thrilled sitting there watching him do stuff on my server while sharing the keyboard to type messages back and forth in vi (think Remote Desktop or VNC, but on the console). Trick #5 (SSH back door) is something I've been using for years at work for remote diagnostics. It is an invaluable trick for getting around firewalls. Very cool stuff!

Being Greedy With Bash

Last night at my C/Unix class the professor quickly glossed over an interesting shell scripting technique that allows you to strip stuff off the beginning or end of a variable. I forgot about it until I saw the technique used again while editing a shell script at work today.

I didn't know what the technique was called but I remembered the professor saying something about "greedy clobbering" and, since I cannot search Google for special characters, I Googled "Bash greedy" and luckily found 10 Steps to Beautiful Shell Scripts, which just so happened to contain the technique I was looking for (#5).

There are basically four versions of this technique:

${var#pattern}
Remove the shortest match of pattern from the beginning of var and return everything that's left

${var##pattern}
Remove the longest match of pattern from the beginning of var and return everything that's left (be greedy)

${var%pattern}
Remove the shortest match of pattern from the end of var and return everything that's left

${var%%pattern}
Remove the longest match of pattern from the end of var and return everything that's left (be greedy)

Here's how it works. Let's say you have a variable that contains the path to a file:

FILE=/home/raam/bin/myscript.sh

Now let's say you wanted to extract the myscript.sh part from that variable. You could do some funky stuff with awk but there is a much easier solution built into Bash:

SCRIPTNAME=${FILE##*/}

Now $SCRIPTNAME will contain myscript.sh!

The ##*/ tells the shell to match everything up to and including a slash (*/) starting from the beginning of the variable, to be greedy about it so that the longest possible match, ending at the last slash, is removed (##), and then to return whatever is left over (in this case, myscript.sh is the only thing remaining after the last slash).
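The other three forms work the same way. Here's a quick sketch of what each one returns for the same path (the comments show the output):

FILE=/home/raam/bin/myscript.sh
echo ${FILE#*/}     # home/raam/bin/myscript.sh (shortest match removed from the front)
echo ${FILE##*/}    # myscript.sh (longest match removed from the front)
echo ${FILE%/*}     # /home/raam/bin (shortest match removed from the end)
echo ${FILE%%/*}    # (empty: the longest match from the end removes everything)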

It turns out this isn't actually Bash-specific: these parameter expansion operators are part of the POSIX shell specification, so they work in other Bourne-style shells as well. It's amazing how four characters can do so much work so easily. The more I learn about what I can do with the shell, the more I wonder how I ever lived without all this knowledge!

Bounce-back Spam (Backscatter)

I really hate bounce-back spam! (I call it bounce-back spam, but the official name for it is Backscatter.) I've read, and been told by sysadmins, that there is not much that can be done about it. The Wikipedia page on bounce messages has a little section that explains why:

Excluding MDAs, all MTAs forward mails to another MTA. This next MTA is free to reject the mail with an SMTP error message like user unknown, over quota, etc. At this point the sending MTA has to inform the originator, or as RFC 5321 puts it:

If an SMTP server has accepted the task of relaying the mail and later finds that the destination is incorrect or that the mail cannot be delivered for some other reason, then it MUST construct an "undeliverable mail" notification message and send it to the originator of the undeliverable mail (as indicated by the reverse-path).

This rule is essential for SMTP: as the name says, it's a simple protocol, it cannot reliably work if mail silently vanishes in black holes, so bounces are required to spot and fix problems.

Today, however, most email is spam, which usually utilizes forged Return-Paths. It is then often impossible for the MTA to inform the originator, and sending a bounce to the forged Return-Path would hit an innocent third party. This inherent flaw in today's SMTP (without the deprecated source routes) is addressed by various proposals, most directly by BATV and SPF.

It looks like I'll have to just deal with it. (I could set up filters and such, but then I might miss a real bounce-back and not know that my message didn't go through!) I'm just grateful it comes in waves of a few hours every few weeks instead of non-stop! Has anyone else had to deal with this? If so, what did you do about it?


I subconsciously converted a problem into a shell script

I have been writing a lot of shell scripts lately as part of the C/Unix class that I'm taking at Harvard Extension. My familiarity with how the Unix shell and the underlying system work has grown exponentially. When I came across a problem earlier today, I subconsciously turned it into a shell script without even thinking about it!

The problem: "How can I check to make sure my program is running every 30 minutes and restart it if it's not?"

Answer:

# If myscript isn't running, restart it
ONLINE=$(ps aux | grep -c myscript)
# Expect 2 because 'ps aux | grep myscript' also matches the grep itself
if [ "$ONLINE" -ne 2 ]; then
        "$MYSCRIPT_PATH"/restart_service.sh
fi
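To actually run the check every 30 minutes, the natural companion is a cron entry. A minimal sketch, assuming the snippet above is saved as /usr/local/bin/check_myscript.sh and marked executable (the path is just an example):

*/30 * * * * /usr/local/bin/check_myscript.sh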

I'm sure there are many better ways to solve this problem, but the fact that I instantly translated the problem into shell scripting code (and that it worked as expected on my first try) astonished me. I can see how good programmers who write in a particular language, and know its ins and outs like the back of their hand, can turn problems into code seamlessly (or know exactly where to look for answers if they're unsure).

It's really amazing how easily you can solve simple problems when you have a deeper understanding of how the system works.

That's all. I just wanted to share my excitement. 🙂

Using 'rsync --exclude-from' to Exclude Files Containing Spaces

A few months ago I wrote a post about escaping filename or directory spaces for rsync. Well that wasn't the end of rsync giving me problems with spaces.

When I used the --exclude-from rsync option to specify a list of exclusions, I figured wrapping files and directories that contain spaces in single or double quotes would be enough to escape them. However, after wading through hundreds and hundreds of lines of rsync output, I discovered the excluded directories were still being synced!

When using --exclude-from, files and directories should not be wrapped in single or double quotes; spaces should be escaped with a backslash instead:

/afs/*
/automount/*
/Users/raam/Documents/Virtual\ Machines/*
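For reference, the exclude file is then handed to rsync like this (the file and destination names here are just examples):

rsync -av --exclude-from=/Users/raam/rsync-excludes.txt / /Volumes/Backup/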

Note: A commenter pointed out that this no longer applies to the latest version of rsync. I tested this on Mac OS X 10.9 (Mavericks) and rsync v2.6.9 and confirmed that you no longer need to escape spaces in the exclude file.

Yahoo DNS Issues Cause Problems in the United States

Yahoo! appears to be inaccessible to people in the US. Visiting yahoo.com redirects to www.yahoo.com and fails to load. I confirmed it was at least somewhat limited to the US by trying the connection from a shell account on a server in Europe.

Using dig (a Unix DNS lookup utility), we can see from within the United States that there is a problem with DNS. There is no A record with an IP address listed in the ANSWER section:

;; QUESTION SECTION:
;www.yahoo.com. IN A

;; ANSWER SECTION:
www.yahoo.com. 129 IN CNAME www.wa1.b.yahoo.com.

And from the server in Europe:

;; QUESTION SECTION:
;www.yahoo.com. IN A

;; ANSWER SECTION:
www.yahoo.com. 272 IN CNAME www.wa1.b.yahoo.com.
www.wa1.b.yahoo.com. 33 IN CNAME www-real.wa1.b.yahoo.com.
www-real.wa1.b.yahoo.com. 33 IN A 209.191.93.52

;; AUTHORITY SECTION:
wa1.b.yahoo.com. 273 IN NS yf2.yahoo.com.
wa1.b.yahoo.com. 273 IN NS yf1.yahoo.com.

If you try connecting directly to the missing IP address, you should at least be able to get the main Yahoo page: http://209.191.93.52. You might also try temporarily adding an entry to your /etc/hosts or C:\Windows\system32\drivers\etc\hosts if you want to continue being able to use the FQDN.
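A minimal sketch of what that hosts entry might look like, using the IP address from the dig output above:

209.191.93.52    www.yahoo.com yahoo.com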

UPDATE: As of 15:50 EST, Yahoo appears to be working again. The outage appeared to start around 15:11 EST, so that's a good 40 minutes of downtime.

Mounting HFS+ with Write Access in Debian

When I decided to reformat and install my Mac Mini with the latest testing version of Debian (lenny, at the time of this writing) I discovered that I couldn't mount my HFS+ OS X backup drive with write access:

erin:/# mount -t hfsplus /dev/sda /osx-backup
[ 630.769804] hfs: write access to a journaled filesystem is not supported, use the force option at your own risk, mounting read-only.

This warning puzzled me because I was able to mount fine before the reinstall and, since the external drive is to be used as the bootable backup for my MBP, anything with "at your own risk" was unacceptable.

I had already erased my previous Linux installation so I had no way of checking what might have previously given me write access to the HFS+ drive. A quick apt-cache search hfs revealed a bunch of packages related to the HFS filesystem. I installed the two that looked relevant to what I was trying to do:

hfsplus - Tools to access HFS+ formatted volumes
hfsutils - Tools for reading and writing Macintosh volumes

No dice. I still couldn't get write access without that warning. I tried loading the hfsplus module and then adding it to /etc/modules to see if that would make a difference. As I expected, it didn't. I was almost ready to give up, but there was another HFS package in the list that, even though it seemed unrelated to what I was trying to do, seemed worth a shot:

hfsprogs - mkfs and fsck for HFS and HFS+ file systems

It worked! I have no idea how or why (and I'm not interested enough to figure it out), but after installing the hfsprogs package I was able to mount my HFS+ partition with write access.
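For anyone trying to reproduce this, the whole fix boils down to the following (the device and mount point are from my setup; yours will differ):

apt-get install hfsprogs
mount -t hfsplus /dev/sda /osx-backup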

Update:

As Massimiliano and Matthias have confirmed in the comments below, the following solution seems to work with Ubuntu 8.04:

From Linux, after installing the tools suggested before, you must run:
mount -o force /dev/sdx /mnt/blabla

Otherwise, in my fstab, I have an entry like this:
UUID=489276e8-7f9b-3ae6-8c73-69b99ccaab9c /media/Leopard hfsplus defaults,force 0 0


Understanding the Linux Load Averages

I have been using Linux for several years now and although I have looked at the load averages from time to time (either using top or uptime), I never really understood what they meant. All I knew was that the three different numbers stood for averages over three different time spans (1, 5, and 15 minutes) and that under normal operation the numbers should stay under 1.00 (which I now know is only true for single-core CPUs).

Earlier this week at work I needed to figure out why a box was running slow. I was put in charge of determining the cause, whether it be excessive heat, low system resources, or something else. Here's what I saw for load averages when I ran the top command on the box:

load average: 2.86, 3.00, 2.89

I knew that looked high, but I had no idea how to explain what "normal" was and why. I quickly realized that I needed a better understanding of what I was looking at before I could confidently explain what was going on. A quick Google search turned up this very detailed article about Linux load averages, including a look at some of the C functions that actually do the calculations (this was particularly interesting to me because I'm currently learning C).

To keep this post shorter than the aforementioned article, I'll simply quote the two sentences that gave me a clear-as-day explanation of how to read Linux load averages:

The point of perfect utilization, meaning that the CPUs are always busy and, yet, no process ever waits for one, is the average matching the number of CPUs. If there are four CPUs on a machine and the reported one-minute load average is 4.00, the machine has been utilizing its processors perfectly for the last 60 seconds.

The machine I was checking at work was a single-core Celeron machine. This meant with a continuous load of almost 3.00 the CPU was being stressed much higher than it should be. Theoretically, a dual-core machine would drop this load to around 1.50 and a quad-core would drop it to 0.75.

There is a lot more behind truly understanding the Linux load averages, but the most important thing to understand is that they do not represent CPU usage. Rather, they represent demand on the CPU: the average number of processes either running or waiting for their turn on the CPU. If you still can't get your brain away from thinking in terms of percentages, consider 1.00 to be 100% load for single-core CPUs, 2.00 to be 100% load for dual-core CPUs, and so on.
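A quick way to put the numbers in context on any Linux box is to compare the 1-minute load average against the number of CPUs; a minimal sketch:

# compare the 1-minute load average against the number of CPUs
cores=$(grep -c ^processor /proc/cpuinfo)
load1=$(cut -d ' ' -f 1 /proc/loadavg)
echo "1-minute load: $load1 across $cores CPU(s)"

On the single-core Celeron above, this would have printed something like "1-minute load: 2.86 across 1 CPU(s)", which makes the problem obvious at a glance.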

Update: John Gilmartin had some insightful feedback and shared a link to Understanding Load Averages where there's a nice graphical description for how load averages work.

Creating a Bootable OS X Backup on Linux: Impossible?

I've had plans for a while now to set up a backup system using a Debian Linux server and rsync to back up my MacBook Pro laptop. At first glance, it seemed like it would be pretty straightforward. I've been able to make a bootable copy of my entire MBP using nothing but rsync (thanks to some very helpful directions by Mike Bombich, the creator of the popular, and free, Carbon Copy Cloner software). And by bootable copy I mean I could literally plug in the USB drive and boot my MBP from it (hold down the Alt/Option key while booting). Restoring a backup is as simple as running the rsync command again, but in the reverse direction. I know this solution works because I used it when I upgraded to a 320GB hard drive.

To start, I needed to create a big enough partition on the external USB drive using Disk Utility (formatted with Mac OS Extended (Journaled)). I then made a bootable copy of my MBP with one rsync command:

sudo rsync -aNHAXx --protect-args --fileflags --force-change \
    --rsync-path="/usr/local/bin/rsync" / /Volumes/OSXBackup

But my dream backup system was something more hands-off. I wanted something that would periodically (a couple of times a day) run that rsync command over SSH in the background and magically keep an up-to-date bootable copy of my MBP on a remote server.

I love Linux and I jump at any opportunity to use it for something new, especially in a heterogeneous network environment. So when I decided to set up a backup server, I naturally wanted to make use of my existing Debian Linux machine (which just so happens to be running on an older G4 Mac Mini).

So, after making a bootable copy of my MBP using the local method mentioned above, I plugged the drive into my Linux machine, created a mount point (/osx-backup), and added an entry to /etc/fstab to make sure it was mounted on boot (note the filesystem type is hfsplus):

/dev/sda /osx-backup hfsplus rw,user,auto 0 0

All that's left to do now is to run the same rsync command as earlier, but this time with a remote path as the destination (something of the form user@backupserver:/osx-backup/). This causes rsync to tunnel through SSH and run the sync. Unfortunately, this is where things started to fall apart.
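In other words, the command becomes something like this (the user and hostname are placeholders for the Linux server; note that --rsync-path, if you need it, would now point at the rsync binary on the remote machine rather than the local one):

sudo rsync -aNHAXx --protect-args --fileflags --force-change \
    / user@backupserver:/osx-backup/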

OS X uses certain file metadata which must be copied for the backup to be complete (again, we're talking about a true bootable copy that looks no different than the original). Several of the flags used in the rsync command above are required to maintain this metadata and unfortunately Linux doesn't support all the necessary system calls to set this data. In particular, here are the necessary flags that don't work when rsyncing an OS X partition to Linux:

-X (rsync: rsync_xal_set: lsetxattr() failed: Operation not supported (95))
-A (recv_acl_access: value out of range: 8000)
--fileflags (on remote machine: --fileflags: unknown option)
--force-change (on remote machine: --force-change: unknown option)
-N (on remote machine: -svlHogDtNpXrxe.iL: unknown option)

According to the man page for rsync on my MBP, the -N flag is used to preserve create times (crtimes) and the --fileflags option requires the chflags system call. When I compiled the newer rsync 3.0.3 on my MBP, I had to apply two patches to the source that were relevant to preserving Mac OS X metadata:

patch -p1 <patches/fileflags.diff
patch -p1 <patches/crtimes.diff

I thought that maybe if I downloaded the source to my Linux server, applied those same patches, and then recompiled rsync, that it would be able to use those options. Unfortunately, those patches require system-level function calls (such as chflags) that simply don't exist in Linux (the patched source wouldn't even compile).

So I tried removing all unsupported flags even though I knew lots of OS X metadata would be lost. After the sync finished, I tried booting from the backup drive to see if everything worked. It booted into OS X, but when I logged into my account lots of configuration was gone and several things didn't work. My Dock and Desktop were both reset and accessing my Documents directory gave me a "permission denied" error. Obviously that metadata is necessary for a viable bootable backup.

So, where to from here? Well, I obviously cannot use Linux to create a bootable backup of my OS X machine using rsync. I read of other possibilities (like mounting my Linux drive as an NFS share on the Mac and then using rsync on the Mac to sync to the NFS share) but they seemed like a lot more work than I was looking for. I liked the rsync solution because it could easily be tunneled over SSH (secure) and it was simple (one command). I can still use the rsync solution, but the backup server will need to be OS X. I'll be setting that up soon, so look for another post with those details.

WHM Whitelist to Exclude from Exim Sender Verify Callbacks

Sender verification is an important feature used by email servers to help prevent spam. When sender verification is enabled, the receiving email server checks to make sure the sender exists. Various email servers have different ways of handling this feature. Exim, for example, uses a mechanism called 'sender callouts' or 'callbacks'. (When the sending server does not accept a verification request, it does not comply with RFC 2821.)

However, in the event that the network route from the receiving email server to the originating email server is broken (or a firewall blocks the connection), the result can be a bit confusing. The receiving email server treats a callout that cannot connect the same as one that explicitly fails verification, so the email never reaches the recipient. After all, as far as the email server knows, it's spam.

One of my hosting clients was experiencing this "lost email" problem and a quick grep of /var/log/exim_mainlog confirmed it (hosts and IPs changed for obvious reasons):


2008-11-17 15:02:27 [30121] H=relay1.example.com (qsv-spam1.example.com) [67.26.151.59]:36752 I=[69.161.211.25]:25 sender verify defer for <sender@customer.example.com>: could not connect to customer.example.com [163.112.75.15]: Connection timed out
2008-11-17 15:02:27 [30121] H=relay1.example.com (qsv-spam1.example.com) [67.26.151.59]:36752 I=[69.161.211.25]:25 F=<sender@customer.example.com> temporarily rejected RCPT <recipient@example.org>: Could not complete sender verify callout
2008-11-17 15:02:27 [30120] H=relay1.example.com (qsv-spam1.example.com) [67.26.151.59]:36751 I=[69.161.211.25]:25 incomplete transaction (RSET) from <sender@customer.example.com>

As you can see, the email server was unable to connect to customer.example.com to verify the existence of the sender. This doesn't mean the sender failed verification; it means the network connection from my server to the sending server could not be established at all.

Most of the stuff I found online related to solving this problem on a server running WHM (here and here) explains how to modify exim.conf to add special whitelist rules. Luckily, my server is running WHM 11.23.2, which has a whitelist option that makes it really easy to exclude a particular IP address from sender verification without any manual changes to exim.conf:

1. Click Service Configuration -> Exim Configuration Editor
2. Under Access Lists, find "Whitelist: Bypass all SMTP time recipient/sender/spam/relay checks" and click [EDIT]
3. Add the IP address for the sending server for which you wish to skip sender verification (as the note at the bottom explains, hosts cannot be used in this list)
4. Click Save
5. Click Save again near the bottom of the Exim Configuration Editor page

That's it! Now any emails from that IP that were failing to come through because of a sender verification failure will come through without a problem (again, you can watch /var/log/exim_mainlog to confirm).
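To confirm the fix, you can watch the log for new deliveries from the whitelisted host as they come in; something like this works:

tail -f /var/log/exim_mainlog | grep relay1.example.com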

HOWTO: Install md5sum & sha1sum on Mac OS X

I was a bit surprised to learn that my Mac didn't have the md5sum and sha1sum tools installed by default. A quick search and I found a site that provides the source. The sources compiled successfully on my Mac (OS X 10.5.5, Xcode tools installed).

The only quirk appears in the last step:

$ ./configure
$ make
$ sudo make install
cp md5sum sha1sum ripemd160sum /usr/local/bin
chown bin:bin /usr/local/bin/md5sum /usr/local/bin/sha1sum
              /usr/local/bin/ripemd160sum
chown: bin: Invalid argument
make: *** [install] Error 1

The make install command tries to change the ownership of the files to the bin user. Since that user doesn't exist on my system, the command fails. This isn't a problem though, as the binaries work perfectly. By default, they are installed to /usr/local/bin/.

Using the OS X md5 instead of md5sum

As a commenter pointed out, the /sbin/md5 utility provided by OS X contains a hidden -r switch that causes it to output in a format identical to that of md5sum, making it compatible with scripts that require md5sum's format. If you want to use the md5 utility provided by OS X, you can add the following to your ~/.profile or ~/.bashrc:

alias md5='md5 -r'
alias md5sum='md5 -r'

Installing with Homebrew

A commenter mentioned that you can install md5sum using Homebrew by running brew install coreutils.

Update (2015-02-25): The current method for installing via Homebrew is as follows:

brew install md5sha1sum

Installing with MacPorts

A commenter mentioned that if you have MacPorts installed, you can run port install coreutils, but "you'll need to add /opt/local/libexec/gnubin/ to your PATH."
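If you go that route, adding a line like this to your ~/.profile or ~/.bashrc takes care of the PATH change:

export PATH="/opt/local/libexec/gnubin:$PATH"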

Update (2014-08-25): It appears that you should use sudo port install md5sha1sum.

DD-WRT has come a long way!

I just finished installing DD-WRT on a Linksys WRT54GL router for the office and all I can say is wow. I remember when replacing the firmware on a Linksys router was like doing surgery in the dark with a butcher knife and a wrench. Now I just download the DD-WRT firmware, use the Upgrade Firmware section of the Linksys configuration page on my router, and BAM! I have DD-WRT installed. The extra features provided by DD-WRT are invaluable and make the router's usefulness increase exponentially. I've got to install this on a router at home.

Googlebot Relentlessly Using Bandwidth

When one of my hosting clients complained about continuously running out of bandwidth on his low-traffic site, I took a peek at the access logs and discovered that Googlebot was indexing every single possible day on a simple calendar addon for the phpBB2 forum software installed on the site. (Googlebot is the program that crawls the web indexing everything so you can search for it using Google.)

A quick peek at the access logs showed thousands of Googlebot requests for a forum calendar:

66.249.71.39 - - [01/Sep/2008:17:09:12 -0400] "GET /forums/calendar.php?m=7&d=21&y=1621&sid=79b643b30eer7140adcd2ba76732688a HTTP/1.1" 200 44000 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.40 - - [01/Sep/2008:17:09:33 -0400] "GET /forums/calendar.php?m=4&d=2&y=2188&sid=e4da1ee0a488096e3897a8f15c31cea2 HTTP/1.1" 200 43997 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.40 - - [01/Sep/2008:17:09:44 -0400] "GET /forums/calendar.php?m=12&d=4&y=1624&sid=cc5d5084d158457ce3c7a9d38263f553 HTTP/1.1" 200 44076 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.41 - - [01/Sep/2008:17:10:05 -0400] "GET /forums/calendar.php?m=10&d=15&y=1621&sid=a4e8af0d20715g965b3e616ae6f95004 HTTP/1.1" 200 43751 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.41 - - [01/Sep/2008:17:10:15 -0400] "GET /forums/calendar.php?m=9&d=13&y=2187&sid=80c79b2491ddf3d8d46076d48a6282d1 HTTP/1.1" 200 43896 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.40 - - [01/Sep/2008:17:10:26 -0400] "GET /forums/calendar.php?m=5&d=30&y=1618&sid=f0619ba6517an57bcd6a7e9ca6289a32 HTTP/1.1" 200 43820 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.39 - - [01/Sep/2008:17:10:38 -0400] "GET /forums/calendar.php?m=11&y=2189&d=30&sid=97c0a58bbd2b3914dbf255ea0a2b1a4c HTTP/1.1" 200 44107 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

A quick Google search turned up many others who've had the same problem:

Just found exactly the same on one of my client’s sites. They were complaining that despite being a small site, they’d apparently used all of their bandwidth within 4 days.

They had one of these PHP calendars on their site, where you click the day and it tells you what’s on. Googlebot had tried to index EVERY SINGLE POSSIBLE DAY. And, in the first four days of September, had used up all this site’s bandwidth, clocking up an impressive 19,000 hits and 800MB of bandwidth.

You can use robots.txt to tell all decent robots to push off. I’ve just done that. Let’s see if it works!

So I added a file to the root web directory for the site and named it robots.txt. Inside, I put the following:

User-agent: *
Disallow: /forums/calendar.php

Sure enough, the next time Googlebot came through it ignored /forums/calendar.php and didn't use up ridiculous amounts of bandwidth indexing something that didn't need to be indexed.
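An easy way to confirm this from the server is to count Googlebot requests for the calendar in the access log (the log path here is just an example):

grep 'calendar.php' /path/to/access_log | grep -c Googlebot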

I can't blame Googlebot, though. It was just doing its job. The fault lies with the creators of the calendar addon. What they should have done was add rel="nofollow" to all the links in the calendar; adding nofollow to individual links tells crawlers like Googlebot not to follow them. Google started using the nofollow attribute as a method of preventing comment spam back in 2005.

Switching to suPHP; What a Mess!

When one of my users reported problems deleting files he had uploaded using a PHP script, I quickly discovered all the files being uploaded were owned by the user running the web server: nobody. This meant only the root user could delete those files.

Apache suEXEC is commonly used to resolve this problem. It allows scripts to be executed as the user who owns the domain being accessed. This way, files created by PHP would be owned by the user owning the site instead of the default nobody user.

However, Apache suEXEC only works if you're using CGI as the PHP handler. The PHP5 handler on my server was set to use CGI, but I have PHP4 configured as the default PHP version and it was configured to use DSO. When I tried changing PHP4 to use CGI as the handler, most of the domains on my server displayed this:

Warning: Unexpected character in input: '' (ASCII=15) state=1 in /usr/local/cpanel/cgi-sys/php4 on line 772
Warning: Unexpected character in input: ' in /usr/local/cpanel/cgi-sys/php4 on line 772
Warning: Unexpected character in input: ' in /usr/local/cpanel/cgi-sys/php4 on line 772
Warning: Unexpected character in input: ' in /usr/local/cpanel/cgi-sys/php4 on line 772
Parse error: syntax error, unexpected T_STRING in /usr/local/cpanel/cgi-sys/php4 on line 772

OK, that looks like a problem with cPanel. I don't have time to debug cPanel's problems.

suPHP, like suEXEC, is used to run PHP scripts as the user who owns the domain. I decided to try recompiling Apache and PHP with suPHP enabled to see if that would fix the problem.

File Ownership Hell

suPHP worked, except now the sites using PHP sessions were trying to access stored session data in /tmp/ that was owned by the user nobody! So I deleted all the session data and that allowed the PHP sites to create new session data with file ownership of the user owning the domain.

But then I tried accessing my WordPress admin page and started getting permission denied errors in /wp-content/cache/. Same problem: the cache files that had been created before I enabled suPHP were owned by the user nobody and now the user who owns my domain couldn't access them. A quick chown -R raamdev:raamdev /wp-content/cache/ fixed that problem.

Yeah, I could simply chown -R [user]:[user] /home/[user] for each of the users on the server, but there's something about running a recursive command on files I've never seen, and know nothing about, that makes me uncomfortable.

More suPHP Limitations

I was beginning to worry that this was going to be more difficult than simply enabling suPHP and I wondered how many other sites I'm hosting could have similar problems. I tried accessing one of the high priority sites I'm hosting and discovered it was broken and displaying an "Internal Server Error".

After a little research, I discovered that you cannot use php_value directives in .htaccess files with suPHP. The .htaccess file included with (created by?) Joomla! contained this at the bottom:

#Fix Register Globals
php_flag register_globals off

I already knew register_globals was turned off in the global PHP configuration, so I simply commented out that line and the site started working again.

Conclusion

It was at this point that I concluded it was too risky to just blindly enable suPHP while hosting over 50 domains, many of which I'm not at all familiar with in terms of what they're running or hosting. I will need to take the time to carefully crawl through all the sites, making sure their .htaccess files don't contain anything that might disrupt suPHP, and then confirm all the sites are still working properly.
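A good starting point for that crawl is to find every .htaccess file that still uses php_value or php_flag directives (the path is just where cPanel keeps home directories):

find /home -name .htaccess -exec grep -l -e php_value -e php_flag {} +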

Lesson learned: Set up suPHP before you're hosting 50+ domains.

Reverse DNS: That's not me!

I have Speakeasy DSL at home with a static IP address (I'm boycotting Comcast). I run a Linux server on a Mac Mini and I use it for all my messaging (using naim IRC/AIM and Jabber via Bitlbee, but that's for another post).

Since I SSH into my Linux box several times a day, it would be nice to avoid typing the full IP address each time. So I decided to set up an A record on one of the domains I own (we'll use dev82.org as an example) so that dev82.org points to the IP address of my home DSL connection (66.92.25.92 in this example).

After transferring the Speakeasy DSL to my new apartment in Cambridge, I had a new IP address. No problem -- I simply updated the A record and dev82.org worked again. However, this time I noticed something funky. Take a look at what hostname my IP address resolved to when I pinged dev82.org:

raam@wfc-main2:~$ ping dev82.org
PING dev82.org (66.92.25.92) 56(84) bytes of data.
64 bytes from host103-spk.online-buddies.com (66.92.25.92): icmp_seq=1 ttl=53 time=38.3 ms

That's weird. What the hell is host103-spk.online-buddies.com? A little Googling tells me:

"Online Buddies, Inc., developer of MANHUNT.net is one of largest developers of web-oriented services serving the gay community."

Uh, I'm not gay. Besides, why the hell is my home DSL IP address resolving to a hostname I've never heard of? I ran a few more tests, including tests from different ISPs to rule out a local DNS issue. Each time, my home IP address resolved the same way:

raam@wfc-main2:~$ nslookup 66.92.25.92
Non-authoritative answer:
92.25.92.66.in-addr.arpa name = host92-spk.online-buddies.com.

So I decided to pick up the phone and call Speakeasy. They have always been helpful and I figured worst-case scenario is that I'll have to request a new IP address and re-point dev82.org. At least that way my IP address won't resolve to some gay site.

I called Speakeasy at 3:30am and had a tech on the phone within 3 minutes (Speakeasy rocks). I explained my situation to the tech and he quickly had an explanation: whoever was assigned my IP address before me must have had it set up to resolve to that hostname.

I told the tech I had set up an A record to point dev82.org at the IP address. After confirming that was true with a ping test, the tech said he would update the reverse DNS record so that 66.92.25.92 resolves to dev82.org.
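Once the change propagates, verifying it is a one-liner:

dig -x 66.92.25.92 +short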

This is awesome. I had no idea I could request an update to the Reverse DNS record for my static DSL connection! I wonder how easy that process is with a Comcast connection, or if it's even possible. 😕

Kill Inactive and Idle Linux Users

Every once in a while the SSH connection to my Linux server will die and I'll be left with a dead user. Here's how I discover the inactive session using the w command:

 15:26:26 up 13 days, 23:47,  2 users,  load average: 0.00, 0.00, 0.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
raam     pts/0    wfc-main.wfcorp. Mon10    2days  0.04s  0.04s -bash
raam     pts/1    pool-151-199-29- 15:26    0.00s  0.02s  0.01s w

You can easily tell there's an idle user by glancing at the IDLE column; the user in the first row has been idle for 2 days. The bottom line is that you need to kill the parent process created by the idle user when he logged in, and there are a number of ways of doing that. Here are a few of my favorites.

Here is how I discover the parent process using the pstree -p command:

        ├─screen(29380)───bash(29381)───naim(29384)
        ├─scsi_eh_0(903)
        ├─sshd(1997)─┬─sshd(32093)─┬─sshd(32095)
        │            │             └─sshd(32097)───bash(32098)─┬─mutt(32229)
        │            │                                         └─screen(32266)
        │            └─sshd(1390)─┬─sshd(1392)
        │                         └─sshd(1394)───bash(1395)───pstree(1484)
        ├─syslogd(1937)
        └─usb-storage(904)

We need to find the parent PID for the dead user and issue the sudo kill -1 command. We use the -1 option (SIGHUP) because it's a cleaner way of killing processes; some programs, such as mutt, will exit cleanly when they receive it. I can see in the tree where I'm running the pstree command, so I just follow that branch back until I find a common process shared by both sessions; this happens to be sshd(1997).

You can see there are two branches at that point -- one for my current session and one for the idle session (I know this because I'm the only user logged into this Linux server and I should only have one active session). So I simply kill the sshd(32093) process and the idle user disappears.

Of course, if you're on a system with multiple users, or you're logged into the box with multiple connections, using the above method and searching through a huge tree of processes trying to figure out which is which will not be fun. Here's another way of doing it: looking at the output from the w command above, we can see that the idle user's TTY is pts/0, so now all we need is the PID for the parent process. We can find that by running who -all | grep raam:

raam     + pts/0        May 10 10:45   .         18076 (wfc-main.wfcorp.net)
raam     + pts/1        May 11 15:26   .         1390 (pool-151-199-29-190.bos.east.verizon.net)

Here we can see that 18076 is the PID for the parent process of pts/0, so once we issue kill -1 18076 that idle session will be gone!
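On systems that have pkill, you can skip the PID hunt entirely and send the HUP signal to everything attached to the idle terminal (pts/0 in the example above):

pkill -1 -t pts/0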

Erasing a Disk Using Linux

Here is a really quick way to erase a disk in Linux. Maybe "erase" is the wrong word -- the command actually fills the entire disk with zeros, thereby overwriting any existing data. Assuming the disk you want to erase is /dev/hda, here's what you would run:

dd if=/dev/zero of=/dev/hda bs=1M
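This can take hours on a large disk and dd prints nothing while it runs. If you're curious how far along it is, GNU dd will report its I/O statistics when it receives the USR1 signal, so from a second terminal you can run (add sudo if dd is running as root):

kill -USR1 $(pidof dd)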

Technically, this is a better option than simply "deleting" the data or removing the partitions, as those options make it easier to recover the data. So, if the FBI is about to raid your little lab and you only have time to run one command, that's what it should be. 🙂