Being Greedy With Bash

Last night at my C/Unix class the professor quickly glossed over an interesting shell scripting technique that allows you to strip stuff off the beginning or end of a variable. I forgot about it until I saw the technique used again while editing a shell script at work today.

I didn't know what the technique was called but I remembered the professor saying something about "greedy clobbering" and, since I cannot search Google for special characters, I Googled "Bash greedy" and luckily found 10 Steps to Beautiful Shell Scripts, which just so happened to contain the technique I was looking for (#5).

There are basically four versions of this technique:

${var#pattern}
Remove the shortest match of pattern from the beginning of var and return what is left

${var##pattern}
Remove the longest match of pattern from the beginning of var and return what is left (be greedy)

${var%pattern}
Remove the shortest match of pattern from the end of var and return what is left

${var%%pattern}
Remove the longest match of pattern from the end of var and return what is left (be greedy)

In all four cases pattern is a shell glob (not a regular expression), and the match is anchored: # and ## only match at the very beginning of the value, % and %% only at the very end.
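
To see the four forms side by side, here is a minimal sketch. The variable names and the sample value are made up for illustration, and the pattern uses a dot instead of a slash just to keep the results short:

```shell
#!/usr/bin/env bash
# Hypothetical sample value, not taken from any real script.
var=one.two.three

short_front=${var#*.}    # strip shortest leading match of "*."  -> two.three
long_front=${var##*.}    # strip longest leading match of "*."   -> three
short_back=${var%.*}     # strip shortest trailing match of ".*" -> one.two
long_back=${var%%.*}     # strip longest trailing match of ".*"  -> one

printf '%s\n' "$short_front" "$long_front" "$short_back" "$long_back"
```

Note that var itself is never modified; each expansion just produces a new string.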

Here's how it works. Let's say you have a variable that contains the path to a file:

FILE=/home/raam/bin/myscript.sh

Now let's say you wanted to extract the myscript.sh part from that variable. You could do some funky stuff with awk but there is a much easier solution built into Bash:

SCRIPTNAME=${FILE##*/}

Now $SCRIPTNAME will contain myscript.sh!

The ##*/ tells the shell to match the pattern */ (any run of characters followed by a slash) starting at the beginning of the value, be greedy while doing it so that the match runs all the way through the last slash (##), and then return whatever is left over (in this case, myscript.sh is the only thing remaining after the last slash).
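
The same idea covers the other pieces of a path. A rough sketch, reusing the example path from this post; SCRIPTDIR and STEM are just names I picked for illustration:

```shell
#!/usr/bin/env bash
FILE=/home/raam/bin/myscript.sh

SCRIPTNAME=${FILE##*/}    # drop everything through the last slash -> myscript.sh
SCRIPTDIR=${FILE%/*}      # drop the last slash and what follows   -> /home/raam/bin
STEM=${SCRIPTNAME%.sh}    # drop a trailing ".sh"                  -> myscript

printf '%s\n' "$SCRIPTNAME" "$SCRIPTDIR" "$STEM"
```

For a path like this one the first two results match what basename and dirname would print, without starting an extra process. The edge cases differ, though: if FILE contains no slash at all, ${FILE%/*} returns the value unchanged, whereas dirname prints a single dot.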

I first assumed this was a Bash-specific feature, but all four forms are part of the POSIX shell specification, so they should also work in other POSIX-style shells such as dash. In the Bash man page they are documented under "Parameter Expansion". It's amazing how four characters can do so much work so easily. The more I learn about what I can do with Bash, the more I wonder how I ever lived without all this knowledge!
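
One quick way to check a particular system is to run the expansion through sh explicitly instead of bash. This is only a sketch: it assumes an sh is on the PATH, and which shell that actually is (dash, bash in POSIX mode, busybox ash, and so on) varies from system to system, which is the point of the test:

```shell
#!/usr/bin/env bash
# Run the expansion in a child sh process rather than in bash itself.
# The single quotes keep bash from expanding ${FILE##*/} before sh sees it.
result=$(sh -c 'FILE=/home/raam/bin/myscript.sh; printf "%s\n" "${FILE##*/}"')

printf '%s\n' "$result"
```

If that sh supports the expansion, the script prints myscript.sh; a shell that does not will most likely complain about a bad substitution instead.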

Comment

  1. Tricks like this are a matter of internal struggle for me, if you can believe that. Almost every time, unless there is a huge reason to use the trick rather than a more portable solution, I’m going to use the more portable solution.

    The reason is that recently, even something as simple as short-circuiting a bunch of junk in a bash script for work, rather than using a bunch of proper if/then/else blocks, caused a bunch of turmoil, depending on the host environment (the version of bash that shipped with CentOS 3.x doesn’t appear to much care for my short-circuits).

    That said, here’s a trick that I’ve been using a lot lately that has really come in handy (source-from-http):

    WGET="wget -q -t 1 --timeout=5 -O -"

    source_http() {
        eval "$($WGET "$1")"
    }

    source_http "http://someserver.tld/my-bash-functions"

    # check for the existence of my functions in memory.
    # die if not available.

    main "$@" ||
    die "That's pretty dumb, what with the program not working and all."

  2. Yeah, portability is a huge concern my professor always talks about in class. He likes to keep things as portable as possible and talks about code that he wrote 30 years ago still working on live systems today as evidence of why portability is so important. When I first started writing shell scripts, the various “bashisms” caused me a lot of trouble: a script would suddenly stop working after I moved it to a different system that did not symlink /bin/sh to /bin/bash.

    All of our shell script homework is required to work with /bin/sh and we’re warned about systems that symlink /bin/sh to /bin/bash (as most newer ones do). He also recommends using the dash shell, which aims for POSIX compliance.

    That source_http trick is very neat! Thanks for sharing!