Saturday, June 1, 2013

Using Bash and Awk to Print Long Lines

Bash and AWK are great tools for writing small programs. In this post, we use AWK inside a Bash script to print the long lines in a file.

longlines.sh


#!/bin/bash

usage() {
    cat <<EOF

usage: $(basename "$0") [-m MAX] [-n] FILEs...

Prints the lines in FILEs that have a length greater than MAX.
By default, MAX is 80.  If -n is given, the line number is also
printed.

EOF
}

max=80
printlines=0
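# The leading ':' selects silent error reporting, so a missing
# argument is reported through the ':' case below.
while getopts :m:n flag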
do
    case $flag in
        m)
            max=$OPTARG
            ;;
        n)
            printlines=1
            ;;
        \?)
            usage
            exit 1
            ;;
        :)
            echo "option -$OPTARG requires an argument"
            usage
            exit 1
            ;;
    esac
done

shift $(( OPTIND - 1 ))

if [[ $# -lt 1 ]]
then
    usage
    exit 1
fi

# Quote the expansions so file names containing spaces reach awk intact.
awk -v max="$max" -v printlines="$printlines" \
    'length($0) > max {
        if (printlines)
            printf("%6d: ", FNR)
        print $0
    }' "$@"

Notice how the -v option to awk allows us to set variables in the AWK script from values in the Bash portion of the script.
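To see -v on its own, here is a minimal sketch that feeds the same AWK rule a made-up three-line input instead of a file. The sample text and the out variable are only for illustration and are not part of longlines.sh.

```shell
# Hypothetical three-line sample; only the second line is longer than max.
max=5
printlines=1
out=$(printf 'abc\nabcdefgh\nxy\n' |
    awk -v max="$max" -v printlines="$printlines" \
        'length($0) > max {
            if (printlines)
                printf("%6d: ", FNR)
            print $0
        }')
echo "$out"
```

Only the second line is printed, prefixed with its line number, because it is the only one longer than five characters.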

The following example prints words in the American English dictionary that have more than 21 characters.

longlines.sh example


$ ./longlines.sh -m 21 /usr/share/dict/american-english 
Andrianampoinimerina's
counterrevolutionaries
counterrevolutionary's
electroencephalogram's
electroencephalograph's
electroencephalographs

An alternative way to write the awk portion of the script is

awk \
    'length($0) > max {
        if (printlines)
            printf("%6d: ", FNR)
        print $0
    }' max="$max" printlines="$printlines" "$@"

Passing variables with -v makes them available to the entire AWK program, including the special BEGIN rule. Specifying them as operands, as in max=$max, assigns them only when awk reaches that operand on the command line, so they are available to the main rules and to END but not to BEGIN.
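The difference is easy to check with a small experiment. This is a sketch, not part of longlines.sh: the with_v and as_operand variable names are made up for the demo, and /dev/null stands in for an input file so that awk does not wait on standard input.

```shell
# With -v, the assignment happens before BEGIN runs.
with_v=$(awk -v max=80 'BEGIN { print "BEGIN max=[" max "]" }')

# As an operand, the assignment happens only when awk reaches it,
# which is after BEGIN has already run. /dev/null is an empty input,
# so no main rule fires and awk goes straight on to END.
as_operand=$(awk 'BEGIN { print "BEGIN max=[" max "]" }
                  END   { print "END max=[" max "]" }' max=80 /dev/null)

echo "$with_v"
echo "$as_operand"
```

The first command shows 80 inside BEGIN. The second shows an empty value in BEGIN and 80 in END.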
