Saturday, November 3, 2012

Command Lookup Tables in Various Languages

Sometimes it's fun to see how an idea translates when coded in several languages. In this article, the central idea is a table that maps a command name to a function. A runloop prompts the user for a command, looks the command up in the table, and, if it exists, executes the associated function. Python's cmd module is one take on this idea. Here we code up something much more rudimentary in Python, Ruby, and Lua.
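
For a taste of the standard library's version, here is a minimal cmd.Cmd sketch. The connect/disconnect commands mirror the toy shell below; this is an illustration of the dispatch idea, not a full equivalent.

```python
import cmd

class Shell(cmd.Cmd):
    """Sketch of the same idea using the stdlib: cmd.Cmd keeps the
    lookup table implicitly -- every do_* method becomes a command."""
    prompt = '% '

    def do_connect(self, arg):
        print('connecting to %s...' % arg)

    def do_disconnect(self, arg):
        print('disconnecting...')
        return True  # a true return value stops cmdloop()

# Shell().cmdloop() starts the interactive prompt
```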

cmd.py


#!/usr/bin/env python

import sys 

def connect(host):
    print('connecting to %s...' % (host))

def download(path):
    print('downloading %s...' % (path))

def upload(path):
    print('uploading %s...' % (path))

def disconnect():
    print('disconnecting...')
    sys.exit(0)

commands = { 
    'connect': connect,
    'download': download,
    'upload': upload,
    'disconnect' : disconnect
}

while True:
    line = raw_input('% ')
    tokens = line.split()
    if not tokens: continue
    cmd = tokens[0]
    args = tokens[1:]
    if cmd in commands:
        try: commands[cmd](*args)
        except TypeError as e:
            print(e)
    else:
        print("cmd '%s' not found" % (cmd))

cmd.rb


#!/usr/bin/env ruby

def connect(host) 
    puts "connecting to #{host}..."
end

def download(path) 
    puts "downloading #{path}..."
end

def upload(path) 
    puts "uploading #{path}..."
end

def disconnect() 
    puts "disconnecting..."
    exit
end

commands = { 
    'connect' => :connect,
    'download' => :download,
    'upload' => :upload,
    'disconnect' => :disconnect,
}

while true do
    print '% '
    tokens = gets.split
    cmd = tokens[0]
    args = tokens[1..-1]
    if cmd and commands[cmd] then
        begin
            send(commands[cmd], *args)
        rescue ArgumentError => e
            puts e.message
        end 
    else
        puts "cmd '#{cmd}' not found"
    end 
end

cmd.lua


#!/usr/bin/env lua

---------------------------------------
-- utility functions
---------------------------------------
local function split(s) 
    local tokens = {}
    for t in string.gmatch(s, "%S+") do
        --print('token: ' .. t)
        tokens[#tokens + 1] = t 
    end 
    return tokens
end

local function printf(fmt, ...)
    if fmt then
        return io.write(string.format(fmt, ...))
    else
        return print()
    end 
end

local function perror(msg)
    -- strip file and line number where error occurs
    local i, j = string.find(msg, ':%d+:%s+')
    if j then
        print(string.sub(msg, j+1, -1))
    else
        print(msg)
    end 
end

---------------------------------------
-- commands
---------------------------------------
local function connect(host)
    if not host then
        error('must specify host')
    end 
    printf('connecting to %s...\n', host)
end

local function download(path)
    if not path then
        error('must specify path')
    end 
    printf('downloading %s...\n', path)
end

local function upload(path)
    if not path then
        error('must specify path')
    end
    printf('uploading %s...\n', path)
end

local function disconnect()
    printf('disconnecting...\n')
    os.exit(0)
end

---------------------------------------
-- command table
---------------------------------------
local commands = {
    connect = connect,
    upload = upload,
    download = download,
    disconnect = disconnect
}

---------------------------------------
-- runloop
---------------------------------------
while true do
    io.write('% ')
    io.flush()
    local line = io.read('*line')
    local tokens = split(line)
    if #tokens > 0 then
        local cmd = tokens[1]
        table.remove(tokens, 1)
        local args = tokens
        if commands[cmd] then
            local status, err = pcall(commands[cmd], unpack(args))
            if not status then
                perror(err)
            end
        else
            printf("cmd '%s' not found\n", cmd)
        end
    end
end

Sunday, October 14, 2012

Solving Jumble with Python and Bash

In order to solve Jumble, we first write a Python script that outputs all character permutations of a given string. The script demonstrates a cool little recursive algorithm. To compute all permutations of a string of length n, we pick off the character at each index in the string and compute the permutations of the remaining n-1 characters. The function substring constructs this n-1 character string. When we have a list of the permutations of these n-1 characters, we use the list comprehension
[first + i for i in perm(others)]
to prepend the character that we initially picked off onto each string in the list.

permutations.py


#!/usr/bin/env python

import sys 

def substring(s, i): 
    if i == (len(s) - 1): 
        return s[:i]
    elif i == 0:
        return s[1:]
    else:
        return s[0:i] + s[i+1:]

def perm(s):
    p = []
    length = len(s)
    if length <= 1:
        p.append(s)
    else:
        for i in range(length):
            first = s[i]
            others = substring(s, i)
            p.extend([first + i for i in perm(others)])
    return p

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('usage: %s word' % sys.argv[0])
        sys.exit(1)
    perms = perm(sys.argv[1])
    for i in perms:
        print(i)
    sys.exit(0)

Let's try out the script by having it display all permutations of the string 'one'.

$ ./permutations.py one
one
oen
noe
neo
eon
eno
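
Incidentally, the standard library can generate these for us: itertools.permutations yields the same strings in the same order as the recursive version above (and, like it, produces duplicates when the input has repeated letters).

```python
from itertools import permutations

def perm(s):
    # permutations() yields tuples of characters; join each tuple
    # back into a string
    return [''.join(p) for p in permutations(s)]

print(perm('one'))  # ['one', 'oen', 'noe', 'neo', 'eon', 'eno']
```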

Now all we have to do to solve Jumble is loop over each string in the output and see if that string appears in a dictionary. To do this, we'll write a short Bash script that greps for each word in a dictionary file. On my computer, a flat text dictionary is located at /usr/share/dict/words. Feel free to change the assignment of the variable DICT to some other dictionary file found on your machine.

jumble.sh


#!/bin/bash

DICT=/usr/share/dict/words

if [ $# -ne 1 ]; then
    echo "usage: $0 jumble-word"
    exit 1
fi

cnt=1
for i in `./permutations.py $1`; do
    grep -q "^$i$" $DICT 
    if [ $? -eq 0 ]; then
        printf "%6d: %s ... [YES]\n" "$cnt" "$i"
        exit 0
    fi  
    printf "%6d: %s ... [NO]\n" "$cnt" "$i"
    let cnt=cnt+1
done

exit 1

Let's try out our program on an example taken from jumble.com.

$ ./jumble.sh tecot
     1: tecot ... [NO]
     2: tecto ... [NO]
     3: teoct ... [NO]
     4: teotc ... [NO]
     5: tetco ... [NO]
     6: tetoc ... [NO]
     7: tceot ... [NO]
     8: tceto ... [NO]
     9: tcoet ... [NO]
    10: tcote ... [NO]
    11: tcteo ... [NO]
    12: tctoe ... [NO]
    13: toect ... [NO]
    14: toetc ... [NO]
    15: tocet ... [NO]
    16: tocte ... [NO]
    17: totec ... [NO]
    18: totce ... [NO]
    19: tteco ... [NO]
    20: tteoc ... [NO]
    21: ttceo ... [NO]
    22: ttcoe ... [NO]
    23: ttoec ... [NO]
    24: ttoce ... [NO]
    25: etcot ... [NO]
    26: etcto ... [NO]
    27: etoct ... [NO]
    28: etotc ... [NO]
    29: ettco ... [NO]
    30: ettoc ... [NO]
    31: ectot ... [NO]
    32: ectto ... [NO]
    33: ecott ... [NO]
    34: ecott ... [NO]
    35: ectto ... [NO]
    36: ectot ... [NO]
    37: eotct ... [NO]
    38: eottc ... [NO]
    39: eoctt ... [NO]
    40: eoctt ... [NO]
    41: eottc ... [NO]
    42: eotct ... [NO]
    43: ettco ... [NO]
    44: ettoc ... [NO]
    45: etcto ... [NO]
    46: etcot ... [NO]
    47: etotc ... [NO]
    48: etoct ... [NO]
    49: cteot ... [NO]
    50: cteto ... [NO]
    51: ctoet ... [NO]
    52: ctote ... [NO]
    53: ctteo ... [NO]
    54: cttoe ... [NO]
    55: cetot ... [NO]
    56: cetto ... [NO]
    57: ceott ... [NO]
    58: ceott ... [NO]
    59: cetto ... [NO]
    60: cetot ... [NO]
    61: cotet ... [NO]
    62: cotte ... [YES]
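
As an aside, spawning one grep process per permutation gets expensive for longer words; if the word list fits in memory, a set gives O(1) membership tests in a single pass. A rough Python sketch of the same search (the word-list argument replaces the hard-coded dictionary path):

```python
from itertools import permutations

def solve_jumble(jumbled, words):
    """Return the first permutation of `jumbled` found among `words`,
    or None.  `words` can be any iterable of dictionary words, e.g.
    open('/usr/share/dict/words').read().split()."""
    dictionary = set(words)
    for letters in permutations(jumbled):
        word = ''.join(letters)
        if word in dictionary:   # O(1) lookup instead of a grep process
            return word
    return None
```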

Friday, October 5, 2012

JavaScript Celsius-Fahrenheit Converter

Your first JavaScript script pops up an alert box. Your second script converts between Celsius and Fahrenheit. So without further ado ... the infamous Celsius-Fahrenheit Converter.


Source code:

<html>
  <head>
    <title>Temperature Converter</title>
  </head>
  <body>
    <h2>Temperature Converter</h2>
    <hr/>
    <form name="temperature">
      <table>
        <tr>
          <td>Fahrenheit</td>
          <td><input type="text" name="fahrenheit" onchange="ftoc();"></td>
        </tr>
        <tr>
          <td>Celsius</td>
          <td><input type="text" name="celsius" onchange="ctof();"></td>
        </tr>
      </table>
    </form>
    <script language="JavaScript">
      function ftoc() {
        var f = document.temperature.fahrenheit.value;
        if (isNaN(f)) {
            document.temperature.celsius.value = '';
        } else {
            var c = (5.0 / 9.0) * (f - 32);
            document.temperature.celsius.value = c.toFixed(2);
        }
      }

      function ctof() {
        var c = document.temperature.celsius.value;
        if (isNaN(c)) {
            document.temperature.fahrenheit.value = '';
        } else {
            var f = ((9.0 / 5.0) * c) + 32;
            document.temperature.fahrenheit.value = f.toFixed(2);
        }
      }
    </script>
  </body>
</html>

Monday, October 1, 2012

Lazy Initialization, Memoization, and Fun with Python Magic Methods

In this blog, we have some fun creating a class that implements the mapping protocol. The class also provides a simple example of the techniques of lazy initialization and memoization.

Let's say we have a configuration file, planets.conf, that lists standard attributes for each planet in the solar system.

planets.conf


[Mercury]
orbit = 57910000
diameter = 4880
mass = 3.30e23

[Venus]
orbit = 108200000
diameter = 12103.6
mass = 4.869e24

[Earth]
orbit = 149600000
diameter = 12756.3
mass = 5.972e24
satellites = Moon

[Mars]
orbit = 227940000
diameter = 6794
mass = 6.4219e23
satellites = Phobos Deimos
.
.
.

Let's now create a structure-like class, called Planet, that encapsulates the attributes of planet name, orbit, diameter, mass, and satellites.

class Planet:
    def __init__(self, name, orbit, diameter, mass, satellites = []):
        self._name = name
        self._orbit = orbit
        self._diameter = diameter
        self._mass = mass
        self._satellites = satellites

    def __str__(self):
        return ('%s%s\torbit: %.2f%s\tdiameter: %.2f%s\tmass: %g%s'
                '\tsatellites: %s%s') % \
                (self._name, os.linesep,
                 self._orbit, os.linesep,
                 self._diameter, os.linesep,
                 self._mass, os.linesep,
                 ', '.join(self._satellites), os.linesep)
    @property
    def name(self): return self._name
    @property
    def orbit(self): return self._orbit
    @property
    def diameter(self): return self._diameter
    @property
    def mass(self): return self._mass
    @property
    def satellites(self): return self._satellites

Next, we create a dictionary-like class, called Planets, that manages the creation of each Planet object.

class Planets:
    _NUM_PLANETS = 9
    _PLANET_NAMES = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn',
    'Uranus', 'Neptune', 'Pluto']

    def __init__(self):
        self._planets = {}
        self._parser = SafeConfigParser()
        self._parser.read('planets.conf')

    def __len__(self):
        return self._NUM_PLANETS

    def __getitem__(self, key):
        if key in self._planets:
            return self._planets[key]
        if not self._parser.has_section(key):
            raise KeyError(key)
        orbit = self._parser.getfloat(key, 'orbit')
        diameter = self._parser.getfloat(key, 'diameter')
        mass = self._parser.getfloat(key, 'mass')
        satellites = []
        if self._parser.has_option(key, 'satellites'):
            satellites = self._parser.get(key, 'satellites').split()
        self._planets[key] = Planet(key, orbit, diameter, mass, satellites)
        return self._planets[key]

In particular, if a user performs the following operations:

p = Planets()
earth = p['Earth']

then Planets will read the statistics for planet Earth from planets.conf, create a new Planet that represents Earth, and add this planet to its private dictionary of planets (self._planets). Thus, the operation of constructing a planet from a configuration file is only performed if that planet is accessed. On subsequent accesses of that planet, Planets returns the memoized Planet object.
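
The caching mechanics are easier to see stripped of the config-file details. Here is a generic sketch of lazy initialization plus memoization (LazyCache and its loader are illustrative names, not part of the program):

```python
class LazyCache:
    """Build a value the first time a key is accessed; return the
    cached object on every subsequent access."""
    def __init__(self, loader):
        self._loader = loader   # expensive constructor, called once per key
        self._cache = {}

    def __getitem__(self, key):
        if key not in self._cache:
            self._cache[key] = self._loader(key)   # lazy initialization
        return self._cache[key]                    # memoized thereafter

calls = []
planets = LazyCache(lambda name: calls.append(name) or name.upper())
assert planets['earth'] == 'EARTH'
assert planets['earth'] is planets['earth']   # same cached object
assert calls == ['earth']                     # loader ran exactly once
```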

Note that, for fun, we have Planets implement the __len__ and __getitem__ magic methods, thereby fulfilling the protocol of an immutable mapping.

The entire program is listed below.

lazy_planets.py


#!/usr/bin/env python

from ConfigParser import SafeConfigParser
import os

__metaclass__ = type # new-style classes

class Planet:
    def __init__(self, name, orbit, diameter, mass, satellites = []):
        self._name = name
        self._orbit = orbit
        self._diameter = diameter
        self._mass = mass
        self._satellites = satellites

    def __str__(self):
        return ('%s%s\torbit: %.2f%s\tdiameter: %.2f%s\tmass: %g%s'
                '\tsatellites: %s%s') % \
                (self._name, os.linesep,
                 self._orbit, os.linesep,
                 self._diameter, os.linesep,
                 self._mass, os.linesep,
                 ', '.join(self._satellites), os.linesep)
    @property
    def name(self): return self._name
    @property
    def orbit(self): return self._orbit
    @property
    def diameter(self): return self._diameter
    @property
    def mass(self): return self._mass
    @property
    def satellites(self): return self._satellites

class Planets:
    _NUM_PLANETS = 9
    _PLANET_NAMES = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn',
    'Uranus', 'Neptune', 'Pluto']

    def __init__(self):
        self._planets = {}
        self._parser = SafeConfigParser()
        self._parser.read('planets.conf')

    def __len__(self):
        return self._NUM_PLANETS

    def __getitem__(self, key):
        if key in self._planets:
            return self._planets[key]
        if not self._parser.has_section(key):
            raise KeyError(key)
        orbit = self._parser.getfloat(key, 'orbit')
        diameter = self._parser.getfloat(key, 'diameter')
        mass = self._parser.getfloat(key, 'mass')
        satellites = []
        if self._parser.has_option(key, 'satellites'):
            satellites = self._parser.get(key, 'satellites').split()
        self._planets[key] = Planet(key, orbit, diameter, mass, satellites)
        return self._planets[key]

if __name__ == '__main__':
    p = Planets()
    print('Number of planets: %d' % (len(p), ))
    earth = p['Earth']
    print(earth)
    mars = p['Mars']
    print('The moons of mars are: %s' % (', '.join(mars.satellites), ))
    try: ceres = p['Ceres']
    except KeyError:
        print('Ceres is not a planet!')

Here is the output of the program:

$ ./lazy_planets.py 
Number of planets: 9
Earth
 orbit: 149600000.00
 diameter: 12756.30
 mass: 5.972e+24
 satellites: Moon

The moons of mars are: Phobos, Deimos
Ceres is not a planet!

Friday, September 28, 2012

Searching Files with bash, find, and grep

I often find myself in the situation where I am working in a deeply nested project, and need to search all of the source files for a keyword. One solution is to pipe together the find and grep commands. For instance, one could invoke

find . -type f -name "*.c" -print | xargs grep -n printf
to find all occurrences of the printf function. For ease of use, I decided to wrap this functionality in a bash script.

findgrep.sh


#!/bin/sh

usage() {
    prog=`basename $0`
    cat << EOF

    usage: $prog [-d ROOT_DIR] [-s] FILE KEYWORD

    Options:
        -d ROOT_DIR     root directory to start the search from
        -s              case-sensitive search

    Search for all files under the ROOT_DIR directory with a name matching
    the FILE pattern.  Among the files found, grep for the KEYWORD pattern.

    Examples:

        # search for C source files under /home/joe that mention 'socket'
        $prog -d /home/joe "*.c" socket

        # search for use of 'synchronized' keyword in all Java source files
        # under the current working directory
        $prog -s "*.java" synchronized 

EOF
}


findgrep() {
    if [[ $4 -eq 0 ]]
    then
        grep_args="-in"
    else
        grep_args="-n"
    fi
    find "$1" -type f -name "$2" -print | xargs grep $grep_args "$3"
}

root_dir=`pwd`
case_sensitive=0

while getopts ":hd:s" option
do
    case $option in
    h)
        usage
        exit 1
        ;;
    d)
        root_dir=$OPTARG
        ;;
    s)
        case_sensitive=1
        ;;
    \?)
        echo "invalid option: -$OPTARG" >&2
        exit
        ;;
    esac
done


# shift command line arguments so that first positional argument is $1
shift $((OPTIND-1))

if [[ $# -ne 2 ]]
then
    usage
    exit 1
fi

findgrep "$root_dir" "$1" "$2" "$case_sensitive"

Example:

$ pwd
/Users/smherwig/github
$ ls
lua-daemon  lua-mnt     lua-proc    pytk-editor
$ findgrep.sh -s "*.c" statvfs
/Users/smherwig/github/lua-mnt/src/mnt.c:7:#include <sys/statvfs.h>;
/Users/smherwig/github/lua-mnt/src/mnt.c:154:    struct statvfs  sbuf;
/Users/smherwig/github/lua-mnt/src/mnt.c:157:    ret = statvfs(path, &sbuf);

Thursday, September 27, 2012

Producer-Consumer Model with Python

In this article, we investigate the producer-consumer model in Python. The producer-consumer model describes a situation where one or more players (the producers) are responsible for placing data into a buffer, and one or more players (the consumers) are responsible for removing data from that buffer. The chief hurdles to this arrangement are
  1. Ensuring the producer(s) do not add more data than the buffer can hold.
  2. Ensuring the consumer(s) do not take from an empty buffer.
  3. Ensuring the activities of all players are synchronized.

In Python, the combined use of the Queue and threading modules provides an elegant way to approach the problem that abstracts away many of the details. In particular, the Queue.Queue datatype ensures synchronized access to its data, freeing the user from having to use a locking scheme.

The program below is slightly redundant in that we specify the consumer threads as daemons, but then indirectly wait for them to complete. I will explain this in a second. First, let's talk about the purpose of daemon threads.

The Python documentation states, "The entire Python program exits when no alive non-daemon threads are left." A clearer phrasing is, "The program exits when all non-daemon threads have completed." By default, all threads (including the 'main' thread) are non-daemon. By specifying that the consumers are daemons, we are saying that the program can terminate even though the consumers have not completed. In many programs (think GUIs) this is the desired behavior. We, however, also delay the 'main' thread from exiting until the producers have stopped and the queue is empty, which effectively waits for the consumers to complete.

The example program below lets the user spawn an arbitrary number of producer and consumer threads. Each producer thread generates random pairs of integers and places each pair in a queue. The consumers remove the pairs from the queue and compute their greatest common divisor.

gcd_processor.py


#!/usr/bin/env python

from optparse import OptionParser
import Queue
import random
import sys
import threading
import time

data_queue = Queue.Queue()
stdout_mutex = threading.Lock()

def gcd(a, b):
    while b != 0:
        a, b = b, a % b
    return a

def consumer(idnum):
    while True:
        try:
            data = data_queue.get(block=False)
        except Queue.Empty:
            pass
        else:
            with stdout_mutex:
                print('\tconsumer %d: computed gcd(%d, %d) = %d' % \
                        (idnum, data[0], data[1], gcd(data[0], data[1])))
        time.sleep(0.1)

def producer(idnum, count):
    for i in range(count):
        a, b  = random.randint(1, sys.maxint), random.randint(1, sys.maxint)
        with stdout_mutex:
            print('\tproducer %d: generated (%d, %d)' % (idnum, a, b))
        data_queue.put((a, b))
        time.sleep(0.1)


if __name__ == '__main__':
    usage = '''usage: %prog [options]
   
Demonstrate producer-consumer model.

Each producer generates a sequence of integer pairs.  The integer pair is
placed on a global queue.  Each consumer removes pairs from the queue and
computes the greatest common divisor for that pair.
'''
    version = '%prog 1.0'
    parser = OptionParser(usage=usage, version=version)
    parser.add_option('-c', '--num-consumers', dest='num_consumers', 
            type='int', default=2, 
            help='number of consumer threads')
    parser.add_option('-p', '--num-producers', dest='num_producers',
            type='int', default=4,
            help='number of producer threads')
    parser.add_option('-n', '--num-integer-pairs', dest='num_integer_pairs',
            type='int', default=10,
            help='the number of integer pairs each producer generates')
    (options, args) = parser.parse_args()

    print('[*] beginning main thread')
    for i in range(options.num_consumers):
        t = threading.Thread(target=consumer, args=(i, ))
        t.daemon = True
        t.start()

    joinable = []
    for i in range(options.num_producers):
        t = threading.Thread(target=producer,
                args=(i, options.num_integer_pairs));
        joinable.append(t)
        t.start()

    # wait for all of the producer threads to finish
    for thread in joinable: thread.join()
    with stdout_mutex:
        print('[*] all producer threads finished')

    # wait for all of the consumer threads to finish
    while not data_queue.empty(): pass
    with stdout_mutex:
        print('[*] all consumer threads finished')
    with stdout_mutex:
        print('[*] main thread exited')

Below is a sample run of the program with two consumers and three producers, where each producer produces four integer pairs.

$ ./gcd_processor.py -c 2 -p 3 -n 4
[*] beginning main thread
 producer 0: generated (8650083748717510955, 7042817999880859144)
 producer 1: generated (8411030783605146193, 5649382909553114176)
 producer 2: generated (7598317520065638838, 1583073951150684026)
 consumer 1: computed gcd(8650083748717510955, 7042817999880859144) = 1
 producer 2: generated (9132283067627759945, 5550586898968097650)
 consumer 0: computed gcd(8411030783605146193, 5649382909553114176) = 1
 producer 0: generated (941876034000530679, 2254827834336877783)
 producer 1: generated (3831435553181605295, 6532409849491920585)
 producer 1: generated (89084273097138564, 7529330907292728475)
 producer 0: generated (810343105420840002, 7098464443848996296)
 producer 2: generated (1459983564619067730, 7336893082805612268)
 consumer 0: computed gcd(7598317520065638838, 1583073951150684026) = 2
 consumer 1: computed gcd(9132283067627759945, 5550586898968097650) = 5
 producer 0: generated (5748866744760111634, 4998219244374494451)
 consumer 0: computed gcd(941876034000530679, 2254827834336877783) = 1
 producer 1: generated (8257193796991062027, 218310341014437559)
 consumer 1: computed gcd(3831435553181605295, 6532409849491920585) = 5
 producer 2: generated (4387473705826720899, 3521353578419199591)
 consumer 0: computed gcd(89084273097138564, 7529330907292728475) = 1
 consumer 1: computed gcd(810343105420840002, 7098464443848996296) = 2
[*] all producer threads finished
 consumer 0: computed gcd(1459983564619067730, 7336893082805612268) = 6
 consumer 1: computed gcd(5748866744760111634, 4998219244374494451) = 1
 consumer 1: computed gcd(8257193796991062027, 218310341014437559) = 1
 consumer 0: computed gcd(4387473705826720899, 3521353578419199591) = 9
[*] all consumer threads finished
[*] main thread exited
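
As an aside, the `while not data_queue.empty(): pass` wait at the end is a busy loop; Queue also offers a blocking alternative via task_done() and join(). A small standalone sketch of that pattern (not a drop-in change to the program above; the import dance keeps it runnable under Python 2 or 3):

```python
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

q = queue.Queue()
results = []

def consumer():
    while True:
        item = q.get()      # blocks until an item arrives; no polling
        results.append(item * item)
        q.task_done()       # signal this item is fully processed

t = threading.Thread(target=consumer)
t.daemon = True             # let the program exit with it still running
t.start()

for n in range(5):
    q.put(n)
q.join()                    # blocks until every put() has a task_done()
print(sorted(results))      # [0, 1, 4, 9, 16]
```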

Sunday, September 23, 2012

Threading with Python

In this blog, we adapt the wget_fork.py program from the previous article to spawn threads to download multiple webpages, rather than create a new process for each download. This article uses the threading chapter of Programming Python by Mark Lutz as inspiration. The examples, though hackneyed, are my own.

Python has two modules devoted to threading: a low-level module called thread, and a high-level module, built on thread, called threading. In this first example, we use thread.start_new_thread() to spawn threads. Each thread downloads a single URL. In order to coordinate access to stdout among the threads, we create a global mutex called stdout_mutex, which protects access to that shared stream. We also create a list of mutexes called exit_mutexes, which provides a mechanism for each thread to signal that it is complete. main() can thereby monitor this list, implementing a rudimentary join scheme. Mutexes were chosen for this implementation of join; a list of booleans would serve just as well.

wget_thread.py


#!/usr/bin/env python

import os
import sys 
import thread
import time
import urlparse
import urllib

stdout_mutex = thread.allocate_lock()
exit_mutexes = []  

def download(tid, url):
    global stdout_mutex
    global exit_mutexes

    # craft the output filename
    o = urlparse.urlparse(url)
    filename = o.netloc + o.path
    filename = filename.replace(os.path.sep, '__')
    
    # print message to sdout (could use 'with stdout_mutex' instead)
    stdout_mutex.acquire()
    print('[*] saving %s to %s' % (url, filename))
    stdout_mutex.release()

    # retrieve the file
    try:
        urllib.urlretrieve(url, filename)
    except (IOError, urllib.ContentTooShortError), e:
        stdout_mutex.acquire()
        print('[-] ' + str(e))
        stdout_mutex.release()

    # mark thread as complete
    exit_mutexes[tid].acquire()

def main(argv):
    global stdout_mutex
    global exit_mutexes
    
    # spawn the downloader threads
    for i, url in enumerate(argv[1:]):
        exit_mutexes.append(thread.allocate_lock())
        thread.start_new_thread(download, (i, url))

    # wait for each thread to complete
    while not all(mutex.locked() for mutex in exit_mutexes): time.sleep(1)
    print('main thread exiting')

if __name__ == '__main__':
    main(sys.argv)

The following shows a sample output from the program.

$ ./wget_thread.py http://smherwig.org http://python.org
[*] saving http://smherwig.org to smherwig.org
[*] saving http://python.org to python.org
main thread exiting

In this example, we code the same program using the threading module. The threading module allows you to subclass threading.Thread, thereby allowing a thread to contain state. The run() method is somewhat comparable to the callable that you would pass to thread.start_new_thread. The run method is invoked when you call the class's start() method.

With the threading module, we no longer have to implement our own join, but can instead reap threads with the threading.Thread's join() method.

wget_threading.py


#!/usr/bin/env python

__metaclass__ = type # new-style classes

import os
import sys 
import threading
import urllib
import urlparse

class Downloader(threading.Thread):
    def __init__(self, tid, url, stdout_mutex):
        threading.Thread.__init__(self)
        self.tid = tid 
        self.url = url 
        self.stdout_mutex = stdout_mutex

    def run(self):
        # craft the output filename
        o = urlparse.urlparse(self.url)
        filename = o.netloc + o.path
        filename = filename.replace(os.path.sep, '__')
        with self.stdout_mutex:
            print('[*] saving %s to %s' % (self.url, filename))
        # retrieve the file
        try: 
            urllib.urlretrieve(self.url, filename)
        except (IOError, urllib.ContentTooShortError), e:
            with self.stdout_mutex:
                print('[-] ' + str(e))

def main(argv):
    # create mutex for stdout access (same as thread.allocate_lock())
    stdout_mutex = threading.Lock()
    threads = []  
    # spawn the downloader threads
    for i, url in enumerate(argv[1:]):
        thread = Downloader(i, url, stdout_mutex)
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()   # wait for thread exit
    print('main thread exiting')

if __name__ == '__main__':
    main(sys.argv)

Our last example takes advantage of the fact that threading.Thread's constructor can be passed a callable and a tuple of arguments. If our thread is relatively simple, such a constructor might save us from having to write our own subclass. Notice that this use of threading.Thread is quite similar to the semantics of thread.start_new_thread().

wget_threading2.py


#!/usr/bin/env python

import os
import sys 
import threading
import urllib
import urlparse

def download(tid, url, stdout_mutex):
    o = urlparse.urlparse(url)
    filename = o.netloc + o.path
    filename = filename.replace(os.path.sep, '__')
    with stdout_mutex:
        print('[*] saving %s to %s' % (url, filename))
    # retrieve the file
    try: 
        urllib.urlretrieve(url, filename)
    except (IOError, urllib.ContentTooShortError), e:
        with stdout_mutex:
            print('[-] ' + str(e))

def main(argv):
    # create mutex for stdout access (same as thread.allocate_lock())
    stdout_mutex = threading.Lock()
    threads = []  
    # spawn the downloader threads
    for i, url in enumerate(argv[1:]):
        thread = threading.Thread(target=download, args=(i, url, stdout_mutex))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()   # wait for thread exit
    print('main thread exiting')

if __name__ == '__main__':
    main(sys.argv)

Thursday, September 20, 2012

Fork Exec Wait with Python

The following program demonstrates the use of os.fork() and os.wait() in Python. The program forks several processes, each of which counts to a number. The parent process waits for each child process to end, and prints that process's return value.

#!/usr/bin/env python

from optparse import OptionParser
import os
import sys 
import time

def worker(count):
    for i in range(count):
        time.sleep(1)
        print '[%s] => %s' % (os.getpid(), i)

def boss(num_workers, count):
    child_pids = []
    for i in range(num_workers):
        pid = os.fork()
        if pid == 0:
            worker(count)
            os._exit(i)
        else:
            child_pids.append(pid)
    for pid in child_pids:
        pid, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            print 'parent: child with pid %d exited with value %d' % \
                (pid, os.WEXITSTATUS(status))

if __name__ == '__main__':
    usage = '''usage: %prog [options]
   
Demonstrate fork and wait system calls.'''
    version = '%prog 1.0'
    parser = OptionParser(usage=usage, version=version)
    parser.add_option('-n', '--num-workers', dest='num_workers', type='int',
            default=5, 
            help='number of child worker processes to fork (default=5)')
    parser.add_option('-c', '--toil-count', dest='toil_count', type='int', 
            default=10,
            help='the number each worker process must count to (default=10)')
    (options, args) = parser.parse_args()
    boss(options.num_workers, options.toil_count)

A sample output of running the program with seven worker processes, each of which counts to three, is shown below.

$ ./forkwait.py -n 7 -c 3
[692] => 0
[695] => 0
[694] => 0
[693] => 0
[697] => 0
[696] => 0
[698] => 0
[695] => 1
[692] => 1
[693] => 1
[697] => 1
[698] => 1
[694] => 1
[696] => 1
[695] => 2
[692] => 2
[697] => 2
[693] => 2
[698] => 2
[694] => 2
[696] => 2
parent: child with pid 692 exited with value 0
parent: child with pid 693 exited with value 1
parent: child with pid 694 exited with value 2
parent: child with pid 695 exited with value 3
parent: child with pid 696 exited with value 4
parent: child with pid 697 exited with value 5
parent: child with pid 698 exited with value 6
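
Note that os.waitpid() lets the parent reap its children in a caller-chosen order. If you don't care about order, os.wait() blocks until any child terminates and returns that child's pid and status. A minimal sketch (the exit values are arbitrary):

```python
#!/usr/bin/env python

import os

# Fork three children, each exiting immediately with a distinct status.
statuses = {}
for i in range(3):
    pid = os.fork()
    if pid == 0:
        os._exit(i)

# os.wait() returns (pid, status) for whichever child exits first,
# so the children may be reaped in any order.
for _ in range(3):
    pid, status = os.wait()
    if os.WIFEXITED(status):
        statuses[pid] = os.WEXITSTATUS(status)

print(sorted(statuses.values()))
```

Running this prints [0, 1, 2] once all three children have been collected, regardless of which child happened to exit first.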

Our next program is a bit more interesting. In this program, the user specifies a list of webpages to download. For each webpage, the program forks a new process and uses os.execlp() to execute wget.

#!/usr/bin/env python

from optparse import OptionParser
import os
import sys 
import time
import urlparse

def wget(webpages):
    child_pids = []
    for webpage in webpages:
        pid = os.fork()
        if pid == 0:
            # generate output name to prevent many duplicate index.html's
            o = urlparse.urlparse(webpage)
            output = o.netloc + o.path
            output = output.replace(os.path.sep, '__')
            print '[*] saving %s to %s' % (webpage, output)
            os.execlp('wget', 'wget', '--quiet', '--output-document', output,
                    webpage)
            assert False, 'error starting wget'
        else:
            child_pids.append(pid)
    for pid in child_pids:
        pid, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            print 'parent: child with pid %d exited with value %d' % \
                (pid, os.WEXITSTATUS(status))

if __name__ == '__main__':
    usage = '''usage: %prog [url ...]
  
Retrieve one or more webpages.
'''
    version = '%prog 1.0'
    parser = OptionParser(usage=usage, version=version)
    (options, args) = parser.parse_args()
    wget(args)

In the sample run below, we attempt to download the homepages of smherwig.org and lua.org, as well as the invalid URL foo.

$ ./forkwget.py http://smherwig.org/index.html http://lua.org foo
[*] saving http://smherwig.org/index.html to smherwig.org__index.html
[*] saving http://lua.org to lua.org
[*] saving foo to foo
parent: child with pid 1153 exited with value 0
parent: child with pid 1154 exited with value 0
parent: child with pid 1155 exited with value 4
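
One detail worth highlighting: os.execlp() replaces the child's process image and never returns on success, so any code after the call runs only if the exec fails (in Python, the failure surfaces as an OSError). The child should then terminate with os._exit() so it never falls back into the parent's logic. A minimal sketch, using a deliberately nonexistent program name:

```python
#!/usr/bin/env python

import os

pid = os.fork()
if pid == 0:
    try:
        # This program name should not exist, so the exec fails.
        os.execlp('no-such-program-xyz', 'no-such-program-xyz')
    except OSError:
        pass
    # Reached only because the exec failed; 127 is the shell's
    # conventional 'command not found' status.
    os._exit(127)
else:
    pid, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        print('child exited with value %d' % os.WEXITSTATUS(status))
```

This is also why the wget program above places an assert after os.execlp(): reaching that line at all means the exec failed.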

Friday, February 17, 2012

Mac OS X 10.7.3 Wifi Issues

On February 1, I received the Apple 10.7.3 update for my iMac. This update seems to have brought with it a buggy wifi kernel module. That is just conjecture on my part, but the nature of the problem, coupled with the fact that it is unaffected by resetting configuration (e.g., network configuration, PRAM), lends the idea a certain level of credence.

From a user's perspective, the bugs cause:

1. Slow data transfer rates
2. Inability to rejoin a network when returning from sleep
3. Frequent, intermittent wifi connection drops

When the wifi connection drops, the system log contains numerous, repeated error messages of the form 'PM kernel: BSSID changed to <MAC address of your wireless router>'.

I decided to simply roll back to 10.7.2. This was a relatively easy procedure: press Command-R at boot and, when prompted, select 'Reinstall Lion'. I didn't lose any data when rolling back, but it's a good idea to back up your more important files before proceeding.

So far, so good. Perhaps when 10.7.4 comes along, I'll consider installing the combo update.

Wednesday, February 8, 2012

Setting up x86 Android with Eclipse


In addition to using emulators and real devices to test your Android apps, you also have the option of running x86 Android in VirtualBox.  In this post, I describe how to set up x86 Android in VirtualBox and use Eclipse to push apps onto the VM.  I am performing this installation on Mac OS X.




I. CREATE THE VIRTUAL MACHINE


    1. Grab the iso from www.android-x86.org.


        On the left-hand side there is a link to downloads.  I am
        currently targeting Froyo, so I downloaded
        android-x86-2.2-r2-eeepc.iso.


    2. Create a new VM in VirtualBox


        $ open -a VirtualBox &


        Click 'New'


        For 'OS Type', enter:
            Operating System: Linux
            Version: Other


        The rest of the configuration is really up to you.  The one
        exception is that you'll want to create a bootable hard
        disk.


    3. Mount the ISO


        Now that you've created a VM, you want to insert the ISO
        disk into its virtual DVD tray:


             Settings > Storage section > IDE Controller


        You should see a CD icon that says 'Empty'.  Select the CD
        entry and another CD icon will appear on the right of the
        screen.  Click this latter icon and select 'Choose a
        virtual CD/DVD disk file...'.  Enter the path to your .iso
        in the dialog box that pops up.


    4. Install x86 Android onto your VM


        When you first boot up, you will see a screen with a few
        options.  Choose 'Installation - Install Android x86 to
        hard disk'.


        Create a new primary partition that is the full size of
        your virtual hard disk.  You will want to select the
        'Bootable' option, and then actually create the partition
        by selecting 'Write'.  The partition created is sda1.


        Select 'Quit'.  


        Now you must create a filesystem on the sda1 partition;
        choose ext3.  Also, choose to install the GRUB boot
        loader.  Finally, for development purposes, specify that
        /system be mounted read-write.


        You can now run the VM.


    5. Unmount the .iso


        After installation, shut down the VM.  Go into settings and 
        unmount the iso.




II. CONFIGURE THE VM'S DISPLAY TO REPLICATE A PHONE'S DISPLAY


    1. Define custom resolution modes


        $ VBoxManage \
        > setextradata "<vm-name>" "CustomVideoMode1" "320x480x16"
        
        $ VBoxManage \
        > setextradata "<vm-name>" "CustomVideoMode2" "640x960x16"
        
        $ VBoxManage \
        > setextradata "<vm-name>" "CustomVideoMode3" "480x720x16"


    2. Modify grub configuration to use these resolution modes


        Start the VM.  When the VM is up, hit <FN-OPTION-F1> to 
        switch to a terminal.  At the shell, remount the boot 
        partition to read-write:


            $ mount -o remount,rw /mnt


        Next, open the grub menu configuration for editing:


            $ vi /mnt/grub/menu.lst


        Create a new first entry by copying the current first entry 
        and changing it to:


         title Android_x86 2.2.2 (Phone)
         quiet root=/dev/ram0 androidboot_hardware=generic_x86 ...
         acpi_sleep=s3_bios,s3_mode DPI=240 UVESA_MODE=480x720 ...
         SRC=/android-2.3-RC1
       
         I'm using '...' to denote a continuation of the line.


    3. Shutdown the VM




III. PRESENTING VM AS A DEVICE TO ECLIPSE


    1. Get the IP address of the VM


        Start the VM and hit <FN-OPTION-F1> to drop to a shell.
        Enter the command 'netcfg', and record the IP address.


    2. Connect to VM with adb


            $ adb connect <vm's ip>