The Shell is Like a Dishwasher

by Ville Laurikari on Thursday, May 21, 2009

Sometimes I’m just so glad I know shell.  I mean things like

find . -type f -print0 | xargs -0 -n 100 ls -1sk | sort -n | tail -10

For the uninitiated: the above command lists the ten largest files under the current working directory (including subdirectories).  How do the Windows power users do stuff like this?  PowerShell?  Python programs?  Can you even write one-liners in Python?

On a related note, it always pains me to watch people browse to a file in a file manager GUI.  You click through a bunch of folders and stare at each list of file names for a second before finding the right one, click on it, and by the time you’ve found your file the coffee’s gone cold.  With the shell, it’s “sTABnTABkTABsTABn” and you’re done in a fraction of the time.  Hot coffee.  Why can’t you do that in the file browser GUI?

The shell is like a dishwasher.  When I was just a poor undergrad CS student living off macaroni and ketchup “borrowed” from my roommate, I couldn’t afford a dishwasher.  There were more important things to spend money on, such as beer, science fiction, and macaroni.  At the time, washing dishes the old-fashioned way didn’t seem so bad.  In fact, I coped without a dishwasher for a long time after I would have been able to just buy one.  Now I wouldn’t dream of doing all my dishes manually.


{ 7 comments }

Risto Saarelma May 21, 2009 at 08:31

Python programs? Can you even write one-liners in Python?

Sure you can:

$ python -c "import os; import os.path; a = reduce(lambda x, y : x+y, [[(lambda x : (os.path.exists(x) and os.path.getsize(x), x))(os.path.join(root, file)) for file in files] for root, dirs, files in os.walk('.')], []); a.sort(); print '\n'.join(['%d %s' % (size, name) for (size, name) in a[-10:]])"

File sizes are in bytes, not 1k blocks though.
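
The one-liner above is Python 2: it relies on the builtin reduce and the print statement. A minimal Python 3 sketch of the same idea might look like the following. The name `largest_files` and its parameters are made up for illustration, and unlike the one-liner it skips files whose size cannot be read instead of listing them with size 0.

```python
import heapq
import os


def largest_files(top=".", count=10):
    """Return up to `count` (size_in_bytes, path) pairs, smallest first.

    Hypothetical helper for illustration; not part of any library.
    """
    sized = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            try:
                sized.append((os.path.getsize(path), path))
            except OSError:
                # Broken symlinks or files that vanish mid-walk are skipped.
                continue
    # nlargest returns the largest first; reverse it so the output
    # order matches `sort -n | tail`.
    return heapq.nlargest(count, sized)[::-1]
```

Looping over `largest_files()` and printing each size and path gives roughly the same listing as the shell pipeline, with sizes still in bytes rather than 1k blocks.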

Ville Laurikari May 21, 2009 at 10:50

It would be nice if some Python module would provide a set of higher level file system operations, similar to what the typical shell utilities provide, in order to make this sort of code significantly shorter.

Perhaps something like that exists already?

Risto Saarelma May 21, 2009 at 12:26

Getting a nice list of files under a directory was probably the ugliest part of working with the standard library, so yes, there is room for better stuff. Even with a better library there would still be a syntactic problem, though: the functions appear in the reverse of their order in the pipe, so op1 input | op2 | op3 | op4 becomes op4(op3(op2(op1(input)))).
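
To make the ordering problem concrete, here is a small Python 3 sketch. The `pipeline` helper is a made-up name, not something from the standard library; it only shows that an ordinary function can restore left-to-right order without any special syntax.

```python
from functools import reduce


def pipeline(value, *stages):
    """Feed `value` through each stage in left-to-right order.

    Hypothetical helper for illustration only.
    """
    return reduce(lambda acc, stage: stage(acc), stages, value)


# Nested calls read inside-out: the last step is written first.
nested = sum(sorted(map(abs, [3, -7, 2]))[-2:])

# The helper lists the same steps in the order they actually run.
piped = pipeline(
    [3, -7, 2],
    lambda seq: map(abs, seq),
    sorted,
    lambda seq: seq[-2:],
    sum,
)

assert nested == piped == 10
```

Both expressions add up the two largest absolute values; only the reading order differs.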

Turns out there’s an evil trick which can make Python use a pipe-style syntax though:

import os
import os.path

def flatten(seqs):
    return reduce(lambda x, y: x + y, seqs, [])

def listfiles(path):
    return flatten([[os.path.join(root, file) for file in files]
                   for root, dirs, files in os.walk(path)])

def filesize(x):
    return os.path.exists(x) and os.path.getsize(x)

class Pipe:
    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, lhs):
        if callable(lhs):
            return Pipe(lambda x : self(lhs(x)))
        else:
            return self(lhs)

    def __call__(self, x):
        return self.fn(x)

find = Pipe(listfiles)

def pmap(fn):
    return Pipe(lambda seq : map(fn, seq))

sort = Pipe(sorted)

def tail(n = 10):
    return Pipe(lambda seq : seq[-n:])

def format(fmt):
    return Pipe(lambda seq : [fmt % x for x in seq])

lines = Pipe(lambda seq : '\n'.join(seq))

print find('.') | pmap(lambda x : (filesize(x), x)) | sort | tail(10) | format("%d %s") | lines

Risto Saarelma May 21, 2009 at 15:51

Actually, you don’t even need to wrap the pipe parts into the operator overloaded objects, just start the expression with something that keeps wrapping the results of the operations:

import os
import os.path

def curry(fn, *args):
    return lambda *args2 : fn(*(args + args2))

class StartPipe:
    def __init__(self, input, fn=None):
        if fn:
            self.fn = fn
        else:
            self.fn = lambda x : x
        self.input = input

    def __or__(self, rhs):
        if callable(rhs):
            return StartPipe(self.run(), lambda x : rhs(x))
        else:
            # Terminate and evaluate the final fn with any non-callable value.
            return self.run()

    def run(self):
        return self.fn(self.input)

def find(path):
    for root, dirs, files in os.walk(path):
        for file in files:
            yield os.path.join(root, file)

def filesize(x):
    return os.path.exists(x) and os.path.getsize(x)

def tail(n = 10):
    return lambda seq : seq[-n:]

def format(fmt):
    return lambda seq : [fmt % x for x in seq]

lines = lambda seq : '\n'.join(seq)

print StartPipe('.') | find | curry(map, lambda x : (filesize(x), x)) | sorted | tail(10) | format("%d %s") | lines | 0

Risto Saarelma May 21, 2009 at 15:56

And it looks like I still haven’t figured out how to format multiline code in comments…

Ville Laurikari May 21, 2009 at 20:57

Risto, that’s pretty cool stuff. This makes it really easy to combine generators and normal functions over values. Much more elegant than your first version!
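
Here is a rough Python 3 sketch of what I mean by mixing generators and plain functions. It is a simplification under my own assumptions, not a port of the code above: `Chain`, `walk_files`, `with_size` and `ten_largest` are made-up names, and the result is read from a `.value` attribute instead of being terminated with a non-callable value.

```python
import os
from functools import partial


class Chain:
    """Wrap a value so that `|` applies the next callable to it."""

    def __init__(self, value):
        self.value = value

    def __or__(self, stage):
        # Each `|` calls the stage right away and rewraps its result.
        return Chain(stage(self.value))


def walk_files(path):
    """Generator stage: lazily yield every file path under `path`."""
    for root, _dirs, files in os.walk(path):
        for name in files:
            yield os.path.join(root, name)


def with_size(path):
    """Plain function stage: pair a path with its size in bytes."""
    size = os.path.getsize(path) if os.path.exists(path) else 0
    return (size, path)


def tail(n=10):
    """Build a stage that keeps the last `n` items of a list."""
    return lambda seq: seq[-n:]


def ten_largest(path="."):
    # The generator, the lazy map, and the builtin sorted all plug
    # into the same chain without any wrapping of their own.
    chain = Chain(path) | walk_files | partial(map, with_size) | sorted | tail(10)
    return chain.value
```

Any callable taking one argument can serve as a stage, so the generator feeds straight into map and sorted.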

For code, it seems best to just surround it with <pre> and </pre>.

Lasse Laurila May 22, 2009 at 12:19

The Windows Explorer equivalent for “sTABnTABkTABsTABn” is “sENTERnENTERkENTERsENTERn”. I’m pretty sure they have something similar in Konqueror or Nautilus or whatever. There was a pretty handy way of doing the file size thing in Windows XP’s Explorer, but it doesn’t seem to work in Vista.
