Stop writing shell scripts!

Over the course of my career I have repeatedly run into large (100 lines or more) scripts written in shell language. I’ve slowly, perhaps unconsciously, developed negative feelings for such shell scripts. These days, I start to groan a little when I open some interesting software and find it overrun with shell syntax. I think about the pain it must have been to write the logic and control flow in shell language, and I immediately know the software will be difficult for me to understand, because that seems to be an inherent property of shell. Now, don’t get me wrong: I love the command line and I use it every day – in fact, I consider myself a command line enthusiast. I’ve also written my share of shell scripts (and still find them useful). So why don’t I love shell scripts? It’s that I’ve finally started to realize that for large, complex applications, shell scripting can cause major inefficiencies. I’ve come to a conclusion: Stop using shell scripts! Here’s my attempt to explain exactly why.

Maybe it’s a bit harsh to say stop using shell scripts completely – shell scripts have their uses. What I really mean is, use the right tool for the job… and if you have to think much about it, then the right tool is probably not a shell script. My rule of thumb is this: if it requires more than 1 function or any type of control structure (like a loop, or if/then/else statement), you should probably use Python instead.

Why?

I thought it was elegantly stated in an anonymous answer to this Stack Overflow question:

The shell makes common and simple actions really simple, at the expense of making more complex things much much more complex.

Typically, a small shell script will be shorter and simpler than the corresponding python program, but the python program will tend to gracefully accept modifications, whereas the shell script will tend to get less and less maintainable as code is added.

This has the consequence that for optimal day-to-day productivity you need shell-scripting, but you should use it mostly for throwaway scripts, and use python everywhere else.

When I read this answer, I wanted to scream YES! Finally, someone has put eloquent words on my thoughts. Now I’ll add my own slightly-less-eloquent justifications:

Shell scripting language is commonly considered difficult to write.
As a corollary to the above, shell scripts are also generally viewed as difficult to read.
Shell scripting lacks the features in a full-service language (read: Python).

In a bit more detail: shell syntax is frequently unintuitive. For example, this is how you do a string replacement in shell: result=${foobar/foo/bar}. It replaces the first “foo” in the value of the variable foobar with “bar” – but this is cryptic if you aren’t using it frequently, and not immediately understood by the uninitiated. And even though I’m an old hand with vim and sed and appreciate the concise replacement syntax, I can still never remember the exact placement of the braces in a bash script. More examples: for and while loops have a unique syntax that I have to look up every time I use them (with a ; in a certain spot, optional parentheses, and the keyword done at the end). Function and script parameters are accessed by number ($1 is the first argument, etc.), which makes them hard to interpret. For more evidence, just try to make sense of a long bash script.
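For comparison, here is a rough sketch of how the same three things (a string replacement, a loop, and script arguments) might look in Python. The helper names, the input_file argument, and the --repeat option are made up for illustration; they are assumptions of this example, not part of any real tool.

```python
import argparse


def replace_first(text: str, old: str, new: str) -> str:
    """Replace only the first occurrence, like bash's ${var/old/new}."""
    return text.replace(old, new, 1)


def parse_args(argv):
    """Give script arguments names instead of positions like $1 and $2."""
    parser = argparse.ArgumentParser(description="Hypothetical example script")
    parser.add_argument("input_file", help="what a shell script would call $1")
    parser.add_argument("--repeat", type=int, default=1, help="number of passes")
    return parser.parse_args(argv)


foobar = "foofoo"
result = replace_first(foobar, "foo", "bar")
print(result)  # barfoo

# A hard-coded argument list stands in for sys.argv[1:] in this sketch.
args = parse_args(["reads.txt", "--repeat", "2"])
for i in range(args.repeat):
    print(f"pass {i + 1} over {args.input_file}")
```

It is longer than the shell one-liner, but each piece has a name that says what it does, which is the trade-off I am arguing for.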

Yes, syntax quibbles are subjective; but features are not. Shell scripting isn’t intended to be a feature-rich language, so it doesn’t make sense to implement complex programs using it. For example, shell scripts have no exception handling and produce cryptic error messages. To make matters worse, by default, if one command fails, the script continues (probably now with faulty data). Some of these things can be addressed with some effort (for example, set -e makes a script exit when a command fails), but all too often they are not. And the solutions are typically not as good as a basic Python implementation.
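To make the error-handling point concrete, here is a minimal sketch using Python's standard subprocess module. The function names run_step and run_all are hypothetical, and the commands are placeholders that call the Python interpreter itself so the example runs anywhere; a real script would run its own tools instead.

```python
import subprocess
import sys


def run_step(command):
    """Run one command; raise CalledProcessError if it exits non-zero."""
    completed = subprocess.run(
        command, check=True, capture_output=True, text=True
    )
    return completed.stdout


def run_all(commands):
    """Run commands in order and stop at the first failure."""
    outputs = []
    for command in commands:
        try:
            outputs.append(run_step(command))
        except subprocess.CalledProcessError as err:
            # Report which step failed, then re-raise so nothing runs after it.
            print(f"step failed (exit {err.returncode}): {command}", file=sys.stderr)
            raise
    return outputs


# Placeholder commands: each one just asks Python to print a word.
steps = [
    [sys.executable, "-c", "print('first')"],
    [sys.executable, "-c", "print('second')"],
]
print(run_all(steps))
```

With check=True a failing command raises an exception instead of being silently ignored, so later steps never see faulty data unless you explicitly choose to catch the error and carry on.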

Using shell for bioinformatics pipelines

In bioinformatics, pipelines are often built as shell scripts. I, too, have written my share of these. While there has been an explosion of bioinformatics pipeline framework development, many pipelines are still written as shell scripts. Bioinformaticians are attracted to shell by its apparent simplicity… as quoted above, “the shell makes common and simple actions really simple” – and what could be simpler than a string of commands? Nothing beats that simplicity, and the shell feels perfect. But the problem is that pipelines have a way of growing and expanding far past the original vision. As data types, sizes, and complexity change, authors end up either adding shell control structures or duplicating scripts (both common in bioinformatics pipeline development). Eventually, this leads to monolithic, unmaintainable, incomprehensible, beastly shell scripts that nobody wants to touch (including, often, the author). My growing frustration with these scripts is one of the things that led me to develop Pypiper (and its counterpart Looper), which aims to make it really easy to write better pipelines in Python.

To conclude: shell scripts are fantastic… for small, throwaway analyses or just stringing together a few commands. But to build a maintainable, understandable, and scalable piece of software, just do the world a favor and start with Python from the beginning. I’ll close by repeating my rule of thumb:

If it requires more than 1 function or any type of control structure (like a loop, or if/then/else statement), you should probably use Python instead of shell.