make workshop

Intro and setup

computer setup

You need make and Python available. Two options:

  1. Download and run on a local computer (only recommended if it’s Linux/OSX).
  2. Log in to FarmShare using MobaXterm.

Bitly to curl in the files:

curl -L >

What is make

man make

Make is an automation tool for *NIX systems.

Originally written to facilitate compiling software.

Organizes and executes complex compilation in a flexible and readable way

conceptual model of how make works

Make as reproducible and scalable research tool

  • like a shell script, a written-out set of actions to carry out
  • incremental processing, so you can stop in the middle and resume
  • detects what needs to be updated, so changes to early pre-processing propagate to the derived steps
  • allows other people, like your future self, to repeat analyses

Make is old, why use it?

Other tools are more powerful for pipelines, why use make?

  • Makefiles are ubiquitous on computing systems
  • Software often gets built with make; sometimes you just need to dig in and re-wire something
  • It works … usually

I use Makefiles to launch more complex tools.

example of make

GitHub for bwa, check out the Makefile


  • Introduce the concepts of make
  • Understand how to run make, run debugging in make
  • Build a simple pipeline to analyze book text
  • Refine the pipeline with more features, to show off

Testing things is the best way to learn! Ask questions if you can think of something it should be able to do, maybe we can find out how to do it.

beginning to build a makefile / pipeline

To orient yourself: the make-lesson archive contains a directory structure. Once unzipped, the books directory contains the text of the books.

The directory also has three Python scripts that do a couple of things.

python books/sierra.txt sierra.dat
python sierra.dat

shell scripts for (re)producibility

python books/sierra.txt sierra.dat
python books/last.txt last.dat
python sierra.dat last.dat

Limitations of a plain shell script:

  • can’t adapt to input
  • can’t keep track of what is already up to date, so it can’t do incremental work

really really basic Makefile

Make a text file called exactly Makefile

a_file:
	date > a_file

Run it like so

make a_file
make a_file -nd | less

rules define how to make targets from dependencies using actions

TABs are critical: every action line must begin with a TAB character, not spaces
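The general shape of a rule can be sketched as below. The file names output.txt and input.txt are placeholders for illustration only; they are not part of the lesson archive.

```make
# target : dependencies
# <TAB>action
#
# Rebuild output.txt when it is missing or when input.txt is newer than it.
# The action line must start with a TAB character.
output.txt : input.txt
	sort input.txt > output.txt
```

If the TAB is replaced by spaces, GNU make stops with a "missing separator" error.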

now with two rules

a_file:
	date > a_file

another_file: a_file
	cat a_file > another_file
	date >> another_file
	cat a_file >> another_file

Run with

make another_file
make another_file -dn | less

return to books

sierra.dat : books/sierra.txt
    python books/sierra.txt sierra.dat

last.dat : books/last.txt
    python books/last.txt last.dat

results.txt : sierra.dat
    python sierra.dat > results.txt

phony rules, and comments

disp_results : results.txt
	less results.txt

make disp_results
mkdir disp_results
make disp_results
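After the mkdir, something named disp_results exists on disk, so make can decide the target is already up to date and skip the action. A minimal sketch of the usual fix, meant to sit in the books Makefile next to the rule that builds results.txt:

```make
# disp_results names an action, not a file to create.
# Declaring it phony makes make run the action even when a file
# or directory called disp_results already exists.
.PHONY : disp_results

disp_results : results.txt
	less results.txt
```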

touch empty_file creates an empty file, or updates modification time

the usual phony rule: all

.PHONY: all

all: results.txt

By default make runs the first target in the Makefile, so all conventionally goes first. The name all is only a convention; make does not treat it specially.

Another good one is:

.PHONY: all clean

all: results.txt

clean:
	rm -f *.dat

talking to yourself is totally normal

# starts a comment; a leading @ on an action line tells make not to echo that command before running it
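A small sketch of both features. The target name hello is made up for this example.

```make
# Lines starting with # are comments; make ignores them.
.PHONY : hello

hello :
	echo "make prints this command, then its output"
	@echo "make prints only the output of this one"
```

Running make hello shows the first echo command followed by its output, but only the output of the second.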

write another rule to process another book

write another rule to process n books

Rewrite with some of make’s automatic variables

sierra.dat : books/sierra.txt
    python $^ $@

results.txt : sierra.dat
    python $^ > $@
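For reference, a sketch of what the automatic variables used above expand to. The names demo.txt, one.txt and two.txt are placeholders, not files from the archive.

```make
# $@  the target of the rule
# $<  the first dependency
# $^  all dependencies, separated by spaces
demo.txt : one.txt two.txt
	@echo "target: $@"
	@echo "first dependency: $<"
	@echo "all dependencies: $^"
	cat $^ > $@
```

With one.txt and two.txt present, make demo.txt reports demo.txt as the target, one.txt as the first dependency, and "one.txt two.txt" as all dependencies.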

make a_file -d | less

scripts as dependencies

sierra.dat : books/sierra.txt
    python $< $@
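Listing the script itself as a dependency means the output is rebuilt when the code changes, not only when the data changes. A sketch of the idea, where count_words.py is a hypothetical stand-in for whichever script in the archive produces the .dat file:

```make
# count_words.py is a hypothetical script name used for illustration.
# Because the script is listed first, $^ expands to
# "count_words.py books/sierra.txt", and editing the script
# makes sierra.dat out of date.
sierra.dat : count_words.py books/sierra.txt
	python $^ $@
```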

functions

Functions are called by doing something like $(functionName arguments,here)

sierra.dat : books/sierra.txt whatever
    @echo $(word 1,$^)
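A few more built-in text functions, sketched under the assumption that GNU make is in use (the $(info ...) function is a GNU extension that prints while the Makefile is being read). The variables BOOK and WORDS and the target noop exist only for this example.

```make
BOOK = books/sierra.txt
WORDS = alpha beta gamma

# Strip the directory part: sierra.txt
$(info $(notdir $(BOOK)))
# Strip directory and suffix: sierra
$(info $(basename $(notdir $(BOOK))))
# Second word of a list: beta
$(info $(word 2,$(WORDS)))
# Number of words in a list: 3
$(info $(words $(WORDS)))
# Plain text substitution: sierra.dat
$(info $(subst .txt,.dat,$(notdir $(BOOK))))

# A do-nothing target so that make has something to run.
.PHONY : noop
noop : ;
```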

pipeline continued

pattern rules

% in the target is a wildcard; whatever it matches (the stem) is substituted for % in the dependencies

%.dat : books/%.txt
    python $< $@

results.txt : sierra.dat last.dat abyss.dat
    python $^ > $@

$* is a special automatic variable, available in the action, that holds the stem matched by %
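A sketch of how the stem flows through a pattern rule. To keep it runnable without the lesson's scripts, this uses wc as a stand-in action and a made-up .count extension; neither comes from the archive.

```make
# For the target sierra.count, % matches "sierra", so:
#   $* is sierra
#   $< is books/sierra.txt
#   $@ is sierra.count
%.count : books/%.txt
	@echo "stem=$*  input=$<  output=$@"
	wc -w $< > $@
```

Try it with make sierra.count, then again with another book name.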

define your own variables

ALL_THE_DAT = sierra.dat last.dat abyss.dat

%.dat : books/%.txt
    python $< $@

results.txt : ${ALL_THE_DAT}
    python $^ > $@

Shell variables require $$ to work!
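The reason is that make expands anything starting with a single $ before the shell ever sees the line. A minimal sketch, using a made-up target name list_books:

```make
# $$f reaches the shell as $f.
# A single $f would be expanded by make as the (empty) make variable f.
.PHONY : list_books
list_books :
	@for f in books/*.txt; do echo "found $$f"; done
```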

define your variables with functions

Can we define ALL_THE_DAT using a function?

ALL_THE_DAT=$(patsubst books/%.txt,%.dat,$(wildcard books/*.txt))

Slightly awkward: remember that make is a pull-paradigm pipeline, not push. You ask for a target and make works backwards to the inputs it needs.
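One way to check what the function call produces is a small helper target that echoes the variable. The target name show_dat is hypothetical.

```make
# Every books/NAME.txt that exists right now yields NAME.dat in the list.
ALL_THE_DAT = $(patsubst books/%.txt,%.dat,$(wildcard books/*.txt))

# Helper target: print what the variable expands to.
.PHONY : show_dat
show_dat :
	@echo $(ALL_THE_DAT)
```

Running make show_dat before and after dropping a new .txt file into books shows the list of targets growing with the inputs.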

add a book from Project Gutenberg


Dig through someone else’s makefile

check out bwa Makefile again