make workshop

Intro and setup

computer setup

You need make and python available

Options

  1. Download and run on a local computer - only recommended if it’s Linux/OSX
  2. Log in to FarmShare using MobaXterm:
ssh NETID@rice.stanford.edu

Download the lesson files with curl (Bitly short link):

curl -L bit.ly/36SQ1TI > make-lesson.zip

What is make

man make

Make is an automation tool for *NIX systems.

Originally written to facilitate compiling software.

Organizes and executes complex compilation in a flexible and readable way

conceptual model of how make works

Make as reproducible and scalable research tool

  • like a shell script, a written-out set of actions to carry out
  • incremental processing, so you can stop in the middle and resume
  • detects what needs to be updated, so changes to early pre-processing propagate to the steps derived from it
  • allows other people, like your future self, to repeat analyses
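
The update check behind these bullets can be sketched in plain shell. This is only an illustration of the idea, not how make is implemented: the update function, the file names, and the use of wc are all made up for the example, and the -nt file test is assumed to be available (it is in bash and most other shells).

```shell
# Work in a scratch directory so nothing existing is touched.
cd "$(mktemp -d)"

# Hypothetical helper: run ACTION only if TARGET is missing
# or older than DEPENDENCY.
update() {
    target=$1
    dependency=$2
    action=$3
    if [ ! -e "$target" ] || [ "$dependency" -nt "$target" ]; then
        sh -c "$action"
        echo "rebuilt $target"
    else
        echo "$target is up to date"
    fi
}

echo "one two three" > input.txt

first=$(update output.txt input.txt 'wc -w < input.txt > output.txt')
second=$(update output.txt input.txt 'wc -w < input.txt > output.txt')

# Back-date the output, as if the input had been edited since it was built.
touch -t 200001010000 output.txt
third=$(update output.txt input.txt 'wc -w < input.txt > output.txt')

printf '%s\n' "$first" "$second" "$third"
```

The first and third calls rebuild; the second does nothing. make applies the same kind of check to every rule in a Makefile.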

Make is old, why use?

Other tools are more powerful for pipelines, why use make?

  • Makefiles are ubiquitous on computing systems
  • Software often gets built with make; sometimes you just need to dig in and re-wire something
  • It works … usually

I use Makefiles to launch more complex tools.

example of make

https://github.com/lh3/bwa

GitHub repository for bwa; check out the Makefile.

objectives

  • Introduce the concepts of make
  • Understand how to run make, run debugging in make
  • Build a simple pipeline to analyze book text
  • Refine the pipeline with more features, to show off

Testing things is the best way to learn! If you can think of something make should be able to do, ask, and maybe we can find out how to do it.

beginning to build a makefile / pipeline

orientation to the make-lesson archive

make-lesson.zip contains a directory structure. Once unzipped, the books directory contains the text of several books.

The directory also has three Python scripts; the two used here are countwords.py and testzipf.py:

python countwords.py books/sierra.txt sierra.dat
python testzipf.py sierra.dat

shell scripts for (re)producibility

#!/bin/bash
python countwords.py books/sierra.txt sierra.dat
python countwords.py books/last.txt last.dat
python testzipf.py sierra.dat last.dat

Limitations of a plain shell script:

  • can’t adapt to new input without being edited
  • can’t keep track of what is already up to date, so it redoes everything instead of working incrementally

really really basic Makefile

Make a text file called exactly Makefile

a_file: 
    date > a_file

Run it like so

make a_file
make
make a_file -nd | less

rules define how to make targets from dependencies using actions

TABs are critical: every action line must begin with a real TAB character, not spaces.
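
Because a TAB is easy to lose when copying from slides, one option is to write the Makefile from the shell with printf, where \t always produces a real TAB. A minimal sketch, assuming GNU make is on the PATH; it runs in a scratch directory.

```shell
cd "$(mktemp -d)"

# \t in the printf format becomes the TAB that must start the action line.
printf 'a_file:\n\tdate > a_file\n' > Makefile

make a_file                 # runs the action and creates a_file
second_run=$(make a_file)   # a_file already exists, so nothing is run
echo "$second_run"
```

The second run should only report that a_file is up to date.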

now with two rules

a_file: 
    date > a_file

another_file: a_file
    cat a_file > another_file
    date >> another_file
    cat a_file >> another_file

Run with

make another_file
make another_file -dn | less
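
To see how the two rules chain together without creating any files, a dry run with -n prints the actions in the order make would run them. A sketch in a scratch directory, assuming GNU make; the second rule is shortened to a single action to keep the printf readable.

```shell
cd "$(mktemp -d)"

# Each pair of arguments fills one "rule line, TAB, action line" unit
# of the format, which printf reuses until the arguments run out.
printf '%s\n\t%s\n' \
    'a_file:' 'date > a_file' \
    'another_file: a_file' 'cat a_file > another_file' > Makefile

# -n: print the actions that would run, but do not run them.
dry_run=$(make -n another_file)
echo "$dry_run"
ls                      # still only the Makefile

make another_file       # build for real: a_file first, then another_file
```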

return to books

sierra.dat : books/sierra.txt
    python countwords.py books/sierra.txt sierra.dat

last.dat : books/last.txt
    python countwords.py books/last.txt last.dat

results.txt : sierra.dat
    python testzipf.py sierra.dat > results.txt

phony rules, and comments

disp_results:
    less results.txt

make disp_results
mkdir disp_results
make disp_results

touch empty_file creates an empty file, or updates modification time

the usual phony rule: all

.PHONY: all

all: results.txt

With no arguments, make parses the whole Makefile and then builds the first target it found, so all is conventionally placed first.

Another good one is:

.PHONY: all clean
# all comes first so that a bare `make` builds results.txt instead of running clean
all: results.txt
clean:
    rm -f *.dat
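
The mkdir experiment above shows why .PHONY matters: once something named like the target exists, make considers the target up to date and skips the action. A sketch in a scratch directory, assuming GNU make; the target name hello and its echo action are made up for the example.

```shell
cd "$(mktemp -d)"

printf 'hello:\n\t@echo hello from make\n' > Makefile
before=$(make hello)      # nothing called hello exists, so the action runs

mkdir hello
blocked=$(make hello)     # a directory called hello exists: action is skipped

# Declaring the target phony makes make ignore the file system for it.
printf '.PHONY: hello\n' >> Makefile
after=$(make hello)

printf '%s\n' "$before" "$blocked" "$after"
```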

talking to yourself is totally normal

# starts a comment; @ at the start of an action line tells make not to echo the command before running it
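
A small experiment makes the difference visible. This is a sketch in a scratch directory, assuming GNU make; the target name talk is made up for the example.

```shell
cd "$(mktemp -d)"

# A comment line, then a rule with two action lines:
# the first is echoed by make before it runs, the second is not.
printf '%s\n' '# lines starting with # are ignored by make' > Makefile
printf 'talk:\n\t%s\n\t%s\n' 'echo loud' '@echo quiet' >> Makefile

output=$(make talk)
echo "$output"
```

The output has three lines: the echoed command "echo loud", its result "loud", and then only "quiet" for the action prefixed with @.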

write another rule to process another book

write another rule to process n books

Rewrite with automatic variables: $@ is the target, $< is the first dependency, and $^ is the list of all dependencies.

sierra.dat : books/sierra.txt
    python countwords.py $^ $@

results.txt : sierra.dat
    python testzipf.py $^ > $@

make a_file -d | less
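
One way to check what each automatic variable holds is to have a rule write them into its own target. A sketch in a scratch directory, assuming GNU make; the file names in1.txt, in2.txt and out.txt are made up for the example.

```shell
cd "$(mktemp -d)"
touch in1.txt in2.txt

# Single quotes keep the shell from expanding $@, $< and $^
# before make gets to see them.
printf '%s\n\t%s\n' \
    'out.txt: in1.txt in2.txt' \
    '@echo "target=$@ first=$< all=$^" > $@' > Makefile

make out.txt
cat out.txt
```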

scripts as dependencies

sierra.dat : books/sierra.txt countwords.py
    python countwords.py $< $@
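
Listing the script as a dependency means a change to the script makes the output stale. The sketch below checks this with make -q, which reports through its exit status whether a target is up to date without running anything. It assumes GNU make; count.sh is a hypothetical stand-in for countwords.py, and the script change is simulated by back-dating the output.

```shell
cd "$(mktemp -d)"

echo "one two three" > in.txt
# Hypothetical stand-in for countwords.py: count words in $1, write to $2.
printf '%s\n' 'wc -w < "$1" > "$2"' > count.sh

printf '%s\n\t%s\n' 'out.dat: in.txt count.sh' 'sh count.sh $< $@' > Makefile
make out.dat

# Exit status 0 from -q means "up to date".
if make -q out.dat; then fresh=yes; else fresh=no; fi

# Back-date the output so that count.sh now looks newer than it.
touch -t 200001010000 out.dat
if make -q out.dat; then stale=no; else stale=yes; fi

echo "up to date after build: $fresh, stale after script change: $stale"
```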

functions

Functions are called by writing something like $(functionName arguments,here)

sierra.dat : books/sierra.txt countwords.py whatever
    @echo $(word 1,$^)

https://www.gnu.org/software/make/manual/html_node/Functions.html
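
A quick way to try functions out is a rule that only echoes their results. A sketch in a scratch directory, assuming GNU make; the target name show and the word list are made up, while word, subst and words are functions documented in the manual linked above.

```shell
cd "$(mktemp -d)"

# word picks the n-th word, subst replaces text, words counts words.
printf 'show:\n\t%s\n' \
    '@echo $(word 2,alpha beta gamma) $(subst .txt,.dat,sierra.txt) $(words alpha beta gamma)' > Makefile

result=$(make show)
echo "$result"
```

make expands the three function calls before the shell runs echo, so the shell only ever sees the resulting words.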

pipeline continued

pattern rules

% in the target matches any non-empty string (the stem), and the same string is substituted for % in the dependencies

%.dat : books/%.txt countwords.py
    python countwords.py $< $@

results.txt : sierra.dat last.dat abyss.dat
    python testzipf.py $^ > $@

$* is an automatic variable that holds the stem matched by %; it is available in the action
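
The pattern rule can be tried without the lesson scripts. In this sketch wc -w stands in for countwords.py and the book contents are made up; it assumes GNU make and runs in a scratch directory.

```shell
cd "$(mktemp -d)"
mkdir books
echo "one two three" > books/sierra.txt
echo "four five" > books/last.txt

# The rule line is passed as an argument, so its % signs are not
# interpreted by printf. wc -w is a stand-in for countwords.py.
printf '%s\n\t%s\n\t%s\n' \
    '%.dat: books/%.txt' \
    '@echo "stem is $*"' \
    'wc -w < $< > $@' > Makefile

make sierra.dat last.dat
```

One rule now serves both books: make reports the stem sierra for the first target and last for the second.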

define your own variables

ALL_THE_DAT = sierra.dat last.dat abyss.dat

%.dat : books/%.txt countwords.py
    python countwords.py $< $@

results.txt : ${ALL_THE_DAT}
    python testzipf.py $^ > $@

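Shell variables in an action must be written with $$ (for example $$HOME), because make expands a single $ itself before the shell sees the line.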
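
The difference between a make variable and a shell variable shows up clearly when both are used in one action. A sketch in a scratch directory, assuming GNU make and that no make or environment variable called n is defined; GREETING, name and the greet target are made up for the example.

```shell
cd "$(mktemp -d)"

printf '%s\n' 'GREETING = hello' > Makefile
# First action: $$name reaches the shell as $name.
# Second action: make reads $name as its own (empty) variable n plus "ame".
printf 'greet:\n\t%s\n\t%s\n' \
    '@name=world; echo "$(GREETING) $$name"' \
    '@name=world; echo "$(GREETING) $name"' >> Makefile

output=$(make greet)
echo "$output"
```

The first line prints "hello world"; the second prints "hello ame", which is the typical symptom of a forgotten $$.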

define your variables with functions

Can we define ALL_THE_DAT using a function?

ALL_THE_DAT=$(patsubst books/%.txt,%.dat,$(wildcard books/*.txt))

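Slightly awkward: remember that make is a pull pipeline, not a push one. It works backwards from the requested target to its dependencies, so the list of outputs has to be derived from the inputs that exist.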
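
To check what the wildcard and patsubst combination produces, a rule can simply echo the variable. A sketch in a scratch directory with empty stand-in book files, assuming GNU make; the show target is made up, and sort is added only so that the printed order is predictable.

```shell
cd "$(mktemp -d)"
mkdir books
touch books/sierra.txt books/last.txt books/abyss.txt

# wildcard lists the existing inputs, patsubst turns each
# books/NAME.txt into NAME.dat.
printf '%s\n' \
    'ALL_THE_DAT = $(sort $(patsubst books/%.txt,%.dat,$(wildcard books/*.txt)))' > Makefile
printf 'show:\n\t%s\n' '@echo $(ALL_THE_DAT)' >> Makefile

list=$(make show)
echo "$list"
```

Any new .txt file dropped into books is picked up the next time make runs, with no edit to the Makefile.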

add a book from Project Gutenberg (plain-text UTF-8)

Dig through someone else’s makefile

check out bwa Makefile again