Automating analyses with make

Author: Kyle Niemeyer
Affiliation: Oregon State University
Published: February 26, 2025

Automating analyses using make

Automated analyses?

  • What if analysis depends on many files?
  • Need to redo analysis with new data?
  • What if analysis has several steps in a particular order?

Build manager: make

Tools like “Make” were developed to help compile complex software, but they can also automate almost any workflow that turns input files into output files.

How does Make work?

  1. Each time a file is created or modified, the operating system updates its modification timestamp. Make compares these timestamps.
  2. The user describes which files depend on each other by writing rules in a Makefile.
  3. Rules tell Make how to update an out-of-date file.
  4. When you run Make, it checks the rules for the requested target and runs only the commands needed to update out-of-date files. If there are transitive dependencies (A depends on B, which depends on C), Make traces through them and runs the rules in the right order.
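Step 4 can be sketched in Python as a toy model (not how Make is implemented). Here `rules` is a hypothetical dictionary mapping each target to its prerequisites, and any name without a rule is treated as a source file. A depth-first walk schedules every prerequisite before the target that needs it; unlike GNU Make, this sketch does not warn about circular dependencies.

```python
def build_order(target, rules):
    """Return targets so that every prerequisite with its own rule
    comes before the target that needs it."""
    order = []
    seen = set()

    def visit(name):
        # Skip targets already scheduled and plain source files (no rule).
        if name in seen or name not in rules:
            return
        seen.add(name)
        for prerequisite in rules[name]:
            visit(prerequisite)  # handle transitive dependencies first
        order.append(name)

    visit(target)
    return order


# Hypothetical rules mirroring the analysis in these slides.
rules = {
    "paper.pdf": ["paper.tex", "results/moby_dick.csv", "results/jane_eyre.csv"],
    "results/moby_dick.csv": ["data/moby_dick.txt", "src/countwords.py"],
    "results/jane_eyre.csv": ["data/jane_eyre.txt", "src/countwords.py"],
}

print(build_order("paper.pdf", rules))
# ['results/moby_dick.csv', 'results/jane_eyre.csv', 'paper.pdf']
```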

Update single file

Makefile
# regenerate results
results/moby_dick.csv : data/moby_dick.txt 
    python src/countwords.py \
    data/moby_dick.txt > results/moby_dick.csv
  • # indicates a comment
  • Line 2 starts a build rule, using the format target : prerequisite
  • A backslash (\) at the end of a line continues the command on the next line
  • The recipe (lines 3 and 4) consists of one or more shell commands, each prefixed by a single tab character (not spaces)
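The slides do not show src/countwords.py. A minimal hypothetical stand-in, assuming the script only has to read one text file and print word,count rows to standard output (the > in the recipe then redirects them into the CSV file), might look like this; the function names and the word-splitting rule are illustrative choices, not the author's script.

```python
import csv
import re
import sys
from collections import Counter


def count_words(text):
    """Count lowercase words; return (word, count) pairs, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())  # hypothetical tokenization rule
    counts = Counter(words)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


def write_counts(pairs, out):
    """Write (word, count) pairs as CSV rows to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(pairs)


def main(path):
    """Read one text file and print its word counts as CSV on stdout."""
    with open(path, encoding="utf-8") as infile:
        write_counts(count_words(infile.read()), sys.stdout)
```

Used as a script, the file would end with `if __name__ == "__main__": main(sys.argv[1])`. If the input file is missing, the uncaught exception makes Python exit with a nonzero status, which is what tells Make that the recipe failed.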

Run using command make

What happens?

  1. If results/moby_dick.csv doesn’t exist, Make runs recipe to create it
  2. If data/moby_dick.txt is newer than results/moby_dick.csv, Make runs recipe to update it
  3. If results/moby_dick.csv is newer than its prerequisite, nothing happens
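The three cases come down to one timestamp comparison. A Python sketch of that decision, using a temporary directory and explicitly set modification times (`needs_update` is an illustrative name, not part of Make):

```python
import os
import tempfile


def needs_update(target, prerequisites):
    """Return True if a Make-style rule for `target` would run its recipe."""
    if not os.path.exists(target):
        return True  # case 1: the target does not exist yet
    target_time = os.path.getmtime(target)
    # Case 2: some prerequisite was modified after the target.
    # Case 3 (otherwise): the target is up to date, so nothing happens.
    return any(os.path.getmtime(p) > target_time for p in prerequisites)


with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "moby_dick.txt")
    result = os.path.join(tmp, "moby_dick.csv")
    open(data, "w").close()
    os.utime(data, (100, 100))  # set (access, modification) times explicitly
    print(needs_update(result, [data]))  # True: result missing

    open(result, "w").close()
    os.utime(result, (200, 200))
    print(needs_update(result, [data]))  # False: result newer than data

    os.utime(data, (300, 300))  # simulate editing the data file
    print(needs_update(result, [data]))  # True: data newer than result
```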

Managing multiple files

Makefile
# regenerate results for "Moby Dick"
results/moby_dick.csv : data/moby_dick.txt 
    python src/countwords.py data/moby_dick.txt > results/moby_dick.csv

# regenerate results for "Jane Eyre"
results/jane_eyre.csv : data/jane_eyre.txt 
    python src/countwords.py data/jane_eyre.txt > results/jane_eyre.csv

What happens?

By default, Make only updates the first target in the Makefile (the default target), here results/moby_dick.csv

Could specify target directly: make results/jane_eyre.csv

Better: create a “phony target” named all (one that does not correspond to a file) and place it at the top:

# regenerate all results
all : results/moby_dick.csv results/jane_eyre.csv

...

Then type make all (or just make, since all is now the first, and therefore default, target)

Other phony target: clean

By convention, a clean target removes generated outputs such as the results files

# remove all generated files
clean :
    rm -f results/*.csv

Then type make clean. Safer than typing the rm command by hand!

Problem: if a file or directory named clean exists, Make considers that target up to date and make clean does nothing. Avoid this by declaring the phony targets at the top of the file:

.PHONY : all clean

Add programs to prerequisites

The results also depend on the program used to generate them, so add it to the prerequisites:

# regenerate results for "Moby Dick"
results/moby_dick.csv : data/moby_dick.txt src/countwords.py
    python src/countwords.py data/moby_dick.txt > results/moby_dick.csv

# regenerate results for "Jane Eyre"
results/jane_eyre.csv : data/jane_eyre.txt src/countwords.py
    python src/countwords.py data/jane_eyre.txt > results/jane_eyre.csv

Reducing repetition: variables

Makefile
.PHONY : all clean

COUNT=src/countwords.py
RUN_COUNT=python $(COUNT)

# regenerate all results
all : results/moby_dick.csv results/jane_eyre.csv

# regenerate results for "Moby Dick"
results/moby_dick.csv : data/moby_dick.txt $(COUNT)
    $(RUN_COUNT) data/moby_dick.txt > results/moby_dick.csv

# regenerate results for "Jane Eyre"
results/jane_eyre.csv : data/jane_eyre.txt $(COUNT)
    $(RUN_COUNT) data/jane_eyre.txt > results/jane_eyre.csv

# remove all generated files
clean :
    rm -f results/*.csv
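Variables defined with = are expanded when they are used, and a value may itself refer to another variable, as RUN_COUNT refers to COUNT. A rough Python sketch of that substitution, assuming only simple $(NAME) references (it does not handle functions such as $(wildcard ...) or automatic variables):

```python
import re

# Matches a simple $(NAME) variable reference.
VARIABLE_REF = re.compile(r"\$\((\w+)\)")


def expand(text, variables):
    """Replace $(NAME) references, expanding each value recursively.

    As in Make, an undefined variable expands to the empty string.
    A self-referencing definition would hit Python's recursion limit here;
    GNU Make reports an error instead.
    """
    def substitute(match):
        return expand(variables.get(match.group(1), ""), variables)

    return VARIABLE_REF.sub(substitute, text)


variables = {
    "COUNT": "src/countwords.py",
    "RUN_COUNT": "python $(COUNT)",
}
print(expand("$(RUN_COUNT) data/moby_dick.txt > results/moby_dick.csv", variables))
# python src/countwords.py data/moby_dick.txt > results/moby_dick.csv
```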

Automatic variables

Automatic variable for the target of the rule: $@

# regenerate results for "Moby Dick"
results/moby_dick.csv : data/moby_dick.txt $(COUNT)
    $(RUN_COUNT) data/moby_dick.txt > $@

The first prerequisite of the rule: $<

# regenerate results for "Moby Dick"
results/moby_dick.csv : data/moby_dick.txt $(COUNT)
    $(RUN_COUNT) $< > $@

Also, all prerequisites of the rule (each name listed once): $^
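A Python sketch of what these three automatic variables hold for one rule, and how they fill in a recipe. The function names are illustrative, and the plain string replacement ignores escaped $$ and Make's other automatic variables:

```python
def automatic_variables(target, prerequisites):
    """Values of $@, $< and $^ for one rule (sketch of Make's behavior)."""
    unique = list(dict.fromkeys(prerequisites))  # $^ lists each name once, in order
    return {
        "$@": target,
        "$<": prerequisites[0] if prerequisites else "",
        "$^": " ".join(unique),
    }


def fill_recipe(recipe, target, prerequisites):
    """Substitute automatic variables into a recipe string."""
    for name, value in automatic_variables(target, prerequisites).items():
        recipe = recipe.replace(name, value)
    return recipe


print(fill_recipe("python src/countwords.py $< > $@",
                  "results/moby_dick.csv",
                  ["data/moby_dick.txt", "src/countwords.py"]))
# python src/countwords.py data/moby_dick.txt > results/moby_dick.csv
```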

Generic rules

Create a pattern rule using %, which matches any nonempty part of a file name (the “stem”):

results/%.csv : data/%.txt $(COUNT)
    $(RUN_COUNT) $< > $@

So the full Makefile is:

Makefile
.PHONY : all clean

COUNT=src/countwords.py
RUN_COUNT=python $(COUNT)

# regenerate all results
all : results/moby_dick.csv results/jane_eyre.csv \
  results/time_machine.csv

# regenerate results for any book
results/%.csv : data/%.txt $(COUNT)
    $(RUN_COUNT) $< > $@

# remove all generated files
clean :
    rm -f results/*.csv
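How does the pattern rule results/%.csv : data/%.txt $(COUNT) apply to a specific file? A simplified Python sketch of the matching step, assuming a single % per pattern. The function name is illustrative, and it skips details such as Make's handling of directories in patterns and its check that the prerequisites exist or can be made:

```python
def match_pattern_rule(target, target_pattern, prerequisite_patterns):
    """Return the prerequisites a pattern rule gives `target`, or None if it doesn't match."""
    prefix, suffix = target_pattern.split("%", 1)
    matches = (target.startswith(prefix) and target.endswith(suffix)
               and len(target) > len(prefix) + len(suffix))  # stem must be nonempty
    if not matches:
        return None
    stem = target[len(prefix):len(target) - len(suffix)]
    # Put the stem into each prerequisite; names without % stay as written.
    return [p.replace("%", stem, 1) for p in prerequisite_patterns]


rule_prerequisites = ["data/%.txt", "src/countwords.py"]
print(match_pattern_rule("results/time_machine.csv", "results/%.csv", rule_prerequisites))
# ['data/time_machine.txt', 'src/countwords.py']
print(match_pattern_rule("paper.pdf", "results/%.csv", rule_prerequisites))
# None
```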

Define sets of files

Use a variable to list all result files present:

RESULTS=results/*.csv
all : $(RESULTS)

But this only works if the results already exist. Instead, build the list from the files in the data/ directory:

DATA=$(wildcard data/*.txt)

Use pattern substitution to compute the corresponding output file names:

RESULTS=$(patsubst data/%.txt,results/%.csv,$(DATA))
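These two lines compute file names, not files. In Python terms, $(wildcard data/*.txt) is roughly sorted(glob.glob("data/*.txt")), and the sketch below mimics $(patsubst ...) for a single % on a list of words. The function name and the example list are illustrative:

```python
def patsubst(pattern, replacement, words):
    """Mimic Make's $(patsubst PATTERN,REPLACEMENT,TEXT) for one % per pattern."""
    prefix, suffix = pattern.split("%", 1)
    new_prefix, new_suffix = replacement.split("%", 1)
    result = []
    for word in words:
        matches = (word.startswith(prefix) and word.endswith(suffix)
                   and len(word) >= len(prefix) + len(suffix))
        if matches:
            stem = word[len(prefix):len(word) - len(suffix)]
            result.append(new_prefix + stem + new_suffix)
        else:
            result.append(word)  # words that don't match pass through unchanged
    return result


data = ["data/moby_dick.txt", "data/jane_eyre.txt", "data/time_machine.txt"]
print(patsubst("data/%.txt", "results/%.csv", data))
# ['results/moby_dick.csv', 'results/jane_eyre.csv', 'results/time_machine.csv']
```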

settings target

Add a settings target that prints the variables' values; the @ prefix stops Make from echoing each command before running it:

# ... rest of Makefile

# show variables' values
settings :
    @echo COUNT: $(COUNT)
    @echo DATA: $(DATA)
    @echo RESULTS: $(RESULTS)

Further streamlining

Remove the RUN_COUNT variable and call python directly:

# regenerate results for any book
results/%.csv : data/%.txt $(COUNT)
    python $(COUNT) $< > $@

Since all depends on $(RESULTS) and is the first (default) target, we can regenerate every result from scratch with:

make clean
make

Documenting a Makefile

Create a phony target help that prints the available targets:

.PHONY: all clean help settings

# ... other definitions ... 

# show help
help :
    @echo "all : regenerate all results."
    @echo "results/*.csv : regenerate result for any book."
    @echo "clean : remove all generated files."
    @echo "settings : show variables' values."
    @echo "help : show this message."

Problem with this? The help text must be updated by hand whenever a rule changes, so it easily falls out of date.

“Auto”-documenting a Makefile

Mark documentation lines with ## and use grep to print them:

Makefile
.PHONY: all clean help settings

COUNT=src/countwords.py
DATA=$(wildcard data/*.txt)
RESULTS=$(patsubst data/%.txt,results/%.csv,$(DATA))

## all : regenerate all results.
all : $(RESULTS)

## results/%.csv : regenerate result for any book.
results/%.csv : data/%.txt $(COUNT)
    python $(COUNT) $< > $@

## clean : remove all generated files.
clean :
    rm -f $(RESULTS)

## settings : show variables' values
settings :
    @echo COUNT: $(COUNT)
    @echo DATA: $(DATA)
    @echo RESULTS: $(RESULTS)

## help : show this message
help :
    @grep '^##' ./Makefile
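The help recipe prints every line that starts with ##, including the ## marker itself. A small Python sketch of the same filtering that also strips the marker; the function name and the sample text are illustrative:

```python
def help_lines(makefile_text):
    """Return the documentation lines marked with ##, without the marker."""
    lines = []
    for line in makefile_text.splitlines():
        if line.startswith("##"):  # same test as grep '^##'
            lines.append(line[2:].strip())
    return lines


sample = """\
## all : regenerate all results.
all : $(RESULTS)

## clean : remove all generated files.
clean :
\trm -f $(RESULTS)
"""
for entry in help_lines(sample):
    print(entry)
# all : regenerate all results.
# clean : remove all generated files.
```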

Other uses for Make?

  • Automating analyses, as shown above
  • Building a LaTeX document that depends on those results

Example with LaTeX

Makefile
.PHONY: all paper clean help settings

COUNT=src/countwords.py
DATA=$(wildcard data/*.txt)
RESULTS=$(patsubst data/%.txt,results/%.csv,$(DATA))

## all : regenerate paper and all results.
all : paper.pdf $(RESULTS)

## results/%.csv : regenerate result for any book.
results/%.csv : data/%.txt $(COUNT)
    python $(COUNT) $< > $@

## paper.pdf : regenerate paper.
paper.pdf : paper.tex paper.bib $(RESULTS)
    latexmk -pdf $<

## clean : remove all generated files.
clean :
    rm -f $(RESULTS)
    latexmk -C

## settings : show variables' values
settings :
    @echo COUNT: $(COUNT)
    @echo DATA: $(DATA)
    @echo RESULTS: $(RESULTS)

## help : show this message
help :
    @grep '^##' ./Makefile