Automating analyses with make

Author: Kyle Niemeyer

Affiliation: Oregon State University

Published: February 26, 2025

Automating analyses using make

Automated analyses?

What if analysis depends on many files?
Need to redo analysis with new data?
What if analysis has several steps in a particular order?

Build manager: make

Tools like “Make” were developed to help compile complex software, but can also be used to automate any workflow.

How does Make work?

Each time the operating system creates or changes a file, it updates the file's last-modified timestamp. Make compares these timestamps to decide what is out of date.
User describes which files depend on each other by writing rules in a Makefile.
Rules tell Make how to update an out-of-date file.
When run, Make checks the rules needed for the requested target and runs the commands to update any targets that are out of date. If there are transitive dependencies, Make traces through them to run rules in the right order.
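
A rough sketch of the tracing step, written in Python rather than taken from Make's own implementation. The file names (paper.pdf, results.csv, and so on) and the build_order function are made up for illustration, and the sketch assumes there are no circular dependencies.

```python
# Hypothetical rules: each target maps to the files it depends on.
rules = {
    "paper.pdf": ["paper.tex", "results.csv"],
    "results.csv": ["data.txt", "analyze.py"],
}

def build_order(target, rules, order=None):
    """List targets so that every prerequisite comes before what needs it."""
    if order is None:
        order = []
    for prereq in rules.get(target, []):
        build_order(prereq, rules, order)  # visit prerequisites first
    # Files with no rule (plain source files) are never rebuilt, so skip them.
    if target in rules and target not in order:
        order.append(target)
    return order

print(build_order("paper.pdf", rules))  # ['results.csv', 'paper.pdf']
```

Asking for paper.pdf considers results.csv first. Real Make also checks timestamps before deciding whether each recipe actually needs to run.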

Update single file

Makefile

# regenerate results
results/moby_dick.csv : data/moby_dick.txt 
    python src/countwords.py \
    data/moby_dick.txt > results/moby_dick.csv

# indicates a comment
2nd line onward: build rule, using format target : prerequisites, followed by the recipe
backslash (\) continues a long command on the next line
recipe consists of 1+ shell commands, each prefixed by a single tab character (not spaces)
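
The script src/countwords.py is not shown in these slides. The following is only a guess at what a minimal version might look like, to show why the recipe redirects output with >: the sketch prints CSV rows to standard output. The function names and the punctuation handling are assumptions, not the actual script.

```python
import sys
from collections import Counter

def count_words(text):
    """Return (word, count) pairs, most frequent first."""
    # Hypothetical cleaning step: trim common punctuation and lowercase.
    words = (w.strip('.,;:!?"()').lower() for w in text.split())
    return Counter(w for w in words if w).most_common()

def main(path):
    with open(path, encoding="utf-8") as handle:
        for word, count in count_words(handle.read()):
            print(f"{word},{count}")  # CSV on stdout, so the shell's > can capture it

if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Writing to standard output keeps the script unaware of where results are stored; the Makefile rule decides the output path.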

Run using command make

What happens?

If results/moby_dick.csv doesn’t exist, Make runs recipe to create it
If data/moby_dick.txt is newer than results/moby_dick.csv, Make runs recipe to update it
If results/moby_dick.csv is newer than its prerequisite, nothing happens
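
The three cases above can be sketched in Python. This is an illustration of the decision, not Make's actual code; needs_update is a made-up name, and the timestamps are set by hand with os.utime so the example does not depend on timing.

```python
import os
import tempfile

def needs_update(target, prerequisites):
    """Return True if the recipe for target should run."""
    if not os.path.exists(target):
        return True  # case 1: target is missing, so it must be created
    target_time = os.path.getmtime(target)
    # case 2: out of date if any prerequisite was modified more recently
    # case 3: otherwise nothing happens
    return any(os.path.getmtime(p) > target_time for p in prerequisites)

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "moby_dick.txt")
    result = os.path.join(tmp, "moby_dick.csv")
    open(source, "w").close()
    missing = needs_update(result, [source])  # result does not exist yet
    open(result, "w").close()
    os.utime(source, (1000, 1000))            # source older than result
    os.utime(result, (2000, 2000))
    fresh = needs_update(result, [source])
    os.utime(source, (3000, 3000))            # source now newer than result
    stale = needs_update(result, [source])

print(missing, fresh, stale)  # True False True
```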

Managing multiple files

Makefile

# regenerate results for "Moby Dick"
results/moby_dick.csv : data/moby_dick.txt 
    python src/countwords.py data/moby_dick.txt > results/moby_dick.csv

# regenerate results for "Jane Eyre"
results/jane_eyre.csv : data/jane_eyre.txt 
    python src/countwords.py data/jane_eyre.txt > results/jane_eyre.csv

What happens?

By default, Make only attempts to update the first target in the Makefile (the default target)

Could specify target directly: make results/jane_eyre.csv

Better, create “phony target” and place at top: all

# regenerate all results
all : results/moby_dick.csv results/jane_eyre.csv

...

Then type make all (or just make, since all is now the first target)

Other phony target: clean

By convention a clean target provides rules to remove results/generated outputs

# remove all generated files
clean : 
    rm -f results/*.csv

Then type make clean. Safer than manually typing!

Problem if a file or directory named clean exists: Make treats it as an up-to-date target and never runs the recipe. Avoid this by explicitly declaring phony targets at the top of the file:

.PHONY : all clean

Add programs to prerequisites

The results also depend on the programs used to generate them, so add to prerequisites:

# regenerate results for "Moby Dick"
results/moby_dick.csv : data/moby_dick.txt src/countwords.py
    python src/countwords.py data/moby_dick.txt > results/moby_dick.csv

# regenerate results for "Jane Eyre"
results/jane_eyre.csv : data/jane_eyre.txt src/countwords.py
    python src/countwords.py data/jane_eyre.txt > results/jane_eyre.csv

Reducing repetition: variables

Makefile

.PHONY : all clean

COUNT=src/countwords.py 
RUN_COUNT=python $(COUNT)

# regenerate all results
all : results/moby_dick.csv results/jane_eyre.csv

# regenerate results for "Moby Dick"
results/moby_dick.csv : data/moby_dick.txt $(COUNT)
    $(RUN_COUNT) data/moby_dick.txt > results/moby_dick.csv

# regenerate results for "Jane Eyre"
results/jane_eyre.csv : data/jane_eyre.txt $(COUNT)
    $(RUN_COUNT) data/jane_eyre.txt > results/jane_eyre.csv

# remove all generated files
clean :
    rm -f results/*.csv

Automatic variables

Automatic variable for target of the rule: $@

# regenerate results for "Moby Dick"
results/moby_dick.csv : data/moby_dick.txt $(COUNT)
    $(RUN_COUNT) data/moby_dick.txt > $@

The first prerequisite of the rule: $<

# regenerate results for "Moby Dick"
results/moby_dick.csv : data/moby_dick.txt $(COUNT)
    $(RUN_COUNT) $< > $@

Also: all prerequisites of the rule: $^
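
To see what these variables stand for, here is a small Python sketch that substitutes them into a recipe by hand. The expand function is hypothetical and handles only $@, $< and $^; it does not expand ordinary variables such as $(RUN_COUNT), so the command is written out in full.

```python
def expand(recipe, target, prerequisites):
    """Substitute Make's automatic variables into a recipe string."""
    return (recipe.replace("$@", target)                   # the target
                  .replace("$<", prerequisites[0])         # first prerequisite
                  .replace("$^", " ".join(prerequisites))) # all prerequisites

print(expand("python src/countwords.py $< > $@",
             "results/moby_dick.csv",
             ["data/moby_dick.txt", "src/countwords.py"]))
# python src/countwords.py data/moby_dick.txt > results/moby_dick.csv
```

Using $< rather than $^ matters here: the script is itself a prerequisite, and it should not be passed to itself as input data.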

Generic rules

Create pattern rule using wildcard: %

results/%.csv : data/%.txt $(COUNT)
    $(RUN_COUNT) $< > $@
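
A sketch of how the % in a pattern rule might be matched, in Python. The match_pattern function is an illustrative stand-in, not Make's implementation: it finds the part of the target that % covers (the stem) and reuses it in the prerequisite pattern.

```python
def match_pattern(target, target_pattern, prereq_pattern):
    """Return the prerequisite implied by a pattern rule, or None if no match."""
    prefix, suffix = target_pattern.split("%")
    if not (target.startswith(prefix) and target.endswith(suffix)):
        return None
    stem = target[len(prefix):len(target) - len(suffix)]
    if not stem:
        return None  # % must stand for at least one character
    return prereq_pattern.replace("%", stem)

print(match_pattern("results/jane_eyre.csv", "results/%.csv", "data/%.txt"))
# data/jane_eyre.txt
print(match_pattern("paper.pdf", "results/%.csv", "data/%.txt"))
# None
```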

So full Makefile is:

Makefile

.PHONY : all clean 

COUNT=src/countwords.py 
RUN_COUNT=python $(COUNT)

# regenerate all results
all : results/moby_dick.csv results/jane_eyre.csv \
  results/time_machine.csv

# regenerate results for any book
results/%.csv : data/%.txt $(COUNT)
    $(RUN_COUNT) $< > $@

# remove all generated files
clean :
    rm -f results/*.csv

Define sets of files

Use variable to list all results files present:

RESULTS=results/*.csv
all : $(RESULTS)

But this only works if the results already exist. Instead, build the list from the files in the data/ directory:

DATA=$(wildcard data/*.txt)

Use pattern substitution to create corresponding output files:

RESULTS=$(patsubst data/%.txt,results/%.csv,$(DATA))
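
For comparison, the same data-to-results mapping could be written in Python roughly as follows. This is a simplified sketch: result_paths is a made-up helper, the file list is hard-coded in place of $(wildcard data/*.txt), and unlike patsubst it does not pass non-matching names through unchanged.

```python
from pathlib import PurePosixPath

def result_paths(data_files):
    """Map data/<name>.txt to results/<name>.csv for each input file."""
    results = []
    for name in data_files:
        stem = PurePosixPath(name).stem  # "data/moby_dick.txt" -> "moby_dick"
        results.append(str(PurePosixPath("results") / f"{stem}.csv"))
    return results

# Stand-in for $(wildcard data/*.txt); in a real script,
# sorted(glob.glob("data/*.txt")) could supply this list.
data = ["data/jane_eyre.txt", "data/moby_dick.txt"]
print(result_paths(data))
# ['results/jane_eyre.csv', 'results/moby_dick.csv']
```

Because the list is derived from the inputs, adding a new book to data/ is enough to get a new result; nothing else needs editing.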

settings target

Use settings target to print variables, using @ to avoid repeating command in output:

# ... rest of Makefile

# show variables' values
settings :
    @echo COUNT: $(COUNT)
    @echo DATA: $(DATA)
    @echo RESULTS: $(RESULTS)

Further streamlining

Remove RUN_COUNT variable:

# regenerate results for any book
results/%.csv : data/%.txt $(COUNT)
    python $(COUNT) $< > $@

Since all depends on $(RESULTS), we can regenerate every result with a clean rebuild:

make clean
make

Documenting a Makefile

Create a phony target help to print commands:

.PHONY: all clean help settings

# ... other definitions ... 

# show help
help :
    @echo "all : regenerate all results."
    @echo "results/*.csv : regenerate result for any book."
    @echo "clean : remove all generated files."
    @echo "settings : show variables' values."
    @echo "help : show this message."

Problem with this? It requires manual updates.

“Auto”-documenting a Makefile

Use ## to mark lines to display and grep to pull lines:

Makefile

.PHONY: all clean help settings

COUNT=src/countwords.py 
DATA=$(wildcard data/*.txt) 
RESULTS=$(patsubst data/%.txt,results/%.csv,$(DATA))

## all : regenerate all results.
all : $(RESULTS)

## results/%.csv : regenerate result for any book.
results/%.csv : data/%.txt $(COUNT)
    python $(COUNT) $< > $@

## clean : remove all generated files.
clean :
    rm -f $(RESULTS)

## settings : show variables' values
settings :
    @echo COUNT: $(COUNT)
    @echo DATA: $(DATA)
    @echo RESULTS: $(RESULTS)

## help : show this message
help :
    @grep '^##' ./Makefile
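
What the grep in the help recipe selects can be mimicked in a few lines of Python. The help_lines function and the sample text are illustrative only; the point is that lines starting with ## are kept and ordinary # comments are not.

```python
def help_lines(makefile_text):
    """Return the lines that start with the ## marker, like grep '^##'."""
    return [line for line in makefile_text.splitlines() if line.startswith("##")]

example = """\
## all : regenerate all results.
all : $(RESULTS)

# an ordinary comment, not shown
## clean : remove all generated files.
clean :
\trm -f $(RESULTS)
"""

for line in help_lines(example):
    print(line)
# ## all : regenerate all results.
# ## clean : remove all generated files.
```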

Other uses for Make?

Use Make to automate analyses
You could also include building a LaTeX document

Example with LaTeX

Makefile

.PHONY: all clean help settings

COUNT=src/countwords.py 
DATA=$(wildcard data/*.txt) 
RESULTS=$(patsubst data/%.txt,results/%.csv,$(DATA))

## all : regenerate paper and all results.
all : paper.pdf $(RESULTS)

## results/%.csv : regenerate result for any book.
results/%.csv : data/%.txt $(COUNT)
    python $(COUNT) $< > $@

## paper.pdf : regenerate paper.
paper.pdf : paper.tex paper.bib $(RESULTS)
    latexmk -pdf $<

## clean : remove all generated files.
clean :
    rm -f $(RESULTS)
    latexmk -C

## settings : show variables' values
settings :
    @echo COUNT: $(COUNT)
    @echo DATA: $(DATA)
    @echo RESULTS: $(RESULTS)

## help : show this message
help :
    @grep '^##' ./Makefile
