Automating analyses with make


Kyle Niemeyer. 1 Mar 2022

ME 599, Corvallis, OR

Automated analyses?


  • What if analysis depends on many files?
  • Need to redo analysis with new data?
  • What if analysis has several steps in a particular order?

Update single file

Contents of Makefile:


            # regenerate results
            results/moby_dick.csv : data/moby_dick.txt 
                python src/countwords.py \
                    data/moby_dick.txt > results/moby_dick.csv
        
  • # indicates a comment
  • 2nd line: build rule, using format target : prerequisite; the indented lines after it are its recipe
  • backslash (\) splits a long line
  • recipe consists of 1+ shell commands, prefixed by single tab character (no spaces)

Run using command make

What happens?


  1. If results/moby_dick.csv doesn't exist, Make runs recipe to create it
  2. If data/moby_dick.txt is newer than results/moby_dick.csv, Make runs recipe to update it
  3. If results/moby_dick.csv is newer than its prerequisite, nothing happens

Managing multiple files


Contents of Makefile:


            # regenerate results for "Moby Dick"
            results/moby_dick.csv : data/moby_dick.txt 
                python src/countwords.py \
                    data/moby_dick.txt > results/moby_dick.csv

            # regenerate results for "Jane Eyre"
            results/jane_eyre.csv : data/jane_eyre.txt 
                python src/countwords.py \
                    data/jane_eyre.txt > results/jane_eyre.csv
        

What happens?


Only results/moby_dick.csv is updated: by default, Make builds just the first target in the Makefile (the default target)

Could specify target directly: make results/jane_eyre.csv

Better: create a "phony target" named all and place it at the top of the Makefile:


            # regenerate all results
            all : results/moby_dick.csv results/jane_eyre.csv
        

Then type make all (or just make, since all is now the first target)
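
As a possible alternative to relying on rule order, GNU Make lets the default target be named explicitly through the special variable .DEFAULT_GOAL. A minimal sketch, assuming GNU Make (this variable is not part of POSIX make) and the same file layout as above:

```make
# Assumes GNU Make; .DEFAULT_GOAL is a GNU extension.
.DEFAULT_GOAL := all

# regenerate results for "Moby Dick"
results/moby_dick.csv : data/moby_dick.txt
	python src/countwords.py data/moby_dick.txt > results/moby_dick.csv

# regenerate results for "Jane Eyre"
results/jane_eyre.csv : data/jane_eyre.txt
	python src/countwords.py data/jane_eyre.txt > results/jane_eyre.csv

# regenerate all results (no longer has to be the first rule)
all : results/moby_dick.csv results/jane_eyre.csv
```

With this setting, plain make should behave like make all wherever the all rule sits in the file.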

Other phony target: clean


By convention, a clean target removes results and other generated outputs


            # remove all generated files
            clean : 
                rm -f results/*.csv
        

Then type make clean. Safer than manually typing!

Problem: if a file or directory named "clean" exists, Make considers the target up to date and skips the recipe. Avoid this by declaring phony targets explicitly at the top of the file:


            .PHONY : all clean
        

Add programs to prerequisites


The results also depend on the programs used to generate them, so add to prerequisites:


            # regenerate results for "Moby Dick"
            results/moby_dick.csv : data/moby_dick.txt src/countwords.py
                python src/countwords.py \
                    data/moby_dick.txt > results/moby_dick.csv

            # regenerate results for "Jane Eyre"
            results/jane_eyre.csv : data/jane_eyre.txt src/countwords.py
                python src/countwords.py \
                    data/jane_eyre.txt > results/jane_eyre.csv
        

Reducing repetition: variables



            .PHONY : all clean 

            COUNT=src/countwords.py 
            RUN_COUNT=python $(COUNT)

            # regenerate all results
            all : results/moby_dick.csv results/jane_eyre.csv

            # regenerate results for "Moby Dick"
            results/moby_dick.csv : data/moby_dick.txt $(COUNT)
                $(RUN_COUNT) data/moby_dick.txt > results/moby_dick.csv

            # regenerate results for "Jane Eyre"
            results/jane_eyre.csv : data/jane_eyre.txt $(COUNT)
                $(RUN_COUNT) data/jane_eyre.txt > results/jane_eyre.csv

            # remove all generated files
            clean :
                rm -f results/*.csv
        

Automatic variables


Automatic variable for target of the rule: $@


            # regenerate results for "Moby Dick"
            results/moby_dick.csv : data/moby_dick.txt $(COUNT)
                $(RUN_COUNT) data/moby_dick.txt > $@
        

The first prerequisite of the rule: $<


            # regenerate results for "Moby Dick"
            results/moby_dick.csv : data/moby_dick.txt $(COUNT)
                $(RUN_COUNT) $< > $@
        

Also: all prerequisites of the rule: $^
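
The slides give no example for $^, so here is a minimal sketch. The combined target results/all_books.csv is hypothetical (it is not part of the original Makefile) and simply concatenates the per-book results:

```make
# Hypothetical target for illustration: concatenate every per-book result.
# $^ expands to all prerequisites of the rule, $@ to its target.
results/all_books.csv : results/moby_dick.csv results/jane_eyre.csv
	cat $^ > $@
```

Here the recipe expands to cat results/moby_dick.csv results/jane_eyre.csv > results/all_books.csv.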

Generic rules

Create a pattern rule using the wildcard %, which matches the same stem in the target and the prerequisite:


            results/%.csv : data/%.txt $(COUNT)
                $(RUN_COUNT) $< > $@
        

So full Makefile is:


            .PHONY : all clean 

            COUNT=src/countwords.py 
            RUN_COUNT=python $(COUNT)

            # regenerate all results
            all : results/moby_dick.csv results/jane_eyre.csv \
              results/time_machine.csv

            # regenerate results for any book
            results/%.csv : data/%.txt $(COUNT)
                $(RUN_COUNT) $< > $@

            # remove all generated files
            clean :
                rm -f results/*.csv
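
One thing the pattern rule takes for granted is that the results/ directory already exists; if it does not, the shell redirection fails. A possible sketch, assuming GNU Make, adds an order-only prerequisite (listed after |) so that the directory is created when missing without its timestamp ever triggering a rebuild. The results rule is an addition for illustration, not part of the original Makefile:

```make
COUNT=src/countwords.py

# results after the | is an order-only prerequisite: it must exist,
# but a newer directory timestamp does not force the CSVs to rebuild.
results/%.csv : data/%.txt $(COUNT) | results
	python $(COUNT) $< > $@

# create the output directory if it is missing (illustrative addition)
results :
	mkdir -p $@
```

Since $< is still the first normal prerequisite, the recipe reads data/%.txt exactly as before.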
        

Define sets of files

Use variable to list all results files present:


            RESULTS=results/*.csv
            all : $(RESULTS)
        

But this only works if the results files already exist. Instead, build the list from the files in the data/ directory:


            DATA=$(wildcard data/*.txt)
        

Use pattern substitution to create corresponding output files:


            RESULTS=$(patsubst data/%.txt,results/%.csv,$(DATA))
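
For comparison, GNU Make also offers a shorter spelling of the same transformation, called a substitution reference. A minimal sketch, assuming GNU Make and the DATA variable defined above:

```make
DATA=$(wildcard data/*.txt)

# Substitution reference: should give the same list as
# $(patsubst data/%.txt,results/%.csv,$(DATA))
RESULTS=$(DATA:data/%.txt=results/%.csv)
```

Either form works; patsubst is more explicit, while the substitution reference is more compact.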
        

Add a settings target to print variables; the @ prefix stops Make from echoing each command before running it:


            # ... rest of Makefile
            
            # show variables' values
            settings :
                @echo COUNT: $(COUNT)
                @echo DATA: $(DATA)
                @echo RESULTS: $(RESULTS)
        

Further streamlining: remove RUN_COUNT variable:


            # regenerate results for any book
            results/%.csv : data/%.txt $(COUNT)
                python $(COUNT) $< > $@
        

Since all depends on $(RESULTS), we can regenerate everything from scratch with:


            make clean
            make
        

Documenting a Makefile

Create a phony target help that prints the available targets:


            .PHONY: all clean help settings

            # ... other definitions ... 

            # show help
            help :
                @echo "all : regenerate all results."
                @echo "results/*.csv : regenerate result for any book."
                @echo "clean : remove all generated files."
                @echo "settings : show variables' values."
                @echo "help : show this message."
        

Problem with this? Requires manual updates.

"Auto"-documenting a Makefile

Mark the lines to display with ##, then use grep to pull them out of the Makefile:


            .PHONY: all clean help settings

            COUNT=src/countwords.py 
            DATA=$(wildcard data/*.txt) 
            RESULTS=$(patsubst data/%.txt,results/%.csv,$(DATA))

            ## all : regenerate all results.
            all : $(RESULTS)

            ## results/%.csv : regenerate result for any book.
            results/%.csv : data/%.txt $(COUNT)
                python $(COUNT) $< > $@

            ## clean : remove all generated files.
            clean :
                rm -f $(RESULTS)

            ## settings : show variables' values
            settings :
                @echo COUNT: $(COUNT)
                @echo DATA: $(DATA)
                @echo RESULTS: $(RESULTS)

            ## help : show this message
            help :
                @grep '^##' ./Makefile
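
The grep recipe prints the marked lines with the leading ## still attached. A small variant, sketched here on the assumption that a POSIX sed is available, strips the marker so that only the help text is shown:

```make
.PHONY: help

## help : show this message
help :
	@sed -n 's/^## //p' ./Makefile
```

The -n flag suppresses sed's default output, and the p flag prints only the lines where the ## prefix was found and removed.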
        

Example with LaTeX


                .PHONY: all clean help settings

                COUNT=src/countwords.py 
                DATA=$(wildcard data/*.txt) 
                RESULTS=$(patsubst data/%.txt,results/%.csv,$(DATA))
    
                ## all : regenerate paper and all results.
                all : paper.pdf $(RESULTS)
    
                ## results/%.csv : regenerate result for any book.
                results/%.csv : data/%.txt $(COUNT)
                    python $(COUNT) $< > $@

                ## paper.pdf : regenerate paper.
                paper.pdf : paper.tex paper.bib $(RESULTS)
                    latexmk -pdf $<
    
                ## clean : remove all generated files.
                clean :
                    rm -f $(RESULTS)
                    latexmk -C
    
                ## settings : show variables' values
                settings :
                    @echo COUNT: $(COUNT)
                    @echo DATA: $(DATA)
                    @echo RESULTS: $(RESULTS)
    
                ## help : show this message
                help :
                    @grep '^##' ./Makefile