https://github.com/dan-zeman/mdmake
Imagine you need to apply the same sequence of tools to a set of data files and want to be able to repeat the experiment later, i.e. at some point in the future you will want to recall precisely how the processing was invoked. One example is a shared task that involves processing similarly formatted data in many languages. You may want to use make and Makefiles, in which the sequence of scripts to apply can be described well. However, some aspects of this kind of processing are rather tricky to handle in classical Makefiles.
The most prominent phenomenon that is difficult to capture is what I call the multidimensionality of the data. Every data file undergoes a sequence of processing steps, i.e. it appears in many different states (and intermediate data formats). Some processing tools may have alternative implementations, so you may have the same piece of data at the same stage of processing (e.g. syntactically parsed) but with different processing results (e.g. parsed by either the Malt parser or the MST parser). Besides that, you may be applying the same processing to data in ten different languages, several domains per language, separately to development and evaluation test data, etc. All these dimensions will probably be reflected in the paths to your data files. You would probably want to use pattern (template) rules in your Makefile to describe the same action applied to many files. However, GNU make allows only one % (variable) per pattern rule, which makes it rather difficult to define templates in a multidimensional space. This is where mdmake, or “multidimensional make”, may be useful.
An older, more detailed discussion of the related problems is available here, but it is in Czech.
An MD-makefile (e.g. makefile.mdm) may contain all syntactic constructions that a normal makefile can contain. These constructions are copied to the generated makefile, and normal GNU make is responsible for interpreting them. Keep in mind, however, that they are processed only after the makefile has been generated. So, for example, if we include nested makefiles, these must be normal makefiles, not MD-makefiles.

    .MDIMS: LANGUAGES/ DE TRAINTEST -PREPROCESSINGS .STATES
    LANGUAGES = cs en
    DE = d e
    TRAINTEST = train test
    PREPROCESSINGS = pre1 pre2
    STATES = mst blind.conll mst.conll
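
To make the size of this multidimensional space concrete, here is a minimal Python sketch that enumerates the file names implied by the declaration above. It is not mdmake's implementation. It assumes, judging only from file names such as cs/dtest-pre1.mst.conll, that punctuation attached to a dimension name in .MDIMS (a trailing / or a leading - or .) is written around that dimension's value. The helper names parse_mdims and paths are hypothetical.

```python
from itertools import product

# Dimension declaration and values copied from the example above.
MDIMS = "LANGUAGES/ DE TRAINTEST -PREPROCESSINGS .STATES"
VALUES = {
    "LANGUAGES": ["cs", "en"],
    "DE": ["d", "e"],
    "TRAINTEST": ["train", "test"],
    "PREPROCESSINGS": ["pre1", "pre2"],
    "STATES": ["mst", "blind.conll", "mst.conll"],
}

def parse_mdims(spec):
    """Split each token into (prefix, name, suffix). Assumed semantics."""
    dims = []
    for token in spec.split():
        name = token.strip("/-.")
        prefix, _, suffix = token.partition(name)
        dims.append((prefix, name, suffix))
    return dims

def paths(spec, values):
    """Yield one file name per combination of dimension values."""
    dims = parse_mdims(spec)
    for combo in product(*(values[name] for _, name, _ in dims)):
        yield "".join(p + v + s for (p, _, s), v in zip(dims, combo))

all_paths = list(paths(MDIMS, VALUES))
print(len(all_paths))
print(all_paths[0])
```

Even this small example spans 2 x 2 x 2 x 2 x 3 file names, which is the kind of space a single % in a pattern rule cannot describe.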
MD-specific keywords: .MDRULE, .MDALL, .MDIN
.MDRULE introduces the main type of pattern rule. It has the parameter .md.rul, which specifies the target and source states / file types (values of the last dimension). For example, we may state that the target file type mst.conll (a file parsed by the MST parser) needs source files of two types: blind.conll (the text to be parsed) and mst (the trained model for the MST parser).

    .MDRULE
    .md.rul: mst.conll < blind.conll mst
            @echo Run the parser here.
The generated makefile then contains rules such as:

    cs/dtest-pre1.mst.conll: cs/dtest-pre1.blind.conll cs/dtest-pre1.mst
            @echo Run the parser here.
    cs/dtest-pre2.mst.conll: cs/dtest-pre2.blind.conll cs/dtest-pre2.mst
            @echo Run the parser here.
    cs/etest-pre1.mst.conll: cs/etest-pre1.blind.conll cs/etest-pre1.mst
            @echo Run the parser here.
    cs/etest-pre2.mst.conll: cs/etest-pre2.blind.conll cs/etest-pre2.mst
            @echo Run the parser here.
    ...
    en/etest-pre2.mst.conll: en/etest-pre2.blind.conll en/etest-pre2.mst
            @echo Run the parser here.
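
The expansion of one MD-rule into many explicit rules can be sketched in Python as follows. This is an illustration only, not the code of mdmake.pl. It assumes that the rule has no restricting parameters, so one rule is produced for every combination of values of all dimensions except the last one, and that all source files share the target's dimensions. The separators in DIMS are read off the file names in the example, and expand_rule is a hypothetical name.

```python
from itertools import product

# (prefix, values, suffix) per non-state dimension. The separators are an
# assumption based on file names such as cs/dtest-pre1.mst.conll.
DIMS = [
    ("", ["cs", "en"], "/"),       # LANGUAGES/
    ("", ["d", "e"], ""),          # DE
    ("", ["train", "test"], ""),   # TRAINTEST
    ("-", ["pre1", "pre2"], ""),   # -PREPROCESSINGS
]

def expand_rule(target, sources, recipe):
    """Return one explicit make rule per combination of dimension values."""
    rules = []
    for combo in product(*(vals for _, vals, _ in DIMS)):
        stem = "".join(p + v + s for (p, _, s), v in zip(DIMS, combo))
        deps = " ".join(f"{stem}.{src}" for src in sources)
        # Recipe lines in a makefile must start with a tab.
        rules.append(f"{stem}.{target}: {deps}\n\t{recipe}")
    return rules

rules = expand_rule("mst.conll", ["blind.conll", "mst"],
                    "@echo Run the parser here.")
print(len(rules))
print(rules[0])
```

Under these assumptions the sketch emits sixteen rules, covering both the train and the test values.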
.md.for specifies the dimensions in which the target file exists. (The other dimensions will not appear in the file name.) If there is no .md.for parameter, the rule is generated for all known dimensions except the last one (STATES in our case).
.md.fix contains values that are fixed in this rule, i.e. the rule is not generated for other values of the same dimension. So far it is not allowed to list more than one value per dimension (although in theory we may want to use that to constrain partial generation).
If .md.fix contains a value of a dimension that also appears in .md.for, the target type exists in this dimension and has its value in its name/path, but this particular rule generates the file for only one value of that dimension.
If .md.fix contains a value of a dimension that does not appear in .md.for, the target file type does not know this dimension and does not have it in its name/path, but one of the source files knows the dimension and needs to know which value we have in mind. We can figure out from the rules generating the source files what dimensions they exist in.

    .MDRULE
    .md.rul: mst.conll < blind.conll mst
    .md.for: LANGUAGES DE PREPROCESSINGS
    .md.fix: test
            @echo Run the parser here.
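
The effect of fixing a value can be illustrated with a short Python sketch. It covers only the simpler of the two cases described above, where the fixed value belongs to a dimension in which the target exists, so here TRAINTEST is listed among the target's dimensions, unlike in the MD-rule just shown. It is a sketch under that assumption, not mdmake's actual algorithm. The separators and the names combinations and stem are hypothetical.

```python
from itertools import product

VALUES = {
    "LANGUAGES": ["cs", "en"],
    "DE": ["d", "e"],
    "TRAINTEST": ["train", "test"],
    "PREPROCESSINGS": ["pre1", "pre2"],
}
# (prefix, suffix) written around each value. Assumed from the file names.
SEPARATORS = {
    "LANGUAGES": ("", "/"),
    "DE": ("", ""),
    "TRAINTEST": ("", ""),
    "PREPROCESSINGS": ("-", ""),
}

def combinations(md_for, md_fix):
    """Yield value assignments for md_for, pinning dimensions named in md_fix."""
    choices = []
    for dim in md_for:
        fixed = [v for v in VALUES[dim] if v in md_fix]
        choices.append(fixed or VALUES[dim])  # no fixed value: use all values
    for combo in product(*choices):
        yield dict(zip(md_for, combo))

def stem(assignment):
    """Build the part of the file name that precedes the state."""
    return "".join(SEPARATORS[d][0] + v + SEPARATORS[d][1]
                   for d, v in assignment.items())

md_for = ["LANGUAGES", "DE", "TRAINTEST", "PREPROCESSINGS"]
targets = [stem(a) + ".mst.conll" for a in combinations(md_for, {"test"})]
print(len(targets))
print(targets[0], targets[-1])
```

With test fixed, only half of the combinations remain, and no target is generated for train.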
.md.if directive but we would like to be able to constrain the .STATES dimension (or the last dimension in the list) directly in the rule.
The variables $(*1) (or another number instead of 1, for the n-th dependency) are available within the commands of the rule. MD-make finds the rule that creates this dependency, uses it to determine the set of dimensions of the dependency, constructs the file name, and replaces the variable with it. MD-make leaves $< and $^ intact, so they still work in the generated makefile. However, do not use $*, which does not make sense in MD-rules (unlike in normal pattern rules).
.md.del removes dimensions from .md.for (handy if .md.for is not explicitly stated and contains all dimensions by default).
.md.fxd combines .md.fix and .md.del. It contains values, not dimensions (like .md.fix and unlike .md.del).
If a command refers to a dimension (e.g. $(*LANGUAGES)), the reference will be converted to the actual value of the dimension. If different source files have different values of the same dimension within one generated rule, the reference will be replaced by the value that the target file takes in this dimension (i.e. by the variable value). In any case, the purpose of such references is to refer to the variable dimensions. Modified values of particular source files are fixed exceptions; we know them in advance and can write them into the command directly, if necessary.

    .MDRULE
    .md.rul: mst.conll < blind.conll mst
    .md.dep: $(TOOLDIR)/runmst.pl
    .md.for: LANGUAGES DE PREPROCESSINGS
    .md.fix: test
            @echo Running MST for language $(*LANGUAGES):
            $(TOOLDIR)/runmst.pl -m $(*2) < $< > $@
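
The substitution of $(*N) and $(*DIMENSION) references can be sketched in Python as follows. This is a simplified illustration, not what mdmake.pl does internally: the dependency file names are passed in directly instead of being derived from the rules that create them, and the function name substitute and the sample file names are hypothetical.

```python
import re

def substitute(command, dependencies, dim_values):
    """Replace $(*N) with the N-th dependency and $(*DIM) with its value."""
    def repl(match):
        key = match.group(1)
        if key.isdigit():
            return dependencies[int(key) - 1]  # $(*1) is the first dependency
        return dim_values[key]
    # Only references of the form $(*...) are touched, so ordinary make
    # variables such as $(TOOLDIR), $< and $@ pass through unchanged.
    return re.sub(r"\$\(\*(\w+)\)", repl, command)

# Hypothetical values for one generated rule.
deps = ["cs/dtest-pre1.blind.conll", "cs/dtest-pre1.mst"]
dims = {"LANGUAGES": "cs", "DE": "d", "PREPROCESSINGS": "pre1"}

echo = substitute("@echo Running MST for language $(*LANGUAGES):", deps, dims)
run = substitute("$(TOOLDIR)/runmst.pl -m $(*2) < $< > $@", deps, dims)
print(echo)
print(run)
```

The remaining make variables are left for GNU make to expand when the generated makefile is run.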
.md.in: before such rule. MD-make adds a command to copy the source file to the target (cp $< $@) and checks that the target file has values for all dimensions that a file in its state (the value of the last dimension) ought to have.
.MDALL creates a .PHONY goal depending on all files containing the given values. E.g.

    .MDALL: d hi conll

would be rewritten as

    .PHONY: all_d_hi_conll
    all_d_hi_conll: <list of all files containing values "d", "hi" and "conll">
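
A rough Python sketch of this rewriting is shown below. The goal name follows the all_d_hi_conll pattern from the example. Everything else is assumed for illustration: the file list is made up, each file is described by the set of dimension values it carries, and mdall is a hypothetical function name rather than part of mdmake.

```python
def mdall(values, files):
    """Build a .PHONY goal depending on every file that has all the values."""
    goal = "all_" + "_".join(values)
    deps = [path for path, vals in files.items() if set(values) <= vals]
    return f".PHONY: {goal}\n{goal}: {' '.join(deps)}"

# Hypothetical files, each mapped to the dimension values it contains.
FILES = {
    "hi/dtrain.conll": {"hi", "d", "train", "conll"},
    "hi/dtest.conll": {"hi", "d", "test", "conll"},
    "hi/etest.conll": {"hi", "e", "test", "conll"},
    "cs/dtest.conll": {"cs", "d", "test", "conll"},
}

rule = mdall(["d", "hi", "conll"], FILES)
print(rule)
```

Matching on sets of values rather than on substrings of the path avoids accidental hits, e.g. the letter d inside another value.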
Copyright © 2009 Daniel Zeman
All software supplied with this package is released under the GNU
General Public License. This program is free software; you can
redistribute it and/or modify it under the terms of the GNU General
Public License as published by the Free Software Foundation; either
version 2, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License (below or at
http://www.gnu.org/licenses/gpl.html) for more details.
mdmake.zip contains the script mdmake.pl (you need a Perl interpreter to use it), a sample multidimensional makefile and the normal makefile generated from it.
This research has been supported by the grant of the Czech Ministry of Education no. MSM0021620838.