Institute of Formal and Applied Linguistics Wiki


mdmake

https://github.com/dan-zeman/mdmake

Imagine you need to apply the same sequence of tools to a set of data files, and you want to be able to repeat the experiment later, i.e. at some point in the future you will want to recall precisely how the processing was invoked. One example is a shared task that involves processing similarly formatted data in many languages. You may want to use make and Makefiles, in which the order of applying the various scripts can be described well. However, some aspects of this kind of processing are rather tricky to handle in classical Makefiles.

The most prominent phenomenon that is difficult to capture is what I call multidimensionality of the data. Every data file undergoes a sequence of processing steps, i.e. it appears in many different states (and intermediate data formats). Some processing tools may have alternative implementations, so you may have the same piece of data at the same stage of processing (e.g. syntactically parsed) but with different results (e.g. parsed by either the Malt parser or the MST parser). Besides that, you may be applying the same processing to data in ten different languages, in several domains per language, separately to development and evaluation test data, etc. All these dimensions will probably be reflected in some way in the paths to your data files. You would probably want to use pattern (template) rules in your Makefile to describe the same action applied to many files. However, GNU make allows only one % (the wildcard) per pattern rule, which makes it rather difficult to define templates in the multidimensional space. This is where mdmake, or “multidimensional make”, may be useful.
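To illustrate the single-% limitation, here is a minimal Python sketch of how one pattern stem matches a file name. It is a simplified model and not GNU make's actual implementation: the function name `match_stem` is hypothetical, the file name is only an illustrative path that encodes several dimensions, and GNU make's special handling of directory parts is ignored.

```python
def match_stem(pattern, target):
    """Return the text a single '%' would capture, or None if there is no match.

    Simplified model: only the first '%' acts as a wildcard, and the
    directory-specific matching rules of GNU make are not reproduced.
    """
    prefix, percent, suffix = pattern.partition("%")
    if not percent:
        return None  # no '%' at all, so this is not a pattern
    if len(target) <= len(prefix) + len(suffix):
        return None  # the stem must not be empty
    if target.startswith(prefix) and target.endswith(suffix):
        return target[len(prefix):len(target) - len(suffix)]
    return None


# The whole varying part of the path ends up in one opaque stem.
stem = match_stem("%.mst.conll", "cs/dtest-pre1.mst.conll")
print(stem)  # cs/dtest-pre1

# Fixing one dimension in the middle still fuses the others together.
print(match_stem("%test-pre1.mst.conll", "cs/dtest-pre1.mst.conll"))  # cs/d
```

The stem is a single string, so a recipe cannot refer to the language, the data set or the preprocessing variant separately, and a rule cannot be restricted to some values of one dimension while ranging over all values of another.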

An older, more detailed discussion of the related problems is available here, but it is in Czech.

Makefile

.MDIMS: LANGUAGES/ DE TRAINTEST -PREPROCESSINGS .STATES
LANGUAGES = cs en
DE = d e
TRAINTEST = train test
PREPROCESSINGS = pre1 pre2
STATES = mst blind.conll mst.conll

.MDRULE
.md.rul: mst.conll < blind.conll mst
        @echo Run the parser here.

cs/dtest-pre1.mst.conll: cs/dtest-pre1.blind.conll cs/dtest-pre1.mst
        @echo Run the parser here.
cs/dtest-pre2.mst.conll: cs/dtest-pre2.blind.conll cs/dtest-pre2.mst
        @echo Run the parser here.
cs/etest-pre1.mst.conll: cs/etest-pre1.blind.conll cs/etest-pre1.mst
        @echo Run the parser here.
cs/etest-pre2.mst.conll: cs/etest-pre2.blind.conll cs/etest-pre2.mst
        @echo Run the parser here.
...
en/etest-pre2.mst.conll: en/etest-pre2.blind.conll en/etest-pre2.mst
        @echo Run the parser here.

.MDRULE
.md.rul: mst.conll < blind.conll mst
.md.for: LANGUAGES DE PREPROCESSINGS
.md.fix: test
        @echo Run the parser here.

.MDRULE
.md.rul: mst.conll < blind.conll mst
.md.dep: $(TOOLDIR)/runmst.pl
.md.for: LANGUAGES DE PREPROCESSINGS
.md.fix: test
        @echo Running MST for language $(*LANGUAGES):
        $(TOOLDIR)/runmst.pl -m $(*2) < $< > $@

.MDALL: d hi conll

would be rewritten as

.PHONY: all_d_hi_conll
all_d_hi_conll: <list of all files containing values "d", "hi" and "conll">
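To make the rule expansion in this section more concrete, the following Python sketch generates ordinary make rules from the dimension definitions of the example. It only illustrates the idea and is not how mdmake.pl is implemented: the names `DIMS`, `TEMPLATE` and `expand_rule` are hypothetical, the path template is my reading of the `.MDIMS` line, and the fixed value is passed by dimension name, which is a simplification of `.md.fix`.

```python
from itertools import product

# Dimension values copied from the example Makefile.
DIMS = {
    "LANGUAGES": ["cs", "en"],
    "DE": ["d", "e"],
    "TRAINTEST": ["train", "test"],
    "PREPROCESSINGS": ["pre1", "pre2"],
}

# Assumed reading of ".MDIMS: LANGUAGES/ DE TRAINTEST -PREPROCESSINGS .STATES".
TEMPLATE = "{LANGUAGES}/{DE}{TRAINTEST}-{PREPROCESSINGS}.{STATES}"


def expand_rule(target_state, source_states, recipe, fixed=None):
    """Yield one concrete make rule per combination of dimension values.

    Dimensions listed in `fixed` keep a single value; all other
    dimensions iterate over every value they define.
    """
    fixed = fixed or {}
    names = list(DIMS)
    value_lists = [[fixed[n]] if n in fixed else DIMS[n] for n in names]
    for combo in product(*value_lists):
        coords = dict(zip(names, combo))
        target = TEMPLATE.format(STATES=target_state, **coords)
        sources = [TEMPLATE.format(STATES=s, **coords) for s in source_states]
        yield f"{target}: {' '.join(sources)}\n\t{recipe}"


# Analogue of the rule "mst.conll < blind.conll mst" with "test" fixed.
rules = list(expand_rule("mst.conll", ["blind.conll", "mst"],
                         "@echo Run the parser here.",
                         fixed={"TRAINTEST": "test"}))
print("\n".join(rules))
```

With the TRAINTEST dimension fixed to test, the sketch produces eight rules, from cs/dtest-pre1.mst.conll to en/etest-pre2.mst.conll. Without the fixed value it would produce sixteen, one for every combination of the four dimensions.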

Download

Copyright © 2009 Daniel Zeman

All software supplied with this package is released under the GNU
General Public License. This program is free software; you can
redistribute it and/or modify it under the terms of the GNU General
Public License as published by the Free Software Foundation; either
version 2, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License (below or at
http://www.gnu.org/licenses/gpl.html) for more details.

mdmake.zip contains the script mdmake.pl (you need a Perl interpreter to use it), a sample multidimensional makefile and the normal makefile generated from it.

Acknowledgements

This research has been supported by grant no. MSM0021620838 of the Czech Ministry of Education.

