Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:mdmake [2010/11/05 15:28]
zeman
user:zeman:mdmake [2023/04/21 18:17] (current)
zeman Now versioned at Github.
Line 1: Line 1:
 ====== mdmake ======
 +
 +[[https://github.com/dan-zeman/mdmake]]
  
 Imagine you need to apply the same sequence of tools to a set of data files and want to be able to repeat the experiment later, i.e. at some point in the future you will want to recall exactly how the processing was invoked. One example is a shared task that involves processing similarly formatted data in many languages. One may want to use [[http://www.gnu.org/software/make/manual/make.html|make]] and Makefiles, in which the order of application of the various scripts can be described well. However, some aspects of this kind of processing are rather tricky to handle in classical Makefiles.
Line 11: Line 13:
   * An MD-makefile (''makefile.mdm'') may contain all syntactic constructions that a normal makefile can contain. These constructions are copied to the generated makefile, and normal GNU make is responsible for interpreting them. Bear in mind, however, that they are processed //after// the makefile has been generated. So, for example, if we include nested makefiles, these must be normal makefiles, not MD-makefiles.
   * Enumerate the variables that contain the values of the respective dimensions. At the same time, specify how to combine them into file names (paths). (The spaces will be deleted; their purpose here is to show which delimiter should be omitted if a dimension is omitted. Permitted delimiters are slash, hyphen and period.)
 +
 <code>.MDIMS: LANGUAGES/ DE TRAINTEST -PREPROCESSINGS .STATES</code>
 +
   * The delimiters are not mandatory, but MD-make checks that missing delimiters do not cause ambiguities (e.g. if LANGUAGES = hi him and DOMAINS = mix ix, then .MDIMS: LANGUAGES DOMAINS would cause problems).
   * The last dimension in the list of dimensions is special. It need not be named STATES and it need not be delimited by a period (although this is recommended, because in some operating systems it is desirable that the file name extension defines the type of the contents); nevertheless, the value of this dimension is considered the type of the file. Among other things, the file type defines in which dimensions files of this type exist. MD-make gets that information from the rule that generates files of this type as its goal. For every type there must be at least one such rule. In theory there can be more, e.g. if we want to perform different actions for different languages. In that case all such rules must lead to the same list of dimensions of the goal. However, together they are not required to cover all values of all these dimensions.
   * The variables holding the values of the respective dimensions must be normal variables containing only a list of words separated by spaces. MD-make will not search them for references to other variables or macros. If it encounters a dollar sign in these variables, it will throw an exception and terminate. These variables will be visible in the generated makefile as well.
   * No value in any dimension may be identical to any other value of any dimension. In other words, a value uniquely identifies its dimension. (This helps prevent ambiguities in file names that do not contain all dimensions.)
-  * There are special keywords to mark a multidimensional pattern rule. The following parameters can be supplied, too: 
-    * In what dimensions the target file exists. (The other dimensions will not appear in the file name.) 
-    * Constraints on the values in the respective dimensions. (The standard way is the ''.md.if'' directive, but we would like to be able to constrain the ''.STATES'' dimension, or whichever dimension is last in the list, directly in the rule.)
-    * MD-make will generate many normal rules from the multidimensional rule. In these generated rules, all combinations of all values in all affected dimensions will appear. As these rules are not templatic any more, we don't have to fear that gnu make will encounter cyclic dependencies or other problems. 
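
The two checks described above (a value must identify its dimension uniquely, and omitted delimiters must not make file names ambiguous) can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the logic of ''mdmake.pl'': the function names ''check_unique_values'' and ''find_ambiguous_names'' are hypothetical, and the ambiguity test simply looks for two different value combinations that concatenate to the same string.

```python
from itertools import product


def check_unique_values(dims):
    """Return (value, first_dimension, second_dimension) for every repeated value.

    dims maps a dimension name to its list of values (hypothetical structure).
    """
    seen = {}
    clashes = []
    for name, values in dims.items():
        for value in values:
            if "$" in value:
                # Dimension variables must be plain word lists.
                raise ValueError(f"variable reference not allowed in {name}: {value}")
            if value in seen:
                clashes.append((value, seen[value], name))
            else:
                seen[value] = name
    return clashes


def find_ambiguous_names(dims, order, delimiters):
    """Return names that two different value combinations both produce.

    order lists the dimension names; delimiters[i] is the string placed after
    the i-th dimension ("" when no delimiter is given).
    """
    produced = {}
    ambiguous = []
    for combo in product(*(dims[name] for name in order)):
        path = "".join(v + d for v, d in zip(combo, delimiters))
        if path in produced and produced[path] != combo:
            ambiguous.append((path, produced[path], combo))
        else:
            produced[path] = combo
    return ambiguous


dims = {"LANGUAGES": ["hi", "him"], "DOMAINS": ["mix", "ix"]}
# Without a delimiter, hi+mix and him+ix both give "himix".
print(find_ambiguous_names(dims, ["LANGUAGES", "DOMAINS"], ["", ""]))
# With a hyphen after LANGUAGES the names stay distinct.
print(find_ambiguous_names(dims, ["LANGUAGES", "DOMAINS"], ["-", ""]))
```

A real implementation would also have to consider file names in which some dimensions are omitted; the sketch covers only the case where every listed dimension is present.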
  
-    * New variables ''$(*1)'' (or another number in place of 1, for the n-th dependency) can be used inside commands. MD-make finds the rule that creates this dependency, determines from it which dimensions the dependency spans, constructs the corresponding file name and inserts it at that place. It leaves ''$<'' and ''$^'' unchanged, as they work on their own; beware, however, of ''$*'', which makes no sense in MD rules (unlike in ordinary pattern rules).
-    * An MD rule must end with an empty line (even at the end of the file).
-    * If the ''.md.for'' parameter is not given, the rule is generated for all known dimensions except the last one (''STATES'' in our case, but it may be named differently).
-    * The ''.md.fix'' parameter contains values that are fixed in this rule, i.e. the rule is not generated for the other values of the same dimension. It is not yet allowed to give more than one value in the same dimension (although in theory this could serve to delimit partial generation).
-      * If ''.md.fix'' contains a dimension that is also listed in ''.md.for'', the target file type spans this dimension and has its value in its path, but this particular rule generates the file for only one value of that dimension.
-      * If ''.md.fix'' contains a dimension that is not also listed in ''.md.for'', the target file type does not know this dimension and does not have it in its path, but one of the source files has this dimension and needs to know which value we mean. Which source files use the ''.md.fix'' value can be seen from the rules that generate these files as targets and define their dimensions.
-      * If a source file requires a dimension that the target file does not contain, and this dimension is not fixed, the rule is also generated for all values of this dimension. There will then be several competing rules that create the same target file.
-    * ''.md.del'' removes dimensions from ''.md.for'' (most useful when ''.md.for'' is not given and thus contains all dimensions by default).
-    * ''.md.fxd'' is like ''.md.fix'' and ''.md.del'' combined. Values are given, not dimension names (as with ''.md.fix'' and unlike ''.md.del'').
-    * A reference to a dimension value from a command (e.g. ''$(*LANGUAGES)'') is converted to the current value of that dimension. If different source files can have different values of the same dimension within one generated rule, the reference is converted to the value that the target file takes in this dimension, i.e. the one that is variable. References of this kind were introduced for the variable dimensions anyway. Differing values of particular source files are fixed exceptions; we know these values in advance and can write them directly into the command if needed.
 +<code>LANGUAGES = cs en
 +DE = d e
 +TRAINTEST = train test
 +PREPROCESSINGS = pre1 pre2
 +STATES = mst blind.conll mst.conll</code>
 +
 +  * There are special keywords to mark a multidimensional rule: ''.MDRULE'', ''.MDALL'', ''.MDIN''.
 +  * ''.MDRULE'' introduces the main type of pattern rule. It has the parameter ''.md.rul'', which specifies the target and source states / file types (values of the last dimension). For example, we may state that the target file type ''mst.conll'' (a file parsed by the MST parser) needs source files of two types: ''blind.conll'' (the text to be parsed) and ''mst'' (the trained model for the MST parser).
 +
  
 <code>.MDRULE <code>.MDRULE
-.md.rul mst.conll < blind.conll mst
-.md.dep $(TOOLDIR)/runmst.pl
 +.md.rul: mst.conll < blind.conll mst
 +        @echo Run the parser here.
 +</code> 
 + 
 +  * An MD rule must end with an empty line (even at the end of the file).
 +  * MD-make will generate many normal rules from the multidimensional rule. In these generated rules, all combinations of all values in all affected dimensions will appear. As these rules are no longer templates, we need not fear that GNU make will encounter cyclic dependencies or other problems. For instance, the above multidimensional rule yields the following normal rules, among others:
 + 
 +<code>cs/dtest-pre1.mst.conll: cs/dtest-pre1.blind.conll cs/dtest-pre1.mst 
 +        @echo Run the parser here. 
 +cs/dtest-pre2.mst.conll: cs/dtest-pre2.blind.conll cs/dtest-pre2.mst 
 +        @echo Run the parser here. 
 +cs/etest-pre1.mst.conll: cs/etest-pre1.blind.conll cs/etest-pre1.mst 
 +        @echo Run the parser here. 
 +cs/etest-pre2.mst.conll: cs/etest-pre2.blind.conll cs/etest-pre2.mst 
 +        @echo Run the parser here. 
 +... 
 +en/etest-pre2.mst.conll: en/etest-pre2.blind.conll en/etest-pre2.mst 
 +        @echo Run the parser here. 
 +</code> 
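
How such an expansion could work can be sketched in Python. This is a minimal sketch, not the actual ''mdmake.pl'' logic: the names ''LAYOUT'', ''file_name'' and ''expand_rule'' are hypothetical, the delimiters are read off the ''.MDIMS'' line by hand, and every dimension is assumed to be present in every file name.

```python
from itertools import product

# Dimension values from the example above.
DIMS = {
    "LANGUAGES": ["cs", "en"],
    "DE": ["d", "e"],
    "TRAINTEST": ["train", "test"],
    "PREPROCESSINGS": ["pre1", "pre2"],
}

# (prefix delimiter, dimension, suffix delimiter), read off
# ".MDIMS: LANGUAGES/ DE TRAINTEST -PREPROCESSINGS .STATES".
LAYOUT = [
    ("", "LANGUAGES", "/"),
    ("", "DE", ""),
    ("", "TRAINTEST", ""),
    ("-", "PREPROCESSINGS", ""),
    (".", "STATES", ""),
]


def file_name(values, state):
    """Build a path from one value per dimension plus the file type (state)."""
    parts = []
    for prefix, dim, suffix in LAYOUT:
        value = state if dim == "STATES" else values[dim]
        parts.append(prefix + value + suffix)
    return "".join(parts)


def expand_rule(target_state, source_states, command):
    """Yield one ordinary make rule per combination of dimension values."""
    names = list(DIMS)
    for combo in product(*(DIMS[n] for n in names)):
        values = dict(zip(names, combo))
        target = file_name(values, target_state)
        sources = " ".join(file_name(values, s) for s in source_states)
        yield f"{target}: {sources}\n\t{command}"


rules = list(expand_rule("mst.conll", ["blind.conll", "mst"],
                         "@echo Run the parser here."))
print(len(rules))  # 2 * 2 * 2 * 2 combinations
print(rules[2])
```

Because each generated rule names concrete files, GNU make no longer has to match patterns, which is the point made above about avoiding cyclic dependencies.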
 + 
 +  * The following parameters can be supplied, too: 
 +    * The ''.md.for'' parameter specifies in which dimensions the target file exists. (The other dimensions will not appear in the file name.) If there is no ''.md.for'' parameter, the rule is generated for all known dimensions except the last one (''STATES'' in our case).
 +    * The ''.md.fix'' parameter contains values that are fixed in this rule, i.e. the rule is not generated for other values of the same dimension. So far it is not allowed to give more than one value per dimension (although in theory this could be used to restrict generation to a subset of values).
 +      * If ''.md.fix'' contains a value of a dimension that also appears in ''.md.for'', the target type exists in this dimension and has its value in its name/path, but this particular rule generates the file for only one value of that dimension.
 +      * If ''.md.fix'' contains a value of a dimension that does not appear in ''.md.for'', the target file type does not know this dimension and does not have it in its name/path, but one of the source files knows the dimension and needs to know which value we have in mind. We can figure out from the rules generating the source files which dimensions they exist in.
 +      * If a source file requires a dimension not contained in the target file, and the dimension is not fixed, the rule will be generated for all values of this dimension. This means that there will be several competing rules for the same target file. 
 +      * Example: The rule defined above is intended for parsing, not training, so it should only operate on test conll files. We thus freeze the TRAINTEST dimension on the test value. 
 + 
 +<code>.MDRULE 
 +.md.rul: mst.conll < blind.conll mst 
 +.md.for: LANGUAGES DE PREPROCESSINGS 
 +.md.fix: test 
 +        @echo Run the parser here. 
 +</code> 
 + 
 +    * Constraints on the values in the respective dimensions. (The standard way is the ''.md.if'' directive, but we would like to be able to constrain the ''.STATES'' dimension, or whichever dimension is last in the list, directly in the rule.)
 +    * New variables ''$(*1)'' (or another number instead of 1, for the n-th dependency) are available within the commands in the rule. MD-make finds the rule that creates this dependency, uses it to determine the set of dimensions of the dependency, constructs the name of the file and replaces the variable with the file name. MD-make leaves ''$<'' and ''$^'' intact; they will still work in the generated makefile. However, do not use ''$*'', which does not make sense in MD rules (unlike in normal pattern rules).
 +    * ''.md.del'' removes dimensions from ''.md.for'' (handy if ''.md.for'' is not explicitly stated and contains all dimensions by default).
 +    * ''.md.fxd'' combines ''.md.fix'' and ''.md.del''. It contains values, not dimensions (like ''.md.fix'' and unlike ''.md.del'').
 +    * If a command in a rule refers to a dimension (e.g. ''$(*LANGUAGES)''), the reference will be converted to the actual value of the dimension. If different source files have different values of the same dimension within one generated rule, the reference will be replaced by the value that the target file takes in this dimension (i.e. by the variable value). The purpose of such references is to refer to the variable dimensions anyway. Differing values of particular source files are fixed exceptions; we know them in advance and can write them into the command directly, if necessary.
 + 
 +<code>.MDRULE 
 +.md.rul: mst.conll < blind.conll mst 
 +.md.dep: $(TOOLDIR)/runmst.pl
 .md.for: LANGUAGES DE PREPROCESSINGS
 .md.fix: test
Line 40: Line 80:
         $(TOOLDIR)/runmst.pl -m $(*2) < $< > $@</code>         $(TOOLDIR)/runmst.pl -m $(*2) < $< > $@</code>
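
The replacement of ''$(*N)'' and ''$(*DIMENSION)'' references in a command can be illustrated with a short Python sketch. It assumes the dependency file names and the current dimension values have already been worked out; the function ''substitute'' is hypothetical and is not taken from ''mdmake.pl''.

```python
import re


def substitute(command, dependencies, dim_values):
    """Replace $(*N) and $(*DIMENSION) references in a command line.

    dependencies: file names of the rule's sources, in order.
    dim_values: current value of each variable dimension.
    Ordinary make variables such as $<, $@ or $(TOOLDIR) are left untouched.
    """
    def replace(match):
        key = match.group(1)
        if key.isdigit():
            return dependencies[int(key) - 1]  # $(*1) is the first dependency
        return dim_values[key]

    return re.sub(r"\$\(\*(\w+)\)", replace, command)


deps = ["cs/dtest-pre1.blind.conll", "cs/dtest-pre1.mst"]
print(substitute("$(TOOLDIR)/runmst.pl -m $(*2) < $< > $@", deps, {"LANGUAGES": "cs"}))
print(substitute("echo Parsing $(*LANGUAGES)", deps, {"LANGUAGES": "cs"}))
```

Only references that start with ''$(*'' are rewritten, so the generated makefile still lets GNU make expand ''$<'', ''$@'' and ''$(TOOLDIR)'' itself.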
  
-  * Input files can be defined. They typically lie in an entirely different path, or are at least named so that they cannot be mistaken for files named by dimension values, and so there is no risk of make deleting them. We can describe their properties in the individual dimensions simply by writing an ordinary rule in which the input file is the dependency and the target is a file named by the corresponding dimension values. We put ''.md.in:'' before the rule. MD-make then adds a command that copies the dependency to the target (''cp $< $@'') and also checks that the target file has values for all dimensions that a file in the given state (the value of the last dimension) should have.
-  * The generated makefile could additionally contain, for each value of each dimension, a list of the files in which this value is fixed. For instance, all target files in the language "hi". Besides a variable containing the names of these files (HIFILES), the generated makefile would contain a goal that creates all these files (hi) and a goal that removes them (clean_hi).
-  * While generating the multidimensional rules, remember the list of all generated target files. For every target file create a hash whose key is a value of any dimension and whose value under that key is non-zero if that dimension value is contained in the file name. At the end of the makefile, the ''.MDALL'' rule can be used to create a ''.PHONY'' goal depending on all files that contain certain values. E.g.
 +  * It is possible to define input files. These are typically in a completely different folder, or their names are different, so that they are not confused with the files named according to dimension values and are not in danger of being removed by make. We can describe their properties in the various dimensions by creating a normal rule where the input file is a dependency (source) and the goal is a file named by dimension values. We write ''.md.in:'' before such a rule. MD-make adds a command to copy the source file to the target (''cp $< $@'') and checks that the target file has values for all dimensions that a file in its state (the value of the last dimension) ought to have.
 +  * The generated makefile could further contain, for each value of each dimension, the list of files for which this value is fixed. For instance, all target files in the language "hi". Besides the variable containing the names of such files (HIFILES), there would also be a goal that creates/updates all such files (hi) and a goal that removes them (clean_hi).
 +  * Collect the names of all generated target files while generating the multidimensional rules. Create a hash for every target file where the key is a value of any dimension and the value under that key is non-zero if that dimension value is contained in the name of the file. At the end of the makefile there can be a rule ''.MDALL'' that creates a ''.PHONY'' goal depending on all files containing the given values. E.g.
  
 <code>.MDALL: d hi conll</code>
  
-is rewritten as
 +would be rewritten as
  
 <code>.PHONY: all_d_hi_conll
-all_d_hi_conll: <list of all files containing the values "d", "hi" and "conll">
 +all_d_hi_conll: <list of all files containing values "d", "hi" and "conll"></code>
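
The bookkeeping behind ''.MDALL'' can be sketched as follows. This is a hedged illustration, not the script's actual code: the helper ''mdall'' and the target files listed here are hypothetical, a Python set stands in for the per-file hash described above, and "conll" is assumed to be a value of the last dimension.

```python
def mdall(requested, targets):
    """Build the .PHONY goal for an .MDALL line (hypothetical helper).

    requested: values listed after ".MDALL:".
    targets: maps each generated target file to the set of dimension values
             contained in its name.
    """
    goal = "all_" + "_".join(requested)
    # Keep only the files whose names contain every requested value.
    files = [name for name, values in targets.items() if set(requested) <= values]
    return f".PHONY: {goal}\n{goal}: {' '.join(files)}"


# Hypothetical target files collected while generating the rules.
targets = {
    "hi/dtest.conll": {"hi", "d", "test", "conll"},
    "hi/etest.conll": {"hi", "e", "test", "conll"},
    "cs/dtest.conll": {"cs", "d", "test", "conll"},
    "hi/dtrain.conll": {"hi", "d", "train", "conll"},
}
print(mdall(["d", "hi", "conll"], targets))
```

The same per-file value sets could also back the per-value goals such as ''hi'' and ''clean_hi'' mentioned above.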
 + 
 +===== Download ===== 
 + 
 +Copyright © 2009 Daniel Zeman 
 + 
 +All software supplied with this package is released under the GNU 
 +General Public License.  This program is free software; you can 
 +redistribute it and/or modify it under the terms of the GNU General 
 +Public License as published by the Free Software Foundation; either 
 +version 2, or (at your option) any later version. 
 + 
 +This program is distributed in the hope that it will be useful, 
 +but WITHOUT ANY WARRANTY; without even the implied warranty of 
 +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
 +GNU General Public License (below or at http://www.gnu.org/licenses/gpl.html)
 +for more details.
 + 
 +{{:user:zeman:mdmake.zip|mdmake.zip}} contains the script ''mdmake.pl'' (you need a Perl interpreter to use it), a sample multidimensional makefile and the normal makefile generated from it. 
 + 
 +===== Acknowledgements ===== 
 + 
 +This research has been supported by the grant of the Czech Ministry of Education no. MSM0021620838. 
