| author | aarne <aarne@cs.chalmers.se> | 2008-06-27 11:27:00 +0000 |
|---|---|---|
| committer | aarne <aarne@cs.chalmers.se> | 2008-06-27 11:27:00 +0000 |
| commit | e4e64c13a69db6505df499a0c3445ada9b1b2d88 (patch) | |
| tree | 28044f42fd5d30582a0478556b043f2363b6c9fb | |
| parent | 7cdbe8e7a34e1eda300f4756c4a66ff9be027368 (diff) | |
more rm in doc
| -rw-r--r-- | doc/gf2-highlights.html | 490 |
| -rw-r--r-- | doc/gf2.2-highlights.html | 173 |
| -rw-r--r-- | doc/gfcc.pdf | bin | 145566 -> 0 bytes |
| -rw-r--r-- | doc/grammars-and-types.txt | 56 |
| -rw-r--r-- | doc/intro-resource.txt | 511 |
| -rw-r--r-- | doc/multimodal.html | 863 |
| -rw-r--r-- | doc/multimodal.txt | 728 |
7 files changed, 0 insertions, 2821 deletions
diff --git a/doc/gf2-highlights.html b/doc/gf2-highlights.html deleted file mode 100644 index 3d8a150a9..000000000 --- a/doc/gf2-highlights.html +++ /dev/null @@ -1,490 +0,0 @@ -<html> - -<body bgcolor="#FFFFFF" text="#000000"> - -<center> - -<h1>Grammatical Framework Version 2</h1> - -Highlights, versions 2.0, 2.1, and 2.2 (2.2 coming soon) - -<p> - -13/10/2003 - 25/11 - 2/4/2004 - 18/6 - 13/10 - 16/2/2005 - -<p> - -<a href="http://www.cs.chalmers.se/~aarne">Aarne Ranta</a> - -</center> - - -<h2>Syntax of GF</h2> - -An accurate <a href="DocGF.pdf">language specification</a> is now available. - - -<h2>Summary of novelties in Versions 2.0 to 2.2</h2> - -<h4>Module system</h4> - -<li> Separate modules for <tt>abstract</tt>, - <tt>concrete</tt>, and <tt>resource</tt>. -<li> Replaces the file-based <tt>include</tt> system -<li> Name space handling with qualified names -<li> Hierarchic structure (single inheritance <tt>**</tt>) + - cross-cutting reuse (<tt>open</tt>) -<li> Separate compilation, one module per file -<li> Reuse of <tt>abstract</tt>+<tt>concrete</tt> as <tt>resource</tt><br> - <b>Version 2.2</b>: separate <tt>reuse</tt> modules no longer needed -<li> Parametrized modules: - <tt>interface</tt>, <tt>instance</tt>, <tt>incomplete</tt>. -<li> New experimental module types: <tt>transfer</tt>, - <tt>union</tt>. -<li> Version 2.1: multiple inheritance in module extension. - -<h4>Canonical format GFC</h4> - -<li> The target of GF compiler; to reuse, just read in. -<li> Readable by Haskell/Java/C++/C applications. -<li> Version 2.1: Java interpreter available for GFC (by Björn Bringert). -<li> <b>Version 2.2</b>: new optimizations to reduce the size of GFC files - - -<h4>New features in expression language</h4> - -<li> Disjunctive patterns <tt>P | ... | Q</tt>. -<li> String patterns <tt>"foo"</tt>. -<li> Binding token <tt>&+</tt> to glue separate tokens at unlexing phase, - and unlexer to resolve this. 
-<li> New syntax alternatives for local definitions: <tt>let</tt> without - braces and <tt>where</tt>. -<li> Pattern variables can be used on lhs's of <tt>oper</tt> definitions. -<li> New Unicode transliterations (by Harad Hammarström). -<li> Version 2.1: Initial segments of integers - (<tt>Ints</tt><i>n</i>) available as parameter types. - - -<h4>New shell commands and command functionalities</h4> - -<li> <tt>pi</tt> = <tt>print_info</tt>: information on an identifier in scope. -<li> <tt>h</tt> = <tt>help</tt> now in long or short form, - and on individual commands. -<li> <tt>gt</tt> = <tt>generate_trees</tt>: all trees of a given - category or instantiations of a given incomplete term, up to a - given depth. -<li> <tt>gr</tt> = <tt>generate_random</tt> can now be given - an incomplete term as an argument, to constrain generation. -<li> <tt>so</tt> = <tt>show_opers</tt> shows all <tt>ope</tt> - operations with a given value type. -<li> <tt>pm</tt> = <tt>print_multi</tt> prints the multilingual - grammar resident in the current state to a ready-compiles - <tt>.gfcm</tt> file. -<li> <b>Version 2.2</b>: several new command options -<li> <b>Version 2.2</b>: <tt>vg</tt> visializes the module dependency graph -<li> All commands have both long and short names (see help). Short - names are easier to type, whereas long names - make scripts more readable. -<li> Meaningless command options generate warnings. - - -<h4>New editor features</h4> - -<li> Active text field: click the middle button in the focus to send - in refinement through the parser. -<li> Clipboard: copy complex terms into the refine menu. -<li> <b>Version 2.2</b>: text corresponding to subtrees with constraints marked with red colour - - -<h4>Improved implementation</h4> - -<li> Haskell source code is organized into subdirectories. -<li> BNF Converter is used for defining the languages GF and GFC, which also - give reliable LaTeX documentation. 
-<li> Lexical rules sorted out by option <tt>-cflexer</tt> for efficient - parsing with large lexica. -<li> GHC optimizations and strictness flags are used for improving performance. -<li> <b>Version 2.2</b>: started <a - href="http://www.haskell.org/haddock">haddock</a> documentation - by using uniform module headers - - - -<h4>New parser (work in progress)</h4> - -<li> By Peter Ljunglöf, based on MCFG. -<li> Much more efficient for morphology and discontinuous constituents. -<li> Treatment of cyclic rules. -<li> Version 2.1: improved generation of speech recognition - grammars (by Björn Bringert). -<li> Version 2.1: output of Labelled BNF files readable by the - BNF Converter. - - - - -<!-- NEW --> - -<h2>Abstract, concrete, and resource modules</h2> - -Judgement forms are sorted as follows: -<ul> -<li> abstract: - <tt>cat</tt>, <tt>fun</tt>, <tt>def</tt>, <tt>data</tt>, <tt>flags</tt> -<li> concrete: - <tt>lincat</tt>, <tt>cat</tt>, <tt>printname</tt>, <tt>flags</tt> -<li> resource: - <tt>param</tt>, <tt>oper</tt>, <tt>flags</tt> -<li> -</ul> -Example: -<pre> - abstract Sums = { - cat - Exp ; - fun - One : Exp ; - plus : Exp -> Exp -> Exp ; - } - - concrete EnglishSums of Sums = open ResEng in { - lincat - Exp = {s : Str ; n : Number} ; - lin - One = expSg "one" ; - sum x y = expSg ("the" ++ "sum" ++ "of" ++ x.s ++ "and" ++ y.s) ; - } - - resource ResEng = { - param - Number = Sg | Pl ; - oper - expSG : Str -> {s : Str ; n : Number} = \s -> {s = s ; n = Sg} ; - } -</pre> - - - -<!-- NEW --> - -<h2>Opening and extending modules</h2> - -A <tt>concrete</tt> or <tt>resource</tt> can <b>open</b> a -<tt>resource</tt>. This means that -<ul> -<li> the names defined in <tt>resource</tt> can be used ("become visible") -<li> but: these names are not included in ("exported from") the opening module -</ul> -A module of any type can moreover <b>extend</b> a module of the same type. 
-This means that -<ul> -<li> the names defined in the extended module can be used ("become visible") -<li> and also: these names are included in ("exported from") the extending module -</ul> -Examples of extension: -<pre> - abstract Products = Sums ** { - fun times : Exp -> Exp -> Exp ; - } - -- names exported: Exp, plus, times - - concrete English of Products = EnglishSums ** open ResEng in { - lin times x y = expSg ("the" ++ "product" ++ "of" ++ x.s ++ "and" ++ y.s) ; - } -</pre> - -<p> - -Opening, but not extension, can be <b>qualified</b>: -<pre> - concrete NumberSystems of Systems = open (Bin = Binary), (Dec = Decimal) in { - lin - BZero = Bin.Zero ; - DZero = Dec.Zero - } -</pre> - -<p> - -<b>Version 2.1</b> introduces <tt>multiple inheritance</tt>: a module -can extend several modules at the same time, for instance, -<pre> - abstract Dialogue = User, System ** { ...} -</pre> -may be used to put together "User's moves" and "System's moves" into -one Dialogue System grammar. - - - -<!-- NEW --> - -<h2>Compiling modules</h2> - -Separate compilation assumes there is <b>one module per file</b>. - -<p> - -The <b>module header</b> is the beginning of the module code up to the -first left bracket (<tt>{</tt>). 
The header gives -<ul> -<li> the module type: <tt>abstract</tt>, <tt>concrete</tt> (<tt>of</tt> <i>A</i>), - or <tt>resource</tt> -<li> the name of the module (next to the module type keyword) -<li> the names of extended modules (between <tt>=</tt> and <tt>**</tt>) -<li> the names of opened modules -</ul> - -<p> - -<b>filename</b> = <b>modulename</b> <tt>.</tt> <b>extension</b> - -<p> - -File name extensions: -<ul> -<li> <tt>gf</tt>: GF source file (uses GF syntax, is type checked and compiled) -<li> <tt>gfc</tt>: canonical GF file (uses GFC syntax, is simply read -in instead of compiled; produced from all kinds of modules) -<li> <tt>gfr</tt>: GF resource file (uses GF syntax, is only read in; produced from -<tt>resource</tt> modules) -<li> <tt>gfcm</tt>: canonical multilingual GF file -(uses GFC syntax, is only read in; produced -from a set of <tt>abstract</tt> and <tt>conctrete</tt> modules) -</ul> -Only <tt>gf</tt> files should ever be written/edited manually! - -<p> - -What the make facility does when compiling <tt>Foo.gf</tt> -<ol> -<li> read the module header of <tt>Foo.gf</tt>, and recursively all headers from -the modules it <b>depends</b> on (i.e. extends or opens) -<li> build a dependency graph of these modules, and do topological sorting -<li> starting from the first module in topological order, -compare the modification times of each <tt>gf</tt> and <tt>gfc</tt> file: -<ul> -<li> if <tt>gf</tt> is later, compile the module and all modules depending on it -<li> if <tt>gfc</tt> is later, just read in the module -</ul> -</ol> -Inside the GF shell, also time stamps of modules read into memory are -taken into account. Thus a module need not be read from a file if the -module is in the memory and the file has not been modified. - -<p> - -If the compilation of a grammar fails at some module, the state of the -GF shell contains all modules read up to that point. This makes it -faster to compile the faulty module again after fixing it. 
- -<p> - -Use the command <tt>po</tt> = <tt>print_options</tt> to see what -modules are in the state. - -<p> - -To force compilation: -<ul> -<li> The flag <i>-src</i> in the import command forces compilation from - source even if more recent object files exist. This is useful - when testing new versions of GF. -<li> The flag <i>-retain</i> in the import command forces reading in - <tt>gfr</tt> files in addition to <tt>gfc</tt> files. This is useful - when testing operations with the <tt>cc</tt> command. -</ul> - -<!-- NEW --> - -<h3>Compiler optimizations</h3> - -<b>Version 2.2</b> - -<p> - -The sometimes exploding size of generated <tt>gfc</tt> and -<tt>gfr</tt> files has made it urgent to find optimizations -that reduce the size of the code. There are five -combinations optimizations that can be chosen, as the value of the -<tt>optimize</tt> flag: -<ul> -<li> <tt>share</tt>: group tables so that common branch values are shared -by the use of disjunctive patterns. -<li> <tt>parametrize</tt>: if table branches differ at most at the -occurrence of the pattern, replace the expanded table by a one-branch -table with a variable. If this fails, perform <tt>share</tt>. -<li> <tt>values</tt>: only show the values of table branches, not the -patterns. -<li> <tt>all</tt>: try <tt>parametrize</tt>; if this fails, do <tt>values</tt>. -<li> <tt>none</tt>: don't do any optimizations -</ul> -The <tt>share</tt> and <tt>parametrize</tt> optimizations are always -just good, whereas the <tt>values</tt> optimization may slow down the -use of the table. However, it is very good for grammars mostly consisting -of the inflection tables of lexical items: it can reduce the file size -by the factor of 4. - -<p> - -An optimization can be selected individually for each -<tt>resource</tt> and <tt>concrete</tt> module by including -the judgement -<pre> - flags optimize=(share|parametrize|values|all|none) ; -</pre> -in the module body. 
These flags can be overridden by a flag given -in the <tt>i</tt> command, e.g. -<pre> - i -src -optimize=none Foo.gf -</pre> -Notice that the option <tt>-src</tt> is needed if there already are -generated files created with other optimization flags. - - - -<!-- NEW --> - -<h2>Module search paths</h2> - -Modules can reside in different directories. Use the <tt>path</tt> -flag to extend the directory search path. For instance, -<pre> - -path=.:../resource/russian:../prelude -</pre> -enables files to be found in three different directories. -By default, only the current directory is included. -If a <tt>path</tt> flag is given, the current directory -<tt>.</tt> must be explicitly included if it is wanted. - -<p> - -The <tt>path</tt> flag can be set in any of the following -places: -<ul> -<li> when invoking GF: <tt>gf -path=xxx</tt> -<li> when importing a module: <tt>i -path=xxx Foo.gf</tt> -<li> as a pragma in a topmost file: <tt>--# -path=xxx</tt> -</ul> -A flag set on a command line overrides ones set in files. - - -<!-- NEW --> - -<h2>How to use GF 1.* files</h2> - -Backward compatibility with respect to old GF grammars has been -a central goal. All GF grammars, from version 0.9, should work in -the old way in GF2. The main exceptions are some features that -are rarely used. -<ul> -<li> The <tt>package</tt> system introduced in GF 1.2, cannot be - interpreted in the module system of GF 2.0, since packages are in - mutual scope with the top level. -<li> <tt>tokenizer</tt> pragmas are cannot be parsed any more. In GF - 1.2, they are already replaced by <tt>lexer</tt> flags. -<li> <tt>var</tt> pragmas cannot be parsed any more. -</ul> - -<p> - -Very old GF grammars (from versions before 0.9), with the completely -different notation, do not work. They should be first converted to -GF1 by using GF version 1.2. - -<p> - -The import command <tt>i</tt> can be given the option <tt>-old</tt>. E.g. 
-<pre> - i -old tut1.Eng.g2 -</pre> -But this is no more necessary: GF2 detects automatically if a grammar -is in the GF1 format. - -<p> - -Importing a set of GF2 files generates, internally, three modules: -<pre> - abstract tut1 = ... - resource ResEng = ... - concrete Eng of tut1 = open ResEng in ... -</pre> -(The names are different if the file name has fewer parts.) - - -<p> - -The option <tt>-o</tt> causes GF2 to write these modules into files. - -<p> - -The flags <tt>-abs</tt>, <tt>-cnc</tt>, and <tt>-res</tt> can be used -to give custom names to the modules. In particular, it is good to use -the <tt>-abs</tt> flag to guarantee that the abstract syntax module -has the same name for all grammars in a multilingual environmens: -<pre> - i -old -abs=Numerals hungarian.gf - i -old -abs=Numerals tamil.gf - i -old -abs=Numerals sanskrit.gf -</pre> - -<p> - -The same flags as in the import command can be used when invoking -GF2 from the system shell. Many grammars can be imported on the same command -line, e.g. -<pre> - % gf2 -old -abs=Tutorial tut1.Eng.gf tut1.Fin.gf tut1.Fra.gf -</pre> - -<p> - -To write a GF2 grammar back to GF1 (as one big file), use the command -<pre> - > pg -old -</pre> - - -<p> - - -GF2 has more reserved words than GF 1.2. When old files are read, a preprocessor -replaces every identifier that has the shape of a new reserved word -with a variant where the last letter is replaced by <tt>Z</tt>, e.g. -<tt>instance</tt> is replaced by <tt>instancZ</tt>. This method is of course -unsafe and should be replaced by something better. - - -<!-- NEW --> - -<h2>Missing features of GF 1.2 (13/10/2004)</h2> - -Generally, GF1 grammars can be automatically translated to GF2, although the -result is not as good -as manual, since indentation and comments are destroyed. -The results can be -saved in GF2 files, but this is not necessary. -Some rarely used GF1 features are no longer supported (see next section). 
-It is also possible to write a GF2 grammar back to GF1, with the -command <tt>pg -printer=old</tt>. - - -<p> - -Resource libraries -and some example grammars have been -converted. Most old example grammars work without any changes. -However, there is a new resource API with -many new constructions, and which is recommended. - -<p> - -Soundness checking of module depencencies and completeness is not -complete. This means that some errors may show up too late. - -<p> - -Latex and XML printing of grammars do not work yet. - - -</body> -</html> diff --git a/doc/gf2.2-highlights.html b/doc/gf2.2-highlights.html deleted file mode 100644 index 58ccd5256..000000000 --- a/doc/gf2.2-highlights.html +++ /dev/null @@ -1,173 +0,0 @@ -<html> - -<body bgcolor="#FFFFFF" text="#000000"> - -<center> - -<h1>Grammatical Framework Version 2.2</h1> - -Highlights of GF version 2.2. - -<p> - -9/5/2005 - -<p> - -<a href="http://www.cs.chalmers.se/~aarne">Aarne Ranta</a> - -</center> - - -<h2>Summary of novelties in Version 2.2 in comparison to 2.1</h2> - -<li> New optimizations to reduce the size of GFC files -<li> Improved parsing algorithms -<li> Lots of bug fixes -<li> Separate <tt>reuse</tt> modules no longer needed -<li> Several new command options -<li> New documentation: - <ul> - <li> <a href="gf-modules.html">module system document</tt> - <li> <a href="tutorial/gf-tutorial2.html">new tutorial</a>, based on the module system (unfinished) - </ul> -<li> New resource libraries -<li> New example grammars -<li> Visualization of module dependency graph -<li> In the editor GUI, text corresponding to subtrees with constraints marked with red colour -<li> Hierarchic modules used in the source code -<li> <a href="http://www.haskell.org/haddock">haddock</a> documentation available for source code -<li> Optimizations to reduce GF's memory footprint when using large grammars. -<li> The <tt>pm</tt> command can now convert identifiers in the grammar to UTF-8. 
- - -<h3>Compiler optimizations</h3> - -The sometimes exploding size of generated <tt>gfc</tt> and -<tt>gfr</tt> files has made it urgent to find optimizations -that reduce the size of the code. There are five -combinations optimizations that can be chosen, as the value of the -<tt>optimize</tt> flag: -<ul> -<li> <tt>share</tt>: group tables so that common branch values are shared -by the use of disjunctive patterns. -<li> <tt>parametrize</tt>: if table branches differ at most at the -occurrence of the pattern, replace the expanded table by a one-branch -table with a variable. If this fails, perform <tt>share</tt>. -<li> <tt>values</tt>: only show the values of table branches, not the -patterns. -<li> <tt>all</tt>: try <tt>parametrize</tt>; if this fails, do <tt>values</tt>. -<li> <tt>none</tt>: don't do any optimizations -</ul> -The <tt>share</tt> and <tt>parametrize</tt> optimizations are always -just good, whereas the <tt>values</tt> optimization may slow down the -use of the table. However, it is very good for grammars mostly consisting -of the inflection tables of lexical items: it can reduce the file size -by the factor of 4. - -<p> - -An optimization can be selected individually for each -<tt>resource</tt> and <tt>concrete</tt> module by including -the judgement -<pre> - flags optimize=(share|parametrize|values|all|none) ; -</pre> -in the module body. These flags can be overridden by a flag given -in the <tt>i</tt> command, e.g. -<pre> - i -src -optimize=none Foo.gf -</pre> -Notice that the option <tt>-src</tt> is needed if there already are -generated files created with other optimization flags. - -<p> - -<b>Important notice</b>: If you use the -<a href="http://www.cs.chalmers.se/~bringert/gf/gf-java.html"> -Embedded GF Interpreter</a>, -or the improved parsing algorithms described below, -only the values <tt>none</tt>, -<tt>share</tt> and <tt>values</tt> can be used; the stronger optimizations are not -supported yet. 
-Also note that currently, GF aborts and reports an error if the stronger optimizations are used -when creating the grammar for the Embedded GF Interpreter, or when trying to parse. - - -<h3>Improved parsing algorithms</h3> - -We have implemented some of the suggested parsing algorithms described in -Peter Ljunglöf's <a href="http://www.cs.chalmers.se/~peb/pubs.html">PhD thesis</a>. -So now there are the following options for parsing: -<ul> - <li>The default parser. It uses a (possibly) very overgenerating context-free grammar, and filters the resulting parse trees by type-checking. - <li>The <tt>-cfg</tt> flag. It uses a much less overgenerating context-free grammar, and filters as above. - <li>The <tt>-mcfg</tt> flag. It uses an even less overgenerating <em>multiple context-free grammar</em>. - If the abstract syntax is context-free, meaning that there are no dependent types and only first-order functions, - the trees do not have to be filtered at all. -</ul> -The option <tt>-parser=X</tt> selects the parsing strategy. The default parser has the strategies -<tt>chart</tt>, <tt>bottomup</tt>, <tt>topdown</tt>, <tt>old</tt>, with the first one being the default. -The <tt>-cfg</tt> and <tt>-mcfg</tt> parsers only recognize the <tt>bottomup</tt> and <tt>topdown</tt> strategies. - -<p> - -<b>Note</b> that the <tt>-cfg</tt> and <tt>-mcfg</tt> parsers can take a very long time on their first call, since -they have to convert the GF grammar. This will only happen once in a GF run, provided the GF files are not changed. - -<p> - -<b>Tips</b> for choosing the best parser for your grammar. Try with the default parser; if it is too slow, try the other two. -Remember that the first time you parse they will be very slow, since they have to build parsing information. -the <tt>-cfg</tt> parser is best on grammars with many parameters and inflection tables, and -The <tt>-mcfg</tt> parser is even better when the grammar also has discontinuous constituents. 
- -<p> - -Here is a small example from the resource library: -<pre> -> i -src -optimize=share lib/resource/english/LangEng.gf -> p -cat=S "" -> p -cat=S -cfg "" -> p -cat=S -mcfg "" -{Comment: Just some dummy parsing calls to calculate the parsing information} - -> p -cat=S -rawtrees=200000 "you will be running" -{Comment: Nr of unfiltered trees: 169296 -- 99,996% av the trees are ill-typed} - -UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV AAnter run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV ASimul run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV AAnter run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV ASimul run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV AAnter run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV ASimul run_V)) - -17730 msec - -> p -cat=S -cfg "you will be running" -{Comment: Nr of unfiltered trees: 246 -- 97,5% of the trees are ill-typed} - -UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV AAnter run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV ASimul run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV AAnter run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV ASimul run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV AAnter run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV ASimul run_V)) - -1580 msec - -> p -cat=S -mcfg "you will be running" -{Comment: Nr of unfiltered trees: 6 -- all trees are type-corrent} - -UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV AAnter run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV ASimul run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV AAnter run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV ASimul run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV AAnter run_V)) -UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV ASimul run_V)) - -470 msec -</pre> - -</body> 
-</html> diff --git a/doc/gfcc.pdf b/doc/gfcc.pdf Binary files differdeleted file mode 100644 index 9d7b2193f..000000000 --- a/doc/gfcc.pdf +++ /dev/null diff --git a/doc/grammars-and-types.txt b/doc/grammars-and-types.txt deleted file mode 100644 index 27667589d..000000000 --- a/doc/grammars-and-types.txt +++ /dev/null @@ -1,56 +0,0 @@ -Grammars and Types - -==Historical introduction== - -Stoics ? - -Port-Royal ? - -Lyons - -Frege - -Ajdukiewicz - -Bar-Hillel - -Lambek - -Curry - -Montague - -PATR, HPSG - -LFG - -GF - -ACG, HOG - - -==Syntactic and semantic grammars== - -in GF - -==Cross-linguistic types== - -generalizations over type systems, parametrized modules - - -==Grammatical concepts formalized== - -POS, category - -inherent and parametric features - -agreement - -rection - -endocentric and exocentric concepts - -(see Lyons and Jespersen for more) - -a core syntax (latin.gf) - diff --git a/doc/intro-resource.txt b/doc/intro-resource.txt deleted file mode 100644 index c4c292fca..000000000 --- a/doc/intro-resource.txt +++ /dev/null @@ -1,511 +0,0 @@ - - -==Coverage== - -The GF Resource Grammar Library contains grammar rules for -10 languages (in addition, 2 languages are available as incomplete -implementations, and a few more are under construction). Its purpose -is to make these rules available for application programmers, -who can thereby concentrate on the semantic and stylistic -aspects of their grammars, without having to think about -grammaticality. The targeted level of application grammarians -is that of a skilled programmer with -a practical knowledge of the target languages, but without -theoretical knowledge about their grammars. -Such a combination of -skills is typical of programmers who, for instance, want to localize -software to new languages. 
- -The current resource languages are -- ``Ara``bic (incomplete) -- ``Cat``alan (incomplete) -- ``Dan``ish -- ``Eng``lish -- ``Fin``nish -- ``Fre``nch -- ``Ger``man -- ``Ita``lian -- ``Nor``wegian -- ``Rus``sian -- ``Spa``nish -- ``Swe``dish - - -The first three letters (``Eng`` etc) are used in grammar module names. -The incomplete Arabic and Catalan implementations are -enough to be used in many applications; they both contain, amoung other -things, complete inflectional morphology. - - - -==A first example== - -To give an example application, consider a system for steering -music playing devices by voice commands. In the application, -we may have a semantical category ``Kind``, examples -of ``Kind``s being ``Song`` and ``Artist``. In German, for instance, ``Song`` -is linearized into the noun "Lied", but knowing this is not -enough to make the application work, because the noun must be -produced in both singular and plural, and in four different -cases. By using the resource grammar library, it is enough to -write -``` - lin Song = mkN "Lied" "Lieder" neuter -``` -and the eight forms are correctly generated. The resource grammar -library contains a complete set of inflectional paradigms (such as -``mkN`` here), enabling the definition of any lexical items. - -The resource grammar library is not only about inflectional paradigms - it -also has syntax rules. The music player application -might also want to modify songs with properties, such as "American", -"old", "good". The German grammar for adjectival modifications is -particularly complex, because adjectives have to agree in gender, -number, and case, and also depend on what determiner is used -("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this -variation is taken care of by the resource grammar function -``` - mkCN : AP -> CN -> CN -``` -(see the table in the end of this document for the list of all resource grammar -functions). 
The resource grammar implementation of the rule adding properties -to kinds is -``` - lin PropKind kind prop = mkCN prop kind -``` -given that -``` - lincat Prop = AP - lincat Kind = CN -``` -The resource library API is devided into language-specific -and language-independent parts. To put it roughly, -- the lexicon API is language-specific -- the syntax API is language-independent - - -Thus, to render the above example in French instead of German, we need to -pick a different linearization of ``Song``, -``` - lin Song = mkN "chanson" feminine -``` -But to linearize ``PropKind``, we can use the very same rule as in German. -The resource function ``mkCN`` has different implementations in the two -languages (e.g. a different word order in French), -but the application programmer need not care about the difference. - - - -==Note on APIs== - -From version 1.1 onwards, the resource library is available via two -APIs: -- original ``fun`` and ``oper`` definitions -- overloaded ``oper`` definitions - - -Introducing overloading in GF version 2.7 has been a success in improving -the accessibility of libraries. It has also created a layer of abstraction -between the writers and users of libraries, and thereby makes the library -easier to modify. We shall therefore use the overloaded API -in this document. The original function names are mainly interesting -for those who want to write or modify libraries. - - - -==A complete example== - -To summarize the example, and also give a template for a programmer to work on, -here is the complete implementation of a small system with songs and properties. -The abstract syntax defines a "domain ontology": -``` - abstract Music = { - - cat - Kind, - Property ; - fun - PropKind : Kind -> Property -> Kind ; - Song : Kind ; - American : Property ; - } -``` -The concrete syntax is defined by a functor (parametrized module), -independently of language, by opening -two interfaces: the resource ``Syntax`` and an application lexicon. 
-``` - incomplete concrete MusicI of Music = - open Syntax, MusicLex in { - lincat - Kind = CN ; - Property = AP ; - lin - PropKind k p = mkCN p k ; - Song = mkCN song_N ; - American = mkAP american_A ; - } -``` -The application lexicon ``MusicLex`` is an interface -opening the resource category system ``Cat``. -``` - interface MusicLex = Cat ** { - oper - song_N : N ; - american_A : A ; - } -``` -It could also be an abstract syntax that extends ``Cat``, but -this would limit the kind of constructions that are possible in -the interface - -Each language has its own concrete syntax, which opens the -inflectional paradigms module for that language: -``` - interface MusicLexGer of MusicLex = - CatGer ** open ParadigmsGer in { - oper - song_N = mkN "Lied" "Lieder" neuter ; - american_A = mkA "amerikanisch" ; - } - - interface MusicLexFre of MusicLex = - CatFre ** open ParadigmsFre in { - oper - song_N = mkN "chanson" feminine ; - american_A = mkA "américain" ; - } -``` -The top-level ``Music`` grammars are obtained by -instantiating the two interfaces of ``MusicI``: -``` - concrete MusicGer of Music = MusicI with - (Syntax = SyntaxGer), - (MusicLex = MusicLexGer) ; - - concrete MusicFre of Music = MusicI with - (Syntax = SyntaxFre), - (MusicLex = MusicLexFre) ; -``` -Both of these files can use the same ``path``, defined as -``` - --# -path=.:present:prelude -``` -The ``present`` category contains the compiled resources, restricted to -present tense; ``alltenses`` has the full resources. - -To localize the music player system to a new language, -all that is needed is two modules, -one implementing ``MusicLex`` and the other -instantiating ``Music``. The latter is -completely trivial, whereas the former one involves the choice of correct -vocabulary and inflectional paradigms. 
For instance, Finnish is added as follows: -``` - instance MusicLexFin of MusicLex = - CatFin ** open ParadigmsFin in { - oper - song_N = mkN "kappale" ; - american_A = mkA "amerikkalainen" ; - } - - concrete MusicFin of Music = MusicI with - (Syntax = SyntaxFin), - (MusicLex = MusicLexFin) ; -``` -More work is of course needed if the language-independent linearizations in -MusicI are not satisfactory for some language. The resource grammar guarantees -that the linearizations are possible in all languages, in the sense of grammatical, -but they might of course be inadequate for stylistic reasons. Assume, -for the sake of argument, that adjectival modification does not sound good in -English, but that a relative clause would be preferrable. One can then use -restricted inheritance of the functor: -``` - concrete MusicEng of Music = - MusicI - [PropKind] - with - (Syntax = SyntaxEng), - (MusicLex = MusicLexEng) ** - open SyntaxEng in { - lin - PropKind k p = mkCN k (mkRS (mkRCl which_RP (mkVP p))) ; - } -``` -The lexicon is as expected: -``` - instance MusicLexEng of MusicLex = - CatEng ** open ParadigmsEng in { - oper - song_N = mkN "song" ; - american_A = mkA "American" ; - } -``` - - -==Lock fields== - -//This section is only relevant as a guide to error messages that have to do with lock fields, and can be skipped otherwise.// - -FIXME: this section may become obsolete. - -When the categories of the resource grammar are used -in applications, a **lock field** is added to their linearization types. -The lock field for a category ``C`` is a record field -``` - lock_C : {} -``` -with the only possible value -``` - lock_C = <> -``` -The lock field carries no information, but its presence -makes the linearization type of ``C`` -unique, so that categories -with the same implementation are not confused with each other. -(This is inspired by the ``newtype`` discipline in Haskell.) 
- -For example, the lincats of adverbs and conjunctions are the same -in ``CatEng`` (and therefore in ``GrammarEng``, which inherits it): -``` - lincat Adv = {s : Str} ; - lincat Conj = {s : Str} ; -``` -But when these category symbols are used to denote their linearization -types in an application, these definitions are translated to -``` - oper Adv : Type = {s : Str ; lock_Adv : {}} ; - oper Conj : Type = {s : Str ; lock_Conj : {}} ; -``` -In this way, the user of a resource grammar cannot confuse adverbs with -conjunctions. In other words, the lock fields force the type checker -to function as a grammaticality checker. - -When the resource grammar is ``open``ed in an application grammar, -and only functions from the resource are used in a type-correct way, the -lock fields are never seen (except possibly in type error messages). -If an application grammarian has to write lock fields herself, -it is a sign that the guarantees given by the resource grammar -no longer hold. But since the resource may be incomplete, the -application grammarian may occasionally have to provide the dummy -values of lock fields (always ``<>``, the empty record). -Here is an example: -``` - mkUtt : Str -> Utt ; - mkUtt s = {s = s ; lock_Utt = <>} ; -``` -Currently, missing lock fields produce warnings rather than errors, -but this behaviour of GF may change in future. - - -==Parsing with resource grammars?== - -The intended use of the resource grammar is as a library for writing -application grammars. It is not designed for parsing e.g. newspaper text. There -are several reasons why this is not practical: -- Efficiency: the resource grammar uses complex data structures, in -particular, discontinuous constituents, which make parsing slow and the -parser size huge. -- Completeness: the resource grammar does not necessarily cover all rules -of the language - only enough to be able to express everything -in one way or another. 
-- Lexicon: the resource grammar has a very small lexicon, only meant for test -purposes. -- Semantics: the resource grammar has very little semantic control, and may -accept strange input or deliver strange interpretations. -- Ambiguity: parsing in the resource grammar may return lots of results many -of which are implausible. - - -All of these problems should be solved in application grammars. -The task of resource grammars is just to take care of low-level linguistic -details such as inflection, agreement, and word order. - -It is for the same reasons that resource grammars are not adequate for translation. -That the syntax API is implemented for different languages of course makes -it possible to translate via it - but there is no guarantee of translation -equivalence. Of course, the use of functor implementations such as ``MusicI`` -above only extends to those cases where the syntax API does give translation -equivalence - but this must be seen as a limiting case, and bigger applications -will often use only restricted inheritance of ``MusicI``. - - - -=To find rules in the resource grammar library= - -==Inflection paradigms== - -Inflection paradigms are defined separately for each language //L// -in the module ``Paradigms``//L//. 
To test them, the command -``cc`` (= ``compute_concrete``) -can be used: -``` - > i -retain german/ParadigmsGer.gf - - > cc mkN "Schlange" - { - s : Number => Case => Str = table Number { - Sg => table Case { - Nom => "Schlange" ; - Acc => "Schlange" ; - Dat => "Schlange" ; - Gen => "Schlange" - } ; - Pl => table Case { - Nom => "Schlangen" ; - Acc => "Schlangen" ; - Dat => "Schlangen" ; - Gen => "Schlangen" - } - } ; - g : Gender = Fem - } -``` -For the sake of convenience, every language implements these five paradigms: -``` - oper - mkN : Str -> N ; -- regular nouns - mkA : Str -> A ; -- regular adjectives - mkV : Str -> V ; -- regular verbs - mkPN : Str -> PN ; -- regular proper names - mkV2 : V -> V2 ; -- direct transitive verbs -``` -It is often possible to initialize a lexicon by just using these functions, -and later revise it by using the more involved paradigms. For instance, in -German we cannot use ``mkN "Lied"`` for ``Song``, because the result would be a -masculine noun with the plural form ``"Liede"``. -The individual ``Paradigms`` modules -tell what cases are covered by the regular heuristics. - -As a limiting case, one could even initialize the lexicon for a new language -by copying the English (or some other already existing) lexicon. This would -produce a language with correct grammar but with content words directly borrowed from -English - maybe not so strange in certain technical domains. - - - -==Syntax rules== - -Syntax rules should be looked for in the module ``Constructors``. -Below this top-level module exposing overloaded constructors, -there are around 10 abstract modules, each defining constructors for -a group of one or more related categories. For instance, the module -``Noun`` defines how to construct common nouns, noun phrases, and determiners. -But these special modules are seldom or never needed by the users of the library. - -TODO: when are they needed? 
- -Browsing the libraries is helped by the gfdoc-generated HTML pages, -whose LaTeX versions are included in the present document. - - -==Special-purpose APIs== - -To give an analogy with the well-known type setting software, GF can be compared -with TeX and the resource grammar library with LaTeX. -Just like TeX frees the author -from thinking about low-level problems of page layout, so GF frees the grammarian -from writing parsing and generation algorithms. But quite a lot of knowledge of -//how// to write grammars is still needed, and the resource grammar library helps -GF grammarians in a way similar to how the LaTeX macro package helps TeX authors. - -But even LaTeX is often too detailed and low-level, and users are encouraged to -develop their own macro packages. The same applies to GF resource grammars: -the application grammarian might not need all the choices that the resource -provides, but would prefer less writing and higher-level programming. -To this end, application grammarians may want to write their own views on the -resource grammar. - - -==Browsing by the parser== - -A method alternative to browsing library documentation is -to use the parser. -Even though parsing is not an intended end-user application -of resource grammars, it is a useful technique for application grammarians -to browse the library. To find out which resource function implements -a particular structure, one can just parse a string that exemplifies this -structure. For instance, to find out how sentences are built using -transitive verbs, write -``` - > i english/LangEng.gf - - > p -cat=Cl "she loves him" - PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron)) -``` -The parser returns original constructors, not overloaded ones. 
Overloaded -constructors can be returned, so far with experimental heuristics, by using -the grammar ``api/toplevel/OverLangEng.gf`` and a special flag: -``` - > i api/toplevel/OverLangEng.gf - - > p -cat=Cl -overload "she loves him" - mkCl (mkNP she_Pron) love_V2 (mkNP he_Pron) -``` -Parsing with the English resource grammar has an acceptable speed, but -with most languages it takes just too much resources even to build the -parser. However, examples parsed in one language can always be linearized into -other languages: -``` - > i italian/LangIta.gf - - > l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron)) - lo ama -``` -Therefore, one can use the English parser to write an Italian grammar, and also -to write a language-independent (incomplete) grammar. One can also parse strings -that are bizarre in English but the intended way of expression in another language. -For instance, the phrase for "I am hungry" in Italian is literally "I have hunger". -This can be built by parsing "I have beer" in ``OverLangEng`` and then writing -``` - lin IamHungry = - let beer_N = mkN "fame" feminine - in - mkCl (mkNP i_Pron) have_V2 (mkNP massQuant beer_N) -``` -which uses ``ParadigmsIta.mkN``. - - - -==Example-based grammar writing== - -The technique of parsing with the resource grammar can be used in GF source files, -endowed with the suffix ``.gfe`` ("GF examples"). The suffix tells GF to preprocess -the file by replacing all expressions of the form -``` - in Module.Cat "example string" -``` -by the syntax trees obtained by parsing "example string" in ``Cat`` in ``Module``. -For instance, -``` - lin IamHungry = - let beer_N = mkN "fame" feminine - in - (in LangEng.Cl "I have beer") ; -``` -will result in the rule displayed in the previous section. The normal binding rules -of functional programming (and GF) guarantee that local bindings of identifiers -take precedence over constants of the same forms. 
Thus it is also possible to -linearize functions taking arguments in this way: -``` - lin - PropKind car_N old_A = in LangEng.CN "old car" ; -``` -However, the technique of example-based grammar writing has some limitations: -- Ambiguity. If a string has several parses, the first one is returned, and -it may not be the intended one. The other parses are shown in a comment, from -where they must/can be picked manually. -- Lexicality. The arguments of a function must be atomic identifiers, and are thus -not available for categories that have no lexical items. -For instance, the ``PropKind`` rule above gives the result -``` - lin - PropKind car_N old_A = AdjCN (UseN car_N) (PositA old_A) ; -``` -However, it is possible to write a special lexicon that gives atomic rules for -all those categories that can be used as arguments, for instance, -``` - fun - cat_CN : CN ; - old_AP : AP ; -``` -and then use this lexicon instead of the standard one included in ``Lang``. - - diff --git a/doc/multimodal.html b/doc/multimodal.html deleted file mode 100644 index 9f2b43902..000000000 --- a/doc/multimodal.html +++ /dev/null @@ -1,863 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> -<HTML> -<HEAD> -<META NAME="generator" CONTENT="http://txt2tags.sf.net"> -<TITLE>Demonstrative Expressions and Multimodal Grammars</TITLE> -</HEAD><BODY BGCOLOR="white" TEXT="black"> -<P ALIGN="center"><CENTER><H1>Demonstrative Expressions and Multimodal Grammars</H1> -<FONT SIZE="4"> -<I>Author: Aarne Ranta <aarne (at) cs.chalmers.se></I><BR> -Last update: Mon Jan 9 20:29:45 2006 -</FONT></CENTER> - -<P></P> -<HR NOSHADE SIZE=1> -<P></P> - <UL> - <LI><A HREF="#toc1">Abstract</A> - <LI><A HREF="#toc2">Multimodal grammars</A> - <UL> - <LI><A HREF="#toc3">Representing demonstratives in semantics and grammar</A> - <LI><A HREF="#toc4">Asynchronous syntax in GF</A> - <LI><A HREF="#toc5">Example multimodal grammar: abstract syntax</A> - <LI><A HREF="#toc6">Digression: discontinuous 
constituents</A> - <LI><A HREF="#toc7">From grammars to dialogue systems</A> - </UL> - <LI><A HREF="#toc8">Adding multimodality to a unimodal grammar</A> - <UL> - <LI><A HREF="#toc9">The multimodal conversion</A> - <LI><A HREF="#toc10">An example of the conversion</A> - <LI><A HREF="#toc11">Multimodal conversion combinators</A> - </UL> - <LI><A HREF="#toc12">Multimodal resource grammars</A> - <UL> - <LI><A HREF="#toc13">Resource grammar API</A> - <LI><A HREF="#toc14">Multimodal API: functions for building demonstratives</A> - <LI><A HREF="#toc15">Multimodal API: functions for building sentences and phrases</A> - <LI><A HREF="#toc16">Language-independent implementation: examples</A> - <LI><A HREF="#toc17">Multimodal API: interface to unimodal expressions</A> - <LI><A HREF="#toc18">Instantiating multimodality to different languages</A> - <LI><A HREF="#toc19">Language-independent reimplementation of TramDemo</A> - <LI><A HREF="#toc20">The order problem</A> - <LI><A HREF="#toc21">A recipe for using the resource library</A> - </UL> - </UL> - -<P></P> -<HR NOSHADE SIZE=1> -<P></P> -<A NAME="toc1"></A> -<H2>Abstract</H2> -<P> -This document shows a method to write grammars -in which spoken utterances are accompanied by -pointing gestures. A computer application of such -grammars are <B>multimodal dialogue systems</B>, in -which the pointing gestures are performed by -mouse clicks and movements. -</P> -<P> -After an introduction to the notions of -<B>demonstratives</B> and <B>integrated multimodality</B>, -we will show by a concrete example -how multimodal grammars can be written in GF -and how they can be used in dialogue systems. -The explanation is given in three stages: -</P> -<OL> -<LI>How to write a multimodal grammar by hand. -<LI>How to add multimodality to a unimodal grammar. -<LI>How to use a multimodal resource grammar. -</OL> - -<A NAME="toc2"></A> -<H2>Multimodal grammars</H2> -<P> -<B>Demonstrative expressions</B> are an old idea. 
Such -expressions get their meaning from the context. -</P> - <BLOCKQUOTE> - <I>This train</I> is faster than <I>that airplane</I>. - </BLOCKQUOTE> -<P></P> - <BLOCKQUOTE> - I want to go from <I>this place</I> to <I>this place</I>. - </BLOCKQUOTE> -<P></P> -<P> -In particular, as in these examples, the meaning -can be obtained from accompanying pointing gestures. -</P> -<P> -Thus the meaning-bearing unit is neither the words nor the -gestures alone, but their combination. Demonstratives -thus provide an example of <B>integrated multimodality</B>, -as opposed to parallel multimodality. In parallel -multimodality, speech and other modes of communication -are just alternative ways to convey the same information. -</P> -<A NAME="toc3"></A> -<H3>Representing demonstratives in semantics and grammar</H3> -<P> -When formalizing the semantics of demonstratives, we can combine syntax with coordinates: -</P> - <BLOCKQUOTE> - I want to go from this place to this place - </BLOCKQUOTE> -<P></P> -<P> -is interpreted as something like -</P> -<PRE> - want(I, go, this(place,(123,45)), this(place,(98,10))) -</PRE> -<P> -Now, the same semantic value can be given in many ways, by performing -the clicks at different points of time in relation to the speech: -</P> - <BLOCKQUOTE> - I want to go from this place CLICK(123,45) to this place CLICK(98,10) - </BLOCKQUOTE> -<P></P> - <BLOCKQUOTE> - I want to go from this place to this place CLICK(123,45) CLICK(98,10) - </BLOCKQUOTE> -<P></P> - <BLOCKQUOTE> - CLICK(123,45) CLICK(98,10) I want to go from this place to this place - </BLOCKQUOTE> -<P></P> -<P> -How do we build the value compositionally in parsing? -Traditional parsing is sequential: its input is a string of tokens. -It works for demonstratives only if the pointing is adjacent to -the spoken expression. In the actual input, the demonstrative word -can be separated from the accompanying click by other words. The two -can also be simultaneous. 
-</P> -<A NAME="toc4"></A> -<H3>Asynchronous syntax in GF</H3> -<P> -What we need is a notion of <B>asynchronous parsing</B>, as opposed to -sequential parsing (where demonstrative words and clicks must be -adjacent). -</P> -<P> -We can implement asynchronous parsing in GF by exploiting the generality -of <B>linearization types</B>. A linearization type is the type of -the <B>concrete syntax objects</B> assigned to semantic values. -What a GF grammar defines is a relation -</P> -<PRE> - abstract syntax trees <---> concrete syntax objects -</PRE> -<P> -When modelling context-free grammar in GF, -the concrete syntax objects are just strings. -But they can be more structured objects as well - in general, they are -<B>records</B> of different kinds of objects. For example, -a demonstrative expression can be linearized into a record of two strings. -</P> -<PRE> - {s = "this place" ; - this place (coord 123 45) <---> p = "(123,45)" - } -</PRE> -<P> -The record -</P> -<PRE> - {s = "I want to go from this place to this place" ; - p = "(123,45) (98,10)" - } -</PRE> -<P> -represents any combination of the sentence and the clicks, as long -as the clicks appear in this order. -</P> -<A NAME="toc5"></A> -<H3>Example multimodal grammar: abstract syntax</H3> -<P> -A simple example of a multimodal GF grammar is the one called -the Tram Demo grammar. It was written by Björn Bringert within -the TALK project as a part of a dialogue system that -deals with queries about tram timetables. The system interprets -a speech input in combination with mouse clicks on a digital map. 
-</P> -<P> -The abstract syntax of (a minimal fragment of) the Tram Demo -grammar is -</P> -<PRE> - cat - Input, Dep, Dest, Click ; - fun - GoFromTo : Dep -> Dest -> Input ; -- "I want to go from x to y" - DepHere : Click -> Dep ; -- "from here" with click - DestHere : Click -> Dest ; -- "to here" with click - - CCoord : Int -> Int -> Click ; -- click coordinates -</PRE> -<P> -An English concrete syntax of the grammar is -</P> -<PRE> - lincat - Input, Dep, Dest = {s : Str ; p : Str} ; - Click = {p : Str} ; - - lin - GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s ; p = x.p ++ y.p} ; - DepHere c = {s = ["from here"] ; p = c.p} ; - DestHere c = {s = ["to here"] ; p = c.p} ; - - CCoord x y = {p = "(" ++ x.s ++ "," ++ y.s ++ ")"} ; -</PRE> -<P> -When the grammar is used in the actual system, standard parsing methods -are used for interpreting the integrated speech and click input. -Parsing appears on two levels: the speech input parsing -performed by the Nuance speech recognition program (without the clicks), -and the semantics-yielding parser sending input to the dialogue manager. -The latter parser just attaches the clicks to the speech input. The order -of the clicks is preserved, and the parser can hence associate each of -the clicks with proper demonstratives. Here is the grammar used in the -two parsing phases. -</P> -<PRE> - cat - Query, -- whole content - Speech ; -- speech only - fun - QueryInput : Input -> Query ; -- the whole content shown - SpeechInput : Input -> Speech ; -- only the speech shown - - lincat - Query, Speech = {s : Str} ; - lin - QueryInput i = {s = i.s ++ ";" ++ i.p} ; - SpeechInput i = {s = i.s} ; -</PRE> -<P></P> -<A NAME="toc6"></A> -<H3>Digression: discontinuous constituents</H3> -<P> -The GF representation of integrated multimodality is -similar to the representation of <B>discontinuous constituents</B>. 
-For instance, assume <I>has arrived</I> is a verb phrase in English, -which can be used both in declarative sentences and questions, -</P> - <BLOCKQUOTE> - she <I>has arrived</I> - </BLOCKQUOTE> -<P></P> - <BLOCKQUOTE> - <I>has</I> she <I>arrived</I> - </BLOCKQUOTE> -<P></P> -<P> -In the question, the two words are separated from each other. If -<I>has arrived</I> is a constituent of the question, it is thus discontinuous. -To represent such constituents in GF, records can be used: -we split verb phrases (<CODE>VP</CODE>) into a finite and infinitive part. -</P> -<PRE> - lincat VP = {fin, inf : Str} ; - - lin Indic np vp = {s = np.s ++ vp.fin ++ vp.inf} ; - lin Quest np vp = {s = vp.fin ++ np.s ++ vp.inf} ; -</PRE> -<P></P> -<A NAME="toc7"></A> -<H3>From grammars to dialogue systems</H3> -<P> -The general recipe for using GF when building dialogue systems -is to write a grammar with the following components: -</P> -<UL> -<LI>The abstract syntax defines the semantics (the "ontology") - of the domain of the system. -<LI>The concrete syntaxes define alternative modes of input and output. -</UL> - -<P> -The engineering advantages of this approach have to do partly with -the declarativity of the description, partly with the tools provided -by GF to derive different components of the system: -</P> -<UL> -<LI>The type checker guarantees that all the input and output - modes match with the ontology. -<LI>The grammar compiler generates parsers for each input grammar - and generators for each output grammar. -<LI>Translators between GF's abstract syntax and other ontology - description languages enable communication with different - kinds of dialogue managers and cover e.g. Prolog terms and XML objects. -<LI>Translators from GF's concrete syntax to speech recognition formats - make it possible to generate e.g. Nuance grammars and ATK language - models. -</UL> - -<P> -An example of this process is Björn Bringert's TramDemo. 
-More recently, grammars have been integrated to the GoDiS dialogue -manager by Prolog representations of abstract syntax. -</P> -<A NAME="toc8"></A> -<H2>Adding multimodality to a unimodal grammar</H2> -<P> -This section gives a recipe for making any unimodal grammar -multimodal, by adding pointing gestures to chosen expressions. The recipe -guarantees that the resulting grammar remains semantically well-formed, -i.e. type correct. -</P> -<A NAME="toc9"></A> -<H3>The multimodal conversion</H3> -<P> -The <B>multimodal conversion</B> of a grammar consists of seven -steps, of which the first is always the same, the second -involves a decision, and the rest are derivative: -</P> -<OL> -<LI>Add the category <CODE>Point</CODE> with a standard linearization type. -<PRE> - cat Point ; - lincat Point = {point : Str} ; -</PRE> -<LI>(Decision) Decide which constructors are demonstrative, i.e. take - a pointing gesture as an argument. Add a <CODE>Point</CODE> as their last argument. - The new type signatures for such constructors <I>d</I> have the form -<PRE> - fun d : ... -> Point -> D -</PRE> -<LI>(Derivative) Add a <CODE>point</CODE> field to the linearization type <I>L</I> of any - demonstrative category <I>D</I>, i.e. a category that has at least one demonstrative - constructor: -<PRE> - lincat D = L ** {point : Str} ; -</PRE> -<LI>(Derivative) If some other category <I>C</I> has a constructor <I>d</I> that takes - demonstratives as arguments, make it demonstrative by adding a <I>point</I> field - to its linearization type. -<LI>(Derivative) Store the <CODE>point</CODE> field in the linearization <I>t</I> of any - constructor <I>d</I> that has been made demonstrative: -<PRE> - lin d x1 ... xn p = t x1 ... xn ** {point = p.point} ; -</PRE> -<LI>(Derivative) For each constructor <I>f</I> that takes demonstratives <I>D_1,...,D_n</I> - as arguments, collect the <I>point</I> fields of the arguments in the <I>point</I> - field of the value: -<PRE> - lin f x_1 ... 
x_m = - t x_1 ... x_m ** {point = x_d1.point ++ ... ++ x_dn.point} ; -</PRE> - Make sure that the pointings <CODE>x_d1.point ... x_dn.point</CODE> are concatenated - in the same order as the arguments appear in the <I>linearization</I> <I>t</I>, - which is not necessarily the same as the abstract argument order. -<LI>(Derivative) To preserve type correctness, add an empty - <CODE>point</CODE> field to the linearization <I>t</I> of any - constructor <I>c</I> of a demonstrative category: -<PRE> - lin c x1 ... xn = t x1 ... xn ** {point = []} ; -</PRE> -</OL> - -<A NAME="toc10"></A> -<H3>An example of the conversion</H3> -<P> -Start with a Tram Demo grammar with no demonstratives, but just -tram stop names and the indexical <I>here</I> (interpreted as e.g. the user's -standing place). -</P> -<PRE> - cat - Input, Dep, Dest, Name ; - fun - GoFromTo : Dep -> Dest -> Input ; - DepHere : Dep ; - DestHere : Dest ; - DepName : Name -> Dep ; - DestName : Name -> Dest ; - - Almedal : Name ; -</PRE> -<P> -A unimodal English concrete syntax of the grammar is -</P> -<PRE> - lincat - Input, Dep, Dest, Name = {s : Str} ; - - lin - GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s} ; - DepHere = {s = ["from here"]} ; - DestHere = {s = ["to here"]} ; - DepName n = {s = ["from"] ++ n.s} ; - DestName n = {s = ["to"] ++ n.s} ; - - Almedal = {s = "Almedal"} ; -</PRE> -<P> -Let us follow the steps of the recipe. -</P> -<OL> -<LI>We add the category <CODE>Point</CODE> and its linearization type. -<LI>We decide that <CODE>DepHere</CODE> and <CODE>DestHere</CODE> involve a pointing gesture. -<LI>We add <CODE>point</CODE> to the linearization types of <CODE>Dep</CODE> and <CODE>Dest</CODE>. -<LI>Therefore, also add <CODE>point</CODE> to <CODE>Input</CODE>. (But <CODE>Name</CODE> remains unimodal.) -<LI>Add <CODE>p.point</CODE> to the linearizations of <CODE>DepHere</CODE> and <CODE>DestHere</CODE>. -<LI>Concatenate the points of the arguments of <CODE>GoFromTo</CODE>. 
-<LI>Add an empty <CODE>point</CODE> to <CODE>DepName</CODE> and <CODE>DestName</CODE>. -</OL> - -<P> -In the resulting grammar, one category is added and -two functions are changed in the abstract syntax (annotated by the step numbers): -</P> -<PRE> - cat - Point ; -- 1 - fun - DepHere : Point -> Dep ; -- 2 - DestHere : Point -> Dest ; -- 2 - -</PRE> -<P> -The concrete syntax in its entirety looks as follows -</P> -<PRE> - lincat - Dep, Dest = {s : Str ; point : Str} ; -- 3 - Input = {s : Str ; point : Str} ; -- 4 - Name = {s : Str} ; - Point = {point : Str} ; -- 1 - lin - GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s ; -- 6 - point = x.point ++ y.point - } ; - DepHere p = {s = ["from here"] ; -- 5 - point = p.point - } ; - DestHere p = {s = ["to here"] ; -- 5 - point = p.point - } ; - DepName n = {s = ["from"] ++ n.s ; -- 7 - point = [] - } ; - DestName n = {s = ["to"] ++ n.s ; -- 7 - point = [] - } ; - Almedal = {s = "Almedal"} ; -</PRE> -<P> -What we need in addition, to use the grammar in applications, are -</P> -<OL> -<LI>Constructors for <CODE>Point</CODE>, e.g. coordinate pairs. -<LI>Top-level categories, like <CODE>Query</CODE> and <CODE>Speech</CODE> in the original. -</OL> - -<P> -But their proper place is probably in another grammar module, so that -the core Tram Demo grammar can be used in different systems e.g. -encoding clicks in different ways. -</P> -<A NAME="toc11"></A> -<H3>Multimodal conversion combinators</H3> -<P> -GF is a functional programming language, and we exploit this -by providing a set of combinators that makes the multimodal conversion easier -and clearer. We start with the type of sequences of pointing gestures. -</P> -<PRE> - Point : Type = {point : Str} ; -</PRE> -<P> -To make a record type multimodal is to extend it with <CODE>Point</CODE>. -The record extension operator <CODE>**</CODE> is needed here. 
-</P> -<PRE> - Dem : Type -> Type = \t -> t ** Point ; -</PRE> -<P> -To construct, use, and concatenate pointings: -</P> -<PRE> - mkPoint : Str -> Point = \s -> {point = s} ; - - noPoint : Point = mkPoint [] ; - - point : Point -> Str = \p -> p.point ; - - concatPoint : (x,y : Point) -> Point = \x,y -> - mkPoint (point x ++ point y) ; -</PRE> -<P> -Finally, to add pointing to a record, with the limiting case of no demonstrative needed. -</P> -<PRE> - mkDem : (t : Type) -> t -> Point -> Dem t = \_,x,s -> x ** s ; - - nonDem : (t : Type) -> t -> Dem t = \t,x -> mkDem t x noPoint ; -</PRE> -<P> -Let us rewrite the Tram Demo grammar by using these combinators: -</P> -<PRE> - oper - SS : Type = {s : Str} ; - lincat - Input, Dep, Dest = Dem SS ; - Name = SS ; - - lin - GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s} ** - concatPoint x y ; - DepHere = mkDem SS {s = ["from here"]} ; - DestHere = mkDem SS {s = ["to here"]} ; - DepName n = nonDem SS {s = ["from"] ++ n.s} ; - DestName n = nonDem SS {s = ["to"] ++ n.s} ; - - Almedal = {s = "Almedal"} ; -</PRE> -<P> -The type synonym <CODE>SS</CODE> is introduced to make the combinator applications -concise. Notice the use of partial application in <CODE>DepHere</CODE> and -<CODE>DestHere</CODE>; an equivalent way to write is -</P> -<PRE> - DepHere p = mkDem SS {s = ["from here"]} p ; -</PRE> -<P></P> -<A NAME="toc12"></A> -<H2>Multimodal resource grammars</H2> -<P> -The main advantage of using GF when building dialogue systems is -that various components of the system -can be automatically generated from GF grammars. -Writing these grammars, however, can still be a considerable -task. A case in point are multilingual systems: -how to localize e.g. a system built in a car to -the languages of all those customers to whom the -car is sold? This problem has been the main focus of -GF for some years, and the solution on which most work has been -done is the development of <B>resource grammar libraries</B>. 
-These libraries work in the same way as program libraries -in software engineering, enabling a division of labour -between linguists and domain experts. -</P> -<P> -One of the goals in the resource grammars of different -languages has been to provide a <B>language-independent API</B>, -which makes the same resource grammar functions available for -different languages. For instance, the categories -<CODE>S</CODE>, <CODE>NP</CODE>, and <CODE>VP</CODE> are available in all of the -10 languages currently supported, and so is the function -</P> -<PRE> - PredVP : NP -> VP -> S -</PRE> -<P> -which corresponds to the rule <CODE>S -> NP VP</CODE> in phrase -structure grammar. However, there are several levels of abstraction -between the function <CODE>PredVP</CODE> and the phrase structure rule, -because the rule is implemented in so different ways in different -languages. In particular, discontinuous constituents are needed in -various degrees to make the rule work in different languages. -</P> -<P> -Now, dealing with discontinuous constituents is one of the demanding -aspects of multilingual grammar writing that the resource grammar -API is designed to hide. But the proposed treatment of integrated -multimodality is heavily dependent on similar things. What can we -do to make multimodal grammars easier to write (for different languages)? -There are two orthogonal answers: -</P> -<OL> -<LI>Use resource grammars to write a unimodal dialogue grammar and - then apply the multimodal - conversion to manually chosen parts. -<LI>Use <B>multimodal resource grammars</B> to derive multimodal - dialogue system grammars directly. -</OL> - -<P> -The multimodal resource grammar library has been obtained from -the unimodal one by applying the multimodal conversion manually. -In addition, the API has been simplified -by leaving out structures needed in written technical documents -(the original application area of GF) but not in spoken dialogue. 
-</P> -<P> -In the following subsections, we will show a part of the -multimodal resource grammar API, limited to a fragment that -is needed to get the main ideas and to reimplement the -Tram Demo grammar. The reimplementation shows one more advantage -of the resource grammar approach: dialogue systems can be -automatically instantiated to different languages. -</P> -<A NAME="toc13"></A> -<H3>Resource grammar API</H3> -<P> -The resource grammar API has three main kinds of entries: -</P> -<OL> -<LI>Language-independent linguistic structures (``linguistic ontology''), e.g. -<PRE> - PredVP : NP -> VP -> S ; -- "Mary helps him" -</PRE> -<LI>Language-specific syntax extensions, e.g. Swedish and German fronting -topicalization -<PRE> - TopicObj : NP -> VP -> S ; -- "honom hjälper Mary" -</PRE> -<LI>Language-specific lexical constructors, e.g. Germanic <I>Ablaut</I> patterns -<PRE> - irregV : (sing,sang,sung : Str) -> V ; -</PRE> -</OL> - -<P> -The first two kinds of entries are <CODE>cat</CODE> and <CODE>fun</CODE> definitions -in an abstract syntax. The multimodal, restricted API has -e.g. the following categories. Their names are obtained from -the corresponding unimodal categories by prefixing <CODE>M</CODE>. -</P> -<PRE> - MS ; -- multimodal sentence or question - MQS ; -- multimodal wh question - MImp ; -- multimodal imperative - MVP ; -- multimodal verb phrase - MNP ; -- multimodal (demonstrative) noun phrase - MAdv ; -- multimodal (demonstrative) adverbial - - Point ; -- pointing gesture -</PRE> -<P></P> -<A NAME="toc14"></A> -<H3>Multimodal API: functions for building demonstratives</H3> -<P> -Demonstrative pronouns can be used both as noun phrases and -as determiners. -</P> -<PRE> - this_MNP : Point -> MNP ; -- this - thisDet_MNP : CN -> Point -> MNP ; -- this car -</PRE> -<P> -There are also demonstrative adverbs, and prepositions give -a productive way to build more adverbs. 
-</P> -<PRE> - here_MAdv : Point -> MAdv ; -- here - here7from_MAdv : Point -> MAdv ; -- from here - - MPrepNP : Prep -> MNP -> MAdv ; -- in this car -</PRE> -<P></P> -<A NAME="toc15"></A> -<H3>Multimodal API: functions for building sentences and phrases</H3> -<P> -A handful of predication rules construct sentences, questions, and imperatives. -</P> -<PRE> - MPredVP : MNP -> MVP -> MS ; -- this plane flies here - MQPredVP : MNP -> MVP -> MQS ; -- does this plane fly here - MQuestVP : IP -> MVP -> MQS ; -- who flies here - MImpVP : MVP -> MImp ; -- fly here! -</PRE> -<P> -Verb phrases are constructed from verbs (inherited as such from -the unimodal API) by providing their complements. -</P> -<PRE> - MUseV : V -> MVP ; -- flies - MComplV2 : V2 -> MNP -> MVP ; -- takes this - MComplVV : VV -> MVP -> MVP ; -- wants to take this -</PRE> -<P> -A multimodal adverb can be attached to a verb phrase. -</P> -<PRE> - MAdvVP : MVP -> MAdv -> MVP ; -- flies here -</PRE> -<P></P> -<A NAME="toc16"></A> -<H3>Language-independent implementation: examples</H3> -<P> -The implementation makes heavy use of the multimodal conversion -combinators. It adds a <CODE>point</CODE> field to whatever the implementation of the unimodal -category is in any language. Thus, for example -</P> -<PRE> - lincat - MVP = Dem VP ; - MNP = Dem NP ; - MAdv = Dem Adv ; - - lin - this_MNP = mkDem NP this_NP ; - -- i.e. 
this_MNP p = this_NP ** {point = p.point} ; - - MComplV2 verb obj = mkDem VP (ComplV2 verb obj) obj ; - - MAdvVP vp adv = mkDem VP (AdvVP vp adv) (concatPoint vp adv) ; -</PRE> -<P></P> -<A NAME="toc17"></A> -<H3>Multimodal API: interface to unimodal expressions</H3> -<P> -Using nondemonstrative expressions as demonstratives: -</P> -<PRE> - DemNP : NP -> MNP ; - DemAdv : Adv -> MAdv ; -</PRE> -<P> -Building top-level phrases: -</P> -<PRE> - PhrMS : Pol -> MS -> Phr ; - PhrMS : Pol -> MS -> Phr ; - PhrMQS : Pol -> MQS -> Phr ; - PhrMImp : Pol -> MImp -> Phr ; -</PRE> -<P></P> -<A NAME="toc18"></A> -<H3>Instantiating multimodality to different languages</H3> -<P> -The implementation above has only used the resource grammar API, -not the concrete implementations. The library <CODE>Demonstrative</CODE> -is a <B>parametrized module</B>, also called a <B>functor</B>, which -has the following structure -</P> -<PRE> - incomplete concrete DemonstrativeI of Demonstrative = - Cat, TenseX ** open Test, Structural in { - - -- lincat and lin rules - - } -</PRE> -<P> -It can be <B>instantiated</B> to different languages as follows. 
-</P> -<PRE> - concrete DemonstrativeEng of Demonstrative = - CatEng, TenseX ** DemonstrativeI with - (Test = TestEng), - (Structural = StructuralEng) ; - - concrete DemonstrativeSwe of Demonstrative = - CatSwe, TenseX ** DemonstrativeI with - (Test = TestSwe), - (Structural = StructuralSwe) ; -</PRE> -<P></P> -<A NAME="toc19"></A> -<H3>Language-independent reimplementation of TramDemo</H3> -<P> -Again using the functor idea, we reimplement <CODE>TramDemo</CODE> -as follows: -</P> -<PRE> - incomplete concrete TramI of Tram = open Multimodal in { - - lincat - Query = Phr ; Input = MS ; - Dep, Dest = MAdv ; Click = Point ; - lin - QInput = PhrMS PPos ; - - GoFromTo x y = - MPredVP (DemNP (UsePron i_Pron)) - (MAdvVP (MAdvVP (MComplVV want_VV (MUseV go_V)) x) y) ; - - DepHere = here7from_MAdv ; - DestHere = here7to_MAdv ; - DepName s = MPrepNP from_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ; - DestName s = MPrepNP to_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ; - -</PRE> -<P> -Then we can instantiate this to all languages for which -the <CODE>Multimodal</CODE> API has been implemented: -</P> -<PRE> - concrete TramEng of Tram = TramI with - (Multimodal = MultimodalEng) ; - - concrete TramSwe of Tram = TramI with - (Multimodal = MultimodalSwe) ; - - concrete TramFre of Tram = TramI with - (Multimodal = MultimodalFre) ; -</PRE> -<P></P> -<A NAME="toc20"></A> -<H3>The order problem</H3> -<P> -It was pointed out in the section on the multimodal conversion that -the concrete word order may be different from the abstract one, -and vary between different languages. For instance, Swedish -topicalization -</P> - <BLOCKQUOTE> - Det här tåget vill den här kunden inte ta. - </BLOCKQUOTE> -<P></P> -<P> -(``this train, this customer doesn't want to take'') may well have -an abstract syntax of a form in which the customer appears -before the train. -</P> -<P> -This is a problem for the implementor of the resource grammar. 
-It means that some parts of the resource must be written manually -and not as a functor. -However, the <I>user</I> of the resource can safely -ignore the word order problem, if it is correctly dealt with in -the resource. -</P> -<A NAME="toc21"></A> -<H3>A recipe for using the resource library</H3> -<P> -When starting to develop resource grammars, we believed they -would be all that -an application grammarian needs to write a concrete syntax. -However, experience has shown that it can be tough to start -grammar development in this way: selecting functions from -a resource API requires more abstract thinking than just -writing strings, and its take longer to reach testable -results. The most light-weight format is -maybe to start with context-free grammars (which notation is -also supported by GF). Context-free grammars that -give acceptable even though over-generating -results for languages like English are quick to produce. -</P> -<P> -The experience has led to the following -steps for grammar development. While giving the work -a quick start, this recipe -increases abstraction at a later level, when it is time to -to localize the grammar to different languages. -If context-free notation is used, steps 1 and 2 can -be merged. -</P> -<OL> -<LI>Encode domain ontology in and abstract syntax, <CODE>Domain</CODE>. -<LI>Write a rough concrete syntax in English, <CODE>DomainRough</CODE>. - This can be oversimplified and overgenerating. -<LI>Reimplement by using the resource library, and build a functor <CODE>DomainI</CODE>. - This can helped by <B>example-based grammar writing</B>, where - the examples are generated from <CODE>DomainRough</CODE>. -<LI>Instantiate the functor <CODE>DomainI</CODE> to different languages, - and test the results by generating linearizations. -<LI>If some rule doesn't satisfy in some language, use the resource in - a different way for that case (<B>compile-time transfer</B>). 
-</OL> - - -<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) --> -<!-- cmdline: txt2tags -\-toc multimodal.txt --> -</BODY></HTML> diff --git a/doc/multimodal.txt b/doc/multimodal.txt deleted file mode 100644 index 8f41ab22e..000000000 --- a/doc/multimodal.txt +++ /dev/null @@ -1,728 +0,0 @@ -Demonstrative Expressions and Multimodal Grammars -Author: Aarne Ranta <aarne (at) cs.chalmers.se> -Last update: %%date(%c) - -% NOTE: this is a txt2tags file. -% Create an html file from this file using: -% txt2tags --toc multimodal.txt - -% Create a latex file from this file using: -% txt2tags -ttex multimodal.txt - -%!target:html - - -==Abstract== - -This document shows a method to write grammars -in which spoken utterances are accompanied by -pointing gestures. A computer application of such -grammars are **multimodal dialogue systems**, in -which the pointing gestures are performed by -mouse clicks and movements. - -After an introduction to the notions of -**demonstratives** and **integrated multimodality**, -we will show by a concrete example -how multimodal grammars can be written in GF -and how they can be used in dialogue systems. -The explanation is given in three stages: - -+ How to write a multimodal grammar by hand. -+ How to add multimodality to a unimodal grammar. -+ How to use a multimodal resource grammar. - - -==Multimodal grammars== - -**Demonstrative expressions** are an old idea. Such -expressions get their meaning from the context. - - //This train// is faster than //that airplane//. - - I want to go from //this place// to //this place//. - -In particular, as in these examples, the meaning -can be obtained from accompanying pointing gestures. - -Thus the meaning-bearing unit is neither the words nor the -gestures alone, but their combination. Demonstratives -thus provide an example of **integrated multimodality**, -as opposed to parallel multimodality. 
In parallel -multimodality, speech and other modes of communication -are just alternative ways to convey the same information. - - -===Representing demonstratives in semantics and grammar=== - -When formalizing the semantics of demonstratives, we can combine syntax with coordinates: - - I want to go from this place to this place - -is interpreted as something like -``` - want(I, go, this(place,(123,45)), this(place,(98,10))) -``` -Now, the same semantic value can be given in many ways, by performing -the clicks at different points of time in relation to the speech: - - I want to go from this place CLICK(123,45) to this place CLICK(98,10) - - I want to go from this place to this place CLICK(123,45) CLICK(98,10) - - CLICK(123,45) CLICK(98,10) I want to go from this place to this place - -How do we build the value compositionally in parsing? -Traditional parsing is sequential: its input is a string of tokens. -It works for demonstratives only if the pointing is adjacent to -the spoken expression. In the actual input, the demonstrative word -can be separated from the accompanying click by other words. The two -can also be simultaneous. - - -===Asynchronous syntax in GF=== - -What we need is a notion of **asynchronous parsing**, as opposed to -sequential parsing (where demonstrative words and clicks must be -adjacent). - -We can implement asynchronous parsin in GF by exploiting the generality -of **linearization types**. A linearization type is the type of -the **concrete syntax objects** assigned to semantic values. -What a GF grammar defines is a relation -``` - abstract syntax trees <---> concrete syntax objects -``` -When modelling context-free grammar in GF, -the concrete syntax objects are just strings. -But they can be more structured objects as well - in general, they are -**records** of different kinds of objects. For example, -a demonstrative expression can be linearized into a record of two strings. 
-``` - {s = "this place" ; - this place (coord 123 45) <---> p = "(123,45)" - } -``` -The record -``` - {s = "I want to go from this place to this place" ; - p = "(123,45) (98,10" - } -``` -represents any combination of the sentence and the clicks, as long -as the clicks appear in this order. - - -===Example multimodal grammar: abstract syntax=== - -A simple example of a multimodal GF grammar is the one called -the Tram Demo grammar. It was written by Björn Bringert within -the TALK project as a part of a dialogue system that -deals with queries about tram timetables. The system interprets -a speech input in combination with mouse clicks on a digital map. - -The abstract syntax of (a minimal fragment of) the Tram Demo -grammar is -``` -cat - Input, Dep, Dest, Click ; -fun - GoFromTo : Dep -> Dest -> Input ; -- "I want to go from x to y" - DepHere : Click -> Dep ; -- "from here" with click - DestHere : Click -> Dest ; -- "to here" with click - - CCoord : Int -> Int -> Click ; -- click coordinates -``` -An English concrete syntax of the grammar is -``` -lincat - Input, Dep, Dest = {s : Str ; p : Str} ; - Click = {p : Str} ; - -lin - GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s ; p = x.p ++ y.p} ; - DepHere c = {s = ["from here"] ; p = c.p} ; - DestHere c = {s = ["to here"] ; p = c.p} ; - - CCoord x y = {p = "(" ++ x.s ++ "," ++ y.s ++ ")"} ; -``` -When the grammar is used in the actual system, standard parsing methods -are used for interpreting the integrated speech and click input. -Parsing appears on two levels: the speech input parsing -performed by the Nuance speech recognition program (without the clicks), -and the semantics-yielding parser sending input to the dialogue manager. -The latter parser just attaches the clicks to the speech input. The order -of the clicks is preserved, and the parser can hence associate each of -the clicks with proper demonstratives. Here is the grammar used in the -two parsing phases. 
-``` -cat - Query, -- whole content - Speech ; -- speech only -fun - QueryInput : Input -> Query ; -- the whole content shown - SpeechInput : Input -> Speech ; -- only the speech shown - -lincat - Query, Speech = {s : Str} ; -lin - QueryInput i = {s = i.s ++ ";" ++ i.p} ; - SpeechInput i = {s = i.s} ; -``` - - -===Digression: discontinuous constituents=== - -The GF representation of integrated multimodality is -similar to the representation of **discontinous constituents**. -For instance, assume //has arrived// is a verb phrase in English, -which can be used both in declarative sentences and questions, - - she //has arrived// - - //has// she //arrived// - -In the question, the two words are separated from each other. If -//has arrived// is a constituent of the question, it is thus discontinuous. -To represent such constituents in GF, records can be used: -we split verb phrases (``VP``) into a finite and infinitive part. -``` - lincat VP = {fin, inf : Str} ; - - lin Indic np vp = {s = np.s ++ vp.fin ++ vp.inf} ; - lin Quest np vp = {s = vp.fin ++ np.s ++ vp.inf} ; -``` - -===From grammars to dialogue systems=== - -The general recipe for using GF when building dialogue systems -is to write a grammar with the following components: - -- The abstract syntax defines the semantics (the "ontology") - of the domain of the system. -- The concrete syntaxes define alternative modes of input and output. - - -The engineering advantages of this approach have to do partly with -the declarativity of the description, partly with the tools provided -by GF to derive different components of the system: - -- The type checker guarantees that all the input and output - modes match with the ontology. -- The grammar compiler generates parsers for each input grammar - and generators for each output grammar. -- Translators between GF's abstract syntax and other ontology - description languages enable communication with different - kinds of dialogue managers and cover e.g. 
Prolog terms and XML objects. -- Translators from GF's concrete syntax to speech recognition formats - make it possible to generate e.g. Nuance grammars and ATK language - models. - - -An example of this process is Björn Bringert's TramDemo. -More recently, grammars have been integrated to the GoDiS dialogue -manager by Prolog representations of abstract syntax. - - -==Adding multimodality to a unimodal grammar== - -This section gives a recipe for making any unimodal grammar -multimodal, by adding pointing gestures to chosen expressions. The recipe -guarantees that the resulting grammar remains semantically well-formed, -i.e. type correct. - - -===The multimodal conversion=== - -The **multimodal conversion** of a grammar consists of seven -steps, of which the first is always the same, the second -involves a decision, and the rest are derivative: - -+ Add the category ```Point``` with a standard linearization type. -``` - cat Point ; - lincat Point = {point : Str} ; -``` -+ (Decision) Decide which constructors are demonstrative, i.e. take - a pointing gesture as an argument. Add a ``Point``` as their last argument. - The new type signatures for such constructors //d// have the form -``` - fun d : ... -> Point -> D -``` -+ (Derivative) Add a ``point`` field to the linearization type //L// of any - demonstrative category //D//, i.e. a category that has at least one demonstrative - constructor: -``` - lincat D = L ** {point : Str} ; -``` -+ (Derivative) If some other category //C// has a constructor //d// that takes - demonstratives as arguments, make it demonstrative by adding a //point// field - to its linearization type. -+ (Derivative) Store the ``point`` field in the linearization //t// of any - constructor //d// that has been made demonstrative: -``` - lin d x1 ... xn p = t x1 ... 
xn ** {point = p.point} ; -``` -+ (Derivative) For each constructor //f// that takes demonstratives //D_1,...,D_n// - as arguments, collect the //point// fields of the arguments in the //point// - field of the value: -``` - lin f x_1 ... x_m = - t x_1 ... x_m ** {point = x_d1.point ++ ... ++ x_dn.point} ; -``` - Make sure that the pointings ``x_d1.point ... x_dn.point`` are concatenated - in the same order as the arguments appear in the //linearization// //t//, - which is not necessarily the same as the abstract argument order. -+ (Derivative) To preserve type correctness, add an empty - ``point`` field to the linearization //t// of any - constructor //c// of a demonstrative category: -``` - lin c x1 ... xn = t x1 ... xn ** {point = []} ; -``` - - -===An example of the conversion=== - -Start with a Tram Demo grammar with no demonstratives, but just -tram stop names and the indexical //here// (interpreted as e.g. the user's -standing place). -``` -cat - Input, Dep, Dest, Name ; -fun - GoFromTo : Dep -> Dest -> Input ; - DepHere : Dep ; - DestHere : Dest ; - DepName : Name -> Dep ; - DestName : Name -> Dest ; - - Almedal : Name ; -``` -A unimodal English concrete syntax of the grammar is -``` -lincat - Input, Dep, Dest, Name = {s : Str} ; - -lin - GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s} ; - DepHere = {s = ["from here"]} ; - DestHere = {s = ["to here"]} ; - DepName n = {s = ["from"] ++ n.s} ; - DestName n = {s = ["to"] ++ n.s} ; - - Almedal = {s = "Almedal"} ; -``` -Let us follow the steps of the recipe. - -+ We add the category ``Point`` and its linearization type. -+ We decide that ``DepHere`` and ``DestHere`` involve a pointing gesture. -+ We add ``point`` to the linearization types of ``Dep`` and ``Dest``. -+ Therefore, also add ``point`` to ``Input``. (But ``Name`` remains unimodal.) -+ Add ``p.point`` to the linearizations of ``DepHere`` and ``DestHere``. -+ Concatenate the points of the arguments of ``GoFromTo``. 
-+ Add an empty ``point`` to ``DepName`` and ``DestName``. - - -In the resulting grammar, one category is added and -two functions are changed in the abstract syntax (annotated by the step numbers): -``` -cat - Point ; -- 1 -fun - DepHere : Point -> Dep ; -- 2 - DestHere : Point -> Dest ; -- 2 - -``` -The concrete syntax in its entirety looks as follows -``` -lincat - Dep, Dest = {s : Str ; point : Str} ; -- 3 - Input = {s : Str ; point : Str} ; -- 4 - Name = {s : Str} ; - Point = {point : Str} ; -- 1 -lin - GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s ; -- 6 - point = x.point ++ y.point - } ; - DepHere p = {s = ["from here"] ; -- 5 - point = p.point - } ; - DestHere p = {s = ["to here"] : -- 5 - point = p.point - } ; - DepName n = {s = ["from"] ++ n.s ; -- 7 - point = [] - } ; - DestName n = {s = ["to"] ++ n.s ; -- 7 - point = [] - } ; - Almedal = {s = "Almedal"} ; -``` -What we need in addition, to use the grammar in applications, are - -+ Constructors for ``Point``, e.g. coordinate pairs. -+ Top-level categories, like ``Query`` and ``Speech`` in the original. - - -But their proper place is probably in another grammar module, so that -the core Tram Demo grammar can be used in different systems e.g. -encoding clicks in different ways. - - -===Multimodal conversion combinators=== - -GF is a functional programming language, and we exploit this -by providing a set of combinators that makes the multimodal conversion easier -and clearer. We start with the type of sequences of pointing gestures. -``` - Point : Type = {point : Str} ; -``` -To make a record type multimodal is to extend it with ``Point``. -The record extension operator ``**`` is needed here. 
-``` - Dem : Type -> Type = \t -> t ** Point ; -``` -To construct, use, and concatenate pointings: -``` - mkPoint : Str -> Point = \s -> {point = s} ; - - noPoint : Point = mkPoint [] ; - - point : Point -> Str = \p -> p.point ; - - concatPoint : (x,y : Point) -> Point = \x,y -> - mkPoint (point x ++ point y) ; -``` -Finally, to add pointing to a record, with the limiting case of no demonstrative needed. -``` - mkDem : (t : Type) -> t -> Point -> Dem t = \_,x,s -> x ** s ; - - nonDem : (t : Type) -> t -> Dem t = \t,x -> mkDem t x noPoint ; -``` -Let us rewrite the Tram Demo grammar by using these combinators: -``` -oper - SS : Type = {s : Str} ; -lincat - Input, Dep, Dest = Dem SS ; - Name = SS ; - -lin - GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s} ** - concatPoint x y ; - DepHere = mkDem SS {s = ["from here"]} ; - DestHere = mkDem SS {s = ["to here"]} ; - DepName n = nonDem SS {s = ["from"] ++ n.s} ; - DestName n = nonDem SS {s = ["to"] ++ n.s} ; - - Almedal = {s = "Almedal"} ; -``` -The type synonym ``SS`` is introduced to make the combinator applications -concise. Notice the use of partial application in ``DepHere`` and -``DestHere``; an equivalent way to write is -``` - DepHere p = mkDem SS {s = ["from here"]} p ; -``` - - -==Multimodal resource grammars== - -The main advantage of using GF when building dialogue systems is -that various components of the system -can be automatically generated from GF grammars. -Writing these grammars, however, can still be a considerable -task. A case in point are multilingual systems: -how to localize e.g. a system built in a car to -the languages of all those customers to whom the -car is sold? This problem has been the main focus of -GF for some years, and the solution on which most work has been -done is the development of **resource grammar libraries**. -These libraries work in the same way as program libraries -in software engineering, enabling a division of labour -between linguists and domain experts. 
- -One of the goals in the resource grammars of different -languages has been to provide a **language-independent API**, -which makes the same resource grammar functions available for -different languages. For instance, the categories -``S``, ``NP``, and ``VP`` are available in all of the -10 languages currently supported, and so is the function -``` - PredVP : NP -> VP -> S -``` -which corresponds to the rule ``S -> NP VP`` in phrase -structure grammar. However, there are several levels of abstraction -between the function ``PredVP`` and the phrase structure rule, -because the rule is implemented in so different ways in different -languages. In particular, discontinuous constituents are needed in -various degrees to make the rule work in different languages. - -Now, dealing with discontinuous constituents is one of the demanding -aspects of multilingual grammar writing that the resource grammar -API is designed to hide. But the proposed treatment of integrated -multimodality is heavily dependent on similar things. What can we -do to make multimodal grammars easier to write (for different languages)? -There are two orthogonal answers: - -+ Use resource grammars to write a unimodal dialogue grammar and - then apply the multimodal - conversion to manually chosen parts. -+ Use **multimodal resource grammars** to derive multimodal - dialogue system grammars directly. - - -The multimodal resource grammar library has been obtained from -the unimodal one by applying the multimodal conversion manually. -In addition, the API has been simplified -by leaving out structures needed in written technical documents -(the original application area of GF) but not in spoken dialogue. - -In the following subsections, we will show a part of the -multimodal resource grammar API, limited to a fragment that -is needed to get the main ideas and to reimplement the -Tram Demo grammar. 
The reimplementation shows one more advantage -of the resource grammar approach: dialogue systems can be -automatically instantiated to different languages. - - - - -===Resource grammar API=== - -The resource grammar API has three main kinds of entries: - -+ Language-independent linguistic structures (``linguistic ontology''), e.g. -``` - PredVP : NP -> VP -> S ; -- "Mary helps him" -``` -+ Language-specific syntax extensions, e.g. Swedish and German fronting -topicalization -``` - TopicObj : NP -> VP -> S ; -- "honom hjälper Mary" -``` -+ Language-specific lexical constructors, e.g. Germanic //Ablaut// patterns -``` - irregV : (sing,sang,sung : Str) -> V ; -``` - - -The first two kinds of entries are ``cat`` and ``fun`` definitions -in an abstract syntax. The multimodal, restricted API has -e.g. the following categories. Their names are obtained from -the corresponding unimodal categories by prefixing ``M``. -``` - MS ; -- multimodal sentence or question - MQS ; -- multimodal wh question - MImp ; -- multimodal imperative - MVP ; -- multimodal verb phrase - MNP ; -- multimodal (demonstrative) noun phrase - MAdv ; -- multimodal (demonstrative) adverbial - - Point ; -- pointing gesture -``` - - - -===Multimodal API: functions for building demonstratives=== - -Demonstrative pronouns can be used both as noun phrases and -as determiners. -``` - this_MNP : Point -> MNP ; -- this - thisDet_MNP : CN -> Point -> MNP ; -- this car -``` -There are also demonstrative adverbs, and prepositions give -a productive way to build more adverbs. -``` - here_MAdv : Point -> MAdv ; -- here - here7from_MAdv : Point -> MAdv ; -- from here - - MPrepNP : Prep -> MNP -> MAdv ; -- in this car -``` - - -===Multimodal API: functions for building sentences and phrases=== - -A handful of predication rules construct sentences, questions, and imperatives. 
-``` - MPredVP : MNP -> MVP -> MS ; -- this plane flies here - MQPredVP : MNP -> MVP -> MQS ; -- does this plane fly here - MQuestVP : IP -> MVP -> MQS ; -- who flies here - MImpVP : MVP -> MImp ; -- fly here! -``` -Verb phrases are constructed from verbs (inherited as such from -the unimodal API) by providing their complements. -``` - MUseV : V -> MVP ; -- flies - MComplV2 : V2 -> MNP -> MVP ; -- takes this - MComplVV : VV -> MVP -> MVP ; -- wants to take this -``` -A multimodal adverb can be attached to a verb phrase. -``` - MAdvVP : MVP -> MAdv -> MVP ; -- flies here -``` - - - - -===Language-independent implementation: examples=== - -The implementation makes heavy use of the multimodal conversion -combinators. It adds a ``point`` field to whatever the implementation of the unimodal -category is in any language. Thus, for example -``` - lincat - MVP = Dem VP ; - MNP = Dem NP ; - MAdv = Dem Adv ; - - lin - this_MNP = mkDem NP this_NP ; - -- i.e. this_MNP p = this_NP ** {point = p.point} ; - - MComplV2 verb obj = mkDem VP (ComplV2 verb obj) obj ; - - MAdvVP vp adv = mkDem VP (AdvVP vp adv) (concatPoint vp adv) ; -``` - - - -===Multimodal API: interface to unimodal expressions=== - -Using nondemonstrative expressions as demonstratives: -``` - DemNP : NP -> MNP ; - DemAdv : Adv -> MAdv ; -``` -Building top-level phrases: -``` - PhrMS : Pol -> MS -> Phr ; - PhrMS : Pol -> MS -> Phr ; - PhrMQS : Pol -> MQS -> Phr ; - PhrMImp : Pol -> MImp -> Phr ; -``` - - -===Instantiating multimodality to different languages=== - -The implementation above has only used the resource grammar API, -not the concrete implementations. The library ``Demonstrative`` -is a **parametrized module**, also called a **functor**, which -has the following structure -``` - incomplete concrete DemonstrativeI of Demonstrative = - Cat, TenseX ** open Test, Structural in { - - -- lincat and lin rules - - } -``` -It can be **instantiated** to different languages as follows. 
-``` - concrete DemonstrativeEng of Demonstrative = - CatEng, TenseX ** DemonstrativeI with - (Test = TestEng), - (Structural = StructuralEng) ; - - concrete DemonstrativeSwe of Demonstrative = - CatSwe, TenseX ** DemonstrativeI with - (Test = TestSwe), - (Structural = StructuralSwe) ; -``` - - - -===Language-independent reimplementation of TramDemo=== - -Again using the functor idea, we reimplement ``TramDemo`` -as follows: -``` -incomplete concrete TramI of Tram = open Multimodal in { - -lincat - Query = Phr ; Input = MS ; - Dep, Dest = MAdv ; Click = Point ; -lin - QInput = PhrMS PPos ; - - GoFromTo x y = - MPredVP (DemNP (UsePron i_Pron)) - (MAdvVP (MAdvVP (MComplVV want_VV (MUseV go_V)) x) y) ; - - DepHere = here7from_MAdv ; - DestHere = here7to_MAdv ; - DepName s = MPrepNP from_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ; - DestName s = MPrepNP to_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ; - -``` -Then we can instantiate this to all languages for which -the ``Multimodal`` API has been implemented: -``` - concrete TramEng of Tram = TramI with - (Multimodal = MultimodalEng) ; - - concrete TramSwe of Tram = TramI with - (Multimodal = MultimodalSwe) ; - - concrete TramFre of Tram = TramI with - (Multimodal = MultimodalFre) ; -``` - - - -===The order problem=== - -It was pointed out in the section on the multimodal conversion that -the concrete word order may be different from the abstract one, -and vary between different languages. For instance, Swedish -topicalization - - Det här tåget vill den här kunden inte ta. - -(``this train, this customer doesn't want to take'') may well have -an abstract syntax of a form in which the customer appears -before the train. - -This is a problem for the implementor of the resource grammar. -It means that some parts of the resource must be written manually -and not as a functor. -However, the //user// of the resource can safely -ignore the word order problem, if it is correctly dealt with in -the resource. 
- - -===A recipe for using the resource library=== - -When starting to develop resource grammars, we believed they -would be all that -an application grammarian needs to write a concrete syntax. -However, experience has shown that it can be tough to start -grammar development in this way: selecting functions from -a resource API requires more abstract thinking than just -writing strings, and its take longer to reach testable -results. The most light-weight format is -maybe to start with context-free grammars (which notation is -also supported by GF). Context-free grammars that -give acceptable even though over-generating -results for languages like English are quick to produce. - -The experience has led to the following -steps for grammar development. While giving the work -a quick start, this recipe -increases abstraction at a later level, when it is time to -to localize the grammar to different languages. -If context-free notation is used, steps 1 and 2 can -be merged. - -+ Encode domain ontology in and abstract syntax, ``Domain``. -+ Write a rough concrete syntax in English, ``DomainRough``. - This can be oversimplified and overgenerating. -+ Reimplement by using the resource library, and build a functor ``DomainI``. - This can helped by **example-based grammar writing**, where - the examples are generated from ``DomainRough``. -+ Instantiate the functor ``DomainI`` to different languages, - and test the results by generating linearizations. -+ If some rule doesn't satisfy in some language, use the resource in - a different way for that case (**compile-time transfer**). - - |
